Stretched VLAN over MPLS/GRE/IPSEC on SRX

It’s been a long time since I last posted, but I felt this was worth the effort, as there were so many incorrect posts around on this subject. That is not to criticize other bloggers and engineers, but there was something lacking in the other posts I found. Not enough explanation was given, and posts were vague on zones, policies and firewall rules.

Other posts were helpful and led me to the right solution. I’ll mention some as I go along, and try to cover the points that I felt benefited my solution.

So, let’s start with a brief diagram of what I am trying to achieve. It is very similar to the diagram shown in this post; however, with that diagram I felt the arrows did not show that the GRE over IPSEC tunnel was actually terminating on the SRX’s and that the Cisco routers played no part in the design.

So, without further ado… let’s look at the high level topology I wanted to achieve.

So, there we have it… a simple GRE over IPSEC tunnel between 2 SRX endpoints, with 1 routed layer 3 VLAN and 1 flat Layer 2 circuit running between them. What I needed was a combination of the previous layer 2 circuit example, Chris Jones’s example here; and David Gee’s example here. I needed a 3rd routed mgmt segment, but that could just be simple routing.

Ok, so why do I need all this? Well, I have 3 requirements to satisfy.

  1. I need a flat shared Layer2 segment for HA Keepalives
  2. I need a shared routed segment for a shared VIP, which must be available to other servers. The VIP requirements are Active/Passive.
  3. I need a router OOB management subnet, that is local to each location.

So, having reviewed all the posts, and having done extensive Googling (as one does 🙂 ), I started to build it. But before I go on, I’ll say a few words on Juniper’s own pages on this subject. They make various assumptions, which ultimately make it impossible to achieve the solution with the devices I am using.

  1. IDP licenses – These cost money, they are a rip off, and giving 1 single penny to Juniper, after spending years dealing with their bug filled code, just isn’t going to ever happen.
  2. They assume that you are connecting multiple sites to a headend router in a large multipoint environment, using expensive high end SRX models. Nothing wrong with this, but it is just not suitable for my own needs.
  3. Flow and Packet based routing instances – these are not required, as shown by the previous posts, and add unnecessary complexity… as if it’s not complex enough 🙂

So, onward to the first of the configuration stanzas… First we will start with the Internet facing VPN termination interface. Note here that I’ve used a physical unpatched interface, configured in loopback mode, so it is always up. What should also be pointed out is that this is not my ‘Internet’ interface, it’s a loopback. My ‘Internet’ interface just happens to be a VDSL PIM running PPP and PPPoE.

set interfaces fe-0/0/2 description "VPN Terminating interface"
set interfaces fe-0/0/2 fastether-options loopback
set interfaces fe-0/0/2 unit 0 family inet filter input re-protect
set interfaces fe-0/0/2 unit 0 family inet address 100.100.100.100/32

Notice that I have applied a filter to this interface, perhaps not required as such, but no harm is done in applying it.

The other end, the NFX end, terminates the VPN on its ‘Internet’ facing interface, so here is the first difference in setup. I would like to omit a lot of the NFX configuration, so unless shown otherwise, the reader can assume that both ends are identical. (Note that this is not stated in ANY of the other blog posts I read, and the closest to showing both ends was the post by David Gee. I still don’t know why his post shows him participating in a single OSPF area with his upstream SRX100’s, but it’s not required, and should only be seen as a way to propagate the external VPN terminating subnets.)

The next step, once the public IP’s and interfaces are known, is to set up the IPSEC and IKE portions of the configuration. This should allow the IPSEC tunnel to come up, through which we can start to add the next layers.

The ‘st0.0’ Interface (I used /30’s where I could)

set interfaces st0 unit 0 family inet address 10.1.0.2/30

And we can’t forget about having a loopback interface. We will use this later for BGP and LDP.

The Loopback Interface

set interfaces lo0 unit 0 family inet filter input re-protect
set interfaces lo0 unit 0 family inet address 10.1.0.10/32 primary
set interfaces lo0 unit 0 family inet address 127.0.0.1/32
set interfaces lo0 unit 0 family mpls

The IPSEC Configuration

set security ipsec vpn-monitor-options interval 10
set security ipsec vpn-monitor-options threshold 10
set security ipsec policy ipsec-policy-cfgr perfect-forward-secrecy keys group1
set security ipsec policy ipsec-policy-cfgr proposal-set standard
set security ipsec vpn ipsec-vpn-cfgr bind-interface st0.0
set security ipsec vpn ipsec-vpn-cfgr vpn-monitor optimized
set security ipsec vpn ipsec-vpn-cfgr ike gateway ike-gate-cfgr
set security ipsec vpn ipsec-vpn-cfgr ike ipsec-policy ipsec-policy-cfgr
set security ipsec vpn ipsec-vpn-cfgr establish-tunnels immediately

The IKE Configuration.

set security ike policy ike-policy-cfgr mode aggressive
set security ike policy ike-policy-cfgr proposal-set standard
set security ike policy ike-policy-cfgr pre-shared-key ascii-text "$9$12345/qwerty"
set security ike gateway ike-gate-cfgr ike-policy ike-policy-cfgr
set security ike gateway ike-gate-cfgr address 50.50.50.50
set security ike gateway ike-gate-cfgr dead-peer-detection optimized
set security ike gateway ike-gate-cfgr dead-peer-detection interval 10
set security ike gateway ike-gate-cfgr dead-peer-detection threshold 5
set security ike gateway ike-gate-cfgr external-interface fe-0/0/2.0
set security ike gateway ike-gate-cfgr version v2-only

Getting the IPSEC tunnel up required some adjustments to my re-protect loopback filter to allow the IPSEC traffic. I was also running a dynamic VPN on the SRX and didn’t want to conflict with that, but as it turns out, both work perfectly together, even on the same interface. Below we can see this in reality.

IPSEC

andy@srx210> show security ipsec security-associations
Total active tunnels: 2
ID Algorithm SPI Life:sec/kb Mon lsys Port Gateway
<268173314 ESP:des/ md5 2ce17b7b 3546/ 500000 - root 60528 25.25.25.25
>268173314 ESP:des/ md5 1d288501 3546/ 500000 - root 60528 25.25.25.25
<131073 ESP:3des/sha1 d646408d 2902/ unlim U root 500 50.50.50.50
>131073 ESP:3des/sha1 4e463962 2902/ unlim U root 500 50.50.50.50

IKE

andy@srx210> show security ike security-associations
Index State Initiator cookie Responder cookie Mode Remote Address
6181239 UP 85c9d31e7df0b8ae e06e80937eb086f6 Aggressive 25.25.25.25
6181238 UP 9eb04d3348162cb8 bc6f982e43e34ebf IKEv2 50.50.50.50
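
The adjustments to the re-protect filter are not shown in this post. As a rough sketch only, assuming IKE on UDP 500, NAT-T on UDP 4500 and ESP for the tunnel itself, terms along these lines would let the VPN traffic reach the terminating interface. The term names allow-ike and allow-esp are made up for illustration.

```
# Hypothetical terms, not taken from the original filter
set firewall family inet filter re-protect term allow-ike from protocol udp
set firewall family inet filter re-protect term allow-ike from destination-port 500
set firewall family inet filter re-protect term allow-ike from destination-port 4500
set firewall family inet filter re-protect term allow-ike then accept
set firewall family inet filter re-protect term allow-esp from protocol esp
set firewall family inet filter re-protect term allow-esp then accept
```

New terms are appended to the end of a filter, so if re-protect finishes with a catch-all discard term, these would need to be moved above it (for example with the insert command in configuration mode).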

Although not relevant, below is my Dynamic VPN configuration, which sits alongside the IPSEC tunnel.

Dynamic VPN for use with Pulse Secure Client

set security ipsec proposal ipsec-prop2 protocol esp
set security ipsec proposal ipsec-prop2 authentication-algorithm hmac-md5-96
set security ipsec proposal ipsec-prop2 encryption-algorithm des-cbc
set security ipsec proposal ipsec-prop2 lifetime-seconds 3600
set security ipsec policy ipsec-policy perfect-forward-secrecy keys group2
set security ipsec policy ipsec-policy proposals ipsec-prop2
set security ipsec vpn dyn-vpn ike gateway dyn-vpn-local-gw
set security ipsec vpn dyn-vpn ike ipsec-policy ipsec-policy
set security ike proposal ike-prop1 authentication-method pre-shared-keys
set security ike proposal ike-prop1 dh-group group2
set security ike proposal ike-prop1 authentication-algorithm md5
set security ike proposal ike-prop1 encryption-algorithm des-cbc
set security ike proposal ike-prop1 lifetime-seconds 86400
set security ike policy ike-dyn-vpn-policy mode aggressive
set security ike policy ike-dyn-vpn-policy proposals ike-prop1
set security ike policy ike-dyn-vpn-policy pre-shared-key ascii-text "$9$98765/ytrewq"
set security ike gateway dyn-vpn-local-gw ike-policy ike-dyn-vpn-policy
set security ike gateway dyn-vpn-local-gw dynamic hostname dynvpn
set security ike gateway dyn-vpn-local-gw dynamic connections-limit 10
set security ike gateway dyn-vpn-local-gw dynamic ike-user-type group-ike-id
set security ike gateway dyn-vpn-local-gw external-interface fe-0/0/2.0
set security ike gateway dyn-vpn-local-gw xauth access-profile dyn-vpn-access-profile
set security dynamic-vpn access-profile dyn-vpn-access-profile
set security dynamic-vpn clients all remote-protected-resources 192.168.1.0/24
set security dynamic-vpn clients all remote-exceptions 0.0.0.0/0
set security dynamic-vpn clients all ipsec-vpn dyn-vpn
set security dynamic-vpn clients all user andy

So, with all our IPSEC tunnels established, we can move on to the next bit of the puzzle, which is ensuring that we can build our GRE tunnel through the IPSEC tunnel. (I really dislike the term ‘over IPSEC’. It’s not over, it’s through, there… I said it 🙂 )

The next part requires building up the GRE tunnel ‘through’ the IPSEC tunnel. First, we will configure the GRE interface for this. Again, we reference the ‘st0.0’ interface, and use /30 subnets where we can.

The GRE Interface

set interfaces gr-0/0/0 description "GRE tunnel to NFX250"
set interfaces gr-0/0/0 unit 0 clear-dont-fragment-bit
set interfaces gr-0/0/0 unit 0 tunnel source 10.1.0.2
set interfaces gr-0/0/0 unit 0 tunnel destination 10.1.0.1
set interfaces gr-0/0/0 unit 0 tunnel allow-fragmentation
set interfaces gr-0/0/0 unit 0 family inet mtu 1300
set interfaces gr-0/0/0 unit 0 family inet filter input inet-packet-mode
set interfaces gr-0/0/0 unit 0 family inet address 10.1.0.6/30
set interfaces gr-0/0/0 unit 0 family mpls mtu 1200
set interfaces gr-0/0/0 unit 0 family mpls filter input mpls-packet-mode

Now, there are some things to note here. Firstly, MTU. I don’t want to discuss it; if you are attempting to build this, then I expect you to understand MTU values.
Secondly, the source and destination of the tunnel are the st0.0 addresses at each end of the IPSEC tunnel. We also give the GRE tunnel itself a /30 so that it has an inner side.
Lastly, we configure the interface with family MPLS and 2 firewall filters.

The filters are to force traffic to be packet based, and not flow based. These are simple filters, but they caused me a lot of trouble when trying to get LDP established, and this is where I feel ALL the other posts were lacking. Let’s take a look at the filter and see how it looks:

The Firewall Filter Section

set firewall family inet filter inet-packet-mode term control-traffic from protocol tcp
set firewall family inet filter inet-packet-mode term control-traffic from port 22
set firewall family inet filter inet-packet-mode term control-traffic from port 80
set firewall family inet filter inet-packet-mode term control-traffic from port 8080
set firewall family inet filter inet-packet-mode term control-traffic from port 646
set firewall family inet filter inet-packet-mode term control-traffic from port 179
set firewall family inet filter inet-packet-mode term control-traffic then accept
set firewall family inet filter inet-packet-mode term packet-mode then packet-mode
set firewall family inet filter inet-packet-mode term packet-mode then accept

So, control traffic… specifically BGP (TCP 179) and LDP (TCP 646), will be dropped without the control-traffic term in this filter.
All other filters I’ve seen omit this, yet what happens is that the UDP hellos for LDP get through (they are packet based anyway) but the TCP session gets mangled.
So, we have to allow ALL TCP control traffic through in flow mode. This is critical. Your LDP session will get stuck in an ‘opening’ state without it.
The Juniper example falls into this trap, and I can assure you, after working through this with Juniper PS, it has been noted. 🙂
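
One way to check whether you have hit this problem is to compare the LDP session state with the flow table. Both commands below are standard Junos operational commands. How to read them is an assumption based on the behaviour described above: a session that never reaches Operational, together with no flow session for TCP 646, would point at a missing control-traffic term.

```
# Session state and connection details for the LDP peer
show ldp session detail
# Is the LDP TCP session being handled in flow mode? (646 = LDP)
show security flow session destination-port 646
```

The same flow check with port 179 should apply to the BGP session.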

The final parts to the required filters are the mpls and l2circuit filters. These are shown below:

set firewall family mpls filter mpls-packet-mode term ALL-TRAFFIC then packet-mode
set firewall family mpls filter mpls-packet-mode term ALL-TRAFFIC then accept
set firewall family ccc filter l2circuit-packet-mode term ALL-TRAFFIC then packet-mode
set firewall family ccc filter l2circuit-packet-mode term ALL-TRAFFIC then accept

So, now we need some routing. I’ve picked OSPF, nice and simple, single area (although I am running other areas, they aren’t required for this example).
With OSPF, the aim is to form an adjacency over the GRE interface and propagate loopback routes for the purposes of LDP. Standard layering of IGP/MPLS type stuff; if you don’t know it, then you should read up on it.

OSPF Routing

set protocols ospf area 0.0.0.0 interface lo0.0 passive
set protocols ospf area 0.0.0.0 interface gr-0/0/0.0
set protocols ospf area 0.0.0.0 interface vlan.25 passive

That’s it really. I propagate the MGMT subnet (vlan.25) into OSPF so the devices that will partake in this setup can be managed from both ends.

So, now that we have configured the GRE interface and the appropriate firewall filters, along with an IGP routing protocol, it’s time to get some of that fancy MPLS stuff going.
For simplicity’s sake, I’ve used LDP. RSVP would be overkill in this case, even if we used dynamic tunnels, which we haven’t, so I won’t waffle on.

LDP, nice and simple, so here it is:

The LDP and MPLS section

set protocols ldp traceoptions file ldp_shoot
set protocols ldp traceoptions file size 10m
set protocols ldp traceoptions flag all
set protocols ldp transport-address router-id
set protocols ldp interface gr-0/0/0.0
set protocols ldp interface all disable
set protocols ldp interface lo0.0
set protocols ldp session 10.1.0.9 authentication-key "$9$54321/qwerty"
set protocols mpls interface gr-0/0/0.0
set protocols mpls interface lo0.0

Best to leave traceoptions out unless you need to debug, always handy to have it there though.

So, now let’s see what all this looks like put together.

Is OSPF Working?

andy@srx210> show ospf neighbor
Address Interface State ID Pri Dead
10.1.0.5 gr-0/0/0.0 Full 10.1.0.9 128 35

So far, it looks like an adjacency has correctly formed over the GRE interface.

andy@srx210> show route 10.1.0.9 protocol ospf terse

inet.0: 45 destinations, 50 routes (44 active, 1 holddown, 1 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination P Prf Metric 1 Metric 2 Next hop AS path
* 10.1.0.9/32 O 10 1 >gr-0/0/0.0

Great, we are learning the other end’s loopback in OSPF, so now let’s check and see if LDP is up.

Is LDP Working?

andy@srx210> show ldp neighbor
Address Interface Label space ID Hold time
10.1.0.9 lo0.0 10.1.0.9:0 34
10.1.0.5 gr-0/0/0.0 10.1.0.9:0 11

andy@gateway-lo0> show ldp session
Address State Connection Hold time
10.1.0.9 Operational Open 21

As expected (although see previous notes regarding firewalls, this bit took me a day to get working), LDP is up, and the session is working.
Unlike other bloggers, I did NOT need to use rib groups, statics, or anything else to get this session up. Nor should you have to. If you do, I’d like to understand why, so please let me know, I’m really interested.

Here is some more output.

andy@srx210> show route protocol ldp terse

inet.0: 45 destinations, 50 routes (44 active, 1 holddown, 1 hidden)

inet.3: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination P Prf Metric 1 Metric 2 Next hop AS path
* 10.1.0.9/32 L 9 1 >gr-0/0/0.0

mpls.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination P Prf Metric 1 Metric 2 Next hop AS path
* 299888 L 9 1 >gr-0/0/0.0
* 299888(S=0) L 9 1 >gr-0/0/0.0

So, next, but not quite last, we want the l2circuit up for our HA keepalives. For this we need a local interface at each side. My SRX happens to hang off a switch, so I added a cable, and dropped an access port onto the switch from the SRX for the l2circuit access.

The L2Circuit Portion

set interfaces fe-0/0/6 description "Layer2 CCC Shared VAULT VLAN"
set interfaces fe-0/0/6 encapsulation ethernet-ccc
set interfaces fe-0/0/6 unit 0 family ccc filter input l2circuit-packet-mode

Note the use of the l2circuit filter here: packet mode for all traffic on this layer 2 circuit.
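
The protocol side of the l2circuit is not shown in this post. A minimal sketch, assuming the neighbor is the remote loopback 10.1.0.9 and the virtual circuit ID is 1 (both taken from the show l2circuit connections output in this post), would be something like:

```
# Neighbor and VC ID inferred from the show l2circuit connections output
set protocols l2circuit neighbor 10.1.0.9 interface fe-0/0/6.0 virtual-circuit-id 1
```

The far end would need the mirror image, pointing at 10.1.0.10 with the same virtual circuit ID.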
Now, we can see if the l2circuit has come up.

andy@srx210> show l2circuit connections
Layer-2 Circuit Connections:
...
Legend for interface status
Up -- operational
Dn -- down
Neighbor: 10.1.0.9
Interface Type St Time last up # Up trans
fe-0/0/6.0(vc 1) rmt Up Jun 23 09:48:12 2017 1
Remote PE: 10.1.0.9, Negotiated control-word: Yes (Null)
Incoming label: 299792, Outgoing label: 299808
Negotiated PW status TLV: No
Local interface: fe-0/0/6.0, Status: Up, Encapsulation: ETHERNET

So, all looks good so far. Next I’ll discuss the various security zones. I used 4 zones for the entire solution.
1. Internet – connects to the Internet
2. VAULT-VPN – Zone for tunnel termination
3. VAULT-INT – Zone for the VPLS traffic
4. VAULT – Zone for the L2Circuit and MGMT traffic

What I found missing from a lot of blogs was any detail around zoning. So, let’s see what interfaces went where…

Zone Interfaces

set security zones security-zone Internet interfaces fe-0/0/2.0
set security zones security-zone VAULT-VPN host-inbound-traffic system-services all
set security zones security-zone VAULT-VPN host-inbound-traffic protocols all
set security zones security-zone VAULT-VPN interfaces st0.0
set security zones security-zone VAULT-VPN interfaces gr-0/0/0.0
set security zones security-zone VAULT-VPN interfaces lo0.0
set security zones security-zone VAULT-VPN interfaces lt-0/0/0.1
set security zones security-zone VAULT-INT host-inbound-traffic system-services all
set security zones security-zone VAULT-INT host-inbound-traffic protocols all
set security zones security-zone VAULT-INT interfaces fe-0/0/5.0
set security zones security-zone VAULT-INT interfaces lt-0/0/0.0
set security zones security-zone VAULT host-inbound-traffic system-services all
set security zones security-zone VAULT host-inbound-traffic protocols all
set security zones security-zone VAULT interfaces vlan.25
set security zones security-zone VAULT interfaces fe-0/0/6.0

The full zones are not shown, but I added policies that were very open. I advise you to lock these down after you have it all working correctly.

Zone Policies

set security policies from-zone VAULT-VPN to-zone VAULT policy VAULT-VPN-to-VAULT match source-address any
set security policies from-zone VAULT-VPN to-zone VAULT policy VAULT-VPN-to-VAULT match destination-address any
set security policies from-zone VAULT-VPN to-zone VAULT policy VAULT-VPN-to-VAULT match application any
set security policies from-zone VAULT-VPN to-zone VAULT policy VAULT-VPN-to-VAULT then permit

So, what haven’t we covered so far… the VPLS sections 🙂

First we need some BGP to carry our VPLS routing info, and of course a nice little routing instance to put our lt-0/0/0.0 interface in. I hope you spotted that interface coming into play within the zones section.

The BGP Section

set protocols bgp group VPLS type internal
set protocols bgp group VPLS traceoptions file bgplog
set protocols bgp group VPLS traceoptions file size 10m
set protocols bgp group VPLS traceoptions flag all
set protocols bgp group VPLS local-address 10.1.0.10
set protocols bgp group VPLS mtu-discovery
set protocols bgp group VPLS family l2vpn signaling
set protocols bgp group VPLS neighbor 10.1.0.9

BGP should come up. Don’t forget your router ID and ASN, but you know what you’re doing, right, so I needn’t remind you. After all, you got this far…
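
For completeness, the two statements in question would look something like the lines below. The router ID is assumed to be the primary lo0.0 address, and the AS number is taken from the peer AS in the show bgp summary output in this post (the group is internal, so both ends share it).

```
# Assumed values: router ID = primary lo0.0 address, AS = peer AS from show bgp summary
set routing-options router-id 10.1.0.10
set routing-options autonomous-system 43178
```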

The LT Interface

set interfaces lt-0/0/0 unit 0 encapsulation ethernet-vpls
set interfaces lt-0/0/0 unit 0 peer-unit 1
set interfaces lt-0/0/0 unit 1 encapsulation ethernet
set interfaces lt-0/0/0 unit 1 peer-unit 0
set interfaces lt-0/0/0 unit 1 family inet address 10.1.4.3/24 vrrp-group 10 virtual-address 10.1.4.1
set interfaces lt-0/0/0 unit 1 family inet address 10.1.4.3/24 vrrp-group 10 priority 100
set interfaces lt-0/0/0 unit 1 family inet address 10.1.4.3/24 vrrp-group 10 accept-data

Here we need 2 units; these basically form a bridge between each other, and this allows traffic to get into the routing instance.
We also need an interface to carry the VPLS traffic, as we did with the Layer2 Circuit.

set interfaces fe-0/0/5 encapsulation ethernet-vpls
set interfaces fe-0/0/5 unit 0
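
The VPLS routing instance itself is not shown in this post. The sketch below is a reconstruction under stated assumptions, not the original configuration. The instance name VPLS, the route distinguisher 10.1.0.10:600 and site identifier 1 are inferred from the VPLS.l2vpn.0 route table output in this post, and the lsi entry in the forwarding table suggests no-tunnel-services. The vrf-target value, the site name and the site-range are placeholders.

```
# Name, RD and site ID inferred from VPLS.l2vpn.0; the rest is hypothetical
set routing-instances VPLS instance-type vpls
set routing-instances VPLS interface lt-0/0/0.0
set routing-instances VPLS interface fe-0/0/5.0
set routing-instances VPLS route-distinguisher 10.1.0.10:600
# Placeholder route target, must match on both ends
set routing-instances VPLS vrf-target target:43178:600
set routing-instances VPLS protocols vpls site-range 10
set routing-instances VPLS protocols vpls no-tunnel-services
# Placeholder site name
set routing-instances VPLS protocols vpls site srx210 site-identifier 1
```

The remote end would use its own route distinguisher and a different site identifier (2, going by the 10.1.0.9:600:2:1 route), with the same route target.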

Checking BGP

andy@srx210> show bgp summary
Groups: 3 Peers: 4 Down peers: 0
Table Tot Paths Act Paths Suppressed History Damp State Pending
inet.0 1 1 0 0 0 0
bgp.l2vpn.0 1 1 0 0 0 0
Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
10.1.0.9 43178 6527 6466 0 2 10:52:14 Establ
VPLS.l2vpn.0: 1/1/1/0
bgp.l2vpn.0: 1/1/1/0

So, we can see that BGP is up, we have a VPLS table, and there is a route in there.

andy@srx210> show route table VPLS.l2vpn.0

VPLS.l2vpn.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.1.0.9:600:2:1/96
*[BGP/170] 10:53:53, localpref 100, from 10.1.0.9
AS path: I
> via gr-0/0/0.0
10.1.0.10:600:1:1/96
*[L2VPN/170/-101] 2d 00:46:53, metric2 1
Indirect

Nice, we have loopbacks, and RD’s and RT’s. Again, read up if you don’t know what I’m referring to.

So, that’s it. Final checks… is VRRP up?

VRRP Check

andy@srx210> show vrrp
Interface State Group VR state VR Mode Timer Type Address
lt-0/0/0.1 up 10 backup Active D 3.112 lcl 10.1.4.3
vip 10.1.4.1
mas 10.1.4.2

And finally… do we have MAC’s in the forwarding table for the VRRP neighbours?

andy@srx210> show route forwarding-table table VPLS
Routing table: VPLS.vpls
VPLS:
Destination Type RtRef Next hop Type Index NhRef Netif
default perm 0 rjct 1453 1
lt-0/0/0.0 user 0 comp 1459 3
lsi.1048834 user 0 comp 1465 2
fe-0/0/5.0 user 0 comp 1459 3
00:00:5e:00:01:0a/48 dynm 0 indr 262142 6
Push 262145 1449 2 gr-0/0/0.0
00:11:32:73:8d:4c/48 dynm 0 indr 262142 6
Push 262145 1449 2 gr-0/0/0.0
00:11:32:73:a7:0e/48 dynm 0 ucst 1469 5 fe-0/0/5.0
2c:21:31:5f:aa:10/48 dynm 0 indr 262142 6
Push 262145 1449 2 gr-0/0/0.0
84:18:88:75:19:80/48 perm 0 ucst 1456 1 lt-0/0/0.0
f0:1c:2d:4d:55:4e/48 dynm 0 ucst 1469 5 fe-0/0/5.0
f0:1c:2d:4d:aa:40/48 dynm 0 ucst 1469 5 fe-0/0/5.0

So, everything is working now, we just have to plug in the end hosts.

bash-4.3# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.1.3.1 0.0.0.0 UG 0 0 0 eth3
10.1.0.12 0.0.0.0 255.255.255.252 U 0 0 0 eth0
10.1.3.0 0.0.0.0 255.255.255.0 U 0 0 0 eth3
10.1.4.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0

bash-4.3# ping 10.1.4.1
PING 10.1.4.1 (10.1.4.1) 56(84) bytes of data.
64 bytes from 10.1.4.1: icmp_seq=1 ttl=64 time=15.4 ms
64 bytes from 10.1.4.1: icmp_seq=2 ttl=64 time=15.4 ms
^C
--- 10.1.4.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 15.445/15.446/15.448/0.124 ms
bash-4.3# ping 10.1.4.2
PING 10.1.4.2 (10.1.4.2) 56(84) bytes of data.
64 bytes from 10.1.4.2: icmp_seq=1 ttl=64 time=15.9 ms
^C
--- 10.1.4.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 15.941/15.941/15.941/0.000 ms
bash-4.3# ping 10.1.4.3
PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data.
64 bytes from 10.1.4.3: icmp_seq=1 ttl=64 time=0.752 ms
^C
--- 10.1.4.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.752/0.752/0.752/0.000 ms

The host can reach the far end VRRP gateway, and this shows that VPLS is working.

So, that’s it for this post. I hope you find it helpful; I only wrote it to clear up the points missed by other posts, and to help YOU, the reader 🙂

The final low level diagram is located here:

http://www.scot1and.org/~andy/Vault-Topology.pdf

Thanks

Andy


MPLS LDP IGP Synchronization

A common problem with MPLS networks that are running LDP is that when the LDP session is broken on a link, the IGP still has that link as outgoing; thus, packets are still forwarded out of that link. This happens because the IGP installs the best path in the routing table for any prefix. Therefore, traffic for prefixes with a next hop out of a link where LDP is broken becomes unlabeled. Consequently, MPLS VPN data packets can be discarded and VPN traffic is black-holed.

RFC5443 introduces a great solution to this problem, known as “LDP IGP Synchronization”. Before we dig into how LDP IGP synchronization works, let us go through an example that shows when and how traffic can be blackholed, which is what “LDP IGP Synchronization” is needed to fix.

In the topology below, we will see that the traffic from CE1 to CE2 takes the path PE1-P3-P4-PE2, which is the normal behavior of LDP following the IGP. The next traceroute output shows this.


Let us now fail the LDP session between R3 and R4 and examine the effect on the forwarding of both IP and VPN traffic.

In case of pure IPv4 traffic, R3 will un-tag the traffic and forward the packets unlabeled to R4. R4 will tag the traffic again and continue forwarding the traffic as normal MPLS traffic.

The next trace output, from PE1 to PE2 loopback0, explains what happened.

So, there is not a big problem for networks that are running IPv4-over-MPLS only. At the point where LDP is broken, the packets become unlabeled and are forwarded as IPv4 packets until they become labeled again on the next LSR.

Now, let us examine the case of VPN traffic. The traceroute output below shows that the traffic is blackholed in the MPLS cloud!

Here is the explanation for what happened. In the case of MPLS VPN, the packets are IPv4 packets, but they are encapsulated with two labels: a VPN label and an LDP label. Normally the P routers (P3, P4) are not aware of the VPN routes and they forward the VPN traffic based on the outer LDP label. Therefore, when the MPLS VPN packets lose their outer LDP label on the P routers, they are dropped. The following diagram explains this.




The solution is MPLS LDP-IGP Synchronization [RFC5443]. This feature ensures that the link is not used to forward traffic when the LDP session across the link is down. Rather, the traffic is forwarded out another link where the LDP session is still established.

When MPLS LDP-IGP synchronization is active for an interface, the IGP announces that link with maximum metric until the synchronization is achieved, or until the LDP session is running across that interface. The maximum link metric for OSPF is 65535 (hex 0xFFFF). No path through the interface where LDP is down is used unless it is the only path.

We will now enable LDP IGP Synchronization and see how it fixes our issue…

 

 

Let us have a look at the P3 OSPF database and see what has changed… As shown below, P3 is now advertising the P3-P4 link with a metric of 65535, which is the maximum metric.

This will cause the IGP to prefer the path through P5 to forward the traffic and avoid the P3-P4 link where LDP is broken. Now let us come back to CE1, run a trace to CE2 and see how it works.


That is really great… I am happy 🙂 I hope that you are happy too, thanks to [RFC5443]. Now let us have a look at the configuration of this feature in both IOS and JUNOS. (to be updated)
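
As a hedged sketch of the JUNOS side: LDP synchronization is enabled per interface under the IGP, and JUNOS supports it only on interfaces that the IGP treats as point-to-point. The interface name ge-0/0/1.0 and the hold-time value are placeholders, not values from the topology in this post.

```
# Placeholder interface and hold-time, not taken from the topology above
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 ldp-synchronization
# Optional: stop advertising the maximum metric after this many seconds
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 ldp-synchronization hold-time 10
set protocols ldp interface ge-0/0/1.0
```

On IOS the equivalent knob is mpls ldp sync under the OSPF process. The synchronization state can then be checked with show ospf interface extensive on JUNOS.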

Thank you.


Ahmad Alhady



Juniper Event Scripts – A brief HOWTO

A few weeks back there was a requirement to deploy 2 SRX5600 devices in the core of the network. Yes, I know, a hugely bad plan with first generation hardware, buggy as hell, and we are still suffering from it. However, what came to light was the need to implement a trigger that would shut down an interface should one of the SRX’s start misbehaving, or should the interconnecting layer 2 trunk carrying HA traffic die. Yes, we plan for events just such as these.

Anyway, I had already written a few OP scripts and am fairly OK with coding, so after a short search on “Junosscriptorium”, a site that contains, as they put it, “A Repository for JUNOScripts: Commit, Event, and Op scripts for JUNOS”, I found an example of what I needed.

The script I found was called “toggle-interface.slax” and its function simply disabled or enabled an interface based on a configured trigger event.

Now, although thanks must definitely go to the author, “Efrain Gonzalez”, the script wasn’t exactly what I was looking for. It needed some minor modifications for my environment, but the script wasn’t hard to modify, and I am fairly certain the original author wouldn’t mind if people were hacking up his script to get whatever functionality they required out of it.

So, my environment is shown below, simplified for obvious reasons.

Basic Topology

So, to explain, the SRX’s are acting in an “Active/Active” HA manner, and pass their HA signaling traffic along a VLAN between Router-A and Router-B. During this implementation, there was only a single layer 2 path between Router A and Router B, but that layer 2 path was not given any resilience through Router 3. The reason for this being that the powers that be decided that Spanning Tree Protocol wasn’t supportable by my NOC teams, so a single trunk is what I had to work with.

Now, to complicate matters even more, all VRF’s on the network that participate in certain unmentioned government services have a requirement to be routed via one of the SRX’s. So, to facilitate this, a default route is dropped into these VRFs that points to an interface on the SRX. The SRX then does whatever it does and spits the traffic back out into an egress VRF towards the centralized service that is the final destination.

Now, what happens if the 10Gigabit Ethernet link between Router-A and Router-B fails? Traffic will still pass to both routers via normal routing and MPLS to each SRX. But, the Layer 2 interlink between Router-A and Router-B which is carrying the HA traffic required by the SRX’s for signaling will be dead. So, if the link goes down, my “Active/Active” functionality disappears, and the SRX’s go into what is termed “Split Brain” mode, whereby each can service traffic, but neither device carries state info about the other device. This can have a nasty effect on any flows that pass through the firewalls as one would expect.

So, if the interlink went down, I needed a way to shut down the SRX facing interface on Router-B to stop “Split Brain” from happening. The trigger for doing this action was a “Link Down Event” on the interlink.

So, we can already see what we need here, an event script that monitors for the “Link Down” event on the site to site interface, which in turn runs the “Event Script” to shut down the SRX facing interface.

It should also be noted that in order for the “Active/Active” firewalls to carry the same IP addressing, VRRP was implemented on the SRX facing firewalls with its signaling going across the layer2 trunk between sites.

Now, to the script and its configuration…….

First, we defined the location of the script using the following Junos syntax:

user@mx960-A> show configuration system scripts
op {
file toggle-interface.slax;
}

Note that this simply means that the file path “/var/db/scripts/op/toggle-interface.slax” is used, as that’s where Op scripts are located. There doesn’t seem to be a way of getting the script to go to the “/var/db/scripts/event” directory; I tried for several hours to get this path working, but failed.

Next, we defined the event options on Router-B that would monitor for the failure event and react to it. These take the form of “Event Policies” as you can see below.

Now, we can see by the first part of the configuration that we are looking for a “snmp_trap_link_down” event, and when that happens we try to match that event with an interface, in this case “xe-11/0/0.1001” which happens to be the logical interface carrying our HA signaling traffic. If this goes down, then we fall into the “then” clause of the configuration.

The “then” clause calls the script “toggle-interface.slax” with 3 arguments. The first controls where the script’s output goes; in this case the value of 2 means that the output goes nowhere, i.e. the script produces no output when it runs.

The second argument is the interface that we want to shut down, in this case “xe-11/3/0”. The third argument is the new state we want for the interface. In our case we want the interface to be “admin down”, or disabled.

In the second part of the configuration we do the opposite: we look for a “snmp_trap_link_up” event and match it to the correct interface. If the interface matches, we again fall into the “then” clause and run the script, this time with the final argument set to bring the “xe-11/3/0” interface back up.
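As a rough illustration of the decision these two policies make, here is a small Python sketch. It is not the SLAX script and not Junos configuration; the function name `policy_action` and the state strings are made up for illustration, while the event names and interfaces are the ones described above.

```python
# Hypothetical model of the two event policies described above.
# Interface and event names come from the article; the rest is illustrative.
MONITORED_IFL = "xe-11/0/0.1001"   # logical interface carrying HA signaling
TARGET_IFD = "xe-11/3/0"           # SRX-facing interface to toggle


def policy_action(event_id, interface):
    """Return (target interface, new state) for a matching event, else None."""
    if interface != MONITORED_IFL:
        return None  # event is for some other interface: ignore it
    if event_id == "snmp_trap_link_down":
        return (TARGET_IFD, "disable")
    if event_id == "snmp_trap_link_up":
        return (TARGET_IFD, "enable")
    return None


print(policy_action("snmp_trap_link_down", "xe-11/0/0.1001"))
print(policy_action("snmp_trap_link_up", "xe-11/0/0.1001"))
print(policy_action("snmp_trap_link_down", "ge-0/0/0.0"))
```

The point of the sketch is simply that both policies key on the same monitored interface and differ only in the state they pass to the script.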

The Toggle-Interface.slax Script

Now, for completeness I’ll show the actual script that is running. Bear in mind that this is my modified version, but the original source code is, as mentioned previously, available on the “Junosscriptorium” web site.

Next, I’ll show some logs of the event actually taking place:

Sep  7 12:08:24  MX-A mib2d[1759]: SNMP_TRAP_LINK_DOWN: ifIndex 202, ifAdminStatus down(2), ifOperStatus down(2), ifName xe-11/0/0
Sep  7 12:08:24  MX-A fpc1 XETH(1/1): disabled Link 1.
Sep  7 12:08:24  MX-A fpc1 XETH(1/1): disabled Link 1.
Sep  7 12:08:25  MX-A root: invoke-commands: Executed /tmp/evt_cmd_a8cHo4, output to /tmp/evt_op_9p9Fra in text format

It’s hard to see what is actually happening here, but the event script is being run.

user@MX-A# run show interfaces xe-11/3/0
Physical interface: xe-11/3/0, Administratively down, Physical link is Down

So, as we can see, it’s fully working. I hope this example is of some use to people out there who have to hack up solutions as I’ve had to do.

Thank You

Andy Wilson

Posted in Routing, Switching

How OSPF SPF Adaptive Timers are implemented in IOS and JUNOS

Cisco Systems and Juniper Networks both have strong market penetration, and most operators and providers deploy platforms from both of them. Because of this, it has become essential for network engineers, especially those working in operator environments, to know how each vendor’s platforms are architected, how their operating systems are structured, and how to configure them. However, this is not enough for design engineers, who have to make sure that their multi-vendor networks converge without any interoperability issues. They have to dig deeper and understand how each of the leading vendors implements the technologies and follows the RFCs.

Today we will explain how Cisco Systems and Juniper Networks each implement OSPF SPF adaptive timers, also called SPF throttling (Cisco) or SPF hold-down (Juniper).

Before we dig into that, let’s talk a little bit about what OSPF SPF Adaptive Timers are designed to do for us, and then we’ll take a look at how each vendor is implementing the concept.

If we recall our OSPF background, the OSPF SPF algorithm is designed to run upon the arrival of LSAs. So, if each LSA triggers a full or incremental SPF run, and LSAs are arriving fast, SPF can begin eating up the majority of your CPU.

The challenge in large-scale networks is to quickly react to network changes while at the same time not allowing SPF calculations to dominate the route processors. This is the goal of SPF delay, also called SPF hold-down or SPF throttling.

Rather than kick off an SPF calculation every time a new LSA/LSP arrives, SPF delay forces the router to wait a bit between SPF runs. If a large number of LSA/LSPs are being flooded, a delay between SPF runs means that more LSA/LSPs are added to the link state database during the hold-down period. Efficiency is then increased because when the hold-down period expires and SPF is run, more network changes are included in a single calculation.
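To make the batching effect concrete, here is a toy Python model. It is only a sketch of the idea, not either vendor’s scheduler: the function name `count_spf_runs` and the sample arrival times are assumptions chosen for illustration.

```python
def count_spf_runs(lsa_times_ms, delay_ms):
    """Count SPF runs when a run is scheduled delay_ms after the first LSA
    that arrives while no run is pending (toy model, times in ms)."""
    runs = 0
    next_run_at = None
    for t in sorted(lsa_times_ms):
        if next_run_at is None or t > next_run_at:
            # No run is pending, so this LSA schedules a new one.
            runs += 1
            next_run_at = t + delay_ms
        # Otherwise the LSA lands before the pending run and is absorbed by it.
    return runs


burst = [0, 100, 200, 300, 400, 2000]   # hypothetical LSA arrival times
print(count_spf_runs(burst, 0))         # no delay: one run per LSA
print(count_spf_runs(burst, 1000))      # 1 s delay: the burst is batched
```

With no delay, each of the six LSAs triggers its own run. With a 1 second delay, the first five are handled in a single run and the late LSA gets a second one.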

But the efficiency you get from SPF delay has a cost: it increases your network convergence time. So, the challenge is to make the delay interval long enough when abnormal things happen, while keeping it short when the network is stable so that you get quick convergence. This leads to the concept of adaptive SPF timers.

Both Cisco and Juniper are offering adaptive SPF timers, but with different approaches. In the coming sections, we are going to explain the mechanism used by each vendor.

Adaptive SPF Timers in JUNOS

Juniper Networks uses a linear fast/slow algorithm for adaptive SPF timers. The first parameter is the SPF delay timer, which is the minimum delay between the detection of a topology change and the moment the SPF algorithm actually runs. This period is 200ms by default and is configurable with the spf-delay command to between 50 and 8000ms.

The second parameter is rapid-runs. If three (the default) SPF runs are triggered in quick succession, indicating instability in the network, the router enters “slow mode” and a third parameter, the hold-down timer, starts. Any subsequent SPF calculation is not run until the hold-down timer expires. The router remains in “slow mode” until the hold-down period has passed since the last SPF run, indicating that the network has converged. It then switches back to “fast mode”, and the system reverts to the configured values for the delay and rapid-runs statements.

The default values for SPF calculations in JUNOS can be seen below:

Default SPF timer values in JUNOS
r2@r2> show ospf overview | match SPF
Full SPF runs: 280
SPF delay: 0.200000sec, SPF holddown: 5 sec, SPF rapid runs: 3

Changing SPF Timers in JUNOS

These default values can be changed in configuration mode with the following command:

[edit protocols ospf]
r1@r1# set spf-options delay milliseconds holddown milliseconds rapid-runs number

Now we are going to play with the timers, run the debugs, and examine the behavior. We will set the delay to 1 sec and the hold-down timer to 20 sec while keeping rapid-runs at its default.

The log entry below shows, on lines 2, 6 and 10, that an SPF run occurs 1 second after each LSA update. Once the SPF run has completed 3 iterations, the router moves into a slower mode of operation.

The next log entry shows that SPF started 20 sec after the previous SPF run (at t=12:02:13). The default number of SPF calculations that can occur in succession is 3. The range that you can configure is from 1 through 5. Each SPF run happens after the configured SPF delay. When the maximum number of SPF calculations has occurred, the hold-down timer begins. We previously configured this to be 20 seconds. Any subsequent SPF calculation is not run until the hold-down timer expires. This is why the received LSA update on line 4 does not immediately trigger an SPF run.

Next, the log shows the router once again enters the fast mode…

We can also observe from the previous log that although 3 more SPF runs have taken place, the router does not move into slow mode again. This is because there were 50 sec between the first and the last SPF run in the set of 3. If the 3 SPF runs happen within 3 x “delay value”, or in our case 3 seconds, the router will start to throttle the number of SPF runs and start the holddown timer countdown. If the SPF runs fall outside 3 x the configured delay value, the rapid-run counter is reset to 0 and no back-off algorithms are run.
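The rule observed above can be written down as a short Python check. This is only a sketch of the behaviour as seen in these debugs, not Juniper’s documented algorithm; the function name `enters_slow_mode` and the sample run times are assumptions for illustration.

```python
def enters_slow_mode(run_times, delay, rapid_runs=3):
    """True if the last `rapid_runs` SPF runs all fit inside a window of
    rapid_runs * delay seconds (the rule observed in the debugs above)."""
    if len(run_times) < rapid_runs:
        return False
    window = run_times[-rapid_runs:]
    return window[-1] - window[0] <= rapid_runs * delay


# Three runs one second apart, with delay = 1 sec: back-off kicks in.
print(enters_slow_mode([0, 1, 2], delay=1))
# Three runs spread over 50 sec: the router stays in fast mode.
print(enters_slow_mode([0, 25, 50], delay=1))
```

The second call mirrors the case in the log, where 50 sec between the first and last run keeps the router in fast mode.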

Now, shown in the next log snippet, the router will enter the slow mode and the holddown timer will start, because three SPF runs have occurred in succession.

And finally, the following log shows that SPF again started 20 sec after the last SPF run (at t=12:04:16).

The figure below charts the above debug, which can help you better understand how JunOS behaves with the SPF timers.

Adaptive SPF Timers in IOS

Cisco Systems introduced an exponential backoff algorithm for the adaptive SPF timers by using three different configurable timers.

This exponential functionality limits the number of SPF computations during times of network instability by doubling the delay associated with the SPF run, up to a maximum hold delay, for the period of instability. When the period of instability ends, the delay is reset to the original value. Three timers are associated with SPF exponential backoff: Start Time, Initial-Hold Time, and Max-Hold Time.

IOS has an internal timer called the waiting-interval; the SPF computation is delayed until it expires. When a topology change is received for the first time, the waiting-interval is set to the start timer, which is similar to the spf-delay in JUNOS, and the SPF computation is delayed for the value set by the start timer. When the SPF computation completes, a new waiting-interval starts with the value of the initial-hold timer and the router enters “slow mode”. If there is a topology change during the waiting-interval, the SPF computation runs at the expiration of the initial-hold timer. At the completion of that SPF computation, the waiting-interval is set to twice the value of the initial-hold timer and the cycle repeats. So for example, if the start timer is 100ms and the initial-hold timer is 1000ms, the router delays the first SPF run by 100ms, the second by 1000ms, the third by 2000ms, the fourth by 4000ms, and so on.

The waiting-interval grows exponentially as 2^t * initial-hold until it reaches the max-hold time value. After this, any topology change during the current waiting-interval results in the next SPF computation running at the expiration of the max-hold time, and the next waiting-interval being equal to the constant max-hold timer. This ensures that exponential growth is limited. If SPF has not run for twice the time specified by the max-hold timer, the router switches back to “fast mode”, in which the start delay timer is used and the waiting-interval is reset back to its initial value.
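The doubling and capping described above can be sketched in a few lines of Python. This is a simplified model that assumes a topology change arrives during every waiting-interval, so the back-off never resets; the function name `spf_wait_intervals` is made up for illustration and is not an IOS construct.

```python
def spf_wait_intervals(start, initial_hold, max_hold, runs):
    """Delay (ms) before each of `runs` consecutive SPF runs, assuming a
    topology change arrives during every waiting-interval."""
    delays = [start]
    hold = initial_hold
    while len(delays) < runs:
        delays.append(min(hold, max_hold))  # cap growth at max-hold
        hold *= 2                           # double for the next interval
    return delays[:runs]


# The example from the text, with an assumed max-hold of 10000 ms.
print(spf_wait_intervals(100, 1000, 10000, 7))
# The lab values used later in this article: 1 sec, 5 sec, 50 sec.
print(spf_wait_intervals(1000, 5000, 50000, 6))
```

The first call gives 100, 1000, 2000, 4000 and 8000 ms and then stays at the 10000 ms cap. The second gives 1, 5, 10, 20 and 40 sec before settling at 50 sec.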

The default values for SPF calculations in IOS can be seen below:

Default SPF timer values in IOS
R2#sh ip ospf | i SPF
Initial SPF schedule delay 5000 msecs
Minimum hold time between two consecutive SPFs 10000 msecs
Maximum wait time between two consecutive SPFs 10000 msecs

Changing SPF Timers in IOS

These default values can be changed with the following command:

R1(config)# router ospf 100
R1(config-router)# timers throttle spf spf-start spf-hold spf-max-wait

As we did above with JunOS, we will play with the SPF throttle timers, run the debugs, and examine the behavior. We will set the start delay timer to 1 sec, the initial-hold timer to 5 sec and the max-hold timer to 50 sec.

The log entry below shows, on line 2, that SPF ran at t=21:30, which is 1 second after the LSA update, and that the next wait_interval was set to the initial-hold time of 5 sec, as shown on line 5.

The next log entry shows that the waiting_interval doubles after each SPF run. Starting with a waiting_interval of 5 sec, which is the initial-hold timer as shown on line 3, the next waiting_interval on line 8 is set to 10 sec, then to 20 sec on line 14 and 40 sec on line 22.

While the router is in slow mode, no SPF will run until the wait_interval elapses, no matter how many topology changes have been detected. This is why the received LSA updates on lines 13 and 14, and also on lines 20, 21, 22 and 23, do not immediately trigger an SPF run.

The next log entry shows that the waiting_interval has reached the max-hold time (50 sec) and that each subsequent waiting_interval is equal to the constant max-hold timer, as on lines 3 and 8.

We can also observe from the previous log that although the LSA on line 6 arrived 60 sec after the last SPF run, which is more than the waiting_interval, the router does not move into fast mode again. This is because the router only reverts to fast mode when SPF has not run for twice the time specified by the max-hold timer.

Now, shown in the next log snippet, the router returns to fast mode, because SPF has not run for 100 sec, which is twice the time specified by the max-hold timer.

For more clarity, I have plotted the debugs in the following figure, so you can use both the debugs and the figure to examine the behavior.

I hope that you have enjoyed this article and that it helped you understand the SPF implementation in both IOS and JunOS. We would also appreciate your comments, if any.

Thank you

Ahmad Alhady & Andy Wilson

Posted in OSPF, Routing

Bridging on the MX platform

I recently had the pleasure of setting up some bridging interfaces (or IRB) on an MX960 router. It wasn’t very complex, and after a few hours in the lab doing some testing I was ready to deploy. While I was waiting on the outage window, I decided to note some things down about this task in order to help my support guys better understand the solution.

The first step in configuring the router is configuring Layer 3 information on logical units within the IRB interface:

interfaces {
    ge-3/1/0 {
        encapsulation flexible-ethernet-services;
        flexible-vlan-tagging;
        unit 0 {
            encapsulation vlan-bridge;
            vlan-id 100;
        }
        unit 1 {
            encapsulation vlan-bridge;
            vlan-id 200;
        }
    }
    ge-3/1/1 {
        encapsulation flexible-ethernet-services;
        flexible-vlan-tagging;
        unit 0 {
            encapsulation vlan-bridge;
            vlan-id 100;
        }
        unit 1 {
            encapsulation vlan-bridge;
            vlan-id 200;
        }
    }
    ge-3/3/0 {
        encapsulation flexible-ethernet-services;
        flexible-vlan-tagging;
        unit 0 {
            encapsulation vlan-bridge;
            vlan-id 100;
        }
        unit 1 {
            encapsulation vlan-bridge;
            vlan-id 200;
        }
    }
    irb {
        unit 0 {
            family inet {
                address 172.17.1.2/24;
            }
        }
        unit 1 {
            family inet {
                address 12.12.1.2/28;
            }
        }
    }
}

In such a router configuration, you must configure a bridge domain with the interfaces through which the host traffic can travel. In this configuration, the bridge domains are configured on a virtual switch and the interfaces are divided into logical units within each bridge domain. The following example shows the bridge domain customer0_bd0, which is within the virtual switch routing instance customer. Note that there are MAC limits in place for the logical interfaces configured on this bridge domain:

routing-instances {
    customer {
        instance-type virtual-switch;
        bridge-domains {
            customer0_bd0 {
                domain-type bridge;
                vlan-id 100;
                interface ge-3/1/0.0;
                interface ge-3/1/1.0;
                interface ge-3/3/0.0;
                routing-interface irb.0;
                bridge-options {
                    mac-table-size {
                        1048575;
                    }
                    interface ge-3/1/0.0 {
                        interface-mac-limit {
                            131071;
                        }
                    }
                    interface ge-3/1/1.0 {
                        interface-mac-limit {
                            131071;
                        }
                    }
                    interface ge-3/3/0.0 {
                        interface-mac-limit {
                            131071;
                        }
                    }
                }
            }
        }
    }
}

The bridge domain customer1_bd1, which is also configured under the customer virtual switch routing instance, is configured for a VLAN ID of 200 and routing-interface irb.1. There are no MAC limits set for this bridge domain:

routing-instances {
    customer {
        instance-type virtual-switch;
        bridge-domains {
            customer1_bd1 {
                domain-type bridge;
                vlan-id 200;
                interface ge-3/1/0.1;
                interface ge-3/1/1.1;
                interface ge-3/3/0.1;
                routing-interface irb.1;
            }
        }
    }
}

Thank you

Andy Wilson

Posted in Routing, Switching

Where is “ip ospf mtu-ignore” really needed?

Some days back, I was bringing up OSPF adjacencies between two routers, which are shown in the diagram below.

The interface configuration for both routers is shown below.

The adjacency was not coming up, and using “debug ip ospf adj” on R2 I noticed the following message: “Nbr 1.1.1.1 has larger interface MTU”. I immediately realized that there was an issue with the MTU settings, since, as we all recall, the MTU must match for OSPF adjacencies to be established.

Cisco has a well-known fix for this issue, which is the ip ospf mtu-ignore command. I configured it and the adjacency came up. That is great, right?

So far there is nothing new, and you may be wondering why I am writing this article!

The interesting thing was that when I jumped to the second router (R1) to configure it with the ip ospf mtu-ignore command, I found that the OSPF adjacency had already come up. Strange! What happened?

To go further, I took the command out of R2, cleared the OSPF process and started digging deeper into the issue to figure out what was going on.

First, I configured the ip ospf mtu-ignore command on R1, but the adjacency didn’t come up!

Second, I noticed that R2 was stuck in the EXSTART state, but R1 had gone one step further and was sitting in the EXCHANGE state.

Here is the debug output from each of them.

So, we can see that although R1 is receiving DBD packets from R2 with a mismatching MTU, it sends back a DBD packet with flag 0x2 (declaring that it is the slave, as MS=0) and with the adopted sequence number (seq 0x847), acknowledging R2’s DBD. With this it transitions to the EXCHANGE state.

At the same time, R2 is rejecting R1’s DBD packets because of their larger MTU and keeps retransmitting its own DBD packets, staying stuck in the EXSTART state.

This loop continues indefinitely, which prevents either router from transitioning out of the exstart/exchange state.

Now we can link both observations together. Configuring ip ospf mtu-ignore on R1 does not solve the issue, as R1 is already accepting R2’s DBD packets. Configuring the same command ONLY on R2 solves the problem, as we are telling R2 to ignore the MTU advertised by the other side, so it can accept the DBD packets from R1 and move on from the EXSTART state.

 

The question now is why R1 accepts DBD packets with a mismatching MTU! The OSPF RFC 2328 answers this question:

“If the Interface MTU field in the Database Description packet indicates an IP datagram size that is larger than the router can accept on the receiving interface without fragmentation, the Database Description packet is rejected.”

So here is the key!

The OSPF adjacency will not come up in a mismatching MTU scenario because a router rejects any DBD packet that advertises a larger MTU than it can accept on its own interface. This is why R2 was rejecting the DBD packets from R1 and reporting “Nbr 1.1.1.1 has larger interface MTU”.

And because of this, ip ospf mtu-ignore is only needed on the router that has the lower MTU.
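The asymmetry can be checked with a tiny Python model of the RFC 2328 rule quoted above. It is only a sketch: the function names are made up, and the MTU values of 1500 for R1 and 1490 for R2 are assumptions for illustration, since the interface configuration is not reproduced here.

```python
def accepts_dbd(local_mtu, neighbor_mtu, mtu_ignore=False):
    """RFC 2328 rule: reject a DBD whose Interface MTU is larger than what
    the receiving interface can accept, unless the check is disabled."""
    return mtu_ignore or neighbor_mtu <= local_mtu


def adjacency_forms(r1_mtu, r2_mtu, r1_ignore=False, r2_ignore=False):
    """Both routers must accept each other's DBD packets."""
    r1_ok = accepts_dbd(r1_mtu, r2_mtu, r1_ignore)
    r2_ok = accepts_dbd(r2_mtu, r1_mtu, r2_ignore)
    return r1_ok and r2_ok


R1_MTU, R2_MTU = 1500, 1490   # assumed values: R1 has the larger MTU

print(adjacency_forms(R1_MTU, R2_MTU))                  # no mtu-ignore anywhere
print(adjacency_forms(R1_MTU, R2_MTU, r1_ignore=True))  # command on R1 only
print(adjacency_forms(R1_MTU, R2_MTU, r2_ignore=True))  # command on R2 only
```

Only the last case, with the command on the lower-MTU router, lets the adjacency form, which matches what was seen in the lab.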


An Alternative to “IP OSPF MTU-IGNORE”

Actually, if we consider a scenario where R1 is a Cisco switch, we can use an alternative to ip ospf mtu-ignore, which is “system mtu routing 1490”. This lets the switch keep using the SYSTEM-MTU as needed, while for routed traffic (i.e. OSPF) the switch uses an MTU of 1490.

But consider the following notes about this command:
– It is not interface specific like ‘ip ospf mtu-ignore’; it is applied globally on the switch.
– The routing MTU cannot exceed the SYSTEM-MTU.
– Once the SYSTEM-MTU is active, changing the routing MTU does not require a reload.

Thank you

Ahmad Alhady

Posted in OSPF, Routing