Juniper Event Scripts – A brief HOWTO

A few weeks back there was a requirement to deploy two SRX5600 devices in the core of the network. Yes, I know, a hugely bad plan with first-generation hardware that was buggy as hell, and we are still suffering from it. However, what came to light was the need for a trigger that would shut down an interface should one of the SRXs start misbehaving, or should the interconnecting layer 2 trunk carrying HA traffic die. Yes, we plan for events just such as these.

Anyway, I had already written a few op scripts and am fairly OK with coding, so after a short search on “Junosscriptorium”, a site that describes itself as “A Repository for JUNOScripts: Commit, Event, and Op scripts for JUNOS”, I found an example of what I needed.

The script I found was called “toggle-interface.slax” and its function simply disabled or enabled an interface based on a configured trigger event.

Now, although thanks must definitely go to the author, Efrain Gonzalez, the script wasn’t exactly what I was looking for. It needed some minor modifications for my environment, but it wasn’t hard to modify, and I am fairly certain the original author wouldn’t mind people hacking up his script to get whatever functionality they require out of it.

So, my environment is shown below, simplified for obvious reasons.

Basic Topology

So, to explain, the SRXs are acting in an “Active/Active” HA manner and pass their HA signaling traffic along a VLAN between Router-A and Router-B. During this implementation there was only a single layer 2 path between Router-A and Router-B, and that path was not given any resilience through Router 3. The reason is that the powers that be decided that Spanning Tree Protocol wasn’t supportable by my NOC teams, so a single trunk is what I had to work with.

Now, to complicate matters even more, all VRFs on the network that participate in certain unmentioned government services have a requirement to be routed via one of the SRXs. To facilitate this, a default route is dropped into these VRFs that points to an interface on the SRX. The SRX then does whatever it does and spits the traffic back out into an egress VRF towards the centralized service that is the final destination.

Now, what happens if the 10 Gigabit Ethernet link between Router-A and Router-B fails? Traffic will still pass to both routers via normal routing and MPLS to each SRX, but the layer 2 interlink between Router-A and Router-B, which carries the HA traffic the SRXs need for signaling, will be dead. So, if the link goes down, my “Active/Active” functionality disappears and the SRXs go into what is termed “Split Brain” mode, whereby each can service traffic but neither device carries state information about the other. As one would expect, this can have a nasty effect on any flows that pass through the firewalls.

So, if the interlink went down, I needed a way to shut down the SRX-facing interface on Router-B to stop “Split Brain” from happening. The trigger for this action was a “Link Down” event on the interlink.

So, we can already see what we need here: an event policy that watches for the “Link Down” event on the site-to-site interface and, when it fires, runs the event script to shut down the SRX-facing interface.

It should also be noted that in order for the “Active/Active” firewalls to carry the same IP addressing, VRRP was implemented on the SRX-facing firewalls, with its signaling going across the layer 2 trunk between sites.

Now, to the script and its configuration...

First, we defined the location of the script using the following Junos syntax:

user@mx960-A> show configuration system scripts
op {
    file toggle-interface.slax;
}

Note that this simply means that the file path “/var/db/scripts/op/toggle-interface.slax” is used, as that’s where op scripts are located. I couldn’t find a way to get the script to run from the “/var/db/scripts/event” directory; I tried for several hours to get that path working, but failed.
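For comparison, the Junos documentation describes a separate stanza for scripts kept in the event directory, under event-options rather than system scripts. A minimal sketch, not something that worked in this deployment and possibly dependent on the Junos release in use:

```
event-options {
    event-script {
        /* The script would need to live in /var/db/scripts/event */
        file toggle-interface.slax;
    }
}
```

With the op-script approach used here, the script stays in “/var/db/scripts/op” and the event policy simply refers to it by file name.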

Next, we defined the event options on Router-B that would monitor for the failure event and react to it. These take the form of “Event Policies”, as you can see below.

Now, we can see by the first part of the configuration that we are looking for a “snmp_trap_link_down” event, and when that happens we try to match that event with an interface, in this case “xe-11/0/0.1001” which happens to be the logical interface carrying our HA signaling traffic. If this goes down, then we fall into the “then” clause of the configuration.

The “then” clause calls the script “toggle-interface.slax” with three arguments. The first controls where the script’s output goes; in this case the value of 2 means that the output goes nowhere, i.e. the script produces no output when it runs.

The second argument is the interface that we want to shut down, in this case “xe-11/3/0“. The third argument is what we want the new state of the interface to be. In our case we want the interface to be “admin down” or disabled.

In the second part of the configuration we do the opposite: we look for a “snmp_trap_link_up” event and match it to the correct interface. If the interface matches, we again fall into the “then” clause and run the script, this time with the final argument set to bring the “xe-11/3/0” interface back into its up state.
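Pulling the description above together, the two policies might look roughly like the following. This is a sketch, not the exact configuration that was deployed: the policy names are made up for illustration, and the argument names (silent, interface, new_state) and the values “disable” and “enable” are assumptions that have to match whatever the script itself declares.

```
event-options {
    /* Policy names are illustrative only */
    policy ha-link-down {
        events snmp_trap_link_down;
        attributes-match {
            snmp_trap_link_down.interface-name matches "xe-11/0/0.1001";
        }
        then {
            event-script toggle-interface.slax {
                arguments {
                    /* Argument names are assumed; 2 = no output */
                    silent 2;
                    interface xe-11/3/0;
                    new_state disable;
                }
            }
        }
    }
    policy ha-link-up {
        events snmp_trap_link_up;
        attributes-match {
            snmp_trap_link_up.interface-name matches "xe-11/0/0.1001";
        }
        then {
            event-script toggle-interface.slax {
                arguments {
                    silent 2;
                    interface xe-11/3/0;
                    new_state enable;
                }
            }
        }
    }
}
```

Note that “matches” takes a regular expression, so the interface name is treated as a pattern rather than a literal string.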

The Toggle-Interface.slax Script

Now, for completeness I’ll show the actual script that is running. Bear in mind that this is my modified version, but the original source code is, as mentioned previously, available on the “Junosscriptorium” web site.

Next, I’ll show some logs of the event actually taking place:

Sep  7 12:08:24  MX-A mib2d[1759]: SNMP_TRAP_LINK_DOWN: ifIndex 202, ifAdminStatus down(2), ifOperStatus down(2), ifName xe-11/0/0
Sep  7 12:08:24  MX-A fpc1 XETH(1/1): disabled Link 1.
Sep  7 12:08:24  MX-A fpc1 XETH(1/1): disabled Link 1.
Sep  7 12:08:25  MX-A root: invoke-commands: Executed /tmp/evt_cmd_a8cHo4, output to /tmp/evt_op_9p9Fra in text format

It’s hard to see what is actually happening here, but the event script is being run.

user@MX-A# run show interfaces xe-11/3/0
Physical interface: xe-11/3/0, Administratively down, Physical link is Down

So, as we can see, it’s fully working. I hope this example is of some use to people out there who have to hack up solutions as I’ve had to do.

Thank You

Andy Wilson
