How OSPF SPF Adaptive Timers are implemented in IOS and JUNOS

It became a fact that both of Cisco Systems and Juniper Networks have proved their strong market penetration and most of the operators and providers deploying the various platforms of both of them. Based on this, it became an essential for the networking engineers specially those who are working on operator’s environment to know how each vendor’s platforms are architectured, and how their OS are structured as well as how to configure it. However this will not be adequate for the design engineers who has to assure their multi-vendor network are perfectly merged and converged without any interoperability issues, so, they have to dig more and understand how each of the leading vendors are implementing the technologies and matching the RFCs.

Today we will start explaining how each of Cisco Systems and Juniper networks are implementing the OSPF SPF adaptive timers or what called SPF throttling (Cisco) or SPF hold-down (Juniper)

Before we dig into that, let’s talk a little bit about what OSPF SPF Adaptive Timers are designed to do for us, and then we’ll take a look at how each vendor is implementing the concept.

If we can recall from our OSPF background, OSPF SPF algorithm has design to run upon arrivals of LSAs. So, if each LSA triggers a full or incremental SPF run, and if they are arriving fast, SPF can begin eating up the majority of your CPU.

The challenge in large-scale networks is to quickly react to network changes while at the same time not allowing SPF calculations to dominate the route processors. This is the goal of SPF delay, also called SPF hold-down or SPF throttling.

Rather than kick off an SPF calculation every time a new LSA/LSP arrives, SPF delay forces the router to wait a bit between SPF runs. If a large number of LSA/LSPs are being flooded, a delay between SPF runs means that more LSA/LSPs are added to the link state database during the hold-down period. Efficiency is then increased because when the hold-down period expires and SPF is run, more network changes are included in a single calculation.

But this efficiency you are getting from SPF delay, it has its costs which it increase your network convergence time. So, the challenge is to set the delay interval long enough when abnormal things happen while keeping it short when the network is stable so you got a quick convergence. This leads to the concept of adaptive SPF timers.

Both Cisco and Juniper are offering adaptive SPF timers, but with different approaches. In the coming sections, we are going to explain the mechanism used by each vendor.

Adaptive SPF Timers in JUNOS

 

Juniper Networks uses a linear fast/slow algorithm for adaptive SPF timers. So, it introduced the SPF delay timer which is the minimum delay in the time between the detection of a topology change and when the SPF algorithm actually runs. This period is 200ms by default. The period is configurable with the spf-delay command to between 50 and 8000ms.

Secondly, they introduce a second parameter which is rapid-runs.  If three (the default) SPF runs are triggered in quick succession, indicating instability in the network, the router will enter the “slow mode” and a third parameter called the hold-down timer will start. Any subsequent SPF calculation is not run until the hold-down timer expires. The routers remain in this “slow mode” until the hold-down period have passed since the last SPF run—indicating that the network has converged—and then switches back to “fast mode”, and the system reverts to the configured values for the delay and rapid-runs statements.

 

The default values for SPF calculations in JUNOS can be seen below:

Default SPF timers values in JUNOS
r2@r2> show ospf overview | match SPF
Full SPF runs: 280SPF delay: 0.200000sec, SPF holddown: 5 sec, SPF rapid runs: 3

 

Changing SPF Timers in JUNOS

The configuration stanza for JunOS shows how these settings may be changed.

These default values can be changed with the following command:

[edit protocols ospf]
r1@r1> set spf-options delay milliseconds holddown milliseconds rapid-runs number

 

Now we are going to play with the timers and run the debugs, and examine the behavior. We will set the delay to 1 sec and the hold-down timer to 20 sec while keeping the rapid-runs as default.

 

The log entry below shows, on lines 2,6 and 10, that the SPF run occurs every 1 second after the LSA Update. Once the SPF run has completed 3 iterations it moves into a slower mode of operation.

The next log entry shows that SPF started after 20 sec from the SPF run (at t=12:02:13). The default number of SPF calculations that can occur in succession is 3. The range that you can configure is from 1 through 5. Each SPF algorithm is run after the configured SPF delay. When the maximum number of SPF calculations occurs, the hold-down timer begins. We previously configured this to be 20 seconds. Any subsequent SPF calculation is not run until the hold-down timer expires. This is why the received LSA update on line 4 does not immediately trigger an SPF run.

Next, the log shows the router once again enters the fast mode…

We can also observe from the previous log that although 3 more SPF runs have taken place, the router does not move into slow mode again. This is because there has been 50sec between the first and the last SPF run in the set of 3. If the 3 SPF runs happen within 3 x “delay value“, or in our case 3 seconds, the router will start to throttle the number of SPF runs, and start the holddown timer countdown. If the SPF runs are outwith 3 x the configured delay value, the rapid-run counter is reset to 0 and no back-off algorithms are run.

Now, shown in the next log snippet, the router will enter the slow mode and the holddown timer will start, because three SPF runs have occurred in succession.

And finally, the following log shows that SPF again started after 20 sec from the last SPF run (at t=12:04:16)

 

The figure below is charting the above debug which can help you in more understanding the JunOS behaviour with the SPF timers

 

Adaptive SPF Timers in IOS

Cisco Systems introduced an exponential backoff algorithm for the adaptive SPF timers by using three different configurable timers.

This exponential functionality limits the number of SPF computations during times of network instability by doubling the delay associated with the SPF run, up to a maximum hold delay, for the period of instability. When the period of instability ends, the delay is reset to the original value. Three timers are associated SPF exponential backoff: Start Time, Initial-Hold Time, and Max-Hold Time.

IOS internally has an internal timer called the waiting-interval which the SPF computation will be delayed till it expires. When a topology change is received for the first time, the waiting-interval will be set to the start timer which is similar to the spf-delay in JUNOS, and the SPF computation is delayed for the value set by start timer. When the SPF computation completes, a waiting-interval starts with the value of the initial-hold timer and the router will enter the “slow mode”. If there is a topology change during waiting-interval, the SPF computation will run at the expiration of the initial-hold timer. At the completion of the SPF computation the waiting-interval is set to the twice the value of initial-hold timer and then run again. So for example, if the start timer is 100ms and the initial-hold timer is 1000ms, the router delays the first SPF run by 100ms, the second by 1000ms, the third by 2000ms, the fourth by 4000ms, and so on.

The waiting-interval grows exponentially as 2^t*initial-hold until it reaches the max_hold-time value. After this, any topology change during the current waiting-interval would result in the next SPF computation will run at the expiration of the max hold time and next waiting-interval being equal to the constant max-hold timer. This ensures that exponential growth is limited. If the SPF has not run for twice the time specified by the max-hold timer, the router switches back to “fast” mode in which the start delay timer is used and the waiting-interval is reset back to the initial value.

The default values for SPF calculations in IOS can be seen below:

 

Default SPF timers values in IOS
R2#sh ip ospf | i SPF�
Initial SPF schedule delay 5000 msecs
Minimum hold time between two consecutive SPFs 10000 msecs
Maximum wait time between two consecutive SPFs 10000 msecs

Changing SPF Timers in IOS

These default values can be changed with the following command:

R1(config)# router ospf 100
R1(config-router)# timers throttle spf spf-start spf-hold spf-max-wait

 

As we did above with JunOS, will play with the SPF throttle timers and run the debugs, and examine the behavior. We will set the Start delay timer to 1 sec and the initial-hold timer to 5 sec and the max-hold timer to 50 sec.

The log entry below shows, on lines 2, that the SPF run at t= 21:30 which is 1 second after the LSA Update, and the next wait_interval set to the initial-hold time which is 5 sec as shown in line 5.

The next log entry shows that the waiting_interval is getting doubled after each SPF run. Starting with a waiting_interval equal to 5 sec which is the initial-hold timer as shown on line 3, the next waiting_interval on line 8 is set to 10 sec  then to 20 sec on line 14 and 40 sec on line 22.

While the router is in the slow mode no SPF will run until the wait_interval elapses no matter how many topology changes have been detected. This is why the received LSA update on lines 13 and 14 and also on lines 20,21,22 and 23 does not immediately trigger an SPF run.

The next log entry shows that the waiting_interval is reached the max-hold time (50 sec) and upcoming waiting_interval being equal to the constant max-hold timer as on lines 3 and 8 .

We can also observe from the previous log that although the LSA on line 6 arrived 60 sec after last SPF run has taken place which is more than the waiting_interval , the router does not move into fast mode again. This is because that the condition is that to divert back to the fast mode the SPF should not run for twice the time specified by the max-hold timer.
Now, shown in the next log snippet, the router will enter the slow mode and the holddown timer will start, because the SPF has not run for 100 sec which is twice the time specified by the maximum delay period.

For more clarity, I have reflected the debugs on the following figure, so you can use both the debugs and the figure to examine the behavior

 

 

I hope that you have enjoyed this article and it helped you understanding SPF implementation in both IOS and JunOS. Also would appreciate to leave your comments if any.

 

 

Thank you

Ahmad Alhady & Andy Wilson

This entry was posted in OSPF, Routing. Bookmark the permalink.

Leave a Reply