Cisco Standby Supervisor FIX

On a current project I was tasked with fixing a broken standby supervisor card in a 4500 series switch and thought I would share the experience and the fix applied.

For those of you not familiar, the Standby Supervisor card provides your chassis with redundancy on the primary (and most vital) links in your network. By fitting a second Supervisor card to your chassis you have a primary and a HOT secondary ready to take over should one card fail.
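For reference, this hot-standby behaviour relies on the chassis running in SSO mode. A minimal sketch of enabling it, assuming both supervisors are fitted and your IOS image supports SSO (the hostname Switch is just a placeholder):

! Sketch only - check your IOS feature set supports SSO first
Switch# configure terminal
Switch(config)# redundancy
Switch(config-red)# mode sso
Switch(config-red)# end
Switch# write memory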

In this case study, the supervisor engine on the secondary card had failed and the card showed as DISABLED, meaning that redundancy would require manual intervention (or not work at all). This completely defeats the purpose of the cards and obviously needed to be fixed.

If you run show module on a healthy system configured in SSO mode, the redundancy section should look like the below:

Mod  Redundancy role      Operating mode       Redundancy status
----+--------------------+--------------------+--------------------
5    Active Supervisor    SSO                  Active
6    Standby Supervisor   SSO                  Standby hot

On the system in question, module 6 was showing a redundancy status of DISABLED.
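Before changing anything it is worth pulling a little more detail on why the standby is unhappy. These show commands are safe to run at any time:

Switch# show module
Switch# show redundancy
Switch# show redundancy states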

Steps to resolve:

1. Back up all configs (the full command sequence for steps 1 to 5 is sketched after this list)
2. From global configuration mode (conf t) type config-register 0x2 – this sets the configuration register on your switch to the recommended value for 4k platforms and means the unit will boot the IOS image specified by your boot statement rather than dropping into ROMMON.
3. Enable logging to your console session – still in conf t, type logging console
4. write mem
5. Once complete you are going to reload the standby peer (this will result in downtime so ensure you have agreed the change). Type redundancy reload peer at the hash prompt.
6. This takes a few minutes to complete, so leave your console running to view the output and go make yourself a coffee 🙂
7. Once done, the cards should synchronise their running configuration and come up in the desired states shown above, with your secondary card in STANDBY HOT state.
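Pulled together, the command sequence for steps 1 to 5 looks roughly like the below. The TFTP server address and backup filename are purely illustrative; substitute your own backup destination and check each prompt before committing:

! Step 1 - back up the running config (10.0.0.1 is a hypothetical TFTP server)
Switch# copy running-config tftp://10.0.0.1/4500-backup.cfg

! Step 2 - set the config register from global configuration mode
Switch# configure terminal
Switch(config)# config-register 0x2

! Step 3 - enable console logging
Switch(config)# logging console
Switch(config)# end

! Step 4 - save the configuration
Switch# write memory

! Step 5 - reload the standby peer (agree the change window first)
Switch# redundancy reload peer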

The standby hot state means that should the primary slot fail, it will fail over within 3 seconds and *touch wood* you should not drop a single packet.

You can simulate the switchover, and prove the standby will take over, by running redundancy force-switchover from the # prompt. A forced switchover will not complete within the 3 seconds you would see on a genuine failure, but it does prove that the config is good and the protocol works.
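A rough sketch of the test, assuming an agreed maintenance window; after the switchover completes, reconnect to the console of the newly active supervisor and confirm the roles have swapped:

! Force the standby to take over
Switch# redundancy force-switchover

! After reconnecting, confirm the roles have swapped
Switch# show redundancy
Switch# show module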

Tidy-up:

Make sure you only have the current IOS version on both your bootflash and slavebootflash directories. By all means keep other IOS versions in a central repository, but never on the device itself.

To see what you have on there, run dir bootflash: from the # prompt:

dir bootflash:

Directory of bootflash:/

6  -rw-    25646261  Jul 10 2010 12:09:52 +01:00  cat4500e-entservicesk9-mz.122-53.SG2.bin

131436544 bytes total (98299904 bytes free)

To see what you have on the slave flash, run dir slavebootflash: from the # prompt:

dir slavebootflash:

Directory of slavebootflash:/

6  -rw-    25646261  Jul 10 2010 12:24:02 +01:00  cat4500e-entservicesk9-mz.122-53.SG2.bin

131436544 bytes total (98287616 bytes free)

NB: Ensure both are on the same IOS version (which was not the case when I fixed this chassis) and that only one IOS image exists in either flash directory (the one you are intending to boot from).
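If the two flash directories have drifted apart, one way to bring the standby back in line is to copy the image you intend to boot across and verify it. A hedged sketch, using the filename from the output above:

! Copy the intended boot image from the active to the standby flash
Switch# copy bootflash:cat4500e-entservicesk9-mz.122-53.SG2.bin slavebootflash:

! Check the copy arrived intact
Switch# verify slavebootflash:cat4500e-entservicesk9-mz.122-53.SG2.bin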

If you need to remove any unused IOS versions, run delete bootflash:[IOSname.bin] and delete slavebootflash:[IOSname.bin] from the # prompt.
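As a worked example, removing a hypothetical older image from both supervisors and confirming only the current one remains would look something like this (double-check the filename is not referenced by any boot statement before deleting anything):

! Hypothetical old image name - confirm it is not in use first
Switch# show bootvar
Switch# delete bootflash:cat4500e-entservicesk9-mz.122-50.SG.bin
Switch# delete slavebootflash:cat4500e-entservicesk9-mz.122-50.SG.bin

! Confirm only the current image remains on each flash
Switch# dir bootflash:
Switch# dir slavebootflash: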
Any questions, please ask.
