IOS Upgrade on Cisco WS-C4507R Chassis with Dual Supervisor V Engines

Today we will upgrade the IOS version on both WS-X4516 Supervisor Engine V modules in a WS-C4507R chassis. This blog post assumes that the chassis’s active supervisor engine is already reachable over the network so that you can SSH into it.

First, go to the Cisco support site and download the latest IOS version (you need a Cisco support contract to have access to new IOS images). Place this image on your TFTP server. In this example, the TFTP server is a CentOS Linux machine called alice.company.com.


scp ~/Downloads/cat4500-entservicesk9-mz.150-2.SG5.bin alice.company.com:/tmp
ssh alice.company.com

If you don’t have a TFTP server installed, then set one up.

sudo yum -y install tftp-server
sudo vi /etc/xinetd.d/tftp
sudo chkconfig xinetd on
chkconfig --list xinetd
sudo mkdir -p /tftpboot/ios
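
The vi step above is usually there to switch the service on. Here is a minimal sketch of the same edit done non-interactively, assuming the service file ships with a "disable = yes" line (check your own file first). The function name enable_xinetd_service is made up for this example.

```shell
# Hypothetical helper: flips "disable = yes" to "disable = no" in an
# xinetd service file. $1 is the path of the file you would open in vi.
enable_xinetd_service() {
    sed -i 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes/\1no/' "$1"
}

# Usage (as root), followed by a restart so xinetd rereads its files:
#   enable_xinetd_service <path to the tftp service file>
#   service xinetd restart
```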

Now move the IOS image into the /tftpboot/ios directory.

sudo mv /tmp/cat4500-entservicesk9-mz.150-2.SG5.bin /tftpboot/ios

(Optional) Create a symbolic link to the image so that you can remember which hardware it’s for. When your site has several different Cisco models, those symbolic links can be handy!

sudo ln -s /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin /tftpboot/ios/ws-c4507r

Check the file size of the new IOS image.
du -sb /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin
19458176 /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin

That means we need 19458176 bytes of free space on both supervisor engines’ internal flash storage. To check this, we connect to the chassis’s IP address and check the remaining space.

ssh 172.16.1.1
switch> enable

switch# show version | inc bytes of memory

cisco WS-C4507R (MPC8245) processor (revision 4) with 524288K bytes of memory.

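Note that this line reports the supervisor’s DRAM (512 MB here), not its flash. The free flash space is shown on the last line of the dir bootflash: and dir slavebootflash: output. If there were not enough room, we could erase some old IOS images. For example, let’s pretend the old cat4000-i9s-mz.122-20.EW3.bin IOS file is on the bootflash. We could remove it like this: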
switch# delete bootflash:cat4000-i9s-mz.122-20.EW3.bin
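
The free-space comparison can be sketched as a small shell function to run on the TFTP server. It assumes you paste in the summary line that the IOS dir command prints (the "bytes total (... bytes free)" line); the function name flash_has_room is hypothetical.

```shell
# Hypothetical helper: succeeds when the free space reported by the
# switch's "dir" summary line is at least as large as the image.
#   $1 = image size in bytes (for example from du -sb)
#   $2 = text containing the "(NNN bytes free)" summary line
flash_has_room() {
    image_bytes=$1
    free_bytes=$(printf '%s\n' "$2" |
        sed -n 's/.*(\([0-9][0-9]*\) bytes free).*/\1/p')
    # Fail if no free-space figure was found, or if it is too small.
    [ -n "$free_bytes" ] && [ "$free_bytes" -ge "$image_bytes" ]
}

# Usage, with a summary line in the format IOS prints:
#   flash_has_room 19458176 "59244544 bytes total (26308040 bytes free)" \
#       && echo "enough room" || echo "clean up the flash first"
```
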
Check the redundancy status of both supervisor engines.

switch# show redundancy states
my state = 13 -ACTIVE
peer state = 4  -STANDBY COLD
Mode = Duplex
Unit = Primary
Unit ID = 1

Redundancy Mode (Operational) = RPR
Redundancy Mode (Configured)  = RPR
Redundancy State              = RPR
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up

client count = 35
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0

As we can see, we are running in RPR (Route Processor Redundancy) mode. This means that if the active supervisor engine fails or reloads, all ports will lose connectivity for several minutes while they synchronize with the other supervisor engine. That is not super duper.

Fortunately, there is another redundancy mode which offers a faster switchover: Stateful Switchover, or SSO for short. In SSO mode the two supervisors keep their state synchronized while they are running, so layer 2 keeps working during a supervisor switchover, while layer 3 connections lose their neighbor relationships for about 50 milliseconds.

So let’s change our redundancy mode from RPR to SSO.
switch# conf terminal
switch(config)# redundancy
switch(config-red)# mode sso
Changing to sso mode will reset the standby. Do you want to continue?[confirm]
As you can see, when we do this, the standby supervisor engine will reload. So be sure to check this supervisor’s console output. Once the standby engine is back online, double-check the redundancy mode again.
switch# sh redundancy states
       my state = 13 -ACTIVE
     peer state = 4  -STANDBY COLD
           Mode = Duplex
           Unit = Primary
        Unit ID = 1
Redundancy Mode (Operational) = RPR
Redundancy Mode (Configured)  = Stateful Switchover
Redundancy State              = RPR
Maintenance Mode = Disabled
    Manual Swact = enabled
  Communications = Up
   client count = 35
 client_notification_TMR = 240000 milliseconds
          keep_alive TMR = 9000 milliseconds
        keep_alive count = 1
    keep_alive threshold = 18
           RF debug mask = 0x0
OK, so now SSO is configured, but the operational mode is still RPR. That’s because both supervisor engines can’t run in SSO mode until the active one reloads, which means that the standby has to take over. During that supervisor switchover, there will be layer 2 and layer 3 downtime of about 1 to 3 minutes, depending on the number of configured ports.
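
To compare the configured and operational modes without reading the whole listing, here is a minimal sketch that pulls them out of saved show redundancy states output. The function names and the states.txt file are hypothetical; the field labels are the ones shown in the output above.

```shell
# Hypothetical helper: reads "show redundancy states" output on stdin
# and prints the value of the "Redundancy Mode (<name>)" line.
#   $1 = Operational or Configured
redundancy_mode() {
    sed -n "/^[[:space:]]*Redundancy Mode ($1)/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}"
}

# Succeeds only when the switch is actually operating in SSO.
sso_active() {
    [ "$(redundancy_mode Operational)" = "Stateful Switchover" ]
}

# Usage, with the output pasted into a file called states.txt:
#   redundancy_mode Configured  < states.txt
#   redundancy_mode Operational < states.txt
#   sso_active < states.txt && echo "SSO is live"
```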

So, in order to continue with the IOS upgrade, you need to schedule a network maintenance!

Download the new IOS to both supervisor engines.

switch# copy tftp:/ios/cat4500-entservicesk9-mz.150-2.SG5.bin bootflash:

Address or name of remote host [172.16.1.33]?
Source filename [/ios/cat4500-entservicesk9-mz.150-2.SG5.bin]?
Destination filename [cat4500-entservicesk9-mz.150-2.SG5.bin]?

The file will be transferred from the TFTP server into the active supervisor engine’s internal flash storage. You will see a series of exclamation points as packets arrive and then a few capital C letters. The C letters are shown when IOS verifies the new IOS image. You can also do it manually with the verify command if you prefer.
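
For an end-to-end check, IOS can also print an MD5 digest with verify /md5 bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin, which you can compare with the digest of the file on the TFTP server. Here is a minimal sketch of the server side, assuming md5sum is installed; the function name md5_matches is made up for this example.

```shell
# Hypothetical helper: succeeds when the MD5 digest of file $1 equals
# the expected digest $2 (for example the one printed by the switch).
md5_matches() {
    actual=$(md5sum "$1" | cut -d' ' -f1)
    # Accept upper-case hex digits in the expected value as well.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Usage on the TFTP server:
#   md5_matches /tftpboot/ios/cat4500-entservicesk9-mz.150-2.SG5.bin <digest from the switch> \
#       && echo "image intact" || echo "digest mismatch, copy again"
```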

While we’re working with our TFTP server, we should make a backup of our configuration.

switch# copy run start
switch# copy start tftp

Next, copy the IOS to the standby supervisor engine.

switch# copy bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin slavebootflash:

Now check that both supervisor engines have the new IOS image.

switch# dir bootflash:

Directory of bootflash:/

2  -rwx    19458176  Sep 24 2012 13:08:32 -04:00  cat4500-entservicesk9-mz.150-2.SG5.bin

59244544 bytes total (26308040 bytes free)

switch# dir slavebootflash:
Directory of slavebootflash:/

1  -rwx    13478072  Jun 29 2006 11:02:01 -04:00  cat4500-entservicesk9-mz.122-31.SG.bin
3  -rwx    19458176  Sep 24 2012 13:52:19 -04:00  cat4500-entservicesk9-mz.150-2.SG5.bin

59244544 bytes total (6849736 bytes free)

We must make sure that configuration changes from the active supervisor engine are properly transferred to the standby one.

switch# conf terminal
switch(config)# redundancy
switch(config-red)# main-cpu
switch(config-r-mc)# auto-sync standard
switch(config-r-mc)# end

Now manually force a resynchronization of the supervisor engines.

switch# copy run start

Check your syslog output. When the synchronization occurs, you should see lines like these:

Sep 24 14:13:04 c4507r 230: 000258: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The bootvar has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 231: 000259: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The config-reg has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 232: 000260: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 233: 000261: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor
Sep 24 14:13:04 c4507r 234: 000262: Sep 24 14:13:03: %C4K_REDUNDANCY-5-CONFIGSYNC_RATELIMIT: The vlan database has been successfully synchronized to the standby supervisor
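
If your syslog is busy, the check above can be sketched as a filter that lists which of the five items did not show up. The item names are taken from the messages above; the function name and the log file name in the usage line are hypothetical.

```shell
# Hypothetical helper: reads syslog lines on stdin and prints every
# item for which no "successfully synchronized" message was found.
# No output means all five items were synchronized.
missing_sync_items() {
    log=$(cat)
    for item in "bootvar" "config-reg" "startup-config" \
                "private-config" "vlan database"; do
        printf '%s\n' "$log" |
            grep -q "The $item has been successfully synchronized" ||
            printf '%s\n' "$item"
    done
}

# Usage on the syslog server (the log file name is an assumption):
#   grep C4K_REDUNDANCY /var/log/messages | missing_sync_items
```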

Prepare the switch to boot the new IOS image.

switch# config terminal
switch(config)# config-register 0x2
switch(config)# boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin

switch(config)# end
switch# copy running-config startup-config
We can now update the IOS image on the standby supervisor engine. This can be done at any time because it causes no traffic disruption. To see the standby supervisor upgrade messages, make sure to connect to the standby supervisor’s console port.
switch# redundancy reload peer
This will reload the standby supervisor engine. You will see these syslog messages from the active one:
Sep 24 14:11:07 c4507r 224: 000249: Sep 24 14:11:06: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor
Sep 24 14:11:07 c4507r 225: 000250: Sep 24 14:11:07: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor
Sep 24 14:11:17 c4507r 226: 000251: Sep 24 14:11:16: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been lost
Sep 24 14:11:17 c4507r 227: 000252: Sep 24 14:11:16: %C4K_REDUNDANCY-3-SIMPLEX_MODE: The peer Supervisor has been lost
At the standby supervisor’s console, you will see the upgrade messages. Once the new IOS is up and running on the standby supervisor, you should see this on your syslog server:
Sep 24 14:13:03 c4507r 228: 000256: Sep 24 14:13:02: %C4K_REDUNDANCY-2-IOS_VERSION_CHECK_FAIL: IOS version mismatch. Active supervisor version is 12.2(31)SG,. Standby supervisor version is 15.0(2)SG5,. Redundancy feature may not work as expected.
Sep 24 14:13:03 c4507r 229: 000257: Sep 24 14:13:03: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been established
As the messages tell you, the IOS version is not the same on both supervisor engines. That’s normal because we haven’t updated the active one yet. We can check the version of both engines with this:
switch# show module
 M MAC addresses                    Hw  Fw           Sw               Status
--+--------------------------------+---+------------+----------------+---------
 1 0017.e0fa.58c0 to 0017.e0fa.58c1 4.0 12.2(20r)EW1 12.2(31)SG       Ok
 2 0017.e0fa.58c2 to 0017.e0fa.58c3 4.0 12.2(20r)EW1 15.0(2)SG5       Ok
We can see that module in slot 2 (the standby supervisor) is running version 15.0(2)SG5 while the active module in slot 1 is running version 12.2(31)SG.
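
The same comparison can be sketched as a filter over saved show module output. It assumes the column layout shown above (slot, MAC range, Hw, Fw, Sw, Status) and skips rows that have no Sw column; the function names are hypothetical.

```shell
# Hypothetical helper: prints "<slot> <software version>" for every
# row of the MAC address table that carries a Sw column.
supervisor_versions() {
    awk '$3 == "to" && NF >= 8 { print $1, $7 }'
}

# Succeeds only when all such rows report one and the same version.
versions_match() {
    awk '$3 == "to" && NF >= 8 { seen[$7] = 1 }
         END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Usage, with the output pasted into a file called modules.txt:
#   supervisor_versions < modules.txt
#   versions_match < modules.txt || echo "supervisors run different IOS versions"
```
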
To upgrade the active supervisor engine, we must issue the following command. Make sure you’re connected to the active supervisor’s console port when you issue it. THIS COMMAND WILL CAUSE A LAYER 2 AND LAYER 3 NETWORK OUTAGE! So make sure you do this during a scheduled maintenance window.
switch# redundancy force-switchover
The active supervisor will reload, forcing a supervisor engine switchover. This will make the standby supervisor engine take control of the chassis. During the switchover, the supervisor that used to be the active one reloads into the new IOS version. After a few minutes, we can see that they are both running the same IOS version:

ssh 172.16.1.1
switch> enable

switch# sh mod | inc 15.0
1 0017.e0fa.58c0 to 0017.e0fa.58c1 4.0 12.2(20r)EW1 15.0(2)SG5       Ok
2 0017.e0fa.58c2 to 0017.e0fa.58c3 4.0 12.2(20r)EW1 15.0(2)SG5       Ok

We can also see that our redundancy has changed for both engines from RPR to SSO.

switch# sh redundancy states
my state = 13 -ACTIVE
peer state = 8  -STANDBY HOT
Mode = Duplex
Unit = Secondary
Unit ID = 2

Redundancy Mode (Operational) = Stateful Switchover
Redundancy Mode (Configured)  = Stateful Switchover
Redundancy State              = Stateful Switchover
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up

client count = 60
client_notification_TMR = 240000 milliseconds
keep_alive TMR = 9000 milliseconds
keep_alive count = 1
keep_alive threshold = 18
RF debug mask = 0x0

That’s it! We now have both supervisor engines running the new IOS version and the stateful switchover redundancy mode.

Troubleshooting

Standby Supervisor Engine Reload Loop

Sometimes the standby supervisor might go into a reload loop because of this message:
Current BOOT file is --- flash:cat4500-entservicesk9-mz.150-2.SG5.bin
Invalid filename flash:cat4500-entservicesk9-mz.150-2.SG5.bin. It must begin with device name.
You are then presented with a ROMMON prompt. Hit Ctrl-C to prevent a reload. Then, at the prompt, issue a boot command to load the IOS.
rommon 1 > boot bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
The standby supervisor engine will boot the IOS. You can then check your configuration and fix the problem.
switch# show running-config | inc boot system
boot system flash flash:cat4500-entservicesk9-mz.150-2.SG5.bin
boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
The first line is our problem. It says to boot from flash: instead of bootflash: like the second line. To fix it, we only need to remove that first line.
switch# conf terminal
switch(config)# no boot system flash flash:cat4500-entservicesk9-mz.150-2.SG5.bin
switch(config)# end
switch# copy run start
switch# sh run | inc boot system
boot system flash bootflash:cat4500-entservicesk9-mz.150-2.SG5.bin
There, that should do it. Now reload the standby supervisor again and see how it goes.
switch# redundancy reload peer
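
To catch this before the next reload, here is a minimal sketch that flags any boot system line whose image path does not start with bootflash:. It reads a saved copy of the configuration; the function name and the file name are hypothetical.

```shell
# Hypothetical helper: reads a saved configuration on stdin and prints
# every "boot system" line whose last word does not begin with
# "bootflash:". No output means the boot lines look fine.
bad_boot_lines() {
    awk '/^boot system/ && $NF !~ /^bootflash:/'
}

# Usage, with the running configuration saved as running-config.txt:
#   bad_boot_lines < running-config.txt
```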

TFTP Configuration Backup Error

When we back up the switch’s configuration to our TFTP server, we might get an access denied error. That’s because the TFTP server process doesn’t have permission to write into the /tftpboot directory. A simple, yet not very secure, way to fix this is to change the permissions right before the config backup, then put them back afterwards.
sudo chmod o+rwx /tftpboot
switch# copy start tftp
sudo chmod o-w /tftpboot
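
So that the directory is never left writable by accident, the open-then-close sequence can be wrapped in one function. This is only a sketch, assuming GNU stat (as on CentOS) and a root shell on the TFTP server; the function name with_temp_write is made up for this example.

```shell
# Hypothetical helper: makes directory $1 world-writable, runs the
# remaining arguments as a command, then restores the original mode,
# whether or not the command succeeded.
with_temp_write() {
    dir=$1
    shift
    old_mode=$(stat -c '%a' "$dir") || return 1
    chmod o+w "$dir" || return 1
    rc=0
    "$@" || rc=$?
    chmod "$old_mode" "$dir"
    return "$rc"
}

# Usage (as root): keep the directory open until you press Enter,
# and run "copy start tftp" on the switch in the meantime.
#   with_temp_write /tftpboot sh -c 'echo "Press Enter when the copy is done"; read -r _'
```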
