
Wednesday, October 16, 2002

ISP Essentials Supplement

This whitepaper is a supplement to the Cisco Press publication The ISP Essentials by Barry Raveendran Greene and Philip Smith.
Materials can be used with the permission of the authors and Cisco Press. Public copies are available at
www.cisco.com/public/cons/isp/essentials/ or www.ispbook.com.

When and How to Upgrade IOS in an ISP's Network


Supplement to Chapter 1, page 9
Version 0.3

WHEN AND HOW TO UPGRADE


ISPs should not upgrade their router software every time Cisco releases a new image. Frequent
upgrades are recognized by most industry operators as being bad practice and incur unnecessary
operational risk. The only time that any ISP should be upgrading software is when it is required
to fix bugs, patch security vulnerabilities, support new hardware, or implement new software
features. In many other industries, changing core operating software is seen as a major event not
to be undertaken lightly. Yet for some reason, some ISPs seem to think that a fortnightly upgrade
is good practice. It is our recommendation that a critical and compelling reason must exist before an
ISP upgrades the IOS images on its routers.
Most Tier 1 and Tier 2 ISPs now carry out software upgrades only when they are absolutely
required. Extensive testing is carried out in the test lab (how many ISPs have a test network that
looks like one of their PoPs, or a portion of their network?). Deployment happens only after
extensive testing, and even then new images are introduced cautiously on a quieter part of the
network. For example, the software in one PoP might be updated and left running for a week or a
fortnight to check for any issues; only after this initial deployment phase is the rest of the network
upgraded.
Caution is of paramount importance on a commercial-grade network. Even when upgrades are
carried out, remember the recommendations discussed in this section. IOS Software makes this
easier by providing backout paths through alternative images. Some of the core guidelines for ISPs
are:

Never attempt an upgrade without being aware of the potential side effects of unforeseen
problems that might occur during the upgrade.

Never attempt an upgrade without a backout plan.

Cisco Systems, Inc.


170 West Tasman Drive.
San Jose, CA 95134-1706
Phone: +1 408 526-4000
Fax: +1 408 536-4100

Never mix a hardware and software upgrade. Do the software upgrade first, gain
confidence in the image, then do the hardware upgrade. This minimizes confusion if
you need to troubleshoot.

Never attempt an upgrade without having read the release notes that come with the
software release. It also helps to read the release notes for all intermediate releases
because that will give the engineer good information about what has changed in the
software over the release cycle.
Key Guidelines:
Stability is the Objective
Know the Potential Side Effects during an Upgrade
Always have a Backout Plan
Do not mix Software and Hardware Upgrades
Read the Release Notes
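These guidelines lend themselves to a simple gate in whatever change-management tooling an ISP already uses. The following is a minimal Python sketch, assuming a hypothetical ChangeRequest record whose field names are illustrative and not taken from any Cisco tool. It refuses an upgrade that has no backout image, whose release notes have not been reviewed, or that shares its window with a hardware change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeRequest:
    """Hypothetical change-request record; field names are illustrative only."""
    router: str
    new_image: str
    backout_image: Optional[str]   # image to revert to if the upgrade fails
    release_notes_reviewed: bool   # target and intermediate release notes read
    hardware_change: bool          # True if hardware work shares the same window


def gate_problems(cr: ChangeRequest) -> List[str]:
    """Return the reasons this upgrade should not go ahead; empty means proceed."""
    problems = []
    if not cr.backout_image:
        problems.append("no backout plan")
    if not cr.release_notes_reviewed:
        problems.append("release notes not reviewed")
    if cr.hardware_change:
        problems.append("software and hardware upgrades mixed")
    return problems


if __name__ == "__main__":
    cr = ChangeRequest("core1", "xxxx-p-mz.120-17.S1", None, True, True)
    print(gate_problems(cr))
    # ['no backout plan', 'software and hardware upgrades mixed']
```

Returning a list of reasons rather than a single yes/no lets the review record exactly which guideline blocked the change.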

Minimize the Number of Images


Another practice implemented by most Tier 1 and Tier 2 ISPs is to minimize the number of
different versions of IOS Software images running on their network's routers. This is almost
always done for administrative and management reasons. Apart from reducing the number of
potential interoperability issues due to bugs and new features, it is easier to train operations staff
on the features and differences between a few images than it is to train them on the differences
among many images. Typically ISPs aim to run no more than two different IOS Software
releases. One image is the old release; the other is the new release being rolled out across the
backbone. Upgrades tend to be phased, not carried out en masse overnight. If an ISP has access
equipment, such as the AS5x00 series or cable/xDSL aggregation devices, it may deploy
different IOS Software images on these devices. But again, if one dial box needs to be
upgraded, ISPs tend to upgrade them all to ensure a consistent IOS Software release on that
network.
A typical software version strategy is something like the following:

Core/backbone network: One software release (xxxx-p-mz.120-17.S1) runs on all
backbone routers. The software on these routers probably is changed every six months or
even less frequently. The Internet core carries only IP packets, and rarely are new
features or capabilities added. Well-run Internet cores often have routers with uptimes
exceeding six months, sometimes even over one year.

Distribution and leased-line aggregation layer: One software release runs on all
routers. This tends to be the part of the network that customers connect to, so often new
features and newly deployed connection services demand a more frequent software
update cycle.

Dial access layer: A common software release is run on all access platforms. As with
the previous example, a more frequent cycle might be necessary. Some ISPs build new
infrastructure for new services, so when infrastructure is unchanging, it makes little sense
to upgrade software. Some dialup networks that we have had experience with have
hardware running the same software image for several years.

VPN access layer: A common software release is run on all platforms. This example is
included because it is the current fashion in the industry. ISPs often use bleeding-edge
software and hardware to deliver VPN services, and frequent upgrades for new features
may be necessary from time to time. Again, the usual rule applies: Don't change it unless
new features are necessary; this spares customers unnecessary pain.
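Keeping each layer within this discipline is easy to check once router versions have been collected. Below is a minimal Python sketch, assuming a hypothetical inventory that maps each router to its layer and running release; the router names, and every release other than 12.0(17)S1 (the release in the image name xxxx-p-mz.120-17.S1 above), are made up for illustration. Applying the "no more than two releases" limit per layer rather than network-wide is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical inventory: router -> (network layer, running IOS release).
INVENTORY: Dict[str, Tuple[str, str]] = {
    "core1": ("core", "12.0(17)S1"),
    "core2": ("core", "12.0(17)S1"),
    "core3": ("core", "12.0(21)S"),   # mid-way through a backbone upgrade
    "agg1": ("aggregation", "12.0(17)S1"),
    "dial1": ("dial", "12.1(5)T"),
    "dial2": ("dial", "12.1(5)T"),
}

MAX_RELEASES_PER_LAYER = 2  # the old release plus the one being rolled out


def releases_by_layer(inventory: Dict[str, Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group the distinct running releases by network layer."""
    layers: Dict[str, Set[str]] = defaultdict(set)
    for layer, release in inventory.values():
        layers[layer].add(release)
    return dict(layers)


def layers_over_limit(inventory: Dict[str, Tuple[str, str]],
                      limit: int = MAX_RELEASES_PER_LAYER) -> Dict[str, List[str]]:
    """Return layers running more than `limit` releases (sorted alphabetically)."""
    return {
        layer: sorted(releases)
        for layer, releases in releases_by_layer(inventory).items()
        if len(releases) > limit
    }


if __name__ == "__main__":
    print(layers_over_limit(INVENTORY))  # prints {}
```

Sorting is alphabetical, not by release order; it only keeps the report stable between runs.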

Software Review Meeting


Some of the bigger ISPs have weekly software strategy meetings, with the aim of ensuring
consistency across the company for the software deployed on the backbone. New software
has to be approved by engineering, security, and operations management. It is then
deployed only after fairly intensive proof and confidence testing in the lab. Software version
consistency is monitored by the ISP's NOC, often through automatic or cron-based tools that log
into all the routers and other equipment and grab the version number of the running software and
the contents of each router's Flash memory.
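The version-grabbing step of such a tool can be as simple as pattern-matching the show version output it captures. The Python sketch below assumes that output has already been collected (how the tool logs in is not shown) and that it follows the classic IOS layout, with a Version field on the IOS banner line and a System image file is "..." line. The sample text is abridged and illustrative, not captured from a real router, and wording varies by platform and release, so the patterns are a starting point rather than a complete parser.

```python
import re
from typing import Dict, List, Optional

# Patterns for two lines of classic IOS "show version" output. Wording varies
# across platforms and releases, so treat these as a starting point.
VERSION_RE = re.compile(r"IOS.*?Version\s+([^\s,]+)")
IMAGE_RE = re.compile(r'System image file is "([^"]+)"')

# Abridged, illustrative sample; not captured from a real router.
SAMPLE = """\
Cisco Internetwork Operating System Software
IOS (tm) 7200 Software (C7200-P-M), Version 12.0(17)S1, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)
core1 uptime is 1 year, 2 weeks, 3 days, 4 hours, 5 minutes
System image file is "slot0:c7200-p-mz.120-17.S1"
"""


def parse_show_version(text: str) -> Dict[str, Optional[str]]:
    """Extract the running release and system image file, or None if absent."""
    version = VERSION_RE.search(text)
    image = IMAGE_RE.search(text)
    return {
        "version": version.group(1) if version else None,
        "image": image.group(1) if image else None,
    }


def off_standard(outputs: Dict[str, str], approved: str) -> List[str]:
    """Routers whose running release differs from the approved release."""
    return sorted(
        host for host, text in outputs.items()
        if parse_show_version(text)["version"] != approved
    )


if __name__ == "__main__":
    print(parse_show_version(SAMPLE))
    # {'version': '12.0(17)S1', 'image': 'slot0:c7200-p-mz.120-17.S1'}
```

A router whose output cannot be parsed reports None and is therefore listed by off_standard, which errs on the side of a human taking a look.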
ISPs should insist on vendor participation in these software review meetings. While most
information on the bugs fixed can be found in the release notes, other information (for example,
pending problems) is not as easy to find. Some vendors, like Cisco, provide an extensive Bug
Navigator Tool on their Internet support site, which can provide a list of pending bugs.
Vendor participation can help point out these pending issues while adding more insight with data
from internal development and regression testing.
Finally, adopting a software version strategy is strongly recommended. Having no strategy usually
means that in times of crisis during network problems, the operations engineers will resort to a
random walk through different software versions in the desperate hope that something might stabilize
the network. Having strong control over software versions makes network problems much
easier to diagnose.

Software Test Lab and Certification


The largest ISPs have dedicated Software Test & Certification Labs that simulate major sections
of their network. In a way, these large ISPs have re-learned the lessons of the voice
telecommunications world, with duplicate facilities to test, emulate, and certify equipment and
software before it is allowed on the live network. Most ISPs in the world, however, cannot afford
this huge investment. What they can afford is a small multipurpose lab where they can stress test
new IOS images before putting them on their network.
Before going further, one key principle of testing must be understood: testing in a lab environment
will never cover all the possibilities on an ISP's network. There will always be something that
cannot be effectively simulated in the lab. Hence, a methodical phased rollout on the network is
highly recommended. Lab testing combined with phased rollouts is the only approach that has been
demonstrated to ensure confidence in a new image.
Software testing in the lab should follow a documented test plan as closely as possible. This
test plan will be small in the beginning. Over time it will grow as lessons are learned and tests for
new features and functionality are written. Cisco Systems Engineers may be able to help obtain
examples of test plans used internally by Cisco's Software Regression Teams. The key is to have a
methodology, documentation that can be peer reviewed, and consistency between the testing of
different images. Essentially, the ISP wants to be able to run a test and, if it finds a problem, have
the vendor replicate the test in its own facilities. This speeds up problem fixes and provides a
validation metric for testing the fixed software.
Traffic generation in the test lab can be achieved through a variety of means. Commercial
software, shareware, specialized testing equipment, and homemade scripts have all been
used. The recommended approach is to use a traffic generation tool with which the vendor can
replicate the test. For example, Cisco's PAGENT software is provided through a special license to
customers with test labs.1 These customers can then use spare Cisco routers as traffic
generators. Pagent scripts can then be given to Cisco for replication and validation testing,
speeding the resolution of problems found in the lab. Using other commercial testing
packages accomplishes the same objective, allowing the exchange of testing scripts, validation,
and peer review. In summary, use a traffic generation tool that is available to others.
1 Ask your Cisco Systems Engineer for the Network Verification Tool (NVS). This is the public name for the Pagent
suite of testing software designed to run on Cisco's routers.

Phased Software Roll Outs


Phased software rollouts are nothing new to the industry. They were used in the early days of
minicomputer operations, where a new operating system version was methodically phased in
over time in a way that minimized risk and provided software validation, incremental backouts,
and maximum confidence. These lessons have been applied to the way ISPs upgrade their
networks. There is no single set of rules, since every ISP network and its operations team has
its own flavor. Yet there are key principles and lessons that all ISPs can use in their phased
rollouts.

Minimize Risk. Risk is minimized by selecting specific routers and sections of the network that
have a low impact on the overall operation of the network. For example, some ISPs begin with
a redundant router in a part of the network with a lot of redundancy. If the router used for the
phased deployment has an unexpected failure, the redundancy minimizes the service impact.

Software Validation. Validating the features, functionality, and stability of the software
is paramount. The ultimate validation is running the software live in the network. Most
ISPs will do this initial step on a router with maximum redundancy (minimizing risk).
They will leave the software running for a given period of time, watching a variety of metrics
to determine whether something was missed in their lab testing. The next phase of the
software rollout is triggered once a given time has passed on this software validation
router.

Incremental Backouts. Backout plans are essential. What is even more essential is for
the backout plans to be incremental. One of the most common backout techniques is for
the ISP to roll back every router in the network to the original image, even though the new
software is running well on the parts of the network covered by earlier phases. Often
something may happen during a deployment phase that has nothing to do with the software.
Other times it has everything to do with the software. So an ISP that has completed three
phases of the rollout and runs into a problem in phase four should fall back to phase three. At
that point, a detailed analysis of what happened should be carried out. That analysis will
determine whether the backout should continue or whether something else was happening in the network.

Maximum Confidence. The ultimate goal of any phased rollout is maximum
confidence in the software. This only comes with time live in the network. Many ISPs
have key time periods between rollout phases, ensuring the team has confidence and
consensus before moving forward to the next phase. When everything works, the
operations team can assure itself that everything that could be done has been done.
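The soak-time and incremental-backout principles can be captured in a small planning object. The following is a minimal Python sketch, assuming phases are defined as lists of router names with a per-phase soak period; the phase names, routers, and soak times are hypothetical (the one- and two-week values echo the "week or a fortnight" example earlier). A new phase may start only after the previous phase has soaked long enough. If the phase in progress fails, only its routers are reverted, and completed phases are backed out one at a time only if analysis calls for it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Phase:
    name: str
    routers: List[str]
    soak: timedelta  # minimum time live on the new image before the next phase


@dataclass
class Rollout:
    phases: List[Phase]
    done: List[datetime] = field(default_factory=list)  # completion time per finished phase

    def next_index(self) -> Optional[int]:
        """Index of the phase to deploy next, or None when every phase is done."""
        return len(self.done) if len(self.done) < len(self.phases) else None

    def may_start_next(self, now: datetime) -> bool:
        """Allow the next phase only after the previous one has soaked long enough."""
        i = self.next_index()
        if i is None:
            return False
        if i == 0:
            return True
        return now - self.done[i - 1] >= self.phases[i - 1].soak

    def finish_next(self, when: datetime) -> None:
        """Record that the next phase was deployed successfully at `when`."""
        if self.next_index() is None:
            raise ValueError("all phases already completed")
        self.done.append(when)

    def failed_phase_routers(self) -> List[str]:
        """If the phase in progress fails, revert only its routers."""
        i = self.next_index()
        return [] if i is None else list(self.phases[i].routers)

    def fall_back_one_phase(self) -> List[str]:
        """Back out the most recently completed phase, if analysis calls for it."""
        if not self.done:
            return []
        self.done.pop()
        return list(self.phases[len(self.done)].routers)


if __name__ == "__main__":
    t0 = datetime(2002, 10, 1)
    plan = Rollout([
        Phase("pilot", ["pop1-r1"], timedelta(days=7)),
        Phase("pop1", ["pop1-r2", "pop1-r3"], timedelta(days=14)),
        Phase("backbone", ["pop2-r1", "pop3-r1"], timedelta(days=14)),
    ])
    plan.finish_next(t0)
    print(plan.may_start_next(t0 + timedelta(days=3)))  # False: still soaking
    print(plan.may_start_next(t0 + timedelta(days=7)))  # True
```

Keeping failed_phase_routers separate from fall_back_one_phase mirrors the text: the first step is always limited to the failing phase, and going further back is a deliberate decision after analysis.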

Rapid Certification and Rollout


Cisco Systems is one of the few (if not the only) networking companies in the industry to publicly
announce all security vulnerabilities.2 When these security vulnerabilities are announced, ISPs
are advised to perform their own security risk assessment. If the vulnerability is deemed critical,
rapid testing, validation, and certification of the fixed images is warranted. Having a rapid
certification and deployment plan ready in advance is worthwhile; it will minimize stress during
security incidents on the Internet.

Lessons Learned
Learning from past experience and the experience of your peers will minimize risk during an
upgrade cycle. After each successful software upgrade on a network, review what worked, what
did not work, and where improvements can be made. Update and document this experience so
that the next upgrade will build on the past experience. Besides what has been highlighted
earlier, some of the lessons we've learned include:

Check Flash on all the Devices. Many times ISPs get into the middle of an
upgrade cycle and find that a device's Flash is not large enough to hold the
image load (normally the old image and the new image).

Check the Route Processor Memory. As a rule of thumb, always assume a new
IOS image will consume more memory than previous versions. New features and
functions will add to this memory consumption. Check the memory on all Route
Processors (including secondary Route Processors) to ensure there is enough
memory.

Check Line Card/VIP Memory. Distributed architectures have Line Cards with
their own memory and processor requirements. A general rule of thumb is that
the Line Card's memory should be half of the required memory on the Route
Processor.3

Check the CPU Load of all Route Processors and Line Cards. Know your
network's condition before the upgrade starts. Many times an engineer will
upgrade a router, put it back into production, and then notice that the CPU load is at
99%/99%. The immediate assumption is that this CPU spike is caused by the
upgrade. That assumption can be false. Without a data point from before the upgrade,
the engineer will not know whether the CPU spike is caused by something in the
software or by a pre-existing problem on the device.

Check the Logs. What happened on the day before the upgrade?
Examining the logs will give the ISP engineer insight into potential pre-existing problems.

2 The Product Security Incident Response Team (PSIRT) handles all Cisco security vulnerabilities. Information can
be found at http://www.cisco.com/go/psirt/

3 Of course, the authors advise all ISPs to max out all memory. It is more expensive to carry out field upgrades of
memory than it is to max out the memory at the time of purchase. Memory bought at the time of purchase is a depreciated
capital expense. Field upgrades are an operational cost that incurs downtime, field upgrade time, and potential
outages.
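Several of these lessons reduce to simple arithmetic that can be checked before the maintenance window. The Python sketch below assumes a hypothetical DeviceState snapshot gathered beforehand (field names are illustrative), takes the new image's Route Processor memory requirement from the release notes as an input, and applies the half-of-RP rule of thumb for line cards quoted above. The 20-point CPU margin is an arbitrary example value, not a Cisco recommendation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceState:
    """Hypothetical pre-upgrade snapshot of one router."""
    name: str
    flash_bytes: int          # total Flash capacity
    old_image_bytes: int      # image kept for the backout path
    new_image_bytes: int
    rp_memory_mb: List[int]   # each Route Processor, including any secondary
    lc_memory_mb: List[int]   # each line card / VIP, indexed by slot
    cpu_baseline_pct: float   # CPU load recorded before the upgrade


def preupgrade_problems(dev: DeviceState, rp_required_mb: int) -> List[str]:
    """List resource problems that would block or endanger the upgrade."""
    problems = []
    # Flash must hold both the old image (for backout) and the new one.
    if dev.old_image_bytes + dev.new_image_bytes > dev.flash_bytes:
        problems.append("Flash too small for old and new image")
    # Every Route Processor must meet the new image's memory requirement.
    for i, mem in enumerate(dev.rp_memory_mb):
        if mem < rp_required_mb:
            problems.append(f"Route Processor {i} memory below {rp_required_mb} MB")
    # Rule of thumb: line cards need about half the RP requirement.
    lc_required = rp_required_mb / 2
    for slot, mem in enumerate(dev.lc_memory_mb):
        if mem < lc_required:
            problems.append(f"line card {slot} memory below {lc_required:g} MB")
    return problems


def cpu_regressed(baseline_pct: float, after_pct: float, margin_pct: float = 20.0) -> bool:
    """True only if CPU rose by more than `margin_pct` points over the baseline,
    so a load that was already high before the upgrade is not blamed on it."""
    return after_pct - baseline_pct > margin_pct


if __name__ == "__main__":
    dev = DeviceState("core1", 16 * 1024 * 1024, 7_000_000, 8_000_000,
                      [256, 128], [128, 64], 35.0)
    print(preupgrade_problems(dev, rp_required_mb=256))
    print(cpu_regressed(dev.cpu_baseline_pct, 99.0))  # True
    print(cpu_regressed(99.0, 99.0))                  # False: pre-existing load
```

Recording cpu_baseline_pct before the window is the point of the CPU lesson: without it, cpu_regressed has nothing to compare against.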