Bootcamp
Chapter 13
Catalyst 6500 High Availability
© 2006 Cisco Systems, Inc. All rights reserved. CISCO PARTNER CONFIDENTIAL 1
Agenda
Physical Redundancy
Catalyst 6500 integrated hardware resiliency
§ Separate control and forwarding plane
§ Redundant Supervisors (1:1)
NSF/SSO switchover results in sub-second recovery
NSF/SSO
Introduction
Redundant Supervisors
If the active Supervisor fails due to a hardware or software fault, Nonstop Forwarding and Stateful Switchover (NSF and SSO) result in sub-second recovery on the standby Supervisor.
§ RPR (90 sec failover): Route Processor Redundancy, where the redundant Sup is not initialized
§ RPR+ (30+ sec failover): Route Processor Redundancy Plus - redundant Sup is not stateful - L2 protocols restart and state table is purged
§ SSO (0-3 sec failover): Stateful Switchover - on switchover, physical links kept up - Sup redundancy is stateful for L2 protocols and hw tables
§ NSF/SSO (0-3 sec failover): Non Stop Forwarding with Stateful Switchover - on switchover, allow packet routing to continue until L3 protocol converges
NSF/SSO
SSO Operation
Layer 2 control plane information and state synchronized (Active to Standby)
- Spanning Tree state
- Trunking/channeling
- Port state (link up/down)
- Security/IP phone state
Hardware tables synchronized
- FIB/ADJ tables
- QoS and Security ACLs
- MAC address tables replicated
What it Does
Stateful Switchover synchronizes Layer 2, ACL, and state information. Beneficial for wiring
closet deployments with dual supervisor engines. Works in conjunction with Nonstop
Forwarding (NSF) to ensure total Supervisor resiliency in Layer 3 environments
Benefit
Seamless Supervisor Engine sub-second switchover with no interruption to packet forwarding
and Layer 2 sessions. IP telephone calls do not drop. Wireless access points do not need to
re-authenticate with the network
NSF/SSO
SSO Operation
[Figure: Active and Standby Supervisors with DFC-equipped line cards]
§ No Sup1a support for SSO
§ DFCs not affected by Supervisor failover
NSF/SSO
NSF Operation
[Figure: Active and Standby Supervisors]
Graceful restart for Layer 3 routing protocols between Supervisors and other Layer 3 devices
§ What it Does
NSF maintains Layer 3 route and protocol state information. Works in conjunction
with Stateful Switchover (SSO) to ensure total Supervisor resiliency
§ Benefit
Routing protocols (OSPF, BGP, EIGRP, IS-IS) do not have to re-converge,
ensuring better network availability. Frame Relay, PPP and ATM sessions on
router modules do not reset
NSF/SSO
NSF Terminology
§ NSF Capable Router (restarting router)
A router that preserves its forwarding table and rebuilds its routing topology after an RP switchover; currently a dual RP router
§ NSF Aware Router (peer)
A router that assists an NSF Capable router during restart and can preserve routes reachable via the restarting router
§ NSF Unaware Router
A router that is not capable of assisting an NSF Capable router during an RP switchover
§ An NSF Capable Router is NSF Aware, too
§ SSO Aware or HA aware
Cisco IOS subsystem – an HA client
§ NSF – Nonstop Forwarding
Cisco terminology and marketing name for feature set
§ Graceful Restart (GR)
Term used in some protocol standards and drafts
[Figure: an NSF Capable router between two NSF Aware peers]
NSF/SSO
Building Relationship
Restarting router: "I Can Preserve My Forwarding Table During Restart."
Agreement
Peer, during restart:
– I Will Preserve My Forwarding Table
– I Will Not Declare You Dead
– I Will Not Inform My Neighbors
NSF/SSO
NSF
Restart Notification and Acknowledgement
Restarting router: "I Have Restarted"
Peer: "OK. I Acknowledge. I Will Stick to My Agreement"
Knowledge Transfer
Peer: "This Is My Knowledge of the Network"
Restarting router: "I Will Use Your Knowledge to Build My Database"
Updates
NSF/SSO
NSF/SSO synchronization process
[Figure: Active Supervisor and Standby Supervisor. Control path: each RP CPU runs the routing protocol process; the configuration is synchronized between the RP CPUs. Forwarding path: each Supervisor holds hardware FIB and adjacency tables; the hardware tables are synchronized.]
NSF/SSO
NSF Switchover Details
[Figure: the active Supervisor fails and the newly active Supervisor takes over, with numbered callouts 1-10. Control path: RP CPU running the OSPF, EIGRP, IS-IS and BGP processes, the Routing Information Base and the ARP table; neighbor exchanges labeled "Restart Notification", "Hello. I Am NSF Aware" and "Database Synchronization". Forwarding path: Cisco IOS CEF tables (Global Epoch = 1), a FIB table (Prefix, Next Hop, Interface, Epoch) and an adjacency table (Next Hop, MAC, Epoch).]
NSF/SSO
NSF Switchover Details
1. Switchover is triggered. Standby Supervisor (shown) becomes active.
2. Control plane and data plane separation: the FIB is detached from the RIB.
3. Packet forwarding continues based on last-known FIB and adjacency entries while the
standby takes over.
4. The global epoch number is incremented.
5. The Supervisor brings its interfaces and control plane online.
6. The software adjacency table is populated with the pre-switchover ARP table contents.
Updated CEF entries receive the new global epoch number. New adjacency entries are
downloaded in hardware.
7. The routing protocol specific neighbor and adjacency reacquisition occurs.
8. The routing protocol specific database synchronization occurs.
9. The RIB is repopulated with new routing entries. The corresponding CEF entries are
updated.
10. Updated entries receive the global epoch number to indicate that they have been
refreshed. Corresponding FIB entries and hardware entries are updated.
11. Each routing protocol notifies CEF that it has converged. Once all of them have
converged, the last one flushes the stale route and adjacency information.
12. The IOS CEF tables on the RP and the forwarding tables on the SP and PFC are now
synchronized. Generic non-NSF specific operations can take place.
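The epoch handling in steps 4, 6, 9, 10 and 11 can be sketched roughly as follows. This is a simplified model, not the IOS implementation: the class names, method names and the /16 prefixes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CefEntry:
    next_hop: str
    epoch: int  # epoch the entry was last refreshed in


@dataclass
class CefTable:
    """Toy model of a CEF table with a global epoch (hypothetical names)."""
    global_epoch: int = 0
    entries: dict = field(default_factory=dict)  # prefix -> CefEntry

    def begin_switchover(self):
        # Step 4: the global epoch is incremented; existing entries keep
        # their old epoch and are still used for forwarding (step 3).
        self.global_epoch += 1

    def refresh(self, prefix, next_hop):
        # Steps 6, 9, 10: refreshed entries receive the current global epoch.
        self.entries[prefix] = CefEntry(next_hop, self.global_epoch)

    def flush_stale(self):
        # Step 11: once all protocols have converged, entries that were
        # never refreshed (old epoch) are removed.
        stale = [p for p, e in self.entries.items()
                 if e.epoch != self.global_epoch]
        for prefix in stale:
            del self.entries[prefix]
        return stale


table = CefTable()
table.refresh("10.2.0.0/16", "10.1.1.1")       # learned before the switchover
table.refresh("192.1.0.0/16", "192.168.1.1")   # learned before the switchover
table.begin_switchover()
table.refresh("10.2.0.0/16", "10.1.1.1")       # relearned after the switchover
flushed = table.flush_stale()
```

In this sketch only the prefix that was not relearned after the switchover is flushed; the other keeps forwarding throughout.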
Design Considerations for NSF/SSO
NSF and Hello Timer Tuning?
§ NSF is intended to provide availability through route convergence avoidance
§ Fast IGP timers are intended to provide availability through fast route convergence
§ In an NSF environment dead timer must be greater than SSO Recovery + RP restart + time to send first hello
[Figure: two Layer 3 switches, labeled "Neighbor Loss, No Graceful Restart"]
§ Switches running Native IOS
OSPF 2/8 seconds for hello/dead
EIGRP 1/4 seconds for hello/hold
§ Switches running Hybrid
OSPF 3/12 seconds for hello/dead
EIGRP 2/8 seconds for hello/hold
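The dead timer rule above is a simple inequality, and a minimal sketch of the check is shown below. The function name and the component times in the example are assumptions made for illustration only; measured values for a given chassis and software release should be used instead.

```python
def dead_timer_is_safe(dead_timer_s, sso_recovery_s, rp_restart_s, first_hello_s):
    """Return True if the neighbor's dead (or hold) timer outlasts the switchover.

    Implements the rule: dead timer > SSO recovery + RP restart
    + time to send the first hello. All arguments are in seconds.
    """
    return dead_timer_s > sso_recovery_s + rp_restart_s + first_hello_s


# Illustrative component times (assumed, not taken from the slides).
sso_recovery, rp_restart, first_hello = 3, 2, 2

ospf_ok = dead_timer_is_safe(8, sso_recovery, rp_restart, first_hello)   # 8 s dead timer
eigrp_ok = dead_timer_is_safe(4, sso_recovery, rp_restart, first_hello)  # 4 s hold timer
```

With these assumed component times an 8 second dead timer passes the check and a 4 second hold timer does not, which is the neighbor-loss case the slide warns about.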
NSF/SSO
Failover Results
§ Time to recover the data plane depends on how fast the forwarding
engine, switch fabric and bus can be recovered
NSF/SSO
NSF Comparison
NSF/SSO
Supervisor Uplinks
§ Cisco Catalyst 6500: both the active supervisor and the standby supervisor
uplink ports are active as long as the supervisors are up and running
Uplink ports go down when the supervisor is reset
NSF/SSO
Supervisor Uplinks and Pre-IOS 12.2(18)SXF5 issue overview
ONLINE INSERTION AND REMOVAL
OIR Improvements
Time of bus stall is from when line card touches the longest pin to when it
touches the shortest pin
ONLINE INSERTION AND REMOVAL
Online Removal Operation
Time of bus stall is from when line card touches the shortest pin to when it
touches the longest pin
GOLD
Introduction
§ GOLD defines a common framework for diagnostics operations across Cisco platforms running Cisco IOS Software.
§ Goal: check the health of hardware components and verify proper operation of the system data plane and control plane at run-time and boot-time.
§ Provides a common CLI and scheduling for field diagnostics
GOLD Tests
- Bootup Tests (includes online insertion)
- Health Monitoring Tests (background non-disruptive)
- On-Demand Tests (disruptive and non-disruptive)
- User Scheduled Tests (disruptive and non-disruptive)
- CLI access to data via Management Interface
GOLD
How does it work?
Questions GOLD answers:
- Ports working properly?
- Linecards working properly?
- Standby Sup ready to take over?
- Is the supervisor control plane and forwarding plane functioning properly?
- Backplane connection working?
GOLD can catch the following:
- Port failure
- Bent backplane pin
- Bad fabric connection
- Malfunctioning PFC/DFC
- Bad memory
GOLD
Diagnostic Integration
Configuration/reporting
Configure online diagnostics and check diagnostics results
Boot-up Diagnostics
Runtime Diagnostics
- Scheduled
- On-Demand
- Health Monitoring
• Default corrective action
- Supervisor reset
- Supervisor switch-over
- Fabric switch-over
- Port shut down
- Line card reset
- Line card power down
• Generate a call-home message
• Trigger Syslog
• Trigger EEM policies
• Generate SNMP Trap
Detect and identify problems before they result in network downtime!
GOLD
Diagnostic Integration
Boot-Up diagnostics
Run during system bootup, line card OIR or Supervisor switchover. Makes sure faulty hardware is taken out of service.
  Switch(config)#diagnostic bootup level complete
Runtime diagnostics
Health-Monitoring
Non-disruptive tests run in the background. Serves as HA trigger.
  Switch(config)#diagnostic monitor module 5 test 2
  Switch(config)#diagnostic monitor interval module 5 test 2 00:00:15
On-Demand
All diagnostics tests can be run on demand, for troubleshooting purposes. It can also be used as a pre-deployment tool.
  Switch#diagnostic start module 4 test 8
  Module 4: Running test(s) 8 may disrupt normal system operation
  Do you want to continue? [no]: y
  Switch#diagnostic stop module 4
Scheduled
Schedule diagnostics tests, for verification and troubleshooting purposes
  Switch(config)#diagnostic schedule module 4 test 1 port 3 on Jan 3 2005 23:32
  Switch(config)#diagnostic schedule module 4 test 2 daily 14:45
GOLD
High Level Architecture
[Figure: GOLD layered architecture, top to bottom]
NMS Layer: Fault Policy Manager and other NMS Applications
IOS Layer:
- MIB/SNMP, Embedded Syslog Manager, Embedded Event Manager, Call-Home
- GOLD Subsystems
- SEA & OBFL
- Platform Specific Diagnostics
- Runtime Software Drivers
HARDWARE
GOLD
GOLD Test Suite
Boot-up Diagnostics
§ Forwarding Engine Learning Tests (Sup & DFC)
§ L2 Tests (Channel, BPDU, Capture)
§ L3 Tests (IPv4, IPv6, MPLS)
§ Span and Multicast Tests
§ CAM Lookup Tests (FIB, NetFlow, QoS CAM)
§ Port Loopback Test (all cards)
§ Fabric Snake Tests
Health Monitoring Diagnostics
§ SP-RP Inband Ping Test (Sup's SP/RP, EARL (L2 & L3), RW engines)
§ Fabric Channel Health Test (Fabric enabled line cards)
§ MacNotification Test (DFC line cards)
§ Non Disruptive Loopback Test
§ Scratch Registers Test (PLD & ASICs)
On-Demand Diagnostics
§ Exhaustive Memory Test
§ Exhaustive TCAM Search Test
§ Stress Testing
§ All bootup and health monitoring tests can be run on-demand
Scheduled Diagnostics
§ All boot-up and health monitoring tests can be scheduled
§ Scheduled Switch-over
GOLD
Example - Supervisor Data Path
[Figure: Supervisor data path showing the MSFC (RP CPU), PFC3 (L3/4 Engine, L2 Engine), SP CPU, Port ASIC, Fabric Interface/Replication Engine, Switch Fabric, DBUS, RBUS, EOBC and the 16 Gbps Bus]
§ Monitors forwarding path between the Switch Processor, Route Processor and Forwarding Engine
§ Runs periodically every 15 seconds after system is online (configurable)
§ 10 consecutive failures is treated as FATAL and will result in supervisor switchover or supervisor reset
  Switch(config)#diagnostic monitor module 5 test 2
  Switch(config)#diagnostic monitor interval module 5 test 2 00:00:15
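The "10 consecutive failures" rule can be modeled with a small counter. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that a passing run resets the count, which is what "consecutive" implies but which the slide does not state explicitly.

```python
class HealthMonitorCounter:
    """Toy consecutive-failure counter for a periodic health-monitoring test."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, passed):
        """Record one test run; return True when the failure is treated as fatal."""
        if passed:
            # Assumption: any passing run clears the consecutive count.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


monitor = HealthMonitorCounter(threshold=10)
# Nine failures, one pass, then ten failures in a row.
results = [False] * 9 + [True] + [False] * 10
fatal_at = [i for i, passed in enumerate(results) if monitor.record(passed)]
```

In this run only the final result crosses the threshold, because the single passing run in the middle resets the count.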
GOLD
Using it for Pre-Deployment
§ GOLD can be used for pre-stage testing. The order in which tests
are run matters!!!!!
Run diagnostics first on linecards, then on supervisors
Run packet switching tests first, run memory tests after
Switch#diagnostic start module 6 test all
Module 6: Running test(s) 8 will require resetting the line card after the test has completed
Module 6: Running test(s) 1-2,5-9 may disrupt normal system operation
Do you want to continue? [no]: yes
*Mar 25 22:43:16: %DIAG-SP-6-TEST_RUNNING: Module 6: Running TestTransceiverIntegrity{ID=1} ...
*Mar 25 22:43:16: %DIAG-SP-3-TEST_SKIPPED: Module 6: TestTransceiverIntegrity{ID=1} is skipped
*Mar 25 22:43:16: %LINK-5-CHANGED: Interface GigabitEthernet6/1, changed state to administratively down
*Mar 25 22:43:16: %DIAG-SP-6-TEST_RUNNING: Module 6: Running TestLoopback{ID=2} ...
*Mar 25 22:43:16: %DIAG-SP-6-TEST_RUNNING: Module 6: Running TestAsicMemory{ID=8} ...
*Mar 25 22:43:16: SP: ******************************************************************
*Mar 25 22:43:16: SP: * WARNING:
*Mar 25 22:43:16: SP: * ASIC Memory test on module 6 may take up to 2hr 30min.
*Mar 25 22:43:16: SP: * During this time, please DO NOT perform any packet switching.
*Mar 25 22:43:16: SP: ******************************************************************
<snip>
Switch#diagnostic start module 5 test all
Module 5: Running test(s) 27-30 will power-down line cards and standby supervisor should be power-down manually and supervisor should be reset after the test
Module 5: Running test(s) 26 will shut down the ports of all linecards and supervisor should be reset after the test
Module 5: Running test(s) 3,5,8-10,19,22-23,26-31 may disrupt normal system operation
Do you want to continue? [no]: yes
<snip>
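The two ordering rules for pre-stage testing (linecards before supervisors, packet switching tests before memory tests) can be expressed as a sort key. This is one possible reading of how the rules combine, with all of a module's tests run before moving on to the next module; the category labels and the function name are hypothetical.

```python
# Hypothetical category labels; the slide only states the two ordering rules.
MODULE_ORDER = {"linecard": 0, "supervisor": 1}
TEST_ORDER = {"packet_switching": 0, "memory": 1}


def order_runs(runs):
    """Sort (module_number, module_kind, test_kind) tuples for pre-stage testing.

    Linecards come before supervisors; within one module, packet switching
    tests come before memory tests.
    """
    return sorted(
        runs,
        key=lambda r: (MODULE_ORDER[r[1]], r[0], TEST_ORDER[r[2]]),
    )


runs = [
    (5, "supervisor", "memory"),
    (6, "linecard", "memory"),
    (5, "supervisor", "packet_switching"),
    (6, "linecard", "packet_switching"),
]
ordered = order_runs(runs)
```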
GOLD
Operation Example
Switch#show diagnostic content mod 5
Module 5: Supervisor Engine 720 (Active)
<snip>
                                                       Testing Interval
  ID   Test Name                          Attributes   (day hh:mm:ss.ms)
  ==== ================================== ============ =================
   1)  TestScratchRegister -------------> ***N****A*** 000 00:00:30.00
   2)  TestSPRPInbandPing --------------> ***N****A*** 000 00:00:15.00
   3)  TestTransceiverIntegrity --------> **PD****I*** not configured
   4)  TestActiveToStandbyLoopback -----> M*PDS***I*** not configured
   5)  TestLoopback --------------------> M*PD****I*** not configured
   6)  TestNewIndexLearn ---------------> M**N****I*** not configured
   7)  TestDontConditionalLearn --------> M**N****I*** not configured
   8)  TestBadBpduTrap -----------------> M**D****I*** not configured
   9)  TestMatchCapture ----------------> M**D****I*** not configured
  10)  TestProtocolMatchChannel --------> M**D****I*** not configured
  11)  TestFibDevices ------------------> M**N****I*** not configured
  12)  TestIPv4FibShortcut -------------> M**N****I*** not configured
  13)  TestL3Capture2 ------------------> M**N****I*** not configured
  14)  TestIPv6FibShortcut -------------> M**N****I*** not configured
  15)  TestMPLSFibShortcut -------------> M**N****I*** not configured
  16)  TestNATFibShortcut --------------> M**N****I*** not configured
  17)  TestAclPermit -------------------> M**N****I*** not configured
  18)  TestAclDeny ---------------------> M**N****A*** 000 00:00:05.00
  19)  TestQoSTcam ---------------------> M**D****I*** not configured
<snip>
Diagnostics test suite attributes:
  M/C/* - Minimal bootup level test / Complete bootup level test / NA
  B/*   - Basic ondemand test / NA
  P/V/* - Per port test / Per device test / NA
  D/N/* - Disruptive test / Non-disruptive test / NA
  S/*   - Only applicable to standby unit / NA
  X/*   - Not a health monitoring test / NA
  F/*   - Fixed monitoring interval test / NA
  E/*   - Always enabled monitoring test / NA
  A/I   - Monitoring is active / Monitoring is inactive
  R/*   - Power-down line cards and need reset supervisor / NA
  K/*   - Require resetting the line card after the test has completed / NA
  T/*   - Shut down all ports and need reset supervisor / NA
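The 12-character attribute strings can be decoded position by position against the legend. The sketch below assumes the positions follow the legend's order (M/C, B, P/V, D/N, S, X, F, E, A/I, R, K, T), which is consistent with the sample strings on these slides but is an inference rather than something the slides state; the function and table names are hypothetical.

```python
# One dict per character position, in legend order (assumed ordering).
ATTRIBUTE_POSITIONS = [
    {"M": "Minimal bootup level test", "C": "Complete bootup level test"},
    {"B": "Basic ondemand test"},
    {"P": "Per port test", "V": "Per device test"},
    {"D": "Disruptive test", "N": "Non-disruptive test"},
    {"S": "Only applicable to standby unit"},
    {"X": "Not a health monitoring test"},
    {"F": "Fixed monitoring interval test"},
    {"E": "Always enabled monitoring test"},
    {"A": "Monitoring is active", "I": "Monitoring is inactive"},
    {"R": "Power-down line cards and need reset supervisor"},
    {"K": "Require resetting the line card after the test has completed"},
    {"T": "Shut down all ports and need reset supervisor"},
]


def decode_attributes(attrs):
    """Translate an attribute string such as 'M*PDS***I***' into its meanings."""
    if len(attrs) != len(ATTRIBUTE_POSITIONS):
        raise ValueError(f"expected {len(ATTRIBUTE_POSITIONS)} characters, got {len(attrs)}")
    meanings = []
    for position, (char, legend) in enumerate(zip(attrs, ATTRIBUTE_POSITIONS), start=1):
        if char == "*":
            continue  # '*' means the attribute does not apply (NA)
        if char not in legend:
            raise ValueError(f"unexpected {char!r} at position {position}")
        meanings.append(legend[char])
    return meanings


loopback = decode_attributes("M*PDS***I***")   # TestActiveToStandbyLoopback
memory = decode_attributes("***D*X**IR**")     # e.g. TestAsicMemory
```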
GOLD
Operation Example
  20)  TestL3VlanMet -------------------> M**N****I*** not configured  n/a
  21)  TestIngressSpan -----------------> M**N****I*** not configured  n/a
  22)  TestEgressSpan ------------------> M**D****I*** not configured  n/a
  23)  TestNetflowInlineRewrite --------> C*PD****I*** not configured  n/a
  24)  TestFabricSnakeForward ----------> M**N****I*** not configured  n/a
  25)  TestFabricSnakeBackward ---------> M**N****I*** not configured  n/a
  26)  TestTrafficStress ---------------> ***D****I**T not configured  n/a
  27)  TestFibTcamSSRAM ----------------> ***D*X**IR** not configured  n/a
  28)  TestAsicMemory ------------------> ***D*X**IR** not configured  n/a
  29)  TestNetflowTcam -----------------> ***D*X**IR** not configured  n/a
  30)  ScheduleSwitchover --------------> ***D****I*** not configured  n/a
  31)  TestFirmwareDiagStatus ----------> M**N****I*** not configured  n/a
  32)  TestAsicSync --------------------> ***N****A*** 000 00:00:15.00 10
Pay extra attention to Memory tests:
Memory tests can take hours to complete and a reset is required after running these tests!
GOLD
Operation Example
GOLD generic Syslog messages start with the string “DIAG”; “CONST_DIAG”
messages platform specific…
Bootup Test Failure:
%CONST_DIAG-SP-3-BOOTUP_TEST_FAIL: Module 2: TestL3VlanMet failed
Health Monitoring Test Failure:
%CONST_DIAG-SP-3-HM_TEST_FAIL: Module 5 TestSPRPInbandPing consecutive failure count:10
%CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=3% RP=12% Traffic=0%
%CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
On Demand Diagnostics Test Failure:
%DIAG-SP-3-TEST_FAIL: Module 5: TestTrafficStress{ID=24} has failed. Error code = 0x1
Scheduled Diagnostics Test Failure:
%DIAG-SP-3-TEST_FAIL: Module 3: TestLoopback{ID=1} has failed. Error code = 0x1
Generic Minor and Major Failure:
%DIAG-SP-3-MINOR: Module 3: Online Diagnostics detected a Minor Error. Please use 'show diagnostic result <target>' to see test results.
%DIAG-SP-3-MAJOR: Module 6: Online Diagnostics detected a Major Error. Please use 'show diagnostic Module 6' to see test results.
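A monitoring script could separate generic GOLD messages from platform-specific ones by the facility prefix. The following is a minimal Python sketch, not a Cisco tool: the names `GOLD_MSG`, `GoldMessage` and `parse_gold_message` are hypothetical, and the field layout (facility, source such as SP, severity digit, mnemonic) is inferred only from the sample messages on this slide.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the samples above: %FACILITY-SOURCE-SEVERITY-MNEMONIC: text
# The SOURCE field (for example "SP") is treated as optional.
GOLD_MSG = re.compile(
    r"%(?P<facility>CONST_DIAG|DIAG)-(?:(?P<source>[A-Z]+)-)?"
    r"(?P<severity>[0-7])-(?P<mnemonic>[A-Z_]+):\s*(?P<text>.*)"
)


class GoldMessage(NamedTuple):
    facility: str
    source: Optional[str]
    severity: int
    mnemonic: str
    text: str
    platform_specific: bool  # True for CONST_DIAG, False for generic DIAG


def parse_gold_message(line: str) -> Optional[GoldMessage]:
    """Parse one syslog line; return None if it is not a GOLD message."""
    match = GOLD_MSG.search(line)
    if match is None:
        return None
    return GoldMessage(
        facility=match["facility"],
        source=match["source"],
        severity=int(match["severity"]),
        mnemonic=match["mnemonic"],
        text=match["text"].strip(),
        platform_specific=(match["facility"] == "CONST_DIAG"),
    )


if __name__ == "__main__":
    samples = [
        "%CONST_DIAG-SP-3-BOOTUP_TEST_FAIL: Module 2: TestL3VlanMet failed",
        "%DIAG-SP-3-TEST_FAIL: Module 3: TestLoopback{ID=1} has failed. Error code = 0x1",
    ]
    for sample in samples:
        print(parse_gold_message(sample))
```

Because the pattern is applied with `search`, it also accepts lines that carry a timestamp before the `%` sign.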
GOLD
Recommendations
Boot-up diagnostics:
- Set level to complete
On-demand diagnostics:
- Use as a pre-deployment tool: run complete diagnostics before putting hardware into a production environment
- Use as a troubleshooting tool when suspecting hardware failure
Scheduled diagnostics:
- Schedule key diagnostic tests periodically
- Schedule all non-disruptive tests periodically
Health-monitoring diagnostics:
- Key tests run by default
- Enable additional non-disruptive tests for specific functionalities enabled in your network: IPv6, MPLS, NAT…
GOLD
Case Study
6500_cust# show module
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
 12    48 CEF720 48 port 1000mb SFP              WS-X6748-SFP       SALxxxxxxx
•Situation:
Customer was running into a problem: packets ingress on a particular line card were getting dropped intermittently. All software/hardware entries etc were checked.
6500_cust# show diagnostic result module 12 test 13 de
Current boot up diagnostic level: complete
Thanks to GOLD !!!!
GOLD
Case Study
•Situation:
Customer was running into a problem: packets ingress on a particular line card were getting dropped intermittently. All software/hardware entries etc were checked.
•Action:
TAC engineer requested customer to run line card memory test
•Results:
Diagnostics results revealed that memory was failing. Line card was replaced and the switch functionality was restored in a very short time
6500_cust# diagnostic start module 1 test 28
Module 1: Running test(s) 28 may disrupt normal operation
Do you want to run disruptive tests? [no] yes
Mar 17 15:58:34: SP: ******************************************************************
Mar 17 15:58:34: SP: * WARNING:
Mar 17 15:58:34: SP: * ASIC Memory test on module 1 may take up to 1hr 30min.
Mar 17 15:58:34: SP: * During this time, please DO NOT perform any packet switching.
Mar 17 15:58:34: SP: ******************************************************************
Mar 17 16:10:27: SP: diag_scp_asic_mem_test [1/1/RN_PBIF]: LCP TEST FAILED. fail_addr = 0xE923, test_data | result_data:
55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55,
55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 53, 55 | 55, 55 | 55, 55 | 55, 55 | 55,
55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55,
55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55,
55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55, 55 | 55,
55 | 55, 55 | 55, 55 | 55, 55 | 55,
Mar 17 16:10:27: SP: do_mem_test [1/1]: test RN_PBIF memory failed
Mar 17 16:10:27: SP: ******************************************************************
Mar 17 16:10:27: SP: * WARNING: Please RESET module 1 prior to normal use. Also, packet
Mar 17 16:10:27: SP: * switching tests will no longer work (i.e. test failure) because
Mar 17 16:10:27: SP: * its memories are filled with test patterns.
Mar 17 16:10:27: SP: ******************************************************************
MarchCMem: got data mismatch at addr: 0xE923, dev#: 1
rc = 0x12 comparison data|rslt: 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55
55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|53 55|55 55|55 55|55 55|55 55|55
55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55
55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55
55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55 55|55
Mar 17 16:10:27: %DIAG-SP-3-TEST_FAIL: Module 1: TestLinecardMemory{ID=28} has failed. Error code = 0x1
Thanks to GOLD !!!!
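The single failing value in the dump above (55 expected, 53 read back) is easy to miss by eye. The following Python sketch shows one way to locate it. It is an illustration only: `find_mismatches` and `flipped_bits` are hypothetical helper names, and treating the values as hexadecimal bytes is an assumption, since the log does not state the number base.

```python
from typing import List, Tuple


def find_mismatches(dump: str) -> List[Tuple[int, str, str]]:
    """Return (position, expected, observed) for every pair that differs.

    Accepts both layouts shown in the log ("55 | 53," and "55|53").
    Positions are 0-based offsets into the dump.
    """
    tokens = dump.replace("|", " ").replace(",", " ").split()
    if len(tokens) % 2:
        raise ValueError("dump must contain complete data/result pairs")
    mismatches = []
    for i in range(0, len(tokens), 2):
        expected, observed = tokens[i], tokens[i + 1]
        if expected != observed:
            mismatches.append((i // 2, expected, observed))
    return mismatches


def flipped_bits(expected: str, observed: str) -> int:
    """XOR of the two values, assuming they are hexadecimal (an assumption)."""
    return int(expected, 16) ^ int(observed, 16)


if __name__ == "__main__":
    # 64 pairs as in the log, with the one bad read at offset 19.
    dump = "55|55 " * 19 + "55|53 " + "55|55 " * 44
    for position, expected, observed in find_mismatches(dump):
        mask = flipped_bits(expected, observed)
        print(f"offset {position}: wrote {expected}, read {observed}, "
              f"differing bits mask 0x{mask:02X}")
```

Under the hexadecimal assumption, 0x55 and 0x53 differ in two bits (mask 0x06).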
GOLD
GOLD Paper
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper0900aecd801e659f.shtml