and
ES+ Crashes
EDCS-694119 CA Training
Cisco Confidential
Generic Online
Diagnostics
[Figure: chassis block diagram showing line cards, forwarding engines, the fabric, the CPU, and the active and standby supervisors]
Diagnostic capabilities built into hardware
Depending on hardware, GOLD can catch:
Port failure
Bent backplane connector
Bad fabric connection
Malfunctioning forwarding engines
Stuck control plane
Bad memory
Runtime Diagnostics
Health-Monitoring
Switch(config)# diagnostic monitor module 5 test 2
Switch(config)# diagnostic monitor interval module 5 test 2 00:00:15
Non-Disruptive Tests
Run in the Background
Serves As HA Trigger
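The two health-monitoring commands above follow a fixed pattern, so they can be rendered per module and test before being pasted into a device. The following is a minimal Python sketch, assuming the command syntax is exactly as shown on this slide. The helper name `health_monitor_commands` is hypothetical, and the result is plain text for review, not something that talks to a router.

```python
def health_monitor_commands(module: int, test: int, interval: str) -> list:
    """Render the two config lines shown above for one module/test pair.

    `interval` is an hh:mm:ss string, as in the slide's example (00:00:15).
    """
    parts = interval.split(":")
    if len(parts) != 3 or not all(p.isdigit() and len(p) == 2 for p in parts):
        raise ValueError(f"interval must look like hh:mm:ss, got {interval!r}")
    return [
        # Enable health monitoring for the test.
        f"diagnostic monitor module {module} test {test}",
        # Set how often the test runs in the background.
        f"diagnostic monitor interval module {module} test {test} {interval}",
    ]


for line in health_monitor_commands(5, 2, "00:00:15"):
    print(f"Switch(config)# {line}")
```

Only non-disruptive tests are meant to run this way, so the list of test numbers fed to such a helper would need to be checked against the module's test attributes first.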
On-Demand
Switch# diagnostic start module 4 test 8
Module 4: Running test(s) 8 may disrupt normal system operation
Do you want to continue? [no]: y
Switch# diagnostic stop module 4
Scheduled
Switch(config)# diagnostic schedule module 4 test 1 port 3 on Jan 3 2005 23:32
Switch(config)# diagnostic schedule module 4 test 2 daily 14:45
[Output excerpt: columns Test Name, Attributes, and interval (day hh:mm:ss.ms); intervals listed: 000 00:00:30.00, 000 00:00:15.00, and three tests "not configured"]
2) TestLoopback:
   Port  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
   ----------------------------------------------------
3) TestScratchRegister -------------> .
4) TestSynchedFabChannel -----------> .
<snip>
On demand diagnostics:
Use as a pre-deployment tool: run complete diagnostics
before putting hardware into production environment
Use as a troubleshooting tool when suspecting
hardware failure
Scheduled diagnostics:
Schedule key diagnostics tests periodically
Schedule all non-disruptive tests periodically
Health-monitoring diagnostics:
Key tests running by default
Enable additional non-disruptive tests for specific functionalities
enabled in your network: IPv6, MPLS, NAT
Reference:
http://www.cisco.com/c/en/us/td/docs/routers/7600/ios/15S/configuration/guide/7600_15_0s_book/diagtest.html
Alternatively, search Google for "cisco 7600 configuring online diagnostics" and open the first link.
VCCP
The Issue
Cisco has been working with individual customers on an issue
related to memory components manufactured by a single supplier
between 2005 and 2010.
The affected memory component is the DRAM. On most platforms, it is therefore
only necessary to replace the DIMM and not the entire line card/SUP.
In some cases, you might be required to replace the entire line card/SUP.
This can be confirmed by the TAC engineer.
The Field Notices for all the individual products and the related error messages can be
accessed via
www.cisco.com/go/memory
Symptoms
This issue does not affect boards while the boards are in
operation. The board failure might occur after one or more of
the following actions:
Reload
Software upgrade
Power cycle
One of these symptoms might be observed in the syslog on
7600 platform-based devices:
*May 16 02:59:54.575: %PM_SCP-SP-1-LCP_FW_ERR: System resetting module 1 to recover from error: Linecard received system exception
*May 16 02:59:54.575: %OIR-SP-3-PWRCYCLE: Card in module 1, is being power-cycled Off (Module Reset due to exception or user request)
Alternatively, the card might crash repeatedly with this error reported in
the syslog:
%EARL-DFC<n>-2-PATCH_INVOCATION_LIMIT: 10 Recovery patch invocations in the last 30 secs have been attempted. Max limit reached
This issue does not affect recent products that are less than 5 years old or older products
that are more than 10 years old. It affects only a few products, those with memory
components manufactured by a single vendor between 2005 and 2010.
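When triaging a collected syslog for these symptoms, a simple pattern scan can help. The following is a minimal Python sketch, assuming the log is available as plain text lines. The mnemonics come from the sample messages above; the function name `find_symptoms` and the dictionary keys are hypothetical. A match only indicates that the symptom text is present and does not by itself confirm the memory issue.

```python
import re

# Mnemonics taken from the sample messages above. The middle part of the
# facility string (SP, DFC<n>, severity digit) varies, so it is matched loosely.
SYMPTOM_PATTERNS = {
    "linecard_exception": re.compile(r"%PM_SCP-\S+-LCP_FW_ERR"),
    "power_cycle": re.compile(r"%OIR-\S+-PWRCYCLE"),
    "earl_patch_limit": re.compile(r"%EARL-\S+-PATCH_INVOCATION_LIMIT"),
}


def find_symptoms(lines):
    """Return {symptom_name: [matching lines]} for the known symptoms."""
    hits = {name: [] for name in SYMPTOM_PATTERNS}
    for line in lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits


sample = [
    "*May 16 02:59:54.575: %PM_SCP-SP-1-LCP_FW_ERR: System resetting module 1 "
    "to recover from error: Linecard received system exception",
    "*May 16 02:59:54.575: %OIR-SP-3-PWRCYCLE: Card in module 1, is being "
    "power-cycled Off (Module Reset due to exception or user request)",
]
for name, matched in find_symptoms(sample).items():
    print(name, len(matched))
```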
Reference:
www.cisco.com/go/memory
ES+
What is it ?
1G
10G
Ginsu [10G-OTN]
7600-ES+20G3C
7600-ES+2TG3C
7600-ES+ITU-2TG
7600-ES+20G3CXL
7600-ES+2TG3CXL
7600-ES+ITU-4TG
7600-ES+40G3C
7600-ES+4TG3C
7600-ES+40G3CXL
7600-ES+4TG3CXL
7600-ES+20C3C
7600-ES+20C3CXL
7600-ES+40C3C
7600-ES+40C3CXL
ES+
Each ES+ board consists of one Baseboard, one Link Daughter card and one
Earl Daughter card.
ES+ Troubleshooting
Getting Started
ES+ Modules
Hardware requirement
Supported by all the Cisco 7600 series routers:
7604, 7606, 7609, 7613 (on the 7613, not in slots 1-8), 7606-S and 7609-S.
7600-ES+xx is supported by all SUP720 models except PFC3A
7600-ES+xx is supported with RSP720
7600-ES+xx is not supported by SUP2 or SUP32
Software Requirement
Supported from version 12.2(33)SRD of the Native IOS image
CatOS and Hybrid images are not supported.
Show module
the Linecard
Show power
Incorrect optics
Unsupported optics.
routerdfc12#sh platform hardware transceiver ?
  brief      Brief device information
  config     Device configuration
  counters   Device statistics
  errors     Device error information
  registers  Device register contents
  status     Device status
Transceiver Verification
Router#show module 8
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  8     4 7600 ES+                               7600-ES+4TG3CXL    XXXABCDXXX

Mod MAC addresses                       Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  8 001f.9e13.76e0 to 001f.9e13.76ef   0.303  12.2(33r)SRD 12.2(2008102 Ok

Mod Sub-Module                  Model              Serial      Hw      Status
--- --------------------------- ------------------ ----------- ------- -------
  8 7600 ES+ DFC XL             7600-ES+3CXL       XXXABCDXXX  0.200   Ok
  8 7600 ES+ 4x10GE XFP         7600-ES+4TG        XXXABCDXXX  0.250   Ok
Name  Status      Vlan    Duplex  Speed  Type
      connected   routed  full    10G    10Gbase-LR
      disabled    1       full    10G    DWDM-51.72
      notconnect  1       full    10G    No Connector
      disabled    1       full    10G    No Connector
Transceiver Verification
Router#show idprom interface te8/1
IDPROM for transceiver TenGigabitEthernet8/1:
  Description                             =
  Transceiver Type:                       =
  Product Identifier (PID)                =
  Vendor Revision                         =
  Serial Number (SN)                      =
  Vendor Name                             =
  Vendor OUI (IEEE company ID)            =
  CLEI code                               =
  Cisco part number                       =
  Device State                            =
  Date code (yy/mm/dd)                    =
  Connector type                          =
  Encoding                                =
  Minimum bit rate                        =
  Maximum bit rate                        =
  Power dissipation class                 =
  cdr function                            =
  Tx Reference clock                      =
  Max link length for SMF fiber           =
  Max link length for EBW 50/125um fiber  =
  Max link length for 0/125um fiber       =
  Max link length for 62.5/125um fiber    =
  Max link length for copper              =
  Tx device technology                    =
  Wavelength control technology           =
  Transceiver cooling technology          =
  Detector type                           =
  Transmitter tuning                      =
  Supported CDR rates                     =
                                         Optical    Optical
         Temperature  Voltage  Current   Tx Power   Rx Power
Port     (Celsius)    (Volts)  (mA)      (dBm)      (dBm)
-------  -----------  -------  --------  ---------  ---------
Te8/1    35.0         0.00     51.5 --   -3.1       -3.5
ES+ LC IOS
Crash
Watchdog Reset
Problem:
The ES+ line card crashes during the execution of "show platform hardware
config-pld" or "show platform hardware version". Both commands are
included in the line card "show hw-module slot X tech-support" output.
Root Cause: In both cases the crash happens when an attempt is made
to read the PLD register on the ES+ line card. The read may time out,
which triggers the watchdog to restart the line card.
Known DDTS: CSCtw77894, CSCti78408, CSCtz30983
Next Action: Please contact TAC with the crashinfo in order to confirm.
Sample symptom:
%EARL-DFC1-2-PATCH_INVOCATION_LIMIT: 10 Recovery patch invocations in the last 30 secs have been attempted. Max limit reached
Root Cause: Multiple root causes are possible, and the issue is not limited
to ES+ line cards. When EARL detects certain types of errors, it activates
a 'patch', which is effectively a restart of the ASICs connected to EARL. If the
limit on the number of consecutive patches is reached, a line card crash is
triggered.
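The limit described above behaves like a sliding-window counter: each recovery patch is counted, and reaching the maximum within the window escalates to a crash. The following is a minimal Python sketch of that idea only, not the actual EARL implementation. The defaults of 10 invocations and 30 seconds are taken from the sample message; the class name `PatchLimiter` and its method are hypothetical.

```python
from collections import deque


class PatchLimiter:
    """Toy sliding-window model of a recovery-patch invocation limit."""

    def __init__(self, max_patches=10, window_secs=30.0):
        self.max_patches = max_patches
        self.window_secs = window_secs
        self._times = deque()  # timestamps of recent patch invocations

    def record_patch(self, now):
        """Record one patch at time `now` (seconds).

        Returns True when the number of patches inside the window has
        reached the limit, which in this model stands for the crash.
        """
        self._times.append(now)
        # Drop invocations that have aged out of the window.
        while self._times and now - self._times[0] > self.window_secs:
            self._times.popleft()
        return len(self._times) >= self.max_patches


limiter = PatchLimiter()
# Ten patches one second apart: only the tenth reaches the limit.
results = [limiter.record_patch(float(t)) for t in range(10)]
print(results)
```

Patches that are spaced widely enough never accumulate in the window, which matches the idea that only a burst of repeated recoveries leads to the crash.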
Next action:
Please collect the crashinfo and the output of:
remote command {switch|module 1} show platform software earl
reset {history|data}
OIR the card. A software reset of the line card does not help. The card
has to be physically removed and re-inserted.
Parity
Errors
Soft parity errors: These errors occur when an energy level within
the chip (for example, a one or a zero) changes, most often due to
radiation.
When referenced by the CPU, such errors cause the system to crash.
In case of a soft parity error, there is no need to swap the board or any
of the components.
Hard parity errors: These errors occur when there is a chip or board
failure that corrupts data. In this case, you need to re-seat or replace
the affected component, which usually involves a memory chip swap
or a board swap.
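The difference between the two error types can be illustrated with a single parity bit. The following is a toy Python model, not how any particular Cisco memory or ASIC implements protection: a soft error is simulated as a one-time bit flip that disappears when the word is rewritten, and a hard error as a bit that is stuck at 1 no matter what is written. The names `MemoryCell`, `stuck_bit`, and `read_ok` are hypothetical.

```python
def parity(word):
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") % 2


class MemoryCell:
    """One stored word plus its parity bit."""

    def __init__(self, stuck_bit=None):
        self.stuck_bit = stuck_bit  # bit index forced to 1 (hard fault), or None
        self.data = 0
        self.parity = 0

    def write(self, word):
        self.parity = parity(word)
        # A hard fault corrupts the stored data on every write.
        if self.stuck_bit is not None:
            word |= 1 << self.stuck_bit
        self.data = word

    def read_ok(self):
        """True when the stored data still matches its parity bit."""
        return parity(self.data) == self.parity


# Soft error: a transient flip is detected, and a rewrite clears it.
soft = MemoryCell()
soft.write(0b10110010)
soft.data ^= 1 << 3           # simulated radiation-induced bit flip
soft_detected = not soft.read_ok()
soft.write(0b10110010)
soft_recovered = soft.read_ok()

# Hard error: the fault comes back after every rewrite.
hard = MemoryCell(stuck_bit=0)
hard.write(0b10110010)
hard_detected = not hard.read_ok()
hard.write(0b10110010)
hard_still_failing = not hard.read_ok()

print(soft_detected, soft_recovered, hard_detected, hard_still_failing)
```

This is why a soft parity error needs no hardware swap, while a recurring error at the same location points to a component that has to be re-seated or replaced.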
Sup720:
%EOBC-SP-0-EOBC_JAM_FATAL: Primary supervisor in slot 5 is jamming the EOBC channel.
It has been disabled. Supervisor will return to ROMMON
RSP720:
%TSEC-SP-3-RESTART: Interface EOBC0/0 Restarted Due to TX Freeze Error
%TSEC-SP-2-EXCEPTION: Fatal Error, Interface EOBC0/0 not transmitting
Root Cause: CSCtu50337
This issue may only occur if the RSP/SUP is running HW revision 5.0/5.1/5.2 and the
should be NON-S type.
If the offending packet is always the same, the end device may be faulty.
Please get TAC's help to capture the offending packet using the ELAM tool.
Fabric Errors
1. Fabric Sync Failure
%C6KPWR-SP-4-DISABLED: power to module in slot 4 set off (Fabric channel errors)
2. Fabric CRC Errors
%FABRIC_INTF_ASIC-DFC2-4-FABRICCRCERRS: Fabric ASIC 0: 5 Fabric CRC error events in 100ms period
3. Repeated Fabric Sync
%FABRIC_INTF_ASIC-DFC10-5-FABRICSYNC_REQ
4. Fabric Channel Counter Errors
Error counters incrementing on a specified fabric channel.
show fabric errors
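When many fabric CRC messages have been collected, pulling the fields out of each one makes it easier to see which module and ASIC report them. The following is a minimal Python sketch, assuming each message sits on a single line and has the wording of the FABRICCRCERRS sample above. The function name `parse_fabric_crc` and the returned keys are hypothetical.

```python
import re

# Field layout taken from the sample FABRICCRCERRS message above.
FABRIC_CRC_RE = re.compile(
    r"FABRIC_INTF_ASIC-(?P<source>\S+?)-(?P<severity>\d)-FABRICCRCERRS: "
    r"Fabric ASIC (?P<asic>\d+): (?P<events>\d+) Fabric CRC error events "
    r"in (?P<period_ms>\d+)ms period"
)


def parse_fabric_crc(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = FABRIC_CRC_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "source": fields["source"],          # e.g. DFC2, the reporting module
        "severity": int(fields["severity"]),
        "asic": int(fields["asic"]),
        "events": int(fields["events"]),
        "period_ms": int(fields["period_ms"]),
    }


sample_line = (
    "FABRIC_INTF_ASIC-DFC2-4-FABRICCRCERRS: Fabric ASIC 0: 5 Fabric "
    "CRC error events in 100ms period"
)
print(parse_fabric_crc(sample_line))
```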
Module             Watts    A @42V
76-ES+XC-20G3C     309.12   7.36
76-ES+XC-20G3CXL   337.26   8.03
76-ES+XC-40G3C     399      9.5
76-ES+XC-40G3CXL   427.14   10.17
Values higher than the above will proportionally increase "total
available power", which would cause other modules to power down
due to insufficient power.
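The two columns of the table are related by current = watts / 42 V, which gives a quick way to sanity-check a reported value. The following is a minimal Python sketch using only the table values above; the names `NOMINAL_WATTS`, `amps_at_42v`, and `exceeds_nominal` are hypothetical, and the observed wattage would have to be read manually from "show power".

```python
# Nominal power per module, copied from the table above.
NOMINAL_WATTS = {
    "76-ES+XC-20G3C": 309.12,
    "76-ES+XC-20G3CXL": 337.26,
    "76-ES+XC-40G3C": 399.0,
    "76-ES+XC-40G3CXL": 427.14,
}

SUPPLY_VOLTS = 42.0


def amps_at_42v(watts):
    """Current drawn at 42 V for a given power, rounded to two decimals."""
    return round(watts / SUPPLY_VOLTS, 2)


def exceeds_nominal(model, observed_watts):
    """True when the observed allocation is above the table value."""
    return observed_watts > NOMINAL_WATTS[model]


for model, watts in NOMINAL_WATTS.items():
    print(f"{model:18} {watts:7.2f} W  {amps_at_42v(watts):5.2f} A")
```

Each computed current matches the "A @42V" column, so a module whose reported current is noticeably higher than `amps_at_42v` of its nominal wattage is a candidate for the problem described here.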
Troubleshooting and Recommendations
1. Confirm whether the power values for ES+XC differ from the above
table, using "show power".
2. Check whether any of the modules in the same chassis fail to power
up with the error "power denied", using "show power" or "show
module".
On the 7600/ES+ platform, the recommended releases that take
care of the problem are 12.2(33)SRE4 or later and 15.0(1)S4
or later.
Known DDTS: CSCtn41667
Questions ?