Network Diagnosis and Troubleshooting Summary

Network Diagnosis and Troubleshooting Summary by Bob Chan
 Documentation
 Baselining
 Objective
 Discover the true performance of the network
 Provide comparison between normal and abnormal situations
 Verify policies
 Identify over-utilization and under-utilization areas
 Long-term performance and capacity prediction
 Steps of baselining
 Planning for the first baseline
 Start with data points which represent defined policies
 Collect data for a day or two before the actual baseline to
determine whether the right data is collected from the right
devices
 Conduct network baselining on regular basis
 Speed up fault isolation
 Understand how the network is affected by changes
 Identifying devices and ports of interest
 Clearer reports
 Either keep them unchanged or change them in an informed manner
 Use port description field to track the ports
 Determine the duration of baseline
 At least 7 days, 2 – 4 weeks is adequate
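The comparison between normal and abnormal situations can be sketched in Python as below. This is only a minimal illustration: the sample values, the function names and the three-standard-deviation threshold are assumptions, not part of the original notes.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize baseline samples (e.g. link utilization in percent)."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def is_abnormal(value, baseline, k=3.0):
    """Flag a reading more than k standard deviations from the baseline mean.

    The threshold k is an assumed policy choice, not a standard value.
    """
    return abs(value - baseline["mean"]) > k * baseline["stdev"]

# Hypothetical utilization readings (%) collected during the baseline period.
baseline = build_baseline([22.0, 25.0, 24.0, 23.0, 26.0, 24.0, 25.0, 23.0])

print(is_abnormal(24.5, baseline))  # reading close to the baseline mean
print(is_abnormal(71.0, baseline))  # reading far above the baseline mean
```

A longer baseline period gives more samples, so the mean and spread are less sensitive to a single unusual day.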
 Network documentation
 Overview
 Facilitate more effective troubleshooting
 Save time when rebuilding network configurations
 Network configuration table
 Contain accurate and up-to-date records of components of the
network.
 Provide information to identify and correct faults
 Should include: type, model, hostname, location, data link layer
address, network layer address, other physical aspects
 Table for budgetary purpose should be separated
 Network topology diagrams
 Notations and symbols should be consistent
 Cloud symbol = out of scope network
 Should include: device name, interface name, IP address, routing
protocols
 Discover network configuration information
 show version – device name, model, OS version (all)
 show ip interface – active interfaces + addresses (R)
 show ip interface brief – brief summary of interfaces (R)
 show interfaces {interface-name} – MAC address (R)
 show ip protocols – routing protocols enabled (R)
 show spanning-tree / show spantree – spanning tree status (all)
 show cdp neighbors – directly connected Cisco devices (all)
 show cdp entry {device id} – details of connected devices (all)
 show interfaces description – active ports + addresses (S)
 show interfaces status – ports summary (S)
 show etherchannel summary – EtherChannel (S)
 show interfaces trunk – Trunk ports (S)
 show tech-support – all information (more than needed)
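Output collected with these commands can be turned into rows for the network configuration table. The sketch below parses a brief interface summary in Python. The sample text is a hypothetical capture, and the column layout it assumes (interface, address, two status-related columns, status, protocol) may differ between platforms and software versions.

```python
def parse_ip_interface_brief(output):
    """Turn a brief interface summary into a list of table rows (dicts)."""
    rows = []
    for line in output.strip().splitlines():
        fields = line.split()
        # Skip the header line and anything too short to be a data row.
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        rows.append({
            "interface": fields[0],
            "ip_address": fields[1],
            # The status may span two words, e.g. "administratively down".
            "status": " ".join(fields[4:-1]),
            "protocol": fields[-1],
        })
    return rows

# Hypothetical capture, used only to exercise the parser.
SAMPLE = """
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0        192.168.1.1     YES NVRAM  up                    up
Serial0/0              unassigned      YES unset  administratively down down
"""

for row in parse_ip_interface_brief(SAMPLE):
    print(row)
```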
 End system configuration table
 End systems are important, can affect network performance
 Provide complete picture of the network
 Should include: device name, OS, IP address, subnet mask,
default gateway, DNS server, high-bandwidth network
applications
 End system topology diagrams
 Should include: device name, OS, IP address, subnet mask,
interface names, VLANs
 Discover end system configuration information
 OS and hardware information
 Access command line
 ipconfig / winipcfg / ifconfig - TCP/IP setting
 route print – active routes
 arp -a – ARP information
 ping – check connectivity
 tracert / traceroute – view routes
 Documentation guidelines
 Determine scope
 Know the objective
 Be consistent
 Keep the documents accessible
 Maintain the documentation
 Troubleshooting methodologies and tools
 Overview
 Systematic approach can make troubleshooting manageable, less
confusing and less time-consuming
 Rocket scientist approach (theorist)
 Analyze until the root cause is identified, then correct it with precision
 Time-consuming and resource-intensive
 Caveman approach (practical)
 Swap components until the network functions again
 Not reliable; the root cause may still be present
 General troubleshooting process
 Remarks: stages are not mutually exclusive, policies should be
established in each stage
 Step 1 – Gather symptoms
 From network management (NM) system alerts, console messages and users
 Break down the problems to smaller ones
 Questioning technique
 Ask questions that relate to the problem
 Use each question to eliminate or discover possibilities
 Make the questions understandable to users
 Ask when the problem was first seen
 Ask user to recreate the problem if possible
 Determine the event sequence before the problem happened
 Match the symptoms with common problem causes
 Step 2 – Isolate the problem
 Use the layer models to categorize the problems
 Further gather and document symptoms
 Step 3 – Correct the problem
 Implement
 Test
 Document (especially when a new problem is found)
 Approaches
 Types
 Bottom-up
 Work up through OSI layer model
 Well suited to physical problems
 Check every device and document all conclusions and
possibilities after obtaining authorization
 Top-down
 Work down through OSI layer model
 Well suited to application problems
 Check every network application and document all
conclusions and possibilities after obtaining authorization
 Divide and conquer
 Work directly on a particular layer, based on troubleshooter’s
experience and symptoms
 If a layer is functioning, the layers beneath it are normally
working too
 Selecting guidelines
 Tools
 Network management system frameworks
 End stations can send alerts when problems are recognized
 Management entities are programmed to react
 Agents in end stations gather information
 Such information will be sent via NM protocols like SNMP
 Five areas: Performance, Configuration, Accounting, Fault and
Security
 Knowledge base tools - databases
 Performance measurement and reporting tools - CiscoView, Netsys
Baseliner
 Event and fault management tools – Cisco Network Analysis Module,
protocol analyzers, pair / cable testers
 OSI layer 1 troubleshooting
 Critical characteristics
 When the physical layer fails, the upper layers cannot operate either
 Ping timeout
 Not able to telnet
 Not able to access network drives and servers
 “Page cannot be displayed” when attempting to access web pages
 Noncritical characteristics
 Equipment indicators
 System LED - It shows whether the system is receiving power
and functioning correctly
 POST – off = running, green = success, amber = failed
 Remote Power Supply (RPS) LED - It indicates whether or not
the remote power supply is in use
 Port Mode LED - It indicates the current state of the Mode button.
 Port Status LED - They have different meanings, depending on
the current value of the Mode LED.
 Console messages
 show interfaces
 no keepalive – makes the interface appear up; should not be used
 Performance lower than baseline
 Poor configuration
 Incorrect clock rate, incorrect clock source, incorrect serial
links (sync/async), interface shutdown, encapsulations, IP
addressing, duplex and speed
 Inadequate capacity
 Unstable routing due to marginal link or port
 Excessive traffic across low speed link
 Overloaded server or service
 Exceeded design limits
 Cable distance limit exceeded, leading to signal attenuation
 Collisions
 Large collision domains, duplex mismatch, late collisions
 Use show interface ethernet/fastethernet
 Electromagnetic Interference (EMI) effects
 Impulse noise (voltage fluctuation, 270mV on 10BaseT and
30 or 40mV on 1000BaseT), random noise, alien crosstalk
(parallel cables) and Near End Cross Talk (untwisted cable >
13mm)
 Faulty media or hardware
 Loose cable, dirty contacts, wrong cable, return loss
 Power: LED, fan, power cable
 Resources and utilization
 CPU and memory
 Power
 Network
 Console (error) messages
 Format: %FACILITY-SEVERITY-MNEMONIC: Message-text
 Facility (hardware, protocol, or module)
 Severity (of the situation, lower number = more serious)
 Mnemonic (unique identifier of the message)
 Message-text (describes the condition)
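The message format above can be split into its four fields with a regular expression. The Python sketch below is illustrative only: the character classes in the pattern, the 0 to 7 severity range and the sample line are assumptions used to exercise the parser.

```python
import re

# %FACILITY-SEVERITY-MNEMONIC: Message-text
MESSAGE_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<text>.*)"
)

def parse_console_message(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])  # lower number = more serious
    return fields

# Hypothetical console line.
sample = "%LINK-3-UPDOWN: Interface Serial0/0, changed state to down"
print(parse_console_message(sample))
```

Sorting parsed messages by the numeric severity field puts the most serious conditions first.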
 Useful commands
 show buffers – memory buffer pool statistics
 show environment – power supply and temperature
 show processes cpu/memory – resource utilization
 show stacks – display processor stacks, requires stack decoder
 show context – show exception information in NVRAM
 OSI layer 2 troubleshooting
 More difficult to troubleshoot because of suboptimal operation: either
frames do not travel through the best paths or frames are dropped
 Framing errors
 A frame that does not end on an 8-bit byte boundary
 Noisy serial line
 Improperly designed cable
 Incorrect clock (rate)
 T1 link problem because of incorrect framing or coding
specification
 Use show interfaces to reveal
 Frame error count
 Invalid Cyclic Redundancy Check
 Layer-2 to Layer-3 address mapping errors
 Occur in point-to-multipoint, Frame Relay and broadcast Ethernet
 A correct destination Layer-2 address must be given to a frame
 Layer-2 to Layer-3 address mapping mechanism and potential errors
 Static maps
 In Ethernet environment, change of NIC can lead to problem
 In Frame Relay environment, incorrect DLCIs assigned by
Telco
 Dynamic maps (ARP)
 Devices do not respond to ARP or Inverse-ARP requests
 Invalid ARP replies due to misconfiguration, DoS or Man-in-
the-middle attacks
 Symptoms (except man-in-the-middle attack)
 No direct Layer-3 communications
 Layer-2 communications are ok
 No or incorrect Layer-2 address when doing ARP inspection
 Useful commands
 show arp
 show cdp neighbors detail
 show frame-relay map
 Spanning Tree Protocol
 Problems occur when the exchange of Bridge Protocol Data Units (BPDUs)
fails
 Symptoms
 Unusually high backplane utilization
 Rapid address re-learning
 Rapidly incrementing frame counters
 Poor link performance
 Broadcast storm within Layer-2 domain
 Causes
 Bad transceivers
 Cabling issues
 Hardware failures such as ports and Supervisor engine
 Unidirectional link between bridges (cause STP loops)
 UDLD protocol (to prevent STP loops)
 A Layer-2 protocol which works with Layer-1 mechanisms
 Able to detect neighbors’ identity and shut down misconnected
ports
 Operations
 Exchange protocol packets between neighbors
 Packets contain the device/port ID of the sending device and of
its neighbors
 Neighboring ports should see their own echo in packets
received from the other side; otherwise the link is
considered unidirectional after a specific time
 The ports on a unidirectional link will be disabled by UDLD,
and can only be re-enabled manually
 Configuration
 UDLD is disabled by default
 Use udld enable either in global mode or on a particular
interface (the interface command overrides the global one)
 Use show udld interface to verify UDLD operation
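The echo check described under Operations can be modelled in a few lines of Python. This is a simplified sketch of the idea only, not the real protocol state machine. The identifier format and the function name are hypothetical.

```python
def classify_link(local_id, echoed_ids, timed_out):
    """Classify a link from one port's point of view.

    local_id   -- this port's own device/port identifier
    echoed_ids -- identifiers the neighbor reports having heard from
    timed_out  -- True once the detection time has expired
    """
    if local_id in echoed_ids:
        return "bidirectional"      # the neighbor hears us and says so
    if timed_out:
        return "unidirectional"     # we hear the neighbor, it never heard us
    return "undetermined"           # still waiting for an echo

# Hypothetical identifiers in "device:port" form.
print(classify_link("SwitchA:Gi0/1", {"SwitchA:Gi0/1"}, timed_out=False))
print(classify_link("SwitchA:Gi0/1", set(), timed_out=True))
```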
 Ethernet broadcast traffic
 Causes
 Poorly programmed or configured applications
 Huge Layer-2 broadcast domain
 Other network problems such as STP loops or route flapping
 Discover
 Either compare with baseline or use protocol analyzer
 Solutions
 Create separate VLANs
 Configure switches to be multicast aware
 Use scheduling for distribution services to control broadcast
 Ethernet switch flooding
 Causes
 Asymmetric routing because of HSRP configuration on Layer-3
switches
 STP Topology Change Notification (TCN)
 Overflow of switch forwarding table (CAM)
 Solutions
 Set the router’s ARP timeout and the switches’ forwarding table aging
time close to each other
 Enable STP portfast feature on ports
 Use port security feature
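The reason for keeping the two timers close can be shown with simple arithmetic. If the router keeps an ARP entry longer than the switch keeps the matching forwarding table entry, there is a window in which the router still sends unicast frames to a MAC address the switch no longer knows, so the switch floods them. The Python sketch below computes that window. The timer values are assumed examples, so check the actual settings on the devices concerned.

```python
def flooding_window_seconds(arp_timeout_s, mac_aging_s):
    """Seconds during which the router still holds an ARP entry after the
    switch has aged out the matching MAC address (0 if there is no gap)."""
    return max(0, arp_timeout_s - mac_aging_s)

# Assumed example timers: 14400 s ARP timeout, 300 s forwarding table aging.
print(flooding_window_seconds(14400, 300))  # 14100
# Timers aligned: no window in which frames to a silent host are flooded.
print(flooding_window_seconds(300, 300))    # 0
```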
 EtherChannel
 Cause
 Non-identical configuration on both sides
 Symptoms
 Loss of connectivity (due to switching loops)
 Increased backplane utilization
 Rapid MAC address re-learning
 Interfaces may turn to ErrDisable state
 Solution
 Configure the ports on both sides to have the same speed, duplex, and
trunk native VLAN
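A quick way to spot non-identical EtherChannel configuration is to compare the settings recorded for each side. The Python sketch below assumes the settings have already been collected into dictionaries. The key names and values are hypothetical.

```python
# Settings that must match on both sides, per the solution above.
CHECKED_SETTINGS = ("speed", "duplex", "native_vlan")

def etherchannel_mismatches(side_a, side_b):
    """Return the names of the checked settings that differ between the sides."""
    return [name for name in CHECKED_SETTINGS
            if side_a.get(name) != side_b.get(name)]

# Hypothetical port settings for the two ends of the bundle.
switch_a = {"speed": 1000, "duplex": "full", "native_vlan": 1}
switch_b = {"speed": 1000, "duplex": "half", "native_vlan": 99}

print(etherchannel_mismatches(switch_a, switch_b))  # ['duplex', 'native_vlan']
```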
 T1 framing errors
 Use show controllers t1
 Check if clock source is provided by Telco (Line)
 Check if the framing format is same as the line
 Check if the line coding matches
 ISDN
 Useful commands
 show isdn status
 debug isdn q931 – show Layer-3 call setup exchange
 debug dialer – show dialer list and dialer map
 Check PPP connection
 Frame Relay
 Check physical connectivity
 Verify LMI information exchange (show frame-relay lmi)
 Verify PVC status (Active, inactive or deleted)
 Verify Frame Relay encapsulation
 OSI layer 3 troubleshooting
 General
 Distribute-list blocking (except OSPF and ISIS)
 Passive interface (RIP/IGRP can still receive routing updates)
 Missing or incorrect network or neighbor statement
 Layer-1 and Layer-2 problems
 show ip protocols
 show ip interface
 show ip interface brief
 debug ip routing
 RIP
 Incompatible version types
 By default, a router receives versions 1 and 2 but sends version 1 only
 Mismatched authentication key (in version 2 only)
 Hop count limit (more than 15)
 Discontiguous networks
 Add static route
 Make the middle network part of the same major network
 Use version 2 with no auto-summary
 Invalid source address
 Caused by IP unnumbered
 Use no validate-update-source to solve the problem
 Flapping routes
 Large routing table
 debug ip rip
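The discontiguous networks item above comes down to classful summarization: when subnets are summarized at the classful boundary, two subnets of the same major network look identical to a router in between. The Python sketch below uses the standard ipaddress module to show this. The addresses are hypothetical and the helper name is made up for illustration.

```python
import ipaddress

def classful_network(address):
    """Return the classful (major) network that an IPv4 address belongs to."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        prefix = 8       # class A
    elif first_octet < 192:
        prefix = 16      # class B
    elif first_octet < 224:
        prefix = 24      # class C
    else:
        raise ValueError("class D/E addresses have no classful network")
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)

# Two subnets of 10.0.0.0 separated by a different major network.
left, middle, right = "10.1.0.1", "192.168.1.1", "10.2.0.1"

print(classful_network(left))    # 10.0.0.0/8
print(classful_network(right))   # 10.0.0.0/8, same summary as the left side
print(classful_network(middle))  # 192.168.1.0/24
```

Because both sides advertise the same summary, the router in the middle cannot tell them apart, which is why the listed fixes either bypass the summary with a static route or stop the summarization.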
 EIGRP
 Mismatched K values on both sides
 Default K1=1, K2=0, K3=1, K4=0 and K5=0
 Stuck in active
 Congested or bad link
 Low router resources
 Long query range
 Excessive redundancy
 Duplicate router ID
 Change loopback address
 show ip eigrp interfaces
 show ip eigrp neighbors
 debug ip eigrp
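A K value mismatch can be found by comparing the five values configured on each neighbor. The Python sketch below assumes the values have been read from each router and stored as tuples in the order K1 to K5. The example values are hypothetical.

```python
def mismatched_k_values(local_k, neighbor_k):
    """Return the names (K1..K5) of the values that differ between neighbors."""
    return [f"K{index}"
            for index, (local, remote) in enumerate(zip(local_k, neighbor_k), start=1)
            if local != remote]

# Hypothetical configured values, in the order (K1, K2, K3, K4, K5).
router_a = (1, 0, 1, 0, 0)
router_b = (1, 1, 1, 0, 0)

print(mismatched_k_values(router_a, router_b))  # ['K2']
```

An empty list means the K values agree, so the cause of a missing adjacency lies elsewhere.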
 OSPF
 Access list blocking (multicast hello 224.0.0.5)
 Mismatched parameters
 Hello and dead interval
 Authentication type
 Area ID
 Area options
 State issues
 Stuck in ATTEMPT
 No response when trying to contact a neighbor
 Misconfigured neighbor statement
 Stuck in INIT
 Two-way communication has not been established
 Access list blocking OSPF hellos
 Authentication enabled on one side only
 Stuck in EXCHANGE
 Fail to exchange Database Descriptor (DBD) packets
 Duplicate router ID
 Mismatched interface MTU
 Point-to-point link unnumbered
 show ip ospf interface
 debug ip ospf events
 BGP
 Neighbors not initializing
 Updates are only exchanged once the neighbor reaches the Established state
 Routes not being installed in routing table
 IBGP
 Routes not synchronized
 Next hop is unreachable
 EBGP
 Next hop is unreachable in case of multihop EBGP
 Multiexit discriminator (MED) value is infinite
 ISIS
 Adjacency problems
 show clns neighbors
 debug isis adj-packets
 debug isis update-packets
 ACL
 Implement standard access lists as close as possible to the protected
destination
 Implement extended access lists as close as possible to the source
of the traffic being filtered
 show log
 show ip access-list {number/name}
 show ip interface
 NAT
 DHCP
 Source address of DHCP-Request packet is 0.0.0.0
 Since NAT requires both valid destination and source address,
DHCP is difficult to run on router with NAT
 DNS and WINS
 When using dynamic NAT, the relationship between inside and outside
addresses changes frequently, so the outside DNS servers
cannot accurately represent the network inside the router
 SNMP
 SNMP management station may not be able to contact SNMP
agents on the other side of the NAT router because NAT can alter
the addressing information in the payload
 show ip nat
 debug ip nat
 Others
 Local system logging
 logging on
 Network Time Protocol (NTP)
 ntp peer {NTP server IP address}
 ntp authenticate
 Logging timestamps
 service timestamps debug datetime [msec] [localtime]
[show-timezone]
 NetBIOS
 netstat – display protocol statistics and current TCP/IP
connections
 nbtstat – display protocol statistics and current NetBIOS
connections running over TCP/IP
