
Building Resilient Enterprise Campus Networks
BRKRST-3032

BRKRST-3032_c2 © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 1

What Are Your Uptime Requirements?
• Campus network design is evolving in response to multiple drivers:
  - User expectations: always-on access to communications
  - Business requirements: globalization means true 7x24x365
  - Technology requirements: Unified Communications
  - Unexpected requirements: worms, viruses, …
[Figure: Global Enterprise Availability, Collaboration and Real-Time Communication, and Security together require a structured and resilient design]

How Does Downtime Affect Voice or Video?
• Availability requirements for UC are more than just five 9’s
• Also need to consider the subjective impact on real-time communications
[Chart: seconds of data loss vs. impact on voice and video]
  ~200 ms      No impact to voice or video
  ~1 sec       Minimal impact to voice
  5-6 sec      User hangs up
  Variable*    Phone resets
* The time for a phone to reset is variable and depends on the signaling protocol (SCCP or SIP) and the state of the call (active, ringing, …)
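
The "five 9's" target translates into a yearly downtime budget with simple arithmetic. Below is a minimal sketch that assumes a 365-day year; the helper names are illustrative. Five 9's allows roughly 315 seconds per year. That budget still permits dozens of outages long enough to make a user hang up, which is why the impact of each individual outage matters as much as the annual figure.

```python
# Yearly downtime budget implied by an availability of "N nines".
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s; leap years ignored


def downtime_budget_s(nines: int) -> float:
    """Seconds of allowed downtime per year, e.g. 5 nines = 99.999 %."""
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


def outages_within_budget(nines: int, outage_s: float) -> int:
    """How many outages of a given length fit inside the yearly budget."""
    return int(downtime_budget_s(nines) // outage_s)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget_s(n):,.2f} s/year")
    # A 5 s outage is already long enough for a user to hang up.
    print("5 s outages allowed at five 9's:", outages_within_budget(5, 5.0))
```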

High Availability Campus Design
Agenda
• Network Level Resiliency
  - High Availability Design Principles
  - Redundancy in the Distribution Block
  - Campus Routing Best Practices
• System Level Resiliency
  - Integrated Hardware and Software Resiliency
      NSF/SSO
      ISSU & IOS Modularity
  - System Management Resiliency
      GOLD & EEM
• Hardening the Campus Network Design

Structured and Modular Designs Work Best
• Optimize the interaction of the physical redundancy with the network protocols
  - Provide the necessary amount of redundancy
  - Pick the right protocol for the requirement
  - Optimize the tuning of the protocol
• The network looks like this so that we can map the protocols onto the physical topology
• We want to build networks that look like this
[Figure labels: Redundant Supervisor, Layer 2 or Layer 3, Redundant Links, Layer 3 Equal Cost Links, Redundant Switches, WAN, Data Center, Internet]

Do Not Extend Direct Links with a Passive Switch
• Indirect link failures are harder to detect
• They are caused by switches that do not participate in the recovery protocol, such as hubs or dumb switches
• With no direct HW notification of link loss or topology change, convergence times depend on SW notification via Spanning Tree BPDUs or Routing Protocol Hellos
[Figure: a passive switch in the path forces SW-initiated recovery based on hellos; a direct link allows HW detect & recovery]
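
The sketch below compares how long detection might take when a passive switch hides the failure. It assumes common protocol defaults: OSPF dead interval 40 s on broadcast networks, EIGRP hold time 15 s on LAN-speed links, and IEEE 802.1D max age 20 s. The 10 ms figure for direct hardware detection is a hypothetical placeholder. Tuned timers would shorten the software-based values, but they remain timer-bound.

```python
# Worst-case time to notice a failure *behind* a passive switch, where our own
# port stays up and only protocol timers can reveal the loss.
# Values are common protocol defaults; tuning changes them.
SOFTWARE_DETECT_S = {
    "ospf_dead_interval": 40.0,   # OSPF broadcast network default
    "eigrp_hold_time": 15.0,      # EIGRP default on LAN-speed links
    "stp_max_age": 20.0,          # IEEE 802.1D default
}

# Hypothetical value for a direct link whose loss is signalled in hardware.
DIRECT_HW_DETECT_S = 0.010


def detection_time_s(direct_link: bool, mechanism: str) -> float:
    """Return the assumed time to detect a failure for the given topology."""
    if direct_link:
        return DIRECT_HW_DETECT_S
    return SOFTWARE_DETECT_S[mechanism]


if __name__ == "__main__":
    for name in SOFTWARE_DETECT_S:
        ratio = detection_time_s(False, name) / detection_time_s(True, name)
        print(f"{name}: {detection_time_s(False, name)} s indirect, ~{ratio:.0f}x slower")
```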

Did You Know that Copper Takes Longer than Fiber for Failure Detection?
• Direct point-to-point fiber provides fast failure detection
• IEEE 802.3z and 802.3ae link negotiation define the use of Remote Fault Indicator & Link Fault Signaling mechanisms
• Debounce (Catalyst default: disabled)
[Figure: (1) remote IEEE fault detection mechanism between the two switches; (2) line card throttling: debounce timer]
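
A debounce timer delays the report of a link-down event so that brief flaps are ignored. The cost is slower detection of real failures. The sketch below is a generic model of that trade-off, not Cisco's line card implementation; the function name and the millisecond values are hypothetical.

```python
from typing import Optional


def reported_down_at(events: list[tuple[float, bool]],
                     debounce_ms: float) -> Optional[float]:
    """Time (ms) at which a debounced port reports link down, or None.

    events: (time_ms, link_up) transitions in time order.
    A down transition is only reported if the link stays down for
    debounce_ms; shorter flaps are suppressed. If the trace ends while the
    link is down, the link is assumed to stay down.
    """
    down_since: Optional[float] = None
    for t, up in events:
        if down_since is not None and t - down_since >= debounce_ms:
            return down_since + debounce_ms   # stayed down long enough
        if up:
            down_since = None                 # recovered: cancel the report
        elif down_since is None:
            down_since = t                    # start of a down period
    return None if down_since is None else down_since + debounce_ms


if __name__ == "__main__":
    flap_then_fail = [(0.0, False), (5.0, True), (100.0, False)]
    print(reported_down_at(flap_then_fail, debounce_ms=0))    # no debounce
    print(reported_down_at(flap_then_fail, debounce_ms=10))   # flap hidden
```

With debounce disabled, the 5 ms flap is reported immediately. With a 10 ms debounce, the flap is hidden and the real failure at 100 ms is reported 10 ms late.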


Routed Interfaces Offer Better Convergence Properties than SVIs
• Configuring L3 routed interfaces provides faster convergence than an L2 switchport with an associated L3 SVI

L3 routed interface (~8 msec loss):
1. Link Down
2. Interface Down
3. Routing Update

21:38:37.042 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/1, changed state to down
21:38:37.050 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet3/1, changed state to down
21:38:37.050 UTC: IP-EIGRP(Default-IP-Routing-Table:100): Callback: route_adjust GigabitEthernet3/1

L2 switchport with L3 SVI (~150-200 msec loss):
1. Link Down
2. Interface Down
3. Autostate
4. SVI Down
5. Routing Update

21:32:47.813 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/1, changed state to down
21:32:47.821 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet2/1, changed state to down
21:32:48.069 UTC: %LINK-3-UPDOWN: Interface Vlan301, changed state to down
21:32:48.069 UTC: IP-EIGRP(Default-IP-Routing-Table:100): Callback: route_adjust Vlan301
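
The syslog timestamps above can be used to measure the gap between the line-protocol event and the routing callback. The sketch below does that with the standard library, assuming both lines come from the same day. These are log-to-log intervals: about 8 ms and 256 ms in these particular captures. They are not a measurement of packet loss.

```python
from datetime import datetime

# Log lines copied from the two captures on this slide.
ROUTED_FIRST = "21:38:37.042 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/1, changed state to down"
ROUTED_LAST = "21:38:37.050 UTC: IP-EIGRP(Default-IP-Routing-Table:100): Callback: route_adjust GigabitEthernet3/1"
SVI_FIRST = "21:32:47.813 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/1, changed state to down"
SVI_LAST = "21:32:48.069 UTC: IP-EIGRP(Default-IP-Routing-Table:100): Callback: route_adjust Vlan301"


def log_time(line: str) -> datetime:
    """Parse the leading HH:MM:SS.mmm stamp of an IOS syslog line."""
    return datetime.strptime(line.split(" ", 1)[0], "%H:%M:%S.%f")


def ms_between(first: str, last: str) -> float:
    """Milliseconds elapsed between two log lines (same day assumed)."""
    return (log_time(last) - log_time(first)).total_seconds() * 1000


if __name__ == "__main__":
    print(f"routed interface: {ms_between(ROUTED_FIRST, ROUTED_LAST):.0f} ms")
    print(f"switchport + SVI: {ms_between(SVI_FIRST, SVI_LAST):.0f} ms")
```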


Build Triangles Instead of Squares
• This design enables effective use of Equal Cost Multi-Path or Multi-chassis EtherChannel
• Time to restore traffic flows is based on:
  - Time to detect the link failure
  - Time to update the HW forwarding tables
• No dependence on external events (no routing protocol or spanning tree convergence required)
• Increased delay could be added if an alternate router has to take over in a Layer 2 access design
[Figure: triangles with equal-cost links, where a link or box failure does not require multi-box interaction (~200 ms recovery), vs. squares (~1000 to 2000 ms recovery)]
All links forwarding: in an environment with all links active, traffic is restored based on HW recovery.

Routing Failover Takes Only 5 Steps
1. Link failure detection
2. Removal of the entries in the routing table
3. Update of the software CEF table to reflect the loss of the next-hop adjacencies
4. Update of the hardware tables
5. Routing protocol notification and reconvergence

Software Routing Table (RIB)
  Prefix          Next Hop    Interface
  10.255.0.0/16   10.10.1.1   gig 1/1
                  10.20.1.1   gig 1/2

Software Forwarding Table and Hardware Tables (FIB and Adjacency Table)
  Prefix          Adjacency Ptr    Rewrite Information
  10.255.0.0/16   Adj1 (gig 1/1)   AA.AA.AA.AA.AA, VLAN
                  Adj2 (gig 1/2)   BB.BB.BB.BB.BB, VLAN

[Figure: steps 1-5 acting on the RIB, the software forwarding table, the hardware tables and the routing protocol process]
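
The five steps can be sketched as a toy model built from the prefix, next hops and adjacencies in the tables of this slide. The dictionaries stand in for the RIB, the software FIB and the hardware FIB; they are illustrative and far simpler than real CEF or hardware programming. Traffic already uses the surviving adjacency after step 4, before the routing protocol reconverges in step 5.

```python
# Toy model of the five failover steps, using the prefix, next hops and
# adjacencies shown on this slide. Real CEF and hardware programming differ.
PREFIX = "10.255.0.0/16"

rib = {PREFIX: [("10.10.1.1", "gig 1/1"), ("10.20.1.1", "gig 1/2")]}
sw_fib = {PREFIX: ["Adj1", "Adj2"]}
adjacency_if = {"Adj1": "gig 1/1", "Adj2": "gig 1/2"}
hw_fib = {p: list(adjs) for p, adjs in sw_fib.items()}


def fail_link(interface: str) -> list[str]:
    """Apply the five failover steps for a failed interface; return a log."""
    log = [f"1. link failure detected on {interface}"]
    for prefix, paths in rib.items():
        rib[prefix] = [p for p in paths if p[1] != interface]
    log.append("2. routing table entries removed")
    for prefix, adjs in sw_fib.items():
        sw_fib[prefix] = [a for a in adjs if adjacency_if[a] != interface]
    log.append("3. software CEF table updated")
    for prefix, adjs in sw_fib.items():
        hw_fib[prefix] = list(adjs)   # traffic now uses the surviving path
    log.append("4. hardware tables updated")
    log.append("5. routing protocol notified; reconvergence follows")
    return log


if __name__ == "__main__":
    print("\n".join(fail_link("gig 1/1")))
    print(hw_fib)
```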


We Will Be Talking About Solutions for Two Distribution Block Models
Unique VLANs per access switch (e.g., VLAN 10, VLAN 20):
• Each access switch has unique VLANs
• No Layer 2 loops
• Layer 3 link between distribution switches
• No blocked links
VLANs spanning access switches (e.g., VLAN 30 on every access switch):
• At least some VLANs span multiple access switches
• Layer 2 loops
• Layer 2 and Layer 3 running over the link between distribution switches
• Blocked links

If Spanning Tree Is Used, Multiple Features Are Needed to Protect From Failure
• Place the root bridge in the distribution layer for optimal traffic flow
• Some hardware issues can cause spanning tree to fail
• Understand and optimize the L2 element of your design
• L2 has no native mechanism to dampen a problem
• Storm control can be utilized on backup links to limit broadcast traffic volumes during an STP loop
• Utilize Sup720 rate limiters or SupIV/V/6E with HW queuing structure
[Figure labels: STP Root, Bridge Assurance, Rootguard, Loopguard or Bridge Assurance, Storm Control, BPDU Guard or Rootguard, PortFast, Port Security]

Root Guard Prevents Root from Unexpectedly Moving
• Enable Root Guard on links connecting to the access layer to protect from edge switches becoming root and causing sub-optimal traffic flow
• Root Guard forces a Layer 2 LAN interface to be a designated port. If the port receives a superior BPDU, Root Guard puts the interface into the root-inconsistent (blocked) state
• Channel the trunk between distribution switches so that a failure doesn't break the topology

Router(config-if)# switchport
Router(config-if)# spanning-tree guard root

%SPANTREE-2-ROOTGUARDBLOCK: Port 3/3 tried to become non-designated in VLAN 800. Moved to root-inconsistent state
[Figure labels: STP Root, BkUp Root, Rootguard, VLAN 30]
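
The Root Guard decision comes down to a bridge ID comparison: a lower priority, then a lower MAC, is superior. The sketch below models only that check. It assumes MACs are lowercase fixed-width strings so that string order matches numeric order. The priority values are illustrative and ignore the extended system ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeId:
    """Bridge ID compared field by field: lower priority, then lower MAC, wins.

    MACs are assumed to be lowercase, fixed-width strings such as
    "0000.0c00.0001", so string order matches numeric order.
    """
    priority: int
    mac: str


def root_guard_port_state(current_root: BridgeId, bpdu_root: BridgeId) -> str:
    """State of a Root Guard port after it receives a BPDU (simplified)."""
    if bpdu_root < current_root:        # superior BPDU would move the root
        return "root-inconsistent (blocked)"
    return "designated (forwarding)"


if __name__ == "__main__":
    dist_root = BridgeId(4096, "0000.0c00.0001")   # root in distribution
    rogue_edge = BridgeId(0, "0000.0c00.00ff")     # access switch claiming root
    print(root_guard_port_state(dist_root, rogue_edge))
```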


PortFast and BPDU Guard Protect Access Switch Ports
• Enable PortFast on edge ports to allow them to move quickly to forwarding, bypassing listening and learning, and to avoid TCN (Topology Change Notification) messages in RPVST+
• BPDU Guard prevents ports configured with PortFast from being incorrectly connected to another switch
• Enable BPDU Guard to prevent loops by moving PortFast-configured interfaces that receive BPDUs to the errdisable state
• When enabled globally, BPDU Guard applies to all interfaces that are in an operational PortFast state

Router(config-if)#spanning-tree portfast
Router(config-if)#spanning-tree bpduguard enable

1w2d: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port FastEthernet3/1 with BPDU Guard enabled. Disabling port.
1w2d: %PM-4-ERR_DISABLE: bpduguard error detected on Fa3/1, putting Fa3/1 in err-disable state
[Figure: access port configured with PortFast + BPDU Guard receives a BPDU (VLAN 30)]
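
The difference between interface-level and global BPDU Guard can be summarized in a small decision function. In IOS, the global form is `spanning-tree portfast bpduguard default`. The sketch is a simplified model with a hypothetical function name. It assumes that interface-level BPDU Guard applies whether or not PortFast is on, while the global default covers only ports in an operational PortFast state.

```python
def port_action_on_bpdu(portfast_operational: bool,
                        bpduguard_interface: bool,
                        bpduguard_global: bool) -> str:
    """What a port does when it receives a BPDU (simplified model).

    Interface-level BPDU Guard applies regardless of PortFast; the global
    default only covers ports in an operational PortFast state.
    """
    if bpduguard_interface or (bpduguard_global and portfast_operational):
        return "err-disable"
    if portfast_operational:
        return "lose PortFast status, run normal STP"
    return "process BPDU normally"


if __name__ == "__main__":
    # Edge port with PortFast and global BPDU Guard, cabled to a switch:
    print(port_action_on_bpdu(True, False, True))
    # Uplink without PortFast: the global default does not apply.
    print(port_action_on_bpdu(False, False, True))
```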

Loop Guard Protects Network from Unidirectional Links and Software Errors
• BPDUs flow between switches, providing a means to detect a loop and block the higher-cost path to the root
• If BPDUs stop flowing due to soft errors, the max-age timer expires and the port transitions from blocking to listening and then to forwarding, creating a loop
• When Loop Guard is enabled and BPDUs are no longer received on a non-designated port, the port is moved to an STP loop-inconsistent blocking state, preventing a loop
• Works in PVST, RPVST+, and MST STP domains

Router(config)# interface gigabitEthernet 2/1
Router(config-if)# spanning-tree guard loop

%SPANTREE-2-LOOPGUARD_BLOCK: Loop guard blocking port gigabitEthernet 2/1 on VLAN0050
• The port is moved to an STP loop-inconsistent blocking state, preventing a loop
• Automatic recovery upon receiving BPDUs on the port
[Figure: the alternate path to the root is blocking; a lack of received BPDUs (X) triggers Loopguard instead of moving the port to forwarding]
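
The sketch below shows what happens to a port whose BPDUs stop arriving, with and without Loop Guard. It assumes classic 802.1D timing, with the 20 s max age as the timeout. RPVST+ and MST age out BPDU information faster. The function name and role strings are hypothetical.

```python
MAX_AGE_S = 20.0  # IEEE 802.1D default; RPVST+/MST age out information faster


def port_state_after_bpdu_loss(role: str, seconds_without_bpdu: float,
                               loop_guard: bool) -> str:
    """Simplified outcome when BPDUs stop arriving on a port.

    role: "root", "alternate" or "designated". Only non-designated ports
    (root or alternate) expect to receive BPDUs.
    """
    if role == "designated" or seconds_without_bpdu < MAX_AGE_S:
        return "unchanged"
    if loop_guard:
        return "loop-inconsistent (blocking)"     # clears when BPDUs return
    return "listening -> learning -> forwarding"  # the port opens a loop


if __name__ == "__main__":
    print(port_state_after_bpdu_loss("alternate", 25.0, loop_guard=False))
    print(port_state_after_bpdu_loss("alternate", 25.0, loop_guard=True))
```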

Bridge Assurance Also Protects the Network from Unidirectional Links and Software Errors
• BPDUs flow between switches, providing a means to detect a loop and block the higher-cost path to the root
• If BPDUs stop flowing due to soft errors, the max-age timer expires and the port transitions from blocking to listening and then to forwarding, creating a loop
• Bridge Assurance is recommended network-wide with the latest software
• Point-to-point links only
• Only works in RPVST+ and MST STP domains

%STP-2-BRIDGE_ASSURANCE_BLOCK: Bridge Assurance blocking port Ethernet 2/48 VLAN0700

Switch# sh spanning vlan 700 | in -i bkn
Eth2/48   Altn BKN* 4   128.304  Network P2p *BA_Inc
Switch#
• The port is moved to a Bridge Assurance inconsistent (*BA_Inc) blocking state, preventing a loop
• Automatic recovery upon receiving BPDUs on the port

Dynamic Port Security MAC Move Violation Watches for Access Layer Loops
• Some consumer-grade 'personal' switches don't send BPDUs
• If there are no BPDUs, BPDU Guard can't detect and 'stop' them
• If there are no BPDUs, STP can't prevent or stop a loop at the edge
• Port security detects a MAC address learned on 2 ports and error-disables or ignores traffic on the second interface, breaking the loop

switchport port-security
switchport port-security maximum 3
switchport port-security violation restrict
switchport port-security aging time 2
switchport port-security aging type inactivity

If the violation mode is set to shutdown (error-disable), the following log message is produced:
4w6d: %PM-4-ERR_DISABLE: psecure-violation error detected on Gi3/2, putting Gi3/2 in err-disable state
[Figure: port security on the access port facing a switch that doesn't send BPDUs]
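
MAC-move detection can be sketched as a table that remembers where each source MAC was first learned and flags any later sighting on a different port. The model below is a simplification with hypothetical names and MAC values. It maps "restrict" to dropping traffic on the second port and "shutdown" to err-disabling it, and it ignores the maximum-MAC count and aging.

```python
def check_mac_moves(frames: list[tuple[str, str]],
                    violation: str = "restrict") -> tuple[dict[str, str], dict[str, str]]:
    """Track where each source MAC was learned and flag MAC moves.

    frames: (src_mac, ingress_port) in arrival order.
    Returns (mac_table, actions); actions maps the offending port to
    "drop" (restrict) or "err-disable" (shutdown).
    """
    mac_table: dict[str, str] = {}
    actions: dict[str, str] = {}
    for mac, port in frames:
        learned_on = mac_table.get(mac)
        if learned_on is None:
            mac_table[mac] = port                  # first sighting: learn it
        elif learned_on != port:                   # same MAC on a second port
            actions[port] = "err-disable" if violation == "shutdown" else "drop"
    return mac_table, actions


if __name__ == "__main__":
    # A loop through a BPDU-less switch reflects a host's frame back in on Gi3/2.
    looped = [("0000.1111.2222", "Gi3/1"), ("0000.1111.2222", "Gi3/2")]
    print(check_mac_moves(looped, violation="shutdown"))
```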

Harden the Network Links: Storm Control
• Protect the network from intentional and unintentional flood attacks, e.g., an STP loop
• Limit the combined rate of broadcast and multicast traffic to normal peak loads
• Limit broadcast, and when possible multicast, to 1.0% of a GigE link to ensure the distribution CPU stays in the safe zone

! Enable storm control
storm-control broadcast level 1.0
storm-control multicast level 1.0

[Chart: Broadcast Traffic CPU Impact; percentage of CPU utilization (0-90%) vs. percentage of broadcast traffic (0.05% to 3%)]
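
To see what `level 1.0` means on a GigE link, the sketch below converts a percentage of line rate into bits per second and into minimum-size frames per second. It assumes the level is a share of the 1 Gb/s line rate and counts 20 bytes of preamble and inter-frame gap per frame. Platforms differ in exactly how they measure the level, so treat the result as an order of magnitude.

```python
LINK_BPS = 1_000_000_000        # GigE line rate
WIRE_OVERHEAD_BYTES = 20        # 8 B preamble/SFD + 12 B inter-frame gap


def storm_limit(level_percent: float, frame_bytes: int = 64) -> tuple[float, float]:
    """Return (bits/s, frames/s) allowed by 'storm-control ... level <pct>'.

    Assumes the level is a share of line rate and that every frame has the
    given size; platforms differ in exactly how they measure the level.
    """
    bps = LINK_BPS * level_percent / 100
    pps = bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)
    return bps, pps


if __name__ == "__main__":
    bps, pps = storm_limit(1.0)
    print(f"1.0% of GigE = {bps / 1e6:.0f} Mb/s, about {pps:,.0f} minimum-size broadcasts/s")
```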

Convergence Is Longer with Complex Spanning Tree Loops and Older Protocols
• Time to converge depends on the protocol implemented: 802.1D, 802.1s or 802.1w (802.1w is now part of the IEEE 802.1D-2004 specification, and 802.1s has been incorporated into IEEE 802.1Q)
• It also depends on:
  - Size and shape of the L2 topology (how deep the tree is)
  - Number of VLANs being trunked across each link
  - Number of ports in the VLAN on each switch
• Complex topologies take longer to converge
• Multi-chassis EtherChannel is a good option to avoid this increase in convergence time
[Figure: 400 msec convergence for a simple loop vs. 900 msec convergence for a more complex loop]

Even with Faster Convergence from RPVST+, We Still Have to Wait on FHRP Convergence
[Figure: FHRP Active (R1) and FHRP Standby (R2) distribution switches]

VRRP Config
interface Vlan4
 ip address 10.120.4.1 255.255.255.0
 ip helper-address 10.121.0.5
 no ip redirects
 vrrp 1 description Master VRRP
 vrrp 1 ip 10.120.4.1
 vrrp 1 timers advertise msec 250
 vrrp 1 preempt delay minimum 180

HSRP Config
interface Vlan4
 ip address 10.120.4.2 255.255.255.0
 standby 1 ip 10.120.4.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 150
 standby 1 preempt
 standby 1 preempt delay minimum 180

GLBP Config
interface Vlan4
 ip address 10.120.4.2 255.255.255.0
 glbp 1 ip 10.120.4.1
 glbp 1 timers msec 250 msec 750
 glbp 1 priority 150
 glbp 1 preempt
 glbp 1 preempt delay minimum 180

• GLBP offers load balancing within a VLAN
• For voice, a sub-second hello timer enables < 1 sec traffic recovery upstream
• Not necessary for Multi-chassis EtherChannel designs
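
The sketch below shows why the sub-second timers matter. It estimates failure detection time only, not the full traffic recovery. It assumes that an HSRP or GLBP standby takes over once the hold time expires (HSRP defaults are a 3 s hello and a 10 s hold). For VRRP it uses the RFC 3768 master-down formula with seconds-based advertisements. Cisco's `msec` advertisement option is not modeled.

```python
def hsrp_takeover_s(hold_s: float) -> float:
    """The standby router takes over once the hold time expires with no hellos."""
    return hold_s


def vrrp_master_down_s(advert_s: float, backup_priority: int) -> float:
    """RFC 3768: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
    with Skew_Time = (256 - Priority) / 256 seconds."""
    skew_s = (256 - backup_priority) / 256
    return 3 * advert_s + skew_s


if __name__ == "__main__":
    print("HSRP defaults (3 s / 10 s):", hsrp_takeover_s(10.0), "s")
    print("HSRP tuned (250 ms / 750 ms):", hsrp_takeover_s(0.75), "s")
    print("VRRP default (1 s adverts, backup priority 100):",
          vrrp_master_down_s(1.0, 100), "s")
```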

Inbound Traffic Has to Wait on ARP Resolution
• ARP is a CPU-bound process that is rate limited to prevent denial-of-service attacks
• Does not apply to Multi-chassis EtherChannel designs
[Figure: FHRP Active (R1) and FHRP Standby (R2) exchanging ARP requests and ARP responses with hosts]

Layer 3 Boundary Should Be as Close to the Edge as Possible to Minimize ARP Processing Delay
When an alternate router has to take over for inbound traffic, there is a delay that is based on the number of active flows.
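
The relationship between active flows and ARP delay is a simple division once ARP handling is rate limited. The sketch below uses entirely hypothetical numbers: an assumed limit of 100 ARP resolutions per second, 48 hosts per access switch and 40 access switches per distribution pair. It only illustrates why a smaller Layer 2 domain behind each gateway shortens the worst case.

```python
def arp_resolution_s(active_hosts: int, arp_rate_per_s: float) -> float:
    """Time to ARP once for every active destination at a capped ARP rate."""
    return active_hosts / arp_rate_per_s


if __name__ == "__main__":
    ASSUMED_ARP_RATE = 100.0          # hypothetical rate limit, ARPs per second
    hosts_per_access_switch = 48      # hypothetical
    access_switches = 40              # hypothetical, one distribution pair
    # Layer 2 access: the distribution gateway must resolve every host.
    print(arp_resolution_s(hosts_per_access_switch * access_switches, ASSUMED_ARP_RATE))
    # Routed access: each access switch resolves only its own hosts.
    print(arp_resolution_s(hosts_per_access_switch, ASSUMED_ARP_RATE))
```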

Multi-Chassis EtherChannel Solutions as an Alternative to STP and FHRP
• Spanning tree loops are replaced with EtherChannel connections
• FHRP nodes are virtualized and the ARP cache is replicated
• All links are fully utilized based on EtherChannel load balancing
• Similar benefits to L3 Equal Cost Multi-Path
[Chart: seconds till recovery]
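
EtherChannel load balancing keeps every member link busy by hashing each flow onto one member. The sketch below uses a simplified src-dst-ip scheme: the low 3 bits of the source and destination IPv4 addresses are XORed into one of 8 buckets, and the buckets are spread across the active members. Real platforms use their own hash functions and bucket assignments; the member names and addresses are hypothetical. When one chassis fails, its buckets simply fold onto the surviving member, with no STP or FHRP event.

```python
import ipaddress


def member_link(src: str, dst: str, members: list[str]) -> str:
    """Pick an EtherChannel member for a flow (simplified src-dst-ip hash).

    XOR the low 3 bits of the source and destination IPv4 addresses to get
    one of 8 buckets, then spread the buckets across the active members.
    """
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    bucket = (s ^ d) & 0b111
    return members[bucket % len(members)]


if __name__ == "__main__":
    links = ["Te1/1 (switch 1)", "Te2/1 (switch 2)"]
    print(member_link("10.120.4.10", "10.121.0.5", links))
    # Switch 2 fails: every bucket now maps to the surviving member.
    print(member_link("10.120.4.10", "10.121.0.5", links[:1]))
```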


Three Options for Multi-Chassis EtherChannel Designs to Remove Spanning Tree
Virtual Switching System (VSS)
• Single control plane
• Single management plane
• Single supervisor per chassis
• Automatic port config sync (single control plane)
• Single L3 domain (single SVI), no need for FHRP
Virtual Port Channel (vPC)
• Separate control plane
• Separate management plane with vPC state synchronization (CFS)
• Redundant supervisors per chassis with hitless SSO
• Manual port sync config (DataCenterNetworkMgr)
• Local SVI HSRP/PIM forwarding enhancements to act as an active-active pair
[Figure: SW1 and SW2 joined by the vPC peer-link and the vPC FT-link]
StackWise+
• Single control plane controlled by the master switch
• Master switch controls EtherChannel
• Redundant master switches per stack
• Automatic port config sync (single control plane)
• Stack appears as a single router, no need for FHRP

VSS Enabled Campus Design: End-to-End VSS Design Option
[Figure: STP-based redundant topology vs. fully redundant virtual switch topology; B = STP blocked link]

Now We Will Focus on the Unique VLAN per Switch Model

The Best Deployment for This Model Is Routed Access
[Figure: Routed Access (EIGRP/OSPF down to the access switches, Layer 3 to the edge) vs. the GLBP model (Layer 2 access); VLAN 10 and VLAN 20]
• Extend the convergence benefits of ECMP to the edge with Routed Access
• Improved convergence with fewer protocols
• No need to manage FHRP

Routed Access Provides Rapid Convergence with Optimized Traffic Flow and Ease of Management
• EIGRP converges in < 200 msec
• OSPF with sub-second tuning converges in < 200 msec
[Chart: Both L2 and L3 Can Provide Sub-Second Convergence; seconds of VoIP packet loss (0-2 s), upstream and downstream, for RPVST+, OSPF and EIGRP (12.2S)]

Routed Access Optimized Multicast Operation
• Layer 2 access has two multicast routers on the access subnet, causing one to have to discard frames
• Routed Access has a single multicast router, which simplifies management of the multicast topology
[Figure: with Layer 2 access, the IGMP querier (low IP address) and the designated router (high IP address) are different routers, and the non-DR has to drop all non-RPF traffic; with Routed Access, one switch is both designated router and IGMP querier]
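
The split of roles on a Layer 2 access subnet follows from two elections that pick opposite ends of the address range. The IGMPv2 querier is the lowest IP address. The PIM DR is the highest IP address when DR priorities are equal, which this sketch assumes. The addresses are hypothetical.

```python
import ipaddress


def multicast_roles(router_ips: list[str]) -> dict[str, str]:
    """IGMPv2 querier = lowest IP; PIM DR = highest IP (equal DR priority assumed)."""
    addrs = [ipaddress.IPv4Address(ip) for ip in router_ips]
    return {"igmp_querier": str(min(addrs)), "pim_dr": str(max(addrs))}


if __name__ == "__main__":
    # Layer 2 access: two distribution routers share the subnet.
    print(multicast_roles(["10.120.4.2", "10.120.4.3"]))
    # Routed access: one router holds both roles.
    print(multicast_roles(["10.120.4.1"]))
```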


Evolution Possibilities
• Unique VLANs per access switch (VLAN 10, VLAN 20): use Routed Access
• VLANs spanning access switches (VLAN 30): use Multi-Chassis EtherChannel with virtual FHRP

Managing the Number of Routes Directly Affects Convergence Time
• Managing the number of routes in the network is important
• Both EIGRP and OSPF need summarization
• Does not apply to Multi-chassis EtherChannel designs
[Chart: Time for ECMP Recovery; time to restore voice (0-3 sec) vs. number of routes (800 to 12,000), core/distribution Sup720]

EIGRP Is Unique with Its Multi-Level Summarization Capability
• The greatest advantages of EIGRP are gained when the network has a structured addressing plan that allows for use of summarization and stub routers
• EIGRP provides the ability to implement multiple tiers of summarization and route filtering
• Able to maintain a deterministic convergence time in very large L3 topologies
[Figure: two tiers of summarization; 10.10.0.0/17 and 10.10.128.0/17 from the lower tiers are summarized again as 10.10.0.0/16]
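
The two-tier summary in the diagram can be checked with Python's `ipaddress` module. The sketch confirms that the two /17s collapse into the /16. It also maps an access subnet to the /17 that covers it; 10.10.130.0/24 is a hypothetical example. A subnet that falls outside every summary would break the addressing plan and leak specific routes.

```python
import ipaddress

CORE_SUMMARY = ipaddress.ip_network("10.10.0.0/16")
DIST_SUMMARIES = [ipaddress.ip_network("10.10.0.0/17"),
                  ipaddress.ip_network("10.10.128.0/17")]


def covering_summary(subnet: str) -> str:
    """Return the distribution-tier summary that covers an access subnet."""
    net = ipaddress.ip_network(subnet)
    for summary in DIST_SUMMARIES:
        if net.subnet_of(summary):
            return str(summary)
    raise ValueError(f"{subnet} breaks the addressing plan")


if __name__ == "__main__":
    # Both /17s collapse into the single /16 advertised by the upper tier.
    print(list(ipaddress.collapse_addresses(DIST_SUMMARIES)))
    print(covering_summary("10.10.130.0/24"))   # hypothetical access subnet
```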


EIGRP Convergence Is Improved with Summarization and Filtering
• EIGRP convergence is largely dependent on query response times
• Minimize the number of queries and the time to respond to them to speed up convergence
• Summarize distribution block routes upstream to the core
• Configure all L3 access switches as EIGRP stub routers
• Filter routes sent down to L3 access switches

! Distribution switch: summarize toward the core, send only a default route to the access layer
interface TenGigabitEthernet 4/1
 ip summary-address eigrp 100 10.120.0.0 255.255.0.0 5
!
router eigrp 100
 network 10.0.0.0
 distribute-list Default out <mod/port>
!
ip access-list standard Default
 permit 0.0.0.0

! L3 access switch
router eigrp 100
 network 10.0.0.0
 eigrp stub connected
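
The effect of the summary-address and distribute-list configuration on what each neighbor receives can be modeled in a few lines. The sketch is a simplified model with hypothetical function names and example routes. Component routes inside 10.120.0.0/16 are replaced by the summary toward the core. Only the default route passes the "Default" ACL toward the access switches. Real EIGRP also handles metrics, the summary's administrative distance and stub query suppression, none of which is modeled.

```python
import ipaddress

SUMMARY = ipaddress.ip_network("10.120.0.0/16")   # ip summary-address eigrp 100 ...
DEFAULT = ipaddress.ip_network("0.0.0.0/0")


def advertise_to_core(routes: list[str]) -> list[str]:
    """Component routes inside the summary are suppressed; the summary is sent."""
    nets = [ipaddress.ip_network(r) for r in routes]
    out = [str(n) for n in nets if not n.subnet_of(SUMMARY)]
    if any(n.subnet_of(SUMMARY) for n in nets):
        out.append(str(SUMMARY))
    return out


def advertise_to_access(routes: list[str]) -> list[str]:
    """distribute-list 'Default' out: only the default route passes."""
    return [r for r in routes if ipaddress.ip_network(r) == DEFAULT]


if __name__ == "__main__":
    block = ["10.120.1.0/24", "10.120.2.0/24", "10.122.0.0/16"]
    print(advertise_to_core(block))
    print(advertise_to_access(block + ["0.0.0.0/0"]))
```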


OSPF Area Boundaries Offer Summarization for Improved Scale
• Area boundaries provide buffers between fault domains
• Keep area 0 for core infrastructure
• Do not extend area 0 to the access routers when using Routed Access
[Figure: distribution blocks in Areas 100, 110 and 120 attached to an Area 0 core, which connects to the WAN, Data Center and Internet]

OSPF Downstream Summarization Is Accomplished with Multiple Area Types
• An ABR for a regular area forwards:
  - Summary LSAs (Type 3)
  - ASBR summary (Type 4)
  - Specific externals (Type 5)
• A stub area ABR forwards:
  - Summary LSAs (Type 3)
  - Summary default (0.0.0.0)
• A totally stubby area ABR forwards:
  - Summary default (0.0.0.0)
[Figure: OSPF Area 120]

router ospf 100
 area 120 stub no-summary
 network 10.120.0.0 0.0.255.255 area 120
 network 10.122.0.0 0.0.255.255 area 0
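
The payoff of each area type shows up in the number of LSAs the access routers must hold. The sketch below is a rough count that assumes a single ABR and ignores NSSA and Type 7 LSAs. The route counts in the usage line are hypothetical.

```python
def lsa_count(area_type: str, inter_area: int, asbrs: int, externals: int) -> int:
    """Rough count of LSAs the ABR forwards into an area (single ABR assumed)."""
    if area_type == "regular":
        return inter_area + asbrs + externals      # Type 3 + Type 4 + Type 5
    if area_type == "stub":
        return inter_area + 1                      # Type 3 + one summary default
    if area_type == "totally_stubby":
        return 1                                   # only the summary default
    raise ValueError(f"unknown area type: {area_type}")


if __name__ == "__main__":
    # Hypothetical network: 3,000 inter-area prefixes, 2 ASBRs, 500 externals.
    for kind in ("regular", "stub", "totally_stubby"):
        print(kind, lsa_count(kind, 3000, 2, 500))
```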


OSPF Upstream Summarization Helps Minimize LSA Churn in the Core
• Summarize routes from the distribution block upstream into the core
• Minimize the number of LSAs and routes in the core
• Reduce the need for SPF calculations due to internal distribution block changes
[Figure: ABRs originate summary 10.120.0.0/16 into the core]

router ospf 100
 area 120 stub no-summary
 area 120 range 10.120.0.0 255.255.0.0 cost 10
 network 10.120.0.0 0.0.255.255 area 120
 network 10.122.0.0 0.0.255.255 area 0
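
The churn reduction comes from the fact that the Type 3 LSAs sent into area 0 do not change when a component route inside the range comes or goes. The sketch models the `area 120 range` behavior under that simplification; the prefix sets are hypothetical. Losing one /24 inside the distribution block leaves the core's view, and therefore its SPF input, unchanged.

```python
import ipaddress

AREA_RANGE = ipaddress.ip_network("10.120.0.0/16")   # area 120 range 10.120.0.0 255.255.0.0


def type3_into_core(area_prefixes: set[str]) -> set[str]:
    """Type 3 LSAs the ABR originates into area 0 for the area's prefixes."""
    nets = [ipaddress.ip_network(p) for p in area_prefixes]
    inside = [n for n in nets if n.subnet_of(AREA_RANGE)]
    outside = {str(n) for n in nets if not n.subnet_of(AREA_RANGE)}
    return outside | ({str(AREA_RANGE)} if inside else set())


if __name__ == "__main__":
    before = type3_into_core({"10.120.1.0/24", "10.120.2.0/24", "10.120.3.0/24"})
    after = type3_into_core({"10.120.1.0/24", "10.120.3.0/24"})   # 10.120.2.0/24 lost
    print(before, after, before == after)
```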


Routing Protocol Churn Can Be Reduced with IP Event Dampening
• Prevents routing protocol churn caused by constant interface state changes
• Dampening is applied locally on a system: nothing is exchanged between routing protocols
• Supports all IP routing protocols
  - Static routing, RIP, EIGRP, OSPF, IS-IS, BGP
  - In addition, it supports HSRP and CLNS routing
  - Applies on physical interfaces and can't be applied on subinterfaces individually

interface GigabitEthernet1/1
 description Uplink to Distribution 1
 dampening
 ip address 10.120.0.205 255.255.255.254

[Figure: the physical interface state flaps up and down repeatedly, while the interface state perceived by OSPF stays down]
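
IP event dampening uses an exponentially decaying penalty. Each flap adds to the penalty, the interface is suppressed above one threshold, and it is reused once the penalty decays below a lower one. The sketch below models that mechanism. The parameters are assumed to match the IOS defaults for the bare `dampening` command: half-life 5 s, reuse 1000, suppress 2000, maximum suppress time 20 s, and 1000 penalty per flap. Check them against your release before relying on the exact timing.

```python
HALF_LIFE_S = 5.0        # assumed defaults for the bare "dampening" command
REUSE = 1000.0
SUPPRESS = 2000.0
MAX_SUPPRESS_S = 20.0
FLAP_PENALTY = 1000.0
MAX_PENALTY = REUSE * 2 ** (MAX_SUPPRESS_S / HALF_LIFE_S)  # caps suppress time


class Dampener:
    """Exponential-decay penalty model of IP event dampening (simplified)."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_t = 0.0
        self.suppressed = False

    def _decay(self, t: float) -> None:
        # Penalty halves every HALF_LIFE_S seconds since the last update.
        self.penalty *= 2 ** (-(t - self.last_t) / HALF_LIFE_S)
        self.last_t = t
        if self.suppressed and self.penalty < REUSE:
            self.suppressed = False          # routing protocols see "up" again

    def flap(self, t: float) -> None:
        """Record an interface down event at time t (seconds)."""
        self._decay(t)
        self.penalty = min(self.penalty + FLAP_PENALTY, MAX_PENALTY)
        if self.penalty > SUPPRESS:
            self.suppressed = True           # hidden from routing protocols

    def is_suppressed(self, t: float) -> bool:
        self._decay(t)
        return self.suppressed


if __name__ == "__main__":
    d = Dampener()
    for t in (0.0, 1.0, 2.0):               # three flaps one second apart
        d.flap(t)
    print(d.is_suppressed(2.0), round(d.penalty))
```

Three flaps one second apart push the penalty past the suppress threshold. Without further flaps, it decays below the reuse threshold roughly 7 seconds later.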


Supervisor Redundancy Is Provided by Stateful Switchover

Stateful Switchover (SSO) Keeps Both Processors in Sync for Hot Standby Mode
[Figure: active RP and hot standby RP, each running HA-aware applications, the Redundancy Facility (RF) and the Checkpointing Facility (CF); an interconnect carries IPC, checkpointing and heartbeats; bulk and dynamic sync of ARP, FIB/ADJ, VLAN, STP and ACL/QoS state; line cards attached to both]
• Active/standby supervisors run in synchronized mode (boot environment, running configuration, protocol state and line card status are synchronized)
• Depending on platform, line card and protocol, this incurs from 0 to 3 seconds of outage
• Switch processors synchronize L2 information (e.g., STP, MAC addresses) and L2/L3 FIB, QoS and ACL tables
• DFCs are populated with L2/L3 FIB, NetFlow, and ACL tables
• Line card protocol status is maintained during failover

StackWise+ Provides Stack Master Switch Redundancy
• 1:N redundancy
  - Any member can become stack master if the master fails. The switch with the highest stack-member priority value becomes the master, using RPR+ and NSF
• The stack master provides centralized functionality
  - Layer 3 control plane, management plane (Telnet, SSH, SNMP, HTTP), and stack configuration
  - Controls CDP and EtherChannel and propagates the VLAN database and Spanning Tree info
  - Builds and propagates the hardware information (L3 FIB, ACL, QoS)
  - Controls the routing process and neighbor adjacencies for L3 routed operation
• Stack members provide distributed forwarding of data and a local control plane
  - Local instance of STP, BPDU processing, and MAC address management
[Figure: StackWise/StackWise Plus members, each with switch fabric, stack PHY, TCAM, SRAM, port ASICs and CPU, joined by two stack cables]
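
The master election described above can be sketched as a sort on priority. This is a simplified model: the highest stack-member priority wins, and the lowest MAC address is assumed as the final tie-breaker. Real elections weigh additional factors, such as an existing master keeping its role, so treat the ordering as an assumption. Member numbers, priorities and MACs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    number: int
    priority: int   # stack-member priority, higher preferred
    mac: str        # lowercase, fixed-width MAC; assumed final tie-breaker


def elect_master(members: list[Member]) -> Member:
    """Simplified election: highest priority wins, lowest MAC breaks ties."""
    return min(members, key=lambda m: (-m.priority, m.mac))


if __name__ == "__main__":
    stack = [Member(1, 1, "0019.aaaa.0001"),
             Member(2, 15, "0019.aaaa.0002"),
             Member(3, 15, "0019.aaaa.0003")]
    master = elect_master(stack)
    print("master:", master.number)
    survivors = [m for m in stack if m is not master]   # master fails
    print("new master:", elect_master(survivors).number)
```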


NSF Works with SSO to Keep Neighbors Forwarding During a Supervisor Switchover
• Non-Stop Forwarding provides graceful restart enhancements to EIGRP, OSPF, IS-IS and BGP
• An NSF-capable router continuously forwards packets during an SSO processor recovery
• NSF-aware and NSF-capable routers provide for transparent routing protocol recovery
  - Graceful restart extensions enable neighbor recovery without resetting adjacencies
  - Routing database re-synchronization occurs in the background
[Figure labels: NSF-Aware; NSF-Aware, NSF-Capable; NSF-Aware, NSF-Capable]
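
From the NSF-aware neighbor's side, graceful restart amounts to keeping routes learned from the restarting peer while their state is uncertain. The sketch models that generically: routes are marked stale, refreshed as the peer re-advertises them, and flushed only if missing at the end of resynchronization. Each protocol's actual mechanism differs, and the class name and prefixes are hypothetical.

```python
class NsfAwareNeighbor:
    """Routes learned from a restarting peer are marked stale but kept in the
    forwarding table until resync completes (simplified graceful restart)."""

    def __init__(self, routes: set[str]) -> None:
        self.routes = set(routes)
        self.stale: set[str] = set()

    def peer_restarting(self) -> None:
        self.stale = set(self.routes)        # keep forwarding on stale routes

    def peer_update(self, prefix: str) -> None:
        self.routes.add(prefix)
        self.stale.discard(prefix)           # refreshed by the restarted peer

    def end_of_resync(self) -> None:
        self.routes -= self.stale            # flush anything not re-advertised
        self.stale = set()


if __name__ == "__main__":
    n = NsfAwareNeighbor({"10.120.0.0/16", "10.130.0.0/16"})
    n.peer_restarting()
    print("during switchover:", sorted(n.routes))   # nothing withdrawn
    n.peer_update("10.120.0.0/16")
    n.end_of_resync()
    print("after resync:", sorted(n.routes))
```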


NSF/SSO Offers Better Convergence in Non-Full-Mesh Topologies
• Redundant topologies with equal-cost paths provide sub-second convergence
• NSF/SSO provides superior availability in environments with non-redundant paths
[Chart: seconds of lost voice (0-5 s) for link failure, node failure, NSF/SSO and OSPF convergence; RP convergence is dependent on IGP and tuning]

If Using the Supervisor Uplinks, EtherChannel Is Recommended
• The use of supervisor uplinks with NSF/SSO results in a dual-component failure: supervisor failure + supervisor ports failure
• To avoid network protocol convergence, use EtherChannel across the supervisor uplinks or simply use line card ports as uplinks
[Figure: with an EtherChannel or an uplink from a line card, there is no traffic flow change when the supervisor fails]

Avoid Using Uplinks on the Stack Master in a StackWise+ Design
• Using uplink ports on non-master switches avoids unnecessary network protocol convergence when a master fails
• Basically, you want to avoid having 2 failures when the master fails
• Convergence impact depends on tuning but could be up to 7 seconds
• Recommendation is NOT to combine the master and the uplink in a single switch
• Use Multi-chassis EtherChannel to minimize this failure
[Figure: L2 or L3 access stack (single logical switch, 64 Gig stack ring) with the master switch; the distribution layer reroutes traffic to the summary subnets]

In Service Software Upgrade Allows Upgrade Without Taking the Switch Down
• In a redundant topology, standard maintenance practice is to shut down devices during the upgrade and let the network converge
• IOS Modularity and ISSU provide the ability to patch or upgrade software in place without having to shut down
• Offers significant uptime improvements
[Figure: scheduled maintenance with half capacity during the upgrade vs. ISSU with all paths and switches active]

System Image Software Upgrade Downtime Varies by Chassis
• Catalyst 6500 software upgrade with enhanced Fast Software Upgrade enabled line cards will load software with 40+ seconds of downtime
• Catalyst 4500 with In-Service Software Upgrade will load software with 10 to 30 ms of downtime
• Nexus 7000 with In-Service Software Upgrade will load software with no downtime
[Figure: scheduled maintenance with half capacity during the upgrade vs. ISSU with all paths and switches active]

Multichassis EtherChannel Automates Half Capacity in ~200 msec
Cisco ISSU Phase 1 (In-Service Software Upgrade) & VSS, 12.2(33)SXI
• Greatly reduces the SW upgrade maintenance window (however, still requires a line card reload)
• VSS & ISSU will provide a ~200 msec upgrade window (assuming dual-homed access devices)
• Very suitable for voice deployments, where acceptable loss of service is close to 300 msec
[Chart: failover time for Switch 1 / Switch 2; VSS without ISSU (12.2(33)SXH1): 1-2 min; VSS with ISSU (12.2(33)SXI): network impact 200-300 msec]

Catalyst 6500 Supports Subsystem Patching
• A patch is a single fix that can affect one or multiple subsystems
• Patches will include the delivery of security advisories reported by the Product Security Incident Response Team (PSIRT)
• Most patches changing modular processes will not require a supervisor failover or system restart
• A patch only affects the components required to fix a particular software issue, and therefore the code certification time is significantly reduced
• Routing protocol patches will require the NSF mechanism to be used during module reload to prevent neighbors from removing the routes learned through the patched switch
[Figure: modular processes (Routing, IPFS, TCP, UDP, CDP, EEM, INETD, IOS-BASE) above the High Availability Infrastructure, the Network Optimized Microkernel and the Catalyst 6500 hardware data plane]

Cisco IOS Software Modularity Benefits: Minimize Unplanned Downtime
[Figure: a failed routing process restarts using graceful restart while routing updates continue and the hardware FIB keeps forwarding in the data plane]
If an error occurs in a modular process…
• HA infrastructure determines the best recovery action
  - Restart process
  - Switchover to standby
• Process restarts with no impact to the data plane
  - Utilizes Cisco Nonstop Forwarding (NSF) where appropriate
  - State checkpointing allows quick process recovery
GOLD and EEM Offer Proactive Fault Detection and Reaction
• Challenge: in today's highly available networks, improved physical redundancy is not enough; intelligent system failure detection and recovery are key.
[Diagram: memory corruption, software inconsistency, and link faults are detected and isolated, resulting in enhanced system stability and enhanced network stability]
• Generic Online Diagnostics (GOLD) provides proactive, scheduled, and manual system diagnostics.
• Enhanced Object Tracking (EOT), Embedded Event Manager (EEM), and Smart Call Home (SCH) provide intelligent response to system events.

Generic Online Diagnostics Verifies Operation of Data Plane and Control Plane
• GOLD: a common framework to check the health of hardware components and verify proper operation of the system data plane and control plane at run time and boot time
• Diagnostic packet-switching tests verify that the system is operating correctly:
  - Are the supervisor control plane and forwarding plane functioning properly?
  - Is the standby supervisor ready to take over?
  - Are line cards forwarding packets properly?
  - Are all ports working?
  - Is the backplane connection working?
• Other types of diagnostic tests, including memory and error-correlation tests, are also available
[Diagram: active and standby supervisors (CPU, forwarding engine) connected through the fabric to line cards with their own forwarding engines]

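Before enabling or scheduling tests, it helps to see which diagnostics a module supports and what they last reported. A sketch of the relevant show commands, using module 5 as a hypothetical example:

```text
Switch#show diagnostic content module 5
Switch#show diagnostic result module 5 detail
Switch#show diagnostic bootup level
```

The content output lists each test ID with its attributes, including whether the test is disruptive, which helps pick the test IDs passed to the diagnostic commands.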
Multiple Tests Are Provided for Different Scenarios

Boot-up diagnostics: run during system bootup, line card OIR, or supervisor switchover; make sure faulty hardware is taken out of service
Switch(config)#diagnostic bootup level complete

Runtime diagnostics (health monitoring): non-disruptive tests run in the background and serve as an HA trigger
Switch(config)#diagnostic monitor module 5 test 2
Switch(config)#diagnostic monitor interval module 5 test 2 00:00:15

On-demand: all diagnostic tests can be run on demand for troubleshooting purposes; this can also be used as a pre-deployment tool
Switch#diagnostic start module 4 test 8
Module 4: Running test(s) 8 may disrupt normal system operation
Do you want to continue? [no]: y
Switch#diagnostic stop module 4

Scheduled: schedule diagnostic tests for verification and troubleshooting purposes
Switch(config)#diagnostic schedule module 4 test 1 port 3 on Jan 3 2005 23:32
Switch(config)#diagnostic schedule module 4 test 2 daily 14:45

Embedded Event Manager
EEM Application Example
Upon matching the provided SYSLOG message 'LINK-3-UPDOWN', the switch performs the following actions:
• Display error statistics for the link that has gone down
• Start a Time Domain Reflectometry (TDR) test
• Start a GOLD loopback test
• Send the results, using a provided template, to a user-configurable address
[Diagram: an interface-down fault triggers EEM, which collects the interface error counters, runs a TDR test on the cable and a GOLD loopback test on the port, and sends the results in an email alert]

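A simplified EEM applet along these lines is sketched below. It assumes a single monitored port, GigabitEthernet2/1, a mail relay at 192.0.2.25, and example.com addresses (all hypothetical). It covers the error-counter, email, and TDR steps; the GOLD loopback step is left out because an on-demand test can stop at a confirmation prompt, and the slide's email template is replaced by the raw command output.

```text
event manager applet LINK-DOWN-DIAG
 ! Trigger on the down transition of the hypothetical port Gi2/1
 event syslog pattern "LINK-3-UPDOWN: Interface GigabitEthernet2/1, changed state to down"
 action 1.0 cli command "enable"
 ! Collect the error counters for the failed link
 action 2.0 cli command "show interfaces GigabitEthernet2/1 counters errors"
 ! Email the counters; $_cli_result holds the output of the last cli action
 action 3.0 mail server "192.0.2.25" to "noc@example.com" from "switch@example.com" subject "Gi2/1 down" body "$_cli_result"
 ! Start a TDR cable test; read it later with show cable-diagnostics tdr interface GigabitEthernet2/1
 action 4.0 cli command "test cable-diagnostics tdr interface GigabitEthernet2/1"
```

Sending the mail before starting the TDR test keeps the counter output in $_cli_result; a fuller policy would store each result before running the next command.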
Embedded Event Manager
Cisco Beyond: Product Extension Community
• On Cisco.com: /go/ciscobeyond
• Open-source scripts: share, upload, download, and learn by example
• Categories include network management, routing, QoS, high availability, user interface, etc.
• Scripts can be programmed in IOS CLI or Tcl
http://cisco.com/go/ciscobeyond

Impact of a Network Attack
Direct and Collateral Damage
[Diagram: an infected source and the system under attack, connected through the access, distribution, and core layers]
• Routers: overloaded, high CPU, instability, loss of management
• Network links: overloaded, high packet loss, mission-critical applications impacted
• End systems: overloaded, high CPU, applications impacted
Availability of networking resources is impacted by malicious activity.

Mitigating Network Attacks
Preventing and Limiting the Pain
[Diagram: the same topology of infected source, system under attack, and access, distribution, and core layers]
• Prevent the attack: NAC and IBNS; ACLs and NBAR
• Protect the end systems: endpoint protection
• Protect the links: QoS; Scavenger class
• Protect the switches: rate limiters; CoPP
Allow the network to protect itself.

Use a Multilayer Approach to Hardening the Edge
[Diagram: attacks at the access edge. A flood of 132,000 bogus MACs (00:0e:00:aa:aa:aa, 00:0e:00:bb:bb:bb, 00:0e:00:aa:aa:cc, 00:0e:00:bb:bb:dd, etc.) makes the switch act like a hub; a rogue DHCP server tells clients "Use this IP Address!"; a man in the middle captures traffic to the email server ("Your email password is 'joecisco'!"). Countermeasures shown: Port Security, DHCP Snooping, Dynamic ARP Inspection, IP Source Guard]
• Port security prevents CAM attacks and DHCP starvation attacks
• DHCP snooping prevents rogue DHCP server attacks
• Dynamic ARP Inspection prevents current ARP attacks
• IP Source Guard prevents IP/MAC spoofing

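A minimal access-port sketch combining the four features, assuming user VLAN 10, an access port Gi1/0/1, and an uplink Gi1/0/48 toward the legitimate DHCP server (all hypothetical; exact syntax varies by platform and release):

```text
! Enable DHCP snooping and Dynamic ARP Inspection for VLAN 10
ip dhcp snooping
ip dhcp snooping vlan 10
ip arp inspection vlan 10
!
interface GigabitEthernet1/0/1
 switchport mode access
 switchport access vlan 10
 ! Port security: limit learned MACs to stop CAM table flooding
 switchport port-security
 switchport port-security maximum 2
 switchport port-security violation restrict
 ! Rate-limit DHCP packets to slow DHCP starvation
 ip dhcp snooping limit rate 15
 ! IP Source Guard: permit only source IPs from the snooping binding table
 ip verify source
!
! Uplink toward the legitimate DHCP server: trusted for DHCP replies and ARP
interface GigabitEthernet1/0/48
 ip dhcp snooping trust
 ip arp inspection trust
```

Dynamic ARP Inspection and IP Source Guard both rely on the DHCP snooping binding table, so hosts with static addresses need static bindings or ARP ACLs. Adding the port-security keyword to ip verify source extends IP Source Guard to filter on source MAC as well.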
Use QoS Scavenger Class to Protect the Network
• Identify bad traffic and mark it as scavenger class
• Scavenger class is an Internet2 draft specification: CS1/CoS1
[Diagram: voice, data, and scavenger traffic classes carried across the access, distribution, and core layers]

Identify Potential Worm Traffic and Drop It During Abnormal Conditions
• All end systems generate traffic spikes
• Sustained traffic loads beyond 'normal' from each source device are considered suspect and marked as scavenger
• First-order anomaly detection: no direct action is taken
• During 'normal' traffic conditions, the network operates within its designed capacity
• During 'abnormal' worm traffic conditions, traffic marked as scavenger is aggressively dropped: second-order detection
• Priority queuing ensures low latency and minimum jitter for VoIP
• Stations not generating abnormal traffic volumes continue to receive good network service
[Diagram: scavenger bandwidth at network entry points and aggregation points]

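On a Catalyst 3750-class access switch, the first-order step (mark excess as scavenger) can be done with a policer that re-marks out-of-profile traffic instead of dropping it. A sketch, assuming a hypothetical 5 Mbps per-port "normal" data rate and that voice is classified separately (not shown); the syntax differs on the Catalyst 4500 and 6500, and older 3750 releases use set ip dscp:

```text
mls qos
! Policed (excess) traffic with DSCP 0 is re-marked to CS1 (DSCP 8)
mls qos map policed-dscp 0 to 8
!
ip access-list extended DATA-TRAFFIC
 permit ip any any
!
class-map match-all DATA
 match access-group name DATA-TRAFFIC
!
policy-map ACCESS-EDGE
 class DATA
  set dscp 0
  ! Up to 5 Mbps passes unchanged; sustained excess is marked down to scavenger
  police 5000000 8000 exceed-action policed-dscp-transmit
!
interface GigabitEthernet1/0/1
 service-policy input ACCESS-EDGE
```

The second-order step, aggressively dropping CS1 traffic, is configured separately in the egress queuing at the aggregation points, where scavenger is mapped to a minimal-bandwidth queue.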
The Switches Also Need to Be Protected
• The system CPU still has to be able to process certain traffic:
  - BPDUs, CDP, EIGRP, OSPF
  - Telnet, SSH, SNMP
  - ARP, ICMP, IGMP
• Throttling CPU-bound traffic helps protect the critical traffic:
  - Software protection: IOS-based software rate limiters; multiple CPU queues on the Catalyst 4500 and 3750
  - Hardware protection: hardware rate limiters on the Catalyst 6500 and Nexus 7000; hardware Control Plane Policing (CoPP)
[Diagram: traffic to the CPU (management such as SNMP, ICMP, and Telnet; routing updates; IP options) passes through hardware and software protection]

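A hardware CoPP sketch for a Catalyst 6500, assuming a hypothetical management subnet 10.1.1.0/24 and purely illustrative rates; verify each command on your supervisor and release. A production policy would classify more traffic types, and on the 6500 Layer 2 control traffic such as BPDUs and CDP is generally covered by the platform's Layer 2 rate limiters rather than by hardware CoPP:

```text
! QoS must be enabled for CoPP to be enforced in hardware
mls qos
!
ip access-list extended COPP-ROUTING
 permit ospf any any
 permit eigrp any any
ip access-list extended COPP-MGMT
 permit tcp 10.1.1.0 0.0.0.255 any eq 22
 permit udp 10.1.1.0 0.0.0.255 any eq snmp
!
class-map match-all COPP-ROUTING
 match access-group name COPP-ROUTING
class-map match-all COPP-MGMT
 match access-group name COPP-MGMT
!
policy-map COPP
 ! Routing updates are measured but never dropped
 class COPP-ROUTING
  police 2000000 conform-action transmit exceed-action transmit
 class COPP-MGMT
  police 500000 conform-action transmit exceed-action drop
 ! Everything else bound for the CPU gets a small, hard limit
 class class-default
  police 100000 conform-action transmit exceed-action drop
!
control-plane
 service-policy input COPP
!
! Hardware rate limiter for unicast packets with IP options
mls rate-limit unicast ip options 100 10
```

Transmitting routing traffic on exceed keeps adjacencies stable while still exposing the rate in the CoPP counters, so the limit can be tuned before it is ever enforced.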
Summary
• Optimize your network using a topology that gives you the most options for network-level redundancy
• Take a close look at Multi-Chassis EtherChannel solutions to consolidate your network redundancy options into one
• Look at device-level recovery options for speeding convergence in redundant topologies, and be sure to leverage them at the access layer
• Don't forget to protect your network from malicious activity

Q&A

Complete Your Online Session Evaluation
• Give us your feedback and you could win fabulous prizes. Winners announced daily.
• Receive 20 Passport points for each session evaluation you complete.
• Complete your session evaluation online now (open a browser through our wireless network to access our portal) or visit one of the Internet stations throughout the Convention Center.
Don't forget to activate your Cisco Live Virtual account for access to all session material, communities, and on-demand and live activities throughout the year. Activate your account at the Cisco booth in the World of Solutions or visit www.ciscolive.com.