Vous êtes sur la page 1sur 89

Chapter 4

The Network Layer


-Finding the best path to a destination
The network layer Thinking of travel plan and driving

application
transport
network
data link
forwarding: move physical
network network

packets from routers network


data link
data link
physical
data link
physical
physical network network
input to appropriate data link
physical
data link
physical

router output (within network network


data link data link
one router) physical
network
data link
physical

physical
application
transport
routing: determine network
data link network
network
data link
network physical data link
physical
route taken by data link
physical
physical

packets from source


to destination
Interplay between routing and forwarding

routing algorithm

local forwarding table Every router has own forwarding table


header value output link
0100 3
0101 2
0111 2
1001 1

value in arriving
packets header
0111 1

3 2
Network Service Models

Examples
Guaranteed delivery
Guaranteed delivery with bounded delay
In-order packet delivery
Guaranteed minimal bandwidth
Guaranteed maximum jitter
Security services
And more.
Virtual-circuit and Datagram networks

Virtual-circuit networks
Analogous to road number

Datagram networks
Analogous to today transport system
Find a path to a destination with road number
VC implementation

A VC consists of:
1. Path from source to destination
2. VC numbers, one number for each link along
path
3. Entries in forwarding tables in routers along path
Packet belonging to VC carries VC number
(rather than dest address)
VC number can be changed on each link.
New VC number comes from forwarding table
Forwarding table VC number
B
A R1 22 R2 32
12
1 3
2

interface
Forwarding table in R1: number

Incoming interface Incoming VC # Outgoing interface Outgoing VC #

1 12 3 22
2 63 1 18
3 7 2 17
1 97 3 87

Routers maintain connection state information!


Virtual circuits
Call setup, teardown for each call before data can flow
Each packet carries VC identifier (not destination host
address)
Every router on source-dest path maintains state for
each passing connection
Link, router resources (bandwidth, buffers) may be
allocated to VC (dedicated resources = predictable
service)
Virtual circuits: signaling protocols

Used to setup, maintain teardown VC


Used in ATM, frame-relay, X.25
Not used in todays Internet

application
transport 5. Data flow begins 6. Receive data application
transport
network 4. Call connected 3. Accept call
network
data link 1. Initiate call 2. incoming call
data link
physical
physical
Datagram networks
No call setup at network layer
Routers: no state about end-to-end connections
No network-level concept of connection
Packets forwarded using destination host address
Packets between same source-dest pair may take different
paths

application
application
transport
transport
network
network
data link 1. Send data 2. Receive data
data link
physical
physical
Address

Think of address in binary format


00
11

00
0
01 1

10 Dest. Intf.
01 2 00 0
11 01 2
11 1
Address: more example
001

000 111 000


011
001
0 010
1
011
100
101 2
Dest. Intf. 101
000-011 0
101 2 110
111 1
111
Forwarding table 4 billion
possible entries

Destination Address Range Link Interface

11001000 00010111 00010000 00000000


through 0
11001000 00010111 00010111 11111111

11001000 00010111 00011000 00000000


through 1
11001000 00010111 00011000 11111111

11001000 00010111 00011001 00000000


through 2
11001000 00010111 00011111 11111111

otherwise 3
Longest prefix matching

Prefix Match Link Interface


11001000 00010111 00010 0
11001000 00010111 00011000 1
11001000 00010111 00011 2
otherwise 3

Examples

DA: 11001000 00010111 00010110 10100001 Which interface?

DA: 11001000 00010111 00011000 10101010 Which interface?


Datagram or VC network: why?
Internet (datagram) ATM (VC)
data exchange among evolved from telephony
computers human conversation:
elastic service, no strict strict timing, reliability
timing req. requirements
smart end systems need for guaranteed
(computers) service
can adapt, perform dumb end systems
control, error recovery telephones
simple inside network,
complexity inside
complexity at edge network
many link types
different characteristics

uniform service difficult


Router Architecture Overview
Two key router functions:
Run routing algorithms/protocol (RIP, OSPF, BGP)
Forwarding datagram from incoming to outgoing
link
Three types of switching fabrics
Switching Via Memory
First generation routers:
traditional computers with switching under direct
control of CPU
packet copied to systems memory

speed limited by memory bandwidth (2 bus


crossings per datagram)

Input Memory Output


Port Port

System Bus
Switching Via a Bus

Datagram from input port memory


to output port memory via a
shared bus
Bus contention: switching speed
limited by bus bandwidth
32 Gbps bus, Cisco 5600:
sufficient speed for access and
enterprise routers
Switching Via An Interconnection Network

Overcome bus bandwidth limitations


Banyan networks, other interconnection nets
initially developed to connect processors in
multiprocessor
Cisco 12000: switches 60 Gbps through the
interconnection network
The Internet Network layer
Host, router network layer functions:

Transport layer: TCP, UDP

Routing protocols 1 IP protocol


path selection addressing conventions
RIP, OSPF, BGP datagram format
Network packet handling conventions
layer 2
forwarding
ICMP protocol
table
error reporting
3
router signaling

Link layer

physical layer
IP datagram format
IP protocol version 32 bits
number total datagram
header length head. type of
length (bytes)
ver length
(bytes) len service
for
type of data fragment
16-bit identifier flgs
offset
fragmentation/
max number time to upper reassembly
header
remaining hops live layer checksum
(decremented at
32 bit source IP address
each router)
32 bit destination IP address
upper layer protocol
to deliver payload to Options (if any) E.g. timestamp,
record route
how much overhead data taken, specify
with TCP? (variable length, list of routers
typically a TCP to visit.
20 bytes of TCP
or UDP segment)
20 bytes of IP

= 40 bytes + app
layer overhead
IP datagram fragmentation

Ethernet allows 1,500 bytes


Some wide-area links allows 576 bytes
One IP packet can be fragmented into
several packets
IP Fragmentation & Reassembly
Network links have MTU
(max.transfer size) - largest
possible link-level frame.
different link types,
fragmentation:
in: one large datagram
different MTUs out: 3 smaller datagrams
Large IP datagram divided
(fragmented) within net
one datagram becomes
reassembly
several datagrams
reassembled only at
final destination
IP header bits used to

identify, order related


fragments
IP Fragmentation and Reassembly
Example length ID fragflag offset
4000 byte =4000 =7 =0 =0
datagram
One large datagram becomes
(data+header)
several smaller datagrams
4000-20 = 3980

MTU = 1500 bytes length ID fragflag offset


=1500 =7 =1 =0
1480 bytes in
data field length ID fragflag offset
=1500 =7 =1 =185
offset =
1480/8 length ID fragflag offset
=1040 =7 =0 =370
IPv4 Addressing

Please refer to Internet Addressing


presentation
IP addresses: how to get one?

Q: How does a host get IP address?

hard-coded by system admin in a file


Windows: control-panel->network-

>configuration->tcp/ip->properties
UNIX: /etc/rc.config

DHCP: Dynamic Host Configuration Protocol:


dynamically get address from as server
plug-and-play
DHCP: Dynamic Host Configuration Protocol
Goal: allow host to dynamically obtain its IP address
from network server when it joins network
Can renew its lease on address in use
Allows reuse of addresses (only hold address while
connected an on)
Support for mobile users who want to join network (more
shortly)
DHCP overview:
host broadcasts DHCP discover msg

DHCP server responds with DHCP offer msg

host requests IP address: DHCP request msg

DHCP server sends address: DHCP ack msg


DHCP client-server scenario

A 223.1.1.1 DHCP 223.1.2.1


server
223.1.1.2
223.1.1.4 223.1.2.9
B
223.1.2.2 arriving DHCP
223.1.1.3 223.1.3.27 E client needs
address in this
223.1.3.1 223.1.3.2
network
DHCP client-server scenario
DHCP server: 223.1.2.5 arriving
DHCP discover
client
src : 0.0.0.0, 68
dest.: 255.255.255.255,67
yiaddr: 0.0.0.0
transaction ID: 654

DHCP offer
src: 223.1.2.5, 67
dest: 255.255.255.255, 68
yiaddrr: 223.1.2.4
transaction ID: 654
Lifetime: 3600 secs
DHCP request
src: 0.0.0.0, 68
dest:: 255.255.255.255, 67
yiaddrr: 223.1.2.4
transaction ID: 655
time Lifetime: 3600 secs

DHCP ACK
src: 223.1.2.5, 67
dest: 255.255.255.255, 68
yiaddrr: 223.1.2.4
transaction ID: 655
Lifetime: 3600 secs
NAT: Network Address Translation

rest of local network


Internet (e.g., home network)
10.0.0/24 10.0.0.1

10.0.0.4
10.0.0.2
138.76.29.7

10.0.0.3

All datagrams leaving local Datagrams with source or


network have same single source destination in this network
NAT IP address: 138.76.29.7, have 10.0.0/24 address for
different source port numbers source, destination (as usual)
NAT: Network Address Translation

Motivation: local network uses just one IP address as


far as outside world is concerned:
range of addresses not needed from ISP: just one
IP address for all devices
can change addresses of devices in local network
without notifying outside world
can change ISP without changing addresses of

devices in local network


devices inside local net not explicitly addressable,
visible by outside world (a security plus).
NAT: Network Address Translation

NAT translation table


2: NAT router 1: host 10.0.0.1
WAN side addr LAN side addr
changes datagram sends datagram to
138.76.29.7, 5001 10.0.0.1, 3345 128.119.40.186, 80
source addr from

10.0.0.1, 3345 to
138.76.29.7, 5001, S: 10.0.0.1, 3345
updates table D: 128.119.40.186, 80
10.0.0.1
1
S: 138.76.29.7, 5001
2 D: 128.119.40.186, 80 10.0.0.4
10.0.0.2
138.76.29.7 S: 128.119.40.186, 80
D: 10.0.0.1, 3345
4
S: 128.119.40.186, 80
D: 138.76.29.7, 5001 3 10.0.0.3
4: NAT router
3: Reply arrives changes datagram
dest. address: dest addr from
138.76.29.7, 5001 138.76.29.7, 5001 to 10.0.0.1, 3345
NAT: Network Address Translation

16-bit port-number field:


60,000 simultaneous connections with a single
LAN-side address!
NAT is controversial:
routers should only process up to layer 3
violates end-to-end argument
NAT possibility must be taken into account by app
designers, eg, P2P applications
address shortage should instead be solved by
IPv6
NAT traversal problem
client wants to connect to
server with address
10.0.0.1 10.0.0.1
server address 10.0.0.1 local Client

to LAN (client cant use it as
?
destination addr) 10.0.0.4
only one externally visible
NATted address: 138.76.29.7 138.76.29.7 NAT
router
solution 1: statically
configure NAT to forward
incoming connection
requests at given port to
server
e.g., (138.76.29.7, port 2500)
always forwarded to 10.0.0.1
port 25000
NAT traversal problem
solution 2: Universal Plug and
Play (UPnP) Internet Gateway 10.0.0.1
Device (IGD) Protocol. Allows
IGD
NATted host to:
10.0.0.4
learn public IP address
(138.76.29.7) 138.76.29.7 NAT
add/remove port mappings router

(with lease times)

i.e., automate static NAT port


map configuration
NAT traversal problem
solution 3: relaying (used in Skype)
NATed client establishes connection to relay

External client connects to relay

relay bridges packets between to connections

2. connection to
relay initiated 1. connection to
by client relay initiated
10.0.0.1
by NATted host
3. relaying
Client
established
138.76.29.7 NAT
router
ICMP: Internet Control Message Protocol

used by hosts & routers to


Type Code description
communicate network-level 0 0 echo reply (ping)
information 3 0 dest. network unreachable
error reporting: unreachable 3 1 dest host unreachable
host, network, port, protocol 3 2 dest protocol unreachable
3 3 dest port unreachable
echo request/reply (used by
3 6 dest network unknown
ping) 3 7 dest host unknown
network-layer above IP: 4 0 source quench (congestion
ICMP msgs carried in IP
control - not used)
datagrams 8 0 echo request (ping)
9 0 route advertisement
ICMP message: type, code 10 0 router discovery
plus first 8 bytes of IP 11 0 TTL expired
datagram causing error 12 0 bad IP header
Traceroute and ICMP
Source sends series of When ICMP message
UDP segments to dest arrives, source calculates
First has TTL =1 RTT
Second has TTL=2, etc. Traceroute does this 3

Unlikely port number times


When nth datagram arrives Stopping criterion
to nth router: UDP segment eventually

Router discards datagram arrives at destination host


And sends to source an Destination returns ICMP
ICMP message (type 11, host unreachable packet
code 0) (type 3, code 3)
Message includes name of When source gets this
router& IP address ICMP, stops.
IPv6
Initial motivation: 32-bit address space soon
to be completely allocated.
Additional motivation:
header format helps speed processing/forwarding
header changes to facilitate QoS

IPv6 datagram format:


fixed-length 40 byte header

no fragmentation allowed
IPv6 Header (Cont)
Priority: identify priority among datagrams in flow
Flow Label: identify datagrams in same flow.
(concept offlow not well defined).
Next header: identify upper layer protocol for data
Other Changes from IPv4

Checksum: removed entirely to reduce


processing time at each hop
Options: allowed, but outside of header,
indicated by Next Header field
ICMPv6: new version of ICMP
additional message types, e.g. Packet Too Big
multicast group management functions
Transition From IPv4 To IPv6

Not all routers can be upgraded simultaneous


no flag days
How will the network operate with mixed IPv4 and
IPv6 routers?
Dual-stack approach
Some information lose
Tunneling: IPv6 carried as payload in IPv4
datagram among IPv4 routers
Tunneling
A B E F
Logical view: tunnel

IPv6 IPv6 IPv6 IPv6

A B E F
Physical view:
IPv6 IPv6 IPv4 IPv4 IPv6 IPv6
Tunneling
A B E F
Logical view: tunnel

IPv6 IPv6 IPv6 IPv6

A B C D E F
Physical view:
IPv6 IPv6 IPv4 IPv4 IPv6 IPv6

Flow: X Src:B Src:B Flow: X


Src: A Dest: E Dest: E Src: A
Dest: F Dest: F
Flow: X Flow: X
Src: A Src: A
data Dest: F Dest: F data

data data

A-to-B: E-to-F:
B-to-C: B-to-C:
IPv6 IPv6
IPv6 inside IPv6 inside
IPv4 IPv4
Routing algorithms

Using a graph to represent network topology


Finding the best path from a source to a
destination
Graph abstraction
5

v 3 w
2 5
u 2 z
1
3
1
x y 2
Graph: G = (N,E) 1

N = set of routers = { u, v, w, x, y, z }

E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

Remark: Graph abstraction is useful in other network contexts

Example: P2P, where N is set of peers and E is set of TCP connections


Graph abstraction: costs

5 c(x,x) = cost of link (x,x)


v 3 w
5 - e.g., c(w,z) = 5
2
u 2 z
1 cost could always be 1, or
3
1 inversely related to bandwidth,
x y 2
or inversely related to
1
congestion

Cost of path (x1, x2, x3,, xp) = c(x1,x2) + c(x2,x3) + + c(xp-1,xp)

Question: Whats the least-cost path between u and z ?

Routing algorithm: algorithm that finds least-cost path


Routing Algorithm classification
Global or decentralized Static or dynamic?
information? Static:
Global:
routes change slowly over
all routers have complete
time
topology, link cost info
Dynamic:
link state algorithms
routes change more
Decentralized:
router knows physically-
quickly
connected neighbors, link periodic update

costs to neighbors in response to link cost


iterative process of changes
computation, exchange of
info with neighbors
distance vector algorithms
A Link-State Routing Algorithm
Dijkstras algorithm
Notation:
net topology, link costs
known to all nodes c(x,y): link cost from node

accomplished via link


x to y; = if not direct
state broadcast neighbors
all nodes have same info D(v): current value of cost
computes least cost paths of path from source to dest.
from one node (source) to v
all other nodes p(v): predecessor node
gives forwarding table for along path from source to v
that node N': set of nodes whose
iterative: after k iterations, least cost path definitively
know least cost path to k known
dest.s
Dijsktras Algorithm
1 Initialization:
2 N' = {u}
3 for all nodes v
4 if v adjacent to u
5 then D(v) = c(u,v)
6 else D(v) =
7
8 Loop
9 find w not in N' such that D(w) is a minimum
10 add w to N'
11 update D(v) for all v adjacent to w and not in N' :
12 D(v) = min( D(v), D(w) + c(w,v) )
13 /* new cost to v is either old cost to v or known
14 shortest path cost to w plus cost from w to v */
15 until all nodes in N'
Dijkstras algorithm: example

Step N' D(v),p(v) D(w),p(w) D(x),p(x) D(y),p(y) D(z),p(z)


0 u 2,u 5,u 1,u
1 ux 2,u 4,x 2,x
2 uxy 2,u 3,y 4,y
3 uxyv 3,y 4,y
4 uxyvw 4,y
5 uxyvwz

v 3 w
2 5
u 2 z
1
3
1
x y 2
1
Find the shortest paths from A to all
nodes using Dijkstras algorithm
Bellman-Ford Algorithm (1)
Distributed routing algorithm
Distance-vector algorithm
Ask neighbors for cheapest path to destination

Cost (A, Z) = smallest of


Cost from A to B + cost from B to Z
Cost from A to C + cost from C to Z
Cost from A to D + cost from D to Z
Distance Vector Algorithm

Bellman-Ford Equation (dynamic programming)


Define
dx(y) := cost of least-cost path from x to y

Then

dx(y) = minv{c(x,v) + dv(y) }

where min is taken over all neighbors v of x


Bellman-Ford example

5
Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
v 3 w
2 5
u 2 z B-F equation says:
1
3
1 du(z) = min { c(u,v) + dv(z),
x y 2
1 c(u,x) + dx(z),
c(u,w) + dw(z) }
= min {2 + 5,
1 + 3,
5 + 3} = 4
Node that achieves minimum is next
hop in shortest path forwarding table
Distance Vector Algorithm (5)

Iterative, asynchronous: Each node:


each local iteration
caused by:
wait for (change in local link
local link cost change
cost or msg from neighbor)
DV update message from
neighbor
Distributed: recompute estimates
each node notifies
neighbors only when its if DV to any dest has
DV changes
changed, notify neighbors
neighbors then notify their
neighbors if necessary
Step 1 Step 2
cost to cost to cost to
x y z x y z x y z
x 0 2 7 x 0 2 7 x 0 2 3
from

from

from
y y 2 0 1 y 2 0 1
z z 7 1 0 z 3 1 0

cost to
x y z y
2 1
x
x z
y 2 0 1
from

7
z
node z table
cost to
x y z
x
from

y
z 71 0
time
Using DV algorithm to find a routing table
Count to infinity problem Link CE down
1

Count-to-infinity problem 2 A and C update to B

A to E cost 7
A If link C-E down
1
E
C sets it to and
2 1 tells B
2 B unfortunately
misunderstands
B 2 that B can reach E
via A
4 It can be solved by
C using threshold
value, for example, 16

Why does X need to send information


N back to N for route that go through N?
X Using Poisoned reverse, X tells N that
D its cost to D is
Problems with Bellman-Ford algorithm

A D E
2 Poisoned reverse
will work for all
1 2 cases?
2
B C
4

As route to E goes through D first


Bs route to E goes through C first
Ds route to E goes through C first
If threshold value is used to stop the loop, it will stop.
However, it will take some time.
Comparison of LS and DV algorithms
Message complexity Robustness: what
LS: with n nodes, E links, happens if router
O(nE) msgs sent malfunctions?
DV: exchange between
neighbors only
LS:
convergence time varies
node can advertise
incorrect link cost
Speed of Convergence each node computes
LS: O(n2) algorithm requires only its own table
O(nE) msgs DV:
may have oscillations DV node can advertise
DV: convergence time incorrect path cost
varies each nodes table used
may be routing loops by others
count-to-infinity problem error propagate thru
network
Hierarchical Routing
Our routing study thus far - idealization
all routers identical

network flat

not true in practice

Scale: with 200 million Administrative autonomy


destinations: Internet = network of
Cant store all dests in networks
routing tables! Each network admin may
Routing table exchange want to control routing in its
would swamp links! own network
Hiding aspects of its
network
Hierarchical Routing

Problems solved by Gateway router


Aggregate routers into Direct link to router in
regions, autonomous
systems (AS)
another AS
Routers in same AS
run same routing
protocol
intra-AS routing
protocol
Routers in different AS
can run different intra-
AS routing protocol
Interconnected ASes

3c
3a 2c
3b 2a
AS3 2b
1c AS2
1a 1b
1d AS1
Forwarding table
configured by both
Intra-AS Inter-AS
intra- and inter-AS
Routing
algorithm
Routing
algorithm
routing algorithm
intra-AS sets entries for
Forwarding
table internal dests
inter-AS & intra-AS sets
entries for external
dests
Inter-AS tasks AS1 must:
Suppose router in 1. Learn which dests
AS1 receives are reachable
datagram destined through AS2, which
outside of AS1: through AS3
Router should 2. Propagate this
forward packet to reachability info to all
gateway router, but routers in AS1
which one? Job of inter-AS routing!

3c
3a 2c
3b 2a
AS3 2b
1c AS2
1a 1b
1d AS1
Example: Setting forwarding table in router 1d
Suppose AS1 learns (via inter-AS protocol) that
subnet x reachable via AS3 (gateway 1c) but not via
AS2.
Inter-AS protocol propagates reachability info to all
internal routers.
Router 1d determines from intra-AS routing info that
its interface I is on the least cost path to 1c.
Installs forwarding table entry (x,I)

x
3c
3a 2c
3b 2a
AS3 2b
1c AS2
1a 1b AS1
1d
Example: Choosing among multiple ASes
Now suppose AS1 learns from inter-AS protocol
that subnet x is reachable from AS3 and from AS2.
To configure forwarding table, router 1d must
determine towards which gateway it should
forward packets for dest x.
This is also job of inter-AS routing protocol!

x
3c
3a 2c
3b 2a
AS3 2b
1c AS2
1a 1b
1d AS1
Example: Choosing among multiple ASes
Now suppose AS1 learns from inter-AS protocol
that subnet x is reachable from AS3 and from AS2.
To configure forwarding table, router 1d must
determine towards which gateway it should
forward packets for dest x.
This is also job of inter-AS routing protocol!
Hot potato routing: send packet towards closest of
two routers.

Use routing info Determine from


Learn from inter-AS Hot potato routing: forwarding table the
from intra-AS
protocol that subnet Choose the gateway interface I that leads
protocol to determine
x is reachable via that has the to least-cost gateway.
costs of least-cost
multiple gateways smallest least cost Enter (x,I) in
paths to each
of the gateways forwarding table
Intra-AS Routing

Also known as Interior Gateway Protocols (IGP)


Most common Intra-AS routing protocols:

RIP: Routing Information Protocol


OSPF: Open Shortest Path First

IGRP: Interior Gateway Routing Protocol (Cisco


proprietary)
RIP ( Routing Information Protocol)

Distance vector algorithm


Included in BSD-UNIX Distribution in 1982
Distance metric: # of hops (max = 15 hops)

From router A to subnets:

u destination hops
v
u 1
A B w v 2
w 2
x 3
x y 3
z C D z 2
y
RIP advertisements

Distance vectors: exchanged among


neighbors every 30 sec via Response
Message (also called advertisement)
Each advertisement: list of up to 25
destination subnets within AS
RIP: Example

z
w x y
A D B

C
Destination Network Next Router Num. of hops to dest.
w A 2
y B 2
z B 7
x -- 1
. . ....
Routing/Forwarding table in D
RIP: Example
Dest Next hops
w - 1 Advertisement
x - 1 from A to D
z C 4
. ...
z
w x y
A D B

C
Destination Network Next Router Num. of hops to dest.
w A 2
y B 2
z B A 7 5
x -- 1
. . ....
Routing/Forwarding table in D
RIP: Link Failure and Recovery
If no advertisement heard after 180 sec -->
neighbor/link declared dead
Routes via neighbor invalidated

New advertisements sent to neighbors

Neighbors in turn send out new advertisements (if


tables changed)
Link failure info quickly (?) propagates to entire

net
Poison reverse used to prevent ping-pong loops
(infinite distance = 16 hops)
RIP Table processing

RIP routing tables managed by application-level


process called route-d (daemon)
Advertisements sent in UDP packets, periodically
repeated
routed routed

Transprt Transprt
(UDP) (UDP)
network forwarding forwarding network
(IP) table table (IP)
link link
physical physical
OSPF (Open Shortest Path First)
open: publicly available
Uses Link State algorithm
LS packet dissemination
Topology map at each node
Route computation using Dijkstras algorithm

OSPF advertisement carries one entry per neighbor


router
Advertisements disseminated to entire AS (via
flooding)
Carried in OSPF messages directly over IP (rather than
TCP or UDP
OSPF advanced features (not in RIP)

Security: all OSPF messages authenticated (to


prevent malicious intrusion)
Multiple same-cost paths allowed (only one path in
RIP)
For each link, multiple cost metrics for different TOS
(e.g., satellite link cost set low for best effort; high
for real time)
Integrated uni- and multicast support:
Multicast OSPF (MOSPF) uses same topology

data base as OSPF


Hierarchical OSPF in large domains.
Hierarchical OSPF
Hierarchical OSPF

Two-level hierarchy: local area, backbone.


Link-state advertisements only in area

each nodes has detailed area topology; only know


direction (shortest path) to nets in other areas.
Area border routers: summarize distances to nets
in own area, advertise to other Area Border routers.
Backbone routers: run OSPF routing limited to
backbone.
Boundary routers: connect to other ASs.
BGP basics
Pairs of routers (BGP peers) exchange routing
info over semi-permanent TCP connections: BGP
sessions
BGP sessions need not correspond to physical
links.
When AS2 advertises a prefix to AS1:
AS2 promises it will forward datagrams towards
that prefix.
AS2 can aggregate prefixes in its advertisement

eBGP session
3c iBGP session
3a 2c
3b 2a
AS3 2b
1c AS2
1a 1b
AS1 1d
Distributing reachability info
Using eBGP session between 3a and 1c, AS3
sends prefix reachability info to AS1.
1c can then use iBGP do distribute new prefix
info to all routers in AS1
1b can then re-advertise new reachability info
to AS2 over 1b-to-2a eBGP session
When router learns of new prefix, it creates entry
for prefix in its forwarding table.
eBGP session
3c iBGP session
3a 2c
3b 2a
AS3 2b
1c AS2
1a 1b
AS1 1d
Path attributes & BGP routes

Advertised prefix includes BGP attributes.


prefix + attributes = route
Two important attributes:
AS-PATH: contains ASs through which prefix
advertisement has passed: e.g, AS 67, AS 17
NEXT-HOP: indicates specific internal-AS router
to next-hop AS. (may be multiple links from
current AS to next-hop-AS)
When gateway router receives route
advertisement, uses import policy to
accept/decline.
BGP route selection

Router may learn about more than 1 route to


some prefix. Router must select route.
Elimination rules:
1. local preference value attribute: policy decision
2. shortest AS-PATH
3. closest NEXT-HOP router: hot potato routing
4. additional criteria
BGP messages

BGP messages exchanged using TCP.


BGP messages:
OPEN: opens TCP connection to peer and

authenticates sender
UPDATE: advertises new path (or withdraws old)

KEEPALIVE keeps connection alive in absence of


UPDATES; also ACKs OPEN request
NOTIFICATION: reports errors in previous msg;
also used to close connection
BGP routing policy

legend: provider
B network
X
W A
customer
C network:

A,B,C are provider networks


X,W,Y are customer (of provider networks)
X is dual-homed: attached to two networks
X does not want to route from B via X to C

.. so X will not advertise to B a route to C


BGP routing policy (2)

legend: provider
B network
X
W A
customer
C network:

A advertises path AW to B
B advertises path BAW to X
Should B advertise path BAW to C?
No way! B gets no revenue for routing CBAW
since neither W nor C are Bs customers
B wants to force C to route to w via A

B wants to route only to/from its customers!


Why different Intra- and Inter-AS routing ?

Policy:
Inter-AS: admin wants control over how its traffic
routed, who routes through its net.
Intra-AS: single admin, so no policy decisions
needed
Scale:
Hierarchical routing saves table size, reduced
update traffic
Performance:
Intra-AS: can focus on performance

Inter-AS: policy may dominate over performance