
CS716 Advanced Computer Networks

By Dr. Amir Qayyum



Lecture No. 25

Review Lecture

Switched Networks
A network can be defined recursively as...
Two or more nodes connected by a link
Circular nodes (switches) implement the network
Square nodes (hosts) use the network

Switched Networks
A network can be defined recursively as...
Two or more networks connected by one or more nodes: internetworks
Circular nodes (routers or gateways) interconnect the networks
A cloud denotes any type of independent network

Switching Strategies
Circuit switching: carry bit streams
a. establishes a dedicated circuit
b. links reserved for use by communication channel
c. send/receive bit stream at constant rate
d. example: original telephone network

Packet switching: store-and-forward messages
a. operates on discrete blocks of data
b. utilizes resources dynamically according to traffic demand
c. send/receive messages at variable rate
d. example: Internet

Multiplexing
Physical links/switches must be shared among users
(Synchronous) Time-Division Multiplexing (TDM)
Frequency-Division Multiplexing (FDM)

[Figure: multiple flows on a single link. Hosts L1, L2, L3 reach hosts R1, R2, R3 through Switch 1 and Switch 2.]

Do you see any problem with TDM / FDM?

Statistical Multiplexing
On-demand time-division, possibly synchronous (ATM)
Schedule link on a per-packet basis
Buffer packets in switches that are contending for the link
Packets from different sources interleaved on link

Do you see any problem?

Inter-Process Communication
Turn host-to-host connectivity into process-to-process communication, making the communication meaningful. Fill gap between what applications expect and what the underlying technology provides.
[Figure: applications on several hosts communicating over a channel, the abstraction for application-level communication.]

Abstract Channel Functionality


What functionality does a channel provide?
  Smallest set of abstract channel types adequate for largest number of applications

Where is the functionality implemented?
  Network as a simple bit-pipe, with all high-level communication semantics at the hosts
  More intelligent switches, allowing hosts to be dumb devices (telephone network)

Performance Metrics
Bandwidth (throughput)
  Data transmitted per unit time, e.g. 10 Mbps
  Link bandwidth versus end-to-end bandwidth
  Notation
    KB = 2^10 bytes
    Kbps = 10^3 bits per second

Performance Metrics
Latency / Delay
  Time to send message from point A to point B
  One-way versus Round-Trip Time (RTT)
  Components
    Latency = Propagation + Transmit + Queue
    Propagation = Distance / c
    Transmit = Size / Bandwidth

Note:
  No queuing delay in direct (point-to-point) link
  Bandwidth irrelevant if size = 1 bit
  Process-to-process latency includes software processing overhead (dominates over shorter distances)
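
The latency components above can be sketched as a small C function. This is only an illustration: the function name and the signal speed of 2 x 10^8 m/s (a typical figure for copper or fiber, used instead of the vacuum speed of light) are assumptions, not part of the lecture.

```c
#include <assert.h>

/* Assumed propagation speed in the medium, in meters per second. */
#define SIGNAL_SPEED_MPS 2.0e8

/* One-way latency in seconds: Propagation + Transmit + Queue. */
double one_way_latency(double distance_m, double size_bits,
                       double bandwidth_bps, double queue_s)
{
    assert(bandwidth_bps > 0.0);
    double propagation = distance_m / SIGNAL_SPEED_MPS; /* Distance / c     */
    double transmit = size_bits / bandwidth_bps;         /* Size / Bandwidth */
    return propagation + transmit + queue_s;
}
```

For example, an 8000-bit message over a 2000 km, 1 Mbps direct link with no queuing gives 10 ms of propagation plus 8 ms of transmit time.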

Delay x Bandwidth Product


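Amount of data "in flight" or "in the pipe"
Example: 100 ms RTT x 45 Mbps BW = 560 KB
This is how much data the sender can transmit before a request to slow down reaches it, so this much data must be buffered

[Figure: the link drawn as a pipe, with delay as its length and bandwidth as its width.]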

Network Architecture
The challenge is to fill the gap between hardware capabilities and application expectations, and to do so while delivering good performance
Designers cope with this complex task by developing a network architecture as a guideline
Layering, protocols, standards


Layering
Alternative abstractions at each layer
Manageable network components
Modify layers independently

[Figure: layered system, top to bottom: application programs; request/reply channel and message stream channel; host-to-host connectivity; hardware.]

Protocols
Building blocks of a network architecture
Each protocol object has two different interfaces
  service interface: operations on this protocol
  peer-to-peer interface: messages exchanged with peer

Term "protocol" is overloaded
  Specification of peer-to-peer interface
  Module that implements this interface
  Peer modules are interoperable if both accurately follow the specifications

Protocol Interfaces
[Figure: on Host 1 and Host 2, a high-level object uses a protocol through its service interface; the two protocol modules talk to each other through the peer-to-peer interface.]

Protocol Graph Network Architecture


Collection of protocols and their dependencies
  Most peer-to-peer communication is indirect
  Peer-to-peer is direct only at hardware level

[Figure: protocol graph on Host 1 and Host 2. On each host a file application, a digital library application and a video application sit above RRP and MSP, which both run over HHP.]
RRP: Request Reply Protocol
MSP: Message Stream Protocol
HHP: Host-to-Host Protocol

Protocol Machinery
Multiplexing and demultiplexing (demux key)
Encapsulation (header/body) in peer-to-peer interfaces
  Indirect communication (except at hardware level)
  Each protocol adds a header
  Part of header includes demultiplexing field (e.g., pass up to request/reply or to message stream?)

Encapsulation
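[Figure: encapsulation. On Host 1 the application program hands Data to RRP, which adds an RRP header; HHP adds an HHP header, so HHP | RRP | Data crosses the link. On Host 2 each layer strips its header and passes the rest up to the application program.]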

Standard Architectures
Open Systems Interconnection (OSI) Architecture
  International Standards Organization (ISO)
  International Telecommunications Union (ITU), formerly CCITT
  "X dot" series: X.25, X.400, X.500
  Primarily a reference model

OSI Architecture
[Figure: OSI seven-layer model. The two end hosts implement all seven layers; one or more nodes within the network implement only the network, data link and physical layers. The figure also marks the user-level and OS-kernel portions of the stack.]
Application
Presentation: data formatting
Session: connection management
Transport: process-to-process communication channel
Network: host-to-host packet delivery
Data link: framing of data bits
Physical: transmission of raw bits

Internet Architecture
TCP/IP Architecture
  Developed with ARPANET and NSFNET
  Internet Engineering Task Force (IETF)
    Culture: implement, then standardize
    OSI culture: standardize, then implement
  Became popular with release of Berkeley Software Distribution (BSD) Unix, i.e. free software
  Standard suggestions traditionally debated publicly through Requests For Comments (RFCs)

Internet Architecture
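Implementation and design done together
Hourglass design (bottleneck is IP)
Application vs application protocol (FTP, HTTP)

[Figure: Internet protocol graph. FTP and HTTP run over TCP; NV and TFTP run over UDP; TCP and UDP run over IP; IP runs over NET1, NET2, ..., NETn.]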

Internet Architecture
Layering is not very strict
[Figure: alternative view of the Internet architecture with the layers Application, TCP and UDP, IP, and Network.]

Networking in the Internet Age


Network Application Programming Interface (API)


Interface that the OS provides to its networking subsystem
  Most network protocols are implemented in software
  All systems implement network protocols as part of the OS
  Each OS is free to define its own network API
  Applications can be ported from one OS to another if APIs are similar
    *IF* the application program does not interact with parts of the OS other than the network (file system, fork processes, display, ...)

Protocols and API


Protocols provide a certain set of services
API provides a syntax by which those services can be invoked
Implementation is responsible for mapping API syntax onto protocol services

Socket API
Use sockets as abstract endpoints of communication
Issues
  Creating & identifying sockets
  Sending & receiving data

Mechanisms
  UNIX system calls and library routines

[Figure: a process attached to a socket.]

Protocol-to-Protocol Interface
A protocol interacts with a lower-level protocol the way an application interacts with the underlying network
Why not use the available network APIs for PPI?
  Inefficiencies are built into the socket interface
  Application programmers tolerate them to simplify their task: inefficiency at one level
  Protocol implementers do not tolerate them: inefficiencies at several layers of protocols

Protocol-to-Protocol Interface Issues


Configure Multiple Layers
Static vs Extensible

Process Model
Avoid context switches

Buffer Model
Avoid data copies

Process Model

[Figure: (a) process-per-protocol, where layers are connected by inter-process communication; (b) process-per-message, where layers are connected by procedure calls.]

Buffer Model
[Figure: buffer model. The application process calls send() and the topmost protocol calls deliver(); a buffer copy occurs on each crossing between the two.]

Network Programming
Things to Learn
  Internet protocols (IP, TCP, UDP, ...)
  Sockets API (Application Programming Interface)

Why IP and Sockets
  IP (Internet Protocol) is standard
    Allows a common name space across most of Internet
    Reduces number of translations, which incur overhead
  Sockets: reasonably simple and elegant Unix interface (most servers run Unix)

Socket Programming
Reading: Stevens 2nd edition, Chapters 1-6
Sockets API: a transport layer service interface
  Introduced in 1981 by BSD 4.1
  Implemented as library and/or system calls
  Similar interfaces to TCP and UDP
  Can also serve as interface to IP (for super-user), known as "raw sockets"
  Linux also provides interface to MAC layer (for super-user), known as "data-link sockets"

Client-Server Model
Asymmetric relationship
Server/Daemon
  Well-known name
  Waits for contact
  Processes requests, sends replies
Client
  Initiates contact
  Waits for response

[Figure: one server contacted by several clients.]

Client-Server Model
Bidirectional communication channel
Service models
  Sequential: server processes only one client's requests at a time
  Concurrent: server processes multiple clients' requests simultaneously
  Hybrid: server maintains multiple connections, but processes requests sequentially

Server and client categories not disjoint
  Server can be client of another server
  Server as client of its own client (peer-to-peer architecture)

TCP Connections
TCP connection setup via 3-way handshake
  J and K are sequence numbers for messages

  Client -> Server: SYN J
  Server -> Client: SYN K, ACK J+1
  Client -> Server: ACK K+1

Hmmm... RTT is important!

TCP Connections
TCP connection teardown (4 steps)
(either client or server can initiate connection teardown)

  Client -> Server: FIN J      (client: active close)
  Server -> Client: ACK J+1    (server: passive close)
  Server -> Client: FIN K      (server: closes connection)
  Client -> Server: ACK K+1

Hmmm... Latency matters!

UDP - Aspects of Services


Unit of transfer is a datagram (variable length packet)
Unreliable, drops packets silently
No ordering guarantees
No flow control
16-bit port space (distinct from TCP ports) allows multiple recipients on a single host

Addresses and Data


Internet domain names: human readable
  Mnemonic
  Variable length
  e.g. www.case.edu.pk, www.carepvtltd.com (FQDN)

IP addresses: easily handled by routers/computers
  Fixed length
  Tied (loosely) to geography
  e.g. 131.126.143.82 or 212.0.0.1

Endianness
Machines on Internet have different endianness
Little-endian (Intel, DEC): least significant byte of word stored in lowest memory address
Big-endian (Sun, SGI, HP): most significant byte...
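
A minimal sketch of how a program might check its own byte order and emit a value in network (big-endian) order. The function names are hypothetical; `htonl` is the standard POSIX conversion routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first_byte;
    /* Look at the byte stored at the lowest address of the word. */
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Writes a 32-bit host value into buf in network byte order. */
void put_u32_network_order(uint32_t host_value, unsigned char buf[4])
{
    assert(buf != NULL);
    uint32_t net = htonl(host_value);   /* no-op on big-endian hosts */
    memcpy(buf, &net, sizeof net);
}
```

Whatever the host's endianness, `put_u32_network_order(0x0A0B0C0D, buf)` leaves the most significant byte `0x0A` in `buf[0]`.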

Socket Address Structures


Socket address structures (all fields in network byte order except sin_family)

IP address
  struct in_addr {
      in_addr_t s_addr;           /* 32-bit IP address */
  };

TCP or UDP address
  struct sockaddr_in {
      short sin_family;           /* e.g., AF_INET */
      ushort sin_port;            /* TCP / UDP port */
      struct in_addr sin_addr;    /* IP address */
  };

Address Conversion
All binary values used and returned by these functions are network byte ordered

struct hostent* gethostbyname (const char* hostname);
  translates English host name to IP address (uses DNS)

struct hostent* gethostbyaddr (const char* addr, size_t len, int family);
  translates IP address to English host name (not secure)

int gethostname (char* name, size_t namelen);
  reads host's name (use with gethostbyname to find local IP)

Address Conversion
in_addr_t inet_addr (const char* strptr);
  translates dotted-decimal notation to IP address; returns -1 on failure, thus cannot handle broadcast value "255.255.255.255"

int inet_aton (const char* strptr, struct in_addr* inaddr);
  translates dotted-decimal notation to IP address; returns 1 on success, 0 on failure

char* inet_ntoa (struct in_addr inaddr);
  translates IP address to ASCII dotted-decimal notation (e.g., "128.32.36.37"); not thread-safe
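
As an illustration of how the address structure and the conversion routines fit together, the sketch below fills a `struct sockaddr_in` from a dotted-decimal string and a port. The helper name `make_ipv4_address` is hypothetical, and the `_DEFAULT_SOURCE` define is a glibc-specific assumption that keeps `inet_aton` visible under strict `-std=` modes.

```c
#define _DEFAULT_SOURCE   /* glibc: expose inet_aton in strict modes */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fills *out with an IPv4 address and port, both in network byte order.
 * Returns 1 on success, 0 if dotted is not a valid address. */
int make_ipv4_address(const char *dotted, uint16_t port,
                      struct sockaddr_in *out)
{
    assert(dotted != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;      /* host byte order, per the slide  */
    out->sin_port = htons(port);    /* network byte order              */
    return inet_aton(dotted, &out->sin_addr) != 0;
}
```

The resulting structure can be cast to `struct sockaddr *` and passed to `bind` or `connect`.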

Socket API
Creating a socket
  int socket(int domain, int type, int protocol)
  domain (family) = AF_INET, PF_UNIX, AF_OSI
  type = SOCK_STREAM, SOCK_DGRAM
  protocol = TCP, UDP, UNSPEC
  return value is a handle for the newly created socket

Sockets (cont)
Passive Open (on server)
  int bind(int socket, struct sockaddr *addr, int addr_len)
  int listen(int socket, int backlog)
  int accept(int socket, struct sockaddr *addr, int *addr_len)

Active Open (on client)
  int connect(int socket, struct sockaddr *addr, int addr_len)

Sockets (cont)
Sending Messages
  int send(int socket, char *msg, int mlen, int flags)

Receiving Messages
  int recv(int socket, char *buf, int blen, int flags)
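
The calls listed on the Socket API slides can be exercised end to end in one process. The sketch below is a simplification under stated assumptions: it runs the passive and active open over the loopback interface in a single thread (relying on the kernel completing the handshake for a queued connection), lets the OS pick the port, and uses a hypothetical function name. A real server and client would live in separate processes.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Sends msg from a client socket to a server socket over loopback TCP.
 * Returns the number of bytes the server side received, or -1 on error. */
int loopback_send_once(const char *msg, char *buf, int blen)
{
    assert(msg != NULL && buf != NULL && blen > 0);
    int result = -1;
    int listener = -1, client = -1, conn = -1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);               /* 0: let the OS pick a port */

    /* Passive open (server side): socket, bind, listen. */
    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) goto done;
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(listener, 1) < 0) goto done;
    /* Learn which port was assigned. */
    if (getsockname(listener, (struct sockaddr *)&addr, &alen) < 0) goto done;

    /* Active open (client side): socket, connect. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;

    /* Server accepts the queued connection. */
    conn = accept(listener, NULL, NULL);
    if (conn < 0) goto done;

    if (send(client, msg, strlen(msg), 0) < 0) goto done;
    result = (int)recv(conn, buf, (size_t)blen, 0);

done:
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return result;
}
```

Note that `recv` on a stream socket may return fewer bytes than were sent; a robust receiver loops until it has the whole message.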

Point-to-Point Links
Reading: Peterson and Davie, Ch. 2
Outline
  Hardware building blocks
  Encoding
  Framing
  Error detection
  Reliable transmission
  Sliding window algorithm

Direct Link Issues in the OSI and Hardware/Software Contexts


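[Figure: OSI layers mapped to where they are implemented and to the direct-link issues covered here.]
Application, Presentation, Session: user-level software
Transport, Network: kernel software (device drivers)
Data Link: reliability, framing, error detection, MAC
Physical: encoding
Data Link and Physical: hardware (network adapter)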

Hardware Building Blocks


Nodes
  Hosts: general-purpose computers
  Switches: typically special-purpose hardware
  Routers (connecting networks): varies

Links
  Copper wire with electronic signaling
  Glass fiber with optical signaling
  Wireless with electromagnetic (radio, infrared, microwave) signaling

Links
Physical Media
  Twisted pair cable
  Coaxial cable
  Optical fiber
  Space

Media are used to propagate signals
Signals are electromagnetic waves of a certain frequency, traveling at the speed of light

Signals Over a Link


Signal is modulated for transmission
Varying frequency/amplitude/phase to receive distinguishable signals

Binary data (0s and 1s) is encoded in a signal


Make it understandable by the receiving host

Bits Over a Link


Bit streams may be transmitted both ways at a time on a point-to-point link
Full Duplex

Sometimes two nodes must alternate link usage


Half Duplex

Encoding
Signals propagate over a physical medium
  Modulate electromagnetic waves, e.g. vary voltage

Encode binary data onto signals that propagate

[Figure: two nodes, each with an adaptor containing a signalling component; bits pass between node and adaptor, signals travel between the adaptors.]

Encoding
[Figure: digital data (a string of symbols) -> modulator -> a string of signals -> demodulator -> digital data (a string of symbols)]

Problems with signal transmission
  Attenuation: signal power absorbed by medium
  Dispersion: a discrete signal spreads in space
  Noise: random background signals

RS-232(-C)
Communication between computer and modem
Uses two voltage levels (+15V, -15V), a binary voltage encoding
Data rate limited to 19.2 kbps (RS-232-C), raised in later standards

Binary Voltage Encoding


NRZ (Non-Return to Zero)
NRZI (NRZ Inverted)
Manchester (used by IEEE 802.3, 10 Mbps Ethernet)
4B/5B (8B/10B) in Fast Ethernet

Non-Return to Zero (NRZ)


Encode binary data onto signals
  e.g. 0 as low signal and 1 as high signal
  Voltage does not return to zero between bits
  Known as Non-Return to Zero (NRZ)

[Figure: NRZ waveform for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0]

Problem: Consecutive 1s or 0s
Low signal (0) may be interpreted as no signal
High signal (1) leads to baseline wander
Unable to recover clock
  Sender's and receiver's clocks have to be precisely synchronized
  Receiver resynchronizes on each signal transition
  Clock drift in long periods without transition

[Figure: sender's clock compared with receiver's clock]

Alternative Encodings
Non-Return to Zero Inverted (NRZI)
  Make a transition from current signal (switch voltage level) to encode/transmit a one
  Stay at current signal (maintain voltage level) to encode/transmit a zero
  Solves the problem of consecutive ones (shifts to 0s)

Alternative Encodings
Manchester (in IEEE 802.3 10 Mbps Ethernet)
Split cycle into two parts
  Send high-low for 1, low-high for 0
  Transmit XOR of NRZ encoded data and the clock

Only 50% efficient (1/2 bit per transition)
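
The NRZI and Manchester rules above can be sketched in C as follows. Signal levels are modeled as 0 (low) and 1 (high); the function names, the explicit starting level for NRZI, and the choice of a clock that is low in the first half of each bit time are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* NRZI: toggle the level to send a 1, hold the level to send a 0. */
void nrzi_encode(const int *bits, size_t n, int start_level, int *levels)
{
    assert(bits != NULL && levels != NULL);
    int level = start_level ? 1 : 0;
    for (size_t i = 0; i < n; i++) {
        if (bits[i])
            level = !level;
        levels[i] = level;
    }
}

/* Manchester: XOR the NRZ level with a clock that is low, then high,
 * within each bit time. A 1 becomes high-low and a 0 becomes low-high.
 * half_levels must have room for 2 * n entries. */
void manchester_encode(const int *bits, size_t n, int *half_levels)
{
    assert(bits != NULL && half_levels != NULL);
    for (size_t i = 0; i < n; i++) {
        int nrz = bits[i] ? 1 : 0;
        half_levels[2 * i] = nrz ^ 0;       /* clock low  */
        half_levels[2 * i + 1] = nrz ^ 1;   /* clock high */
    }
}
```

The doubled output length of `manchester_encode` is the 50% efficiency noted on the slide: two signal elements are spent per data bit.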

4B/5B Encoding
Every 4 consecutive bits of data encoded in a 5-bit code (symbol)
  4-bit pattern is translated to a 5-bit pattern (not addition)

5-bit codes selected to have no more than one leading 0 and no more than two trailing 0s
  00xxx (8 symbols) and xx000 (4 symbols) are illegal
  5 free symbols (non-data)

Thus, never gets more than three consecutive 0s
Resulting 5-bit codes are transmitted using NRZI
Achieves 80% efficiency
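
For illustration, the sketch below uses the standard 4B/5B data-symbol table (as used by FDDI and 100BASE-TX; the table itself is not given in the slides) and checks the leading/trailing zero property stated above. The function names are hypothetical.

```c
#include <assert.h>

/* Standard 4B/5B data symbols, indexed by the 4-bit data value. */
static const unsigned char FOURB_FIVEB[16] = {
    0x1E, /* 0000 -> 11110 */  0x09, /* 0001 -> 01001 */
    0x14, /* 0010 -> 10100 */  0x15, /* 0011 -> 10101 */
    0x0A, /* 0100 -> 01010 */  0x0B, /* 0101 -> 01011 */
    0x0E, /* 0110 -> 01110 */  0x0F, /* 0111 -> 01111 */
    0x12, /* 1000 -> 10010 */  0x13, /* 1001 -> 10011 */
    0x16, /* 1010 -> 10110 */  0x17, /* 1011 -> 10111 */
    0x1A, /* 1100 -> 11010 */  0x1B, /* 1101 -> 11011 */
    0x1C, /* 1110 -> 11100 */  0x1D  /* 1111 -> 11101 */
};

/* A code is legal if it has at most one leading 0 (not 00xxx)
 * and at most two trailing 0s (not xx000). */
int code_is_legal(unsigned code)
{
    assert(code < 32u);
    return (code & 0x18u) != 0 && (code & 0x07u) != 0;
}

/* Encodes one byte as two 5-bit symbols: high nibble first,
 * returned in the low 10 bits of the result. */
unsigned encode_byte_4b5b(unsigned char byte)
{
    return ((unsigned)FOURB_FIVEB[byte >> 4] << 5) | FOURB_FIVEB[byte & 0x0F];
}
```

Ten signal bits per eight data bits is the 80% efficiency quoted on the slide; the resulting symbols would then be sent with NRZI.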

Binary Voltage Encoding


Problem: wide frequency range required, implying
  Significant dispersion
  Uneven attenuation

Prefer to use narrow frequency band (carrier frequency)
Types of modulation
  Amplitude Modulation (AM)
  Frequency Modulation (FM)
  Phase/Phase Shift
  Combination of these (e.g. QAM)

Phase Modulation Algorithm


Send carrier frequency for one period
Perform phase shift
Shift value encodes symbol
  Value in range [0, 360]
  Multiple values for multiple symbols
  Represent as circle

[Figure: 8-symbol example, with phase shifts of 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315° placed around a circle]

Constellation Pattern for V.32 QAM


For a given symbol:
  1. Perform phase shift
  2. Change to new amplitude

[Figure: V.32 QAM constellation pattern, with angles of 45° and 15° marked]

Points in constellation diagram
  Chosen to maximize error detection
  Process called trellis coding

Bit Rate and Baud Rate


Bit rate is bits per second
Baud rate is symbols per second
If each symbol contains 4 bits, then the data rate is 4 times the baud rate

What Limits Baud Rate ?


Baud rates are typically limited by electrical signaling properties
No matter how small the voltage or how short the wire, changing voltages takes time

Electronics are slow as compared to optics

Summary of Encoding
Problems: attenuation, dispersion, noise
Digital transmission allows periodic regeneration
Variety of binary voltage encodings
  High frequency components limit to short range
  More voltage levels provide higher data rate

Carrier frequency and modulation
  Amplitude, frequency, phase, and combination (QAM)

Nyquist (noiseless) and Shannon (noisy) limits on data rates

Framing
Breaks continuous stream/sequence of bits into frames and demarcates units of transfer
Typically implemented by network adaptor
  Adaptor fetches/deposits frames out of or into host memory

[Figure: Node A and Node B joined through their adaptors; bits travel between the adaptors, frames between each node and its adaptor.]

Advantages of Framing
Synchronization recovery
  Consider continuous stream of unframed bytes
  Recall RS-232 start and stop bits

Multiplexing of link
  Multiple hosts on shared medium
  Simplifies multiplexing of logical channels

Efficient error detection
  Frame serves as unit of detection (valid or invalid)
  Error detection overhead scales as log N

Approaches
Organized by end of frame detection method
Approaches to framing
  Sentinel (marker, like C strings)
  Length-based (like Pascal strings)
  Clock-based

Approaches

Other aspects of a particular approach


  Bit-oriented or byte-oriented
  Fixed or variable length
  Data-dependent or data-independent length

Framing with Sentinels


End of frame: special byte or bit pattern
Choice of end of frame marker
  Valid data byte or bit sequence, e.g. 01111110
  Physical signal not used by valid data symbol

[Figure: frame format, field widths in bits]
Beginning sequence (8) | Header (16) | Body (variable) | CRC (16) | Ending sequence (8)

Sentinel Based Approach


Problem: equal size frames are not possible
  Frame length is data-dependent

Sentinel based framing examples
  High-Level Data Link Control (HDLC) protocol
  Point-to-Point Protocol (PPP)
  ARPANET IMP-IMP protocol
  IEEE 802.4 (token bus)

Length-based Framing
Include payload length in header
  e.g. DDCMP (byte-oriented, variable-length)
  e.g. RS-232 (bit-oriented, implicit fixed length)

[Figure: frame format, field widths in bits]
SYN (8) | SYN (8) | Class (8) | Length (14) | Header (42) | Body (variable) | CRC (16)

Problem: count field corrupted
Solution: catch when CRC fails

Clock-based Framing
Continuous stream of fixed-length frames
  Each frame is 125 µs long (all STS formats) (why?)

Clocks must remain synchronized
e.g. SONET: Synchronous Optical NETwork
  Dominant standard for long distance transmission
  Multiplexing of low-speed links onto one high-speed link
  Byte-interleaved multiplexing
  Payload bytes are scrambled (data XOR 127-bit pattern)
  STS-n (STS-1 = 51.84 Mbps)

SONET Frame Format (STS-1)


[Figure: STS-1 frame drawn as 9 rows by 90 columns of bytes; each row begins with overhead bytes, followed by payload.]

Clock-based Framing
Problem: how to recover frame synchronization
  2-byte synchronization pattern starts each frame (unlikely to occur in data)
  Wait until pattern appears in same place repeatedly

Clock-based Framing
Problem: how to maintain clock synchronization
  NRZ encoding, data scrambled (XORed) with 127-bit pattern
  Creates transitions
  Also reduces chance of finding false sync pattern

Error Detection
Validates correctness of each frame
Errors checked at many levels
  Demodulation of signals into symbols (analog)
  Bit error detection/correction (digital): our main focus
    Within network adapter (CRC check)
    Within IP layer (IP checksum)
    Possibly within application as well

Error Detection and Correction


[Figure: left, a possible binary voltage encoding on a voltage axis from -15 to +15, with the neighborhoods decoded as 0 or 1 and an erasure region (?) between them; right, a possible QAM symbol set with neighborhoods in green, all other space resulting in erasure.]

Input to digital level: valid symbols or erasures

Error Detection: How?
How to detect error?
  Add redundant information to a frame to determine errors

Transmit two complete copies of data
  n redundant bits for n-bit message
  Errors at the same position in the two copies go undetected

Error Detection: How ?


We want only k redundant bits for an n-bit message, where k << n
  In Ethernet, 32-bit CRC for 12,000 bits (1500 bytes)

k bits are derived from the original message
Both the sender and receiver know the algorithm

Hamming Distance (1950 Paper)


Minimum number of bit flips between code words
  2 flips for parity
  3 flips for voting

n-bit error detection
  No code word changed into another code word
  Requires Hamming distance of n+1

Hamming Distance (1950 Paper)
n-bit error correction
  n-bit neighborhood: all code words within n bit flips
  No overlap between n-bit neighborhoods
  Requires Hamming distance of 2n+1
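
A short sketch of the distance calculation, checking the two figures quoted above (2 for parity, 3 for voting). The function names are hypothetical, and code words are assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bit positions in which a and b differ. */
unsigned hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Minimum distance over all pairs of code words in a code. */
unsigned min_code_distance(const uint32_t *words, size_t n)
{
    assert(words != NULL && n >= 2);
    unsigned best = 33;     /* larger than any 32-bit distance */
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            unsigned d = hamming_distance(words[i], words[j]);
            if (d < best)
                best = d;
        }
    return best;
}
```

With 2 data bits plus an even-parity bit the code words are 000, 011, 101, 110, giving distance 2 (detects 1-bit errors). Sending each bit three times gives code words 000 and 111, distance 3 (corrects 1-bit errors).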

Digital Error Detection Techniques


Two-dimensional parity
  Detects up to 3-bit errors
  Good for burst errors

Internet checksum (used as backup to CRC)
  Simple addition
  Simple in software

Cyclic redundancy check (CRC)
  Powerful mathematics
  Tricky in software, simple in hardware
  Used in network adapter

Two-Dimensional Parity
Adding one extra bit to a 7-bit code to balance 1s
Extra parity byte for the entire frame
Catches all 1-, 2- and 3-bit errors and most 4-bit errors
14 redundant bits for a 42-bit message, in the example

  Data       Parity bit
  0101001    1
  1101001    0
  1011110    1
  0001110    1
  0110100    1
  1011111    0
  1111011    0    (parity byte)
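
The example can be reproduced with a few lines of C. This is a sketch assuming even parity and 7-bit data rows stored in the low bits of a byte; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Even-parity bit for the low 7 bits of v: 1 if the count of 1s is odd. */
unsigned parity7(unsigned v)
{
    unsigned p = 0;
    for (v &= 0x7Fu; v != 0; v >>= 1)
        p ^= v & 1u;
    return p;
}

/* Computes one parity bit per row, the 7-bit column parity, and the
 * corner bit (parity of the row-parity bits). */
void two_d_parity(const unsigned char *rows, size_t n,
                  unsigned char *row_parity,
                  unsigned char *column_parity,
                  unsigned char *corner)
{
    assert(rows != NULL && row_parity != NULL);
    assert(column_parity != NULL && corner != NULL);
    unsigned char columns = 0, c = 0;
    for (size_t i = 0; i < n; i++) {
        row_parity[i] = (unsigned char)parity7(rows[i]);
        /* XOR accumulates the parity of every column at once. */
        columns = (unsigned char)(columns ^ (rows[i] & 0x7F));
        c = (unsigned char)(c ^ row_parity[i]);
    }
    *column_parity = columns;
    *corner = c;
}
```

Feeding in the six data rows of the example yields row parity bits 1, 0, 1, 1, 1, 0 and the parity byte 1111011 with corner bit 0.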

Internet Checksum Algorithm


Not used at the link level but provides same sort of functionality as CRC and parity
Idea:
  Add up all words (16-bit integers) that are transmitted
  Transmit the result (checksum) of that sum
  Receiver performs the same calculation on received data and compares the result with the received checksum
  If the results do not match, an error is detected

16 redundant bits for a message of any length
Weak protection, accepted as a last line of defense
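
A sketch of the idea in C. It goes slightly beyond the slide: following RFC 1071, the sum uses ones'-complement arithmetic (carries are folded back in) and the complement of the sum is what gets transmitted. The function name and the big-endian pairing of bytes into words are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement checksum over len bytes, taken as 16-bit
 * big-endian words. Suitable for short buffers; very long inputs
 * would need the carries folded inside the loop. */
uint16_t internet_checksum(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                        /* odd trailing byte, zero-padded */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)                   /* fold carries back into low 16 */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A receiver can run the same function over the data with the checksum appended; a result of 0 means the check passed.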

Cyclic Redundancy Check


Theory
  Based on finite-field (binary-valued) arithmetic
  Bit string represented as polynomial
  Coefficients are binary-valued
  Divide bit string polynomial by generator polynomial to generate CRC

Practice
  Bitwise XORs

Cyclic Redundancy Check


Add k bits of redundant data to an n-bit message
  Want k << n, e.g. k = 32 and n = 12,000 (1500 bytes)

Represent n-bit message as n-1 degree polynomial
  e.g. MSG=10011010 as M(x) = x^7 + x^4 + x^3 + x^1
  Sender and receiver exchange polynomials

Let k be the degree of some agreed-upon divisor/generator polynomial
  e.g. C(x) = x^3 + x^2 + 1

Cyclic Redundancy Check


Transmit polynomial P(x) that is evenly divisible by C(x)
  Shift left k bits, i.e. M(x)x^k
  Add remainder of M(x)x^k / C(x) into M(x)x^k

Receiver receives polynomial P(x) + E(x)
  E(x) = 0 implies no errors

Receiver divides (P(x) + E(x)) by C(x); remainder will be zero ONLY if:
  E(x) was zero (no error), or
  E(x) is exactly divisible by C(x)
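
The division can be sketched with shifts and XORs, using the example message 10011010 and C(x) = x^3 + x^2 + 1 (bit string 1101) from the previous slide. The function names are hypothetical, and the sketch assumes the message plus CRC fits in 32 bits; real CRC-32 implementations work on byte streams, usually with lookup tables.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder of value(x) divided by generator(x) in mod-2 arithmetic.
 * generator has degree k (bit k set); value occupies value_bits bits. */
uint32_t mod2_remainder(uint32_t value, unsigned value_bits,
                        uint32_t generator, unsigned k)
{
    assert(k >= 1 && value_bits <= 32);
    for (int bit = (int)value_bits - 1; bit >= (int)k; bit--) {
        if (value & ((uint32_t)1 << bit))
            value ^= generator << (bit - (int)k);  /* subtract = XOR */
    }
    return value;   /* only the low k bits can still be set */
}

/* Builds P(x) = M(x) * x^k + remainder, which C(x) divides evenly. */
uint32_t crc_codeword(uint32_t msg, unsigned msg_bits,
                      uint32_t generator, unsigned k)
{
    assert(msg_bits >= 1 && msg_bits + k <= 32);
    uint32_t shifted = msg << k;                   /* M(x) * x^k */
    return shifted | mod2_remainder(shifted, msg_bits + k, generator, k);
}
```

For the example, the remainder is 101, so the transmitted bit string is 10011010101. The receiver runs `mod2_remainder` on what it received and accepts the frame only if the result is zero.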

Reliable Transmission
Error-correcting codes are not advanced enough to handle the range of bit and burst errors
  Corrupt frames generally must be discarded
  A reliable link-level protocol must recover from discarded frames

Goals for reliable transmission
  Make channel appear reliable
  Maintain packet order (usually)
  Impose low overhead / allow full use of link

Reliable Transmission
Reliability accomplished using acknowledgments and timeouts
  ACK is a small control frame confirming reception of an earlier frame
  If no ACK arrives, sender retransmits after a timeout

Reliable Transmission
Automatic Repeat reQuest (ARQ) algorithms
  Stop-and-wait
  Concurrent logical channels
  Sliding window
    Go-back-n, or selective repeat

Alternative: Forward Error Correction (FEC)

Automatic Repeat reQuest


Acknowledgement (ACK)
  Receiver tells sender when frame received
  Cumulative ACK (used by TCP): have received specified frame and all previous
  Selective ACK (SACK): specifies set of frames received
  Negative ACK (NACK or NAK): receiver refuses to accept frame now, e.g. when out of buffer space

Automatic Repeat reQuest


Timeout: sender decides that frame was lost and tries again
ARQ also called Positive Acknowledgement with Retransmission (PAR)

Stop-and-Wait
Send a single frame
Wait for ACK or timeout
  If ACK received, continue with next frame
  If timeout occurred, send again (and wait)
    Frame lost in transit, or corrupted and discarded

[Figure: timeline between sender and receiver. Frame 0 is answered by ACK 0, then Frame 1 by ACK 1.]

Stop-and-Wait
Frames delivered reliably and in order
Is that enough?
  No, we need performance, too.

Problem: keeping the pipe full
Example
  1.5 Mbps link x 45 ms RTT = 67.5 Kb (~8 KB)
  1 KB frames implies 182 Kbps (1/8th link utilization)
  Want the sender to transmit 8 frames before waiting for ACK
  Throughput remains 182 Kbps regardless of the link bandwidth!

Concurrent Logical Channels


Multiplex several logical channels over a single point-to-point physical link (include channel ID in header)
Use stop-and-wait for each logical channel
Maintain three bits of state for each logical channel:
  Boolean saying whether channel is currently busy
  Sequence number for frames sent on this channel
  Next sequence number to expect on this channel
ARPANET IMP-IMP supported 8 logical channels over each ground link (16 over each satellite link)

Concurrent Logical Channels


Header for each frame includes 3-bit channel number and 1-bit sequence number
  Same number of bits (4) as the sliding window requires to support up to 8 outstanding frames on the link

Sliding Window
Allow sender to transmit multiple frames before receiving an ACK, thereby keeping the pipe full
Upper bound on outstanding un-ACKed frames
Also used at the transport layer (by TCP)

[Figure: timeline between sender and receiver, with several frames in flight before the first ACK returns]

Sliding Window Concepts


Consider ordered stream of data
  Broken into frames
Stop-and-wait
  Window of one frame
  Slides along stream over time

Sliding window algorithms generalize this notion
  Multiple-frame send window
  Multiple-frame receive window

Sliding Window - Sender


Assign sequence number to each frame (SeqNum)
Maintain three state variables:
  Send Window Size (SWS)
  Last Acknowledgment Received (LAR)
  Last Frame Sent (LFS)
Maintain invariant: LFS - LAR <= SWS

[Figure: frames 11 to 20 along a time axis, with LAR=13, LFS=18, and the send window of size SWS between them]

Advance LAR when ACK arrives
Buffer up to SWS frames and associate timeouts

Sliding Window - Receiver


Maintain three state variables
  Receive Window Size (RWS)
  Largest Frame Acceptable (LFA)
  Next Frame Expected (NFE)
Maintain invariant: LFA - NFE + 1 <= RWS

[Figure: frames 11 to 20 along a time axis, with NFE=13, LFA=17, and the receive window of size RWS between them]

Frame SeqNum arrives:
  If NFE <= SeqNum <= LFA: accept
  If SeqNum < NFE or SeqNum > LFA: discard

Send cumulative ACKs
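
The receiver rules can be sketched as below. To keep the example short it assumes sequence numbers that never wrap around (the finite-field case is discussed on a later slide), a fixed RWS of 5 to match the figure, and hypothetical names. Frame payloads are omitted; only the bookkeeping is shown.

```c
#include <assert.h>
#include <stddef.h>

#define SW_RWS 5u   /* receive window size (assumed value) */

struct sw_receiver {
    unsigned nfe;                      /* Next Frame Expected          */
    unsigned char buffered[SW_RWS];    /* 1 if frame held out of order */
};

/* Handles an arriving frame. Returns 1 if accepted, 0 if discarded.
 * *next_expected receives NFE after processing; NFE - 1 is the frame
 * a cumulative ACK would name. */
int sw_receive(struct sw_receiver *r, unsigned seq, unsigned *next_expected)
{
    assert(r != NULL && next_expected != NULL);
    unsigned lfa = r->nfe + SW_RWS - 1;   /* Largest Frame Acceptable */
    *next_expected = r->nfe;
    if (seq < r->nfe || seq > lfa)
        return 0;                         /* outside the window */
    r->buffered[seq % SW_RWS] = 1;
    /* Slide the window past every frame now received in order. */
    while (r->buffered[r->nfe % SW_RWS]) {
        r->buffered[r->nfe % SW_RWS] = 0;
        r->nfe++;
    }
    *next_expected = r->nfe;
    return 1;
}
```

Starting from NFE=13, frame 14 is buffered without moving the window; when 13 then arrives, NFE jumps to 15 and a single cumulative ACK covers both frames.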



Sliding Window Issues


When a timeout occurs, data in transit decreases
  Pipe is no longer full when packet losses occur
  Problem is aggravated by delay in detecting packet loss

Early detection of packet losses improves performance:
  Negative Acknowledgements (NACKs)
  Duplicate Acknowledgements
  Selective Acknowledgements (SACKs)
Adds complexity but helps keep the pipe full

Sliding Window Classification


Stop-and-wait:    SWS=1, RWS=1
Go-back-N:        SWS=N, RWS=1
Selective repeat: SWS=N, RWS=M (usually M = N)

Sequence Number Space


SeqNum field is finite; sequence numbers wrap around
Sequence number space must be larger than number of outstanding frames (SWS)
SWS <= MaxSeqNum-1 is not sufficient
  Suppose 3-bit SeqNum field (0..7); SWS=RWS=7
  Sender transmits frames 0..6, which arrive successfully (receiver window advances)
  ACKs are lost; sender retransmits 0..6
  Receiver is expecting 7, 0..5, but receives the second incarnation of 0..5 and takes them for the 8th to 13th frame

Required Sequence Number Space ?


Assume SWS=RWS (simplest, and typical)
  Sender transmits full SWS
  Two extreme cases at receiver
    None received (waiting for 0..SWS-1)
    All received (waiting for SWS..2xSWS-1)

All possible packets must have unique SeqNum
SWS < (MaxSeqNum+1)/2 or SWS+RWS < MaxSeqNum+1 is the correct rule
Intuitively, SeqNum slides between two halves of sequence number space
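
The rule can be checked against the failure scenario from the previous slide. In this sketch MaxSeqNum is taken to be the number of distinct sequence numbers (8 for a 3-bit field), as in the example; the function names are hypothetical.

```c
#include <assert.h>

/* The rule from the slide: SWS + RWS < MaxSeqNum + 1. */
int window_sizes_are_safe(unsigned sws, unsigned rws, unsigned max_seq_num)
{
    return sws + rws < max_seq_num + 1u;
}

/* Scenario: sender transmits frames 0..SWS-1, all arrive, all ACKs are
 * lost, and the sender retransmits. Returns 1 if any retransmitted
 * number falls inside the receiver's advanced window, i.e. an old frame
 * could be mistaken for a new one. */
int retransmission_is_ambiguous(unsigned sws, unsigned rws,
                                unsigned max_seq_num)
{
    assert(max_seq_num > 0 && sws >= 1);
    assert(sws <= max_seq_num && rws <= max_seq_num);
    for (unsigned s = 0; s < sws; s++) {
        /* Offset of old frame s from the start of the new window
         * (which begins at SWS), modulo the sequence number space. */
        unsigned offset = (s + max_seq_num - sws) % max_seq_num;
        if (offset < rws)
            return 1;
    }
    return 0;
}
```

With 8 sequence numbers, SWS=RWS=7 is ambiguous while SWS=RWS=4 is not, which agrees with the rule.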

Shared Media: Problems


Problem: demands can conflict, e.g. two hosts send simultaneously
  STDM does not address this problem (it is centralized)
  Solution is a medium access control (MAC) algorithm

Shared Media: Solutions


Three solutions (out of many)
  Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
    Send only if medium is idle
    Stop sending immediately if collision detected
  Token Ring/FDDI: pass a token around a ring; only token holder sends
  Radio / wireless (IEEE 802.11)

History of Ethernet
Developed by Xerox PARC in mid-1970s
Roots in Aloha packet-radio network
Standardized by Xerox/DEC/Intel in 1978
Similar to IEEE 802.3 standard
IEEE 802.3u standard defines Fast Ethernet (100 Mbps)
New switched Ethernet now popular

Ethernet Alternative Technologies


Can be constructed from a thinner cable (10Base2) rather than 50-ohm coax cable (10Base5)
Newer technology uses 10BaseT (twisted pair)
  Several point-to-point segments coming out of a multiway repeater called hub

[Figure: hosts attached to two hubs]

Ethernet Multiple Segments


Repeaters forward the broadcast signal on all outgoing segments (10Base5)
Maximum of 4 repeaters (2500 m), 1024 hosts

[Figure: Ethernet segments with attached hosts, joined by repeaters]

Ethernet Packet Frame


Preamble allows the receiver to synchronize with signal
Frame must contain at least 46 bytes of data to detect collision
802.3 standard substitutes a length field for the type field
  Type field (demux key) is the first thing in data portion
  A device can accept both frame formats: type values are > 1500

[Figure: Ethernet frame format, field widths in bits]
Preamble (64) | Dest addr (48) | Src addr (48) | Type (16) | Body (variable) | CRC (32)

Ethernet MAC CSMA/CD


Multiple access
Nodes send and receive frames over a shared link

Carrier sense
Nodes can distinguish between an idle and busy link

Collision detection
A node listens as it transmits to detect collision

CSMA/CD MAC Algorithm

If line is idle (no carrier sensed)
  Send immediately
  Upper bound message size of ~1500 bytes
  Must wait 9.6 µs between back-to-back frames

CSMA/CD MAC Algorithm


If line is busy (carrier sensed)
  Wait until the line becomes idle and then transmit immediately
  Called 1-persistent (special case of p-persistent)

If collision detected
  Stop sending data and send jam signal
  Try again later

Constraints on Collision Detection


In our example, consider
  my-machine's message reaches your-machine at T
  your-machine's message reaches my-machine at 2T

Thus, my-machine must still be transmitting at 2T

Ethernet Min. Frame Size


RTT on a maximally configured Ethernet of 2500 m, with 4 repeaters, is about 51.2 µs
  2500 m / (2 x 10^8 m/s) = 12.5 µs
  2 x 12.5 µs = 25 µs, plus repeater delays

51.2 µs on 10 Mbps corresponds to 512 bits (64 bytes)
Therefore, the minimum frame length for Ethernet is 64 bytes (header + 46 bytes data)

Retry After the Collision
How long should a host wait to retry after a collision?
  Binary exponential backoff
  Maximum backoff doubles with each failure (exponential)
  After N failures, pick an N-bit number
  2^N discrete possibilities from 0 to maximum

Ethernet Frame Reception

Sender handles all access control
Receiver simply pulls frames from network
Ethernet controller/card
Sees all frames
Selectively passes frames to host processor

122

Experience With Ethernet

Number of hosts limited to 200 in practice, standard allows 1024
Range much shorter than 2.5 km limit in standard
Round-trip time is typically 5 or 10 µs, not 50 µs
123

Token Ring Overview


Token Ring network was a candidate to replace Ethernet; used in some MAN backbones
16Mbps IEEE 802.5 (based on earlier 4Mbps IBM ring) 100Mbps Fiber Distributed Data Interface (FDDI)

124

IBM Token Ring IEEE 802.5


Ring is viewed as a single shared medium
Each node is allowed to transmit according to some distributed algorithm for medium access All nodes see all frames; destination saves a copy of frame as it flows past

The term token indicates the way the access to shared channel is managed

125

Token in a Token Ring


Token is a special bit pattern that rotates around the ring
A node must capture the token before transmitting
A node releases the token after it is done transmitting
Immediate release: token follows the last frame (FDDI)
Delayed release: token is released after the last frame returns to the sender

Token in a Token Ring

Remove your frame when it comes back around


Transmit another frame or re-insert the token

Stations get round-robin service as the token circulates around the ring

127

Physical Properties
Data rate can be 4 Mbps or 16 Mbps Encoding of bits uses differential Manchester Ring may have up to 250 (802.5) or 260 (IBM) nodes Physical medium is twisted pair (IBM Token Ring)
128

Token Ring MAC


Network adaptor contains receiver, transmitter and some storage of bits between them Token circulates if no station has anything to send
Ring must have enough capacity to store the entire token
At least 24 stations with 1-bit storage for a 24-bit long token (if propagation delay is negligible)
This situation is avoided by designating a monitor

Token Ring MAC


Any station that has data to send can seize the token
In 802.5, a single bit in the second byte of the token is modified
First two bytes of the modified token become the preamble for the next frame
130

Frame Format
Illegal Manchester codes in the start and end delimiters Frame priority and reservation bits in access control byte Demux key in frame control byte A and C bits for reliable delivery, in status byte
8 Start delimiter 8 Access control 8 Frame control 48 Dest addr 48 Src addr Variable Body 32 CRC 8 End delimiter 8 Frame status

131

Timed Token Algorithm


Token Holding Time (THT)
Upper limit on how long a station can hold the token
Before putting each frame on the ring, a node checks that its transmit time would not cause THT to be exceeded
Long THT achieves better utilization with few senders
Short THT helps when multiple nodes have data to send

Reliable Delivery

The A and C bits in the packet trailer are used for reliability
Both bits are initially set to 0
Destination sets the A bit if it sees the frame and sets the C bit if it copies the frame into its adaptor
133

Token Ring Packet Priorities


A station wanting to send a priority n packet can set the reservation bits to n, provided they are not already set to a higher value
It captures the token when the current sender releases it with priority set to n

Strict priority scheme: no lower-priority packets get sent when higher-priority packets are waiting

134

Token Maintenance
Token rings have a designated monitor node Any station can become the monitor according to a well defined procedure Monitor is elected when the ring is first connected, or when the current monitor fails
135

Token Maintenance
Monitor periodically announces its presence Claim token sent by a station seeing no monitor
If the sender receives back the claim token, it becomes monitor If another station is also contending for monitor, some rule defines the monitor
136

Fiber Distributed Data Interface


Similar to 802.5/IBM token rings but runs on fiber
Consists of a dual ring: two independent rings that transmit data in opposite directions at 100 Mbps
Tolerates a single link break or node failure (self-healing ring)

(a)

(b)

137

FDDI Physical Properties

Variable size buffer (9 to 80 bits) between input and output interfaces (10 ns bit time)
Not required to fill buffer before starting transmission

Maximum 500 stations, maximum 2 km distance between any pair of stations

138

FDDI Physical Properties

Total 200 km fiber: dual nature implies 100 km cable connecting all stations Physical media can be coax or twisted pair cable Uses 4B/5B encoding
139

Timed Token Algorithm


Token Holding Time (THT)
Upper limit on how long a station can hold the token Configured to some suitable value

Token Rotation Time (TRT)


How long it takes the token to traverse the ring (time since a host released the token)
TRT <= ActiveNodes x THT + RingLatency

Timed Token Algorithm

Target Token Rotation Time (TTRT)


agreed-upon or negotiated upper bound on TRT

141

MAC Algorithm

Each node measures TRT between successive token arrivals If measured-TRT > TTRT
Token is late Can not send data
142

FDDI Traffic Classes

Synchronous traffic
Latency sensitive Gets higher priority Can always send data
143

Bounded Priority Traffic


If a node has large amount of synchronous data
It will send regardless of measured TRT TTRT will become meaningless !!!

Therefore, total synchronous data during one token rotation is bounded by TTRT
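The send decision of the timed token algorithm can be sketched as follows. The function names and the string labels for the traffic classes are hypothetical; the rule that an early token may be held for the remaining TTRT minus measured TRT is the usual textbook formulation and is an assumption here, since the slides only state the late-token case.

```python
def may_send(traffic_class, measured_trt, ttrt):
    """Decide whether a node holding the token may transmit.

    Synchronous traffic can always be sent; asynchronous traffic only
    when the token arrived early (measured TRT below the target TTRT).
    """
    if traffic_class == "synchronous":
        return True
    return measured_trt < ttrt

def async_budget(measured_trt, ttrt):
    """Time an early token may be held for asynchronous data (assumed rule)."""
    return max(0.0, ttrt - measured_trt)
```

A late token (measured TRT above TTRT) leaves no asynchronous budget, which is why the total synchronous load per rotation has to be bounded by TTRT.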
144

Token Maintenance The procedure when a node


Joins the ring (startup) Suspects a failure

Claim frame is used in order to


Generate a new Token Agree on TTRT (so that an application can meet its timing constraints)

A node can send a claim frame without holding the token


145

Frame Format
4B/5B control symbols for start and end of frame Control Field
1st bit: asynchronous (0) versus synchronous (1) data 2nd bit: 16-bit (0) versus 48-bit (1) addresses Last 6 bits: demux key (includes reserved patterns for token and claim frame)

Status Field
From receiver back to sender; error in frame Recognized address; accepted frame (flow control)
8 Start of frame | 8 Control | 48 Dest addr | 48 Src addr | Variable Body | 32 CRC | 8 End of frame | 24 Status   (field widths in bits)

Wireless LANs
IEEE 802.11 standard
Designed for use in a small area (offices, campuses)

Bandwidth: 1, 2 or 11 Mbps
Up to 54 Mbps in newer 802.11a standard

Targets three physical media


Two spread spectrum radio (2.4 GHz freq)
One diffused infrared (10 m range, 850 nm band)

802.11 MAC: CSMA/CA Similar to Ethernet


Defer the transmission until the link becomes idle Take back off if collision occurs

Is it sufficient ? All nodes are not always within reach of (to hear) each other

148

Hidden and Exposed Nodes


Hidden nodes
Sender thinks it's OK to send when it's not (false positive)
A-C and B-D are hidden nodes in the figure below

Exposed nodes
Sender does not send when it's OK to send (false negative)
B and C are exposed nodes in the figure below

149

Multiple Access with Collision Avoidance (MACA)

Sender transmits RequestToSend (RTS) frame


Contains intended time to hold the medium

Receiver replies with ClearToSend (CTS) frame


150

MACA for Wireless (MACAW)

Collision detection
No active collision detection Known only if CTS or ACK is not received Binary exponential back off (BEB) is used in case of collision, like in Ethernet

151

802.11 - Distribution System Nodes roam freely but operate within a structure
Tethered by wired network infrastructure (Ethernet ?) Each Access Point (AP) services nodes in some region Each mobile node associates itself with an AP

152

Managing Connectivity/Roaming
How do wireless nodes select an Access Point ?
Scanning (active search for an AP)
Node sends Probe frame
All APs within reach reply with Probe Response frame
Node selects one AP; sends it an Association Request frame
AP replies with Association Response
New AP informs old AP via wired backbone

Managing Connectivity
Active scanning: when a node joins or moves
Passive scanning: AP periodically sends a Beacon frame, advertising its capabilities

[Figure: access points AP-1, AP-2 and AP-3 attached to the distribution system, each serving nearby mobile nodes]

154

Frame Format
Control field contains three subfields:
6-bit Type field (data, RTS, CTS, scanning); 1-bit ToDS; and 1-bit FromDS

A single frame contains up to 2312 bytes of data


16 Control | 16 Duration | 48 Addr1 | 48 Addr2 | 48 Addr3 | 16 SeqCtrl | 48 Addr4 | 0-18,496 Payload | 32 CRC   (field widths in bits)

ToDS=0, FromDS=0

ToDS=1, FromDS=1

AP-3

AP-1

A
155

Overview
Also called network interface card (NIC) Components (high-level overview) Options for use
Data motion Event notification

Potential performance bottlenecks Programming device drivers

156

Typical Workstation Architecture


CPU

Communication ?

Cache $
memory bus

Network Adaptor

To Network

Memory

I/O bus

Typically where data link functionality is implemented


157

Components of a Network Adaptor


Bus interface communicates with a specific host
Bus defines protocol for CPU-adaptor communication

Link interface speaks correct protocol on network


Implemented by a chip set, in software or on FPGA

Buffering between different speed bus and link


Host I/O bus

Bus Interface Network Adaptor

Link Interface

network

158

Host Perspective

Adaptor is ultimately programmed by CPU Adaptor exports a Control Status Register (CSR) CSR is readable and writable from CPU at some memory address

159

Data Motion Options for Network Adaptor Use


Transfer frames between adaptor and host memory Programmed input/output (PIO)
Processor itself manages each access (loads/stores)
Faster than DMA for small amounts of data
160

Data Motion Options for Network Adaptor Use


Direct memory access (DMA)
Adaptor is given buffer descriptor lists by the host for read/write
Processor is not involved: free to do other things
Can be faster than memory copy through CPU
Start-up cost

161

Data Motion
CPU

Data movement path using PIO

Cache $ memory bus

Network Adaptor

To Network

Memory

I/O bus

Data movement path using DMA


162

Network Adaptor: Event Notification

Hardware interrupts
Processor free to do other things Events delivered immediately State (register) save/restore expensive Context switches more expensive

163

Network Adaptor: Event Notification

Event polling
Processor must periodically check Events wait until next check No extra state changes
164

Device Drivers

Operating system routines anchoring protocol stack to network hardware Initialize device, transmit frames, field interrupts Code contains device specific details
Difficult to read but simple in logic
165

Performance Bottlenecks

Link capacity Processor computing power I/O bus bandwidth


Overhead involved in each bus transfer
166

Performance Bottlenecks

Memory bus bandwidth


Memory hierarchy with cache levels
Memory accesses result in multiple memory copies in different buffers

167

Packet Switches
A multi-input multi-output device Local star topology Performance independent of connectivity
(e.g. adding new host) if switch is designed with enough aggregate capacity

Maximum degree < physical network limit


168

Forwarding
Packets arrive at one of the several inputs and have to be forwarded/switched to one of the available outputs
Connectionless and connection-oriented approach to determine the correct output

First challenge: forwarding Which way should it go ?


169

Routing
Forwarding requires information

Second challenge: routing

How to maintain forwarding information ?


170

Contention and Congestion


If the arrival rate for a certain output is greater than the output capacity, contention occurs: who goes first ?
If the arrival rate of packets is high enough to cause buffer overflow, congestion occurs: is any packet dropped ?

Network Layers and Switches


Application
User level

host

Presentation
Session switch Transport switch between different physical layers

OS kernel

Network

Network

Data Link Physical

Data Link
One or more nodes

Physical

within the network

172

Packet Switching / Forwarding

Three approaches
Datagram or connectionless approach Virtual circuit or connection-oriented approach Source routing

Important notion: unique global address per host

173

Datagram Switching / Forwarding


Every packet contains enough information
Enables switch to decide how to forward it

Switch translates global address to output port


Maintains forwarding table for translations

Each packet forwarded and travels independently

174

Datagram Switching
Managing tables in large, complex networks with dynamically changing topologies is a real challenge for the routing protocol
Host E

At switch 1:
Dest   Port#/Interface
A      2
B      1
C      3
D      0
E      1

Host D Host F 2 0 3 1 Host C 2 Switch 1 3 1 Switch 2

Host A

Host G 1

Switch 3 Host B 3

Host H

175

Datagram Model
No round trip time delay waiting for connection setup
Host can send data anywhere, anytime as soon as it is ready Source has no way of knowing if the network is capable of delivering a packet or if the destination host is even up

Packets are treated independently


Possible to route around link and node failures dynamically
176

Virtual Circuit Switching


Explicit connection setup (and teardown) phase from source to destination: connection-oriented model
Subsequent packets follow established circuit

Supporting connections in network layer may be useful for service notions


177

VC Tables in VC Switching

Setup message in signaling process (to create VC table) is forwarded like a datagram Acknowledgment of connection setup to downstream neighbors to complete signaling
Data transfer phase can start after ACK is received
178

Signaling in VC Switching
Setup message is forwarded from Host A to Host B On connection request, each switch creates an entry in VC table with a VCI for the connection
I/F VCI I/F VCI in in out out I/F VCI I/F VCI in in out out 2 5 1
2

3
Switch 2

0 Switch 1 3 2 1

3
0

setup setup B

setup

B
2 Host A 1 3 Switch 3

Host B

I/F VCI I/F VCI in in out out 2 7 3

setup

179

Virtual Circuit Model

Typically wait full RTT for connection setup before sending first data packet
Can not avoid failures dynamically, must re-establish connection (old one is torn down to free storage space)
180

Source Routing

Packet header contains sequence of address/ports on path from source to destination


One direction per switch: port, next switch (absolute)
Switches read, use, and then discard directions
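The read, use and discard behaviour can be sketched with a list of output ports as the packet header. The function names and the example route are hypothetical, and only the "discard" variant described on the slide is modelled.

```python
def switch_forward(header):
    """Read the first direction, use it, and discard it from the header."""
    if not header:
        raise ValueError("no directions left in source route")
    out_port, remaining = header[0], header[1:]
    return out_port, remaining

def trace(route):
    """Return the output port chosen at each switch along the path."""
    ports = []
    header = list(route)
    while header:
        port, header = switch_forward(header)
        ports.append(port)
    return ports
```

For a hypothetical route `[1, 0, 3]`, the first switch sends the packet out of port 1 and passes on `[0, 3]`, so the header shrinks by one entry per hop.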
181

Data Transfer in Source Routing


Analogous to following directions
3 2 Switch 2

Switch 1 1

3 0

data
2

0 data 0 1 3

data

Switch 3 Host B

data data

1 2

0 3

3 0 1

Host A 1 3 0

data

182

Source Routing Model

Source host needs to know the correct and complete topology of the network
Changes must propagate to all hosts

Packet headers may be large and variable in size: the length is unpredictable

183

Implementation and Performance


I/O bus CPU Interface 1

Interface 2

Interface 3 Main memory

Packet arriving at interface 1 has to go on interface 2 Point of contention for packets: I/O and memory bus
184

Building Extended LANs


Traditional LAN
Shared medium (e.g. Ethernet) Cheap, easy to administer Supports broadcast traffic

Problem
Want to scale LAN concept Larger geographic area (Greater than O(1 km)) More hosts (Greater than O(100)) But retain LAN-like functionality

Solution: bridges

185

Bridges
Connect two or more LANs with a bridge
Transparently extends a LAN over multiple networks Accept & forward strategy (in promiscuous mode) Level 2 connection (does not add packet header)
A B C Port 1 Bridge Port 2 X Y Z

186

Learning Bridges
Learn table entries based on source address
Timeout entries to allow movement of hosts

Table is an optimization; it need not be complete
Always forward broadcast frames
Uses datagram or connectionless forwarding
A B C Port 1 Bridge Port 2 X Y Z

Host   Port
A      1
B      1
C      1
X      2
Y      2
Z      2
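A learning bridge can be sketched in a few lines. This is an illustrative model only: the class name, the 300-second timeout and the string addresses are assumptions, not part of the lecture.

```python
import time

BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningBridge:
    def __init__(self, ports, timeout_s=300.0, clock=time.monotonic):
        self.ports = set(ports)
        self.timeout_s = timeout_s      # assumed ageing time for entries
        self.clock = clock
        self.table = {}                 # host address -> (port, time learned)

    def receive(self, src, dst, in_port):
        """Return the set of ports on which the frame is forwarded."""
        now = self.clock()
        # Learn (or refresh) the sender's location from the source address.
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        # Drop stale entries so that hosts can move between segments.
        if entry is not None and now - entry[1] > self.timeout_s:
            del self.table[dst]
            entry = None
        # Broadcast or unknown destination: flood on all other ports.
        if dst == BROADCAST or entry is None:
            return self.ports - {in_port}
        out_port = entry[0]
        # Destination is on the arrival segment: no need to forward.
        return set() if out_port == in_port else {out_port}
```

With the two-port bridge from the figure, a frame from A to an unknown X is flooded to port 2, the reply from X teaches the bridge where X lives, and later traffic between A and B stays on port 1.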

187

Learning Bridges
A B B3 C D

B5
B2 E

B7
F

B1 G H

B6

B4 J

Problem

Redundancy (desirable to handle failures, but ...)
Makes extended LAN structure cyclic
Frames may cycle forever

Solution: spanning tree

188

Spanning Tree
Subset of forwarding possibilities All LANs reachable, but Acyclic Bridges run a distributed algorithm to calculate the spanning tree
Select which bridges actively forward
Developed by Radia Perlman of DEC
Now IEEE 802.1 specification
Reconfigurable algorithm

189

Spanning Tree Algorithm


All designated bridges forward frames
On all designated ports On preferred port (path leading to root)
A
B3

B
B5

LAN

C
B2 E D

B7 F B1

Designated port
Preferred port
B2

Designated bridge

B6 I

B4 J

190

Distributed Spanning Tree Algorithm

Bridges exchange configuration messages


ID for bridge sending the message
ID for what the sending bridge believes to be root bridge
Distance (hops) from sending bridge to root bridge
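How a bridge might compare two configuration messages can be sketched with tuple ordering. The comparison rule used here (smaller root ID wins, then shorter distance, then smaller sender ID) is the standard textbook rule and is an assumption, since the slide lists only the message fields; the class and function names are hypothetical.

```python
from typing import NamedTuple

class ConfigMessage(NamedTuple):
    # Field order is chosen so that plain tuple comparison implements the rule.
    root_id: int     # ID of the bridge believed to be the root
    distance: int    # hops from the sending bridge to that root
    sender_id: int   # ID of the bridge sending the message

def is_better(new, current):
    """True if `new` should replace `current` as the best known message."""
    return new < current
```

A bridge that keeps only the best message seen on each port, and stops sending its own messages on ports where a better one arrives, converges toward a single root with loop-free paths.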
191

Limitations of Bridges

Do not scale
Spanning tree algorithm does not scale Broadcast does not scale

Do not accommodate heterogeneity


Only supports networks with same address formats
192

ATM (Asynchronous Transfer Mode)


Common in WANs, can also be used in LANs
Competing technology with Ethernet, but areas of application only partially overlap

Connection-oriented packetswitched network


Virtual-circuit routing

Typically implemented on SONET (other physical layers possible)

193

ATM Signaling
Connection setup called signaling (standard Q.2931)
Route discovery, resource reservation, QoS, ...
Send through network
Request setup circuit Send setup frame on setup circuit

Establish locally
No intermediate switch involvement Requires pre-established virtual path
194

Cell Switching (ATM) Fixed length (53 bytes) frames are called cells
5-byte header (including 1-byte CRC-8) + 48-byte payload

Standard defines 3 layers (5 sublayers)


Layers interface to physical media and to higher layers (e.g. encapsulating variable-length frames)
195

Cell Switching (ATM)

2-level connection hierarchy


Virtual circuits Virtual paths Bundles of virtual circuits Travel along common route Reduces forwarding information

196

ATM Cell Format


User-Network Interface (UNI)
4 GFC | 8 VPI | 16 VCI | 3 Type | 1 CLP | 8 HEC (CRC-8) | 384 payload (48 bytes)   (field widths in bits)

Host-to-switch format
GFC: Generic Flow Control (still being defined)
VCI/VPI: Virtual Circuit/Path Identifier
Type: management, congestion control, AAL5 (later)
CLP: Cell Loss Priority
HEC: Header Error Check (CRC-8)

Network-Network Interface (NNI)


Switch-to-switch format
GFC becomes part of VPI field
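The UNI field layout can be illustrated by packing the first four header bytes. This sketch uses hypothetical function names and deliberately leaves out the HEC byte, which would be computed over these four bytes.

```python
def pack_uni_header(gfc, vpi, vci, ptype, clp):
    """Pack the first 4 header bytes of a UNI cell (HEC byte not included)."""
    fields = ((gfc, 4), (vpi, 8), (vci, 16), (ptype, 3), (clp, 1))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(4, "big")

def unpack_uni_header(raw):
    """Recover the UNI fields from the first 4 header bytes."""
    word = int.from_bytes(raw[:4], "big")
    return {
        "gfc": word >> 28,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "type": (word >> 1) & 0x7,
        "clp": word & 0x1,
    }
```

For the NNI format the same 32 bits would be split as a 12-bit VPI followed by VCI, Type and CLP, since the GFC bits are absorbed into the VPI.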
197

Segmentation and Reassembly


ATM Adaptation Layer (AAL)
Application to ATM cell mapping AAL header contains information for reassembly AAL1, AAL2 for applications needing guaranteed rate AAL3/4 designed for variable-length packet data AAL5 is an alternative standard for packet data

AAL

AAL

ATM ATM

198

ATM Layers
ATM Adaptation Layer (AAL)
Convergence Sublayer (CS) supports different application service models Segmentation and Reassembly (SAR) supports variable-length frames

CS AAL SAR

ATM Layer
Handles virtual circuits, cell header generation, flow control ATM

Physical layer
Transmission Convergence (TC) handles error detection, framing Physical medium dependent (PMD) sublayer handles encoding

TC PHY PMD

199

AAL 3/4
Provides information to allow variable size packets to be sent in fixed-size ATM cells Convergence Sublayer Protocol Data Unit (CS-PDU)
8 CPI | 8 Btag | 16 BAsize | payload (< 64 KB) | Pad (0-24) | 8 (unused, zero) | 8 Etag | 16 Length   (field widths in bits)

CPI: Common Part Indicator (version field)
Btag/Etag: beginning and ending tags (same)
BAsize: hint on reassembly buffer space to allocate
Length: size of whole PDU
200

Segmented into cells: header/trailer + 44-byte data

ATM Cell Format for AAL 3/4


40 ATM header | 2 Type | 4 SEQ | 10 MID | 352 payload (44 bytes) | 6 Length | 10 CRC-10   (field widths in bits)

Type (is-start? and is-end? bits)


BOM (10): Beginning Of Message
COM (00): Continuation Of Message
EOM (01): End Of Message
SSM (11): Single-Segment Message

SEQ: Sequence Number (for cell loss/reordering) MID: multiplexing ID (mux onto virtual circuits) Length: number of bytes of PDU in this cell
201

Encapsulation and Segmentation for AAL3/4


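[Figure: the CS-PDU header (4 bytes), user data (< 64 KB) and CS-PDU trailer (4-7 bytes) are segmented into 44-byte pieces, the last of which may be shorter than 44 bytes. Each piece becomes the cell payload between an AAL header and an AAL trailer, behind the ATM header; the last cell is padded.]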

202

AAL 5 CS-PDU
CS-PDU Format
data (< 64 KB) | pad (0 - 47 bytes) | reserved (2 bytes) | length (2 bytes) | CRC-32 (32 bits)

Pad so trailer always falls at the end of an ATM cell
Length: size of PDU (data only)
CRC-32 (detects missing or misordered cells)
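The padding rule can be made concrete with a small calculation. The sketch below assumes the 8-byte trailer listed above (2 reserved + 2 length + 4 CRC-32 bytes); the function name is illustrative.

```python
CELL_PAYLOAD_BYTES = 48
TRAILER_BYTES = 8   # 2 reserved + 2 length + 4 CRC-32

def aal5_layout(data_len):
    """Return (pad_bytes, cell_count) for a CS-PDU carrying data_len bytes.

    The pad is chosen so that data + pad + trailer is a whole number of
    48-byte cell payloads, which puts the trailer at the end of the last cell.
    """
    pad = (-(data_len + TRAILER_BYTES)) % CELL_PAYLOAD_BYTES
    cells = (data_len + pad + TRAILER_BYTES) // CELL_PAYLOAD_BYTES
    return pad, cells
```

A 40-byte payload fits exactly in one cell with no padding, whereas 41 bytes needs the worst-case 47 bytes of padding and a second cell.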

Cell Format
End-of-PDU bit in Type field of ATM header
203

Encapsulation and Segmentation for AAL 5


[Figure: user data, padding and the CS-PDU trailer are cut into 48-byte pieces; each piece becomes the cell payload behind an ATM header.]
204

Virtual Paths with ATM


Two level hierarchy of virtual connection: 8-bit VPI and 16-bit VCI
Switches in the public network use 8-bit VPI
Corporate sites use full 24-bit address (VPI + VCI)
Much less connection-state info in switches
Virtual path: fat pipe with bundle of virtual circuits
Public network Network A Network B

205

ATM as a LAN Backbone


H5 ATM links H6 E1 H7 ATM-attached host E3 H1 H2 E2 H4 Ethernet links H3 Ethernet switch ATM switch

Different from traditional LANs, no native support for broadcast or multicast


206

Shared Ethernet Emulation with LANE


All hosts think they are on the same Ethernet
H

ATM Switch

LANE / Ethernet Adaptor Card

Ethernet Switch

H H H

H
H

ATM Switch

LANE / Ethernet Adaptor Card

Ethernet Switch

H H H H
207

ATM / LANE Protocol Layers


Higher-layer protocols (IP, ARP, . . .) Signalling + LANE
AAL5 ATM PHY ATM PHY PHY Ethernet-like interface

Higher-layer protocols (IP, ARP, . . .) Signalling + LANE


AAL5 ATM PHY

Host

Switch

Host

208

Clients and Servers in LANE

LAN Emulation Client (LEC)


Host, bridge, router or switch

LAN Emulation Server (LES)


Maintains clients MAC and ATM addresses Maintains ATM address of BUS
209

Clients and Servers in LANE LAN Emulation Configuration Server (LECS)


High-level network management when LEC starts up Reachable by preset VC (recall known server port#) Maintains mapping of ATM address to LANE type

210

Clients and Servers in LANE


Broadcast and Unknown Server (BUS)
Emulates broadcast and multicast; critical to LANE Uses point-to-multipoint VC with all clients

Servers physically located in one or more devices


LES BUS ATM network Point-to-point VC Point-to-multipoint VC

LECS

H1

H2
211

LANE Registration
1. Client contacts LECS on predefined VC, and sends its ATM address to it
2. LECS returns LAN type, MTU and ATM address of LES
3. Client signals connection to LES, and registers MAC and ATM addresses with LES
4. LES returns ATM address of BUS
5. Client signals connection to BUS
6. BUS adds client to point-to-multipoint VC

[Figure: hosts H1, H2 and H3 attached to the ATM network together with the LECS, LES and BUS]

212

LANE Circuit Setup


1. Client (H1) knows destination MAC address of receiver (H2)
2. Client (H1) sends 1st packet to BUS
3. BUS sends address resolution request to LES
4. LES returns ATM address to client (H1)
5. Client (H1) signals connection to H2 for subsequent packets

[Figure: hosts H1, H2 and H3 attached to the ATM network together with the LECS, LES and BUS]

213

Some packets destined for same output


One goes first Others delayed or dropped

Contention in Switches

Delaying packets requires buffering


Finite capacity, some packets must still drop
At inputs
Increases/adds false contention
Sometimes necessary
At outputs
Can also exert backpressure

Output Buffering
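[Figure: airline counter analogy with a 1x6 switch. Each output has its own queue: you, trying to check in, go to the standard check-in lines, while Mr. X (writing a complaint letter) and Mr. A (waiting to claim a refund of Rs.100) queue at customer service.]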


215

Input Buffering: Head-of-line Blocking

[Figure: airline counter analogy with a 1x6 switch. The standard check-in agents are standing by, but you, trying to check in, wait in a single input line behind Mr. X (writing a complaint letter) and Mr. A (waiting to claim a refund of Rs.100), who both need customer service.]
216

Backpressure
Switch 1
no more, please

Switch 2

Propagation delay requires that switch 2 exert backpressure at a high-water mark rather than when the buffer is completely full
It is thus typically only used in networks with small propagation delays (e.g. switch fabrics)
217

Switching Fabric
Special-purpose (switching) hardware General problem
Connect N inputs to M outputs (NxM switch) Often N=M (bidirectional links)

Design goals
High throughput: want aggregate close to MIN (sum of inputs, sum of outputs)
Avoid contention (fabric faster than ports)
Good scalability: linear size/cost growth in N/M

Switch: Fabric and Ports


Fabric has a job to deliver packets to the right output
Input Port Output Port

Input Port

Switch Fabric fabric


(with small internal buffering)

Output Port

Input Port

Output Port

Input Port

Output Port
219

Ports and Fabric


Ports deal with the complexity of the real world
Virtual circuit management is handled in ports
Determine output port using forwarding tables

Input port is the first potential performance bottleneck
Header processing and handing the packet to the fabric

Design Goals - Throughput

An n x m switch can provide a maximum ideal throughput of: S = S1 + S2 + ... + Sn
Only possible if traffic at inputs is evenly distributed across all outputs
Sustained throughput higher than link speed of output is not possible
221

Design Goals - Scalability


Cost of hardware rises fast with increasing the number of ports n
Adding ports increases hardware & design complexity Scalability in terms of rate of increase in cost

Design complexity determines maximum switch size


Switch designs run into problems at some maximum number of inputs and outputs
222

Switch Performance
Avoid contention with buffering
Use output buffering when possible Apply backpressure through fabric Input buffering with peeking (non-FIFO semantics) to reduce head-of-line blocking problems Drop packets if input buffer overflows

Good scalability
O(N) ports
Port design complexity O(N) gives O(N^2) for switch
Port design complexity O(1) gives O(N) for switch
223

Crossbar (Perfect) Switch

Problem: hardware scales as O(N^2)


224

Knockout Switch: Pick L from N


4

8-to-4 Concentrator
3

2 2 random selector
2

delay unit Inputs

Outputs
1

Problem: what if more than L arrive?


225

Shared Memory Switch


Inputs Outputs

Mux

Buffer memory

Demux

Write control

Read control

226

Self-Routing Fabrics
Use source routing on network within switch Input port attaches output port number as header Fabric routes packet based on output port Types
Banyan Network Batcher-Banyan Network Sunshine Switch

227

Banyan Network
001 011 110 111 011 001

110

MSB

LSB

111

Sends 0 bit up, 1 bit down
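The self-routing rule can be sketched as a function of the destination port number. The function name is hypothetical, and a 3-stage (8-port) fabric is assumed to match the 3-bit labels in the figure.

```python
def banyan_path(dest_port, stages=3):
    """List the up/down decision at each stage, most significant bit first.

    Each switching element looks at one bit of the output port number:
    a 0 bit sends the packet up, a 1 bit sends it down.
    """
    if not 0 <= dest_port < (1 << stages):
        raise ValueError("destination port does not fit in the fabric")
    bits = format(dest_port, f"0{stages}b")
    return ["up" if bit == "0" else "down" for bit in bits]
```

A packet addressed to output 110 therefore goes down, down, up, independent of the input it entered on.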


228

Batcher (Merge Sort) Network


7 3 3 7 3 6 3 6 3 1 1 3

6 1 Sort

6 1

7 1

1 7

6 7 Merge

6 7

Merge

Routing packets through a Batcher network

Batcher-Banyan Network
Attach the two networks back-to-back
Arbitrary unique permutations routed without contention

Batcher-Banyan Network

Sends 1 bit up Sends 0 bit down

Sends 0 bit up Sends 1 bit down

230

Sunshine Switch
k Delay k

Inputs

Batcher

n +k

Trap
(marks overflow packets)

n +k

n Selector n n

L banyans

n n n

Outputs

Like a Knockout switch
Re-circulates overflow packets, i.e. when more than L arrive in one cycle
231

What we understand
Concepts of networking and network programming
Elements of networks: nodes and links Building a packet abstraction on a link

Transmission, and units of communication data


How to detect transmission errors in a frame after encoding and framing it How to simulate a reliable channel (sliding window) How to arbitrate access to shared media in any network

Design issues of direct link networks


Functionality of network adaptors
232

We also understand
How switches may provide indirect connectivity
Different ways to move through a network (forwarding) Bridge approach to extending LAN concept Example of a real virtual circuit network (ATM) How switches are built and contention within switches

Next: letting different networks work together


233

Internetworking
Reading: Peterson and Davie, Ch. 4 Basics of Internetworking Heterogeneity
The IP protocol, address resolution, control messages

Dealing with simple heterogeneity issues


Defining a service model Defining a global namespace Structuring the namespace to simplify forwarding Hiding variations in frame size limits

234

Internetworking
Routing moving forward with IP
Building forwarding information

Dealing with global internets-scale


Virtual geography and addresses Hierarchical routing Name translation and lookup: translating between global and local (physical) names Multicast traffic

Future internetworking: IPv6

235

Internet Protocol (IP)


Network protocol for the Internet Operates on all hosts and routers (routers connect distinct networks into the Internet)
FTP HTTP NV TFTP

TCP

UDP

IP

FDDI

Ethernet

ATM
236

Provided to transport layer (TCP, UDP)


Global name space Host-to-host connectivity (connectionless) Best effort packet delivery (datagram-based)

IP Service Model

No delivery guarantees on bandwidth, delay, etc.


Packet delayed for very long time Packet lost Packet delivered more than once Packets delivered out of order
237

Simplest model: ability of IP to run over anything

Internetwork
Concatenation of networks
H6 H1

H7 R3

Network 1 Ethernet
H2 H3

R1

Network 2
Point -topoint

R2

Network 3 FDDI
H5
H4

Network 4 Ethernet
H8

Protocol stack
H1 TCP R1 IP ETH PPP PPP IP FDDI FDDI R2 IP ETH R3 H8 TCP

IP
ETH

IP
ETH

238

IP Addresses
Class A: 0   | Network: 7 bits (126 nets)        | Host: 24 bits (16 million hosts)
Class B: 10  | Network: 14 bits (16K nets)       | Host: 16 bits (64K hosts)
Class C: 110 | Network: 21 bits (2 million nets) | Host: 8 bits (256 hosts)

18.10.5.22: host in class A network (MIT)
130.126.143.254: host in class B network (UIUC)
192.12.70.111: host in class C network

More recent classes


Multicast (class D): starts with 1110
Future expansions (class E): starts with 1111
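Because the class is encoded in the leading bits, it can be read off the first octet. A small sketch (the function name is illustrative):

```python
def address_class(addr):
    """Classify a dotted-quad IPv4 address by its leading bits."""
    first_octet = int(addr.split(".")[0])
    if first_octet < 128:    # leading bit 0
        return "A"
    if first_octet < 192:    # leading bits 10
        return "B"
    if first_octet < 224:    # leading bits 110
        return "C"
    if first_octet < 240:    # leading bits 1110
        return "D"
    return "E"               # leading bits 1111
```

Applied to the three example addresses above, this gives classes A, B and C respectively.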
239

Datagram Format
Bit offsets: 0, 4, 8, 16, 19, 31
Version | HLen | TOS | Length
Ident | Flags | Offset
TTL | Protocol | Checksum
SourceAddr
DestinationAddr
Options (variable) | Pad (variable)
Data

4-bit version (4 for IPv4, 6 for IPv6)
4-bit header length (in 32-bit words, minimum of 5)
8-bit type of service (TOS): more or less unused
16-bit datagram length (in bytes)
8-bit protocol (e.g. TCP=6 or UDP=17)
240

Internet Protocol (IP)


Service model: global addressing, host-to-host connectivity, best effort
Overview of message transmission
Host addressing and address translation
Datagram forwarding
Fragmentation and reassembly
Error reporting/control messages
Dynamic configuration
Protocol extensions through tunneling
Note: congestion control not handled by IP
241

Fragmentation and Reassembly Example


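[Figure: a datagram with 1400 data bytes travels H1 -> R1 -> R2 -> R3 -> H8. It crosses the Ethernet (H1-R1) and FDDI (R1-R2) hops as ETH | IP (1400) and FDDI | IP (1400). On the PPP hop (R2-R3) it is carried as three fragments, PPP | IP (512), PPP | IP (512) and PPP | IP (376), which are forwarded unchanged over the final Ethernet hop to H8.]

Unfragmented datagram: Ident = x, more-fragments flag = 0, Offset = 0, 1400 data bytes
Fragment 1: Ident = x, more-fragments flag = 1, Offset = 0, 512 data bytes
Fragment 2: Ident = x, more-fragments flag = 1, Offset = 64, 512 data bytes
Fragment 3: Ident = x, more-fragments flag = 0, Offset = 128, 376 data bytes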
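The fragment sizes and offsets in the example above can be reproduced with a short sketch. It assumes a 20-byte IP header without options and an MTU of 532 bytes on the PPP link (the MTU is not stated on the slide; 532 is the value that yields 512-byte fragments). The function name is hypothetical.

```python
IP_HEADER_BYTES = 20   # assumption: header without options

def fragment(data_len, mtu, header_len=IP_HEADER_BYTES):
    """Split data_len payload bytes into fragments that fit the given MTU.

    Offsets are expressed in 8-byte units, so every fragment except the
    last must carry a multiple of 8 data bytes.
    """
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len
        fragments.append({"offset": offset // 8, "bytes": size, "more": more})
        offset += size
    return fragments
```

Calling `fragment(1400, 532)` yields three fragments with offsets 0, 64 and 128, carrying 512, 512 and 376 bytes, with the more-fragments flag cleared only on the last one.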

242

Datagram Forwarding
Network #     Netmask        Next hop / port
18.0.0.0      255.0.0.0      1
128.32.0.0    255.255.0.0    2
0.0.0.0       0.0.0.0        3

dest: 18.26.10.0
mask with 255.0.0.0: matched! send to port 1

dest: 128.16.14.0
mask with 255.0.0.0: not matched
mask with 255.255.0.0: not matched
mask with 0.0.0.0: matched! send to port 3
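The mask-and-compare lookup in this example can be sketched as follows. The table entries are the ones shown on the slide; the function names are illustrative, and the entries are simply tried in table order, as in the example.

```python
import ipaddress

# (network number, netmask, next hop / port), in the order shown on the slide
FORWARDING_TABLE = [
    ("18.0.0.0", "255.0.0.0", 1),
    ("128.32.0.0", "255.255.0.0", 2),
    ("0.0.0.0", "0.0.0.0", 3),      # default route: matches everything
]

def to_int(addr):
    """Convert a dotted-quad address to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))

def lookup(dest):
    """Return the port of the first entry whose masked network matches."""
    d = to_int(dest)
    for network, mask, port in FORWARDING_TABLE:
        if (d & to_int(mask)) == to_int(network):
            return port
    return None
```

Because the all-zero mask always matches, the default entry has to come last in this first-match scheme.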
243

ARP Packet Format


Bit offsets: 0, 8, 16, 31
Hardware type = 1 | Protocol Type = 0x0800
HLen = 48 | PLen = 32 | Operation
SourceHardwareAddr (bytes 0-3)
SourceHardwareAddr (bytes 4-5) | SourceProtocolAddr (bytes 0-1)
SourceProtocolAddr (bytes 2-3) | TargetHardwareAddr (bytes 0-1)
TargetHardwareAddr (bytes 2-5)
TargetProtocolAddr (bytes 0-3)

244

Internet Control Message Protocol (ICMP)


IP companion protocol (not necessary) Handles error and control messages
FTP HTTP NV TFTP

TCP

UDP

IP

ICMP

FDDI

Ethernet

ATM
245

Sent to the source when a node is unable to process IP datagram successfully Error messages
Destination unreachable (protocol, port, or host)
Reassembly failed
IP checksum failed, or invalid header
TTL exceeded (so datagrams don't cycle forever)
Cannot fragment

ICMP Message

Control messages
Echo (ping) request and reply Redirect (from router to source host, to change route)
246

Dynamic Host Configuration Protocol - DHCP

DHCP server is required to provide configuration information to each host
  Each host retrieves this information at boot time

DHCP server can be configured manually, or it may allocate addresses on demand
  Addresses are leased for some period of time

Hosts are not preconfigured with the DHCP server's address; each host performs DHCP server discovery
  A broadcast discovery message is sent by the host and a unicast reply is sent by the server

Virtual Private Networks - VPN


Controlled connectivity
  Restrict forwarding to authorized hosts

Controlled capacity
  Change router drop and priority policies
  Provide guarantees on bandwidth, delay, etc.

Virtual net replaces leased line with shared net
Unwanted connectivity is prevented on this logical link using IP tunnel

IP Tunnel in VPNs
Virtual point-to-point link between a pair of nodes separated by many networks
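(Figure: Network 1 -- R1 -- Internetwork -- R2 -- Network 2; R2 has address 10.0.0.1)

Packet from Network 1 arriving at R1:
  [IP header, Destination = 2.x] [IP payload]

Packet inside the tunnel, R1 to R2:
  [IP header, Destination = 10.0.0.1] [IP header, Destination = 2.x] [IP payload]

Packet forwarded by R2 into Network 2:
  [IP header, Destination = 2.x] [IP payload]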
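The encapsulation step in the figure can be sketched in Python as follows. This is a toy model, assuming a packet is just a destination plus a payload; the `Packet` class, the function names, and the host address 2.0.0.7 are hypothetical, while 10.0.0.1 is the tunnel exit from the slide.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst: str          # destination address in the IP header
    payload: object   # application data, or an inner Packet when tunneled

def encapsulate(packet, tunnel_exit):
    """Tunnel entry (R1): wrap the packet in a new header addressed to the exit."""
    return Packet(dst=tunnel_exit, payload=packet)

def decapsulate(packet):
    """Tunnel exit (R2): strip the outer header and recover the inner packet."""
    inner = packet.payload
    if not isinstance(inner, Packet):
        raise ValueError("packet does not carry an encapsulated IP packet")
    return inner

original = Packet(dst="2.0.0.7", payload="IP payload")
in_tunnel = encapsulate(original, "10.0.0.1")
delivered = decapsulate(in_tunnel)
```

Routers inside the internetwork only look at the outer destination (10.0.0.1), so the inner 2.x address never has to be routable there.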

IP Tunneling for Multicast


Set up a tunnel between each pair of universities
Multicast packets
  Received by tunnel entry node
  Encapsulated (another IP header added for tunnel exit)
  Travel through the Internet (the tunnel)
  Received by tunnel exit node
  Unwrapped and delivered to another multicast-capable university campus

What is Routing ?
Definition: task of constructing and maintaining forwarding information (in hosts or in switches)
Goals for routing
  Capture notion of best routes
  Propagate changes effectively
  Require limited information exchange
  Admit efficient implementation

Important notion: graph representation of network

Routing Overview
Hierarchical routing infrastructure defines routing domains
  Where all routers are under same administrative control

Network as a Graph
  Nodes are routers
  Edges are links
  Each link has a cost

(Figure: example graph with nodes A to F and link costs 6, 3, 4, 9, 1, 1, 1, 2)

Problem: Find lowest cost path between two nodes
  Maintain information about each link
  Static: topology changes are not incorporated
  Dynamic (or distributed): complex algorithms

Routing Outline
Algorithms
Static shortest path algorithms Bellman-Ford: all pairs shortest paths to destination Dijkstras algorithm: single source shortest path Distributed, dynamic routing algorithms Distance Vector routing (based on Bellman-Ford) Link State routing (Dijkstras algorithm at each node)

Metrics (from ArpaNet, with informative names)


Original New Revised
253

Bellman-Ford Algorithm
Static, centralized algorithm (local iterations/destination)
Requires: directed graph with edge weights (cost)
Calculates: shortest paths for all directed pairs
Check use of each node as successor in all paths
  For every node N
    for each directed pair (B, C)
      is the path B -> N -> C better than B -> C?
      is cost B -> N -> destination smaller than previously known?
For N nodes
  Uses an N x N matrix of (distance, successor) values
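The check described above (try every node N as an intermediate hop for every directed pair) can be sketched in Python as follows. This triple-loop form over an N x N matrix is the one commonly called Floyd-Warshall. The function name, the dictionary layout, and the small example graph are assumptions made for illustration.

```python
import math

def all_pairs_shortest_paths(nodes, edges):
    """edges maps a directed pair (B, C) to its cost.

    Returns (dist, succ): dist[B][C] is the best known cost from B to C,
    succ[B][C] is the first hop on that path (None if unreachable or B == C).
    """
    dist = {b: {c: (0 if b == c else math.inf) for c in nodes} for b in nodes}
    succ = {b: {c: None for c in nodes} for b in nodes}
    for (b, c), cost in edges.items():
        dist[b][c] = cost
        succ[b][c] = c
    for n in nodes:                 # candidate intermediate node N
        for b in nodes:             # every directed pair (B, C)
            for c in nodes:
                if dist[b][n] + dist[n][c] < dist[b][c]:
                    dist[b][c] = dist[b][n] + dist[n][c]
                    succ[b][c] = succ[b][n]  # go toward N first
    return dist, succ

# Hypothetical four-node graph; each link is entered in both directions.
links = {("A", "B"): 5, ("A", "C"): 10, ("B", "C"): 3, ("B", "D"): 11, ("C", "D"): 2}
edges = {}
for (u, v), cost in links.items():
    edges[(u, v)] = cost
    edges[(v, u)] = cost
dist, succ = all_pairs_shortest_paths(["A", "B", "C", "D"], edges)
```

The three nested loops give a running time proportional to the cube of the number of nodes, which is acceptable for a static, centralized computation on a small graph.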

Dijkstra's Algorithm
Static, centralized algorithm, build tree from source
Requires: directed graph with edge weights (distance)
Calculates: shortest paths from 1 node to all others
Greedily grow set S of known minimum paths
From node N
  Start with S = {N} and one-hop paths from N
  Loop n-1 times
    add closest outside node M to S
    for each node P not in S
      is the path N -> ... -> M -> P better than N -> P?
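The loop on the slide can be sketched in Python as follows, using a plain scan for the closest outside node rather than a priority queue so that it mirrors the slide step by step. The function name, the adjacency-dictionary layout, and the example graph are assumptions for illustration.

```python
import math

def dijkstra(graph, source):
    """graph[u][v] is the cost of the directed edge u -> v.

    Returns (dist, prev): best cost from source to each node, and the
    node preceding it on that path.
    """
    dist = {node: math.inf for node in graph}
    prev = {node: None for node in graph}
    dist[source] = 0
    for p, cost in graph[source].items():   # one-hop paths from N
        dist[p] = cost
        prev[p] = source
    known = {source}                        # the set S
    for _ in range(len(graph) - 1):         # loop n-1 times
        outside = [node for node in graph if node not in known]
        m = min(outside, key=lambda node: dist[node])
        if dist[m] == math.inf:
            break                           # remaining nodes are unreachable
        known.add(m)                        # add closest outside node M to S
        for p, cost in graph[m].items():
            # Is N -> ... -> M -> P better than the best known N -> P?
            if p not in known and dist[m] + cost < dist[p]:
                dist[p] = dist[m] + cost
                prev[p] = m
    return dist, prev

# Hypothetical graph, symmetric link costs.
graph = {
    "A": {"B": 5, "C": 10},
    "B": {"A": 5, "C": 3, "D": 11},
    "C": {"A": 10, "B": 3, "D": 2},
    "D": {"B": 11, "C": 2},
}
dist, prev = dijkstra(graph, "D")
```

Following `prev` back from a destination to the source recovers the path, which forms the tree the slide refers to.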

Distance Vector Routing


Distributed, dynamic version of Bellman-Ford
Each node maintains distance vector: set of triples
  (Destination, Cost, NextHop)

Edge weights starting at a node assumed known by that node

Exchange updates of distance vector (Destination, Cost) with directly connected neighbors (known as advertising the routes)
  Periodically (on the order of several seconds to minutes)
  Whenever vector changes (called triggered update)
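How a node might fold a neighbor's advertisement into its own vector can be sketched in Python as follows. The function name `merge_update`, the table layout (destination mapped to a (cost, next hop) pair), and the sample entries are assumptions for illustration. A route is replaced when the advertised path is cheaper, or when the current route already goes through the advertising neighbor and its cost has changed.

```python
import math

def merge_update(table, neighbor, link_cost, update):
    """Merge a (Destination, Cost) advertisement received from neighbor.

    table maps destination -> (cost, next_hop). Returns True if the table
    changed, which is the condition for sending a triggered update.
    """
    changed = False
    for dest, advertised_cost in update.items():
        new_cost = link_cost + advertised_cost
        current = table.get(dest)
        better = current is None or new_cost < current[0]
        same_hop_changed = (
            current is not None
            and current[1] == neighbor
            and new_cost != current[0]
        )
        if better or same_hop_changed:
            table[dest] = (new_cost, neighbor)
            changed = True
    return changed

# Hypothetical table at node A, all links of cost 1.
table_a = {"A": (0, "A"), "C": (1, "C"), "F": (1, "F")}
learned = merge_update(table_a, "F", 1, {"F": 0, "G": 1, "A": 1})
after_learn = dict(table_a)
failed = merge_update(table_a, "F", 1, {"G": math.inf})   # F lost its link to G
after_fail = dict(table_a)
repaired = merge_update(table_a, "C", 1, {"G": 2})        # C offers a 2-hop path
```

The same-next-hop rule is what lets bad news propagate: when F reports infinity for G, A must accept it even though it is worse than the route A currently holds.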

Distance Vector Routing Example


Information in routing table of each node (iteration 3):

             Distance to reach node
At node      A  B  C  D  E  F  G
A            0  1  1  2  1  1  2
B            1  0  1  2  2  2  3
C            1  1  0  1  2  2  2
D            2  2  1  0  3  2  1
E            1  2  2  3  0  2  3
F            1  2  2  2  2  0  1
G            2  3  2  1  3  1  0

(Figure: example network topology, nodes A to G)

Distance Vector Routing: Link Failure


F detects that link to G has failed
F sets distance to G to infinity and sends update to A
A sets distance to G to infinity since it uses F to reach G
A receives periodic update from C with 2-hop path to G
A sets distance to G to 3 and sends update to F
F decides it can reach G in 4 hops via A

(Figure: example network topology)

Count to Infinity Problem


Link from A to E fails
A advertises distance of infinity to E, but B and C advertise a distance of 2 to E!
B decides it can reach E in 3 hops; advertises this to A
A decides it can reach E in 4 hops; advertises this to C
C decides that it can reach E in 5 hops
We are counting to infinity

(Figure: example network topology)

Split Horizon
Avoid counting to infinity by solving mutual deception problem
When sending an update to node X, do not include destinations that you would route through X
  If X thinks route is not through you, no effect
  If X thinks route is through you, X will time out the route

(Figure: chain A - B - C. A holds route C:2:B; B's route to C changes from C:1:C to C:∞:-)

Split horizon fails for loops of more than 2 nodes!

Split Horizon with Poison Reverse


When sending update to node X, include destinations that you would route through X with distance set to infinity
Don't need to wait for X to time out
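The difference between plain advertising, split horizon, and split horizon with poison reverse comes down to how a node builds the update it sends to one particular neighbor. A minimal Python sketch follows; the function name `build_update`, the `mode` strings, and the one-entry table are assumptions for illustration.

```python
import math

def build_update(table, to_neighbor, mode="split_horizon"):
    """Build the (Destination, Cost) update to send to to_neighbor.

    table maps destination -> (cost, next_hop).
    mode: "none", "split_horizon" or "poison_reverse".
    """
    update = {}
    for dest, (cost, next_hop) in table.items():
        if next_hop == to_neighbor:
            if mode == "split_horizon":
                continue              # leave the route out entirely
            if mode == "poison_reverse":
                cost = math.inf       # advertise it as unreachable
        update[dest] = cost
    return update

# Node A reaches C in 2 hops through B (entry C:2:B).
table_a = {"C": (2, "B")}
plain = build_update(table_a, "B", mode="none")
split = build_update(table_a, "B", mode="split_horizon")
poisoned = build_update(table_a, "B", mode="poison_reverse")
```

With `mode="none"`, A tells B that it can reach C at cost 2, which is the advertisement that misleads B once B's own link to C fails. Split horizon omits the entry; poison reverse sends it with infinite cost so B can discard the route immediately.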

Link State Routing


Distributed, dynamic form of Dijkstra's algorithm
Strategy
  Send to all nodes (not just neighbors) information about directly connected nodes (not entire route table) in LSP

Basic data structure: Link State Packet (LSP)
  ID of the node that created the LSP
  Cost of link to each directly connected neighbor: vector of (distance, successor) values
  Sequence number (SEQNO)
  Time-to-live (TTL) for this packet

Link State Routing


Each node maintains a list of (ideally all) LSPs
  Runs Dijkstra's algorithm on list
  May discover its neighbors by Hello messages

Information acquisition via reliable flooding
  Create new LSP periodically; send to 1-hop neighbors
  Increment SEQNO (start SEQNO at 0 on reboot)
  Store most recent (higher SEQNO) LSP from each node
  Forward new LSP to all neighbors but the one that sent it
  Decrement TTL of each LSP; discard when TTL=0
  Try to minimize routing traffic overhead
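The receive side of reliable flooding can be sketched in Python as follows. The `LSP` class, the function name `receive_lsp`, and the simplification that TTL is decremented once per hop (rather than also aged while stored) are assumptions for this example; acknowledgements and retransmission are left out.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LSP:
    origin: str     # ID of the node that created the LSP
    seqno: int      # sequence number
    ttl: int        # time-to-live
    links: tuple    # ((neighbor, cost), ...) for directly connected neighbors

def receive_lsp(database, lsp, from_neighbor, my_neighbors):
    """Process one incoming LSP; return the neighbors to forward it to.

    database maps origin -> most recent LSP stored for that origin.
    """
    stored = database.get(lsp.origin)
    if stored is not None and lsp.seqno <= stored.seqno:
        return []                       # old or duplicate copy: ignore
    aged = replace(lsp, ttl=lsp.ttl - 1)
    if aged.ttl <= 0:
        return []                       # TTL expired: discard
    database[lsp.origin] = aged         # keep the most recent LSP
    # Flood to every neighbor except the one it arrived from.
    return [n for n in my_neighbors if n != from_neighbor]

db = {}
first = receive_lsp(db, LSP("A", 1, 3, (("B", 5),)), "B", ["B", "C", "D"])
duplicate = receive_lsp(db, LSP("A", 1, 3, (("B", 5),)), "C", ["B", "C", "D"])
newer = receive_lsp(db, LSP("A", 2, 3, (("B", 5),)), "C", ["B", "C", "D"])
expired = receive_lsp(db, LSP("E", 1, 1, ()), "B", ["B", "C", "D"])
```

Dropping duplicates by sequence number is what stops the flood from circulating forever and keeps routing traffic overhead down.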

Route Calculation
At node D; entries are (Destination, Cost, NextHop)

Step  Confirmed list                            Tentative list
1.    (D,0,-)
2.    (D,0,-)                                   (C,2,C), (B,11,B)
3.    (D,0,-), (C,2,C)                          (B,11,B)
4.    (D,0,-), (C,2,C)                          (B,5,C), (A,12,C)
5.    (D,0,-), (C,2,C), (B,5,C)                 (A,12,C)
6.    (D,0,-), (C,2,C), (B,5,C)                 (A,10,C)
7.    (D,0,-), (C,2,C), (B,5,C), (A,10,C)

(Figure: four-node network with link costs A-B 5, A-C 10, B-C 3, B-D 11, C-D 2)
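The confirmed/tentative procedure traced in the table can be sketched in Python as follows. The function name `forward_search` and the `lsps` dictionary layout are assumptions for illustration; the link costs are the ones implied by the table (D sees C at 2 and B at 11, C adds B at 3 and A at 10, B adds A at 5).

```python
def forward_search(lsps, me):
    """lsps[node] maps each directly connected neighbor to the link cost.

    Returns the confirmed list as destination -> (cost, next_hop).
    """
    confirmed = {me: (0, None)}
    tentative = {}
    current = me
    while True:
        base_cost, base_hop = confirmed[current]
        # Examine the LSP of the node just confirmed.
        for neighbor, link_cost in lsps[current].items():
            if neighbor in confirmed:
                continue
            cost = base_cost + link_cost
            # The next hop is the neighbor itself when leaving from "me".
            next_hop = neighbor if current == me else base_hop
            if neighbor not in tentative or cost < tentative[neighbor][0]:
                tentative[neighbor] = (cost, next_hop)
        if not tentative:
            break
        # Move the lowest-cost tentative entry to the confirmed list.
        current = min(tentative, key=lambda node: tentative[node][0])
        confirmed[current] = tentative.pop(current)
    return confirmed

lsps = {
    "A": {"B": 5, "C": 10},
    "B": {"A": 5, "C": 3, "D": 11},
    "C": {"A": 10, "B": 3, "D": 2},
    "D": {"B": 11, "C": 2},
}
routes = forward_search(lsps, "D")
```

The result corresponds to step 7 of the table: C at cost 2, B at cost 5 and A at cost 10, all with next hop C.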
