Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP
This is not a lecture on how to program TCP/IP, but rather an introduction to how its major portions work:
  IP Addressing: IP addresses, ARP, routing
  ICMP
  UDP
  TCP: flow control, error recovery, establishment, disconnect
References:
  Internetworking with TCP/IP, Volume I: Principles, Protocols & Architecture, by Douglas Comer
  TCP/IP Illustrated: The Protocols, by W. Richard Stevens
  Most information is also available free via Web searches
Overview
Connectionless packet delivery service
Layering allows one to replace one service without affecting others
IP layer (basic unit of transfer in TCP/IP) provides:
  Best-effort delivery (does not discard capriciously)
  Unreliable delivery (no guarantees): a packet may be lost, duplicated, or delivered out of order, with no notification
  Connectionless delivery (each packet is treated independently)
IP software provides routing
Internet datagram
Basic transfer unit
Datagram header Datagram data area
Vers (4 bits): version of IP protocol (IPv4 = 4)
Hlen (4 bits): header length in 32 bit words; without options (the usual case) = 5, i.e. 20 bytes
Type of Service, TOS (8 bits): little used in the past, now being used for QoS
Total length (16 bits): length of datagram in bytes, includes header and data
Time to live, TTL (8 bits): specifies how long the datagram is allowed to remain in the internet
  Routers decrement it by 1
  When TTL = 0 the router discards the datagram
  Prevents infinite loops
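The header fields above can be decoded with a few lines of Python. This is only a sketch: the function name `parse_ipv4_header` and the sample header values are made up for illustration, and it reads only the fixed 20-byte part of the header (no options).

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_hlen, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    hlen_words = ver_hlen & 0x0F          # low 4 bits: header length in 32-bit words
    return {
        "version": ver_hlen >> 4,         # high 4 bits: IP version
        "hlen_words": hlen_words,
        "hlen_bytes": hlen_words * 4,
        "tos": tos,
        "total_length": total_len,        # header + data, in bytes
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hand-built sample header; every value here is an illustrative assumption.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 84, 1, 0, 64, 1, 0,
                     socket.inet_aton("134.79.125.205"),
                     socket.inet_aton("210.56.16.10"))
fields = parse_ipv4_header(sample)
print(fields["version"], fields["hlen_bytes"], fields["ttl"])
```

The first byte 0x45 packs both Vers (4) and Hlen (5 words, i.e. 20 bytes), which is why the two are separated with a shift and a mask.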
IP Fragmentation
How do we send a datagram of, say, 1400 bytes through a link that has a Maximum Transmission Unit (MTU) of, say, 620 bytes?
Answer: the datagram is broken into fragments
[Figure: Net 1 (MTU=1500), Net 2 (MTU=620), Net 3 (MTU=1500)]
Identification: copied into each fragment, allows the destination to know which fragments belong to which datagram
Fragment Offset (13 bits): specifies the offset in the original datagram of the data being carried in the fragment
  Measured in units of 8 bytes, starting at 0
Fragmentation Control
  More Fragments (least significant bit): set in every fragment except the last; when it is clear, the receiver knows it has got the last fragment
TCP traffic is hardly ever fragmented (due to use of MTU discovery). About 0.1% - 0.5% of TCP packets are fragmented.
NB. If the data area contains its own header (e.g. a TCP header), that header is not replicated in each fragment
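The 1400-byte example can be worked through numerically. The sketch below assumes that 1400 bytes is the total datagram length including a 20-byte header without options, and that every fragment gets its own 20-byte IP header; the function name `fragment` is hypothetical.

```python
def fragment(total_len: int, mtu: int, hlen: int = 20) -> list:
    """Split a datagram into fragments that fit the given MTU.

    Assumes a fixed header of `hlen` bytes that is copied into each fragment.
    """
    payload = total_len - hlen
    # Data per fragment must be a multiple of 8, because the offset
    # field counts in units of 8 bytes.
    max_data = (mtu - hlen) // 8 * 8
    frags = []
    offset = 0
    while offset < payload:
        size = min(max_data, payload - offset)
        frags.append({
            "offset_units": offset // 8,            # value carried in Fragment Offset
            "data_bytes": size,
            "total_length": size + hlen,
            "more_fragments": offset + size < payload,
        })
        offset += size
    return frags

fragments = fragment(1400, 620)
for f in fragments:
    print(f)
```

Under these assumptions the datagram becomes three fragments carrying 600, 600 and 180 data bytes at offsets 0, 75 and 150 (in 8-byte units), with More Fragments set on the first two only.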
Internet Addressing
IP address is a 32 bit integer
  Refers to an interface rather than a host
  Consists of network and host portions
    Enables routers to keep 1 entry/network instead of 1/host
Class A, B, C for unicast
Class D for multicast
Class E reserved
Classless addresses
Subnets
A subnet mask is applied to the host bits to determine how the network is subnetted, e.g. if the host is 137.138.28.228 and the subnet mask is 255.255.255.0, then the right-hand 8 bits are for the host (255 is decimal for all bits set in an octet)
A host address with all host bits set indicates a broadcast, i.e. the packet is sent to all hosts on the subnet; a host address with no host bits set identifies the subnet itself (some older implementations also treated it as a broadcast)
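The subnet example can be checked with Python's standard `ipaddress` module. This is a small sketch using the address and mask quoted above; the variable names are illustrative only.

```python
import ipaddress

# Host address and subnet mask from the example above.
iface = ipaddress.ip_interface("137.138.28.228/255.255.255.0")

network = iface.network                      # the subnet the host sits on
host_part = int(iface.ip) & int(network.hostmask)   # right-hand 8 bits
broadcast = network.broadcast_address        # all host bits set

print(network, host_part, broadcast)
```

ANDing the address with the mask gives the subnet, and ANDing it with the inverted mask (the host mask) gives the host part.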
Prefix Length
Prefix  Subnet mask      Prefix  Subnet mask
/1      128.0.0.0        /17     255.255.128.0
/2      192.0.0.0        /18     255.255.192.0
/3      224.0.0.0        /19     255.255.224.0
/4      240.0.0.0        /20     255.255.240.0
/5      248.0.0.0        /21     255.255.248.0
/6      252.0.0.0        /22     255.255.252.0
/7      254.0.0.0        /23     255.255.254.0
/8      255.0.0.0        /24     255.255.255.0
/9      255.128.0.0      /25     255.255.255.128
/10     255.192.0.0      /26     255.255.255.192
/11     255.224.0.0      /27     255.255.255.224
/12     255.240.0.0      /28     255.255.255.240
/13     255.248.0.0      /29     255.255.255.248
/14     255.252.0.0      /30     255.255.255.252
/15     255.254.0.0      /31     255.255.255.254
/16     255.255.0.0      /32     255.255.255.255

Decimal Octet  Binary Number
128            1000 0000
192            1100 0000
224            1110 0000
240            1111 0000
248            1111 1000
252            1111 1100
254            1111 1110
255            1111 1111
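The mask for any prefix length follows mechanically from setting that many leading bits. A minimal sketch, with the helper name `prefix_to_mask` chosen only for this example:

```python
def prefix_to_mask(prefix_len: int) -> str:
    """Return the dotted-decimal subnet mask for a prefix length of 0 to 32."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    # Set the top `prefix_len` bits of a 32-bit word.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    # Split the word into four octets, most significant first.
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

for length in (1, 16, 21, 32):
    print(f"/{length}", prefix_to_mask(length))
```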
Address depletion
In 1991 IAB identified 3 dangers
  Running out of class B addresses
  Increase in nets has resulted in routing table explosion
  Increase in nets/hosts exhausting the 32 bit address space
Private IP Addresses
IP addresses that are not globally unique, but are used exclusively within an organization
Three ranges:
  10.0.0.0 - 10.255.255.255: a single class A net
  172.16.0.0 - 172.31.255.255: 16 contiguous class Bs
  192.168.0.0 - 192.168.255.255: 256 contiguous class Cs
Since they are assigned contiguously, the class C nets in a CIDR block share the same most significant bits & so need only one routing table entry
CIDR block is represented by a prefix and a prefix length
  Prefix = single address representing the block of nets, e.g.
    192.32.136.0 = 11000000 00100000 10001000 00000000 while
    192.32.143.0 = 11000000 00100000 10001111 00000000
    share a 21 bit prefix (2048 host addresses)
  Prefix length indicates the number of routing bits, e.g.
    192.32.136.0/21 means 21 bits are used for routing
CIDR collects all nets in the range 192.32.136.0 through 192.32.143.0 into a single router entry, which reduces router table entries
Removes the class A, B & C address boundaries
For more details see RFC 1519
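The aggregation of 192.32.136.0 through 192.32.143.0 can be reproduced with the standard `ipaddress` module. This is a sketch of the arithmetic only, not of how a router builds its table.

```python
import ipaddress

# The eight contiguous class C sized nets from the example above.
class_c_nets = [ipaddress.ip_network(f"192.32.{third}.0/24")
                for third in range(136, 144)]

# collapse_addresses merges adjacent networks into the fewest CIDR blocks.
aggregated = list(ipaddress.collapse_addresses(class_c_nets))
block = aggregated[0]

print(aggregated)            # one entry replaces eight
print(block.num_addresses)   # addresses covered by the block
```

Because 136 is a multiple of 8, the eight nets share their first 21 bits and collapse into the single entry 192.32.136.0/21.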
IP address is at the network layer; need to map it to the link layer MAC (Ethernet) address
Use ARP to map a 32 bit IP address to a 48 bit Ethernet address
  IP requests the MAC address for an IP address from the local ARP table
  If not there, then an ARP request packet for the IP address is sent using the physical broadcast address (all FFs)
  Host with the requested IP address responds with its MAC address as a unicast packet
  On return, the host updates its ARP table and returns the MAC address
  ARP cache entries time out
  ARP packets are carried directly on top of Ethernet
ARP cont.
ARP requests are local only, do not cross routers
[Figure: Subnet 1 (134.79.10.17, 134.79.10.1) and Subnet 2 (134.79.15.1, 134.79.15.3), with hosts User A and User B]
Compare local IP and subnet mask => local subnet
Compare local subnet to destination IP
  if local, ARP for MAC address
  else remote, so
    if ROUTE entry, ARP for router to subnet
    if default route, ARP for default gateway
    otherwise, drop packet & return error
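The decision above can be sketched as a small function. Everything here is an assumption made for illustration: the function name `next_hop_to_arp`, the /24 masks, the use of 134.79.10.1 as the router, and the 134.79.10.254 default gateway are not stated in the slides, and the route table is a plain dictionary rather than a real kernel table.

```python
import ipaddress

def next_hop_to_arp(local_if, dest, routes, default_gateway=None):
    """Return the IP address to ARP for, or None if the packet must be dropped.

    local_if: local address with mask, e.g. "134.79.10.17/24"
    routes:   mapping of destination network -> router address (hypothetical table)
    """
    dest_ip = ipaddress.ip_address(dest)
    local_net = ipaddress.ip_interface(local_if).network
    if dest_ip in local_net:
        return str(dest_ip)              # local: ARP for the destination itself
    for net, router in routes.items():
        if dest_ip in ipaddress.ip_network(net):
            return router                # remote with a ROUTE entry: ARP for that router
    return default_gateway               # default gateway, or None -> drop & return error

routes = {"134.79.15.0/24": "134.79.10.1"}   # assumed route to subnet 2
print(next_hop_to_arp("134.79.10.17/24", "134.79.10.99", routes))
print(next_hop_to_arp("134.79.10.17/24", "134.79.15.3", routes))
print(next_hop_to_arp("134.79.10.17/24", "210.56.16.10", routes, "134.79.10.254"))
print(next_hop_to_arp("134.79.10.17/24", "210.56.16.10", routes))
```

Note that in every case the host ARPs only for an address on its own subnet, which is consistent with ARP requests not crossing routers.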
Routing
Routers must select the next hop for each packet
Get route information from other routers via a routing protocol (RIP, OSPF, EIGRP etc.)
Note the following are non-routable:
  Private networks: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
  Loopback: 127.0.0.0/8
Packet format
ICMP Echo/Ping
Very commonly used diagnostic tool
Implementations vary between OS
Build echo request
  Host reachable
  Round trip timing
  Lost packets
  Packet reordering
  Duplicate packets
Example:
13cottrell@noric05:~>ping -c 4 lhr.comsats.net.pk
PING lhr.comsats.net.pk (210.56.16.10) from 134.79.125.205 : 56(84) bytes of data.
64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=0 ttl=242 time=716.962 msec
64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=1 ttl=242 time=720.375 msec
64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=2 ttl=242 time=725.907 msec
64 bytes from lhr.comsats.net.pk (210.56.16.10): icmp_seq=3 ttl=242 time=710.734 msec
--- lhr.comsats.net.pk ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/mdev = 710.734/718.494/725.907/5.566 ms
Unreachable
76cottrell@flora06:~>ping islamabad-server2.comsats.net.pk
ICMP 13 Unreachable from gateway 207.45.205.18
 for icmp from FLORA06.SLAC.Stanford.EDU (134.79.16.101) to islamabad-server2.comsats.net.pk (210.56.8.8)
Time Exceeded
Message format (bit positions 0, 8, 16, 24, 31):
  Type = 11 (bits 0-7), Code (bits 8-15), Checksum (bits 16-31)
  Unused (32 bits)
  Internet header & 8 bytes of data
Time-to-live has expired at a router (code=0)
TTL sets a bound on the number of routers a datagram can transit
  Prevents infinite routing loops
  Initialized by sender, decremented by 1 each time the datagram passes a router
  When TTL = 0 the datagram is thrown away & the sender is notified by an ICMP message
MTU Discovery
Path MTUs vary
Fragmentation is bad
Small transmission units are bad
So need to discover the optimum MTU (largest without fragmentation)
Host sends a packet with the Don't Fragment (DF) bit set
  Length is the lesser of the local MTU and the MSS announced by the remote system
If the MTU between hosts requires fragmentation (e.g. at an intermediate router), then
  if the DF bit is set & the router must fragment, an ICMP message is sent back to the source, saying "I can't fragment, try again with a smaller size"
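The discovery loop can be simulated without touching a real network. The sketch below assumes that the ICMP message reports the MTU of the link that could not carry the packet (as RFC 1191 specifies); the function name `discover_path_mtu` and the list-of-links model are illustrative only.

```python
def discover_path_mtu(local_mtu: int, link_mtus: list) -> tuple:
    """Simulate path MTU discovery with the DF bit set.

    Returns (path_mtu, probes_sent). Each link in `link_mtus` stands for one
    hop; a hop whose MTU is smaller than the packet "sends back" an ICMP
    error carrying its own MTU (an assumption modelled on RFC 1191).
    """
    size = local_mtu
    probes = 0
    while True:
        probes += 1
        # First link on the path that cannot carry a packet of this size.
        blocking = next((m for m in link_mtus if m < size), None)
        if blocking is None:
            return size, probes          # packet got through unfragmented
        size = blocking                  # retry with the reported smaller size

# Path from the fragmentation example: MTUs of 1500, 620 and 1500 bytes.
path_mtu, probes = discover_path_mtu(1500, [1500, 620, 1500])
print(path_mtu, probes)
```

On this path the first 1500-byte probe is rejected at the 620-byte link, and the second probe of 620 bytes gets through.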
UDP
[Figure: transport layer (TCP, UDP) above network layer (IP)]
UDP header fields: Source port, Destination port, UDP message len, Checksum (opt.), followed by Data
Source/destination port: port numbers identify sending & receiving processes
  Port number & IP address allow any application in any computer on the Internet to be uniquely identified
  Used to demultiplex datagrams to processes
  Ports can be static or dynamic
    Static (< 1024): assigned centrally, known as well-known ports
    Dynamic
UDP applications
  Message oriented, e.g. SNMP, DNS, time
  File system, e.g. NFS, AFS
  Lightweight file transfer, e.g. tftp, bootp
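The four UDP header fields listed earlier are each 16 bits wide, so the whole header is 8 bytes. A minimal sketch of building and parsing one follows; the function names and the port numbers used are illustrative assumptions, and the checksum is left at 0 (meaning "not computed" over IPv4) rather than calculated.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # source port, destination port, length, checksum

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend an 8-byte UDP header to the payload (checksum left as 0)."""
    length = UDP_HEADER.size + len(payload)   # length covers header and data
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload

def parse_udp_datagram(raw: bytes) -> dict:
    """Split a UDP datagram into its header fields and data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(raw[:UDP_HEADER.size])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": raw[UDP_HEADER.size:length],
    }

# 40000 is an arbitrary dynamic port; 53 is the well-known DNS port.
datagram = build_udp_datagram(40000, 53, b"query")
parsed = parse_udp_datagram(datagram)
print(parsed)
```

The receiving host uses the destination port in this header to pick which process gets the data.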
Provides buffering and flow control
Takes care of lost packets, out-of-order packets, duplicates, long delays
Isolates application program from network details
Jargon
  Segment = TCP packet
  Socket = source (address + port) + destination (address + port)
TCP layering
[Figure: applications on Port 1 and Port 2 above the transport layer (TCP, UDP), which sits above the network layer (IP); TCP is carried as IP protocol 6]
To ID a connection need:
  Source: (address, port) AND Destination: (address, port)
  Only need one port on a host to allow multiple connections, since each connection will have a different (host, port) at the other end
    E.g. a single host can serve multiple telnet connections
Passive open: application contacts OS & indicates it will accept incoming connections; OS assigns port and listens
Active open: application requests OS to connect to a (host, port)
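The reason one server port is enough can be shown by keying connections on the full (source address, source port, destination address, destination port) tuple. This is a toy model, not how an OS stores its connection table; the addresses and client ports are made up for illustration.

```python
# Toy connection table keyed by the 4-tuple that identifies a TCP connection.
connections = {}

def deliver(src_addr, src_port, dst_addr, dst_port, data):
    """Append data to the connection identified by the full 4-tuple."""
    key = (src_addr, src_port, dst_addr, dst_port)
    connections.setdefault(key, []).append(data)
    return key

server = ("134.79.16.101", 23)   # one telnet port on the server

# Two clients, plus two sessions from the same client host: all use server port 23.
deliver("134.79.10.17", 40001, *server, b"login from A")
deliver("134.79.15.3", 40001, *server, b"login from B")
deliver("134.79.10.17", 40002, *server, b"second session from A")
deliver("134.79.10.17", 40001, *server, b"more data from A")

for key, chunks in connections.items():
    print(key, chunks)
```

Three distinct connections share the single server port because each differs in the (host, port) at the other end.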
[Figure: time sequence of network messages; receiver site: Rcv pkt 1, Send ACK 1, Rcv pkt 2, Send ACK 2]
Window slides
[Figure: window sliding over packets 1 2 3 4 5 6 7 8]
Bandwidth end to end, i.e. min(BW of links), AKA bottleneck bandwidth
Round Trip Time (RTT)
For TCP, keep the pipe full
  Window (sometimes called pipe) ~ RTT*BW
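The window rule is simple arithmetic and is worth working once. The link speed, RTT and window size below are assumed values chosen for illustration, not measurements from these slides; the function names are hypothetical.

```python
def window_bytes(bottleneck_bits_per_s: int, rtt_ms: int) -> int:
    """Window needed to keep the pipe full: RTT * bottleneck bandwidth, in bytes."""
    return bottleneck_bits_per_s * rtt_ms // (8 * 1000)

def max_throughput_bytes_per_s(window: int, rtt_ms: int) -> float:
    """Best-case throughput when at most one window is sent per round trip."""
    return window * 1000 / rtt_ms

# Assumed example: 100 Mbit/s bottleneck, 150 ms round trip time.
needed = window_bytes(100_000_000, 150)
print(needed)                                   # bytes in flight to fill the pipe

# With only a 65535-byte window on the same path, throughput is capped well
# below the link speed.
print(max_throughput_bytes_per_s(65535, 150))   # bytes per second
```

With these numbers the pipe holds about 1.9 MBytes, so a 64 KByte window can use only a small fraction of the bottleneck bandwidth.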
[Figure: Src and Rcv exchanging data and ACKs over one RTT]
Implementation
Sender keeps 3 pointers:
  Bytes sent and acknowledged
  Highest byte sent
  Highest byte that can be sent
Receiver keeps a similar window to put the stream back together
Since TCP is full duplex, altogether there are 4 windows & pointer sets
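The three sender-side pointers can be modelled in a few lines. This is a simplified sketch: the class name `SendWindow` is hypothetical, byte numbering starts at 0, and retransmission, timers and sequence number wrap-around are all left out.

```python
class SendWindow:
    """Toy model of a TCP sender's sliding window using three pointers."""

    def __init__(self, window_size: int):
        self.acked = 0                  # bytes sent and acknowledged
        self.sent = 0                   # highest byte sent
        self.window_size = window_size

    @property
    def limit(self) -> int:
        """Highest byte that can be sent."""
        return self.acked + self.window_size

    def send(self, nbytes: int) -> int:
        """Send up to nbytes; returns how many the window actually allowed."""
        allowed = min(nbytes, self.limit - self.sent)
        self.sent += allowed
        return allowed

    def ack(self, ack_no: int) -> None:
        """A cumulative ACK slides the window forward."""
        if self.acked < ack_no <= self.sent:
            self.acked = ack_no

w = SendWindow(window_size=8)
first = w.send(10)      # only 8 bytes fit in the window
stalled = w.send(1)     # window is full, nothing can be sent
w.ack(3)                # receiver acknowledges 3 bytes, window slides
after_ack = w.send(5)   # 3 more bytes may now be sent
print(first, stalled, after_ack, w.acked, w.sent, w.limit)
```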
Sender adjusts its window to match the receiver's advertisement
If receiver buffers fill, it sends smaller advertisements
  Used to match buffer requirements of receiver
  Also used to address congestion control (e.g. in intermediate routers)
TCP segment fields: Source port, Destination port, Sequence number, Acknowledgement number, Hlen, Resv, Code, Window, Checksum, Urgent ptr, Options (if any), Padding, Data (if any)
Source/Dest port: TCP port numbers to ID applications at both ends of connection
Sequence number: IDs position in sender's byte stream
TCP records the time each segment is sent and the time its ACK is received
Then calculates an RTT sample
Smooth the samples (RTTs) & use them to estimate the timeout, e.g.
  Timeout = beta * RTTs
  Timeout = RTTs + eta * f(dev(RTTs)), with eta = 4
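The second timeout formula can be sketched as a small estimator. The slides do not give the smoothing gains or the exact deviation function, so the values below (gains of 1/8 and 1/4, mean absolute deviation, first-sample initialization) are assumptions borrowed from RFC 6298; the class name `RttEstimator` is hypothetical, and the minimum timeout that RFC 6298 imposes is left out.

```python
class RttEstimator:
    """Smoothed RTT and timeout estimate: Timeout = RTTs + k * deviation."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25, k: int = 4):
        self.alpha = alpha    # gain for the smoothed RTT (assumed, per RFC 6298)
        self.beta = beta      # gain for the deviation (assumed, per RFC 6298)
        self.k = k            # the multiplier called eta (= 4) in the text
        self.srtt = None      # smoothed RTT
        self.rttvar = None    # smoothed deviation of the RTT samples

    def update(self, sample_ms: float) -> float:
        """Fold in one RTT sample and return the new timeout in ms."""
        if self.srtt is None:
            self.srtt = sample_ms
            self.rttvar = sample_ms / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample_ms)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample_ms
        return self.srtt + self.k * self.rttvar

est = RttEstimator()
timeouts = [est.update(sample) for sample in (100.0, 100.0, 200.0)]
print(timeouts)
```

A steady RTT lets the deviation term decay so the timeout tightens, while a sudden jump in RTT widens it again.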
Need a timeout estimate that will work from LANs (RTT < 1 msec.) to satellite WANs (hundreds of msec. to secs)
RTT can vary a lot with time of day, day of week, or from one second to the next
[Figure: TCP timeout; RTT (ms.) versus time of day, May 12th]
[Figure: connection establishment between Site 1 and Site 2. Site 2: Rcv SYN segment; Send SYN seq=y, ACK x+1; Rcv ACK segment]
Initial sequence numbers (x, y) are chosen randomly
Guarantees both sides are ready & know it, and sets initial sequence numbers; also sets window & MSS
Once the connection is established, data can flow in both directions equally well; there is no master or slave
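The sequence and acknowledgement numbers exchanged during establishment can be traced with a short simulation. This is a sketch only: segments are modelled as dictionaries, no real sockets are used, and the function name `three_way_handshake` is hypothetical.

```python
import random

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits and wrap around

def three_way_handshake(rng: random.Random) -> list:
    """Return the three segments of a TCP connection establishment."""
    x = rng.randrange(SEQ_SPACE)   # initial sequence number chosen by site 1
    y = rng.randrange(SEQ_SPACE)   # initial sequence number chosen by site 2
    syn = {"from": "site 1", "flags": {"SYN"}, "seq": x}
    syn_ack = {"from": "site 2", "flags": {"SYN", "ACK"},
               "seq": y, "ack": (x + 1) % SEQ_SPACE}
    ack = {"from": "site 1", "flags": {"ACK"},
           "seq": (x + 1) % SEQ_SPACE, "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

segments = three_way_handshake(random.Random())
for seg in segments:
    print(seg)
```

Each side acknowledges the other's initial sequence number plus one, which is how both learn that the other is ready.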
[Figure: connection close between Site 1 and Site 2; recoverable labels: Rcv FIN + ACK seg, Send ACK y+1, Receive ACK segment]
App tells TCP to close; TCP sends remaining data & waits for ACK, then sends FIN
Site 2 TCP ACKs FIN, tells its application end of data
Site 2 sends FIN when its app closes the connection (may be a long delay, e.g. if it requires human interaction)
More Information
Encyclopaedia
http://www.freesoft.org/CIE/index.htm
TCP/IP Resources
www.private.org.il/tcpip_rl.html
Understanding IP addresses
http://www.3com.com/solutions/en_US/ncs/501302.html
atlas is a WNT PC, sunstats is a Sun Solaris 5.6 host
MSS is set in a TCP option in a SYN segment; it communicates the MSS the sender wants to receive
len=ip_hlen/tcp_hlen:ip_total_len
Initial Sequence Numbers are randomly selected
Telnet = port 23
W = receive window size, advertises how much data this host will accept
Session start
SLAC>CERN: 256kbyte window,1 stream,
full speed > 30msec, 13MBytes in 20s, 5.1MBytes/s
Congestion window