
NS simulation implementing Large Window over TCP SACK

Debasree Banerjee
University of California, Santa Cruz
Santa Cruz, California 95060
December 4, 2003
Abstract

Since in a high-BDP environment the RTT is high, Large Windows (window scaling) allow TCP to fully utilize high-bandwidth links. In this paper the network simulator NS-2 has been used to simulate SACK TCP; window scaling has then been implemented over the original SACK TCP, and the improvement has been shown by measuring the TCP congestion window, throughput and end-to-end delay for different simulation scenarios.

1 Introduction

Recent work on TCP performance has shown that TCP can work well over a variety of Internet paths. However, these changes still do not work well over network paths that have a large bandwidth*delay product (BDP). The bandwidth*delay product is the amount of data that may be unacknowledged while all of the network's bandwidth is being utilized by TCP. This is also referred to as "filling the pipe": the sender of data can always put data onto the network, the receiver always has something to read, and neither end of the connection is forced to wait for the other end. In this paper SACK TCP has been used as the base for comparison, because the cumulative positive acknowledgments employed by plain TCP are not particularly well suited to the large bandwidth*delay environment, due to the time it takes to obtain information about segment loss. In the selective acknowledgment mechanism used by SACK TCP, the receiver explicitly informs the sender about which segments have arrived and which may have been lost, giving the sender more information about which segments need to be retransmitted. In addition to selective acknowledgment, SACK TCP uses Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery.

1.1 Large Windows

The original TCP standard limits the TCP receive window to 65535 bytes. TCP's receive window size is particularly important in a large-BDP environment because the maximum throughput of a TCP connection is bounded by the round-trip time (RTT), as seen in the formula:

    throughput = receiver buffer size / round trip time    (1)

For example, a 65535-byte window over a 200 ms round trip yields at most 65535 / 0.2 ≈ 320 KB/s (about 2.6 Mbps), no matter how fast the underlying link is; window scaling removes this cap.

1.2 Sequence number wrap-around

With high-bandwidth links, sequence number wrap-around is another problem. Avoiding reuse of sequence numbers within the same connection is simple in principle: enforce a maximum segment lifetime (MSL) shorter than the time it takes to cycle the sequence space, whose size is effectively 2^31. More specifically, if the maximum effective bandwidth at which TCP is able to transmit over a particular path is B bytes per second, then the following constraint must be satisfied for error-free operation:

    2^31 / B > MSL (secs)    (2)

So on a 1 Gbps link it takes only about 17 sec to wrap the sequence number (2^31 bytes / 125 MB/s ≈ 17.2 s). To avoid this problem the timestamp option is used, which pairs a timestamp with the sequence number. The timestamp mechanism is already implemented in NS-2; to enable it, the timestamp option needs to be set to true, as sketched below. In this simulation the bottleneck link is only 10Mb, 100ms, so the sequence wrap problem is not observed in the 10 sec simulation.
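A minimal sketch of enabling the option in an NS-2 Tcl script, assuming the stock timestamps_ bound variable of the TCP agents (check ns-default.tcl in your ns release for the exact name):

    # Enable the TCP timestamp option for every TCP agent created
    # afterwards (class-level default) ...
    Agent/TCP set timestamps_ true

    # ... or per agent, after it has been created:
    set tcp [new Agent/TCP/FullTcp/Sack]
    $tcp set timestamps_ true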

2 Background

2.1 Slow Start and Congestion Avoidance

In slow start, the sender starts by transmitting one segment and waiting for its ACK. When that ACK is received, the congestion window is incremented from one to two, and two segments can be sent. When each of those two segments is acknowledged, the congestion window is increased to four. This provides exponential growth. Congestion can occur when data arrives on a big pipe (a fast LAN) and gets sent out a smaller pipe (a slower WAN). Congestion can also occur when multiple input streams arrive at a router whose output capacity is less than the sum of the inputs. Congestion avoidance is a way to deal with lost packets. Congestion avoidance and slow start are independent algorithms with different objectives, but when congestion occurs TCP must slow down its transmission rate of packets into the network and then invoke slow start to get things going again, so in practice they are implemented together. Congestion avoidance and slow start require that two variables be maintained for each connection: a congestion window, cwnd, and a slow start threshold size, ssthresh. The combined algorithm operates as follows:

1. Initialization for a given connection sets cwnd to one segment and ssthresh to 65535 bytes.

2. The TCP output routine never sends more than the minimum of cwnd and the receiver's advertised window.

3. When congestion occurs (indicated by a timeout or the reception of duplicate ACKs), one-half of the current window size is saved in ssthresh. Additionally, if the congestion is indicated by a timeout, cwnd is set to one segment (i.e., slow start).

4. When new data is acknowledged by the other end, cwnd is increased, but the way it increases depends on whether TCP is performing slow start or congestion avoidance. If cwnd is less than or equal to ssthresh, TCP is in slow start; otherwise TCP is performing congestion avoidance. Slow start continues until TCP is halfway to where it was when congestion occurred (since it recorded half of the window size that caused the problem in step 3), and then congestion avoidance takes over.

Slow start has cwnd begin at one segment, incremented by one segment every time an ACK is received. As mentioned earlier, this opens the window exponentially: send one segment, then two, then four, and so on. Congestion avoidance dictates that cwnd be incremented by segsize*segsize/cwnd each time an ACK is received, where segsize is the segment size and cwnd is maintained in bytes. This is a linear growth of cwnd, compared to slow start's exponential growth. The increase in cwnd should be at most one segment each round-trip time (regardless of how many ACKs are received in that RTT), whereas slow start increments cwnd by the number of ACKs received in a round-trip time.
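The per-ACK update condenses into a few lines of Tcl. This is a minimal sketch of the textbook rule above, not ns-2's actual opencwnd() code; cwnd, ssthresh and segsize are all in bytes:

    # Per-ACK congestion window update (sketch).
    proc update-cwnd {cwnd ssthresh segsize} {
        if {$cwnd <= $ssthresh} {
            # Slow start: one full segment per ACK,
            # i.e. exponential growth per round trip.
            set cwnd [expr {$cwnd + $segsize}]
        } else {
            # Congestion avoidance: segsize*segsize/cwnd per ACK,
            # i.e. roughly one segment per round trip.
            set cwnd [expr {$cwnd + double($segsize * $segsize) / $cwnd}]
        }
        return $cwnd
    }

    # Example: 536-byte segments, ssthresh = 65535 bytes.
    puts [update-cwnd 536 65535 536]    ;# slow start: prints 1072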

2.2 Fast Retransmit

TCP generates an immediate acknowledgment (a duplicate ACK) when an out-of-order segment is received. The purpose of this duplicate ACK is to let the other end know that a segment was received out of order, and to tell it what sequence number is expected. Since TCP does not know whether a duplicate ACK is caused by a lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to be received. It is assumed that if there is just a reordering of the segments, there will be only one or two duplicate ACKs before the reordered segment is processed, which will then generate a new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. TCP then performs a retransmission of what appears to be the missing segment, without waiting for a retransmission timer to expire.
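The "three duplicate ACKs" test can be sketched as a tiny Tcl helper (illustrative only; ns-2 keeps this counter inside the TCP agent itself):

    # Returns 1 when the third duplicate ACK in a row indicates a
    # lost segment; resets the counter whenever the ACK advances.
    proc dupack-signals-loss {ackno lastAckno dupCountVar} {
        upvar $dupCountVar dupCount
        if {$ackno == $lastAckno} {
            incr dupCount
            return [expr {$dupCount >= 3}]
        }
        set dupCount 0
        return 0
    }

    set dups 0
    foreach ack {5 5 5 5} {
        puts [dupack-signals-loss $ack 5 dups]    ;# prints 0 0 1 1
    }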

2.3 Fast Recovery

After fast retransmit sends what appears to be the missing segment, congestion avoidance, but not slow start, is performed. This is the fast recovery algorithm. It is an improvement that allows high throughput under moderate congestion, especially for large windows. The reason for not performing slow start in this case is that the receipt of the duplicate ACKs tells TCP more than just that a packet has been lost. Since the receiver can only generate a duplicate ACK when another segment is received, that segment has left the network and is in the receiver's buffer. That is, there is still data flowing between the two ends, and TCP does not want to reduce the flow abruptly by going into slow start. The fast retransmit and fast recovery algorithms are usually implemented together as follows:

1. When the third duplicate ACK in a row is received, set ssthresh to one-half the current congestion window, cwnd. Retransmit the missing segment. Set cwnd to ssthresh plus 3 times the segment size. This inflates the congestion window by the number of segments that have left the network and which the other end has cached.

2. Each time another duplicate ACK arrives, increment cwnd by the segment size. This inflates the congestion window for the additional segment that has left the network. Transmit a packet, if allowed by the new value of cwnd.

3. When the next ACK arrives that acknowledges new data, set cwnd to ssthresh (the value set in step 1). This ACK should be the acknowledgment of the retransmission from step 1, one round-trip time after the retransmission. Additionally, this ACK should acknowledge all the intermediate segments sent between the lost packet and the receipt of the first duplicate ACK. This step is congestion avoidance, since TCP is down to one-half the rate it was at when the packet was lost.

The fast retransmit algorithm first appeared in the 4.3BSD Tahoe release, where it was followed by slow start. The fast recovery algorithm appeared in the 4.3BSD Reno release.
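These three steps translate directly into Tcl. Again, a minimal sketch of the textbook algorithm (all quantities in bytes), not the ns-2 source:

    # Step 1: third duplicate ACK. Halve ssthresh, then inflate
    # cwnd by the three segments known to have left the network.
    proc fr-on-third-dupack {cwnd ssthreshVar segsize} {
        upvar $ssthreshVar ssthresh
        set ssthresh [expr {$cwnd / 2}]
        # (the missing segment is retransmitted at this point)
        return [expr {$ssthresh + 3 * $segsize}]
    }

    # Step 2: each further duplicate ACK inflates cwnd by one segment.
    proc fr-on-extra-dupack {cwnd segsize} {
        return [expr {$cwnd + $segsize}]
    }

    # Step 3: the ACK for new data deflates cwnd back to ssthresh,
    # dropping straight into congestion avoidance.
    proc fr-on-new-ack {ssthresh} {
        return $ssthresh
    }

    set ssthresh 65535
    set cwnd [fr-on-third-dupack 17152 ssthresh 536]   ;# 8576 + 1608
    set cwnd [fr-on-new-ack $ssthresh]                 ;# back to 8576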

3 Simulation

For simulating TCP SACK with the large window option, the network simulator NS-2 has been used. The Agent/TCP/FullTcp/Sack variant of TCP in ns has been used as the base for comparison with the window scaling option. The topology used for this simulation is given below. All links are duplex and use the DropTail queueing mechanism. The topology consists of 10 nodes, n0 to n9; all links are 10Mb with 10ms delay, except the high bandwidth-delay links, which are 10Gb with 100ms delay in one topology and 100Gb with 10ms delay in the other. Traffic is generated from node 0 to 6, 1 to 7, 2 to 8 and 3 to 9 by FTP sources with a packet size of 150. The four TCP flows start at:

From 0 to 6: at 0.62 sec.
From 1 to 7: at 0.02 sec.
From 2 to 8: at 0.50 sec.
From 3 to 9: at 0.19 sec.

The simulation stops at 10 sec. Though the packetSize parameter is set to 150 (as specified), NS-2 sends packets using the default segment size (specified by segsize), which is 536 bytes. NS-2 assumes an infinite receiver buffer; therefore, to implement standard TCP SACK, the window parameter is set to 122 segments (65535 / segment size). For the modified SACK, I have used two scaling factors, 2 and 4, and then, with scaling 2 (i.e., window 244), I have used a reduced max ssthresh value (122). Basically, for each topology I have measured throughput, end-to-end delay (RTT) and congestion window (cwnd) for the 4 different parameter sets listed in Table 1. I have repeated the simulation for two different topologies - one with a 10Gb, 100ms core link and another with a 100Gb, 10ms core link.

    scenario    window (segments)    max ssthresh (segments)
    1           122                  488
    2           244                  488
    3           488                  488
    4           244                  122

    Table 1: Parameters for the different simulation scenarios.

The Tcl script emulating the topology, along with the Perl script used for calculating throughput, end-to-end delay and congestion window, is included in the appendix of this report. A sketch of the topology setup follows.
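Since the exact wiring is not spelled out in the text, the sketch below assumes a dumbbell: sources n0-n3 attach to router n4, sinks n6-n9 to router n5, with the high-BDP link between n4 and n5. Variable names follow stock ns-2; the author's actual script (in the appendix) may differ.

    set ns [new Simulator]
    for {set i 0} {$i < 10} {incr i} { set n($i) [$ns node] }

    # Edge links: 10Mb / 10ms. Core link: 10Gb / 100ms (use
    # 100Gb / 10ms for the second topology). All duplex, DropTail.
    for {set i 0} {$i < 4} {incr i} {
        $ns duplex-link $n($i) $n(4) 10Mb 10ms DropTail
        $ns duplex-link $n(5) $n([expr {$i + 6}]) 10Mb 10ms DropTail
    }
    $ns duplex-link $n(4) $n(5) 10Gb 100ms DropTail

    set tf [open out.tr w]
    $ns trace-all $tf

    # Four FTP-over-SACK flows with the start times given above.
    foreach {src dst at} {0 6 0.62  1 7 0.02  2 8 0.50  3 9 0.19} {
        set tcp [new Agent/TCP/FullTcp/Sack]
        $tcp set window_ 122              ;# 65535 / 536 (scenario 1)
        set sink [new Agent/TCP/FullTcp/Sack]
        $ns attach-agent $n($src) $tcp
        $ns attach-agent $n($dst) $sink
        $ns connect $tcp $sink
        $sink listen                      ;# FullTcp sinks must listen
        set ftp [new Application/FTP]
        $ftp attach-agent $tcp
        $ns at $at "$ftp start"
    }

    $ns at 10.0 "$ns flush-trace; close $tf; exit 0"
    $ns run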

3.1 Simulation Results

After running the simulation four times (once per scenario) with the above-mentioned parameters for each topology, the congestion window, end-to-end delay and throughput have been calculated from the simulation trace and plotted against time. The graphs are shown at the end; a sketch of the kind of trace post-processing involved follows.
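The author's post-processing used a Perl script (see the appendix); the Tcl fragment below is a hypothetical equivalent showing the idea: sum the bytes received at a sink over fixed intervals and print the throughput. It assumes the standard ns-2 trace format (event, time, from-node, to-node, type, size, ...) and the trace file name out.tr.

    set interval 0.5                 ;# seconds per throughput sample
    set bucketEnd $interval
    set bytes 0
    set f [open out.tr r]
    while {[gets $f line] >= 0} {
        # Trace fields: ev time from to type size flags fid src dst seq id
        set ev   [lindex $line 0]
        set time [lindex $line 1]
        set to   [lindex $line 3]
        set size [lindex $line 5]
        if {$ev eq "r" && $to == 6} {    ;# receive events at sink node 6
            while {$time > $bucketEnd} {
                puts "$bucketEnd [expr {$bytes / $interval}]"
                set bytes 0
                set bucketEnd [expr {$bucketEnd + $interval}]
            }
            incr bytes $size
        }
    }
    close $f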

4 Analysis

With no scale option, since the window is set to 122 segments (65535/536), cwnd can never go beyond that. With window scaling (factor 2), cwnd is allowed to increase up to 244 segments, so cwnd grows higher than in the first case. When the window is allowed to scale up to 4 times, there is no change compared to scale factor 2. This happens because of the bottleneck links at the edges, which are only 10Mb, 10ms; if these edge links were replaced with higher-bandwidth links, scaling the window further would increase cwnd. If max ssthresh is reduced from 488 segments to 122 segments with the window set to 244 segments, then cwnd increases. This is because, with a smaller max ssthresh, TCP enters congestion avoidance once cwnd reaches the max ssthresh limit, so it enters congestion avoidance sooner than in the other scenarios. As a result the network experiences less packet loss, and cwnd grows linearly over time in congestion avoidance mode.

For the first topology, the cwnd vs. time plot shows that at around 3.5 sec of simulation, under both the no-scale and window-scale scenarios, the network experiences packet loss and cwnd drops to half its previous value. With reduced max ssthresh there is no congestion at that point, because the connection is already in congestion avoidance mode and cwnd is increasing linearly instead of exponentially. In the case of the second topology, at around 7 sec under the no-scale option, cwnd reaches the window limit (122 segments) and stays at that value for the rest of the simulation; this is avoided by window scaling.

The throughput vs. time graphs show a similar pattern. Throughput is lowest under the no-scale scenario, and it is higher with window scaling and reduced max ssthresh than with window scaling and the higher max ssthresh. The graphs for the two topologies have a similar shape; only the values differ. Thus window scaling increases throughput, and a reduced max ssthresh increases it further.
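For reference, the scenarios of Table 1 differ only in two sender-side settings. A sketch (window_ is the stock ns-2 bound variable, counted in segments; maxssthresh_ is a hypothetical name standing in for the cap the author added to the modified SACK agent, since stock ns-2 has no such bound variable):

    set tcp [new Agent/TCP/FullTcp/Sack]
    # Scenario 2: window scale factor 2.
    $tcp set window_ 244          ;# 2 * 122 segments
    # Scenario 4: scale factor 2 plus the reduced ssthresh cap.
    $tcp set maxssthresh_ 122     ;# hypothetical modified-SACK variable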

The end-to-end delay vs. time graphs show that whenever there is congestion in the network the delay increases; then, as the network stabilizes, the delay returns to its previous value.

5 Conclusion


After analysing the simulation results, it can be concluded that window scaling improves the overall performance of high bandwidth*delay links. At the same time, in order to fully utilize the bandwidth, the max ssthresh value has to be chosen very carefully. If the max ssthresh value is too high, the congestion window will race toward it under slow start, the network will become congested, and effective throughput will drop. If the max ssthresh value is too low, the window will go into congestion avoidance too early and the link will be underutilized. In the future, another simulation experiment could be run with different max ssthresh values and different window scaling factors to find the optimum values for a given topology.


References
[1] Kevin Fall and Sally Floyd. Simulation-based comparisons of Tahoe, Reno and SACK TCP. Computer Communication Review, 26(3):5-21, July 1996.

[2] V. Jacobson. Congestion avoidance and control. ACM Computer Communication Review, 18(4):314-329, August 1988. (Proceedings of the SIGCOMM '88 Symposium, Stanford, CA.)

[3] V. Jacobson and R. T. Braden. RFC 1072: TCP extensions for long-delay paths, October 1988.

[4] S. McCanne and S. Floyd. ns - network simulator.


Figure 1: Congestion window vs. time. The first graph is for the 10Gb, 100ms topology and the second one is for the 100Gb, 10ms topology. cwnd1 - no scale; cwnd2 - window scale = 2; cwnd3 - window scale = 4; cwnd4 - window scale = 2 and max ssthresh = 122 segments.

[Plots omitted: throughput (x 10^3) vs. time and RTT (x 10^-3) vs. time for both topologies; see Figures 2 and 3.]

Figure 2: Throughput vs. time. The first graph is for the 10Gb, 100ms topology and the second one is for the 100Gb, 10ms topology. throughput1 - no scale; throughput2 - window scale = 2; throughput3 - window scale = 4; throughput4 - window scale = 2 and max ssthresh = 122 segments.

Figure 3: End-to-end delay vs. time. The first graph is for the 10Gb, 100ms topology and the second one is for the 100Gb, 10ms topology. rtt1 - no scale; rtt2 - window scale = 2; rtt3 - window scale = 4; rtt4 - window scale = 2 and max ssthresh = 122 segments.
