A number of studies [Romanow93] [Minden94] have shown that TCP can perform badly over ATM networks, often wasting up to half the available bandwidth. In contrast, TCP performs well over packet-switched networks. This paper explores the causes of this difference and suggests some solutions.
A previous study [RomFloyd94] demonstrates that TCP works better if switches drop whole packets rather than individual cells, a technique called Early Packet Discard. The reason suggested is that otherwise bandwidth is wasted by transmitting fragments of partially dropped packets, which the receiving TCP cannot use.
This work suggests a different cause of poor TCP performance over ATM, based on detailed observations using hardware monitors in the CreditNet switch. TCP performs well only if the network tends to drop a single packet when TCP increases its window too much. In fact ATM switches tend to drop cells from multiple packets when the window increases too much, causing TCP to pause for a long time rather than continuing with a reduced window.
A practical conclusion from this is that TCP might work better if it could recover quickly from multiple dropped packets, as well as single drops. Such a solution might be easier than developing the hardware required for Early Packet Discard in ATM switches.
This is a simplified version of a paper written by H. T. Kung and Robert Morris that appeared in GLOBECOM '95.
The network involved is the experimental CreditNet ATM switch. The switch has a shared memory of 16 thousand cells, per-VC queuing with control on the output ports, and 16 ports that each run at 622 megabits/second. Since 622 megabit adapters are hard to find, we slow the ports down to 155 megabits/second. After SONET and ATM overhead, this leaves about 135 megabits/second (or 17 megabytes/second) available to TCP. The statistics in the graphs are taken from counters on the CreditNet switch that track the number of cells sent and received by each connection.
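The 135 megabits/second figure can be checked with a little arithmetic. The sketch below assumes the standard SONET STS-3c payload rate of 149.76 megabits/second and the standard 53-byte ATM cell with 48 bytes of payload. The cells-per-packet helper additionally assumes AAL5 framing with an 8-byte trailer, which the text does not state; all names are illustrative.

```python
import math

SONET_PAYLOAD_MBPS = 149.76   # STS-3c payload rate after SONET overhead
ATM_CELL_BYTES = 53           # 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8        # assumption: AAL5 framing, not stated in the text


def tcp_visible_rate_mbps(payload_mbps=SONET_PAYLOAD_MBPS):
    """Bandwidth left after the ATM cell header is accounted for."""
    return payload_mbps * ATM_PAYLOAD_BYTES / ATM_CELL_BYTES


def cells_per_packet(packet_bytes):
    """Cells needed to carry one packet plus the assumed AAL5 trailer."""
    return math.ceil((packet_bytes + AAL5_TRAILER_BYTES) / ATM_PAYLOAD_BYTES)


rate = tcp_visible_rate_mbps()
print(f"{rate:.1f} megabits/second, {rate / 8:.1f} megabytes/second")
print(cells_per_packet(9180), "cells per 9180-byte packet")
```

Under these assumptions a 9180-byte packet occupies 192 cells, so a handful of packets is a small share of the 16 thousand cells of shared memory. The experiments limit the buffer space per connection to far less than that.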
The CreditNet switch implements Partial Packet Discard as an option. This means that once the switch drops a cell from a packet, it will keep dropping cells until the end of the packet, whether or not it could buffer the cells. The switch does not drop the cell that marks the end of the packet unless the switch is still out of memory. This feature should be distinguished from Early Packet Discard [RomFloyd94], in which the switch decides whether to drop the entire packet when the first cell of the packet arrives. If the switch does not have roughly one packet's worth of space available, the entire packet is discarded.
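The difference between the two policies can be sketched with a toy buffer model. This is not the CreditNet implementation: the class, its method names, and the tiny buffer and packet sizes are hypothetical, and the Early Packet Discard threshold is taken to be exactly one packet's worth of free space.

```python
class Switch:
    """Toy cell buffer with a per-packet discard policy ("ppd" or "epd")."""

    def __init__(self, capacity_cells, packet_cells, policy):
        self.capacity = capacity_cells
        self.packet_cells = packet_cells   # EPD threshold: one packet's worth
        self.policy = policy
        self.queued = 0
        self.discarding = set()            # packets whose remaining cells are dropped

    def arrive(self, pkt, first, last):
        """Offer one cell; return True if it is buffered."""
        full = self.queued >= self.capacity
        if self.policy == "epd" and first:
            if self.capacity - self.queued < self.packet_cells:
                self.discarding.add(pkt)   # decide at the first cell
        if pkt in self.discarding:
            if last:
                self.discarding.discard(pkt)
                # PPD keeps the end-of-packet cell if there is room for it.
                if self.policy == "ppd" and not full:
                    self.queued += 1
                    return True
            return False
        if full:
            if not last:
                self.discarding.add(pkt)   # drop the rest of this packet too
            return False
        self.queued += 1
        return True

    def depart(self, cells=1):
        """Remove transmitted cells from the buffer."""
        self.queued = max(0, self.queued - cells)


def offer_packets(policy, packets=2, packet_cells=4, capacity_cells=6):
    """Send back-to-back packets with no departures; list the buffered cells."""
    sw = Switch(capacity_cells, packet_cells, policy)
    accepted = []
    for pkt in range(packets):
        for i in range(packet_cells):
            if sw.arrive(pkt, first=(i == 0), last=(i == packet_cells - 1)):
                accepted.append((pkt, i))
    return accepted


print("PPD buffers:", offer_packets("ppd"))
print("EPD buffers:", offer_packets("epd"))
```

In this toy run, Partial Packet Discard buffers two cells of the second packet before running out of space, and those cells are useless to the receiver. Early Packet Discard rejects the second packet at its first cell and buffers none of it.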
When connected back to back, or through the switch but with no bottleneck, two Alphas can talk TCP at about 15 megabytes/second. In this situation the packet size, often called MTU or Maximum Transmission Unit, has the most effect on efficiency. Except as noted, the experiments in this paper use an MTU of 9180 bytes.
Regardless of packet size, TCP performs badly unless the amount of buffer space is close to an entire 64Kbyte window, the maximum amount of data TCP will send before pausing to wait for an acknowledgment. Performance is good at slightly less than 64K because a few packets are effectively stored in the hosts. The following graph shows the bandwidth achieved over time by a single connection with 9180-byte packets and 48Kbytes of switch buffer space:
With much less than 64K of switch buffering, TCP sends small bursts of data separated by 1.5-second pauses. These pauses are due to retransmission timeouts caused when TCP's window [Jacobson88] exceeds the switch's buffer space. This happens within a few tens of milliseconds after TCP starts to retransmit each time: TCP can send a window in less than 10 milliseconds, and the window increases by one packet per window sent. Thus after TCP sends a few windows of data, the switch drops some packets because the window is larger than the buffer space. Since TCP's minimum timeout is at least one second, TCP spends far more time waiting to retransmit than it does sending data.
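A back-of-the-envelope calculation shows how lopsided this is. The window size, link rate, and 1.5-second pause come from the text. The 30-millisecond burst length is an illustrative assumption standing in for "a few tens of milliseconds", and the function names are hypothetical.

```python
WINDOW_BYTES = 64 * 1024     # maximum TCP window, from the text
LINK_BYTES_PER_S = 17e6      # about 17 megabytes/second, from the text
PAUSE_S = 1.5                # observed pause between bursts, from the text
ASSUMED_BURST_S = 0.030      # assumption: "a few tens of milliseconds"


def window_send_time(window_bytes=WINDOW_BYTES, rate=LINK_BYTES_PER_S):
    """Seconds needed to put one full window on the wire."""
    return window_bytes / rate


def sending_fraction(burst_s=ASSUMED_BURST_S, pause_s=PAUSE_S):
    """Share of wall-clock time the connection spends sending data."""
    return burst_s / (burst_s + pause_s)


print(f"one window takes {window_send_time() * 1000:.1f} ms")
print(f"sending {sending_fraction() * 100:.1f}% of the time")
```

With these numbers a full window takes roughly 4 milliseconds to send, and the connection is actively sending only about 2% of the time.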
TCP running through packet switches does not suffer from this problem. Each time TCP increases its window size to be one packet too large for the switch, the packet switch typically drops only one packet. TCP can detect a single lost packet with a mechanism called fast retransmit [Stevens95][Jacobson90], after which it decreases the window size and re-sends the lost packet with very little pause.
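The idea behind fast retransmit can be sketched in a few lines. This is a simplified model, not the code of any particular TCP. It assumes the commonly used threshold of three duplicate acknowledgments and assumes the window is halved on recovery; neither value is given in the text. Acknowledgments are modelled as "next packet expected" numbers.

```python
DUP_ACK_THRESHOLD = 3   # assumption: the commonly used value


def process_acks(acks, window_packets):
    """Scan a stream of ACKs; return (retransmitted packets, final window)."""
    last_ack = None
    duplicates = 0
    retransmitted = []
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == DUP_ACK_THRESHOLD:
                # The receiver keeps asking for the same packet: resend it
                # immediately instead of waiting for the retransmit timer.
                retransmitted.append(ack)
                window_packets = max(1, window_packets // 2)  # assumed halving
        else:
            last_ack = ack
            duplicates = 0
    return retransmitted, window_packets


# Packets 0..6 are sent and packet 2 is lost, so every later arrival
# makes the receiver repeat "I expect packet 2".
print(process_acks([1, 2, 2, 2, 2, 2], window_packets=8))
```

The sketch relies on later packets arriving to generate duplicate acknowledgments, which is why a single loss in a large window is recovered without a timeout.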
Why doesn't fast retransmit work over ATM? The following graph plots switch buffer occupancy in bytes just as a TCP connection is opening its window enough that the switch must drop data. Again, the switch input runs at 150 megabits/second, the output at 50, and there are 32 kilobytes of buffer space available. Each peak is caused by the arrival of a pair of packets at the full input rate. The peaks are spaced out because packets leave (and are acknowledged, triggering new transmissions) at the slower output rate. The window increase happens just before time 0.21. The two diamonds above the buffer-use line indicate times at which the switch hardware indicated it was dropping cells.
The switch typically drops cells from two packets when TCP opens its window too far. Since fast retransmit reliably recovers from only one lost packet, it leaves a lot of bandwidth unused while it times out.
Intuitively, TCP has increased the amount of data it wants the switch to buffer by one packet. Since the switch hasn't enough space, it must discard up to one packet's worth of data. A packet switch would drop one entire packet. But the ATM switch does not know about packets, so it typically drops a packet's worth of cells spread over multiple packets.
A more formal argument shows why drops from multiple packets are common. Under the old window size, some number of cells N tended to be free just after each packet arrived, and thus N cells plus one packet's worth were free just before each packet arrived. When TCP increases its window size by a packet, the switch can buffer the first N cells of this packet but must drop some of the rest. Since at least N extra cells are buffered, at most one packet's worth of buffer is available when the next packet arrives. If the next packet is even one cell early, some of it must be dropped.
The following artificial graph illustrates this argument. It plots switch buffer use as a function of time, much like the previous graph. In this graph, however, time is measured in output packet transmission times, and the Y axis in packets' worth of switch buffering. The input runs at three times the speed of the output. The blue diamonds mark the times at which the sender starts to send a packet; the packet at time 2.333 is the extra packet in a growing window. The green line shows what would happen if there were no limit on buffer space. The red line shows what happens in a switch that can buffer only two packets. The first half of the packet sent at time 2.333 is buffered. Some of the second half is dropped, but some is buffered since the switch is transmitting at the same time. These fragments cause the switch buffer to overflow again when the packet sent at time 3 arrives. In this way two packets are damaged.
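The scenario in this graph can be reproduced with a small fluid model that treats data as a continuous flow rather than as cells. The send times, the 3:1 speed ratio, and the two-packet buffer come from the description above. The starting occupancy of one packet at time 2 is an inferred assumption, chosen so that exactly the first half of the extra packet fits. The function name and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction as F

IN_RATE = F(3)                  # packets per time unit while a packet arrives
OUT_RATE = F(1)                 # packets per time unit drained by the output
PACKET_TIME = 1 / IN_RATE       # time for one packet to arrive


def simulate(send_times, capacity, start_time, start_occupancy):
    """Return {send time: fraction of that packet dropped by the switch}."""
    occupancy = start_occupancy
    now = start_time
    dropped = {}
    net = IN_RATE - OUT_RATE    # fill rate during an arrival; drop rate when full
    for t in send_times:        # assumed sorted and non-overlapping
        # Between packets the buffer only drains.
        occupancy = max(F(0), occupancy - OUT_RATE * (t - now))
        time_to_full = (capacity - occupancy) / net
        if time_to_full >= PACKET_TIME:
            occupancy += net * PACKET_TIME
            dropped[t] = F(0)
        else:
            # Once full, the switch accepts data only as fast as it drains.
            dropped[t] = net * (PACKET_TIME - time_to_full)
            occupancy = capacity
        now = t + PACKET_TIME
    return dropped


sends = [F(2), F(7, 3), F(3), F(4)]   # 7/3 is the extra packet at time 2.333
limited = simulate(sends, capacity=F(2), start_time=F(2), start_occupancy=F(1))
for t, lost in limited.items():
    print(f"packet sent at {float(t):.3f}: {lost} dropped")
```

Under these assumptions the model damages both the extra packet and the packet sent at time 3, and leaves the packets at times 2 and 4 intact. With a very large `capacity`, corresponding to the green line, nothing is dropped.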
Perhaps the reason that Early Packet Discard works so much better than Partial Packet Discard is that it concentrates all the dropped cells into a single packet, from which TCP's fast-retransmit mechanism can reliably recover.
[Jacobson88] Jacobson, V., "Congestion Avoidance and Control," ACM SIGCOMM, August 1988, ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
[Jacobson90] Jacobson, V., Note to end2end-interest mailing list, April 1990, vj-fast-retransmit.txt.
[Minden94] Minden, G., Frost, V., Evans, J. and Ewy, B., "TCP/ATM Experiences in the MAGIC Testbed," ftp://ftp.tisl.ukans.edu/pub/papers/TCP-Perform.ps.
[Romanow93] Romanow, Allyn, "TCP over ATM: Some Performance Results," ATM Forum/93-784 ftp://playground.sun.com/pub/tcp_atm/tcp_forum.7_93.ps.
[RomFloyd94] Romanow, A., and Floyd, S., "The Dynamics of TCP Traffic over ATM Networks," ACM SIGCOMM Computer Communications Review, October 1994, ftp://ftp.ee.lbl.gov/papers/tcp_atm.ps.Z.
[Stevens95] Stevens, W. Richard, and Wright, G., "TCP/IP Illustrated, Volume 2," Addison-Wesley, 1995.