Detailed Behavior of TCP over ATM

rtm@eecs.harvard.edu

A number of studies [Romanow93] [Minden94] have shown that TCP can perform badly over ATM networks, often wasting up to half the available bandwidth. In contrast, TCP performs well over packet-switched networks. This paper explores the causes of this difference and suggests some solutions.

A previous study [RomFloyd94] demonstrates that TCP works better if switches drop whole packets rather than individual cells, a technique called Early Packet Discard. The reason suggested is that otherwise bandwidth is wasted by transmitting fragments of partially dropped packets, which the receiving TCP cannot use.

This work suggests a different cause of poor TCP performance over ATM, based on detailed observations using hardware monitors in the CreditNet switch. TCP performs well only if the network tends to drop a single packet when TCP increases its window too much. In fact ATM switches tend to drop cells from multiple packets when the window increases too much, causing TCP to pause for a long time rather than continuing with a reduced window.

A practical conclusion from this is that TCP might work better if it could recover quickly from multiple dropped packets, as well as single drops. Such a solution might be easier than developing the hardware required for Early Packet Discard in ATM switches.

This is a simplified version of a paper written by H. T. Kung and Robert Morris that appeared in GLOBECOM '95.

Experimental Setup

The host systems in the following graphs are all DEC Alpha 3000/400 workstations running OSF/1 V3.0. The OSF/1 TCP implementation [Chang93] is derived from 4.3-Reno. This TCP tends to acknowledge, and thus transmit, pairs of packets. The TCP window size is always 64K bytes. The workstations use 155 megabit/second OTTO TurboChannel adapters kindly donated by the DEC Systems Research Center.

The network involved is the experimental CreditNet ATM switch. The switch has a shared memory of 16 thousand cells, per-VC queuing with control on the output ports, and sixteen 622 megabit/second ports. Since 622 megabit adapters are hard to find, we slow the ports down to 155 megabits/second. After SONET and ATM overhead, this leaves about 135 megabits/second (or 17 megabytes/second) available to TCP. The statistics in the graphs are taken from counters on the CreditNet switch that track the number of cells sent and received by each connection.
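
As a rough check of the overhead arithmetic, the short sketch below works through the numbers. The OC-3 line rate of 155.52 megabits/second and the 149.76 megabit/second SONET payload capacity are assumed figures (the text states only the net result); ATM cells carry 48 payload bytes out of 53.

# Rough check of the SONET and ATM overhead arithmetic (assumed figures).
line_rate      = 155.52e6        # bits/second on the fiber (OC-3)
sonet_payload  = 149.76e6        # bits/second left after SONET framing
atm_efficiency = 48.0 / 53.0     # payload fraction of each 53-byte ATM cell

tcp_rate_bits = sonet_payload * atm_efficiency
print(tcp_rate_bits / 1e6)       # about 135.6 megabits/second
print(tcp_rate_bits / 8 / 1e6)   # about 17 megabytes/second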

The CreditNet switch implements Partial Packet Discard as an option. This means that once the switch drops a cell from a packet, it keeps dropping cells until the end of that packet, whether or not it has room to buffer them. The switch does not drop the cell that marks the end of the packet unless it is still out of memory. This feature should be distinguished from Early Packet Discard [RomFloyd94], in which the switch decides whether to drop the entire packet when the first cell of the packet arrives: if the switch does not have roughly one packet's worth of space available at that point, the entire packet is discarded.
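
To make the contrast concrete, here is a minimal Python sketch of the two policies as just described, for a single virtual circuit. It is an illustration of the logic, not the CreditNet hardware, and it omits dequeuing (the queued count would be decremented as cells leave on the output port).

class PartialPacketDiscard:
    # Once a cell of a packet is dropped, drop the rest of that packet,
    # but let the end-of-packet (EOP) cell through if there is room for it.
    def __init__(self, capacity):
        self.capacity = capacity      # buffer size, in cells
        self.queued = 0               # cells currently buffered
        self.dropping = False         # inside a partially dropped packet?

    def on_cell(self, eop):
        if self.dropping:
            accept = eop and self.queued < self.capacity
        else:
            accept = self.queued < self.capacity
            if not accept:
                self.dropping = True  # discard the rest of this packet
        if eop:
            self.dropping = False     # the next cell starts a new packet
        if accept:
            self.queued += 1
        return accept

class EarlyPacketDiscard:
    # Decide at the first cell of a packet whether to accept or drop the
    # whole packet, based on whether a packet's worth of space is free.
    def __init__(self, capacity, cells_per_packet):
        self.capacity = capacity
        self.cells_per_packet = cells_per_packet
        self.queued = 0
        self.at_packet_start = True
        self.accepting = True

    def on_cell(self, eop):
        if self.at_packet_start:
            self.accepting = self.capacity - self.queued >= self.cells_per_packet
        self.at_packet_start = eop    # the cell after an EOP cell starts a packet
        accept = self.accepting and self.queued < self.capacity
        if accept:
            self.queued += 1
        return accept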

When connected back to back, or through the switch but with no bottleneck, two Alphas can talk TCP at about 15 megabytes/second. In this situation the packet size, often called MTU or Maximum Transmission Unit, has the most effect on efficiency. Except as noted, the experiments in this paper use an MTU of 9180 bytes.

One TCP With a Bottleneck

The simplest situation in which TCP has trouble over ATM involves a host with a fast link to a switch sending to a host with a slower link. The following graph shows useful throughput for a range of switch buffer sizes and packet sizes (MTUs). The input link runs at 155 megabits per second, and the output link runs at 53 megabits, or about 5.7 megabytes per second after SONET and ATM overhead. The switch buffer sizes account only for the memory used by the 48-byte payloads of ATM cells.

Regardless of packet size, TCP performs badly unless the amount of buffer space is close to an entire 64-Kbyte window, the maximum amount of data TCP will send before pausing to wait for an acknowledgment. Performance is good at slightly less than 64K because a few packets are effectively stored in the hosts. The following graph shows the bandwidth achieved over time by a single connection with 9180-byte packets and 48 Kbytes of switch buffer space:

With much less than 64K of switch buffering, TCP sends small bursts of data separated by 1.5-second pauses. These pauses are due to retransmission timeouts, which occur when TCP's window [Jacobson88] grows beyond the switch's buffer space. This happens within a few tens of milliseconds after TCP starts to retransmit each time: TCP can send a window in less than 10 milliseconds, and the window increases by one packet per window sent. Thus after TCP sends a few windows of data, the switch drops some packets because the window is larger than the buffer space. Since TCP's minimum timeout is at least one second, TCP spends far more time waiting to retransmit than it does sending data.
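
The timing argument can be worked through numerically. The sketch below uses the figures quoted above; the number of windows sent before the buffer overflows is an assumption chosen only to show the order of magnitude.

# Back-of-the-envelope version of the timing argument (illustrative values).
window_bytes        = 64 * 1024                  # TCP window
input_rate          = 17e6                       # usable bytes/second on the 155 Mbit link
window_time         = window_bytes / input_rate  # time to send one window
windows_before_drop = 4                          # assumption: a few windows until overflow
sending_time        = windows_before_drop * window_time
timeout             = 1.5                        # seconds, the observed pause

fraction_sending = sending_time / (sending_time + timeout)
print(window_time)        # about 0.004 seconds per window
print(fraction_sending)   # roughly 1%: mostly waiting, not sending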

TCP running through packet switches does not suffer from this problem. Each time TCP increases its window size to be one packet too large for the switch, the packet switch typically drops only one packet. TCP can detect a single lost packet with a mechanism called fast retransmit [Stevens95][Jacobson90] after which it decreases the window size and re-sends the lost packet with very little pause.
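
For reference, here is a schematic of the fast-retransmit rule in the style of a 4.3-Reno sender. It is a simplified illustration of the mechanism described in [Stevens95] and [Jacobson90], not the OSF/1 implementation, and it omits the details of fast recovery.

DUP_ACK_THRESHOLD = 3    # three duplicate ACKs suggest a single lost packet

class RenoSender:
    def __init__(self, mss):
        self.mss = mss               # segment (packet) size in bytes
        self.cwnd = 10 * mss         # congestion window (arbitrary starting value)
        self.last_ack = 0            # highest cumulative ACK seen so far
        self.dup_acks = 0

    def on_ack(self, ack):
        if ack == self.last_ack:
            self.dup_acks += 1
            if self.dup_acks == DUP_ACK_THRESHOLD:
                # One packet appears lost: resend it and halve the window,
                # without waiting for the (>= 1 second) retransmission timer.
                self.retransmit(self.last_ack)
                self.cwnd = max(self.cwnd // 2, self.mss)
        else:
            # New data acknowledged: congestion avoidance grows the window
            # by roughly one packet per window's worth of ACKs.
            self.cwnd += self.mss * self.mss // self.cwnd
            self.last_ack = ack
            self.dup_acks = 0

    def retransmit(self, seq):
        print("fast retransmit of segment starting at", seq)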

Why doesn't fast retransmit work over ATM? The following graph plots switch buffer occupancy in bytes just as a TCP connection is opening its window enough that the switch must drop data. Again, the switch input runs at 150 megabits/second, the output at 50, and there are 32 kilobytes of buffer space available. Each peak is caused by the arrival of a pair of packets at the full input rate. The peaks are spaced out because packets leave (and are acknowledged, triggering new transmissions) at the slower output rate. The window increase happens just before time 0.21. The two diamonds above the buffer use line indicate times at which the switch hardware indicated it was dropping cells.

The switch typically drops cells from two packets when TCP opens its window too far. Since fast retransmit reliably recovers from only one lost packet, it leaves a lot of bandwidth unused while it times out.

Intuitively, TCP has increased the amount of data it wants the switch to buffer by one packet. Since the switch hasn't enough space, it must discard up to one packet's worth of data. A packet switch would drop one entire packet. But the ATM switch does not know about packets, so it typically drops a packet's worth of cells spread over multiple packets.

A more formal argument can be made that drops from multiple packets are common. Under the old window size, some number of cells N tended to be free just after each packet arrived, and thus N plus one packet's worth just before each packet arrived. When TCP increases its window size by a packet, the switch can buffer the first N cells of this packet but must drop some of the rest. Since at least N extra cells are buffered, at most one packet's worth of buffer is available when the next packet arrives. If the next packet is even one cell early, some of it must be dropped.

The following artificial graph illustrates this argument. It plots switch buffer use as a function of time, much like the previous graph. In this graph, however, time is measured in output packet transmission times, and the Y axis is in packets' worth of switch buffering. The input runs at three times the speed of the output. The blue diamonds mark the times at which the sender starts to send a packet; the packet at time 2.333 is the extra packet in a growing window. The green line shows what would happen if there were no limit on buffer space. The red line shows what happens in a switch that can buffer only two packets. The first half of the packet sent at time 2.333 is buffered. Some of the second half is dropped, but some is buffered since the switch is transmitting at the same time. These fragments cause the switch buffer to overflow again when the packet sent at time 3 arrives. In this way two packets are damaged.
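
The same behavior can be reproduced with a toy cell-level simulation. The parameters below are made up for illustration (30 cells per packet, a two-packet buffer, an input link three times faster than the output, and a standing queue chosen so that a few cells are free after each packet arrives); the point is only that, without partial packet discard, the extra packet and the packet after it both lose cells.

# Toy cell-level reconstruction of the argument above; illustrative parameters,
# not the CreditNet trace. Time advances in input-cell times; the output link
# removes one cell every third tick.
CELLS_PER_PACKET = 30
BUFFER = 2 * CELLS_PER_PACKET            # two packets' worth of cells
queued = 35                              # standing queue left by the old window

# Packet start times, in output-packet times, as in the graph; the packet at
# time 2.333 is the extra packet sent by the growing window.
starts = [0, 1, 2, 2.333, 3, 4]
ticks_per_output_packet = 3 * CELLS_PER_PACKET
arrivals = {}                            # tick -> index of the packet arriving then
for pkt, t in enumerate(starts):
    first = round(t * ticks_per_output_packet)
    for i in range(CELLS_PER_PACKET):
        arrivals[first + i] = pkt

dropped = {}                             # packet index -> cells dropped from it
for tick in range(max(arrivals) + 1):
    if tick % 3 == 0 and queued > 0:     # output link sends one cell every 3 ticks
        queued -= 1
    if tick in arrivals:                 # one cell arrives on the input link
        if queued < BUFFER:
            queued += 1
        else:
            pkt = arrivals[tick]
            dropped[pkt] = dropped.get(pkt, 0) + 1

print(dropped)   # with these parameters, packets 3 and 4 both lose cells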

Discard Policies

In a switch without partial packet discard, more than N cells are buffered from the extra packet, since the queue is draining as the extra packet arrives. Thus less than one packet's worth of space is free when the next packet arrives, so it too will be damaged even if it does not arrive early. With partial packet discard, only N cells are buffered, so exactly one packet's worth of buffer is available when the next packet arrives. Partial packet discard, then, should allow TCP to transmit a few more packets before suffering a loss and timing out. The following graph, almost identical to the previous graph of the effects of buffer space and MTU on throughput, shows that turning Partial Packet Discard on has little effect:

Perhaps the reason that Early Packet Discard works so much better than Partial Packet Discard is that it concentrates all the dropped cells into a single packet, from which TCP's fast-retransmit mechanism can reliably recover.

Two TCPs with a Bottleneck

The same phenomenon occurs when two TCPs enter a switch on different fast links and leave the switch sharing a slower link. The TCPs rarely compete against each other. Both spend most of their time in retransmit timeouts; whenever either starts to send, it almost immediately opens its window far enough that the switch drops multiple packets.

Conclusions

It is an ATM switch's tendency to drop cells from multiple packets that causes TCP to perform badly over ATM. Some direct improvements suggest themselves, such as implementing Early Packet Discard in switches or improving 4.3-Reno TCP's attempts to recover from multiple lost packets. Reducing TCP's minimum retransmit interval to below one second might decrease its reliance on fast retransmit.

References

[Chang93] Chang, Chran-Ham, et al., "High-performance TCP/IP and UDP/IP Networking in DEC OSF/1 for Alpha AXP," Digital Technical Journal, Winter 1993, ftp://ftp.digital.com/pub/Digital/info/DTJ/nw-06-tcp.ps

[Jacobson88] Jacobson, V., "Congestion Avoidance and Control," ACM SIGCOMM, August 1988, ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

[Jacobson90] Jacobson, V., Note to end2end-interest mailing list, April 1990, vj-fast-retransmit.txt.

[Minden94] Minden, G., Frost, V., Evans, J. and Ewy, B., "TCP/ATM Experiences in the MAGIC Testbed," ftp://ftp.tisl.ukans.edu/pub/papers/TCP-Perform.ps.

[Romanow93] Romanow, Allyn, "TCP over ATM: Some Performance Results," ATM Forum/93-784, ftp://playground.sun.com/pub/tcp_atm/tcp_forum.7_93.ps.

[RomFloyd94] Romanow, A., and Floyd, S., "The Dynamics of TCP Traffic over ATM Networks," ACM SIGCOMM Computer Communications Review, October 1994, ftp://ftp.ee.lbl.gov/papers/tcp_atm.ps.Z.

[Stevens95] Stevens, W. Richard, and Wright, G., "TCP/IP Illustrated, Volume 2," Addison-Wesley, 1995.