IP Fragmentation

This chapter is a précis of Chapter 21 of Doug Comer's book [Comer 2004]. It explains the concepts of encapsulation, fragmentation, and reassembly.

Encapsulation

When a host or router handles a datagram, the IP software determines the next hop to which the datagram should be sent. A datagram may therefore traverse many physical networks, each with its own frame format. On each network, the IP datagram is encapsulated in the data area of a frame, as illustrated in Figure 1.

Figure 1: IP datagram encapsulated in a hardware frame

In practice, some hardware technologies include a frame trailer as well as a frame header. The receiver of an incoming frame knows that the payload is an IP datagram because of the value of the frame type field in the frame header.
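
As a rough illustration of how the frame type field identifies the payload, the following Python sketch builds and parses an Ethernet II header. Ethernet is used here only as one example of a hardware technology; the function names and the addresses in the usage note are made up for illustration, and the frame trailer is omitted. The type value 0x0800 is the one assigned to IPv4.

```python
import struct

ETHERTYPE_IPV4 = 0x0800   # frame type value assigned to IPv4
HEADER_LEN = 14           # destination (6) + source (6) + type (2)

def encapsulate(dst_mac: bytes, src_mac: bytes, datagram: bytes) -> bytes:
    """Place an IP datagram in the data area of an Ethernet II frame (no trailer)."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_IPV4)
    return header + datagram

def decapsulate(frame: bytes) -> bytes:
    """Return the IP datagram carried in the frame, or raise if the type is not IPv4."""
    _dst, _src, frame_type = struct.unpack("!6s6sH", frame[:HEADER_LEN])
    if frame_type != ETHERTYPE_IPV4:
        raise ValueError(f"not an IPv4 payload: type 0x{frame_type:04x}")
    return frame[HEADER_LEN:]
```

For example, decapsulate(encapsulate(b"\x02\x00\x00\x00\x00\x01", b"\x02\x00\x00\x00\x00\x02", datagram)) returns the original datagram unchanged, which is all the receiver needs from the frame.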

Encapsulation applies to one transmission at a time. When the frame arrives at the next hop, the IP datagram is removed and the frame is discarded. If the datagram must be forwarded across another network, a new frame is created and the encapsulation is repeated. Each network can use a different hardware technology from the others, so the frame formats can differ. Figure 2 illustrates how a datagram appears as it is encapsulated and decapsulated on its way from source to destination.

Figure 2: Traversing different networks

Hosts and routers store the datagram in memory; when it is transmitted, it is encapsulated in a frame suitable for the outgoing network. Each hardware technology specifies the maximum amount of data that a frame can carry. This limit is called the Maximum Transmission Unit (MTU). Figure 3 illustrates a router connecting two networks with different MTUs.

Figure 3: Differing MTUs

Host H2 can transmit only datagrams of 1,000 octets or fewer, all of which router R can forward across network 1. However, if host H1 transmits a 1,500-octet datagram, it exceeds the MTU of network 2, so router R cannot send it across that network in one piece.

Fragmentation

IP uses a technique called fragmentation to solve the problem of heterogeneous MTUs. When a datagram is larger than the MTU of the network over which it must be sent, it is divided into smaller fragments which are each sent separately. This process is illustrated in Figure 4.

Figure 4: Fragmentation

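To fragment a datagram, a host or router uses the MTU and the datagram header size to calculate the maximum amount of data that each fragment can carry, and hence how many fragments are required. The data in every fragment except the last must be a multiple of 8 octets, because the FRAGMENT OFFSET field counts in units of 8 octets. The header of the original datagram is then copied into the header of each fragment, with the following fields changed: the total length, the MORE FRAGMENTS flag, the FRAGMENT OFFSET, and the header checksum.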

Each fragment becomes its own datagram and is routed independently of any other datagram. This makes it possible for the fragments of the original datagram to arrive at the final destination out of order. A fragment may itself be fragmented again if it later has to cross a network with a still smaller MTU.
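
The fragmentation step described above can be sketched in Python as follows. This is a simplified illustration, not a real IP implementation: Fragment and fragment are hypothetical names, the header is treated as an opaque block of header_len octets, and the offset is kept in octets (the real FRAGMENT OFFSET field stores it divided by 8).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Fragment:
    ident: int     # IDENTIFICATION, copied from the original datagram
    offset: int    # position of this data in the original payload, in octets
    more: bool     # MORE FRAGMENTS flag
    data: bytes

def fragment(ident: int, payload: bytes, mtu: int, header_len: int = 20,
             base_offset: int = 0, more_after: bool = False) -> List[Fragment]:
    """Split payload so that header_len + len(data) <= mtu for every fragment."""
    # Every fragment except the last must carry a multiple of 8 octets.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    frags = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        is_last = start + max_data >= len(payload)
        # When re-fragmenting a fragment that was not the last one,
        # MORE FRAGMENTS stays set on every piece (more_after=True).
        frags.append(Fragment(ident, base_offset + start,
                              more_after or not is_last, chunk))
    return frags
```

Under these assumptions, fragment(556, bytes(1481), mtu=1500) produces a 1,480-octet fragment at offset 0 with MORE FRAGMENTS set and a 1-octet fragment at offset 1480 with the flag clear. The base_offset and more_after parameters exist only so that the same function can split an existing fragment again while keeping offsets relative to the original datagram.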

At the final destination, the process of reconstructing the original datagram is called reassembly. The IDENTIFICATION field, which the sender sets to a unique value for each datagram, lets the receiver group the fragments of one datagram together and keep them apart from the fragments of other datagrams from the same source. The FRAGMENT OFFSET field tells the receiver how to order the fragments. A fragment with the MORE FRAGMENTS flag clear is the last fragment. An example is illustrated in Figure 5.

Figure 5: Traversing an internet

If host H1 sends a 1,500-octet datagram (20-octet header and 1,480 octets of data) to host H2, router R will fragment the datagram into two fragments.
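
The sizes of the two fragments can be worked out with a few lines of Python. The MTU of 1,000 octets is an assumption taken from network 2 in Figure 3; with a different MTU the numbers change accordingly.

```python
MTU = 1000      # assumed MTU of the network between R and H2, in octets
HEADER = 20     # IP header without options
DATA = 1480     # data octets in the original datagram

# Largest multiple of 8 that fits after the header: (1000 - 20) // 8 * 8 = 976
per_fragment = (MTU - HEADER) // 8 * 8

fragments = []  # (offset field in 8-octet units, data octets, total octets, MORE FRAGMENTS)
offset = 0
while offset < DATA:
    length = min(per_fragment, DATA - offset)
    more = offset + length < DATA
    fragments.append((offset // 8, length, HEADER + length, more))
    offset += length

for field, length, total, more in fragments:
    print(field, length, total, more)
```

With these values the first fragment carries 976 data octets (996 octets in total) at offset 0 with MORE FRAGMENTS set, and the second carries the remaining 504 data octets (524 in total) with an offset field of 122, that is 976 / 8, and the flag clear.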

Consider a practical example. Ethernet's MTU is 1,500 bytes. The IP header takes 20 bytes and the ICMP header takes 8 bytes; adding these to 1,473 data bytes gives a 1,501-byte datagram, one byte more than the MTU. Figure 6 illustrates the output from the ping command.

penguin(108)% ping -c2 -s1473 rhea
PING rhea.dcs.bbk.ac.uk (193.61.29.2) from 193.61.29.127: 1473(1501) bytes of data.
1481 bytes from rhea.dcs.bbk.ac.uk (193.61.29.2): icmp_seq=1 ttl=255 time=1.19 ms
1481 bytes from rhea.dcs.bbk.ac.uk (193.61.29.2): icmp_seq=2 ttl=255 time=1.16 ms

--- rhea.dcs.bbk.ac.uk ping statistics ---
2 packets transmitted, 2 received, 0% loss, time 1002ms
rtt min/avg/max/mdev = 1.166/1.179/1.192/0.013 ms
penguin(109)%

Figure 6: Pinging rhea

The first line shows that rhea's IP address is 193.61.29.2 and that we are transmitting 1,501 bytes. The remainder of the output shows that the round-trip time is approximately 1.2 ms, nearly an order of magnitude larger than when we send the default 84-byte datagram. Figure 7 illustrates the output from tcpdump, run in another window at the same time.

[root@penguin root]# tcpdump -nt icmp
tcpdump: listening on eth0
193.61.29.127 > 193.61.29.2: (frag 556:1@1480)
193.61.29.127 > 193.61.29.2: icmp: echo request (frag 556:1480@0+)
193.61.29.2 > 193.61.29.127: icmp: echo reply (frag 8873:1480@0+)
193.61.29.2 > 193.61.29.127: (frag 8873:1@1480)
193.61.29.127 > 193.61.29.2: (frag 557:1@1480)
193.61.29.127 > 193.61.29.2: icmp: echo request (frag 557:1480@0+)
193.61.29.2 > 193.61.29.127: icmp: echo reply (frag 8874:1480@0+)
193.61.29.2 > 193.61.29.127: (frag 8874:1@1480)

8 packets received by filter
0 packets dropped by kernel
[root@penguin root]#

Figure 7: Fragmented ping

The first packet shows the second fragment of datagram 556, which contains one byte at offset 1480. The second packet shows the first fragment of datagram 556, which contains 1,480 bytes at offset 0 with the MORE FRAGMENTS flag set (shown by the trailing +). The response from rhea comes in the two fragments of datagram 8873. This echo request/response exchange is repeated with datagram 557 from penguin and datagram 8874 from rhea.
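
The fragment notation in Figure 7 has the form id:length@offset, with a trailing + when the MORE FRAGMENTS flag is set. A small Python sketch for decoding it is shown here; parse_frag and FragInfo are hypothetical names, and the pattern is written only against the lines in Figure 7, since other tcpdump versions may print fragments differently.

```python
import re
from typing import NamedTuple, Optional

class FragInfo(NamedTuple):
    ident: int     # IDENTIFICATION of the original datagram
    length: int    # data octets carried by this fragment
    offset: int    # offset of that data in the original datagram, in octets
    more: bool     # True when the MORE FRAGMENTS flag (+) is set

FRAG_RE = re.compile(r"\(frag (\d+):(\d+)@(\d+)(\+?)\)")

def parse_frag(line: str) -> Optional[FragInfo]:
    """Extract the fragment details from one line of output, or None if absent."""
    m = FRAG_RE.search(line)
    if m is None:
        return None
    ident, length, offset, plus = m.groups()
    return FragInfo(int(ident), int(length), int(offset), plus == "+")
```

Applied to the second line of the capture, parse_frag returns ident 556, length 1480, offset 0, and more set to True; applied to the first line it returns length 1, offset 1480, and more set to False.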

With IP's best-effort delivery, it is possible for one or more fragments to be lost. When the first fragment of a datagram arrives, the receiver starts a timer. If this timer expires before all the fragments have been received, the receiver discards the fragments it has received. It is also possible for a fragment itself to be fragmented. The offset in each resulting fragment always refers to the original datagram, so the receiver can reassemble all the pieces in the same way.
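
A receiver's reassembly logic, including the timer, might be sketched in Python as below. This is an illustration under several assumptions rather than how any particular IP stack works: the Reassembler class is hypothetical, the 30-second timeout is an arbitrary value, fragments are keyed only by source address and IDENTIFICATION, overlapping fragments are not handled, and the caller passes in the current time so that the behaviour is easy to follow.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]   # (source address, IDENTIFICATION)

class Reassembler:
    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout               # assumed value, in seconds
        self.pending: Dict[Key, dict] = {}

    def add(self, src: str, ident: int, offset: int, more: bool,
            data: bytes, now: float) -> Optional[bytes]:
        """Store one fragment; return the full payload once every octet is present."""
        self.expire(now)
        # The timer starts when the first fragment of a datagram arrives.
        entry = self.pending.setdefault(
            (src, ident), {"started": now, "parts": {}, "total": None})
        entry["parts"][offset] = data
        if not more:                         # the last fragment fixes the total length
            entry["total"] = offset + len(data)
        if entry["total"] is None:
            return None
        payload = bytearray()
        for off in sorted(entry["parts"]):   # order by FRAGMENT OFFSET
            if off != len(payload):          # a hole remains: keep waiting
                return None
            payload += entry["parts"][off]
        if len(payload) != entry["total"]:
            return None
        del self.pending[(src, ident)]
        return bytes(payload)

    def expire(self, now: float) -> None:
        """Discard partial datagrams whose timer has run out."""
        for key in [k for k, e in self.pending.items()
                    if now - e["started"] > self.timeout]:
            del self.pending[key]
```

Because fragments are placed by offset, arrival order does not matter: feeding in the 1-byte fragment at offset 1480 first and the 1,480-byte fragment at offset 0 second still yields the complete 1,481-byte payload on the second call.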

References

  1. Douglas Comer, Computer Networks and Internets with Internet Applications (fourth edition), Prentice Hall, Upper Saddle River, NJ, 2004, ISBN 0-13-143351-2. http://netbook.cs.purdue.edu


Last modified: Thu Nov 24 10:40:11 2005