Strategic Security Intelligence


Overview of Linux Networking



Copyright(c), 1990, 1995, 2002 Dr. Frederick B. Cohen - All Rights Reserved

Basic Networking Concepts

In order for communication to take place, there must be some sort of common syntax and semantics encoded into some sort of a communications protocol - a shared method for sequencing communications between parties for the conveyance of meaningful content. The predominant networking protocol used in Linux as in most of the world today is the Internet Protocol (IP) and more specifically, version 4 of that protocol. For this reason, in this course we will focus on IPv4.

Basics of IPv4

IPv4 is defined by a set of "Requests for Comments" (RFCs), documents that are subject to open review and comment for a period of time before becoming permanent. They are not standards in the sense that anybody is required to follow them; rather, they describe the sequences of interactions that form the basis for communications between parties. To the extent that parties follow these protocols, they will be more likely to interoperate with other parties. The RFCs can be found at: http://www.ietf.org/rfc.html Using your web browser and this URL, retrieve RFC 791.

Page through the RFC briefly to see the sorts of things it specifies, but don't go into too much detail yet.

At its core, the Internet consists of a set of computers of various sorts that listen to 'Datagrams'. These datagrams consist of sequences of 8-bit bytes (called octets in the RFCs). The devices in the Internet generally forward datagrams based on the IP destination address as indicated by the 17th through 20th bytes of the datagram. Here is the format of an IP header according to RFC 791:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The header includes (1) the IP version (the bits 0100 for version 4), (2) the IP Header Length, (3) the Type of Service code, (4) the total packet length, (5) a packet identification pair of bytes, (6) a set of flags, (7) a fragment offset, (8) a time to live, (9) a protocol number, (10) a header checksum, (11) a source address, (12) a destination address, (13) a set of (optional) options, and (14) padding to bring the header to a multiple of 4 bytes in length. The header is followed by (15) the content associated with the datagram.
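As a rough illustration of the field layout described above, the following Python sketch decodes the fixed 20-byte part of an IPv4 header. The function name `parse_ipv4_header` and the sample bytes are made up for this example (the checksum in the sample is simply left as zero), so treat it as a sketch rather than a complete implementation.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (RFC 791 layout)."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # "!" selects network (big-endian) byte order.
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F  # header length in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_bytes": ihl * 4,
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,             # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits, in 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "options": packet[20:ihl * 4],          # empty when IHL is 5
    }


# Hypothetical sample: version 4, IHL 5, TTL 64, protocol 6 (TCP),
# checksum left as zero for simplicity.
sample = bytes.fromhex("4500003c1c4640004006" "0000" "cc07e510" "cc07e54f")
fields = parse_ipv4_header(sample)
print(fields["source"], "->", fields["destination"])
```

Note that the destination address is read from bytes 17 through 20 of the header, matching the description given earlier.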

It is important to note that these are all just bits that are put into the transmission media, and that anyone who wants to can enter any sequence of bits desired at any point in a transmission process. So, for example, the fact that precedence code 110 indicates Internetwork control does not mean that the system that set those bits was authorized to control the Internet. If someone starts the conversation by saying they are from the FBI, that does not mean that they really are.

In order to get packets through the Internet, an IP address is needed, because otherwise, the datagram transport mechanism will not know where to send the packets. Configuring an IP address is done by the program 'ifconfig'. The example from the previous graphical interface actually performs this command (and some others):

Some of the RFCs eventually became standards because they were so widely adopted and are so central to the operation of the Internet. Standards are listed in many places. As of this writing, only about 60 of the thousands of RFCs have been turned into standards.

The Internet operates by having each computer that receives a datagram use its "best effort" to try to deliver that datagram to its destination address. This is typically done by selecting the best interface on which to resend the datagram and sending the datagram that way. If all of these decisions are made well, the datagrams flow well and the Internet works. If not, it fails. Routing and similar elements of protocol are controlled by the Internet Control Message Protocol (ICMP). Details of this protocol element are provided in RFC 792.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             unused                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

ICMP datagrams augment the IP header with a 'type' of ICMP datagram, a 'code' indicating what the message is intended to mean, a checksum of the content, an unused sequence of bytes, and the IP header plus the first 64 bits of data of the datagram which induced this ICMP message. For example, ICMP is used to indicate that a host could not be found. In this case, the datagram returned would include the requesting packet's IP header and the first 64 bits of its data, so the originating host would be able to tell what outgoing packet generated the incoming ICMP packet and could act appropriately.
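The sketch below shows one way an ICMP error message laid out as in the diagram above might be decoded in Python, including the 16-bit one's-complement checksum. The function names and the sample message (type 3, code 1, which RFC 792 lists as "host unreachable") are illustrative assumptions, not taken from any captured traffic.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP, ICMP, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_icmp_error(message: bytes) -> dict:
    """Split an ICMP error into its header and the quoted original datagram."""
    icmp_type, code, _checksum = struct.unpack("!BBH", message[:4])
    original = message[8:]                # skip the 4 unused bytes
    ihl_bytes = (original[0] & 0x0F) * 4  # length of the quoted IP header
    return {
        "type": icmp_type,
        "code": code,
        # A valid message sums to zero when the checksum field is included.
        "checksum_ok": internet_checksum(message) == 0,
        "original_header": original[:ihl_bytes],
        "original_data": original[ihl_bytes:ihl_bytes + 8],
    }


# Hypothetical quoted datagram: a 20-byte IP header plus the first 8 bytes
# of a TCP header (source port 32789, destination port 22, sequence number 1).
original_ip = bytes.fromhex("4500003c1c46400040060000cc07e510cc07e54f")
original_first8 = struct.pack("!HHI", 32789, 22, 1)
body = original_ip + original_first8

unchecked = struct.pack("!BBHI", 3, 1, 0, 0) + body
message = struct.pack("!BBHI", 3, 1, internet_checksum(unchecked), 0) + body

info = parse_icmp_error(message)
src_port, dst_port = struct.unpack("!HH", info["original_data"][:4])
print(info["type"], info["code"], info["checksum_ok"], src_port, dst_port)
```

Because the quoted 64 bits cover the port numbers of a TCP or UDP header, the originating host can work out which of its outgoing packets triggered the error.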

Datagrams are not guaranteed to arrive, and if they do arrive, they are not guaranteed to arrive in order or without corruption. That's what "best effort" means, and these facts are explicitly called out in the RFCs. If properly sequenced output, added reliability, or privacy is desired, additional protocol elements are used to attain it. For example, the Transmission Control Protocol (TCP) is typically used to assure sequential delivery of content. Details of this protocol element are provided in RFC 793.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TCP headers include a source and destination 'port' number. This, in combination with the host IP addresses is used to differentiate 'sessions'. For example, a session between a web browser and a web server would typically have a browser port greater than or equal to 1024 (the threshold for typical 'service' ports in the RFCs) and less than or equal to 65535 (the maximum value for a 16 bit unsigned integer). The web server typically uses port 80. So each time you request a web page, you might be allocated the next available port over 1024 on your browser, and use this as the source port for your packets. The destination port would be 80, and return packets from this request would come from port 80 on the remote host to the sending port on your host so your computer could recognize and reassociate the session with your request.
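To make the idea of a 'session' concrete, here is a small Python sketch that keys sessions on the combination of addresses and ports, so that two requests to the same web server can be told apart. The table, the helper names, the labels, and the server address 192.0.2.80 are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple

# A session is identified by (local IP, local port, remote IP, remote port).
SessionKey = Tuple[str, int, str, int]

sessions: Dict[SessionKey, str] = {}


def open_session(local_ip: str, local_port: int,
                 remote_ip: str, remote_port: int, label: str) -> None:
    """Record an outgoing request under its four-part session key."""
    sessions[(local_ip, local_port, remote_ip, remote_port)] = label


def match_reply(src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
    """Find the session an incoming packet belongs to.

    The reply's source is our remote end and its destination is our local
    end, so the fields are swapped before the lookup.
    """
    return sessions.get((dst_ip, dst_port, src_ip, src_port))


# Two requests from the same browser host to the same server port 80,
# distinguished only by the local source port.
open_session("204.7.229.16", 1025, "192.0.2.80", 80, "first page request")
open_session("204.7.229.16", 1026, "192.0.2.80", 80, "second page request")

print(match_reply("192.0.2.80", 80, "204.7.229.16", 1026))
```

A real TCP implementation keeps far more state per session, but the lookup key is essentially this combination of addresses and ports.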

TCP headers also include sequence numbers so that if packets arrive out of order they can be reordered to get sequential behavior, acknowledgement numbers so acknowledgements can be associated with the sending sequence numbers, a data offset indicating where the data starts in the TCP part of the IP packet, a set of flags associated with the start, continuation, and termination of sessions, a 'window' value indicating how much data can be received from the sender at one time, a checksum, an urgent pointer, a set of TCP options, padding to bring the header to a multiple of 4 bytes, and the content of the datagram.
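The fields listed above can be pulled out of a raw TCP segment with a few lines of Python. This is a minimal sketch: the name `parse_tcp_header` and the sample segment are invented for the example, and the checksum is not verified.

```python
import struct

# Flag names in bit order, starting from the least significant bit.
TCP_FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (RFC 793 layout)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_bytes = (offset_flags >> 12) * 4  # data offset is in 32-bit words
    flag_bits = offset_flags & 0x3F
    flags = [name for bit, name in enumerate(TCP_FLAG_NAMES)
             if flag_bits & (1 << bit)]
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence": seq,
        "acknowledgment": ack,
        "header_bytes": header_bytes,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "options": segment[20:header_bytes],
        "data": segment[header_bytes:],
    }


# Hypothetical segment: port 32789 to port 22, PSH and ACK set, 2 data bytes.
sample_segment = struct.pack("!HHIIHHHH", 32789, 22, 1000, 2000,
                             (5 << 12) | 0x18, 5840, 0, 0) + b"hi"
tcp = parse_tcp_header(sample_segment)
print(tcp["flags"], tcp["data"])
```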

Another very important protocol is the User Datagram Protocol (UDP), defined in RFC 768. This is used for functions like looking up the translation between host names and their IP addresses. For example, when you enter "http://all.net" into your browser address window, the Domain Name System (DNS) uses UDP packets to find the IP address associated with the "all.net" domain name in order to create the IP datagrams needed to communicate with the all.net server to fetch the web pages. UDP packets are independent of each other, so it is generally a good idea to encode an entire request or an entire response in a single datagram.

 0      7 8     15 16    23 24    31  
+--------+--------+--------+--------+ 
|     Source      |   Destination   | 
|      Port       |      Port       | 
+--------+--------+--------+--------+ 
|                 |                 | 
|     Length      |    Checksum     | 
+--------+--------+--------+--------+ 
|                                     
|          data octets ...            
+---------------- ...                 

The DNS entry for the example network setup provided earlier is stored as a line in the file '/etc/resolv.conf' and the line looks like this:

UDP datagrams, like TCP datagrams, have a source and destination port, but because there is no synchronization mechanism, they need only a length and checksum before their content is included.
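Because the UDP header is so small, building and decoding one takes only a few lines. The Python sketch below assumes a checksum of zero, which RFC 768 defines as "no checksum computed"; the function names and the payload are placeholders, and the payload is not a real DNS message.

```python
import struct


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix a payload with the 8-byte UDP header (RFC 768 layout)."""
    length = 8 + len(payload)  # the length field covers header and data
    checksum = 0               # zero means the sender computed no checksum
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes) -> dict:
    """Split a UDP datagram into its header fields and data."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": datagram[8:length],
    }


# Hypothetical request from local port 1025 to port 53 (the DNS port).
payload = b"placeholder query"
udp = parse_udp_datagram(build_udp_datagram(1025, 53, payload))
print(udp["destination_port"], udp["length"])
```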

The list of all assigned numbers for IP is provided in RFC 1700. In accessing the databases of RFCs, one of the common techniques is to start with RFC 1700, find which other RFCs you are looking for there, and then change the URL in the browser's location area to reflect the number of the RFC you want. The back button is used to return to RFC 1700 for the next search.

Interfaces and Media

Datagrams typically travel from place to place over transmission media such as radio waves, optical signals in fiber, or electrical signals in cables. In order for the computer to interface with these media, Input/Output (I/O) devices are used. These devices interface between the computer and the media.

Different media have different characteristics. For example, serial communications over telephone modems using the Point-to-Point Protocol (PPP) or the Serial Line Internet Protocol (SLIP) typically use packet sizes of less than 256 bytes, while Ethernet normally carries up to 1500 bytes of IP data per frame and Asynchronous Transfer Mode (ATM) uses 'cells' of 53 bytes each. Wireless varies in packet lengths. Serial communications tend to run at less than 56,000 bits per second, while ATM runs at millions of cells per second and 802.11b wireless runs from 1 million bits per second to 11 million bits per second. Wireless has a lot of noise, and as the distance increases reliability goes down and bit rates are adjusted appropriately lower. Most wired connections have far less noise, and thus there are significantly different operational characteristics.

The Linux operating system interfaces to devices by using a combination of built-in and loadable driver modules. These driver modules largely reduce and often eliminate the need to understand details of the networking technologies by providing the IP abstraction and taking care of the details of the interface so the user and programmer can largely ignore them. Nevertheless, there are specific protocols such as the Address Resolution Protocol (ARP) that are effective only within a particular medium and do not flow across the Internet as a whole.

Using ARP as an example, we will use the 'tcpdump' program to see how packets flow across the Internet. From the X11 xterm window type this:

The 'ifconfig' command configures an IP interface manually. In this case we are configuring eth0 - the device name typically associated with the first Ethernet card found at bootup by the operating system. The string 'promisc' is short for 'promiscuous mode'. While normally Ethernet cards only listen for packets specifically destined for them and so-called broadcast packets, when promiscuous mode is enabled, the Ethernet card observes all traffic at the Interface and reports it to the computer. The 'tcpdump' command extracts data from the Ethernet card and optionally formats it as desired. In this case we have indicated that we want numeric output (-n) from interface eth0 (-i eth0):

The result is a sequence of lines representing packets observed on the Interface. The run was terminated manually by typing [CTRL-C] (hold down the [CTRL] key and press 'c'). The first line displayed is from a session between IP address 204.7.229.79 on TCP port 22 and 204.7.229.16 on TCP port 32789. This was an acknowledgement (ack) which then resulted in another packet from 204.7.229.16 to 204.7.229.79. Port 22 is normally associated with the Secure Shell (ssh) protocol, an encrypted remote terminal protocol. The next two lines are ARP packets. In this case 204.7.229.16 is asking for the Ethernet address of 204.7.229.79 (arp who-has), and in the next packet a reply comes that 204.7.229.79 is at MAC address 0:50:ba:87:fd:78. In subsequent packets, 204.7.229.16 will use this MAC address to address 204.7.229.79.

The effect is that 204.7.229.79 does not have to have its Ethernet card in promiscuous mode and thus the hardware can ignore packets not addressed to it. This saves the operating system time and effort that it can then spend on more useful work. Similarly, 204.7.229.79 will eventually ask for the MAC address of 204.7.229.16 and use it for transmissions. In Ethernets, packets include the MAC address information followed by the IP packet, and the IP packet carried in each Ethernet frame is normally limited to 1500 bytes. If datagrams are larger, they have to be fragmented to be sent through the Ethernet.
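The arithmetic of fragmentation can be sketched in Python as follows. The example assumes the common Ethernet limit of 1500 bytes per IP packet and a 20-byte IP header without options; both are parameters of the hypothetical `fragment_sizes` function, so other values can be tried. Fragment offsets are expressed in units of 8 bytes, as RFC 791 specifies, which is why every fragment except the last carries a multiple of 8 data bytes.

```python
from typing import List, Tuple


def fragment_sizes(payload_len: int, mtu: int = 1500,
                   header_len: int = 20) -> List[Tuple[int, int, bool]]:
    """Plan the fragments needed to carry payload_len bytes of IP data.

    Returns one (fragment_offset, data_bytes, more_fragments) tuple per
    fragment, where fragment_offset is in 8-byte units.
    """
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len  # the "more fragments" flag
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


# A datagram carrying 4000 bytes of data needs three fragments.
plan = fragment_sizes(4000)
for fragment_offset, data_bytes, more_fragments in plan:
    print(fragment_offset, data_bytes, more_fragments)
```

Each fragment gets its own copy of the IP header, so the three fragments here occupy 1500, 1500, and 1060 bytes on the wire at the IP level.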

Routing and Gateways

While most end user workstations have only one Ethernet interface, computers that operate to route traffic between Ethernets typically have two or more interfaces. In some cases, these routers will have many different interfaces so that they convert between ATM, wireless, and Ethernet media. Similarly, firewalls are special cases of gateway computers, computers that form a gateway between the Internet and internal networks. These systems do a function called 'routing' in which they examine incoming packets, determine which interface to send them to based on their destination IP address, and forward the packets out the appropriate interface.

In Linux, routing is controlled by a program called 'route'. For example, to route packets to the Internet, the network configuration program used earlier does a command like this:

In this case, we are telling the computer that when it doesn't know where to send a datagram, it should give it the MAC address of 204.7.229.1 and send it out eth0. If 204.7.229.1 is configured to be a gateway, it will take that packet, determine whether it should go out of the local area network, and if so, give it the MAC address of its default gateway computer and send it out the appropriate interface. The next computer down the line may do the same thing, and so forth, until the datagram reaches its final destination. This is called static routing because all of the routes are statically defined in the routing table. To see the routing table on your computer, type:

The '-n' part of the command indicates that information should be presented in numeric form. The 'gw' is an abbreviation for 'gateway'. In the output, gateway routing is indicated by a 'G' in the flags section. The destination 0.0.0.0 indicates any destination not already listed.
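The way a routing table with a default route is consulted can be sketched in Python. The table below is hypothetical: the gateway 204.7.229.1 and interface eth0 come from the example above, while the /24 size of the local network is an assumption made only for this illustration. The most specific matching entry wins, and 0.0.0.0/0 matches everything else.

```python
import ipaddress
from typing import List, Optional, Tuple

# (destination network, gateway or None for directly attached, interface)
Route = Tuple[ipaddress.IPv4Network, Optional[str], str]

ROUTES: List[Route] = [
    # Assumed local network; hosts on it are reached directly.
    (ipaddress.ip_network("204.7.229.0/24"), None, "eth0"),
    # Default route: any destination not already listed goes to the gateway.
    (ipaddress.ip_network("0.0.0.0/0"), "204.7.229.1", "eth0"),
]


def lookup(destination: str) -> Route:
    """Return the most specific route that matches the destination."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES if address in route[0]]
    # The longest prefix is the most specific match.
    return max(matches, key=lambda route: route[0].prefixlen)


for dest in ("204.7.229.79", "192.0.2.1"):
    network, gateway, interface = lookup(dest)
    print(dest, "via", gateway or "direct", "on", interface)
```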

Dynamic routing is typically used in the core of the Internet, and static routing such as that shown here is more common in the periphery of the Internet. Unless you are running a substantial Internet Service Provider (ISP) or a large corporate network, static routes are usually the right choice for you.

Putting it All Together

This then is the typical packet and datagram sequence that will happen when you get a web page from a remote Internet site assuming everything works right:

And that's how you get a web page. If you have recently gotten the MAC address or the DNS entry, your computer will usually remember this information for subsequent requests and thus save time and effort. However, the time to live on MAC addresses is usually on the order of 15 minutes or less, while the time to live on DNS entries is determined by responses from the DNS server.

A variation on this process is readily observable by using a protocol analyzer such as the one provided on White Glove under the Ethereal selection in the Sniffers entry in the Administrator X11 menu:

The process and the corresponding packet numbers are: (3) ARP is used to get the MAC address of 204.7.229.12, (4) The MAC address is returned, (5) DNS request, (6) DNS reply, (7) Gateway ARP request, (8) Gateway ARP reply, (9) The TCP session begins. We will go into this example in more depth in the next chapter.

Summary

In this section we have explained how IPv4 datagrams are formatted and used to communicate over the Internet, detailed information on protocol elements from the Internet, shown how to manually configure the most common IP interfaces, routing, and gateways, described how the Internet operates to fetch a web page, and provided an example of this process using a protocol analyzer.