Overview of Linux Networking
In order for communication to take place, the parties need a common syntax and semantics, encoded in a communications protocol: a shared method for sequencing exchanges between parties so that meaningful content can be conveyed. The predominant networking protocol used in Linux, as in most of the world today, is the Internet Protocol (IP), and more specifically version 4 of that protocol. For this reason, this course focuses on IPv4.
IPv4 is defined by a set of Requests for Comments (RFCs), documents that are subject to open review and comment for a period of time before becoming permanent. They are not standards in the sense that anybody is required to follow them; rather, they describe the sequences of interactions that form the basis for communications between parties. To the extent that parties follow these protocols, they are more likely to interoperate with other parties. The RFCs can be found at: http://www.ietf.org/rfc.html Using your web browser and this URL, retrieve RFC 791.
Page through the RFC briefly to see the sorts of things it specifies, but don't go into too much detail yet.
At its core, the Internet consists of a set of computers of various sorts that listen to 'Datagrams'. These datagrams consist of sequences of 8-bit bytes (called octets in the RFCs). The devices in the Internet generally forward datagrams based on the IP destination address as indicated by the 17th through 20th bytes of the datagram. Here is the format of an IP header according to RFC 791:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The header includes (1) the IP version (the bits 0100 for version 4), (2) the IP Header Length, (3) the Type Of Service code, (4) the total packet length, (5) a packet identification pair of bytes, (6) a set of flags, (7) a fragment offset, (8) a time to live, (9) a Protocol number, (10) a Header Checksum, (11) a Source Address, (12) a Destination Address, (13) a set of (optional) Options, and (14) Padding to bring the header to a multiple of 4 bytes in length. The header is followed by (15) the content associated with the datagram.
IP version: The bits 0100 for IPv4.
IP Header Length: The total number of bytes in the header divided by 4. The header length must be a multiple of 4 bytes, hence the use of padding. The minimum value is 5, corresponding to a 20-byte header with no options.
Type Of Service: This includes 3 'precedence' bits, with a larger unsigned binary value indicating higher precedence; one bit each for low delay, high throughput, and high reliability; and 2 bits reserved for future use. The precedence codes are: 000=routine, 001=priority, 010=immediate, 011=flash, 100=flash override, 101=Critical/Emergency Communications Protocol, 110=Internetwork control, 111=Network control.
Total packet length: This is the total length of the entire packet including the header and content. Note that a 'packet' is usually but not always the same as a datagram because datagrams can be 'fragmented' during transmission to allow for transmission media with limited packet lengths.
Identification: This is an identifying value assigned by the sender to aid in assembling the fragments of a datagram. Normally it is unique to the {Source, Destination} address pair over some length of time.
Flags: The flags are an initial bit that is supposed to be 0, a "don't fragment" bit, and a "more fragments" bit that is set on every fragment except the last.
Fragment offset: This indicates where in the original datagram the current fragment belongs. The offset is measured in units of 8 bytes.
Time To Live: This is a counter that is supposed to be decremented each time the datagram is handled by a router or other forwarding device. If the value reaches zero, the datagram is supposed to be dropped (i.e. ignored). If a datagram somehow ends up looping through a sequence of routers, the time to live will decrement until it reaches zero and then the datagram will be dropped. This prevents datagrams from circulating forever.
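Protocol: This number identifies the protocol carried in the content of the datagram, for example 1 for ICMP, 6 for TCP, and 17 for UDP. Protocol numbers are assigned by the Internet Assigned Numbers Authority (IANA), based on the presentation of RFCs that use otherwise unused protocol numbers for new functions.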
Header Checksum: This is a checksum of the entire header used to detect accidental line noise that can corrupt a packet. Incorrect checksums in packets can result in dropped packets. This is a protection against noise, not against malicious attack.
Source Address: This is the source IP address of the computer that is supposed to be identified as having sent the datagram and is normally used as the destination address on any return datagrams.
Destination Address: This is the IP address the datagram is to be delivered to.
Options: There is a wide range of options for IP packets specified in the RFCs.
Padding: This is supposed to be all zero values sufficient to assure that the total header length is a multiple of 4 bytes.
Content: After this header, datagrams optionally contain additional data, and this is where most of the content tends to be included. The header, however, is of particular interest to our discussion because, at its heart, the network simply does its best to route traffic from place to place so that datagrams reach their destination addresses efficiently. For the most part, the rest of the datagram is not important to the routing process.
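The field layout described above can be decoded in a few lines of Python. This is only a sketch for illustration: the function name `parse_ipv4_header`, the dictionary keys, and the sample bytes are assumptions made up for this example, and the sample header's checksum field is left at zero rather than computed.

```python
import socket
import struct

def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (RFC 791 layout)."""
    if len(data) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # "!" selects network (big-endian) byte order, as used on the wire.
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,                  # high 4 bits of the first byte
        "header_bytes": (ver_ihl & 0x0F) * 4,     # IHL counts 4-byte words
        "precedence": tos >> 5,                   # top 3 bits of Type Of Service
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

# Hypothetical sample: a TCP packet from 204.7.229.16 to 204.7.229.79,
# "don't fragment" set, TTL 64, checksum field left as zero.
sample = bytes.fromhex("4500003c1c4640004006" "0000" "cc07e510" "cc07e54f")

for name, value in parse_ipv4_header(sample).items():
    print(f"{name:22} {value}")
```

Note that the destination address comes out of bytes 17 through 20 of the header, which is all a forwarding device needs to look at.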
It is important to note that these are all just bits that are put into the transmission media, and that anyone who wants to can enter any sequence of bits desired at any point in a transmission process. So, for example, the fact that precedence code 110 indicates Internetwork control does not mean that the system that set those bits was authorized to control the Internet. If someone starts the conversation by saying they are from the FBI, that does not mean that they really are.
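To see how little stands between a sender and any header value it likes, here is a minimal Python sketch that assembles a 20-byte header with precedence 110 and a valid header checksum. Nothing is sent on the network; the function names `internet_checksum` and `build_ipv4_header`, the default TTL and protocol number, and the addresses used are assumptions chosen for the example.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (the IP header checksum)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(src, dst, payload_len, precedence=0, ttl=64, protocol=17, ident=0):
    """Assemble an option-free IPv4 header; every field is simply what the caller says."""
    fields = [
        (4 << 4) | 5,                        # version 4, header length 5 words
        (precedence & 0x7) << 5,             # precedence in the top 3 bits of TOS
        20 + payload_len,                    # total length: header plus content
        ident,
        0,                                   # flags and fragment offset
        ttl,
        protocol,
        0,                                   # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = internet_checksum(header)    # checksum is computed with the field zeroed
    return struct.pack("!BBHHHBBH4s4s", *fields)

# Any program can claim "Internetwork control" precedence (binary 110).
forged = build_ipv4_header("204.7.229.125", "204.7.229.1", payload_len=8, precedence=0b110)
print(forged.hex())
```

A receiver that recomputes the checksum over this header gets zero, meaning "no corruption detected", which is exactly why the checksum protects against noise but not against deliberate forgery.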
In order to get packets through the Internet, an IP address is needed, because otherwise, the datagram transport mechanism will not know where to send the packets. Configuring an IP address is done by the program 'ifconfig'. The example from the previous graphical interface actually performs this command (and some others):
wg:root /root> ifconfig eth0 204.7.229.125
Some of the RFCs eventually became standards because they were so widely adopted and are so central to the operation of the Internet. Standards are listed in many places. As of this writing, only about 60 of the thousands of RFCs have been turned into standards.
The Internet operates by having each computer that receives a datagram use its "best effort" to try to deliver that datagram to its destination address. This is typically done by selecting the best interface on which to resend the datagram and sending the datagram that way. If all of these decisions are made well, the datagrams flow well and the Internet works. If not, it fails. Error reports and similar control messages about this process are carried by the Internet Control Message Protocol (ICMP). Details of this protocol element are provided in RFC 792.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             unused                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
ICMP datagrams augment the IP header with a 'type' of ICMP datagram, a 'code' indicating what the message is intended to mean, a checksum of the ICMP message, an unused sequence of bytes, and a copy of the IP header plus the first 64 bits of data from the datagram that induced this ICMP message. For example, ICMP is used to indicate that a host could not be reached. In this case, the datagram returned would include the offending packet's IP header and first 64 bits of data, so the originating host can tell which outgoing packet generated the incoming ICMP packet and act appropriately.
Datagrams are not guaranteed to arrive, and if they do arrive, they are not guaranteed to arrive in order or without corruption. That's what "best effort" means, and these facts are explicitly called out in the RFCs. If properly sequenced output, added reliability, or privacy is desired, additional protocol elements are used to attain it. For example, the Transmission Control Protocol (TCP) is typically used to assure sequential delivery of content. Details of this protocol element are provided in RFC 793.
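As a rough illustration of how a host might match an ICMP error to the packet that caused it, the Python sketch below splits an ICMP message of the form shown above into its parts. The function name `parse_icmp_error` and the sample message are assumptions for this example; the sample uses type 3, code 1 (destination unreachable, host unreachable, per RFC 792), and its checksum fields are left at zero rather than computed.

```python
import struct

def parse_icmp_error(message: bytes) -> dict:
    """Split an ICMP error message into its header and the quoted original datagram."""
    if len(message) < 8 + 20:
        raise ValueError("too short to hold an ICMP header and a quoted IP header")
    icmp_type, code, checksum, _unused = struct.unpack("!BBHI", message[:8])
    quoted = message[8:]
    ihl_bytes = (quoted[0] & 0x0F) * 4           # length of the quoted IP header
    return {
        "type": icmp_type,
        "code": code,
        "checksum": checksum,
        "original_header": quoted[:ihl_bytes],
        "original_data": quoted[ihl_bytes:ihl_bytes + 8],   # first 64 bits of data
    }

# Hypothetical original datagram: a UDP packet from 204.7.229.125 port 32789
# to 204.7.229.12 port 53, whose first 8 data bytes are the UDP header.
original_ip = bytes.fromhex("4500001c000000004011" "0000" "cc07e57d" "cc07e50c")
original_udp = struct.pack("!HHHH", 32789, 53, 8, 0)
message = struct.pack("!BBHI", 3, 1, 0, 0) + original_ip + original_udp

info = parse_icmp_error(message)
src_port, dst_port = struct.unpack("!HH", info["original_data"][:4])
print(info["type"], info["code"], src_port, dst_port)
```

Because the quoted 64 bits cover the port numbers of a TCP or UDP packet, the sender can tell which of its sessions the error belongs to.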
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP headers include a source and destination 'port' number. This, in combination with the host IP addresses is used to differentiate 'sessions'. For example, a session between a web browser and a web server would typically have a browser port greater than or equal to 1024 (the threshold for typical 'service' ports in the RFCs) and less than or equal to 65535 (the maximum value for a 16 bit unsigned integer). The web server typically uses port 80. So each time you request a web page, you might be allocated the next available port over 1024 on your browser, and use this as the source port for your packets. The destination port would be 80, and return packets from this request would come from port 80 on the remote host to the sending port on your host so your computer could recognize and reassociate the session with your request.
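The way ports and addresses together identify a session can be sketched as a small lookup table in Python. This is a simplified model and not how any real operating system is implemented: the class `SessionTable`, its method names, and the remote address 192.0.2.10 are made up for the example, and real kernels pick local ports from a configurable range, not necessarily in sequence.

```python
from typing import Dict, NamedTuple, Optional

class SessionKey(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int

class SessionTable:
    FIRST_PORT = 1024      # ports below this are the typical 'service' ports
    LAST_PORT = 65535      # largest 16-bit unsigned value

    def __init__(self, local_ip: str) -> None:
        self.local_ip = local_ip
        self.next_port = self.FIRST_PORT
        self.sessions: Dict[SessionKey, str] = {}

    def open(self, remote_ip: str, remote_port: int, label: str) -> SessionKey:
        """Pick the next unused local port for this remote endpoint and record the session."""
        for _ in range(self.LAST_PORT - self.FIRST_PORT + 1):
            port = self.next_port
            self.next_port = port + 1 if port < self.LAST_PORT else self.FIRST_PORT
            key = SessionKey(self.local_ip, port, remote_ip, remote_port)
            if key not in self.sessions:
                self.sessions[key] = label
                return key
        raise RuntimeError("no free local ports for this remote endpoint")

    def match_reply(self, src_ip: str, src_port: int,
                    dst_ip: str, dst_port: int) -> Optional[str]:
        """Find the session a returning packet belongs to (source and destination swap)."""
        return self.sessions.get(SessionKey(dst_ip, dst_port, src_ip, src_port))

table = SessionTable("204.7.229.125")
first = table.open("192.0.2.10", 80, "request for page A")
second = table.open("192.0.2.10", 80, "request for page B")
print(first.local_port, second.local_port)
print(table.match_reply("192.0.2.10", 80, "204.7.229.125", second.local_port))
```

Two requests to the same web server differ only in the local port, and that single number is enough to route each reply back to the request that caused it.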
TCP headers also include sequence numbers, so that packets arriving out of order can be reordered to get sequential behavior, and acknowledgement numbers, so that acknowledgements can be associated with the sending sequence numbers. They further include a data offset indicating where the data starts in the TCP part of the IP packet, a set of flags associated with the start, continuation, and termination of sessions, a 'window' value indicating how much data the receiver is willing to accept from the sender at one time, a checksum, an urgent pointer, a set of TCP options, padding to the nearest 4 bytes, and the content of the datagram.
An interesting example of a network traffic problem can be created if the Window is set to zero. The sessions continue working without limit, but no data can be sent. This is a trick used to defeat malicious viruses that use these protocols to send their malicious payloads. By setting up a number of false servers with this property, the attacking systems get stuck.
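The TCP fields just listed can be pulled out of a raw segment with a short Python sketch. The function name `parse_tcp_header`, the dictionary keys, and the sample segment (a SYN from port 32789 to port 80 with an arbitrary sequence number and window, and a zero checksum) are assumptions made for illustration.

```python
import struct

# Flag names in bit order, starting from the least significant bit.
TCP_FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (RFC 793 layout)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_bytes = (offset_flags >> 12) * 4      # data offset counts 4-byte words
    flags = [name for bit, name in enumerate(TCP_FLAG_NAMES)
             if offset_flags & (1 << bit)]
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence": seq,
        "acknowledgment": ack,
        "header_bytes": header_bytes,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "payload": segment[header_bytes:],
    }

# Hypothetical SYN segment opening a session to a web server.
syn = struct.pack("!HHIIHHHH", 32789, 80, 1000, 0, (5 << 12) | 0x02, 5840, 0, 0)
print(parse_tcp_header(syn))
```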
Another very important protocol is the User Datagram Protocol (UDP), defined in RFC 768. This is used for functions like looking up the translation between host names and their IP addresses. For example, when you enter "http://all.net" into your browser address window, the Domain Name System (DNS) uses UDP packets to find the IP address associated with the "all.net" domain name, so that your computer can create the IP datagrams needed to communicate with the all.net server and fetch the web pages. UDP packets are independent of each other, so it is generally a good idea to encode an entire request or an entire response in a single datagram.
                  0      7 8     15 16    23 24    31
                 +--------+--------+--------+--------+
                 |     Source      |   Destination   |
                 |      Port       |      Port       |
                 +--------+--------+--------+--------+
                 |                 |                 |
                 |     Length      |    Checksum     |
                 +--------+--------+--------+--------+
                 |
                 |          data octets ...
                 +---------------- ...
The DNS entry for the example network setup provided earlier is stored as a line in the file '/etc/resolv.conf' and the line looks like this:
UDP datagrams, like TCP datagrams, have a source and destination port, but because there is no synchronization mechanism, they need only a length and checksum before their content is included.
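Because the UDP header is so small, building and reading one takes only a few lines of Python. This sketch uses made-up function names (`build_udp_datagram`, `parse_udp_datagram`) and a placeholder payload, and it leaves the checksum field at zero, which RFC 768 defines as "no checksum computed".

```python
import struct

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix the payload with the 8-byte UDP header."""
    length = 8 + len(payload)                 # length covers header plus data
    checksum = 0                              # zero means the sender computed none
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram: bytes) -> dict:
    """Read the four header fields and return them with the data octets."""
    if len(datagram) < 8:
        raise ValueError("a UDP header is 8 bytes long")
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": datagram[8:length],
    }

# Hypothetical request to a name server on port 53; the payload is a placeholder,
# not a real DNS query.
datagram = build_udp_datagram(32789, 53, b"example request")
print(parse_udp_datagram(datagram))
```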
The list of all assigned numbers for IP is provided in RFC 1700. In accessing the databases of RFCs, one of the common techniques is to start with RFC 1700, find which other RFCs you are looking for there, and then change the URL in the browser's location area to reflect the desired RFC. The back button is used to return to RFC 1700 for the next search.
Datagrams typically travel from place to place over transmission media such as radio waves, optical signals in fiber, or electrical signals in cables. In order for the computer to interface with these media, Input/Output (I/O) devices are used. These devices interface between the computer and the media.
Different media have different characteristics. For example, serial communications over telephone modems using the Point-to-Point Protocol (PPP) or the Serial Line Internet Protocol (SLIP) typically use packet sizes of less than 256 bytes, while Ethernet normally carries up to 1500 bytes of IP packet per frame and Asynchronous Transfer Mode (ATM) uses 'cells' of 53 bytes each. Wireless varies in packet lengths. Serial communications tend to run at less than 56,000 bits per second, while ATM runs at millions of cells per second and 802.11b wireless runs from 1 million bits per second to 11 million bits per second. Wireless has a lot of noise, and as the distance increases, reliability goes down and bit rates are adjusted appropriately lower. Most wired connections have far less noise, and thus there are significantly different operational characteristics.
The Linux operating system interfaces to devices by using a combination of built-in and loadable driver modules. These driver modules largely reduce and often eliminate the need to understand details of the networking technologies by providing the IP abstraction and taking care of the details of the interface so the user and programmer can largely ignore them. Nevertheless, there are specific protocols such as the Address Resolution Protocol (ARP) that are effective only within a particular medium and do not flow across the Internet as a whole.
Using ARP as an example, we will use the 'tcpdump' program to see how packets flow across the Internet. From the X11 xterm window type this:
wg:root /root> ifconfig eth0 promisc
wg:root /root> tcpdump -n -i eth0
The 'ifconfig' command configures an IP interface manually. In this case we are configuring eth0 - the device name typically associated with the first Ethernet card found at bootup by the operating system. The string 'promisc' is short for 'promiscuous mode'. While normally Ethernet cards only listen for packets specifically destined for them and so-called broadcast packets, when promiscuous mode is enabled, the Ethernet card observes all traffic at the Interface and reports it to the computer. The 'tcpdump' command extracts data from the Ethernet card and optionally formats it as desired. In this case we have indicated that we want numeric output (-n) from interface eth0 (-i eth0):
The result is a sequence of lines representing packets observed on the Interface. The run was terminated manually by typing [CTRL-C] (hold down the [CTRL] key and press 'c'). The first line displayed is from a session between IP address 204.7.229.79 on TCP port 22 and 204.7.229.16 on TCP port 32789. This was an acknowledgement (ack) which then resulted in another packet from 204.7.229.16 to 204.7.229.79. Port 22 is normally associated with the Secure Shell (ssh) protocol, an encrypted remote terminal protocol. The next two lines are ARP packets. In this case 204.7.229.16 is asking for the Ethernet address of 204.7.229.79 (arp who-has), and in the next packet a reply comes that 204.7.229.79 is at MAC address 0:50:ba:87:fd:78. In subsequent packets, 204.7.229.16 will use this MAC address to address 204.7.229.79.
The effect is that 204.7.229.79 does not have to have its Ethernet card in promiscuous mode, and thus the hardware can ignore packets not addressed to it. This saves the operating system time and effort that it can then spend on more useful work. Similarly, 204.7.229.79 will eventually ask for the MAC address of 204.7.229.16 and use it for transmissions. In Ethernets, packets include the MAC address information followed by the IP packet, and the IP packet is normally limited to 1500 bytes per Ethernet frame. If datagrams are larger, they have to be fragmented to be sent through the Ethernet.
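The fragmentation arithmetic can be sketched in a few lines of Python. This is an illustration rather than how any particular kernel does it; the function name `fragment`, the default limit of 1500 bytes of IP packet per frame (the usual Ethernet value), and the 20-byte option-free header are assumptions chosen for the example.

```python
def fragment(payload_len: int, mtu: int = 1500, header_len: int = 20):
    """Split a datagram's content into fragments that fit the given packet limit.

    Returns a list of (offset_in_8_byte_units, data_bytes, more_fragments) tuples.
    """
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the fragment offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("packet limit too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len     # clear only on the final fragment
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

for offset_units, size, more in fragment(4000):
    print(f"offset={offset_units:4} data={size:5} more_fragments={more}")
```

With these assumptions, 4000 bytes of content travel as three fragments at offsets 0, 185, and 370 (in 8-byte units), the last one carrying the remaining 1040 bytes with the "more fragments" bit clear.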
While most end user workstations have only one Ethernet interface, computers that operate to route traffic between Ethernets typically have two or more interfaces. In some cases, these routers will have many different interfaces so that they convert between ATM, wireless, and Ethernet media. Similarly, firewalls are special cases of gateway computers, computers that form a gateway between the Internet and internal networks. These systems do a function called 'routing' in which they examine incoming packets, determine which interface to send them to based on their destination IP address, and forward the packets out the appropriate interface.
In Linux, routing is controlled by a program called 'route'. For example, to route packets to the Internet, the network configuration program used earlier does a command like this:
wg:root /root> route add default gateway 204.7.229.1 eth0
In this case, we are telling the computer that when it doesn't know where to send a datagram, it should give it the MAC address of 204.7.229.1 and send it out eth0. If 204.7.229.1 is configured to be a gateway, it will take that packet, determine whether it should go out of the local area network, and if so, give it the MAC address of its own default gateway computer and send it out the appropriate interface. The next computer down the line may do the same thing, and so forth, until the datagram reaches its final destination. This is called static routing because all of the routes are statically defined in the routing table. To see the routing table on your computer, type:
wg:root /root> route -n
The '-n' part of the command indicates that information should be presented in numeric form. The 'gw' is an abbreviation for 'gateway'. In the output, gateway routing is indicated by a 'G' in the flags section. The destination 0.0.0.0 indicates any destination not already listed.
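The decision a static routing table encodes can be modeled with Python's standard `ipaddress` module. This is a sketch of the idea, not of the kernel's implementation: the table contents, the /24 netmask for the local network, and the function name `lookup` are assumptions made for the example.

```python
import ipaddress

# (destination network, gateway or None for directly attached, interface)
# The /24 netmask on the local network is assumed for illustration.
ROUTES = [
    ("204.7.229.0/24", None, "eth0"),
    ("0.0.0.0/0", "204.7.229.1", "eth0"),     # default route: matches anything
]

def lookup(destination: str, routes=ROUTES):
    """Return (gateway, interface) for the most specific route that matches."""
    addr = ipaddress.ip_address(destination)
    best = None
    for network, gateway, interface in routes:
        net = ipaddress.ip_network(network)
        # Prefer the matching route with the longest prefix (most specific).
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, interface)
    if best is None:
        raise LookupError("no route to " + destination)
    return best[1], best[2]

print(lookup("204.7.229.79"))    # on the local network: delivered directly
print(lookup("192.0.2.10"))      # anything else: handed to the default gateway
```

The 0.0.0.0/0 entry plays the role of the destination 0.0.0.0 line in the routing table: it matches every address, so it is used only when nothing more specific applies.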
Dynamic routing is typically used in the core of the Internet and static routing such as that shown here is more common in the periphery of the Internet. Unless you are running a substantial Internet Service Provider (ISP) or a large corporate network, static routes are usually for you.
This then is the typical packet and datagram sequence that will happen when you get a web page from a remote Internet site assuming everything works right:
You send an ARP request to your gateway to find its MAC address. (If the DNS is on your local network, this process is delayed until after the DNS lookup is completed and an ARP request for the DNS IP address is used here).
The answer comes back.
You send a DNS lookup UDP datagram to the first nameserver appearing in '/etc/resolv.conf' asking for the IP address of the desired remote site, using the destination IP address of the DNS server and the MAC address of your gateway.
The DNS responds with the requested IP address in a DNS response datagram.
You send a 'SYN' datagram to start a TCP session on port 80 (normally the web port) using the destination address provided by the DNS lookup with the MAC address of your gateway.
You get a 'SYN-ACK' datagram back from the server indicating that it is ready for a TCP session to start.
You send an 'ACK-PSH' datagram back to the server which includes as content your request for a web page using the destination IP address of the web server and the MAC address of your gateway.
You get an 'ACK-PSH' datagram back from the server that includes the requested web content.
You display the content on the screen and send back an 'ACK' datagram to identify that the last response was received using the destination IP address of the web server and the MAC address of your gateway.
You send a 'FIN' datagram to the server to indicate that your TCP session is done using the destination IP address of the web server and the MAC address of your gateway (or it sends one to you first).
And that's how you get a web page. If you have recently gotten the MAC address or the DNS entry, your computer will usually remember this information for subsequent requests and thus save time and effort. However, the time to live on MAC addresses is usually on the order of 15 minutes or less, while the time to live on DNS entries is determined by responses from the DNS server.
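The remembering described here amounts to a cache whose entries expire. The following Python sketch shows the idea only; the class name `ExpiringCache`, its methods, and the 900-second lifetime are assumptions for illustration and do not reflect how the Linux ARP or DNS caches are actually implemented.

```python
import time

class ExpiringCache:
    """Remember answers for a limited time, as ARP and DNS caches do."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                   # injectable so the cache can be tested
        self._entries = {}

    def put(self, key, value, ttl_seconds):
        """Store a value along with the moment it stops being trusted."""
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the cached value, or None if it is missing or has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]            # expired: a new lookup is needed
            return None
        return value

# Example: remember a MAC address for 15 minutes (900 seconds).
arp_cache = ExpiringCache()
arp_cache.put("204.7.229.79", "0:50:ba:87:fd:78", ttl_seconds=900)
print(arp_cache.get("204.7.229.79"))
```

A lookup that returns None here corresponds to the case where the ARP or DNS exchange at the start of the sequence has to be repeated.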
A variation on this process is readily observable by using a protocol analyzer such as the one provided on White Glove under the Ethereal selection in the Sniffers entry in the Administrator X11 menu:
The process and the corresponding packet numbers are: (3) ARP is used to get the MAC address of 204.7.229.12, (4) The MAC address is returned, (5) DNS request, (6) DNS reply, (7) Gateway ARP request, (8) Gateway ARP reply, (9) The TCP session begins. We will go into this example in more depth in the next chapter.
In this section we have explained how IPv4 datagrams are formatted and used to communicate over the Internet, given detailed information on the Internet's protocol elements, shown how to manually configure the most common IP interfaces, routing, and gateways, described how the Internet operates to fetch a web page, and provided an example of this process using a protocol analyzer.