Securing Voice Over Internet Protocol (IP) Networks

By Thomas J. Walsh and D. Richard Kuhn
National Institute of Standards and Technology

Voice over IP (VOIP) – the transmission of voice over traditional packet-switched IP networks – is one of the hottest trends in telecommunications. As with any new technology, VOIP introduces both opportunities and security challenges. Lower cost and greater flexibility are among the promises of VOIP for the enterprise, but security administrators will face significant issues. Administrators may assume that since digitized voice travels in packets, they can simply plug VOIP components into their already-secured networks and expect a stable and secure voice network. Unfortunately, many of the tools used to safeguard today's computer networks, namely firewalls, Network Address Translation (NAT), and encryption, don't work "as is" in a VOIP network.

VOIP systems take a wide variety of forms. Just about any computer is capable of providing VOIP, and most users don't realize that they already have basic VOIP applications. Microsoft's NetMeeting and the newer Windows Messenger, which come with Windows platforms, provide voice and video services, and Linux platforms have a number of VOIP applications from which to choose. In general, though, the term Voice Over IP is associated with equipment that provides the ability to dial telephone numbers and communicate with parties on the other end who may have either another VOIP system or a traditional analog telephone. Demand for VOIP services has resulted in a broad array of products, including:

- Traditional telephone handset – Usually these products have extra features beyond a simple handset with dial pad. Some of these units may have a "base station" design that provides the same convenience as a conventional cordless phone.
- Conferencing units – These provide the same type of service as conventional conference calling phone systems, but since communication is handled over the Internet, they may allow users to coordinate traditional data communication services, such as a whiteboard that displays on computer monitors at both ends.
- Mobile units – Wireless VOIP units are becoming increasingly popular, especially since many organizations already have an installed base of 802.11 networking equipment. Wireless VOIP products present particularly acute security problems, given the well-known weaknesses of the 802.11 family of protocols.
- PC or "softphone" – With a headset, software, and inexpensive connection service, any PC or workstation can be used as a VOIP unit, often referred to as a "softphone."

In addition to end-user equipment, VOIP systems include specialized components beyond those found on an ordinary IP network: call managers and media/signaling gateways. Call managers are required to set up calls, monitor call state, handle number translation, and provide basic telephony services. Call managers also handle signaling functions that coordinate with media gateways, which are the interface between the VOIP network and the public switched telephone network (PSTN). Depending on the system, gateway functions may be implemented as a board or dedicated appliance, or may be provided through a distributed system of servers and databases.

Current VOIP systems use one of two protocols, H.323 or the Session Initiation Protocol (SIP).

SIP is the Internet Engineering Task Force (IETF) specified protocol for initiating a two-way communication session. It was designed to be simpler than H.323, but has become increasingly complex as the standard has evolved. SIP is text based; its messages are similar to e-mail message formats. SIP is also an application-level protocol; that is, it is decoupled from the protocol layer it is transported across.
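To illustrate what "text based" means in practice, the following is a minimal sketch of reading a SIP request with ordinary string handling. The sample message, its addresses, and the function name `parse_sip_request` are hypothetical and chosen only for illustration; the sketch ignores details such as repeated headers and is not a complete SIP parser.

```python
# Hypothetical SIP request; the addresses, tags, and identifiers are made up.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into its request line, headers, and body."""
    # As in e-mail, a blank line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "uri": request_uri,
        "version": version,
        "headers": headers,
        "body": body,
    }


if __name__ == "__main__":
    msg = parse_sip_request(SAMPLE_INVITE)
    print(msg["method"], msg["uri"])
    print(msg["headers"]["from"])
```

Because the message is readable text, any component on the path that sees it unencrypted can inspect or alter it just as easily as this function does.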
Unlike H.323, SIP uses only one port in the call setup process. The architecture of a SIP network also differs from the H.323 structure. A SIP network is made up of endpoints, a proxy and/or redirect server, location server, and registrar. In the SIP model, a user is not bound to a specific host. Instead, users initially report their location to a registrar, which may be integrated into a proxy or redirect server.

H.323 is the International Telecommunication Union (ITU) specification for audio and video communication across packetized networks. H.323 acts as a wrapper for a suite of media control recommendations by the ITU incorporating several other protocols, including H.225 and H.245. Each of these protocols has a specific role in the call setup process, and all but one make use of dynamic ports. An H.323 network is made up of several endpoints (terminals) that are normally bound to a specific address, a gateway, and possibly a gatekeeper, multipoint control unit, and back end service. The gateway serves as a bridge between the H.323 network and the outside world of (possibly) non-H.323 devices, including SIP networks and traditional PSTN networks.

Most VOIP components have counterparts used in data networks, but the performance demands of VOIP mean that ordinary network software and hardware must be supplemented with special VOIP components. One of the main sources of confusion for those new to VOIP is the assumption that because digitized voice travels in packets just like other data, existing network architectures and tools can be used with little or no change. Unfortunately, VOIP adds a number of complications to existing network technology, and these problems are compounded by security considerations.

What's Different About VOIP Security?
To understand why security for VOIP isn't the same as data network security, we need to look at both the unique constraints of transmitting voice over a packet network, and at characteristics shared by VOIP and data networks.

Packet networks depend on a large number of configurable parameters: IP and media access control (MAC) (physical) addresses of voice terminals, addresses of routers and firewalls. VOIP networks add specialized software such as call managers and other programs used to place and route calls. Many of the network parameters are established dynamically every time a network component is restarted, or when a VOIP telephone is restarted or added to the network. Because there are so many places in a VOIP network with dynamically configurable parameters, intruders have as wide an array of potentially vulnerable points to attack as they have with data networks. But VOIP systems have much stricter performance constraints than data networks, with significant implications for security.

Quality of Service (QoS) is fundamental to the operation of a VOIP network. A VOIP application is much more sensitive to delays than its traditional data counterparts. If one downloads a file, a slowdown of a few seconds is negligible. In contrast, a delay of merely 150 milliseconds is enough to turn a crisp VOIP call into a garbled, unintelligible mess. In the VOIP vernacular, this is termed the latency problem.

Latency turns traditional security measures into double-edged swords for VOIP. Tools such as encryption and firewall protection can help secure the network, but they also introduce a significant amount of delay. Latency is not just a quality of service issue, but a security issue as well, because it increases the system's susceptibility to a Denial of Service (DoS) attack. For a DoS attack to succeed in a VOIP network, it need not completely shut down the system. It must only delay voice packets for a fraction of a second.
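The latency argument can be made concrete with a simple delay budget. The sketch below is only an illustration: the 150 millisecond figure comes from the text, while the per-component delays and the function name `remaining_headroom_ms` are hypothetical values and names chosen to show the arithmetic.

```python
# Delay threshold quoted in the text, in milliseconds.
LATENCY_BUDGET_MS = 150.0


def remaining_headroom_ms(delays_ms: dict) -> float:
    """Return how much extra delay a call can absorb before exceeding the budget."""
    return LATENCY_BUDGET_MS - sum(delays_ms.values())


# Hypothetical per-component delays in milliseconds (illustrative only).
baseline = {"codec": 30.0, "network": 60.0, "jitter_buffer": 20.0}

# The same call path with hypothetical security devices added.
secured = dict(baseline, firewall=10.0, encryption=15.0)

if __name__ == "__main__":
    print(remaining_headroom_ms(baseline))  # 40.0
    print(remaining_headroom_ms(secured))   # 15.0
```

Under these assumed numbers, an attacker would need to add 40 milliseconds of delay to degrade the unprotected call, but only 15 milliseconds once the security devices have consumed part of the budget.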
The necessary impediment is even less when latency-producing security devices are slowing down traffic.

Another QoS issue, jitter, refers to non-uniform delays that can cause packets to arrive and be processed out of sequence. Real-time Transport Protocol (RTP), the protocol used to transport voice media, is based on the User Datagram Protocol (UDP), so packets received out of order cannot be reassembled at the transport level, and therefore must be reordered at the application level, introducing a significant overhead. Even when packets manage to arrive in order, high jitter causes them to arrive at their destination in spurts. This scenario is analogous to uniform road traffic coming to a stoplight. As soon as the stoplight turns green (bandwidth opens up), traffic races through in a clump.

Infrastructure issues become significant with a change to VOIP. With conventional telephones, eavesdropping requires either physical access to tap a line or penetration of a switch. Attempting physical access increases the intruder's risk of being discovered, and conventional private branch exchanges (PBXs) typically use proprietary protocols and specialized software, and have fewer points of access than VOIP systems. With VOIP, opportunities for eavesdroppers are multiplied. VOIP units share physical network connections with the data network, and in many cases, VOIP and data are on the same logical portion of the network. Protocols are standardized, and tools to monitor and control packet networks are widely available. Attaching a packet sniffer, such as the freely available "voice over misconfigured internet telephony" (known by its unfortunate acronym "vomit"), to the VOIP network segment makes it easy to intercept voice traffic.

Like other types of software, VOIP systems have been found to have vulnerabilities due to buffer overflows and improper packet header handling.
Exploitable software flaws typically result in two types of vulnerabilities: denial of service or disclosure of critical system parameters. In some cases, the system can be crashed, producing a memory dump in which an intruder can find IP addresses of critical system nodes, passwords, or other security-relevant information. Crashing a VOIP server may also result in a restart that restores default passwords or falls prey to a rogue server attack. In addition, buffer overflows that allow the introduction of malicious code have been found in VOIP software, as in other applications.

Tradeoffs between convenience and security are routine in software, and VOIP is no exception. Most, if not all, VOIP components use integrated web servers for configuration. Web interfaces can be attractive, easy to use, and inexpensive to produce because of the wide availability of good development tools. Unfortunately, most web development tools are built with features and ease of use in mind, with less attention to the security of the applications they help produce. VOIP device web applications have been discovered with weak or no access control, script vulnerabilities, and inadequate parameter validation, resulting in privacy and denial of service vulnerabilities. As VOIP gains in popularity, with implementations on devices of all types, it is almost inevitable that more administrative web applications with exploitable errors will be found.

What do the Special Characteristics of VOIP Mean for Security?

Meeting the security challenges of VOIP can require changes to a number of familiar security components.

Firewalls are a staple of security in today's IP networks. Whether protecting a local-area network (LAN) or a wide-area network (WAN), encapsulating a demilitarized zone (DMZ), or just protecting a single computer, a firewall is usually the first line of defense. Firewalls work by blocking traffic deemed to be malicious or potentially risky.
Acceptable traffic is determined by a set of rules programmed into the firewall by the network administrator. These may include such commands as "Block all FTP traffic (port 21)" or "Allow all http traffic (port 80)." Much more complex rule sets are available in almost all firewalls.

Firewalls also provide a central location for deploying security policies. The firewall is the ultimate bottleneck for network traffic, because no traffic can enter or exit the LAN without passing through it. This situation lends itself to the VOIP network, where firewalls simplify security management by consolidating security measures at the firewall gateway, instead of requiring all the endpoints to maintain up-to-date security policies. This takes an enormous burden off the VOIP network infrastructure.

Unfortunately, this abstraction and simplification of security measures comes at a price. The introduction of firewalls to the VOIP network complicates several aspects of VOIP, most notably dynamic port trafficking and call setup procedures. Several commercial solutions are available to alleviate this, including Application Level Gateways (ALGs), which make the firewall "VOIP-aware," and Midcom Controls, which allow the firewall to be traversed by allowing it to receive instruction from an application-aware agent. That is, they can understand the VOIP protocol data carried as a payload within an ordinary packet, making it possible to do stateful filtering of call packets. Attempting to implement a VOIP system on a legacy network without such devices is generally not feasible.

Firewalls, gateways, and other such devices can help keep intruders from compromising a network. However, these devices are no defense against an internal hacker and don't protect voice data as it crosses the Internet. Another layer of defense is necessary at the protocol level to protect the data itself.
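The rule-based filtering and the dynamic-port problem described in this section can be sketched with a toy model. Everything here is an assumption made for illustration: the `Firewall` class, its method names, and the media port number are hypothetical, and `call_started` merely stands in for a VOIP-aware component (such as an ALG) that has learned the media port from the signaling messages. Port 5060 is the default SIP signaling port; ports 21 and 80 are the examples from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    """Toy port-based filter: static rules plus pinholes opened per call."""
    static_rules: dict = field(default_factory=dict)  # port -> "allow" or "block"
    pinholes: set = field(default_factory=set)        # ports opened for active calls

    def permits(self, port: int) -> bool:
        """Return True if traffic to this port would be passed."""
        if port in self.pinholes:
            return True
        # Anything without an explicit "allow" rule is dropped (default deny).
        return self.static_rules.get(port) == "allow"

    def call_started(self, media_port: int) -> None:
        """Stand-in for a VOIP-aware agent that read the port from signaling."""
        self.pinholes.add(media_port)

    def call_ended(self, media_port: int) -> None:
        """Close the pinhole once the call is torn down."""
        self.pinholes.discard(media_port)


if __name__ == "__main__":
    fw = Firewall(static_rules={21: "block", 80: "allow", 5060: "allow"})
    media_port = 49170  # hypothetical dynamically negotiated media port

    print(fw.permits(80), fw.permits(21), fw.permits(media_port))
    fw.call_started(media_port)
    print(fw.permits(media_port))
    fw.call_ended(media_port)
    print(fw.permits(media_port))
```

With static rules alone, the media port is blocked because the administrator cannot know it in advance; it passes only while the call-aware component holds the pinhole open. Filtering of this kind decides only which packets may pass; protecting the contents of those packets is a separate task.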
In VOIP, as in data networks, this can be accomplished by encrypting the packets at the IP level using Internet Protocol Security (IPsec). This way, if anyone other than the intended recipient intercepts VOIP traffic (for instance, via a packet sniffer), the packets will be unintelligible. The IPsec suite of security protocols and encryption algorithms is the standard for securing packets against unauthorized viewers over data networks and will be supported by the protocol stack in IPv6. So it seems logical to extend IPsec to VOIP, encrypting the signal and voice packets on one end and decrypting them only when needed by their intended recipient. Unfortunately, the nature of the signaling protocols and the VOIP network itself make it necessary for routers, proxies, and other components to read the VOIP packets, so encryption is often done at the gateways to a network, rather than the endpoints. Such a scheme also allows the endpoints to be computationally simple and promotes scalability, as new encryption algorithms can be overlaid on the network without upgrading the endpoints.

Several factors, including the expansion of packet size, ciphering latency, and a lack of QoS urgency in the cryptographic engine itself, can cause an excessive amount of latency in the VOIP packet delivery. This leads to degraded voice quality, so once again there is a tradeoff between security and voice quality, and a need for speed. Virtual private network (VPN) tunneling of VOIP has also become popular recently, but the congestion and bottlenecks associated with encryption suggest that this solution may not always be scalable. Although great strides are being made in this area, the hardware and software necessary to ensure call quality for encrypted voice traffic may not be economically or architecturally viable for all enterprises considering the move to VOIP.

What are the Prospects for Securing a VOIP Network?

Thus far, we have painted a fairly bleak picture of VOIP security.
The construction of a VOIP network is an intricate procedure that should be studied in great detail before being attempted. Integrating a VOIP system into an already congested or overburdened network could be disastrous for an organization's technology infrastructure. There is no easy "one size fits all" solution to the issues discussed in this bulletin. The use of VPNs versus ALG-like solutions and the choice of SIP or H.323 are decisions that must be made based on the specific nature of the current network and the planned VOIP network. However, the technical problems are solvable, and the establishment of a secure implementation of VOIP is well worth the difficulty associated with these solutions.

To implement VOIP securely today, start with these general guidelines, recognizing that practical considerations may require adjustments for the organization:

- Put voice and data on logically separate networks. Different subnets with separate RFC 1918 address blocks should be used for voice and data traffic, with separate DHCP servers for each, to ease the incorporation of intrusion detection and VOIP firewall protection.
- At the voice gateway, which interfaces with the PSTN, disallow H.323, SIP, or Media Gateway Control Protocol (MGCP) connections from the data network. Use strong authentication and access control on the voice gateway system, as with any other critical network management component.
- A mechanism to allow VOIP traffic through firewalls is required. There are a variety of protocol-dependent and independent solutions, including ALGs for VOIP protocols, Session Border Controllers, or other standards-based solutions. Stateful packet filters can track the state of connections, denying packets that are not part of a properly originated call.
- Use IPsec or Secure Shell (SSH) for all remote management and auditing access. If practical, avoid using remote management at all and do IP PBX access from a physically secure system.
- Use IPsec tunneling when available instead of IPsec transport because tunneling masks the source and destination IP addresses. This secures communications against rudimentary traffic analysis (i.e., determining who is calling whom).
- If performance is a problem, use encryption at the router or other gateway, not the individual endpoints, to provide for IPsec tunneling. Since some VOIP endpoints are not computationally powerful enough to perform encryption, placing this burden at a central point ensures all VOIP traffic emanating from the enterprise network has been encrypted. Newer IP phones are able to provide Advanced Encryption Standard (AES) encryption at a reasonable cost.
- Look for IP phones that can load digitally (cryptographically) signed images to guarantee the integrity of the software loaded onto the IP phone.
- "Softphone" systems, which implement VOIP using an ordinary PC with a headset and special software, should be avoided, if possible, where security or privacy is a concern. In addition to violating the separation of voice and data, PC-based VOIP applications can be vulnerable to worms and viruses that are all too common on PCs, and may infect other parts of the network.
- Consider methods to "harden" any VOIP platform based on common operating systems such as Windows or Linux. This includes disabling unnecessary services and possibly using host-based intrusion detection methods.
- Be especially diligent about maintaining patches and current versions of VOIP software.
- Analyze the impact of VOIP adoption on the rest of the organization's infrastructure, including issues such as backup power, E-911 emergency location, and records retention policies or other legal issues.

VOIP can be done securely, but the path is not smooth. It will likely be several years before standards issues are settled and VOIP systems become a mainstream commodity.
Until then, organizations should proceed cautiously and not assume that VOIP components are just more peripherals for the local network. Above all, it is important to keep in mind the unique requirements of VOIP, acquiring the right hardware and software to meet the challenges of VOIP security.

For more information on securing VOIP systems, see draft NIST Special Publication 800-58, Security Considerations for Voice Over IP Systems, at http://csrc.nist.gov/publications/nistpubs/index.html.

Disclaimer: Any mention of commercial products or reference to commercial organizations is for information only; it does not imply recommendation or endorsement by the National Institute of Standards and Technology nor does it imply that the products mentioned are necessarily the best available for the purpose.