At the Intersection of Security, Networking, and Management

November 20, 1998

DRAFT DRAFT DRAFT

An NITB Study by
Sandia National Laboratories
and
Lawrence Livermore National Laboratory

Executive Summary

There are two pressing issues at the intersection of networking, security, and management. One is the issue of what to do, and the other is the issue of how to do it. While we have progressed significantly in the development of techniques and systems to implement technical controls when we know what they should be, we have fallen far short of developing the necessary understandings and technologies to determine what those controls should be. As a result, we have increasingly efficient and cost effective methods for doing things like setting access controls, authenticating keys and control of keys, and granting or denying authority - but little understanding of how those access controls should be set, the risk management value of authenticating control over keys, or what authorities to grant to what individuals and systems. What we cannot do today, and what little work appears to be underway to resolve, is to figure out optimal protection settings.

It appears that the primary motivation for using the tools available today is cost reduction and increased reliability. In effect, the task of systems and network administration has become so critical to business function and so expensive in terms of personnel time, that economic pressures justify automation. In this economy of scale, a single administrator sets control values that may effect hundreds, thousands, or in extreme cases, hundreds of thousands of systems. From a standpoint of risk management this is inherently dangerous, and none of today's widely used systems really address this multi-user control challenge, but this would not be a difficult challenge to meet and there is a theoretical basis in place for meeting the need.

We know from experience that no individual is capable of even setting all of the access control bits in a single computer system correctly (in the sense of allowing only the necessary and sufficient accesses). And yet we now have individuals making control decisions for large masses of computer systems from a central control point. Just as every good decision made centrally propagates throughout an organization quickly and efficiently, every poor decision is multiplied in its effect. Mistakes by those tasked with critical control functions have become increasingly expensive and difficult to detect and repair.

There are significant unanswered questions and little effort appears to be underway to address these questions. Perhaps the most important of these questions relates to sensitivity. In essence, we have to know how close to optimal we have to get before improvement is no longer needed in a particular environment. In order to address this question, it would seem apparent that we need some sort of metrics, something we lack in the information protection arena today.

While it appears that we have a long way to go before the majority of networks and networked systems are properly controlled, it also appears that the technology to control them has advanced to the point where it is useful, efficient, cost effective, and being widely adopted. In the near future, we will likely see significant advances, increased automation of the decision processes, improvements in validating control decisions against policies, and an ever-increasing control purview. The future is indeed bright for this technology.

Outline

Scope of this study
- Securely Managing the Network Infrastructure
- Using the Network to Manage Node Security
- Management issues in network security are outside of the scope of this study
Study Process
Secure Network Management - Securely Managing the Network Infrastructure - status and technologies
Network Security Management - Using the Network to Manage Node Security - status and technologies
Gaps and Limitations

Scope of this study and terms used

Three interrelated topic areas are considered in this study and we have decided to define terms that we associate with these areas. The terminology we use is not standard by any means, but since no other standard appears to be available, we chose to define for our own use. We do not particularly endorse this specific terminology, but in identifying the need to have such definitions, is hope to help stir the debate over what words to use and their meaning in this context. We also note that our distinction is arbitrary and that in today's complex intertwining of infrastructure control, data, and application, the distinctions we draw are more than a bit artificial.

Secure Network Management is the term we use to discuss secure (i.e., with appropriate integrity, availability, and confidentiality) control of network components (e.g., routers, gateways, firewalls, and infrastructure elements interconnecting them) management that is done securely. In other words, we are concentrating on securing the communications between end-nodes that communicate with each other through the network. In this case, we are not concerned with so-called host security except for hosts that act as parts of the infrastructure interconnecting other hosts. Nor are we concerned with making the "right" management decisions.

Network Security Management is the term we use when we discuss the use of a network infrastructure to secure (as before, integrity, availability, and confidentiality) control of the nodes attached to a network. In other words, we are concerned with the control of the processing, storage, input, and output of hosts attached together through some sort of infrastructure. In this case we are not concerned with the network itself but only the functions carried out at the end-points as protected through the sort of remote control feasible in modern networked environments.

In this study we are not concentrating on Managing Network Security which we define as the art of management as it applies to network security. While we believe this is a very important issue, it is outside of the scope of this study and other current technical baseline studies.

The Study Process

This study was performed by interviewing about 15 providers of network security management tools and products, by reviewing open source material on the topics under consideration, and by writing up the results. (1)

The study took place in cooperation with the Computer Security Institute and many of the interviews were done during their fall "Network Security" conference in San Francisco in 1997. About ten providers were interviewed during the conference and another ten were interviewed in follow-up discussions over the following months, at other conferences, and over the telephone.

Other studies on related topics were also consulted including most notably a 1997 study performed for DISA by SAIC and a recent Sandia study of public key infrastructure components and capabilities for secure networking. (2)

Section 1 - Secure Network Management

Securely Managing the Network Infrastructure

The central issue of secure network management is retaining control over the collection of wires, routers, switches, gateways, firewalls, and outside infrastructure elements comprising the network. The notion of control is not a trivial one and not one that we will be fully addressing in this study. But to give a sense of the issues we are trying to address, some questions might be appropriate:

How do we assure to an appropriate degree of certainty that...

...traffic goes where it should and nowhere else?
...only permitted traffic passes?
...specified traffic levels are maintained?
...specified traffic volumes are available?
...reachability is appropriate?
...information is not lost or corrupted?
...the network infrastructure is not tapped?

In the following discussions, we describe how these things are done and limitations noted during this study. (3)

1.1 Traffic goes where it should and nowhere else:

Control over wires, connectors, wire rooms, routers, gateways, firewalls, switching systems, telephone systems, cable systems, satellites, and other infrastructure components is key to success in this assurance issue.

Internal wiring is protected by physical controls, wiring standards, inspections, the use of tools like time domain reflectometers, and random testing of end connections.

Routers can be controlled strictly by their consoles, but they usually are not because it is operationally important and more efficient to control them remotely. Remote control implies protocols for control and the potential for protocol forgery. If the control plane is not secured and separate from the data stream, any user of the data stream has the potential to create forgeries that interfere with control.

In most modern routers, there are capabilities to limit forgery by limiting the flow of packets based on source and destination pairs. This method can be used to create a closed system with respect to elements of the data stream that can effect the router control operations. This can be done without undue interference with normal network operations or control of routers by subscribers to a given network service who place their own network control on top of the underlying infrastructure. However, without secured low-level control, high-level control is not attainable with this class of solutions. Unfortunately, the complexity of asserting this level of protection is beyond the reach of most current technical staff at infrastructure and network service organizations and no automation currently exists to provide for this capability. In addition, the placement of such controls in network routers has potential performance implications that cause many providers to delay or avoid implementation.

The protocols for managing network routers are themselves relatively insecure as well. For example, flexible network infrastructure implies that routers have the ability to compensate automatically for network changes. This is accomplished by protocols that have historically been flawed. The result has been a series of network-wide collapses on a national and sometimes global scale.

Finally, routers are not always implemented properly and this results in flaws such as the recently discovered weak passwords and packet-based attacks on router data and control systems. While this challenge may be met with better design in the future, work-arounds are needed in existing infrastructure, which is expensive to replace.

Switches are essentially specialized computer systems, often consisting of a combination of a control computer and a special-purpose switching device. Telephone switches have historically been subject to the same flaws as general-purpose computer systems. In addition, switches often have vulnerabilities associated with special services used in telephony for testing, installation, and other service functions. Vulnerabilities are commonly mitigated through improved computer security for the controlling computer, separation of control (called signalling in most telephony applications) from data channels using coding or physical redundancy, and the use of intrusion detection systems.

This set of techniques have been largely successful but recent changes in telephone regulations have made this protection far less likely to suceed because it has always been based on limiting access to trusted individuals. With telephone deregulation, infrastructure providers are now required to provide visibility and control in to their infrastructures as a part of the doctrine of equal access. Many telephone companies are not well prepared for this and don 't have strong techniques available to mitigate this risk at this time. Current efforts are concentrated on finding network management techniques that will meet this challenge, but no technology currently exists to handle the challenges that need to be met by the telecommunications industry while very strong competition has forced these companies to abandon much of their infrastructure surety effort related to high-grade threats in favor of other more profitable areas of pursuit. It is likely that government-funded research will be necessary in order for effective network management for telephone switching systems against high grade threats to be developed and deployed in the forseeable future.

Routing is normally out of user control. This implies that users are unable to decide how their information gets routed. A substantial loss of control by an infrastructure provider can result in serious damage to customers who depend on route-based protection. For example, traffic could be routed through competitive intelligence groups causing financial harm or through collection agencies resulting in loss of security for military or other government operations. In some circumstances today, configuration errors could cause routing around the world for non-local calls under heavy load conditions. Furthermore, loss of control of key components of the switching system could result in the loss of emergency telecommunications required for continuity of government in crisis situations. With the addition of increased visibility into the switching system by non-infrastructure-owner telecommunications service providers, as mandated by recent deregulation, maintaining communications under high grade attacks based in the non-infrastructure-owner organizations is likely to be very difficult.

Switch errors have caused power outages (for example a small city in California experienced outages a few years ago because the wrong telephone number was used when communicating to a power grid switch). Party lines occasionally happen by accident. Call forwarding and conference call vulnerabilities are often exploited by telephone attackers (a.k.a., phreakers), resulting in lost confidentiality, lost integrity, and substantial toll-fraud losses. Phone system radio components are open to observation and modification. When access to a central office (CO) is available, call rerouting is simple to accomplish and widely documented in the attacking community. Wires in infrastructure are open to attack; gateways, firewalls, and ATM clouds all have protection flaws that can be exploited if not properly managed; and routing controls between and within these infrastructure elements have not been mathematically analyzed to determine their safety, absence of livelock, absence of deadlock, or other similar properties of import to continuity of service and proper routing of calls.

In addition, only one telephone company interviewed in this study indicated a completely separate control plane in their infrastructure. Without an independent control plane, data streams can be potentially used to take control over switches, thus empowering sufficiently knowledgeable end-users with the potential for remote exploitation without the need to penetrate the infrastructure itself.

Within organizations, whether they are infrastructure providers or the organizations that depend on them, internal wiring is usually vulnerable to subversion. Wiring is usually based on a common bus with information broadcast to all those who share the media. Satellite systems broadcast over a wide area. Emanations are almost always easily exploitable. Furthermore, the technology to exploit these vulnerabilities is widely available, inexpensive, and easy to use.

In summary, infrastructures of today are not designed or implemented in such a way as to assure that traffic goes where it should and nowhere else, and technological solutions can only address a part of this issue.

1.2 Only permitted traffic passes

Most publicly available packet-based infrastructures today do not limit inbound traffic to authorized source addresses, and as a result, forged addresses are simple and commonplace. Over the past several years this situation has improved, largely because of the rampant forgeries in the Internet used to deny or spoof services. Since providers cannot retain customers when they do not provide services, many providers who originally chose to permit unrestricted access rather than slow infrastructure routers with protective filters have changed their policies. In effect, forgeries finally reached a level where the denial of service they produced was more damaging than the cost of preventing them so the providers responded in their own best interest by preventing the forgeries.

This situation can be largely mitigated in switched Ethernet, Asynchronous Transfer Mode (ATM) and other switched technologies because of the implicit authentication provided by the circuit switching orientation they bring to the communications environment - but it rarely is. The ability to mitigate stems largely from the ability to provide routing limits based on where a signal is physically coming from rather than a logical Internet Protocol (IP) or Message Authentication Protocol (MAC) address.

Even though IP and MAC address forgeries are possible, they are far more easily detected and prevented in these technologies than in the party line environments common in non-switched Ethernets. An additional challenge comes from the fact that source addresses can legitimately change over time - for example in a dial-up IP address assignment situation. This challenge has been largely met by such technologies as Radius authentication servers which pass authorization information back into the infrastructure to authorize use, and in the process can specify usage restrictions based on the authenitcated identity of the user. Thus even in flexible address situations, a properly managed switched network is capable of detecting attempted forgeries and tracing them back to their physical connection.

While some controls to limit traffic have been implemented in firewalls and some infrastructure components, source/destination pairs are almost never checked or limited because of the difficulty of managing configurations at this level of granularity. According to studies performed by the Computer Security Institute and others, less than a third of organizations implement firewalls at their borders, and firewalls are also relatively hard to control at the level of granularity necessary in order to be truly effective. Network management systems have not addressed this challenge, at least to some degree because the mathematics of proper configuration control has not yet been worked out and is somewhat complex.

So it appears that limiting traffic to authorized traffic is the exception rather than the rule, especially in the infrastructure where the common carrier roles of providers is taken to mean unfettered access to infrastructure by their clients.

1.3 Traffic levels are maintained:

Few networks are required to handle specific traffic levels and most modern systems are designed to handle only 85% of peak load before performance and reliability is reduced. With the rapid growth of the information infrastructure, 'brown-outs' have become very common. A typical example is the performance of Internet connections during peak hours and the lack of available dial-in ports during early evening hours of most Internet service providers.

Asynchronous Transfer Mode (ATM) is really the first networking technology to include price vs. service level tradeoffs (i.e., quality of service) in signaling protocols. Quality of service is only partially implemented today and many difficulties have arisen for those who have attempted to attain guaranteed ATM service levels while supporting the rest of the flexible protocol requirements of ATM.

Almost no network designers analyze quality of service or availability issues thoroughly other than for telephony, and telephony has had major difficulties in keeping up with the rapid growth of both call volumes and terminal equipment. Little theory is available in this area and optimization expertise in this area is highly specialized and very limited.

1.4 Reachability is appropriate:

Most nodes in modern networks are designed and implemented so that they can connect to anywhere. Universal access is key to the widespread application of information technology, but it introduces many complexities for those who wish to maintain control. This issue is key to many aspects of information protection. For example:

There is almost always a tradeoff between control and accessibility;

Uniformity of interface is necessary for compatibility and a key reason that errors and omissions cause the connection between systems that re supposed to be separated; and

The raging debate between those who desire unfettered access and those who wish to protect intellectual property is centered on this issue.

Transitivity is rarely accounted for in the analysis of reachability even though it has long been known that transitive information flow is a key property in virus spread. When combined with the inclination to grant access for legitimate purposes but not to remove it once that purpose no longer exists, this creates ever-increasing reachability for most users. Within most intranets, access is rarely limited, and where it is limited, transitive access is almost always feasible by a knowledgeable user.

1.5 Information not lost or corrupted:

Information loss is common in today's networks and happens at many different levels. At the packet level, packet switching networks almost universally have a policy of dropping packets when loads get high or collisions occur. Lost connections are fairly common even in so-called reliable protocols such as the transfer control protocol (TCP). Losses at higher protocol levels are even more stunning. For example, it is fairly common for large volumes of email to be lost due to hardware, software, or configuration errors - in one widely published incident some 40,000 emails were lost during a software maintenance operation.

Networks don't usually prevent corruption either. For example, Internet Protocol (IP) networks are often subject to corruption when accidental or intentional packets cause sessions to fail. Email commonly ends up in the wrong place, while email forgeries are so rampant that it has become big business to forge email return addresses in order to provide for Internet-based junk mail without getting replies in email form. Web-based attacks demonstrate the ability to corrupt information on servers, much of which is used as a basis for securing other systems and trusted by recipients usually without question.

Security management for networks is largely unauthenticated, leading to widespread outages and corruption. The Internet has repeatedly been brought down by corrupted information forwarded to router tables without proper verification or testing. This universal trust of unverified information at the infrastructure level is perhaps one of the most serious impediments to network integrity. It results from the historical way that networks were implemented - based on trusted insiders with universal access. Changing this requires that major portions of the information infrastructure be rebuilt and rethought - a process that will happen only at great expense and over a substantial period of time.

1.6 Network is not tapped:

Physical security is effective but rarely used to a level necessary to protect modern computer networks against high-grade threats. Almost all infrastructure and network components today can be trivially tapped by someone with no special skills in a very short period of time. With only a few minutes of access, a tap can be placed, while detection is expensive, time consuming, and rarely done at an infrastructure level.

End-to-end encryption is effective against most tapping but is not commonly used and expensive in terms of management overhead. Ongoing efforts by government to slow the spread of encryption technology has been somewhat successful, but an unfortunate side effect is that our networks have not been built with embedded encryption at every protocol level. Such an embedding of even moderately effective encryption would largely eliminate network tapping, but law enforcement continues to oppose this because it would largely eliminate their ability to carry out effective line tapping of digital traffic.

Routing control provides a potential alternative to encryption for limiting tapping within infrastructures, but is almost never used and is not properly understood yet. For example, by limiting the routes taken for information or by using multiple paths to route information so that it is split up at the source and reconstructed at the destination, the effectiveness of tapping is largely mitigated. In the limit, path diversity can be as effective as encryption in an information theoretic sense, while limiting paths to physically secured subparts of an infrastructure is as effective as other physical means of protection. While some limited research is underway in this area, it is only minimally supported and has yet to draw much attention.

1.7 Summary:

If we were to keep score of the preceding discussion, we would find that none of the reasonable control objectives for network security itemized above are being met in the vast majority of today's computer networks.

With rare but notable exceptions, we do not now assure to an appropriate - or even well measured - degree of certainty that any of these control objectives are met. We do not assure that traffic goes where it should and nowhere else, that only permitted traffic passes, that specified traffic levels are maintained, that reachability is appropriate, that information is not lost or corrupted, or that the network infrastructure is not tapped. Thus the state of information protection as it is widely applied today in secure network management is poor.

Section 2 - Secure Network Management Technology

There are a substantial number of technologies available for secure network management today, and they are generally divided into the following sorts of items:

Key management systems
Infrastructure management protocols
Infrastructure management systems
Independent control planes
Virtual private networks

2.1 Key management systems:

Generally speaking, key management systems are used to manage the keys used to encrypt and authenticate information in systems and networks. The widely used systems of this sort are predominantly based on the exchange of public keys used to create and exchange private keys used for encrypting sessions and on certificates that can be used to allow a certificate authority to vouch for the bearer of the certificate. The key virtues of these systems today are the ability of certificate systems to reduce centralization of encryption while providing for a centralized or controllably distributed control over authentication and authorization.

The issues that present substantial challenges in today's environment are the ability to revoke authorization or vouchers in a manner that effectively balances performance with revocation time and space. Future challenges that appear to be key are the embedding of these technologies in infrastructure components, the issues surrounding key escrow and recovery both in conjunction with law enforcement and with management's legitimate need to access information protected by cryptographic technology. While these issues are being heavily debated, resolution appears to be far off. An additional concern with all such systems is that they depend on the ability to authenticate the person using the technology and the systems used to implement it. In today's environment, as it has been for much of the history of modern cryptography, the cryptographic transforms are far more secure than the systems and techniques used to operate them.

2.2 Network management protocols:

Most Internet Protocol (IP) based networks are managed by the Simple Network Management Protocol (SNMP). This is historically very insecure and the new replacement for SNMP, which provides improved authentication, has not yet been embraced. One example of an SNMP weakness is the automated routing table updates that have caused widespread outages. Source quench packets can be used to slow traffic to unacceptable performance levels, while other Internet Control Message Protocol (ICMP) packets can cause traffic to be redirected, routers to cease functioning, and details of traffic flow to be revealed or altered. (4)

Signaling protocols are used to control most of the packet, cell, and line-switching systems used in the telephone infrastructure that underlies the rest of the information infrastructure. These protocols, like the "Simple Network Management Protocol" (SNMP), were designed to provide easy control of infrastructure components by trusted insiders. Asynchronous Transfer Mode (ATM) signaling, Electronic Switching System (ESS) signaling, and other infrastructure signaling protocols used to set up, maintain, and tear down calls and other interconnections have little authentication, are not designed to withstand intentional attacks, and are deeply embedded in hardware and software. In many of these protocols, performance is critical. For example, there is at least one such system that requires sub-millisecond response to faults to prevent cascade failures that could bring down large portions of the telephone networks.

Underlying the basic information infrastructure, is the power infrastructure, which is also controlled by a signaling system wherein Supervisory Control And Data Acquisition (SCADA) systems send signals to power switching and circuit protection equipment. These networks are also based on trusted insiders making correct control decisions, and because the rapid rate at which changes in power propagate through the infrastructure and the substantial distances over which signals must flow to reach remote control points, these signaling systems also have stringent timing and delay constraints. Just as in telephony, power control systems are expensive and deeply embedded, and adding cryptographic or other similar protection presents daunting compatibility challenges.

2.3 Network management systems:

Data gathering, graphical display, data storage, analysis, and retrieval, communication, and monitoring are the primary functions of network management systems in widespread use today. In order to effectively control a large complex switching network, automation is needed for gathering, analysis, presentation, and control. Databases are combined with these basic control functions to provide more complex analysis and reporting, and correlation algorithms are added to facilitate pattern detection, trend analysis, and customized analysis. Communications with infrastructure components can be encrypted or authenticated by network management systems if the network components have compatible protocols, and hierarchies and redundancy can be provided by placing multiple systems throughout an infrastructure. Typical systems allow network mapping and remote control of components within the infrastructure. More advanced systems provide for visualization of data, traffic analysis, load analysis and balancing, and the embedding of customer-specific features.

2.4 Separate Control Planes:

In-band signaling uses the same infrastructure for data as for signaling. Since the signaling protocols are not separated, data users have the potential for sending data that has signaling effects. Historically, telephone systems used in-band signaling which resulted in the creation of illegal red boxes, black boxes, and so forth. These devices were designed to send signals into the system to allow end users to control infrastructure directly by entering trunk lines, simulating the sounds of dropping coins, and so forth. The efficiency of using the same infrastructure for signaling as for data is tremendous, but risks are also quite high. One side effect was a set of laws against the use of tones of certain frequencies, but by publishing the laws, an obvious hint was given to potential attackers. IP and most other modern data network signaling is normally in-band, and it displays the same weaknesses seen more than 25 years ago in the telephone system.

Out-of-band signaling is achieved with any of a number of techniques. Time division multiplexing can be used to separate signaling protocols in time from the data stream in cases where the infrastructure does not extend all the way to the end nodes and is a common method for separating channels in microwave and satellite communications systems. Similarly, frequency division multiplexing can be used to separate signaling when the frequencies are not passed from the end node into the infrastructure. In each of these cases, efficiency is potentially lost either by increasing signaling delay or by wasting otherwise available bandwidth and, if pushed to the end points of the network, can still be disrupted by the end user. Signal shape division is an alternative method for signaling in which the shapes of signals (e.g., waveforms) are different than those for data. This method eliminates essentially all data channel attacks at the cost of requiring more complex hardware at all nodes in the network authorized for signaling. Finally, physical separation of the control and data planes is feasible and has been done by at least one major infrastructure provider which uses a separate fiber within fiber bundles for signaling between infrastructure elements.

2.5 Virtual Private Networks:

A Virtual Private Network (VPN) is a network that provides a private enclave to some subset of the users of a larger actual network. VPNs are generated by one of two methods; either cryptography is used to separate the ability to use data streams, or routing controls are used to physically separate data streams.

Most current VPNs that use cryptography have a point of presence on a public or semi-public network which provides cryptographic services in the exchange of information between points inside it's local enclave and points inside other enclaves it is provisioned to communicate with. The intervening networks are treated as a cloud of unknown outsiders who cannot read content or alter content undetected. This implies key management, encryption devices, and other components that impact performance and cost. Most current VPNs of this sort don't defeat traffic analysis, assure availability, or prevent attacks against the VPN gateway platforms from defeating their entire purpose. In addition, most VPNs of this sort are configured so as to allow unencrypted traffic to public networks when not communicating with other VPN enclaves. This introduces the potential for exploiting content-based attacks (i.e., attacks wherein interpreted content contains malicious code) against systems within the enclave, which then grants a foothold for attacks within and between communicating enclaves. This technology is normally not used within infrastructure because of the penalty in cost and performance and the resulting requirement to uniformly implement cryptographic capabilities to large portions of a network.

The second sort of VPN is in limited use today involves provisioning network components between participating parties so as to control information flow between them. This can be done by router configuration, with ATM Virtual Connections (VCs), with line switched systems such as are common in telephony, with private infrastructure, with fractional T1 circuits, in frame relay routers, and so forth. Some of these technologies allow very flexible reconfiguration based on authenticated user identities allowing the network to configure based on the possession of a hand-held device, a user identity and password, and so forth. These networks can be provisioned to provide guaranteed bandwidth, strong confidentiality against non-infrastructure-altering attacks, and a high degree of trust of the identity and location of communicating parties. This sort of VPN can be used to assure control of routers in a network backbone, for carrying need-to-know separated information through a common infrastructure, and when guaranteed performance is required. Limited forms of this technology are in limited use in infrastructures to prevent IP address forgery, to separate need-to-know, and for a wide range of other virtual LAN and switched Ethernet applications. Mathematical limitations on configuration management, both in terms of our lack of understanding of the underlying mathematical structure, and in terms of the complexity of the algorithms currently available to solve these problems, substantially limit our ability to use this technology on a wide scale, but progress is underway.

2.6 Summary:

Technologies for securely managing network infrastructure elements are limited in their scope and technical capability, and the management of large networks is still more of an art than a science. Nevertheless, they provide a cost effective way to leverage hard-to-find expertise against hard-to-solve problems. While they are necessary for modern network operations, the limited assurance they afford is largely ineffective against even the mildest of threats and they cannot be expected to sustain required operational characteristics under malicious exploitation by knowledgeable and well-funded attackers.

Section 3 - Network Security Management

Using the Network to Manage Node Security

Assuming that the network infrastructure operates ideally in every important way, the remaining challenge lies in how we cost-effectively control and protect the nodes attached to this network. The issues that commonly emerge are assuring integrity, availability, and confidentiality; controlling access; auditing; and maintenance.

3.1 Assure integrity:

Integrity controls are minimal in almost all widely used operating environments used as end nodes. Change control is the exception rather than the rule, the vast majority of systems today provide remotely accessible executable content to be loaded by the user from anywhere in the world often without warning, and new exploits against the integrity of these systems are published via the Internet on a daily basis. While there are a substantial number of tools and techniques available to assure integrity, they are not widely adopted, they are expensive to use effectively for most situations, and to date the risks have not justified the costs of using these techniques. There are a few notable exceptions where this is not the case:

Quality assurance is commonly used in high-value applications where integrity is inherently important to maintaining value. Examples include real-time control of manufacturing systems, electronic banking systems that process large volumes of information for major clearinghouse activities, some command and control systems, and telecommunications switching systems.
Access control is widely used both within and between systems, particularly when timeshared access to databases, network file servers, and other similar repositories are involved. Similarly, access control is common at entry points such as dial-in modems, firewalls, and remotely accessed high-valued systems.
Scanning for known corruptive sequences is very widespread and has been reported as used by almost all substantial organizations. Virus scanners lead the list, followed by other intrusion detection technologies, internal audit and configuration checking tools, and remote vulnerability demonstration tools.

3.2 Assure availability:

Availability is essentially unaddressed in today's computing environment and this is leading to an increasing number of exploitations by various groups and individuals on a wide scale. Major examples include:

the recent attack against tens of thousands of Windows NT systems via published denial of service protocol attacks
the recent publication of scores of new denial of service attacks against computer systems by protocol and content of remote services
the anticipated failures associated with date errors in the year 2000 and the numerous date failures we have experienced historically
the widely known process virus and paging monster attacks against most operating systems, and
the widespread lack of power conditioning and uninterruptable power at end nodes.

Perhaps most disconcerting is the notion that many organizations that were not impacted by events such as the NT attacks were kept safe largely by accident rather than by design or intent. Many of the decision-makers in these organizations appear to believe that they were saved because their protective system worked as they should, when in fact, many such systems did not work as they were designed and erroneously failed to pass the packets that just happened to be used in this attack.

Access controls may help limit attackers attempting to exploit vulnerabilities requiring access, network backup and recovery may help but is not integrated into current tools, and remote system configuration may be used to enhance availability but is not explicitly done by current products. Regular maintenance is also key to high availability but the lack of competent systems administrative personnel limits the availability of this level of service.

3.3 Assure confidentiality:

Confidentiality is a core function of common tools today and the historical framework in which information protection was created and has been viewed. Unfortunately, in today's computing environment, confidentiality increasingly pales in importance by comparison to integrity and availability except in exceptional circumstances. When end users are asked which effects they wish to avoid, integrity loss is almost always ranked as most important, availability is usually ranked very highly, and confidentiality us usually third. But when security is brought up, the notion of confidentiality seems to be emotively evoked. Perhaps even more intriguingly, infrastructure providers favor availability above all, integrity second, and confidentiality is a poor third. While all three are considered important, when push comes to shove, these facts come to bear.

Access controls are the major method of preventing information leakage in modern information systems, while auditing and audit analysis enhance gross leakage detection but not fine-grained exploitation. Automated user removal is a key function that protects confidentiality and is one of the major selling points of commercial NSM systems.

3.4 Control access:

Access controls in a typical host today include something on the order of 1,000,000 bits of protection settings, user and group identities, passwords or other authenticating information, and application-specific controls. It appears that no human is capable of managing this much information properly today, especially in a changing environment, so tools of various sorts have been created over the past ten years to assist people in setting and identifying changes in these settings. More advanced network-based tools extend these management capabilities to the remote management's control, but this doesn't change the fundamental issue that we don't really know what the ideal settings are. (6) But even if we did know how to specify the optimal settings, there are challenges related to:

the incommensurability of information across platforms (e.g., how do we translate append-only in Novell to Unix, DOS, or other environments without the same semantics),
issues of controls at different granularity (e.g., network, system, device, directory, file, record, field, bit), and
timeliness issues (e.g., is there ever a window of vulnerability between the creation of a file and the setting up of it's protections).

3.5 Audit:

Auditing is fundamental to the feedback required for control to be effective, but the information technology (IT) audit function is generally understaffed. For example, it is not uncommon for a single auditor to have responsibility for verifying the proper protection of thousands of computers. Clearly this leaves only a very small amount of time for each system, and without very good tools, is an infeasible task. As a result of this workload, auditing is commonly done on a statistical basis with more critical systems audited with greater frequency and at greater depth. The statistical approach is normally far more effective if it involves less travel and can look at larger amounts of data in shorter periods of time with similar detection accuracy. The generation of audit trails on each system is common in information systems and the delivery to central locations for analysis provides both a degree of independence from conditions on the system under audit and an ability to do audits with greater frequency and surprise. This increase in uncertainty for the attacker is a considerable advantage for the auditor. Unfortunately, some of the advantage of remote auditing is lost by not being able to access information first hand. The remote auditing process depends on the integrity of the system being audited, the proper operation of intervening infrastructure, and the integrity of the collection and analysis system. If an objective of an audit is to verify integrity, there are serious questions about the validity of assuming that the system being audited has integrity.

Current technology enhances audit by providing visibility into large numbers of hosts from a central site, large-scale correlation across platforms, rapid collection and analysis against common criteria, and trend analysis. The centralization of tools provides a richer auditing environment and better data reduction and long-term storage and retrieval capabilities. Remote maintenance of audit information also enhances availability by eliminating some common faults created by audit trails using excessive space.

3.6 Maintain:

The real key to the success of today's remote host security management tools lies in the resulting reduction in systems administration and maintenance costs. The centralization and partial automation of mundane systems administration tasks allows a small number of systems administrators to administer more systems more effectively. Security benefits are a side effect of more efficient administration and improved configuration management, and this advantage has been leveraged to improve sales and marketability of these products. The primary security value lies in improved access control (both by automating the setting of the bits and by automating the removal and modification of privileges for accounts), improved auditing and audit analysis, and improved configuration management.

While the improvement is efficiency is laudable, there is a tradeoff in improved efficiency related to reduced efficacy against certain classes of threats and failure mechanisms. Specifically, making fewer and fewer people responsible for more and more protection settings increases their sphere of influence over the overall operation - perhaps beyond the desired point. If taken too far, this begins to seriously impinge on basic control priciples such as the Least Privilege and Separation of Duty principles of the Generally Accepted System Security Principles (GASSP) standard and the Segregation of duties and Segregation in networks control requirements of the British Standards Institute Code of Practice for Information Security Management (BS7799). From a risk management perspective, this has to be seriously considered.

3.7 Summary:

The real advantage of automated remote host security management lies in the efficiency gained by centralization - a seeming contradiction to today's networking paradigm of highly distributed systems purchased, controlled, and operated by the "data owners".

Section 4 - Network Security Management Technologies

Modern remotely controlled security tools help assure integrity by providing remote access for an authorized administrator or automated service. The basic components of these systems include:

A central database (possibly redundant, mirrored, and/or hierarchical) that stores configuration information passed to other network components and historical information reflecting the operations on and of the controlled systems.
A secure communications system that links the central database system(s) to hosts under control while providing assurance of privacy and integrity of information flowing through the intervening untrusted infrastructure.
Programs running on the controlled systems (often called agents) that are embedded into those systems in such a way as to grant system privileges to authorized remote users, collect information on the controlled systems, and in some cases, initiate communication back to the central database indicating special circumstances.
Interfaces designed to allow improved efficiency of administration, auditing, and other management tasks related to controlling networked computer systems in the particular environment.

4.1 Database:

Databases in remote host management systems are, more often than not, SQL compliant, and typically allow queries for outside analytical tools. Distributed databases in these systems tend to be in a master-slave relationship because central control seems to be the goal of these systems and because, with the exception of some key management systems, decentralized control appears to be unnecessary and unsolved.

The databases in these systems tend to be managed from menu-based user interfaces, and as a result, they require less database expertise than security expertise to operate. In some cases, the databases are customized to the workflow requirements of the customer and provide forms-based interfaces to allow access requests, position changes, and other administrative tasks to be automatically submitted, approved by competent authority, and promulgated into operational systems.

The security of the central database is key to the success of any such system, and some vendors have made substantial efforts to put their applications on relatively secure platforms. While this is not an industry standard practice, it is a valid selling point and an important issue to be addressed in the centralization of security.

The fundamental limitations of these systems seem to lie in four areas. The tradeoff between bandwidth, storage space, performance, and granularity seems to be a key issue in large-scale deployment of these systems. This is reflected in a wide range of different approaches to the communications and storage of information, division of labor between central and distributed components of these systems, and selection of automated versus manually triggered activities. Timeliness is closely linked issue that has to do with bandwidth, responsiveness, and the revocation process. Finally, no metrics have been demonstrated to date to compare performance of these systems to human experts. While the question of accuracy and completeness would seem to be important, the key customer interest in most cases seems to be the ability to produce reports that demonstrate improvement to management (i.e., perceived improvement), regardless of whether the demonstrable 'improvement' is reflected in better protection. The ability to demonstrate quantity is far easier than the ability to demonstrate quality in the short run.

4.2 Communications:

Communications between central databases and agent programs running on remote systems is usually secured by cryptographic protocols. Public key systems are typically used for private key exchanges and private keys are used for transmissions. These systems are intended to provide confidentiality for exchanged information and assurance against forgeries or other corruption. Custom protocols are normally used for the exchange between databases and agent programs in order to provide for efficient implementation of functions provided by the specific product. While many of these systems are operated in a master/slave relationship, some provide interrupt driven interfaces permitting agent programs to alert central databases of changing conditions. This type of system also provides for a form of intrusion detection.

4.3 Agent Programs:

Remote "agent" programs are commonly inserted into hosts to implement controls specified by the remote administrator. These agent programs act on behalf of the remote administrator and/or the database to provide control in the form of accepting and acting on requests and feeding back results. They are usually slaved to the central databases but some allow interrupts to be generated on locally detected conditions. In some cases, these agents go beyond the scope of intrusion detection into automated response. Typical examples include shutting down an account after an excessive number of failed access attempts. In addition to local incident handling, the central reporting of these incidents can result in global account freezes and other similar indirect effects.

4.4 Interfaces:

Four types of interfaces are common in remote security management systems. Administrator interfaces allow policy issues to be related to controls. User interfaces Allow users to request services and view status information on their accounts and pending requests. Management interfaces allow management to get reports and measure control effectiveness at a statistical level. Auditor interfaces allow auditors to examine controls remotely and run programs against identified control settings.

Section 5 - Product Characteristics:

We examine product characteristics from 14 viewpoints used in some audit systems. (5) Specifically, we will cover management, policy, standards, procedures, auditing, testing, scalability, technical safeguards, personnel, incident response, legal issues, physical security, awareness, and training and education.

5.1 Management:

Products provide administrative interfaces and can often provide management reports. Most have no other management function.

5.2 Policy:

Most systems allow access control policy to be specified at a fairly high level of granularity with specific controls implemented in detail by the agent programs. Copyright and similar control policies can be automatically marked on login screens and through the records provided in the central databases in some systems. Common policy templates are widely used to automate much of the effort required to set such a system up for a large organization. User account creation, removal, freezing, and similar policies related to job function and access are often automated providing a strong link between policy constraints and system behavior. Other policy elements are not implemented in the system provided today but customization is possible in most systems and many customers apparently customize these systems to their policies and control requirements.

5.3 Standards:

Systems follow standards when available, but uniform standards for these sorts of systems are not yet published. Specific standards mentioned by vendors included:

X.500, X.501, X.509, DCE are common
SQL for databases is quite common
Several companies are working with NIST to set new standards
Idea/3DES/DES/etc. as defined by routers
IPSEC and POSIX standards are used by some vendors
The SOCKS standard is implemented for remote agent programs
Telecommunications standards are followed when feasible.
TCP/IP, IPX, and other protocol standards are commonly followed since these networks are used as the backbone for controlling agent programs.

5.4 Procedures:

Many procedures are automated with these systems. Batch processing procedures are commonly used to update files, retrieve audit trails and perform other systems maintenance functions. Paperwork emulation in electronic form is implemented by several systems. Scheduled en-mass operations like user and program updates are commonplace. Automated reporting and response procedures are available in many products and used by many users. Application-defined procedures are implemented by some companies using the infrastructure provided by the remote security management to aide in management of distributed systems and applications running on them.

5.5 Auditing and support:

A major function of these systems is in audit support. Centralized audit databases provide a great deal of efficiency and allows auditors access to complete transaction histories. Database search and analysis of audit records provides auditors with the ability to automate many audit functions including trend analysis and examination for abnormalities. Reconciliation of audit trails with actions is provided in many systems, thus allowing auditors to verify controls in place are identical to the results of applying the sequences of operations supposedly performed. Consistency checks of system state with database state is also available in several systems and this provides a useful crosscheck that can be done automatically and at any time. Audit information can be used to trigger automated response such as paging a person, turning services on or off, disabling accounts, or anything else the systems administrator programs them to do. POSIX audit standards are widely used and auditing application program interfaces (APIs) are provided in these systems.

5.6 Testing:

Few systems have strong linkage with protection testing capabilities, and testing capability varies substantially in the available products. Checking return codes from commands is used to verify proper communication and actions. Comparison of database state to remote state is used to verify that changes were identical in both systems, but not to verify that the changes were appropriate to the orders issued. This is done both at the database field-level and the system function-level. In development, some companies follow international standards organization (ISO) standard 9000 testing processes including functional and regression testing in development. Self-test of hardware and software are performed in some of the top-end cryptographic systems, while data is verified during operation in many of these systems. Sampling of log files for syntax, value checking, request/response testing, and other similar tests are performed in some of the products.

5.7 Scalability:

The largest scaled demonstrated as of the time this study was performed included: A central computer security management system with 300,000 end users. Some cryptographic networks had 30 million certificates and 100,000 users per server (at a cost of $10 per certificate). An infrastructure with 1,000 routers or similar devices managed by a central control facility was mentioned.

5.8 Technical safeguards:

None of these systems run on or control trusted computing bases (TCBs). One cryptographic system is implemented on custom 'secure' hardware. Communications is encrypted in most systems and authenticated on many systems. Database field-protection based on user is implemented in some of the central databases. User identity and password controls are used for management interfaces, and smart cards or other tokens are supported by some of the vendors.

5.9 Personnel:

Some support is provided to facilitate personnel controls in many of the central security control systems. Personnel department interfaces allow changes in title or position to be used to notify databases, and through them, systems of changes. This is called role-based access control and it is far more effective than manual controls at assuring that changes in responsibilities are reflected in changes in technical controls. Near-real-time updates are available in some systems and usage monitoring, profiling, and usage analysis has been tied to job functions to detect abuses by certain classes of users.

5.10 Incident response:

Database analysis and remote control provide a substantial capability for incident handling and response. Agent programs can detect specified patterns, results are reported to the database, and pager, email, or other notice is provided to the responsible parties. Response can be programmed by the administrator, and typically includes retry limit or similar threshold-based limitations on abuse. Manual response can be assisted by the distributed control mechanism, and investigative support is provided by using the database capabilities for analysis. Ticket management provided for tracking of incidents as well as other maintenance functions until closure. Reflexive control and others attacks against these systems is likely feasible, but these controls provide dramatic improvements over manual control of individual systems by remote unencrypted login and command-line interfaces.

5.11 Legal issues

Compliance with some laws, such as reporting and record keeping, is automated by centralized management systems. Similarly, consistency with business rules can be assured and verified which may help limit liability and support defenses against discrimination or similar suits. The use of a central control system may provide for some standard of due care, while contemporaneous collection of normal business records allows their use in court. These systems also help enforce login screens and other legal notice and warnings, but at the same time, they may introduce monitoring issues that could result in employee lawsuits. No suits have appeared to date over issues related to security monitoring.

5.12 Physical security:

Support for physical security incident handling can be done through ticket management interfaces of some control systems and physical protection is provided in the form of improved authentication via smart cards, but over all, these management systems play little role in physical security today.

5.13 Awareness/Training/Education:

Companies offering these tools provide training services, but the system themselves are not used for enhanced training. The awareness that might be afforded by using this technology is a potential area for improvement.

Footnotes:

(1) Products and/or companies involved in this study included but were not limited to: Tally Systems, Entrust, IRE/SafeNet, Bull/AccessMaster, Trident/NetRisk, Wheel-Group/NetRanger, Veranda, Schuman/SAM, Dascom/Intraverse, Axent/OmniGuard, Technologic/RAS

(2) DISA/SAIC Study

(3) Examples used in this study were taken largely from Protection and Security on the Information Superhighway (John Wiley & Sons, 1995) and Computer-Related Risks (Addison-Wesley Pub Co, 1995)

(4) For examples of this sort of protocol issue, see Internet Holes in Network Security Magazine, 1995-6.

(5) This audit scheme was first published in Protection and Security on the Information Superhighway (John Wiley & Sons, 1995) and has been the basis for many audits. Other schemes with similar control perspectives include the GASSP standard and BS7799.

(6) For related information and a selected bibliography on access control in flexible distributed networks, see Literature Search of Topics Related to Flexible Distributed Security Management Relative to Access Control