Fred Cohen & Associates

Models of Practical Defenses Against Computer Viruses
Fred Cohen
Electrical and Computer Engineering Department
University of Cincinnati

Terms: Computer Viruses, Computer Security, Integrity, Protection

Abstract

In this paper, we model complexity based virus detection mechanisms which detect modifications and thereby prevent computer viruses from causing secondary infections. We use these models to show how to protect information in both trusted and untrusted computing bases, show the optimality of these mechanisms, and discuss some of their features. The models indicate that we can cover changes at all levels of interpretation with a unified mechanism for describing interdependencies of information in a system, and we discuss the ramifications of this unification in some depth.

1 - Background

A "computer virus" [1] is a program that can "infect" other programs by modifying them to include a possibly evolved version of itself. With this infection property, a virus can spread to the transitive closure of information flow, corrupting the integrity of information as it spreads [1]. Given the widespread use of sharing in current computer systems, the threat of a virus causing widespread integrity corruption is significant [2,3].

A considerable amount of early work was done on protection policies to prevent from the illicit dissemination of information [4,5], and several systems have been implemented to provide protection from this sort of attack [6], but little work was done on protecting the integrity of information except in very limited domains [7,8]. With the advent of computer viruses, this situation has changed dramatically, with over 20 journal and conference papers appearing in the last 3 years [15], and numerous small conferences worldwide.

Prevention of infection by computer viruses can only be guaranteed by limiting sharing of information or limiting functionality [1,9]. In general, we can limit information flow so as to form a POset of communicating information domains in a network or system [10]. In such a system, we can guarantee that a virus will spread only to those domains which are in the transitive flow path from its initial source [10]. The POset policy has advantages over the Bell-LaPadula [4] policy, the Biba [8] policy, and the lattice policy [5], in that it is more general and encompasses properties of information networks as well as single processors. We don't yet know exactly what limits on functionality are required to prevent viruses, but we do know that Turing [14] capability must not be available [1,9].

We have shown that detection of a virus is undecidable [1,9], as is the detection of evolutions of viruses from known viruses. Although we may not be able to detect all viruses with any given technique, a recent result shows that we can systematically cause the complexity of undetected modification of information to be made very large by the use of a cryptographic checksum [9,12,13]. This is closely related to change control techniques commonly used by EDP auditors, but the types of changes detected are of such a nature that analyzing their ramifications is beyond the scope of most experts.

In this paper, we examine mathematical models for complexity based integrity maintenance and closely related techniques wherein we use change control and human decision making to maintain the integrity of a system.

2 - Philosophical Issues

The principle that systems can automatically determine the difference between legitimate and illegitimate modification of information has a fundamental flaw in that legitimacy of change is necessarily dictated by the intent of the user. User involvement is therefore a critical aspect of integrity protection in general, and protection against viruses in particular.

If users are aware of the threats against them, they may be better prepared to protect themselves. A classic example is the problem with password protection, where short and easy to guess passwords are commonly used. The use of passwords could be significantly improved both through automated techniques and user awareness. Aware users tend to be significantly less vulnerable to these attacks, but as a community we don't usually provide more than a trivial amount of assistance or education in this area.

Several current methods designed to detect computer viruses suffer from providing too much user awareness. As an example, in some such defenses, every time disk I/O is performed, the operating system notifies the user. This makes the user more aware of side effects of programs, and allows the detection of many attacks. Unfortunately, the user is typically inundated with useless warnings of a possible virus. Every program that performs I/O receives numerous warnings. Even the most knowledgeable and observant user would not likely detect the difference between a program before and after infection. In addition, the unnecessary displays can make it quite difficult to find the legitimate output of programs. These "false positives" make the defense quite intolerable for real world users.

Requiring approval for writing to certain classes of files augments the notification process while decreasing convenience. The confinement of programs to limited authorization spaces has been shown infeasible because of covert channels [7], and precise tracking of information is NP-complete [5]. Nevertheless, limited protection can be provided by thoughtful combinations of these principles.

The problem with numerous false positives is that a plethora of warnings makes each warning have a low information content, and thus we tend to ignore the important messages along with the others. False negatives in the presence of protection has another deleterious effect in that it gives a false sense of security. The basic principle that we wish to explore is that the user should be made aware of unusual or noteworthy behavior, while usual behavior should go on unhindered and uninterrupted. The problem is to identify just what constitutes behavior worthy of user notification.

A computer virus is a program that "modifies other programs" to include a possibly evolved version of itself. One behavior of interest is the modification of programs. A logical starting point is to differentiate between programs that have been modified and those that have remained unchanged. Constancy and change are relative concepts, so we require a standard against which to compare. Since change occurs over time, the standard we choose is the state of a program at some time. If we are aware of all subsequent changes and have some understanding of the ramifications of changes in our environment, we may be better prepared to identify and deal with unexpected or suspicious behavior.

In such a system, we will be alerted to many modifications that are not due to viruses, and it would be helpful to find finer grains of distinction whereby we could be forced to be aware of fewer and fewer instances of legitimate program modification. Although it is impossible in general to find a perfect method of this sort for viruses [1,9], we may be able to do considerably better than simply notifying the user of all modifications. For the more general question of integrity, legitimacy is based on intent, and therefore all changes must be verified by all effected parties.

In the end, we will either make conservative estimates or allow some viruses to slip through the cracks. As we have previously shown [1,9] when we always use conservative estimates, we tend toward a system of isolationism. It appears that we have little choice in this matter if we are to have perfect protection from computer viruses. The alternative is to provide protection which does not prevent or detect all viruses, but provides a reasonable degree of protection by driving the complexity of writing a successful virus up [12,13].

3 - The Structure of Dependencies in Systems

Information systems consist of many components, and as in all complex systems, components work together to form an environment for performing a process. No single component in such a system stands on its own, independent from the rest of the system. In order to assure the integrity of a given part of a system, it is therefore appropriate to consider the other parts of the system upon which it depends.

In the simplest sort of system, there is a single component which interacts with the environment. If the environment does not operate within the constraints of the design of the component, we can not assure that the component operates properly. Thus we see the simplest form of dependency in figure 1.

-----------------------------
|	   World	    |
|	-----------	    |
|	|Component|	    |
|	-----------	    |
-----------------------------

Figure 1 - The Simplest Form of Dependency

In many systems, dependencies are quite complex, perhaps involving many different types of components with complex interactions. In a typical computer system, a user may operate on data that is interpreted by several different programs, transformed into forms for still other programs, all of which depend on the operating system, which depends on the hardware, which depends on the physics of space. We will assume that the epistemology underlying the physics is not of import since it is unlikely that the operation of a computer system will change simply because someone changes their mind about how physics works. In figure 2, we show such a situation in graphic form.

		------
	  ------|Data|------
	  |	------     |
	-------		-------
	|prog1|		|prog2|
	-------		-------
	/ |		   |
       /  |		-------      -------
      /	  |		|prog3|------|prog4|
     /	  |		-------      -------
    /	  |		   |  \       /
   |	---------------------- \     /
   |	|  Operating System  |  |   |
   |	----------------------  |   |
   |		   |		|   |
--------------------------------------
|		Hardware	     |
--------------------------------------
			   |
--------------------------------------
|		Physics		     |
--------------------------------------

Figure 2 - The Dependency Structure of a Simple Computer System

Redundancy exists in virtually every system that man has ever devised, if only the redundancy due to the dependency on the statistics of physics for assuring the well behaved operation of physical components. If it were not for redundancy, a single instance of the wave function of a particle shifting due to external forces would cause system collapse.

In considering any particular system from any particular point of view, we may abstract out the redundancy of lower levels of operation and treat components as macroscopic functional units. For the purposes of discussion, we will call these macroscopic units 'programs'.

It is important for general understanding that 'programs' be thought of only as components in a system interpreted by the physics of their environment, and resulting in system behavior in response to the system state and external forces. A multiplication, for example, depends for its operation on the inputs provided. The inputs can be thought of as the program, and the multiplier as the environment. A multiplier is not as general as a Turing machine in terms of the functions it can compute, but otherwise, it is quite similar.

The dependency relationships between information in information systems has been discussed by many authors over an extended period. Some major areas where it has been used are briefly outlined below without references:

Fault tolerant computing uses the dependency of gate outputs on gate inputs to analyze circuits so as to form test sets and analyze them for coverage.

Simulation uses the interdependency of parts of a system to produce efficient simulation. This is particularly important in event driven and time warp simulation where the relationships cause evaluation of only those components which are affected by changes in causal components.

Parallel computing theory is deeply entrenched in the dependency graph of a computation because this leads to optimal allocation of processes to processors and to determine optimal parallelism and performance.

Data flow computing uses the dependency graph of a computation as a basic building block to the description and interpretation of programs.

Many modern debugging tools (e.g. the one in Interlisp) build a graph of the interdependency of routines on each other to allow the programmer to trace the effects of change on other parts of the system and to make proper modifications to dependent structures.

PERT charts use dependency relations to track and analyze the progress of a project towards a goal.

From a historical perspective, this interdependency of information in a system has widespread implications. Dependency graphs are a central theme to understanding the underlying behavior of systems because they trace the cause effect relation between information and its interpretation. In our case, we are interested in tracing the effects of change. If the basis upon which we perform a computation is unchanged, we assume that the computation itself will be unchanged.

4 - A Trusted System Example

In the present context, we define the term "trusted system" as a system that can be trusted to accurately implement algorithms we describe herein regardless of other factors in the environment. In a trusted system, we may model change control using the following mechanism.

S1:={P,d}
	A set of programs P={p1,p2,...,pn}, n in I	(1.1)
	A dependency relation d in PxP			(1.2)

The relation 'd' describes the dependency of programs on each other. This is closely related to the flow relation discussed in previous papers [10], but differs in that general purpose functionality is not assumed, and the ability to modify another domain is not required. Only the ability to cause another program to result in a different state or output is required in order to have a dependency relation.

If we could truly enforce this policy, it would appear to be a solid defense against unauthorized cause/effect relationships, but it still fails to capture the intent of the user regarding the times at which dependencies are to be exercised and what specific dependencies are intended. The requirement to specify 'd' also places an increased burden on the user, while only providing limited protection. By way of example, in a general purpose system, an authorized program pj where:

{pi,pj} in P,(pi,pj) in d (read pi depends on pj)

could cause pi to implant a virus at any time. Similarly, pk could cause any pj to implant a virus where:

{pj,pk} in P,(pj,pk) in d

In essence, we have implemented a pre-ordering by using the trusted system this way [10] since:

for all pi,pj,pk in P
	pi d pj and pj d pk => pi d pk
	pi d pj and pj d pi => pi=pj
	but NOT necessarily pi d pi

Another side effect is that for every new program, we must specify d, and as our intent changes, so must d. The ability to modify d must be limited so that a program in P cannot modify the structure of the relation in order to perform other modifications, and the inconvenience increases.

5 - An Alternative to Dependency Relations

An alternative method for allowing user intent to be captured in the operation of such a system is the use of a modified command interpreter that requires explicit authorization for changes. This allows us to have an implied dependency relation that changes with time and is quite specific to the user's intent at a given moment.

S2:={P,T,A,M}
	A set of programs P={p1,p2,...,pn}, n in I		(2.1)
	A set of modification times T={t1,t2,...,tn}, where	(2.2)
	for all ti in T, ti is the last modification time of program pi
	A set of authorized modification times A={a1,a2,...,an}	(2.3)
	A set of moves M={m1,m2,m3,m4}				(2.4)

Operating System:
1	get a program 'pi' to be interpreted			(2.5)
2	if ti=ai, {interpret pi; goto 1}			(2.6)
3	ask user for a move 'm' and act as described below:	(2.7)
		m=m1	goto 1
		m=m2	{interpret pi; goto 1}
		m=m3	{set ai=ti; interpret pi; goto 1}
		m=m4	{restore pi from time ai; set ti=ai;
			 interpret pi; goto 1}

In S2, we have a set of authorized and actual modification times that are compared to determine whether modifications to a program 'pi' have been authorized, as a condition on interpreting pi. In the case where modifications are authorized, pi is interpreted without interruption. Unauthorized modifications are treated in one of 4 ways as specified by the user:

1 - pi is not interpreted

2 - pi is interpreted as is but not permanently authorized

3 - pi is permanently authorized and then interpreted

4 - pi is replaced with an authorized version, then interpreted

The four moves above are examples of how a situation of change might be handled. In the first option, we simply refuse to run the changed program, and presumably investigate further at a later time. In the second option, we run the program for now, knowing it might be untrustworthy. We don't decide to trust the change, and thus continue to generate warnings. In the third option, we decide to accept the change as desirable now and for ever. In the fourth option, we restore an old and trusted version of the program from backups and abandon the change.

This method has the advantage that it does not prohibit modifications to programs, but merely forces the user to decide whether or not to trust these modifications before applying them. There is a drawback in that a trusted channel is needed between the user and the TCB in order for S2 to reliably get 'm' from the user. In a trusted computing base, we could allow the user to selectively authorize modifications before they occur. We could not be guaranteed that the modifications are those desired or expected.

A different attack scenario is an attack that corrupts only information that has been changed very recently. In this scenario, we cannot differentiate legitimate from illegitimate modification because of their proximity in time. We still suffer from the possibility of an authorized modification not matching our desires.

The basic problem of being unable to verify that a change is desirable appears to be unsolvable. We may even be willing to verify every state change a system makes. This activity would make the use of a system unbearable unless we could automate the process, but then we would be faced with the problem of verifying the automated checking process as it verifies the system. In the end, we cannot perfect this checking process unless we are able to discern between correct and incorrect inputs to the system. Any time there is a choice to be made, if we can fully cover corruption in the making of the choice, there is no choice left to make, and thus the choice is unnecessary.

6 - Implementations for Untrusted Systems

If we attempt to implement the above techniques in an untrusted computing base, we meet with several problems. In S1, we used 'd' to specify the programs that affect each program, but in an untrusted system, there is no way to enforce the 'd' relation. In S2 we require a method for maintaining two sets of modification times and comparing them at interpretation time. Again, we cannot provide this service with impunity in an untrusted system.

In previous papers, we have partially solved this problem through the use of a cryptographic checksum [12,13]. This allows the complexity of illicit modification to be driven up so that we can provide any desired degree of assurance against undetected or unauthorized modification. With the use of the cryptographic checksum to identify change, we can implement a command interpreter similar to the one of S2 above. This has been partially implemented by several authors and described in [16]. A mathematical description of the technique follows:

S3: {P,K,S,C:PxK=>S,M,V,k}
	A set of programs P={p1,p2,...,pn}, n in I	(3.1)
	A set of keys K={k1,k2,...,km}, m in I		(3.2)
	A set of checksums S={s1,...,so}, o in I	(3.3)
	A transform C:PxK=>S				(3.4)
	A set of moves M = {m1,m2,m3,m4}		(3.5)
	A set of values V = {v1,...,vn},		(3.6)
		for all vi in V, EXISTS si in S: vi=si
	Each user has a secret key k in K		(3.7)

At a set of times T={t1,t2,...,tn}, we generate initial values
	V={v1,v2,...,vn} for each pi in P, where
	for all vi in V, vi=C(pi,k) at time ti.		(3.8)

Define tj: For all ti in T, tj > ti			(3.9)

Operating System:
1	get a program 'pi' to be interpreted (time=tj)	(3.10)
2	if if C(pi,k) = vi, {interpret pi; goto 1}	(3.11)
3	ask user for a move 'm' and act as described below:
		m=m1	goto 1				(3.12)
		m=m2	{interpret pi; goto 1}
		m=m3	{set vi=C(pi,k); interpret pi; goto 1}
		m=m4	{restore pi to where C(pi,k)=vi;
			 interpret pi; goto 1}

Just as in the previous example, this method permits 4 options which allow the user to accept or reject a change before using the results of that change. The only difference is the use of a cryptographic checksum to enforce the change control mechanism and the need for a trusted channel between the user and the computer in order to assure that a Trojan horse does not intercept k.

The overhead for a typical program is about 1 second on an IBM PC with a high speed disk drive running the DOS operating system. On a lightly loaded PC-AT runing Xenix, the delay is under 1 second, and similar delays have been found on 3b2's and Harris 8000's. Sample implementations have been demonstrated in the Bourn shell, the C-shell, a modified C-shell for IBM PCs running DOS, The standard DOS command interpreter, and the Korn shell. Similar results are expected on most other computers and command interpreters.

7 - The Optimality of S2 and S3

We now show that S3 is optimal in the following sense. We cannot prevent primary infection (i.e. the infection of pi by some pj where C(pj,k)=vj) without eliminating sharing or programming [1,9]. We therefore define optimality as maximizing the prevention of secondary infection (i.e. the infection of pi by some pj where C(pj,k)!=vj), and minimizing the time spent in overhead for this prevention. We assume that only one program runs at a time in S3, that C is ample for making forgery of an vi in S infeasible, and that only the operating system can interpret a program.

We note first that S3 prevents all secondary infection except in the case where C(pj',k)=C(pj,k), a successful forgery of a checksum. This is because for all pj' such that C(pj',k)<>C(pj,k), C(pj',k)<>vj, and thus from (3.11) pj' is not interpreted unless the change resulting in C(pj',k)<>vj is authorized by the user.

Since at any time, tk: ti < tk < tj there is another time tl: tk < tl < tj checking C(pj,k) at time tk does not detect a change at time tl.

Since there is no possibility of interpreting pj before time tj, there is no possibility of infection of any pi by pj before tj. Thus a check of pj at time tk does not detect infection at time tl, which is required in order to meet our first condition of optimality, and furthermore requires extra time, which fails to meet our second condition of optimality.

If S3 does one thing at a time, there is no program pk that can be interpreted between time tj and the interpretation of pj, and thus there is no later time than tj to perform the checksum and still meet the condition of preventing all secondary infections. Hence, tj is the optimal time to perform C(pj,k). Since our system performs C(pj,k) at time tj, it is in fact optimal by our criteria.

S2 varies from S3 in only one important way, it needn't depend on the quality of C for its proper operation. Thus all of the other facts presented in this section apply equally well to S2, and S2 is therefore optimal in the same sense as S3.

8 - The Role of the Environment in Viral Protection

A major problem with the previous discussion is that it ignores the role of the environment in the propagation of computer viruses. For example, the use of the word "program" may be quite misleading in the above discussion. In effect, information only has meaning in that it is interpreted [1,9,14]. Since we cannot know a-priori which information is treated as program in any given environment, we cannot know what behavior is questionable.

We often cite the "basic" programming language to show that information treated as data by the editor is treated as program by the basic language interpreter. Similarly, in compiled systems, the source code is treated as data by the editor, while it is translated into program by the compiler. Even a document formatter treats its data as program, and in fact several document formatters are general purpose enough to allow the implementation of a virus. It is therefore completely inadequate to provide methods which operate only in the environment of the command interpreter. In effect, this only covers a very small subset of the information residing in a system.

One option is to model the dependencies of programs on each other to allow the checking of all relevant information for integrity before interpreting a given program. We can perform such checks if we know the dependency of programs on each other, either by explicit proclamation by the user or by learning the dependencies through experience. We model the dependency relation 'd' as follows:

S4:=S3+{D,d:PxP=>D}
	D = {true,false}
	A dependency relation d:(PxP=>D)			(4.1)

Operating System:
1	Same as (2.5)
2	if ti=ai, {if check(i) then [interpret pi; goto 1]	(4.2)
			else goto 1}
3	Same as (2.7)

Where check(x) is a recursive function of the form:

	if tx<>ax then FALSE,
	else	{for all py in P: (py,px) in d, [remove (py,px) from d]
		if d is empty, return TRUE
		if (for all pj in P: (px,pj) in d, [check(j)<>FALSE])
			then	TRUE
			else	FALSE
		}

The dependency relation is applied transitively in the check function to verify the transitive closure of all programs that pi depends on, and thus all programs that can affect the integrity of results yielded by pi. This relation can produce an infinite regress if not properly implemented since there can be cyclic interdependencies between programs. The resulting dependency structure is therefore a transitive binary relation as follows:

for p1,p2,p3 in P
	p1 d p2 and p2 d p3 => p1 d p3	(transitive)
	d subset of PxP			(binary relation)

A special case of this structure is the case where:

for all pa,pb in P: (pa d pb) => (pb ~d pa)

since this guarantees that {p: (pa d p)} is finite and degenerate. Such a dependency graph is a pre-ordering which guarantees that we can search the entire structure without loops and resolve all dependencies in a straight forward manner in finite time. It also allows us to use many of the results from previous works [10,11], but does not imply that all programs must have Turing capability.

Using the flow relation 'f' from [10], we know that:

if [(pa f pb) and (pb f pa)]
	and ~[(pa d pb) => (pb d pa)],
		then pb is not general purpose!

This result states that if we are in a pre-ordering structure and not a POset, and if multiple programs coexist in a single domain, then those programs are not general purpose. Put another way, if we wish to enforce a pre-ordering without enforcing a POset, we must use programs without Turing capability or put each program in a private domain.

In the case of limited functionality programs, we can only use information in limited ways. The traditional concept of 'data' is actually a function of the environment in which the information is interpreted, not a function of the data itself. When addressing the issue of program dependencies, we are therefore exploring properties of programs regardless of the information they operate on. Conversely, if we wish to protect 'data' files from corruption, we must do so by limiting the environments in which they are interpreted. We unify this perspective by treating all information as 'program', and limiting the dependencies of programs on each other.

This unification leads to a methodology wherein data files are treated as programs with the association between data files and programs that are intended to interpret them maintained by the operating system. This type of user interface has been provided in the Apple MacIntosh environment wherein the invocation of a document is actually treated as the invocation of a document processing program on the specified data file. We have prototyped a generalization of this technique for PCs and mainframes wherein a sequence of programs are offered in order to evaluate a given program, with the user deciding which is appropriate to the desired use. We model this system as follows:

S5:=S4+{A,a:PxP=>A}
	A = {true,false}
	An association relation a:(PxP=>A)			(5.1)

Operating System:
1	Same as (2.5)
2	if ti=ai, {if check(i) then [interpret(pi); goto 1]	(5.2)
			else goto 1}
3	Same as (2.7)

where check(x) is defined as in S4 and interpret(x) is defined as:

for all p in P, (x a p) and "user permission"
	=> interpret p using x as input

Three more potential improvements to this system are; ordering the association relation so that the most likely associations are tried first, allowing the user to authenticate changes immediately after they occur so that the window in which a subsequent modification can happen is reduced nearly to vacuous, and checking critical portions of the system upon reentry into the operating system so that any corruptions of the operating system itself will be detected immediately.

9 - Practical Limits of these Techniques

In the proposed system, we are placing somewhat of a burden on the user to provide accurate information about the changes in a system with time and the dependency of programs on each other. In such a situation, we can naturally expect to have some failures, if only in the inability of the human operator to make correct decisions all of the time. Furthermore, as we require more of the user, the system becomes less usable and less reliable. Fortunately, the system is quite resilient in that (N)ary infections are detected before (N+1)ary infections can take place. As the system becomes more and more corrupt, the user must make more and more questionable decisions to keep the system operating, until finally, the system becomes so corrupt that useful operation may become nearly impossible.

Analyzing the dependencies of programs on each other may take a good deal of time if the cryptographic checksum is to be strong. If done on-line, it may create a severe performance bottleneck to usability. In S2, this is not a problem. In a trusted system, S2 has almost no performance effects while providing excellent protection, even if we check all of the dependencies of S5.

Another major problem on untrusted systems is that an attacker could simulate the entire system, always returning the results that should be returned, but actually storing different information. When damage is triggered, everything might cease to function with no prior warning. It is for this reason that we must have some trusted hardware and a secure channel in order to have high integrity in a general purpose trusted system.

A different attack that has been suggested is the use of a program that simulates the defense, acting as the defense would in all events except in detecting the corruptions that it spreads. Even as behavior begins to change, we may suspect that there is something wrong, but the checking mechanism will still claim that everything is as it should be. This is a quite complex problem to solve because the substitute system must produce all of the positive alarms and fail to produce the non-alarms of the system it replaces. Individualized built-in keys for each copy of the program generated could make this more complex, as could the use of evolution in the design of the defense. Although these could drive the complexity of attack higher and higher, it would still be possible to launch such an attack.

A third attack which is similar to the simulation situation is the replacement of the normal operating system with a substitute that acts just as the original operating system acts except in the case of a particular form of corruption. An example would be encrypting all on-line files, decrypting them for use. In this case, the attack may have to modify the current version of the operating system in memory to assure that when reading the encrypted operating system from disk, it returns the original. If the defense checks for changes in the memory used by the operating system, this attack becomes still more complex in that the integrity checker must be corrupted as in the second attack listed above. Again we can drive the complexity up, but can never make attack impossible.

As we see, the price for successful attack can be made quite high, but in an untrusted system we are eternally in a struggle between attack and defense. The issue at hand then is what price we are willing to pay for what degree of assurance.

10 - A Practical Implementation

Several practical implementations of these systems have been completed. We quickly list their features:

A PC implementation of S5 uses a derivative of the cryptographic checksum scheme in [12] to produce checksums. It has a screen interface which improves usability and provides automatic program execution and checking as well as all of the other features listed in section 8. Checksums on an 8Mhz PC-AT system with a fast hard disk take about 0.2sec+1sec/8Kbytes. The average PC program is 17K bytes, for an average increased delay of about 2.5 sec. The use of the association feature has a pleasant side effect in reducing training time and eliminating errors made in command entry, and this makes up for the performance reduction to some degree.

A UNIX based implementation of S3 without m2 and m4 was included in a UNIX 'sh' shell in the spring of 1988 [16] using a CRC check in place of a cryptographic checksum. This took an overhead of about 0.5sec/program, and was used in a sample user community for several months. The community found no significant problems with the system, and has now adopted it as a standard at that facility. In this experiment, it was also found that the integrity shell detected Trojan horses in the user's path and helped identify failed compilations which otherwise would have gone unnoticed. A cryptographic checksum based on [12] was also used in a subsequent implementation with only a 10% increase in the time required to check an average program.

A UNIX based implementation of a complete S3 was placed in both the 'sh' and 'csh' command interpreters by May of 1988, and has also been operating since that time with no substantial problems. The use of the cryptographic checksum increased overhead by less than 1 sec for the average program.

An MVS implementation of S3 is currently being tested at IBM Watson Research Center. Details are not as yet available on this experiment.

A prototype of S2 has been implemented under Unix with almost no detectable performance loss, but this is not a sound system because the underlying system is not a trusted system. As an experimental system, this demonstrates most of the features of S5 with almost no overhead, and could easily be implemented in a trusted system

11 - Summary, Conclusions, and Further Research

We have presented a series of increasingly complex models of practical defenses against computer viruses. These models have been successfully used to prove the optimality of the systems described herein. We have quickly described 6 prototype implementations in 3 different computing environments. The techniques are highly acceptable to users and provide improved integrity in the computing environment. We have described how this technique can be used for maintaining the integrity of interpreted information at all levels in a computer system, and have demonstrated its applicability in a multileveled environment where "data", "programs", and the operating system are interrelated automatically.

We conclude that the modeling of defenses against computer viruses has shed significant light on the nature of the problem and the solutions proposed to resolve it. We also conclude that the techniques presented herein are practical and useful for the broad computing community, that they do not present an intolerable overhead, and that their use increases the integrity of real systems in everyday use without providing unnecessary false positives or false negatives. We further conclude that this technique can be applied to many levels of information systems and that a finer granularity may be useful in maintaining change control over subroutines, microprograms, and perhaps even hardware.

Of particular interest for further research is the use of faster cryptosystems to assure integrity, hardware assistance for high performance versions of these systems, the use of these techniques at much finer granularity for integrity verification, and automation for generating and maintaining the dependency relations. Perhaps more interesting and important for the long term is determining what must be trusted to have a strong system of this sort, and the architectures that would lend themselves to this sort of protection. Finally, the extension and combination of this work and other works by other authors towards a unified theory of integrity in computing systems.

References:

[1] F. Cohen, "Computer Viruses - Theory and Experiments", DOD/NBS 7th Conference on Computer Security, originally appearing in IFIP-sec 84, also appearing in "Computers and Security", V6(1987), pp22-35 and other publications in several languages

[2] J. P. Anderson, "Computer Security Technology Planning Study", USAF Electronic Systems Division, #ESD-TR-73-51, Oct 1972, (Cited in Denning)

[3] R. R. Linde, "Operating System Penetration", AIFIPS National Computer Conference, pp 361-368, 1975

[4] D. E. Bell and L. J. LaPadula, "Secure Computer Systems: Mathematical Foundations and Model", The Mitre Corporation, 1973 (cited in many papers)

[5] D. E. Denning, "Cryptography and Data Security", Addison Wesley, 1982

[6] C. E. Landwehr, "The Best Available Technologies for Computer Security", IEEE Computer, V16#7, July, 1983

[7] B. W. Lampson, "A note on the Confinement Problem", Communications of the ACM V16(10) pp613-615, Oct, 1973

[8] K. J. Biba, "Integrity Considerations for Secure Computer Systems", USAF Electronic Systems Division (cited in Denning), 1977

[9] F. Cohen, "Computer Viruses", PhD Dissertation, University of Southern California, 1986

[10] F. Cohen, "Protection and Administration of Information Networks Under Partial Orderings", Computers and Security, V6(1987) pp118-128

[11] M. Pozzo and T. Gray, "Computer Virus Containment in Untrusted Computing Environments", IFIP/SEC 4th International Conference on Computers and Security, Dec. 1986

[12] F. Cohen, "A Cryptographic Checksum for Integrity Protection in Untrusted Computer Systems", Computers and Security, V6(1987).

[13] F. Cohen, "A Complexity Based Integrity Maintenance Mechanism", Conference on Information Sciences and Systems, Princeton University, March 1986

[14] A. Turing, "On Computable Numbers, with an Application to the Entscheidungsproblem", London Math Soc Ser 2, 1936

[15] F. Cohen, "On the Implications of Computer Viruses and Methods of Defense", Computers and Security, V7#2, April, 1988

[16] M. Cohen, "A New Integrity Based Model for Limited Protection Against Computer Viruses" Master's Thesis, 1988, Pennsylvania State University, Department of Electrical and Computer Engineering, College Park, PA.