The Evidence

Digital forensic evidence is ideally processed through a series of steps. While there are many other characterizations of the processes involved in dealing with digital forensic evidence (DFE), the perspective taken here assumes, without limit, that DFE must be identified, collected, preserved, transported, stored, analyzed, interpreted, attributed, perhaps reconstructed, presented, and, depending on court orders, destroyed. [1] All of these must be done in a manner that meets the legal standards of the jurisdiction and the case.

Identify: In order to be processed and applied, evidence must first, somehow, be identified as evidence. It is common for there to be an enormous amount of potential evidence available for a legal matter, and for the vast majority of the potential evidence to never be identified. To get a sense of this, consider that every sequence of events within a single computer might cause interactions with files and the file systems in which they reside, other processes and the programs they are executing and the files they produce and manage, and log files and audit trails of various sorts. In a networked environment, this extends to all networked devices, potentially all over the world. Evidence of an activity that caused digital forensic evidence to come into being might be contained in a time stamp associated with a different program in a different computer on the other side of the world that was offset from its usual pattern of behavior by a few microseconds. If the evidence cannot be identified as relevant evidence, it may never be collected or processed at all, and it may not even continue to exist in digital form by the time it is discovered to have relevance.

Collect: In order to be considered for use in court, identified evidence must be collected in such a manner as to preserve its integrity throughout the process, including the preservation of information related to the chain of custody under which it was collected and preserved. Recent case law has established that there is a duty to preserve digital forensic evidence once the holder of that evidence is or reasonably should be aware that it has potential value in a legal matter. This duty is typically fulfilled by collecting and preserving a copy of the original evidence so that the actual original media need not be preserved, but rather, can continue to be used. Collection may involve many different technologies and techniques depending on the circumstance.

What is collected is driven by what is identified; however, a common practice in the digital forensics community has been to take forensically sound images of all bits contained within each medium containing identified content. This provides the means to then identify further evidence contained within that medium for subsequent analysis, assuming that the copy was properly preserved along the way. The problem with this process today is that the volume of storage required has become very large in many cases, and the process tends to be highly disruptive to operating businesses that use these computers in a non-stop fashion. Consider the business impact on an Internet Service Provider that has to cease operation of a computer that would otherwise be in use in order to preserve evidence.
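As a minimal sketch of how the integrity of a bit-level copy might be checked, the following Python code copies a source to a destination in chunks while computing a digest, then rereads the copy and computes its digest for comparison. The function names `hash_file` and `copy_and_hash`, the chunk size, and the choice of SHA-256 are assumptions made for illustration; this is not a description of any particular forensic tool or of a legally sufficient procedure.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # assumed chunk size: 1 MiB


def hash_file(path):
    """Return the SHA-256 hex digest of the bytes stored at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_hash(source_path, copy_path):
    """Copy source to copy_path; return (source_digest, copy_digest)."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(copy_path, "wb") as dst:
        for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
            digest.update(chunk)
            dst.write(chunk)
    # Reread the copy so the comparison reflects what was actually written.
    return digest.hexdigest(), hash_file(copy_path)
```

If the two digests differ, the copy does not contain the same bits as the source. Matching digests recorded at collection time can later be compared against a fresh digest of the preserved copy.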

Preserve: Relevant log files and audit data are particularly important and should always be identified and preserved. This includes all logs associated with the servers used to send, receive, process, and store the evidence. Failure to do this becomes particularly problematic in cases when the purity of the evidence is at issue. For example, if an exhibit contains some corrupt content, the entire exhibit becomes suspect. If original records are not available to rehabilitate relevant portions of the exhibit, all of the evidence contained in the exhibit may be inadmissible. If there is suspicion of spoliation, the additional log files and related records will be necessary in order to show that redundant information exists that is consistent with the actual creation of the content at issue. Even information such as system crashes and reboots may be critical to a case because corrupt file content may be produced by those sorts of events and without the logs to show what happened when, that corruption may not be able to be reconciled with the need for preservation of the purity of the evidence.

Transport: Evidence must sometimes be transported from place to place. For example, when collected from a crime scene, the evidence must somehow be moved to a secure location or it may not be properly preserved through to a trial. Digital forensic evidence can generally be transported by making exact duplicates, at the level of bits, of the original content. This includes, without limit, the movement of the content over networks, assuming adequate precautions are taken to assure its purity during that transportation. Evidence is often copied and sent electronically, on compact disks, or in other media, from place to place. Original copies are normally kept in a secure location in order to act as the original evidence that is introduced into the legal proceedings. If there is any question about the bits contained in the evidence, it can be settled by returning to the original. Facsimile evidence, printouts, and other similar depictions of digital forensic evidence may also be transported, but they are not a good substitute for the original digital forensic evidence in most cases, among other reasons, because they make it far harder, if not impossible, to properly analyze what the original bits were. For example, many different bit sequences may produce the output depictions, and identical bit sequences may produce different output depictions. Care must be taken in transportation to prevent spoliation as well. For example, in a hot car, digital media tends to lose bits.
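The point about depictions can be illustrated with a small sketch in Python. The sample text and the three character encodings used here (UTF-8, Latin-1, and code page 437) are chosen purely for illustration and are not drawn from any particular case.

```python
# Two different byte sequences that decode to the same displayed text.
utf8_bytes = b"caf\xc3\xa9"   # the word "café" encoded as UTF-8
latin1_bytes = b"caf\xe9"     # the word "café" encoded as Latin-1

same_depiction = utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1")

# One byte sequence that is displayed differently under two interpretations.
as_latin1 = latin1_bytes.decode("latin-1")
as_cp437 = latin1_bytes.decode("cp437")
different_depiction = as_latin1 != as_cp437
```

A printout of the decoded text shows only the depiction. It does not reveal which of the two byte sequences was stored, nor which interpretation produced what was printed.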

Increasingly, evidence is transported electronically from place to place, and even the simplest errors can cause the data arriving to be incorrect or improperly authenticated for legal purposes. Care must be taken to preserve chain of custody and assure that a witness can testify accurately about what took place, using and retaining contemporaneous notes, and taking proper precautions to assure that evidence is not spoliated and is properly treated along the way. [1]

Store: In storage, digital media must be properly maintained for the period of time required for the purposes of trial. Depending on the particular media, this may involve any number of requirements ranging from temperature and humidity controls to the need to supply additional power, or to reread media. Storage must be adequately secure to assure proper chain of custody, and typically, for evidence areas containing large volumes of evidence, paperwork associated with all actions related to the evidence must be kept to assure that evidence doesn't go anywhere without being properly traced. Many different sorts of things can go wrong in storage, including, without limit, decay over time, environmental changes resulting in the presence or absence of a necessary condition for preservation, direct environmental assault on the media, fires, floods, and other external events reaching the evidence, loss of power to batteries and other media-preserving mechanisms, and degradation from other natural and artificial sources.
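One way to picture the custody paperwork is as an ordered list of hand-offs that can be checked for continuity. The following Python sketch is illustrative only: the `CustodyEntry` fields and the `custody_gaps` function are hypothetical names, and real custody records carry far more detail than is modeled here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CustodyEntry:
    when: datetime      # time of the hand-off
    released_by: str    # who gave up custody
    received_by: str    # who took custody
    action: str         # free-text description of the action


def custody_gaps(entries):
    """Return indexes of entries that do not follow from the previous entry.

    An entry is flagged when the party releasing the evidence is not the
    party who last received it, or when it is dated before the previous entry.
    """
    gaps = []
    for index in range(1, len(entries)):
        previous, current = entries[index - 1], entries[index]
        if (current.released_by != previous.received_by
                or current.when < previous.when):
            gaps.append(index)
    return gaps
```

An empty result means only that the recorded hand-offs are mutually consistent; it says nothing about whether the records themselves are accurate.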

Analyze: Interpret: Attribute: Analysis, interpretation, and attribution of evidence are the most difficult aspects encountered by most digital forensic evidence examiners. In the digital forensics arena, there are usually only a finite number of possible event sequences that could have produced evidence; however, the actual number of possible sequences may be almost unfathomably large. In essence, almost any execution of an instruction by the computing environment containing or generating the evidence may have an impact on the evidence.

Since it is infeasible to reconstruct every possible sequence to find all of the sequences that may have produced the actual evidence in any particular case, examiners focus on large sets of sequences of events and tend to characterize things in those terms. For example, if the evidence includes a log file that appears to be associated with a file transfer, the name of the file transfer program included in the log file will typically be associated with common behavior of that program and used as a basis for the analysis. The user identity indicated in the log file may be associated with a human or group, and this creates an initial attribution that can then be used as a basis for further efforts to attribute to the standard of proof required.

Of course the presence of this record in an audit trail doesn't mean that the program was ever run at all or that the thing the record indicates ever took place or that the user identified caused the events of interest. There are many possible sequences of events that could result in the presence of such a record. For example, and without limiting the totality of possible event sequences, the record could have been placed there maliciously, it could be a record produced by another program that looks similar to the program being considered, it could have been a record produced by the program even though the file transfer failed, the record could have been produced by a Trojan horse acting for the user, or the record could be there because of a failure in a disk write that produced a cross-link between disk blocks associated with different sorts of records.

The examiner seeking to interpret the evidence should seek to take into account the alternative explanations for evidence in trying to understand what actually took place and how certain they are of the assertions they make. It is fairly common for supposed experts to make leaps and draw conclusions that are not justified. For example, an examiner might write a report stating something like "X did Y producing Z" where X is an individual or program and Y is an action that produced some element of the evidence Z. But this is excessive in almost all cases. A more appropriate conclusion might be "Based on the evidence available to me at this time, it appears that X did Y producing Z". And of course it helps if some or many of the alternative explanations have been explored and shown to be inconsistent with the evidence. That's one of the reasons that seemingly irrelevant evidence might be very useful in a legal matter. For example, evidence from system logs might indicate that there were no detected disk errors, system crashes or reboots, or other anomalies reflected in the log files for the period in question, and that therefore, the explanations associated with these sorts of anomalies are inconsistent with the evidence. But without those log files or some other evidence, this conclusion cannot be reasonably drawn.
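A minimal sketch of the kind of log check described above follows, in Python. It assumes a hypothetical log format in which each line starts with an ISO 8601 timestamp followed by a space and a message. The keyword list and the function name `anomalies_in_window` are likewise assumptions for illustration, since real system logs vary widely in format and wording.

```python
from datetime import datetime

# Hypothetical keywords; a real examination would derive these from the
# logging behavior of the specific systems at issue.
ANOMALY_KEYWORDS = ("i/o error", "panic", "reboot", "crash")


def anomalies_in_window(log_lines, start, end, keywords=ANOMALY_KEYWORDS):
    """Return (found, unparsed) for the period from start to end inclusive.

    found is a list of (timestamp, message) pairs whose message mentions a
    keyword; unparsed is a list of lines not in the assumed format, kept so
    that they can be reviewed by hand rather than silently ignored.
    """
    found, unparsed = [], []
    for line in log_lines:
        stamp_text, _, message = line.strip().partition(" ")
        try:
            stamp = datetime.fromisoformat(stamp_text)
        except ValueError:
            unparsed.append(line)
            continue
        in_window = start <= stamp <= end
        if in_window and any(word in message.lower() for word in keywords):
            found.append((stamp, message))
    return found, unparsed
```

An empty `found` list supports the statement that no anomalies were detected only to the extent that the logs are complete for the period and that the keywords actually match how the system reports such events.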

In networked environments, there are potentially far more sequences of bits that may be relevant to the issues in the matter at hand. As a result, there is potentially far more evidence available, and the analysis and interpretation of that larger body of evidence leads to many more potential analytical and interpretive processes and products. It could be argued that this increases the complexity of analysis exponentially, but in reality, the additional evidence tends to further restrict the number of histories that are feasible in order to retain consistency of interoperation across the evidence. As an example, the file transfer record identified above might be greatly bolstered or flatly refuted by corresponding records on remote systems from which the file was asserted to be downloaded and through which the transfer may have come.
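The idea of checking a local record against remote records can be sketched as follows, in Python. The record fields (`filename`, `size`, `time`), the tolerance, and the function name `corroborating_records` are hypothetical, and the sketch assumes the clocks of the systems involved have already been reconciled.

```python
from datetime import datetime, timedelta


def corroborating_records(local_record, remote_records,
                          tolerance=timedelta(seconds=5)):
    """Return the remote records consistent with the local transfer record.

    A remote record is treated as consistent when it names the same file,
    reports the same size, and falls within the tolerance of the local time.
    """
    matches = []
    for remote in remote_records:
        same_file = remote["filename"] == local_record["filename"]
        same_size = remote["size"] == local_record["size"]
        close_in_time = abs(remote["time"] - local_record["time"]) <= tolerance
        if same_file and same_size and close_in_time:
            matches.append(remote)
    return matches
```

A match bolsters the local record. The absence of a match where the remote logs are otherwise complete calls the local record into question, and the alternative explanations should then be explored.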

Analysis, interpretation, and attribution of digital forensic evidence must also be reconcilable with non-digital evidence and externally stipulated or demonstrated facts. As an example, if the digital forensic evidence appears to show that person X was present at the local console of a computer in Los Angeles, California two hours after they passed through customs and immigration in London, England, even though the network logs from distant systems show that the transfer took place, it is not a reasonable interpretation to assert that the individual was in Los Angeles. Clearly there is another explanation, whether it is two individuals, a remote control mechanism, alteration of multiple logs in multiple systems, alteration of customs and immigration logs, altered time clocks, or any of a long list of other possibilities. While in some venues, the "don't confuse me with the facts" approach may apply, in a legal setting, digital forensic evidence should reconcile with external reality.

Anchor facts that the examiner can testify to are a good example of the interaction between digital forensic evidence and physical reality. An example of an anchor fact is knowledge of time keeping mechanisms on systems that interact with evidence available in the matter at hand. For example, if the examiner operates a system that retains sound records and was synchronized using the Network Time Protocol (NTP) during the period of time at issue, and that system has a record of an email passing through a relevant system that includes time and date stamps, then the time skew between the examiner's system and the relevant system provides an anchor in facts that the examiner can use to make more definitive statements about what took place and when. Interpretation of the evidence can then more definitively assert that, based on the personal knowledge of the witness and the records they have of facts relevant to the matter, a particular record is consistent with a time skew of 18 hours. This may even allow the examiner to explain how the individual could have appeared to have been in London at the same time they appeared to have been in Los Angeles.
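The arithmetic behind such an anchor can be sketched briefly in Python. The function names and the sample timestamps used with them are hypothetical, and the sketch assumes that the same event was recorded by both systems and that transit delay between them is negligible compared with the skew.

```python
from datetime import datetime, timedelta


def estimate_skew(reference_time, observed_time):
    """Skew of the relevant system's clock relative to the reference clock.

    reference_time is the examiner's NTP-synchronized record of an event;
    observed_time is the relevant system's stamp for the same event.
    """
    return observed_time - reference_time


def correct_timestamp(recorded_time, skew):
    """Convert a stamp from the skewed system to the reference time base."""
    return recorded_time - skew
```

For instance, if the examiner's system recorded an email at 12:00 on one day and the relevant system stamped the same email at 06:00 the next day, `estimate_skew` gives 18 hours, and other stamps from that system can be corrected by the same amount, provided the skew was constant over the period.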

Reconstruct: In many cases, the relevance of the evidence is specific to hardware and/or software. While many examiners make the assumption that mechanisms operate according to their specifications, in the information technology arena, where digital forensic evidence originates, there are in fact few standards and they are liberally violated all of the time. Documentation is often at odds with reality, versions of systems and software change at a high rate, and records of what was in place at any given time are often scarce to non-existent. Legal cases also often come to trial many years after the actual events that led to them took place, and evidence that might have been present at the time of the incident at issue may no longer be available by the time it is known to be of import.

In these cases, reconstruction of the mechanisms that produced the records of import may be the only available approach to resolving, to a reasonable level of certainty, what actually could and could not have taken place. For example, suppose the metadata within a document containing evidence of intent indicates that a particular user identity modified the document on a particular date and at a particular time and that the document was edited for 7 minutes and 23 seconds, but does not show the specific modifications made by that individual. Suppose further that a previous version of the document from an hour earlier, written under another user identity, does not have the content with the evidence of intent and has an edit time of 5 minutes, and that no other documentation exists. It might then appear to be strong evidence that the individual who last wrote the document added the content indicative of intent and did so by editing the document for 2 minutes and 23 seconds.
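The inference in this example rests on a simple subtraction, shown here as a short Python sketch. The variable names are illustrative, and the calculation assumes that the edit-time field is a cumulative counter carried forward from one saved version to the next, which is exactly the kind of assumption about the software that would need to be verified.

```python
from datetime import timedelta

# Cumulative edit times reported by the metadata of the two versions.
earlier_edit_time = timedelta(minutes=5)
later_edit_time = timedelta(minutes=7, seconds=23)

# Apparent editing time attributable to the last writer, if the counter
# is cumulative across versions.
apparent_last_session = later_edit_time - earlier_edit_time
print(apparent_last_session)  # prints 0:02:23
```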

But this conclusion depends on a set of assumptions surrounding the software in use for editing this document. Even if a current version of this software reliably applies this sort of metadata, it may be that the version of software in use at the time in question and in the computing environments in question did something quite different. If this is the only evidence of the issue at hand, and the matter is important enough to justify the effort, then a reconstruction of the process by which the digital forensic evidence was created may be necessary to show that the specific version of the software operating in the specific environment at issue could or could not have produced the results contained in the evidence and that other possibilities do or do not exist.

Given that a reconstruction is to be considered, additional determinations must be made. For example, based on the available information, how can a definitive determination be made about the version of the hardware, software, and operating environment, and how important is it to precisely reconstruct the original situation, down to what level of accuracy, and in what aspects? The answers to these and other related questions are tied intimately to the details at issue in the matter at hand.

Present: Evidence, analysis, interpretation, and attribution must ultimately be presented in the form of expert reports, depositions, and testimony. The presentation of evidence and its analysis, interpretation, and attribution poses many challenges, but presentation is only addressed to a limited extent in the literature. [1]

Presentation is more of an art than a science, but there is a substantial amount of scientific literature on methods of presentation and their impact on those who observe those presentations. Aspects ranging from the order of presentation of information to the use of graphics and demonstrations all present significant challenges and are poorly defined.

Some word usage:
Word         Definition
suggests     imply as a possibility ("The evidence suggests ...") - calls to mind - propose a hypothesis or possible explanation
indicate     a summary of a statement or statements or other content codified ("His statement indicates that ...") - a defined set of "indicators" are present and have, through some predefined methodology, been identified as such ("The presence of [...] smoke indicates ...")
demonstrate  exemplify - show - establish the validity of - provide evidence for ("The reconstruction demonstrates that ...")
correlate    a statistical relation between two or more variables such that systematic changes in the value of one variable are accompanied by systematic changes in the other as shown by statistical studies ("Based on this statistical analysis, the use of the "KKJ" account is correlated (p=95%) with ...")
match        an exact duplicate ("These two documents have matching publication dates, page counts, ...")
similar      a correspondence or resemblance as defined by specified and measured quantities or qualities ("The 18 files were similar in that they all had syntax consistent with HTML, sizes under 1000 bytes, ...")
relate       a defined and specified link ("The file system is related to FAT32 in that FAT32 was derived from ...")
associate    make a logical or causal connection with basis provided ("I associate these bit sequences with program crashes because ...")

Destroy: Courts often order evidence and other information associated with a legal matter to be destroyed or returned after its use in the matter ends. This applies to trade secrets, confidential patent and client-related information, copyrighted works, and information that enterprises normally dispose of but must retain for the duration of the legal process. Data retention and disposition have an extensive literature involving legal restrictions on and mandates for destruction. [9]

There are also significant technical issues associated with destruction of digital data. The processes for destruction in legal matters rarely rise to the level required for national security issues; however, the efforts involved in evidence recovery do, at times, go to extremes. [10][11][14]