NEW APPROACHES TO DIGITAL EVIDENCE ACQUISITION AND ANALYSIS
BY MARTIN NOVAK, JONATHAN GRIER, AND DANIEL GONZALES
Two NIJ-supported projects offer innovative ways to process digital evidence.
Computers are used to commit crime, but with the
burgeoning science of digital evidence forensics, law
enforcement can now use computers to fight crime.
Digital evidence is information stored or transmitted in
binary form that may be relied on in court. It can be found on a
computer hard drive, a mobile phone, a CD, and a flash card in a
digital camera, among other places. Digital evidence is commonly
associated with electronic crime, or e-crime, such as child
pornography or credit card fraud. However, digital evidence is now
used to prosecute all types of crimes, not just e-crime. For example,
suspects’ email or mobile phone files might contain critical evidence
regarding their intent, their whereabouts at the time of a crime, and
their relationship with other suspects.
In an effort to fight e-crime and to collect relevant digital evidence for
all crimes, law enforcement agencies are incorporating the collection and analysis of digital evidence into their
infrastructure.
Digital forensics essentially involves a three-step, sequential process:[1]
1. Seizing the media.
2. Acquiring the media; that is, creating a forensic image of the media for examination.
3. Analyzing the forensic image of the original media. This ensures that the original media are not modified during
analysis and helps preserve the probative value of the evidence.
Large-capacity media typically seized as evidence in a criminal investigation, such as computer hard drives and external drives, may be 1 terabyte (TB) or larger. This is equivalent to about 17,000 hours of compressed recorded audio. Today, media can be acquired forensically at approximately 1.5 gigabytes (GB) per minute. The forensically acquired media are stored in a RAW image format, which results in a bit-for-bit copy of the data contained in the original media without any additions or deletions, even for the portions of the media that do not contain data. This means that a 1 TB hard drive will take approximately 11 hours for forensic acquisition.[2] Although this method captures all possible data stored in a piece of digital media, it is time-consuming and creates backlogs. In 2014, there were 7,800 backlogged cases involving digital forensics in publicly funded forensic crime labs.[3]

To help address these challenges, NIJ funded two projects in 2014: Grier Forensics received an award to develop a new approach to acquiring digital media, and RAND Corporation received an award to work on an innovative means for analyzing digital media. Four years later, these software applications are coming to fruition.

Identifying Disk Regions That May Contain Evidence

Traditional disk acquisition tools produce a disk image that is a bit-for-bit duplicate of the original media. Therefore, if a piece of acquired media is 2 TB in size, then the disk image produced will also be 2 TB in size. The disk image will include all regions of the original media, even those that are blank, unused, or irrelevant to the investigation. It will also include large portions devoted to operating systems (e.g., Windows 10 or Mac OSX), third-party applications, and programs supplied by vendors such as Microsoft or Apple (see exhibit 1).
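
The 11-hour acquisition estimate given above can be checked with simple arithmetic. The short Python sketch below is only an illustration: the function name `acquisition_hours` is hypothetical, and it assumes a constant imaging rate and decimal units (1 TB = 1,000 GB).

```python
def acquisition_hours(media_gb: float, rate_gb_per_min: float = 1.5) -> float:
    """Hours needed to image `media_gb` gigabytes at a constant rate.

    The 1.5 GB/minute default is the approximate rate quoted in the article;
    treating it as constant is a simplifying assumption.
    """
    if rate_gb_per_min <= 0:
        raise ValueError("rate_gb_per_min must be positive")
    minutes = media_gb / rate_gb_per_min
    return minutes / 60.0


if __name__ == "__main__":
    # Assumes decimal units: 1 TB = 1,000 GB.
    for size_tb in (1, 2, 4):
        hours = acquisition_hours(size_tb * 1000)
        print(f"{size_tb} TB: about {hours:.1f} hours")
```

Because a bit-for-bit image grows with the size of the media rather than with the amount of data of interest, the estimate scales linearly: under the same assumptions, doubling the drive size doubles the acquisition time.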
Exhibit 1. Typical Disk Regions
[Figure: diagram of a typical disk divided into regions labeled "Temp files, history, logs, browser artifacts"; "Registry, system metadata"; "Program files"; "Windows OS files"; and "Blank space, never used." Two of the regions are marked "HIGH VALUE."]
Source: Courtesy of Grier Forensics.
National Institute of Justice | NIJ.ojp.gov
NIJ Journal / Issue No. 280, January 2019
Exhibit 2. Visualization of Disk Regions
Source: Courtesy of Grier Forensics.

For some cases, such as software piracy, it is important to collect these programs so investigators can understand the computer’s original environment. However, for the vast majority of cases, these regions are not important. For most computer forensic investigations, the evidence lies in the user’s documents, emails, internet history, and any downloaded illicit images.

Grier Forensics proposed a novel approach that images only those regions of a disk that may contain evidence. Called the Rapid Forensic Acquisition of Large Media with Sifting Collectors (Sifting Collectors for short), this software application bypasses regions that contain exclusively third-party, unmodified applications and, instead, zeroes in on the regions that contain data, artifacts, and other evidence. (The software can be easily configured to collect third-party applications when necessary for certain types of cases.)

Exhibit 2 is a visualization of disk regions generated by the Sifting Collectors diagnostic package. The green areas represent user-created files and the black areas represent portions of the media that have never been used.

Sifting Collectors has the potential to significantly reduce digital forensics backlogs and quickly get valuable evidence to the people who need it. In laboratory testing,[4] it accelerated the imaging process by three to 13 times while still yielding 95 to 100 percent of the evidence.

Sifting Collectors is designed to drop right into existing practices. The software creates an industry-standard forensic file, known as an “E01 file,” that is accessible from standard forensic tools, just like current imaging methods.[5] Grier Forensics is working with major forensics suite manufacturers to allow Sifting Collectors to work seamlessly with their existing tools.

Potential Limitations of Sifting Collectors

Perhaps the most significant drawback of Sifting Collectors is that, unlike traditional imaging, it does not collect the entire disk. Instead, Sifting Collectors discovers which regions of the disk may contain evidence and which do not.

This might not be a significant drawback, however. Digital evidence is typically handled in one of two ways:

• The investigators seize and maintain the original evidence (i.e., the disk). This is the typical practice of law enforcement organizations.
• The original evidence is not seized, and access to collect evidence is available only for a limited duration. This is common in cases involving ongoing intelligence gathering; for example, when law enforcement has a valid search warrant to collect evidence but, because of an ongoing investigation, does not plan to seize the evidence.

In the second scenario, computer forensics examiners have a limited time window for entering the site and collecting as much evidence as possible. Consequently, they will focus only on the most valuable devices and then image each device, spending more than half of their time collecting unmodified regions (as described above). Sifting Collectors would allow them to accelerate the process and collect evidence from many more devices. Either way, given the limited time window, it is difficult to collect all digital evidence. The choice for the computer forensics examiner is whether to collect
all regions, including blanks, from a small number of devices or to collect only modified regions containing evidence from a large number of devices. Sifting Collectors allows examiners to make that choice.

When investigators retain the original evidence, the mitigation is even simpler: Sifting Collectors allows users to collect and analyze disk regions expected to contain evidence. It allows them to acquire evidence quickly and start the case more rapidly, and it potentially reduces case backlogs. If, at any time, users need to analyze other regions, they can go back to the original and collect those regions.

Another potential drawback concerns hash verification, that is, using an electronic signature or verification code, known as a hash, to verify that a disk image matches the original evidence disk. Existing methods of hash verification depend on verifying the entire disk and thus are not compatible with Sifting Collectors. However, this problem is not limited to Sifting Collectors; modern, solid-state drives (SSDs) are often incompatible with hash verification because certain SSD regions are unstable due to maintenance operations. In both cases, the solution is the same: moving from disk-based verification to more granular verification strategies. As the industry adopts newer verification strategies to accommodate SSDs, Sifting Collectors will likely benefit as well.

The process that Sifting Collectors uses to analyze the disk and distinguish relevant regions from unmodified or irrelevant ones takes time. The amount of time varies greatly based on the disk, but it could be up to 10 percent of the imaging time. This means that if Sifting Collectors determines that it is necessary to collect the entire disk or nearly all of it, the software will not save the user any time and will, in fact, be somewhat slower than current imaging methods. To help mitigate this, Grier Forensics is using advanced parallel processing, concurrency, and compression algorithms. However, even with these modifications, Sifting Collectors will end up being slightly slower than traditional imaging in cases where nearly all of the disk is collected.

Perhaps the drawback that is likely to cause the most resistance is simply that Sifting Collectors necessitates a break with current practice. Indeed, reluctance to change current practice will be a substantial obstacle to overcome if Sifting Collectors is to achieve widespread adoption.

Accelerating Digital Forensics Analysis

Each year, the time it takes to conduct digital forensics investigations increases as the size of hard drives continues to increase. With NIJ support, RAND has developed an open-source digital forensics processing application designed to reduce the time required to conduct forensically sound investigations of data stored on desktop computers. The application, called the Digital Forensics Compute Cluster (DFORC2), takes advantage of the parallel-processing capability of stand-alone high-performance servers or cloud-computing environments (e.g., it has been tested on the Amazon Web Services cloud).

DFORC2 is an open-source project. It uses open-source software packages such as dc3dd,[6] Apache Kafka,[7] and Apache Spark.[8] Users interact with DFORC2 through Autopsy, an open-source digital forensics tool that is widely used by law enforcement and other government agencies and is designed to hide complexity from the user. RAND has designed DFORC2 so the application can also use the Kubernetes Cluster Manager,[9] an open-source project that provides auto-scaling capabilities when deployed to appropriate cloud-computing services. (See exhibit 3 for a detailed description of how DFORC2 works.)

The primary advantage of DFORC2 is that it will significantly reduce the time required to ingest and process digital evidence. DFORC2’s speed advantage, however, will depend on two factors. The first factor is the speed and memory of the server. For smaller servers (those with 16 GB of RAM or less and an older microprocessor), the original stand-alone version of Autopsy will perform better than DFORC2. On a larger server (one with 28 GB of RAM or more and a new high-end multicore microprocessor), DFORC2 will be faster.
Exhibit 3. DFORC2 System Architecture
[Architecture diagram. Labels recoverable from the diagram: Disk Image; dc3dd; Blocks, Hashes; Kafka Partitions 1 through n; Spark Cluster with SMN and SWN 1 through n; Spark Streaming Job; File System Map; DESH Cluster with CMN and CWN 1 through n, each CWN performing file hashing, KWS, etc.; Postgres DB; Amazon EFS Disk Volume; SOLR Cloud Server; Kubernetes; Autopsy GUI.]
SMN: Spark Master Node
CMN: Cluster Master Node
SWN: Spark Worker Node
CWN: Cluster Worker Node
KWS: Key Word Search
Source: Courtesy of RAND Corporation.
Note: A compute cluster has its resources organized into a cluster manager and worker nodes. Worker nodes perform computing
tasks assigned to them by the cluster manager. DFORC2 ingests data from the hard drive (using dc3dd) and streams it in “blocks”
to the Apache Spark cluster. Apache Spark worker nodes search for logical file metadata and send their findings to the PostgreSQL
database. Data blocks are hashed before and after receipt to ensure integrity. As the streamed data are received, worker nodes in a
second cluster, the Digital Evidence Search and Hash (DESH) cluster, identify and reconstruct “complete” files and process these files
using local copies of the Autopsy application. An essential part of the core workflow is the reconstruction of the master file system
during the file ingestion process. This is done by the Apache Spark cluster, during rather than after file ingestion, to speed up the
forensics analysis process. The master file system map or table and logical file metadata are stored in the PostgreSQL database.
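
The note above states that data blocks are hashed before and after receipt to ensure integrity. A minimal Python sketch of that idea follows. It is not DFORC2 code: the block size, the choice of SHA-256, and the function names `read_blocks` and `verify_blocks` are assumptions made purely for illustration, and the "stream" here is an in-memory iterator rather than a Kafka topic.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple

# Hypothetical block size chosen for the example; the article does not
# state the block size that DFORC2 uses.
BLOCK_SIZE = 4096

Block = Tuple[int, bytes, str]  # (block index, block data, hex digest)


def read_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> Iterator[Block]:
    """Sender side: split the image into blocks and hash each one before sending."""
    for offset in range(0, len(image), block_size):
        data = image[offset:offset + block_size]
        yield offset // block_size, data, hashlib.sha256(data).hexdigest()


def verify_blocks(stream: Iterable[Block]) -> List[int]:
    """Receiver side: re-hash each received block and report mismatched indexes."""
    mismatched = []
    for index, data, sent_digest in stream:
        if hashlib.sha256(data).hexdigest() != sent_digest:
            mismatched.append(index)
    return mismatched


if __name__ == "__main__":
    image = bytes(range(256)) * 64          # 16 KiB stand-in for a disk image
    print(verify_blocks(read_blocks(image)))  # no block altered in transit

    # Simulate corruption of block 2 between hashing and receipt.
    tampered = [
        (i, (b"\xff" + data[1:]) if i == 2 else data, digest)
        for i, data, digest in read_blocks(image)
    ]
    print(verify_blocks(tampered))
```

Hashing per block rather than per disk means a mismatch points to a specific block, which is one way to picture how integrity can be checked while data are still streaming.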
The second factor is the number of worker nodes that can be allocated to the clusters. DFORC2 organizes resources into a cluster manager and worker nodes. Worker nodes perform computing tasks assigned to them by the cluster manager. More worker nodes will significantly reduce evidence ingest and processing times. However, there is a limit to the number of worker nodes that can be implemented on a server, even one that is equipped with a state-of-the-art multicore microprocessor. To get the full benefit of large numbers of worker nodes, the cloud-based version of DFORC2 is needed; the Kubernetes Cluster Manager can spread data-processing tasks over multiple machines in the cloud.

Potential Limitations of DFORC2

The first potential limitation is the complexity of the current prototype. Currently, distributed computing expertise is needed to set up and implement the stand-alone version of DFORC2. RAND is working to simplify its installation on a stand-alone server.

A different set of complex tasks is required to implement DFORC2 in a commercial cloud. Although the Kubernetes Cluster Manager simplifies much of the system’s internal setup and configuration, a number of complex steps are required to ensure
secure communications with a DFORC2 cloud installation.

In developing its prototype, RAND is using the Amazon Web Services computing cloud. It communicates with the DFORC2 prototype through the firewalls protecting RAND’s enterprise network. RAND has had to work through a number of security and firewall exception issues to enable the smooth installation and startup of DFORC2 in Amazon Web Services. This is another setup and installation issue that RAND is working to simplify so law enforcement agencies can securely access their own DFORC2 cloud installations from their enterprise networks.

Another potential concern with the use of DFORC2 in criminal investigations is the chain of custody for evidence when commercial cloud-computing services are used to process and store evidence. Additional processing and communication steps are involved when using DFORC2.[10] RAND is conducting a chain-of-custody analysis to strengthen the integrity of the digital forensics processing paths used by DFORC2 in a commercial cloud. Additional cloud security features can also be enabled to protect user data and strengthen the chain of custody in the cloud.

Finally, an additional source of concern is how compute clusters handle data. The chain-of-custody analysis now underway will examine this issue and will include a comprehensive review of the distributed computing software components used in DFORC2.

Need for Evaluation

With the support of NIJ, Grier Forensics and RAND are moving the field forward by developing new means for processing digital evidence. Grier Forensics’ Sifting Collectors provides the next step in the evolution of evidence acquisition. RAND’s DFORC2 combines the power of compute clusters with open-source forensic analysis software to process evidence more efficiently.

Both of these projects introduce new paradigms for the acquisition and analysis of digital evidence. Whether the criminal justice community accepts these approaches will depend on the admissibility of the evidence each produces. That admissibility will ultimately be determined by the threshold tests of the Daubert standard in court. These new approaches will need to be independently tested, validated, and subjected to peer review. Known error rates and the standards and protocols for the execution of their methodologies will need to be determined. In addition, the relevant scientific community must accept them.

RAND will release DFORC2 software code to its law enforcement partners and members of the digital forensics research community in the near future. They will test it, find bugs, and improve the code. Eventually, it will be released as an open-source project.

Grier Forensics will release Sifting Collectors to its law enforcement partners for field trials to verify its preliminary laboratory findings with real cases. It recently benchmarked Sifting Collectors against conventional forensic imaging technology and found that Sifting Collectors was two to 14 times as fast as conventional imaging technology, depending on the mode and the source disk, and produced an image file requiring one-third the storage space, while still achieving 99.73 percent comprehensiveness (as measured by a third-party tool).

Meanwhile, NIJ plans to have both DFORC2 and Sifting Collectors independently tested by the NIJ-supported National Criminal Justice Technology Research, Test and Evaluation Center, which is hosted by the Applied Physics Laboratory at Johns Hopkins University.

About the Authors

Martin Novak is a senior computer scientist in NIJ’s Office of Science and Technology. Jonathan Grier has performed security research, consulting, and investigation for more than 15 years. He developed new security technology for the Defense Advanced Research Projects Agency, the Massachusetts Institute
of Technology Lincoln Laboratory, the National Institute of Standards and Technology, and the United States Air Force. Daniel Gonzales, Ph.D., is a senior physical scientist at RAND Corporation. He has expertise in command, control, and communications systems; electronic warfare; cybersecurity; digital forensics; critical infrastructure protection; and emergency communications.

For More Information

Read the results of an NIJ-sponsored research effort to identify and prioritize criminal justice needs related to digital evidence collection, management, analysis, and use at NIJ.ojp.gov, keyword: 248770.

Read the findings of an NIJ-sponsored expert panel on the challenges facing law enforcement when accessing data in remote data centers at https://www.rand.org/pubs/research_reports/RR2240.html.

This article discusses the following grants:

• “Rapid Forensic Acquisition of Large Media with Sifting Collectors,” grant number 2014-IJ-CX-K001
• “Rapid Forensic Acquisition of Large Media with Sifting Collectors,” grant number 2014-IJ-CX-K401
• “Accelerating Digital Evidence Analysis Using Recent Advances In Parallel Processing,” grant number 2014-IJ-CX-K102

Notes

1. National Institute of Justice funding opportunity, “New Approaches to Digital Evidence Processing and Storage,” Grants.gov announcement number NIJ-2014-3727, posted February 6, 2014, https://www.ncjrs.gov/pdffiles1/nij/sl001078.pdf.

2. Steven Branigan, “Identifying and Removing Bottlenecks in Computer Forensic Imaging,” poster session presented at NIJ Advanced Technology Conference, Washington, DC, June 2012.

3. Matthew R. Durose, Andrea M. Burch, Kelly Walsh, and Emily Tiry, Publicly Funded Forensic Crime Laboratories: Resources and Services, 2014 (Washington, DC: U.S. Department of Justice, Bureau of Justice Statistics, November 2016), NCJ 250151, https://www.bjs.gov/content/pub/pdf/pffclrs14.pdf.

4. The tests used disk images from DigitalCorpora.org, a website of digital corpora for use in computer forensics education research that is funded through the National Science Foundation.

5. Simson L. Garfinkel, David J. Malan, Karl-Alexander Dubec, Christopher C. Stevens, and Cecile Pham, “Advanced Forensic Format: An Open Extensible Format for Disk Imaging,” in Advances in Digital Forensics II, ed. Martin S. Olivier and Sujeet Shenoi (New York: Springer, 2006), 13-27.

6. The application dc3dd, created by the Department of Defense’s Cyber Crime Center, is capable of hashing files and disk blocks “on the fly” as a disk is being read. The application can be downloaded at SourceForge.

7. Apache Kafka is an open-source stream processing platform that provides a unified, high-throughput, low-latency platform for handling real-time data feeds.

8. Apache Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

9. Kubernetes Cluster Manager is an open-source platform that automates deployment, scaling, and operations of applications on compute clusters. If the Kubernetes Cluster Manager is not used (e.g., if DFORC2 is deployed to a single server), then the user will fix the number of worker nodes performing forensics analysis tasks at runtime. Because of this, digital forensics analysts using DFORC2 would have to estimate the number of Apache Spark and Digital Evidence Search and Hash cluster worker nodes needed for a specific size of hard disk and for a specific type of investigation. The number of compute nodes needed could depend on many factors, which the analyst may not know before the investigation is started. This limitation would likely require the analyst to overprovision the cloud compute cluster to ensure timely processing of the evidence. The Kubernetes Cluster Manager solves this problem. It is designed to deploy or shut down cluster computing resources, depending on the level of demand on each virtual machine. Furthermore, it is compatible with a wide range of cloud-computing environments. The Kubernetes Cluster Manager can deploy applications on demand, scale applications while processes are running in containers (i.e., add additional worker nodes to compute tasks), and optimize hardware resources and limit costs by using only the resources needed.
10. The DFORC2 chain of custody relies on cryptographic hashes to verify the content of disk blocks and logical files found on the hard disk that is the subject of investigation. All disk blocks are hashed twice, first by dc3dd when the disk is read into DFORC2. This hashing takes place outside the cloud, on a local computer that is used to ingest the hard disk and stream it into the cloud. Autopsy then hashes the disk blocks a second time inside the cloud. These two hashes can be compared to prove that the copy of the disk in the cloud is identical to the disk block ingested from the original piece of evidence. Logical files are not hashed during data ingestion. However, they can be hashed on the local computer using an accepted standard digital forensics tool if this is required to verify evidence found in a specific file by DFORC2 in the cloud. All logical file hashes are retained by DFORC2 in the cloud to enable the analyst to trace the chain of custody for specific pieces of evidence on an as-needed basis.

Image source: PeterPhoto123/Shutterstock

NCJ 250700

Cite this article as: Martin Novak, Jonathan Grier, and Daniel Gonzales, “New Approaches to Digital Evidence Acquisition and Analysis,” NIJ Journal 280, January 2019, https://www.nij.gov/journals/280/pages/new-approaches-to-digital-evidence-acquisition-and-analysis.aspx.