
Received 2 August 2022, accepted 1 October 2022, date of publication 10 October 2022, date of current version 26 October 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3213030

Forensic Recovery of File System Metadata for Digital Forensic Investigation

JUNGHOON OH1,2, SANGJIN LEE1, AND HYUNUK HWANG2, (Member, IEEE)
1School of Cybersecurity, Korea University, Seoul 02841, South Korea
2The Affiliated Institute of ETRI, Daejeon 34044, South Korea

Corresponding author: Sangjin Lee ([email protected])

ABSTRACT File system forensics is one of the most important elements in digital forensic investigations.
To date, various file system forensic methods, such as analysis of tree structure and the recovery of deleted
file data, have been studied. Among these file system forensic methods, the recovery of file system metadata
is a key technique that makes digital forensic investigations possible by recovering metadata when it is not
possible to obtain metadata in a regular manner because the file system structure is damaged due to an
accident/disaster or cyber terrorism. Previous studies mainly focused on recovering record or entry data,
which are the basic units of metadata, using carving techniques via fixed values or values capable of range
prediction at the beginning of the data. However, no studies have been conducted on metadata without such
fixed values or values capable of range prediction. $LogFile, which is a metadata file of the New Technology
File System (NTFS) that is one of the most used file systems at present, contains very important metadata
that provide a history of all file system operations during a specific period. However, since there is no fixed
value or a value capable of range prediction at the start position of the record, which is the basic unit of
$LogFile, there have been no studies on record-unit recovery, and only recovery by file and by page has been possible. If the file header or a page header of $LogFile is damaged, existing recovery methods
cannot properly recover the metadata; in such cases, a record-level recovery method is required to recover
the metadata. In this context, we investigated the mechanisms of record storage through a detailed analysis
of the $LogFile structure and proposed a recovery method for records without fixed values. Our proposed
method was implemented as a tool and verified through comparative experiments with existing forensic
tools that recover $LogFile data. The experimental results showed that the proposed recovery method was able to recover all the data that existing tools were unable to recover in situations where the $LogFile data were damaged. The implemented tools are released free of charge to contribute to the digital forensic community. Finally, we explain the important role that $LogFile played in solving real-world cases and confirm the importance of recovering $LogFile data in situations where file systems may be damaged due to accidents and disasters.

INDEX TERMS File system, forensics, metadata, forensic recovery.

The associate editor coordinating the review of this manuscript and approving it for publication was Aniello Castiglione.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ (IEEE Access, Volume 10, 2022).

I. INTRODUCTION
File system forensics is a branch of digital forensics that analyzes the structure of a file system and examines its contents [1]. Because most of the evidence that can be obtained during a digital forensic investigation is stored in file units, file system forensics is the most basic and important factor for forensic investigators. Various forensic methods, such as analysis of tree structure [1], [2], [3], [4] and the recovery of deleted file data [5], [6], [7], have been studied to find important evidence in file systems. Among these file system forensic methods, the recovery of file system metadata is a key technique that makes digital forensic investigations possible by recovering metadata when it is not possible to obtain metadata in a regular manner because the file system structure is damaged due to an accident/disaster or cyber terrorism [8], [9]. Additionally, recovering file system metadata from unallocated areas can help to discover additional evidence [10].

Previous studies mainly focused on recovering record or entry data, which are the basic units of metadata, using carving techniques via a fixed value or a value capable of range prediction at the beginning of the data [10], [11], [12], [13], [14], [15], [16]. Indeed, no studies have been conducted on metadata without such a fixed value or a value capable of range prediction.

$LogFile, a metafile of the New Technology File System (NTFS), is a log file that records data from file system transactions such as file creations, deletions, data changes, and name changes. Transaction data are stored in record units, where each record contains "redo" data and "undo" data. If the file system is broken due to a system error, such as a sudden power outage, the operating system restores the file system to a normal state by using the "redo" and "undo" data of the records in $LogFile [1]. Therefore, $LogFile is an important artifact that provides information about all file operations in the file system and the history of file data changes during a specific period [17], [18], [19].

If a Windows system that uses NTFS as the main file system is abnormally stopped due to an accident/disaster or cyber terrorism, analyzing $LogFile can help identify the files that were created before the system stopped and exactly when it stopped. However, the file system in such cases can be damaged by physical damage to the storage device or by anti-forensic behavior, causing a situation where the $LogFile data cannot be acquired and analyzed normally.

Previous studies regarding the recovery of $LogFile data have addressed file-unit recovery [15] and page-unit recovery [16]. If a part of $LogFile, such as the file header or a page header, is corrupted, these methods cannot recover the data properly. To recover as much data as possible even when the $LogFile data are damaged, record-unit recovery is necessary; however, since the starting position of a $LogFile record has neither a fixed value nor a value capable of range prediction, there has been no previous research on record-unit recovery. In this context, we investigate the mechanisms of record storage through a detailed analysis of the $LogFile structure and propose a recovery method for records without fixed values. This method is implemented as a tool and verified through comparative experiments with existing forensic tools that recover $LogFile data.

This study makes the following contributions to the literature:
- We introduce an algorithm to recover $LogFile records based on record carving. To the best of our knowledge, this paper is the first systematic study regarding the recovery of $LogFile record data.
- The proposed method recovers all records that are not damaged, even if it is not possible to obtain record data in a regular manner, and keeps existing techniques, such as file-level event generation [17] and tracking of file data history [19], available.
- This paper provides a comparative performance evaluation of our method relative to well-known forensic tools developed to recover $LogFile data.
- The implemented tools are released free of charge to contribute to the digital forensic community.

The remainder of this paper is organized as follows: Section 2 introduces previous studies, and Section 3 explains the need to recover $LogFile data by using specific examples. Section 4 introduces the internal structure of $LogFile. Section 5 details how to recover $LogFile data based on the record carving technique. Section 6 introduces the developed tools and experimentally evaluates their performance in comparison with currently available tools. In Section 7, we introduce a case study involving the use of the proposed tool. Finally, Section 8 summarizes the conclusions of this study.

II. RELATED WORKS
Previous studies on the recovery of file system metadata have mainly been conducted on the NTFS, APFS, and Ext4 file systems, which are the main file systems of the most commonly used operating systems. In addition, studies on the recovery of metadata using timestamps have been conducted.

A. RECOVERY OF NTFS METADATA
1) $MFT
Fuchs [11] performed a study on recovering data of the master file table ($MFT), the metadata for storing information on files and directories in NTFS, in entry units, the basic storage unit. The recovery of an entry was done by carving the entry using "FILE", a fixed value located at the front of the entry. Fuchs studied how to extract the file name and timestamp from a recovered $MFT entry, and recovery of broken entries was also conducted. The tools of the study were implemented and released as MFTEntryCarver. Additionally, Fuchs speculated that the reason a large number of $MFT entries remain in unallocated areas is that part of the memory is saved to the disk in the form of a crash dump file that is later deleted.

2) $USNJRNL
Oh [10] researched recovering $UsnJrnl data, the metadata that stores the file change history of NTFS, in record units, the basic storage unit. The recovery of $UsnJrnl record data was performed by carving records using the version information and the predicted value of the record size located at the beginning of the record; additional predictable field values were checked to minimize false positives. He added the researched recovery function to NTFS Log Tracker [17] and released it. In addition, he tested how many $UsnJrnl records could be recovered from the unallocated area of hard disks, and it was confirmed that change journal data up to 11 months old could be obtained.


3) $INDX
Segev [12] studied how to recover deleted index entries remaining in the slack area of the $INDX attribute, which stores the directory index data of NTFS. The recovery of an entry was carried out using predictable values of the $FILE_NAME attribute in the entry, and the file name and timestamp information of the deleted file could be extracted from the $FILE_NAME attribute of the recovered index entry. This allowed the additional acquisition of traces of deleted files that could not be obtained from $MFT data. He implemented the researched recovery function as INDXRipper and released it.

4) $LOGFILE
Previous studies on $LogFile, another type of metadata that stores the file change history in NTFS, were conducted only for the recovery of file units and page units. X-Ways Forensics [15] provided the recovery of $LogFile data in file units. The recovery of the file was performed by carving files using the fixed value "RSTR" of the restart page located at the beginning of $LogFile. However, if the restart page or a logging page in $LogFile was corrupted, the recovery of the file could not be performed properly. Bulk Extractor, developed by Garfinkel [16], recovered the restart pages and logging pages of $LogFile and provided a function for storing each recovered page in one file. The recovery of a page was achieved by carving pages using "RSTR" and "RCRD", which are fixed values in the frontmost portion of each page. However, if the header of a page was damaged, the recovery of that page could not be performed properly.

B. RECOVERY OF APFS METADATA
Plum [6] studied the recovery of deleted files by restoring the metadata of APFS, such as superblocks and volume superblocks. Each recovery of metadata used a method of carving blocks using "NXSB" and "APSB", which are fixed values at the front of the data. The metadata recovery method in the study was aimed at the recovery of deleted files, but it could also be utilized to build a file system tree even when the file system could not be analyzed normally due to damage. In addition, LSoft Technologies [13] revealed a method of recovering nodes of the B-tree used to construct the file system tree through a carving technique. This method could be used to recover the metadata of files and directories.

C. RECOVERY OF Ext4 METADATA
Dewald [14] studied how to construct a file system tree by recovering inodes, which store the metadata of files/directories, even if the superblock or group descriptor table of the Ext4 file system is damaged and the file system cannot be analyzed. The recovery of inodes was achieved by carving with a pattern created from predictable values, such as access rights, timestamps, and link counts, rather than a fixed value. The proposed recovery method was implemented as a module in the Sleuthkit framework and released.

D. RECOVERY OF TIMESTAMP
Nordvik [20] performed a study to recover metadata using the characteristics of timestamps included in the metadata of each file system. The recovery method looked for corresponding patterns using the feature that timestamps of the same value are stored consecutively, and then carving of the metadata containing the pattern was performed. In this study, Nordvik performed tests to recover $MFT entries of NTFS and inodes of Ext4 using the proposed recovery method. Porter [21] improved Nordvik's research by recovering metadata using a continuous pattern of similar timestamps with the same prefix. This improved the recovery accuracy by recovering additional metadata not found by Nordvik's original approach.

III. THE NEED FOR RECOVERY OF $LogFile DATA
A. ACCIDENT/DISASTER
If a Windows system stops suddenly due to an accident/disaster, data on the $MFT entries stored in memory may not be saved to the disk due to the file caching mechanism of the Windows system [22]. A representative example of this situation is the sinking of the MV Sewol ferry in South Korea in 2014 [8]. In that accident, the Windows XP-based CCTV system of the ferry stopped working as it sank, but the exact time at which it had stopped and data from the last CCTV video that had recorded the capsizing could not be obtained using only the $MFT entry data left on the disk.

In this case, analyzing the $LogFile data can help determine the exact time at which the system stopped and the location of the data from the last CCTV video. This is because $LogFile immediately stores data for file system recovery on the disk [23].

However, fires and flooding caused by accidents/disasters can physically damage the disks of the system such that even if the data are recovered, the normal structure of the file system or the file data may be damaged. Therefore, even in such a case, there is a need for a method to recover the $LogFile data.

B. CYBER TERRORISM
Cyber terrorism refers to the act of interrupting or destroying systems or networks for political purposes by using the Internet [24]. Cyber terrorism against a Windows system can destroy the metadata of the file system so that it no longer operates. A representative example is the "Dark Seoul" campaign, a cyber terrorism incident that occurred in South Korea in 2013 [9].

The malware used in the DarkSeoul campaign carried out an attack that overwrites the front of the volume, including the MBR (Master Boot Record) and the VBR (Volume Boot Record), with a specific value ("PRINCPES" or "HASTATI") [25]. In such a case, analyzing the $LogFile can help obtain


traces of the malware and file data used before the system was destroyed and stopped [17].

However, the method of disk destruction used in the DarkSeoul campaign sometimes damaged some of the $LogFile data located at the front of the NTFS volume. In this case, the data could not be obtained using the existing recovery methods [26]. Therefore, it is important to recover $LogFile data even if $LogFile is not accessible in a normal manner through file system analysis. Moreover, even if the $LogFile data are corrupted, it is important to recover the data from any intact area.

IV. INTERNAL STRUCTURE OF $LogFile
The $LogFile consists of pages that are 4096 bytes in size. The pages are divided into a restart area and a logging area [1]. (See Fig. 1.)

FIGURE 1. Structure of $LogFile.

The restart area is composed of two consecutive pages at the beginning of the file, and the second page is used for backup [1]. Each page header in the restart area begins with an "RSTR" magic number and stores the LFS (LogFile Service) version, divided into a MinorVersion field and a MajorVersion field. Typically, LFS version 1.1 is used by operating systems prior to Windows 10 and version 2.0 by those subsequent to it [27]. Additionally, the FileSize field in the restart page header stores the $LogFile size [23]. (See Fig. 2.)

FIGURE 2. Header of the restart page.

The logging area is divided into a buffer logging area and a normal logging area, and stores the records that contain transaction data. The buffer logging area is an area where records are temporarily stored before being stored in the normal logging area; therefore, the most recent records are stored in this area [17]. The buffer logging area is composed of the first two pages in the case of LFS 1.1 and the first 32 pages in the case of LFS 2.0 [27].

The remainder of the area, excluding the buffer logging area, is the normal logging area. Records are stored sequentially; when the area is full, storage resumes at the front of the normal logging area [1]. (See Fig. 1.)

A page in the logging area consists of a header and multiple records. Each record stores the operation content for an MFT entry, and multiple records are sequentially gathered to form a single transaction. The first and middle records of a transaction are update records, and the last record is a commit record [1]. A record consists of a header, redo data, and undo data. The "This LSN" field of the record header stores the LSN (Log Sequence Number) indicating its order, and the "Previous LSN" field stores the LSN of the previous record. (See Fig. 3.) The details of each field of the record header are shown in the Appendix [17].

FIGURE 3. Header of record.

The LSN is 64 bits in size and consists of a sequence number and an offset number. The sequence number is a value that increases by one each time the normal logging area wraps around, and the offset number is the value used to obtain the record offset representing the location of the corresponding record relative to the start of the $LogFile. The record offset of the corresponding record can be obtained by multiplying the offset number by eight. The sequence number is located in the top bits of the LSN and has the number of bits given by the SeqNumberBits field of the restart page header; the offset number is located in the bottom bits [27]. Fig. 4 shows the structure of the LSN.

FIGURE 4. Structure of LSN (Log Sequence Number).

The redo and undo data of a record are used to perform the operations corresponding to the redo/undo OP fields of the record header, and are stored with the sizes given by the redo/undo length fields of the header at the positions given by the redo/undo offset fields of the header [17].
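The LSN decomposition described above can be expressed directly in code. This is a minimal sketch (not the authors' tool); the bit widths follow from the SeqNumberBits field as stated in the text:

```python
def split_lsn(lsn: int, seq_number_bits: int):
    """Split a 64-bit LSN into its sequence number and record offset.

    The top SeqNumberBits bits hold the sequence number; the remaining
    low bits hold the offset number, and the byte offset of the record
    inside $LogFile is the offset number multiplied by eight.
    """
    offset_number_bits = 64 - seq_number_bits
    sequence_number = lsn >> offset_number_bits
    offset_number = lsn & ((1 << offset_number_bits) - 1)
    record_offset = offset_number * 8
    return sequence_number, record_offset
```

For example, if SeqNumberBits were 44, the low 20 bits would be the offset number, and `split_lsn((5 << 20) | 100, 44)` would return sequence number 5 and record offset 800.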


The header of the logging page starts with an "RCRD" magic number and stores, in the "Last LSN" field, the largest LSN among the records in the page, including a record that spills over into the next page. The largest LSN among the records that do not spill over the page is stored in the "Last End LSN" field of the header. The detailed structure of the logging page header is shown in Fig. 5 [17].

FIGURE 5. Header of page in logging area.

V. RECOVERY METHOD
A. BASIC CONCEPT OF RECORD CARVING
If the $LogFile has been damaged, a record-based method of recovery is required to obtain as much data from it as possible. However, because the $LogFile record does not have a fixed value, such as a magic number, a signature that can determine whether given data constitute a $LogFile record must be generated for record carving.

The "This LSN" field located at the forefront of the $LogFile record stores the LSN of the current record. Therefore, the "This LSN" value cannot be used as a magic number for carving because it differs for each record. On the contrary, because all records in a page have the same sequence number, the sequence number that can be extracted from the "This LSN" value can be used as a magic number for carving the records located in one page. The sequence number that all records in the page have in common can be obtained from the Last LSN value of the logging page header. For this, the OffsetNumberBits value is first obtained from the SeqNumberBits value stored in the restart page header. The sequence number can then be obtained by performing a right-shift operation on the Last LSN value of the logging page header by the OffsetNumberBits value:

OffsetNumberBits = 64 - SeqNumberBits
Sequence Number = PageHeader.LastLSN >> OffsetNumberBits

Once the sequence number is obtained as described above, the signature consisting of the conditions provided in Table 1 can be generated for record carving. Record carving is then performed by searching the data on the logging page in units of eight bytes, finding a start location for the record that meets the conditions of the record carving signature, identifying the size of the record through the Client Data Length value of its header, and analyzing it.

TABLE 1. Signature for record carving.

B. OVERVIEW OF RECOVERY OF $LogFile DATA
The overall process of $LogFile data recovery is as follows: The pages, which are the basic components of the $LogFile, are first restored by carving with the magic numbers of the page headers. The structure of the $LogFile is then determined by analyzing the restart and logging pages recovered from the page carving operation. Once this structure has been determined, the scopes of the buffer logging area and the normal logging area can be accurately known, and the recovered logging pages can be classified into buffer and normal logging pages. The structure of the $LogFile should be identified in this way because the appropriate recovery technique can subsequently be applied to each area and logging page by knowing the exact scope of each logging area and the type of each logging page. Following this, logging pages that are uncarved due to damage to the header are recovered in each logging area, because there is a high likelihood that $LogFile data remain even on uncarved pages. The uncarved pages are recovered by creating a virtual header through a record carving technique according to the characteristics of each logging area. Then, sorting is performed for all logging pages, including the recovered ones, because the pages must be analyzed in order to avoid missing records. Finally, record carving is performed on the sorted logging pages to obtain the $LogFile data. Fig. 6 summarizes the entire process of recovering $LogFile data through record carving. We now detail the process of recovering the $LogFile data.

C. STEP 1: CARVING $LogFile PAGES
The first task when recovering $LogFile data is to carve the pages. Page carving is performed by using the magic numbers of the restart page ("RSTR") and the logging page ("RCRD"), and by using values that are fixed in each page header. Table 2 shows the header signature for carving the $LogFile pages.

The range of the page carving operation can be divided into two cases. First, when the restart page is carved, the carving operation starts at the location at which the first restart page was carved and ends at the start location plus the size of the $LogFile. The size of the $LogFile can be obtained from the header of the carved restart page. This method is feasible because the $LogFile is generally created without fragmentation during the formatting of the file system. Second, if the restart page is not carved, page carving is performed only over a range of 64 MB, starting from the location where the first logging page was carved. This method is used because the $LogFile grows in size depending on the volume, but only up to 64 MB. Therefore, unless the user directly adjusts its size, the size of the $LogFile in a volume is generally 64 MB.


FIGURE 6. Overview of the recovery of $LogFile data.

TABLE 2. Header signature for page carving.

When page carving is performed by specifying the range in the above way, it is not necessary to perform page carving over the entire volume, so the time needed can be greatly reduced.

D. STEP 2: IDENTIFYING THE STRUCTURE OF $LogFile
Once page carving is complete, we need to determine the structure of the $LogFile. If the ranges of the buffer logging area and the normal logging area, as well as the types of the carved logging pages, can be accurately identified using knowledge of the structure of the $LogFile, an appropriate recovery technique can be applied to the logging pages in each area. The logging pages classified in this operation are stored in the list of buffer logging pages and the list of normal logging pages, respectively.

The structure of the $LogFile depends on the LFS version, which can be obtained from the header of the restart page. Therefore, the method used to determine the structure of the $LogFile varies as described below, depending on whether the restart page is carved and on the number of carved restart pages.

1) IN CASE ALL RESTART PAGES ARE CARVED
If both restart pages are carved, the start position of the $LogFile data is fixed such that the ranges of the buffer logging area and the normal logging area can be accurately identified. Table 3 shows the ranges of the buffer logging area and the normal logging area based on the location of the restart page for each LFS version, where PFRP stands for the "position of the first restart page" and PSRP stands for the "position of the second restart page." The size of the $LogFile can be obtained from the restart page header. In this way, each logging page recovered through carving in the previous step can be classified into a buffer logging page or a normal logging page according to the area information in Table 3.

TABLE 3. Range of buffer/normal logging area.

2) IN CASE ONE RESTART PAGE IS CARVED
If only one restart page is carved, we should first determine whether the carved restart page is the first or the second. If the order can be determined, the range of each logging area is identified through the contents of Table 3.

Two methods can be used to determine the order of carved restart pages: by using the difference between a buffer logging page and a normal logging page for each LFS version, and by using the adjacency in physical location between the restart page and the buffer logging page.
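Under the layout described in Section IV (a buffer logging area directly after the two restart pages, spanning 2 pages for LFS 1.1 and 32 pages for LFS 2.0), the classification of a carved logging page could be sketched as below. The boundaries are paraphrased from the text rather than taken from Table 3, so treat them as assumptions:

```python
PAGE_SIZE = 4096
RESTART_PAGES = 2  # two restart pages at the start of $LogFile

def classify_logging_page(page_offset: int, pfrp: int, lfs_version: str) -> str:
    """Classify a carved logging page as 'buffer' or 'normal' from its
    physical offset relative to the first restart page (PFRP).

    Assumed layout, per Section IV: the buffer logging area directly
    follows the two restart pages and spans 2 pages (LFS 1.1) or
    32 pages (LFS 2.0).
    """
    buffer_pages = 2 if lfs_version == "1.1" else 32
    buffer_start = pfrp + RESTART_PAGES * PAGE_SIZE
    buffer_end = buffer_start + buffer_pages * PAGE_SIZE
    return "buffer" if buffer_start <= page_offset < buffer_end else "normal"
```

For instance, with PFRP at offset 0, the page at offset 4 * 4096 is a normal logging page under LFS 1.1 but still a buffer logging page under LFS 2.0.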


The method that uses the difference between the buffer logging page and the normal logging page for each LFS version is as follows: In the case of LFS 1.1, the Last LSN field in the header of the buffer page stores a file offset value instead of the Last LSN value. The file offset value indicates the location, within the $LogFile, of the normal logging page to which the data on the corresponding buffer logging page are copied. Thus, the file offset value of the buffer logging page is always a multiple of the page size (typically 0x1000). On the contrary, the Last LSN value of the header of the normal logging page cannot be a multiple of the page size, because if it were, the record offset obtained from the LSN would also be a multiple of the page size; this would mean that the record with the LSN corresponding to the "This LSN" value is located in the header, which is the starting position of the page. Such a case is not possible. In the case of LFS 1.1, the file offset value in the header of the buffer logging page is thus a multiple of the page size and the Last LSN of the normal logging page header is not. Therefore, the order of the restart pages can be determined by checking whether such a difference exists between the logging pages at the point determined to be the boundary between the buffer logging area and the normal logging area.

The method used in LFS 1.1 cannot be used in the case of LFS 2.0 because all buffer logging pages have the Last LSN value in their headers. In this case, we can use the fact that the record offset value calculated from the Last End LSN of the header of the buffer logging page does not indicate the area in which the corresponding buffer logging page is located, unlike in the case of the normal logging page. This is because the buffer logging page contains data to be copied to the normal logging page, and thus the record offset calculated through the Last End LSN of the header of the buffer logging page is the address of the area of the normal logging page to be copied to later, not the address of the area where the current buffer logging page is located. In the case of LFS 2.0, the record offset value calculated through the Last End LSN of the header of the buffer logging page thus does not point to an area of its own page, but that calculated through the Last End LSN of the header of the normal logging page does. Therefore, as in the case of LFS 1.1, the order of the restart pages can be determined by checking whether such a difference exists between the logging pages at the point determined to be the boundary between the buffer logging area and the normal logging area.

The second method to determine the order of the carved restart pages uses the adjacent physical positions of the buffer logging page and the restart page. Given that the buffer logging page is located immediately after the second restart page, the carved restart page can be determined to be the second restart page if a logging page is carved immediately after the carved restart page. This method can be used regardless

logging pages can be used to this end. The relevant method is based on the fact that the offset number calculated through the Last LSN of the header remains the same or continuously increases as the physical address increases in the normal logging area, but not in the buffer logging area. Therefore, while checking, in turn, the offset number generated through the Last LSN value of the header of the carved logging pages, we check whether this value stays the same or increases over three pages in the case of LFS 1.1 and over 33 pages in the case of LFS 2.0. If such a continuous page flow is found, the first page is considered to be the start page of the normal logging area. All logging pages after it, including the corresponding page, are classified as normal logging pages, and all logging pages before it are classified as buffer logging pages.

Algorithm 1 is pseudocode that identifies the range of each logging area and classifies the logging pages when only one restart page is carved, by using the methods described above.

3) IN CASE NO RESTART PAGE IS CARVED
If no restart page is carved, it is necessary to first check the similarity in data between the buffer logging pages in LFS 1.1. The first and second logging pages, located immediately after the restart pages, are buffer logging pages in LFS 1.1. These two pages contain the data before and after the operational data of the record are applied to the $MFT entry. The data on both pages are thus the same except for the last record (see Fig. 7).

FIGURE 7. Buffer logging pages in LFS 1.1.

Therefore, if the first and second logging pages that are carved are physically adjacent to each other, and 64 bytes of data after the header of the two pages are the same, the LFS
of the version of LFS. can be identified as version 1.1. In this case, the locations
If the order of restart pages cannot be determined through of the two buffer logging pages can be known, and thus all
the above two methods, the range of each logging area should logging pages carved behind the two buffer logging pages can
be identified regardless of this order. The continuity of normal be classified as normal logging pages.
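The LFS 1.1 ordering rule described above can be expressed compactly. The following Python sketch is our own illustration (the function names and result strings are hypothetical, not part of the released tool): a value in the Last LSN field that is a multiple of the page size must be a buffer page's file offset, since a genuine Last LSN mapping to a page-aligned record offset would place a record inside the page header.

```python
PAGE_SIZE = 0x1000  # typical $LogFile logging-page size


def is_page_aligned(last_lsn_field: int) -> bool:
    # An LFS 1.1 buffer logging page stores a page-aligned file offset
    # in the Last LSN field of its header; a genuine Last LSN can never
    # be a multiple of the page size, because its record offset would
    # then point into the page header.
    return last_lsn_field % PAGE_SIZE == 0


def order_restart_pages_lfs11(field_at_0x3000: int, field_at_0x4000: int) -> str:
    # Mirrors the LFS 1.1 branch of Algorithm 1: given the Last LSN
    # fields of the pages carved at +0x3000 and +0x4000 from the carved
    # restart page, an (aligned, unaligned) pattern means the carved
    # restart page is the first restart page.
    if is_page_aligned(field_at_0x3000) and not is_page_aligned(field_at_0x4000):
        return "first restart page"
    return "second restart page"
```

This is exactly the `result_1 == 0 && result_2 != 0` test that Algorithm 1 applies to the two pages following the carved restart page.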

VOLUME 10, 2022 111597


J. Oh et al.: Forensic Recovery of File System Metadata for Digital Forensic Investigation

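Before the algorithms that follow, it may help to make the LSN arithmetic they all share explicit. The Python sketch below is illustrative only (the function name is ours, not from the paper's tool): the low OffsetNumberBits bits of an LSN are the offset number, the record's byte offset in the $LogFile is that number multiplied by eight, and the remaining high bits are the sequence number.

```python
def split_lsn(lsn: int, offset_number_bits: int) -> tuple[int, int]:
    # High bits: sequence number, incremented on each circulation of
    # the normal logging area. Low bits: offset number; multiplied by
    # eight, it gives the record's byte offset from the start of $LogFile.
    sequence_number = lsn >> offset_number_bits
    record_offset = (lsn & ((1 << offset_number_bits) - 1)) * 8
    return sequence_number, record_offset
```

For example, with the common OffsetNumberBits value of 24, `split_lsn((5 << 24) | 0x200, 24)` yields sequence number 5 and record offset 0x1000.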
Algorithm 1 Determining the Structure of $LogFile When One Restart Page Is Carved
1: if LFS version == 1.1 then
2:   if there are carved logging pages at +0x3000 and +0x4000 from carved_restart_page_addr then
3:     result_1 ← 0x3000_page_last_lsn % 0x1000
4:     result_2 ← 0x4000_page_last_lsn % 0x1000
5:     if result_1 == 0 && result_2 != 0 then
6:       carved restart page is first restart page
7:       buffer_logging_area_start_addr ← carved_restart_page_addr + 0x2000
8:       normal_logging_area_start_addr ← carved_restart_page_addr + 0x4000
9:     else
10:      carved restart page is second restart page
11:      buffer_logging_area_start_addr ← carved_restart_page_addr + 0x1000
12:      normal_logging_area_start_addr ← carved_restart_page_addr + 0x3000
13:    end if
14:  else
15:    if first carved logging page is located after carved restart page then
16:      carved restart page is second restart page
17:      buffer_logging_area_start_offset ← carved_restart_page_offset + 0x1000
18:      normal_logging_area_start_offset ← carved_restart_page_offset + 0x3000
19:    else
20:      Find 3 consecutive pages with the same or increasing offset_number from last_lsn
21:      normal_logging_area_start_addr ← detected_page_addr
22:      all carved pages before the normal logging area are buffer pages
23:    end if
24:  end if
25: else if LFS version == 2.0 then
26:  if there is a carved logging page located at +0x21000 from carved_restart_page_addr then
27:    temp_value ← 0x21000_page_last_end_lsn << sequence_number_bits
28:    offset_number ← temp_value >> sequence_number_bits
29:    record_offset ← offset_number × 8
30:    if record_offset > 0x22000 && record_offset < 0x23000 then
31:      carved restart page is second restart page
32:      buffer_logging_area_start_addr ← carved_restart_page_addr + 0x1000
33:      normal_logging_area_start_addr ← carved_restart_page_addr + 0x21000
34:    else
35:      carved restart page is first restart page
36:      buffer_logging_area_start_addr ← carved_restart_page_addr + 0x2000
37:      normal_logging_area_start_addr ← carved_restart_page_addr + 0x22000
38:    end if
39:  else
40:    if first carved logging page is located after carved restart page then
41:      carved restart page is second restart page
42:      buffer_logging_area_start_addr ← carved_restart_page_addr + 0x1000
43:      normal_logging_area_start_addr ← carved_restart_page_addr + 0x21000
44:    else
45:      Find 33 consecutive pages with the same or increasing offset_number from last_lsn
46:      normal_logging_area_start_addr ← detected_page_addr
47:      all carved pages before the normal logging area are buffer logging pages
48:    end if
49:  end if
50: end if

If each logging area cannot be identified using the above method, the method that uses the continuity of the normal logging pages, described above, should be used. While checking the offset number value generated through the Last LSN value of the header of the carved logging pages in turn, we check whether this value is the same or has increased over 33 pages. If such continuous page flow is obtained, the first page of the flow is considered to be the start page of the normal logging area.

If the restart page is not carved, the sequence number cannot be acquired and the record carving signature cannot be generated through the OffsetNumberBits value, because the SeqNumberBits value cannot be obtained from the header of the restart page. Therefore, a method to obtain the OffsetNumberBits value is required even in this case. One such method uses the difference in the sequence numbers of the logging pages when there is a circulation in the normal logging area. The difference between the sequence number from the smallest Last LSN and that from the largest Last LSN is one in this case (see Fig. 8).

FIGURE 8. Status of sequence numbers in case of rotation.

Therefore, with respect to the smallest and the largest Last LSN values, we calculate the difference between the sequence numbers in each case while increasing the OffsetNumberBits value from 1 to 63. When the difference becomes one for the first time, the corresponding OffsetNumberBits value is determined to be that of the $LogFile.

Another method can be used when there is no circulation in the normal logging area. This method is based on the fact that the position of the record relative to the starting position of the $LogFile can be obtained by multiplying the offset number obtained from the This LSN value of the record by eight. The OffsetNumberBits value is adjusted according to the size of the $LogFile, and the record offset, which is eight times the maximum offset number that can be expressed with the OffsetNumberBits value, must be at least equal to or greater than the file size in order to express the position of the last record at the end of the file (see Fig. 9).

FIGURE 9. Relationship between the size of the $LogFile and the offset number.

The method used to obtain the OffsetNumberBits value is as follows: The estimated size of the $LogFile is first obtained by adding 0x3000, which is the sum of the sizes of the restart area and the last logging page, to the difference between the locations of the first and the last carved logging pages (see Fig. 10).

Then, the maximum offset number is obtained by dividing the estimated size of the $LogFile by eight, and its value is changed to a binary number. The position of the highest 1 in the binary number is identified. The value of OffsetNumberBits represents the distance between the bottom and the position of the highest 1. Algorithm 2 is a pseudocode for
finding the OffsetNumberBits value of the $LogFile when the restart page is not carved.

FIGURE 10. Calculating the size of the $LogFile.

Algorithm 2 Find the OffsetNumberBits of $LogFile
1: if there is a rotation in normal logging area then
2:   last_lsn_1 ← the smallest last_lsn in normal logging area
3:   last_lsn_2 ← the biggest last_lsn in normal logging area
4:   find_flag ← FALSE
5:   for i=1; i<64; i++ do
6:     seq_num_1 ← last_lsn_1 >> i
7:     seq_num_2 ← last_lsn_2 >> i
8:     if seq_num_2 - seq_num_1 == 1 then
9:       find_flag ← TRUE
10:      offset_number_bits ← i
11:      break
12:    end if
13:  end for
14:  if find_flag == FALSE then
15:    offset_number_bits ← 24
16:  end if
17: else
18:  file_offset_1 ← file offset of first carved logging page
19:  file_offset_2 ← file offset of last carved logging page
20:  file_size ← (file_offset_2 - file_offset_1) + 0x3000
21:  maximum_offset_number ← file_size / 8
22:  binary_arr[64] ← ConvertUint64ToBinaryArray(maximum_offset_number)
23:  for i=63; i>0; i-- do
24:    if binary_arr[i] == 1 then
25:      offset_number_bits ← i
26:      break
27:    end if
28:  end for
29: end if

If the OffsetNumberBits value cannot be obtained through the two methods described above, it is assigned the value 24, which corresponds to the most common size of the $LogFile (64 MB).

E. STEP 3: RECOVERING UNCARVED LOGGING PAGES
Once the structure of the carved logging pages has been identified, logging pages that have not been carved due to damage to the page header should be recovered. This is because the $LogFile data are likely to persist even in logging pages that have not been carved.

1) IN CASE RESTART PAGE IS CARVED
Once the restart page has been carved, the ranges of the buffer logging area and the normal logging area can be accurately identified through Table 3. Uncarved pages for each logging area are recovered as follows:

In case of the normal logging area, the sequence number is first collected through the Last LSN in the header of the carved logging pages. The page that has not been carved within the area is then searched for. If an uncarved page is found, a record carving signature is created based on the sequence number collected earlier. The generated signature is then used to search forward in units of eight bytes from the end of the page to find the last record, including records that go beyond the page. If the last record is found, its This LSN value is the Last LSN value of the uncarved page. Algorithm 3 is a pseudocode used to obtain the Last LSN value by searching for the last record in the page.

Algorithm 3 Obtaining the Last LSN From Uncarved Pages
1: function CheckPageForLastLSN(current_file_offset, seq_num[])
2:   page_buffer[] ← GetPageData(current_file_offset)
3:   current_position ← 0xFC8
4:   detection_position ← -1
5:   find_flag ← FALSE
6:   page_seq_num ← -1
7:   while current_position >= 0 do
8:     record_header ← page_buffer[current_position]
9:     this_lsn_seq_num ← record_header.this_lsn >> offset_num_bits
10:    for seq_num in seq_num[] do
11:      if this_lsn_seq_num == seq_num then
12:        pre_lsn_seq_num ← record_header.previous_lsn >> offset_num_bits
13:        if pre_lsn_seq_num == seq_num || record_header.previous_lsn == 0 then
14:          undo_lsn_seq_num ← record_header.undo_lsn >> offset_num_bits
15:          if undo_lsn_seq_num == seq_num || record_header.undo_lsn == 0 then
16:            if record_header.record_type == 0x1 || record_header.record_type == 0x2 then
17:              if record_header[0x31] == 0x0 || record_header[0x33] == 0x0 then
18:                detection_position ← current_position
19:                find_flag ← TRUE
20:                page_seq_num ← seq_num
21:                break
22:              end if
23:            end if
24:          end if
25:        end if
26:      end if
27:    end for
28:    if find_flag == TRUE then
29:      break
30:    end if
31:    current_position ← current_position - 0x8
32:  end while
33:  if detection_position >= 0 then
34:    record_header ← page_buffer[detection_position]
35:    next_record_position ← record_header.client_data_length + detection_position + 0x30
36:    if next_record_position >= 0x1000 then
37:      last_lsn ← record_header.this_lsn
38:      return last_lsn
39:    else
40:      record_header ← page_buffer[next_record_position]
41:      this_lsn_seq_num ← record_header.this_lsn >> offset_num_bits
42:      if this_lsn_seq_num == page_seq_num then
43:        last_lsn ← record_header.this_lsn
44:        return last_lsn
45:      else
46:        record_header ← page_buffer[detection_position]
47:        last_lsn ← record_header.this_lsn
48:        return last_lsn
49:      end if
50:    end if
51:  end if
52: end function

When the Last LSN value is obtained in this way, a virtual header is created and the Last LSN is set to recover the damaged

header of the uncarved page. If the last record is not found and the Last LSN value is not obtained, the virtual header uses the Last LSN value of the page located directly in front of it. Normal logging pages recovered by creating a virtual header in this process are added to the list of normal logging pages.

The recovery of uncarved logging pages in the buffer logging area depends on the version of LFS. If the version of LFS is 1.1, the Last End LSN value is used instead of the Last LSN value because the buffer logging page does not store the Last LSN value in the Last LSN field of the header. It instead stores the file offset value that represents the location in the $LogFile where the data of the buffer page are applied later. This file offset value is not related to records in the page and is not used for record carving in the subsequent process. Therefore, if an uncarved page is found in the buffer logging area, the record carving signature is used to search forward in units of eight bytes from the end of the page to find the last record among those that do not go beyond the page. If this record is found, its This LSN value is the Last End LSN value of the uncarved page. The record carving signature used in this process is generated using the sequence number obtained from the carved logging pages as described above. Algorithm 4 is a pseudocode for obtaining the Last End LSN value by searching for the last record in the page.

Algorithm 4 Obtaining the Last End LSN in Uncarved Pages
1: function CheckPageForLastEndLSN(current_file_offset, seq_num[])
2:   page_buffer[] ← GetPageData(current_file_offset)
3:   current_position ← 0xFC8
4:   detection_position ← -1
5:   while current_position >= 0 do
6:     record_header ← page_buffer[current_position]
7:     this_lsn_seq_num ← record_header.this_lsn >> offset_num_bits
8:     for seq_num in seq_num[] do
9:       if this_lsn_seq_num == seq_num then
10:        pre_lsn_seq_num ← record_header.previous_lsn >> offset_num_bits
11:        if pre_lsn_seq_num == seq_num || record_header.previous_lsn == 0 then
12:          undo_lsn_seq_num ← record_header.undo_lsn >> offset_num_bits
13:          if undo_lsn_seq_num == seq_num || record_header.undo_lsn == 0 then
14:            if record_header.record_type == 0x1 || record_header.record_type == 0x2 then
15:              if record_header[0x31] == 0x0 || record_header[0x33] == 0x0 then
16:                if current_position + 0x30 + record_header.client_data_length < 0x1000 then
17:                  last_end_lsn ← current_position
18:                  return last_end_lsn
19:                end if
20:              end if
21:            end if
22:          end if
23:        end if
24:      end if
25:    end for
26:    current_position ← current_position - 0x8
27:  end while
28: end function

Once the Last End LSN value has been obtained in this way, a virtual header is created and the Last End LSN is set to recover the damaged header of the uncarved page.

In case of LFS 2.0, virtual headers of the uncarved logging pages are created in the same way as in the normal logging area to execute recovery, because the Last LSN value is used in the header of the buffer logging page. As such, the recovered buffer logging pages in each LFS version are added to the list of buffer logging pages.

2) IN CASE NO RESTART PAGE IS CARVED
If the restart page is not carved, the exact range of the buffer logging and normal logging areas is unknown because the size of the $LogFile cannot be determined. Accordingly, the uncarved pages can be recovered only in the area between the first and last of the carved logging pages. When the area in which the recovery operation is to be performed is designated in this way, the subsequent recovery process is the same as when the restart page is carved, as described above.

F. STEP 4: SORTING LOGGING PAGES
Once the uncarved logging pages have been recovered and information on the buffer and normal logging pages has been collected, the logging pages should be sorted in order for the following reasons: In some cases, the $LogFile record is located between logging pages, or may use multiple logging pages owing to its large size. The logging pages should be analyzed in order in this case to acquire the data on such $LogFile records. When the storage device formatted using NTFS is unmounted and remounted, the structure of circulation of the normal logging area of $LogFile may be broken and the order of logging pages may become mixed. The logging pages should be analyzed in order in this case to obtain as much data as possible (see Fig. 11).

FIGURE 11. Normal circulation vs. corrupted circulation.

The method for sorting logging pages in order is as follows: While sequentially searching for logging pages in the normal logging area, pages whose Last LSN values increase or remain the same are grouped together. Then, each page group is sorted in order based on the largest Last LSN value in the group. Finally, the buffer logging pages are sorted according to the version of LFS. The largest Last LSN value of the normal logging area is used as a reference value for sorting (see Fig. 12).

FIGURE 12. Sorting the logging pages.

In case of LFS 1.1, the Last End LSN value is used because the Last LSN values of the two buffer logging pages cannot be used. If the Last End LSN values of
both buffer logging pages are less than the reference value, it means that the data of both buffer logging pages are already reflected in the normal logging area. Therefore, the two buffer pages are not sorted. If the Last End LSN value of one of the two buffer logging pages is greater than the reference value, only the corresponding buffer logging page is sorted. Finally, if the Last End LSN values of both buffer logging pages are greater than the reference value, only the buffer logging page with the larger Last End LSN value is sorted.

In case of LFS 2.0, the buffer logging pages with a Last LSN value smaller than the reference value are excluded. The remaining pages are sorted based on their Last LSN values and grouped together. Finally, the group of buffer logging pages is sorted.

G. STEP 5: CARVING $LogFile RECORDS
Once the logging pages have been sorted, record carving is performed on them to obtain as much of the record data that remains on the disk as possible, even if data on the logging page have been damaged by an accident/disaster or cyber terrorism.

Record carving is performed in units of logging pages, and a record carving signature that is valid only within a given page is generated using the sequence number obtained through the Last LSN value of the page header. The record is searched for by using the signature in units of eight bytes on the data of the page (see Fig. 13).

FIGURE 13. Carving $LogFile records.

If the header and data of a carved record continue to the next page, a record carving signature that is valid only on the next page is regenerated, and the record carving operation is resumed after obtaining the remaining header and data from the next page.

The $LogFile data acquired through record carving in this way can be used to create file-level events, such as file creation, deletion, name change, and move [17], or to track the data history of a specific file [19].

Figure 14 shows a flowchart summarizing the process of $LogFile data recovery through the record carving technique described above.

FIGURE 14. Flowchart for the recovery of $LogFile data.

VI. EXPERIMENT
This section describes the development of the record carving-based $LogFile data recovery technology based on the above content, and evaluates its performance in comparison with currently existing tools through experiments.

A. TOOL DEVELOPMENT
The technology for $LogFile data recovery based on record carving was developed in modular format. The module
receives a disk or volume image file as input, and each of its sub-modules operates to finally output the $LogFile record data (see Fig. 15).

FIGURE 15. $LogFile data recovery module.

The developed module was added to the NTFS Log Tracker v1.8 [17] and the NTFS Data Tracker v1.1 [19]. To use the functions of the module in each tool, the path of the disk/volume image file needs to be input to the path of the source file for carving, as shown in Figure 16.

FIGURE 16. Input interface of disk/volume image file in each tool.

Each tool uses the record data received from the $LogFile data recovery module for file event analysis and to analyze the file data history.

The NTFS Log Tracker and NTFS Data Tracker with the record carving-based $LogFile data recovery module can be downloaded free of charge.1

1 URL: https://sites.google.com/site/forensicnote/ntfs-log-tracker, https://sites.google.com/site/forensicnote/ntfs-data-tracker

B. EVALUATION
We compared the performance in recovering $LogFile data between the NTFS Log Tracker and NTFS Data Tracker, which use the $LogFile data recovery module developed in this paper, and existing $LogFile recovery tools such as Bulk Extractor v2.0 and X-Ways Forensics v19.9 SR-4.

1) DATA SET
The test image files used for performance evaluation were created by storing data of the $LogFile (64 MB), to which the damage scenario is applied, in the middle of an image file (1 GB) to fit the sector layout (512 bytes).

The scenarios of damage to the $LogFile data considered were as follows: The first was the case where the $LogFile data had not been corrupted at all. We tested whether the $LogFile data could be recovered in case the $LogFile could not be accessed normally due to damage to the file system. The test image file (test01.dd) created in this scenario was used as a baseline dataset for performance evaluation because all $LogFile data within the file could be recovered by both the existing recovery methods and the recovery method proposed in this paper. It could therefore be confirmed whether the existing methods and the proposed method recovered the $LogFile data properly as damage scenarios were added on the basis of this image file. The second was the case where the header of the restart page had been damaged, divided into the case where the header of the first restart page had been damaged for each LFS version, one where the header of the second restart page had been damaged for each LFS version, and the case where the headers of all restart pages had been damaged. These scenarios were run to determine whether the different structures of the $LogFile for each version of the LFS could be identified and the $LogFile data could be recovered. The third case considered was one where the header of the buffer logging page had been damaged. In this case, 50% of the page headers in the buffer logging area for each version of the LFS were damaged. These scenarios were run to determine whether buffer logging pages with damaged headers could be recovered according to the characteristics of the different buffer logging pages for each LFS version, and whether the $LogFile data could be recovered. The fourth scenario was one where the header of a normal logging page had been damaged. In this case, because there was no difference between the LFS versions, 50% of the page headers in the normal logging area were assumed to have been damaged only for LFS 2.0. This scenario was run to determine whether the $LogFile data could be restored by recovering the pages with damaged headers in the normal logging area. The fifth was the case where data on the normal logging pages had been damaged. In this case, there was no difference between the LFS versions, and thus only LFS 2.0 was considered. One record within the data of the pages corresponding to 50% of the normal logging area was selected and damaged. This scenario was run to determine whether the $LogFile data could be recovered by carving records in the remaining area, except for the damaged area. The last scenario was one where the header and data of pages in the normal logging area had been simultaneously damaged; the fourth and fifth scenarios were applied at the same time. This scenario was run to determine whether the logging pages with damaged headers could be recovered, and whether records in the areas other than the damaged areas could be carved to recover the $LogFile data.

Twelve image files were used for the evaluation, and the LFS version, damage-related scenario, and details of each image are shown in Table 4.

TABLE 4. Damage to and purpose of testing each image file.

In each scenario, the header was
corrupted by zeroizing the initial 32-byte data, including the magic number, and the record was corrupted by zeroizing the header and data of the record. In case of the test11.dd and test12.dd image files, one record that does not go over the page was selected in each of the 50% of pages in the normal logging area, and it was zeroized. However, depending on the page, there were cases where there was no record header, and only record data, in the page. The actual number of zero-filled records was thus 4,359 in the test11.dd image file and 4,305 in the test12.dd image file.

2) RESULT
The performance of each tool was assessed differently because the results of recovery of each were different. In case of X-Ways Forensics v19.9 SR-4, which performs file carving, the hash values of the file carved from the test image file were compared with those of the $LogFile applied in the damage scenarios to check whether the file had been carved normally. In case of Bulk Extractor v2.0, which performs page carving, the number of pages carved from the test image file was compared with the number in the original $LogFile to check whether all pages had been carved normally. Finally, in case of the NTFS Log Tracker and NTFS Data Tracker, the number of records that the tools analyzed in the test image file was compared with the number of records remaining in the $LogFile, to which the damage scenario had been applied, to check whether all records to be analyzed had in fact been carved.

Table 5 shows the results of the performance evaluation of each tool on the test image files. In the results, ‘‘O’’ means that all $LogFile data that could be obtained from the test image were successfully acquired, and ‘‘△’’ means that not all data were acquired but those from the logging area were. Finally, ‘‘X’’ means that the data of the logging area were not properly obtained owing to structural damage.

X-Ways Forensics v19.9 SR-4 successfully performed file carving on the test01.dd image file, which had sustained no damage, but if the header of the restart page or a logging page had been damaged, it was unable to carve the file normally. However, it successfully performed file carving on the test11.dd image file, in which only records were damaged.

Bulk Extractor v2.0 was able to carve all pages of the test01.dd image file, which had sustained no damage. In case the header of the restart page had been damaged (test02.dd∼test07.dd), the carving of all logging pages except for the restart page was successful. In addition, if only records had been damaged, all logging pages were carved well. However, if the header of a logging page had been damaged, the corresponding page was not carved properly.
TABLE 5. Results of tool evaluation.

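The hash-based check used above to score the file-carving tool can be sketched as follows. This is a minimal illustration under our own naming, not code from any of the evaluated tools: a carved file counts as correct only when it is bit-identical to the $LogFile that was embedded in the test image.

```python
import hashlib


def sha256_of(path: str) -> str:
    # Hash the file in chunks so large disk images are not read
    # into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def carving_succeeded(carved_path: str, reference_path: str) -> bool:
    # The carved file is correct only if it matches the reference
    # $LogFile applied in the damage scenario bit for bit.
    return sha256_of(carved_path) == sha256_of(reference_path)
```

The same pattern extends to the page-carving and record-carving checks by comparing counts instead of digests.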
The NTFS Log Tracker and NTFS Data Tracker analyzed all records normally, even if the header of the restart page or a logging page had been damaged. They were also able to normally analyze all records in test11.dd and test12.dd except for the damaged ones.

The above results show that the proposed methods of recovery based on record carving were able to obtain as much $LogFile data as possible, even when the structure of the $LogFile had been damaged such that existing tools did not work. The test images used in the performance evaluation can be downloaded from the URL where the tool can be downloaded.

VII. CASE STUDY
At around 8:34 am on April 16, 2014, the ferry MV Sewol sank off the coast of Jindo in South Korea [8]. The ferry was equipped with a Windows XP-based CCTV system that stored one-minute CCTV video files (.vdo) on the F drive. After the accident, no CCTV videos could be obtained at 8:33 am and 8:34 am, corresponding to the time at which the ferry sank. It then became important to determine whether the CCTV files had been intentionally deleted or had not existed in the first place.

The F drive of the CCTV system was formatted with NTFS. When the metadata of the files stored in $MFT of the NTFS were analyzed, only traces of the 201404160832.vdo file, which was saved at 8:32 am, remained, as shown in Figure 17.

FIGURE 17. Results of analyzing $MFT.

However, when the $LogFile data were analyzed through the NTFS Data Tracker, traces of the 201404160833.vdo and 201404160834.vdo files, corresponding to 8:33 and 8:34 am, were found, as shown in Figure 18.

FIGURE 18. Result of analyzing $LogFile with NTFS Data Tracker.

When the data history of 201404160832.vdo was analyzed through the NTFS Data Tracker, the 80th data runs matched the data runs of the 201404160832.vdo file recorded in $MFT, as shown in Figures 19 and 20.

FIGURE 19. Data runs of 201404160832.vdo from $MFT.

FIGURE 20. File data history of 201404160832.vdo from $LogFile.

TABLE 6. Structure of the header of a record.

If the 201404160833.vdo and 201404160834.vdo files had been normally stored and then deleted, the data runs of 201404160832.vdo recorded in the $MFT would have matched the last data runs of the history of the file as analyzed through the $LogFile. However, data runs of
201404160832.vdo in the current $MFT matched the 80th data runs in the history of the file. These traces prove that only the first 80 data runs in the data history of the 201404160832.vdo file had been saved to the $MFT on disk; then, as the system suddenly stopped, the last data runs of 201404160832.vdo, still in memory, were never written to disk. This also proves that the metadata of the 201404160833.vdo and 201404160834.vdo files were not saved to the $MFT on disk. Therefore, there were no traces of 201404160833.vdo and 201404160834.vdo not because the corresponding video files had been deleted, but because their metadata were never saved to the $MFT due to the sudden termination of the system.

VIII. CONCLUSION
A file system is the basic structure that most operating systems use to store files, and it is the most basic and important element for investigators to analyze during digital forensic investigations. The recovery of file system metadata enables investigations to proceed even when the file system structure is damaged and files cannot be accessed normally, and it can help to find additional evidence in unallocated areas.

In this paper, we studied the recovery of $LogFile records, the metadata of NTFS, one of the most used file systems worldwide. We identified the mechanisms of record storage through a detailed analysis of the $LogFile structure and proposed a recovery algorithm for records without a fixed value. The proposed algorithm is applicable to all versions of $LogFile and has been released as a freeware tool. In addition, we confirmed the superiority of the proposed algorithm via experiments that evaluated its performance against existing $LogFile recovery tools. Finally, we demonstrated the effectiveness of this study with a real case solved by the recovery


of $LogFile data in a situation where the file system was damaged due to an accident or disaster.
The results of this study should be useful to digital forensic investigators in recovering $LogFile data from the time of system shutdown in the case of an accident, disaster, or cyber terrorism, so that they can analyze the causes of the event, reconstruct it, and devise methods to prevent it in the future.
Journal data, a form of file system metadata, contains important information that can be used to accurately identify events that occurred just before the system was stopped. In particular, the recovery of journal data is a very important task in situations where the file system is damaged by accidents/disasters or cyber terrorism. In future studies, we plan to investigate the forensic recovery of journal data in Ext4, which, along with NTFS, is among the most widely used file systems.
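As a concrete illustration of the run-list reasoning used in the case study above, the sketch below decodes NTFS data runs and then locates the newest snapshot in a $LogFile-derived run-list history that matches the runs currently stored in the $MFT. This is a simplified, hypothetical example (the run lists shown are invented for illustration), not the implementation of the released tool.

```python
def decode_data_runs(raw: bytes) -> list:
    """Decode an NTFS run list into (starting LCN, cluster count) pairs.

    Each run begins with a header byte: the low nibble gives the byte
    size of the length field, the high nibble the byte size of the
    signed, *relative* LCN offset field. A 0x00 byte ends the list.
    """
    runs, pos, lcn = [], 0, 0
    while pos < len(raw) and raw[pos] != 0x00:
        header = raw[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        length = int.from_bytes(raw[pos:pos + len_size], "little")
        pos += len_size
        if off_size:  # a zero-size offset field marks a sparse run
            lcn += int.from_bytes(raw[pos:pos + off_size], "little", signed=True)
            runs.append((lcn, length))
        else:
            runs.append((None, length))
        pos += off_size
    return runs

def last_synced_index(history: list, current: list):
    """Index of the newest $LogFile run-list snapshot matching the $MFT.

    A match on the final snapshot means the file was stored normally;
    an earlier match means later updates were lost with the shutdown.
    """
    for i in range(len(history) - 1, -1, -1):  # prefer the newest match
        if history[i] == current:
            return i
    return None

# 0x31 header: 1-byte length (0x20 clusters), 3-byte offset (LCN 0x345)
print(decode_data_runs(bytes.fromhex("312045030000")))  # [(837, 32)]

# Hypothetical growth history of a video file; the $MFT holds only
# snapshot 1, so the final extension never reached the disk.
history = [[(837, 8)], [(837, 16)], [(837, 16), (1200, 8)]]
print(last_synced_index(history, [(837, 16)]))  # 1
```

In the actual case, the same comparison placed the on-disk $MFT at the 80th of the run-list snapshots recovered from the $LogFile, proving a sudden shutdown rather than deletion.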

APPENDIX
See Table 6.
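Table 6 itself is not reproduced in this excerpt. As a rough guide to what a parser for such a header looks like, the sketch below unpacks the client-record header fields as they are commonly documented for the NTFS $LogFile; the offsets and field names here are assumptions drawn from public NTFS documentation (e.g. the linux-ntfs project), not a reproduction of Table 6.

```python
import struct

# Commonly documented layout of an NTFS $LogFile client-record header.
# Offsets and field names are assumptions, not a copy of Table 6.
RECORD_HEADER = struct.Struct(
    "<QQQ"   # 0x00 this_lsn, 0x08 client_previous_lsn, 0x10 client_undo_next_lsn
    "I"      # 0x18 client_data_length
    "HH"     # 0x1C client_id (sequence number, client index)
    "I"      # 0x20 record_type (1 = update record, 2 = checkpoint)
    "I"      # 0x24 transaction_id
    "H6x"    # 0x28 flags, then alignment padding
    "HH"     # 0x30 redo_operation, 0x32 undo_operation
    "HHHH"   # 0x34 redo_offset, 0x36 redo_length, 0x38 undo_offset, 0x3A undo_length
)

FIELD_NAMES = (
    "this_lsn", "client_previous_lsn", "client_undo_next_lsn",
    "client_data_length", "client_seq_number", "client_index",
    "record_type", "transaction_id", "flags",
    "redo_operation", "undo_operation",
    "redo_offset", "redo_length", "undo_offset", "undo_length",
)

def parse_record_header(buf: bytes) -> dict:
    """Unpack the fixed-size record header from the start of buf."""
    return dict(zip(FIELD_NAMES, RECORD_HEADER.unpack_from(buf)))
```

During record-level carving, candidate offsets can be screened by sanity-checking these fields (for example, that the redo/undo offsets and lengths fall within client_data_length) rather than relying on fixed signature values.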

JUNGHOON OH received the B.S. degree from the Division of Computer, Information Communication Engineering, Dongguk University, in 2010. He is currently pursuing the Ph.D. degree with the Graduate School of Information Security, Korea University. His research interests include digital forensics, filesystem forensics, incident response, and artificial intelligence.

SANGJIN LEE received the Ph.D. degree from the Department of Mathematics, Korea University, in 1994. From 1989 to 1999, he was a Senior Researcher at the Electronics and Telecommunications Research Institute, South Korea. He has been running the Digital Forensic Research Center, Korea University, since 2008, where he is currently the President of the Division of Information Security. He has authored or coauthored over 130 papers in various archival journals and conference proceedings and over 200 articles in domestic journals. His research interests include digital forensics, data processing, forensic framework, and incident response.

HYUNUK HWANG (Member, IEEE) received the Ph.D. degree from the Department of Information Security, Chonnam National University, in 2004. He is currently the Head of Department of the Attached Institute of ETRI. His research interests include digital forensics, vulnerability verification, malware, and artificial intelligence.
