Forensic Recovery of File System Metadata For Digital Forensic Investigation
ABSTRACT File system forensics is one of the most important elements in digital forensic investigations.
To date, various file system forensic methods, such as analysis of the tree structure and the recovery of deleted
file data, have been studied. Among these methods, the recovery of file system metadata is a key technique
that makes digital forensic investigations possible by recovering metadata when it cannot be obtained in
a regular manner because the file system structure is damaged due to an accident/disaster or cyber terrorism.
Previous studies mainly focused on recovering record or entry data, which are the basic units of metadata,
using carving techniques via fixed values or values capable of range prediction at the beginning of the data.
However, no studies have been conducted on metadata without such fixed values or values capable of range
prediction. $LogFile, a metadata file of the New Technology File System (NTFS), which is one of the most
used file systems at present, contains very important metadata that provide a history of all file system
operations during a specific period. However, since there is no fixed value or value capable of range
prediction at the start position of the record, which is the basic unit of $LogFile, there have been no studies
on recovery in record units, and only recovery by file and page has been possible. If the file header or page
header of $LogFile is damaged, existing recovery methods cannot properly recover the metadata; in such
cases, a record-level recovery method is required. In this context, we investigated the mechanisms of record
storage through a detailed analysis of the $LogFile structure and propose a recovery method for records
without fixed values. Our proposed method was implemented as a tool and verified through comparative
experiments with existing forensic tools that recover $LogFile data. The experimental results showed that
the proposed recovery method was able to recover all the data that existing tools are unable to recover in
situations where the $LogFile data were damaged. The implemented tools are released free of charge to
contribute to the digital forensic community. Finally, we explain the important role $LogFile played in
solving real-world cases and confirm the importance of recovering $LogFile data in situations where file
systems may be damaged due to accidents and disasters.
terrorism [8], [9]. Additionally, recovering file system metadata from unallocated areas can help to discover additional evidence [10].

Previous studies mainly focused on recovering record or entry data, which are the basic units of metadata, using carving techniques via a fixed value or a value capable of range prediction at the beginning of the data [10], [11], [12], [13], [14], [15], [16]. Indeed, no studies have been conducted on metadata without such a fixed value or a value capable of range prediction.

$LogFile, a metafile of the New Technology File System (NTFS), is a log file that records data from such file system transactions as file creations, deletions, data changes, and name changes. Transaction data are stored in record units, where each record contains ''redo'' data and ''undo'' data. If the file system is broken due to a system error, such as a sudden power outage, the operating system restores the file system to a normal state by using the ''redo'' data and ''undo'' data of records in $LogFile [1]. Therefore, $LogFile is an important artifact that provides information about all file operations in the file system and the history of file data changes during a specific period [17], [18], [19].

If a Windows system that uses NTFS as the main file system is abnormally stopped due to an accident/disaster or cyber terrorism, analyzing $LogFile can help identify the files that were created before the system stopped and exactly when it stopped. However, the file system in such cases can be damaged owing to physical damage to the storage device or anti-forensic behavior, causing a situation where the $LogFile data cannot be acquired and analyzed normally.

Previous studies regarding the recovery of $LogFile data have addressed file unit recovery [15] and page unit recovery [16]. If a part of a $LogFile, such as a file header or a page header, is corrupted, these methods are limited in that data recovery is not performed properly. In order to recover as much data as possible even when the $LogFile data are damaged, recovery in record units is necessary; however, since the starting position of a $LogFile record has neither a fixed value nor a value capable of range prediction, there has been no previous research on the recovery of record units. In this context, we investigate the mechanisms of record storage through a detailed analysis of the $LogFile structure and propose a recovery method for records without fixed values. This method is implemented as a tool and verified through comparative experiments with existing forensic tools that recover $LogFile data.

This study makes the following contributions to the literature:
- We introduced an algorithm to recover $LogFile records based on record carving. To the best of our knowledge, this paper is the first systematic study regarding the recovery of $LogFile record data.
- The proposed method recovers all records which are not damaged, even if it is not possible to obtain record data in a regular manner, and makes the existing techniques available, such as file-level event generation [17] and tracking file data history [19].
- This paper provided a comparative performance evaluation of our method relative to well-known forensic tools developed to recover $LogFile data.
- The implemented tools are released free of charge to contribute to the digital forensic community.

The remainder of this paper is organized as follows: Section 2 introduces previous studies, and Section 3 explains the need to recover $LogFile data by using specific examples. Section 4 introduces the internal structure of $LogFile. Section 5 details how to recover $LogFile data based on the record carving technique. Section 6 introduces the developed tools and experimentally evaluates their performance in comparison with currently available tools. In Section 7, we introduce a case study involving the use of the proposed tool. Finally, Section 8 summarizes the conclusions of this study.

II. RELATED WORKS
Previous studies on the recovery of file system metadata have mainly been conducted on the NTFS, APFS, and Ext4 file systems, which are the main file systems of the most commonly used operating systems. In addition, studies on the recovery of metadata using timestamps have also been conducted.

A. RECOVERY OF NTFS METADATA
1) $MFT
Fuchs [11] performed a study on recovering data of the master file table ($MFT), the metadata for storing information on files and directories in NTFS, in entry units, the basic storage unit. The recovery of an entry was done by carving the entry using ''FILE'', a fixed value located at the front of the entry. Fuchs studied how to extract the file name and timestamps from a recovered $MFT entry, and recovery of broken entries was also conducted. The tools of the study were implemented and released as MFTEntryCarver. Additionally, Fuchs speculated that the reason a large number of $MFT entries remain in unallocated areas is that a part of memory is saved to the disk in the form of a crash dump file and then deleted, leaving the entries behind.

2) $USNJRNL
Oh [10] researched recovering $UsnJrnl data, the metadata that stores the file change history of NTFS, in record units, the basic storage unit. The recovery of $UsnJrnl record data was performed by carving records using the version information and the predicted value of the record size located at the beginning of the record; additional predictable field values were checked to minimize false positives. He added the researched recovery function to the NTFS Log Tracker [17] and released it. In addition, he tested how many $UsnJrnl records could be recovered from the unallocated area of hard disks, and it was confirmed that change journal data up to 11 months old could be obtained.
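To make the notion of fixed-value carving concrete, the following minimal sketch (ours, for illustration only; it is not the code of MFTEntryCarver or NTFS Log Tracker) scans a raw image for the ''FILE'' signature with which every $MFT entry begins. The 512-byte sector-aligned step is an assumption that holds for common NTFS volumes.

SIGNATURE = b"FILE"

def carve_mft_entries(image_path, step=512):
    # Yields the file offsets of candidate $MFT entries in a raw image.
    with open(image_path, "rb") as image:
        offset = 0
        while True:
            chunk = image.read(step)
            if len(chunk) < 4:
                break
            if chunk[:4] == SIGNATURE:
                yield offset
            offset += step

A real carver would additionally validate header fields of each candidate to limit false positives, which is exactly the role the predictable field checks play in the $UsnJrnl study above.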
In such cases, $LogFile analysis can reveal traces of malware and file data used before the system was destroyed and stopped [17].

However, the method of disk destruction used in the DarkSeoul campaign sometimes damaged some of the $LogFile data located in front of the NTFS volume. In this case, the data could not be obtained using the existing recovery methods [26].

Therefore, it is important to recover $LogFile data even if $LogFile is not accessible in a normal manner through file system analysis. Moreover, even if the $LogFile data are corrupted, it is important to recover the data from the intact area.

IV. INTERNAL STRUCTURE OF $LogFile
The $LogFile consists of pages that are 4096 bytes in size. The pages are divided into a restart area and a logging area [1] (see Fig. 1).

The restart area is composed of two consecutive pages at the beginning of the file, and the second page is used for backup [1]. Each page header in the restart area begins with an ''RSTR'' magic number and stores the LFS (LogFile Service) version, divided into a MinorVersion field and a MajorVersion field. Typically, LFS version 1.1 is used by operating systems prior to Windows 10 and version 2.0 is used by those subsequent to it [27]. Additionally, the FileSize field in the restart page header stores the $LogFile size [23] (see Fig. 2).

The logging area is divided into a buffer logging area and a normal logging area, and stores the records that contain transaction data. The buffer logging area is an area where records are temporarily stored before being stored in the normal logging area. Therefore, the most recent records are stored in this area [17]. The buffer logging area is composed of the first two pages in case of LFS 1.1 and the first 32 pages in case of LFS 2.0 [27].

The remainder of the area excluding the buffer logging area is the normal logging area. Records are stored sequentially; when the area is full, storage resumes at the front of the normal logging area [1] (see Fig. 1).

A page in the logging area consists of a header and multiple records. Each record stores the operation content for an MFT entry, and multiple records are sequentially gathered to form a single transaction. The first and middle records of a transaction are update records, and the last record is a commit record [1]. A record consists of a header, redo data, and undo data. The This LSN field of the record header stores the LSN (Log Sequence Number) indicating its order, and the Previous LSN field stores the LSN of the previous record (see Fig. 3). The details of each field of the record header are shown in the Appendix [17].

The LSN is 64 bits in size and consists of a sequence number and an offset number. The sequence number is a value that increases by one each time the normal logging area is cycled through, and the offset number is the value used to obtain a record offset representing the location of the corresponding record based on the start location of the $LogFile. The record offset of the corresponding record can be obtained by multiplying the offset number by eight. The sequence number is located at the top of the LSN and has the number of bits corresponding to the SeqNumberBits field of the restart page header. The offset number is located at the bottom [27]. Fig. 4 shows the structure of the LSN.

The redo and undo data of the record are used to perform operations corresponding to the redo/undo OP of the record header, and are stored with a size corresponding to the redo/undo length of the header at the position corresponding to the redo/undo offset of the header [17].

The header of the logging page starts with an ''RCRD'' magic number and stores the largest LSN among the records within the page, including a record going over the page, in the ''Last LSN'' field. The largest LSN among records that do not spill over the page is stored in the ''Last End LSN'' field of the header. The detailed structure of the logging page header is shown in Fig. 5 [17].

TABLE 1. Signature for record carving.
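As an illustration of the LSN arithmetic described in this section, the following sketch (ours; the helper name is not from the released tool) splits an LSN into its two parts, assuming, as the algorithms below do, that the sequence number and the offset number together fill the 64 bits:

def split_lsn(lsn, seq_number_bits):
    # The sequence number occupies the top SeqNumberBits bits of the 64-bit
    # LSN; the offset number occupies the remaining low bits.
    offset_number_bits = 64 - seq_number_bits
    sequence_number = lsn >> offset_number_bits
    offset_number = lsn & ((1 << offset_number_bits) - 1)
    record_offset = offset_number * 8  # bytes from the start of $LogFile
    return sequence_number, offset_number, record_offset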
TABLE 2. Header signature for page carving.

If the restart page is carved, the ranges of the buffer logging area and the normal logging area can be accurately identified. Table 3 shows the ranges of the buffer logging area and the normal logging area based on the location of the restart page for each LFS version. PFRP stands for the ''position of the first restart page'' and PSRP stands for the ''position of the second restart page.'' The size of the $LogFile can be obtained from the restart page header.

In this way, each logging page recovered through carving in the previous step can be classified into a buffer logging page or a normal logging page according to the area information in Table 3.
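A minimal sketch of this classification step (illustrative Python; the layout follows the ranges described above, with PFRP denoting the position of the first restart page):

PAGE_SIZE = 0x1000

def classify_logging_page(page_addr, pfrp, lfs_version, logfile_size):
    # Restart area: two pages from PFRP. Buffer logging area: the next two
    # pages (LFS 1.1) or 32 pages (LFS 2.0). The rest is the normal area.
    buffer_pages = 2 if lfs_version == "1.1" else 32
    buffer_start = pfrp + 2 * PAGE_SIZE
    normal_start = buffer_start + buffer_pages * PAGE_SIZE
    if buffer_start <= page_addr < normal_start:
        return "buffer"
    if normal_start <= page_addr < pfrp + logfile_size:
        return "normal"
    return "outside"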
The method that uses the difference between the buffer logging page and the normal logging page for each LFS version is as follows: In the case of LFS 1.1, the Last LSN field in the header of the buffer page stores a file offset value instead of the Last LSN value. The file offset value indicates the location of the normal logging page to which the data of the corresponding buffer logging page are copied within the $LogFile. Thus, the file offset value of the buffer logging page is always a multiple of the page size (typically 0x1000). On the contrary, the Last LSN value of the header of the normal logging page cannot be a multiple of the page size, because if this were the case, the record offset obtained from the LSN would also be a multiple of the page size. This would mean that the record with the LSN corresponding to the ''This LSN'' value is located in the header, which is the starting position of the page; such a case is not possible. In case of LFS 1.1, the file offset value of the header of the buffer logging page is thus a multiple of the page size, and the Last LSN of the normal logging page header is not. Therefore, the order of the restart pages can be determined by checking whether such a difference exists between the logging pages at the point determined to be the boundary between the buffer logging area and the normal logging area.

The method used in LFS 1.1 cannot be used in case of LFS 2.0, because all buffer logging pages have the Last LSN value in their headers. In this case, we can use the fact that the record offset value calculated from the Last End LSN of the header of the buffer logging page does not indicate the area in which the corresponding buffer logging page is located, unlike in case of the normal logging page. This method is possible because the buffer logging page contains data to be copied to the normal logging page: the record offset calculated through the Last End LSN of the header of the buffer logging page is the address of the area of the normal logging page to be copied to later, not the address of the area where the current buffer logging page is located. In case of LFS 2.0, the record offset value calculated through the Last End LSN of the header of the buffer logging page thus does not point to an area of its own page, whereas that calculated through the Last End LSN of the header of the normal logging page does. Therefore, as in case of LFS 1.1, the order of the restart pages can be determined by checking whether such a difference exists between the logging pages at the point determined to be the boundary between the buffer logging area and the normal logging area.
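The LFS 1.1 test above reduces to a divisibility check on the two pages following the carved restart page, mirroring lines 3 to 6 of Algorithm 1 below (illustrative sketch, ours):

PAGE_SIZE = 0x1000

def carved_restart_page_is_first(last_lsn_at_0x3000, last_lsn_at_0x4000):
    # A buffer logging page stores a file offset (a multiple of the page
    # size) in its Last LSN field; a normal logging page never does. If the
    # page at +0x3000 looks like a buffer page and the page at +0x4000 looks
    # like a normal page, the carved restart page is the first restart page.
    return (last_lsn_at_0x3000 % PAGE_SIZE == 0 and
            last_lsn_at_0x4000 % PAGE_SIZE != 0)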
The second method to determine the order of the carved restart pages uses the adjacent physical positions of the buffer logging page and the restart page. Given that the buffer logging page is located immediately after the second restart page, the carved restart page can be determined to be the second restart page if a logging page is carved immediately after the carved restart page. This method can be used regardless of the version of LFS.

If the order of the restart pages cannot be determined through the above two methods, the range of each logging area should be identified regardless of this order. The continuity of normal logging pages can be used to this end. The relevant method is based on the fact that the offset number calculated through the Last LSN of the header remains the same or continuously increases as the physical address increases in the normal logging area, but not in the buffer logging area. Therefore, while checking the offset number generated through the Last LSN value of the header of the carved logging pages in turn, we check whether this value remains the same or increases over three pages in case of LFS 1.1 and over 33 pages in case of LFS 2.0. If such a continuous page flow is obtained, the first page is considered to be the start page of the normal logging area. All logging pages at the back, including the corresponding page, are classified as normal logging pages, and all logging pages at the front are classified as buffer logging pages.

Algorithm 1 is a pseudocode that identifies the range of each logging area and classifies logging pages when only one restart page is carved, by using the methods described above.

3) IN CASE NO RESTART PAGE IS CARVED
If no restart page is carved, it is necessary to first check the similarity in data between the buffer logging pages in LFS 1.1. The first and second logging pages located immediately after the restart page are buffer logging pages in LFS 1.1. These two pages contain the data from before and after the operational data of the record are applied to the $MFT entry. The data on both pages are thus the same except for the last record (see Fig. 7).

FIGURE 7. Buffer logging pages in LFS 1.1.

Therefore, if the first and second logging pages that are carved are physically adjacent to each other, and the 64 bytes of data after the header of the two pages are the same, the LFS can be identified as version 1.1. In this case, the locations of the two buffer logging pages are known, and thus all logging pages carved behind the two buffer logging pages can be classified as normal logging pages.
Algorithm 1 Determining the Structure of $LogFile When One Restart Page Is Carved
1: if LFS version == 1.1 then
2:   if there are carved logging pages at +0x3000 and +0x4000 from carved_restart_page_addr then
3:     result_1 ← 0x3000_page_last_lsn % 0x1000
4:     result_2 ← 0x4000_page_last_lsn % 0x1000
5:     if result_1 == 0 && result_2 != 0 then
6:       carved restart page is first restart page
7:       buffer_logging_area_start_addr ← carved_restart_page_addr + 0x2000
8:       normal_logging_area_start_addr ← carved_restart_page_addr + 0x4000
9:     else
10:      carved restart page is second restart page
11:      buffer_logging_area_start_addr ← carved_restart_page_addr + 0x1000
12:      normal_logging_area_start_addr ← carved_restart_page_addr + 0x3000
13:    end if
14:  else
15:    if first carved logging page is located after carved restart page then
16:      carved restart page is second restart page
17:      buffer_logging_area_start_offset ← carved_restart_page_offset + 0x1000
18:      normal_logging_area_start_offset ← carved_restart_page_offset + 0x3000
19:    else
20:      Find 3 consecutive pages with the same or increasing offset_number from last_lsn
21:      normal_logging_area_start_addr ← detected_page_addr
22:      all carved pages before the normal logging area are buffer pages
23:    end if
24:  end if
25: else if LFS version == 2.0 then
26:   if there is a carved logging page located at +0x21000 from carved_restart_page_addr then
27:     temp_value ← 0x21000_page_last_end_lsn << sequence_number_bits
28:     offset_number ← temp_value >> sequence_number_bits
29:     record_offset ← offset_number × 8
30:     if record_offset > 0x22000 && record_offset < 0x23000 then
31:       carved restart page is second restart page
32:       buffer_logging_area_start_addr ← carved_restart_page_addr + 0x1000
33:       normal_logging_area_start_addr ← carved_restart_page_addr + 0x21000
34:     else
35:       carved restart page is first restart page
36:       buffer_logging_area_start_addr ← carved_restart_page_addr + 0x2000
37:       normal_logging_area_start_addr ← carved_restart_page_addr + 0x22000
38:     end if
39:   else
40:     if first carved logging page is located after carved restart page then
41:       carved restart page is second restart page
42:       buffer_logging_area_start_addr ← carved_restart_page_addr + 0x1000
43:       normal_logging_area_start_addr ← carved_restart_page_addr + 0x21000
44:     else
45:       Find 33 consecutive pages with the same or increasing offset_number from last_lsn
46:       normal_logging_area_start_addr ← detected_page_addr
47:       all carved pages before the normal logging area are buffer logging pages
48:     end if
49:   end if
50: end if

If each logging area cannot be identified using the above method, the method that uses the continuity of the normal logging pages, described above, should be used. While checking the offset number value generated through the Last LSN value of the header of the carved logging pages in turn, we check whether this value remains the same or increases over 33 pages. If such a continuous page flow is obtained, the first page of the flow is considered to be the start page of the normal logging area.

If the restart page is not carved, the sequence number cannot be acquired and the record carving signature cannot be generated through the OffsetNumberBits value, because the SeqNumberBits value cannot be obtained from the header of the restart page. Therefore, a method to obtain the OffsetNumberBits value is required even in this case. One such method uses the difference in the sequence numbers of the logging pages when there is a circulation in the normal logging area. The difference between the sequence number from the smallest Last LSN and that from the largest Last LSN is one in this case (see Fig. 8).

FIGURE 8. Status of sequence numbers in case of rotation.

Therefore, with respect to the smallest and the largest Last LSN values, we calculate the difference between the sequence numbers in each case while increasing the OffsetNumberBits value from 1 to 63. When the difference becomes one for the first time, the corresponding OffsetNumberBits value is determined to be that of the $LogFile.

Another method can be used when there is no circulation in the normal logging area. This method is based on the fact that the position of a record relative to the starting position of the $LogFile can be obtained by multiplying the offset number obtained from the This LSN value of the record by eight. The OffsetNumberBits value is adjusted according to the size of the $LogFile, and the record offset, which is eight times the maximum offset number that can be expressed with the OffsetNumberBits value, must be at least equal to or greater than the file size in order to express the position of the last record at the end of the file (see Fig. 9).

FIGURE 9. Relationship between the size of the $LogFile and the offset number.

The method used to obtain the OffsetNumberBits value is as follows: The estimated size of the $LogFile is first obtained by adding 0x3000, which is the sum of the sizes of the restart area and the last logging page, to the difference between the locations of the first and the last carved logging pages (see Fig. 10).

FIGURE 10. Calculating the size of the $LogFile.

Then, the maximum offset number is obtained by dividing the estimated size of the $LogFile by eight, and its value is converted to a binary number. The position of the highest 1 in the binary number is identified; the value of OffsetNumberBits represents the distance between the bottom and the position of the highest 1. Algorithm 2 is a pseudocode for finding the OffsetNumberBits value of the $LogFile when the restart page is not carved.

Algorithm 2 Find the OffsetNumberBits of $LogFile
1: if there is a rotation in normal logging area then
2:   last_lsn_1 ← the smallest last_lsn in normal logging area
3:   last_lsn_2 ← the biggest last_lsn in normal logging area
4:   find_flag ← FALSE
5:   for i = 1; i < 64; i++ do
6:     seq_num_1 ← last_lsn_1 >> i
7:     seq_num_2 ← last_lsn_2 >> i
8:     if seq_num_2 - seq_num_1 == 1 then
9:       find_flag ← TRUE
10:      offset_number_bits ← i
11:      break
12:    end if
13:  end for
14:  if find_flag == FALSE then
15:    offset_number_bits ← 24
16:  end if
17: else
18:  file_offset_1 ← file offset of first carved logging page
19:  file_offset_2 ← file offset of last carved logging page
20:  file_size ← (file_offset_2 - file_offset_1) + 0x3000
21:  maximum_offset_number ← file_size / 8
22:  binary_arr[64] ← ConvertUint64ToBinaryArray(maximum_offset_number)
23:  for i = 63; i > 0; i-- do
24:    if binary_arr[i] == 1 then
25:      offset_number_bits ← i
26:      break
27:    end if
28:  end for
29: end if
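The two estimation branches of Algorithm 2 can be condensed into the following sketch (illustrative Python; the function names are ours):

def offset_bits_from_rotation(smallest_last_lsn, biggest_last_lsn):
    # Rotation case: the sequence numbers derived from the oldest and the
    # newest Last LSN values of the normal logging area differ by exactly one.
    for bits in range(1, 64):
        if (biggest_last_lsn >> bits) - (smallest_last_lsn >> bits) == 1:
            return bits
    return 24  # fallback for the most common $LogFile size (64 MB)

def offset_bits_from_file_size(first_page_offset, last_page_offset):
    # No-rotation case: 0x3000 accounts for the restart area and the last page.
    estimated_file_size = (last_page_offset - first_page_offset) + 0x3000
    maximum_offset_number = estimated_file_size // 8
    return maximum_offset_number.bit_length()  # bits up to the highest 1

For a 64 MB $LogFile, for instance, 0x4000000 // 8 = 0x800000, whose bit length is 24, matching the default value used below.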
If the OffsetNumberBits value cannot be obtained through the two methods described above, it is assigned the value 24, which corresponds to the most common size of the $LogFile (64 MB).

E. STEP 3: RECOVERING UNCARVED LOGGING PAGES
Once the structure of the carved logging pages has been identified, logging pages that have not been carved due to damage to the page header should be recovered. This is because the $LogFile data are likely to persist even in logging pages that have not been carved.

1) IN CASE RESTART PAGE IS CARVED
Once the restart page has been carved, the ranges of the buffer logging area and the normal logging area can be accurately identified through Table 3. Uncarved pages for each logging area can then be recovered using the information obtained from the carved logging pages. The page that has not been carved within the area is then searched for. If an uncarved page is found, a record carving signature is created based on the sequence number collected earlier. The generated signature is then used to search forward in units of eight bytes from the end of the page to find the last record, including records that go beyond the page. If the last record is found, its This LSN value is the Last LSN value of the uncarved page. Algorithm 3 is a pseudocode used to obtain the Last LSN value by searching for the last record in the page.

Algorithm 3 Obtaining the Last LSN From Uncarved Pages
1: function CheckPageForLastLSN(current_file_offset, seq_num[])
2:   page_buffer[] ← GetPageData(current_file_offset)
3:   current_position ← 0xFC8
4:   detection_position ← -1
5:   find_flag ← FALSE
6:   page_seq_num ← -1
7:   while current_position >= 0 do
8:     record_header ← page_buffer[current_position]
9:     this_lsn_seq_num ← record_header.this_lsn >> offset_num_bits
10:    for seq_num in seq_num[] do
11:      if this_lsn_seq_num == seq_num then
12:        pre_lsn_seq_num ← record_header.previous_lsn >> offset_num_bits
13:        if pre_lsn_seq_num == seq_num || record_header.previous_lsn == 0 then
14:          undo_lsn_seq_num ← record_header.undo_lsn >> offset_num_bits
15:          if undo_lsn_seq_num == seq_num || record_header.undo_lsn == 0 then
16:            if record_header.record_type == 0x1 || record_header.record_type == 0x2 then
17:              if record_header[0x31] == 0x0 || record_header[0x33] == 0x0 then
18:                detection_position ← current_position
19:                find_flag ← TRUE
20:                page_seq_num ← seq_num
21:                break
22:              end if
23:            end if
24:          end if
25:        end if
26:      end if
27:    end for
28:    if find_flag == TRUE then
29:      break
30:    end if
31:    current_position ← current_position - 0x8
32:  end while
33:  if detection_position >= 0 then
34:    record_header ← page_buffer[detection_position]
35:    next_record_position ← record_header.client_data_length + detection_position + 0x30
36:    if next_record_position >= 0x1000 then
37:      last_lsn ← record_header.this_lsn
38:      return last_lsn
39:    else
40:      record_header ← page_buffer[next_record_position]
41:      this_lsn_seq_num ← record_header.this_lsn >> offset_num_bits
42:      if this_lsn_seq_num == page_seq_num then
43:        last_lsn ← record_header.this_lsn
44:        return last_lsn
45:      else
46:        record_header ← page_buffer[detection_position]
47:        last_lsn ← record_header.this_lsn
48:        return last_lsn
49:      end if
50:    end if
51:  end if
52: end function
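The record-candidate test at the heart of Algorithm 3 (the record carving signature of Table 1) can be sketched as follows. This is illustrative Python, with field offsets taken from the record header layout in the Appendix, and seq_nums holding the sequence numbers collected from the carved pages:

import struct

def looks_like_record_header(page, pos, offset_num_bits, seq_nums):
    # this_lsn, previous_lsn, and undo_lsn are consecutive 64-bit fields at
    # the start of the record header; record_type is a 32-bit field at +0x20.
    this_lsn, previous_lsn, undo_lsn = struct.unpack_from("<QQQ", page, pos)
    record_type = struct.unpack_from("<I", page, pos + 0x20)[0]
    seq = this_lsn >> offset_num_bits
    if seq not in seq_nums:
        return False
    if previous_lsn != 0 and (previous_lsn >> offset_num_bits) != seq:
        return False
    if undo_lsn != 0 and (undo_lsn >> offset_num_bits) != seq:
        return False
    if page[pos + 0x31] != 0 and page[pos + 0x33] != 0:
        return False  # at least one redo/undo OP high byte must be zero
    return record_type in (0x1, 0x2)  # update or commit record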
Once the Last LSN value has been obtained in this way, a virtual header is created and the Last LSN is set to recover the damaged header of the uncarved page. If the last record is not found and the Last LSN value is not obtained, the virtual header uses the Last LSN value of the page located directly in front of it. Normal logging pages recovered by creating a virtual header in this process are added to the list of normal logging pages.

The recovery of uncarved logging pages in the buffer logging area depends on the version of LFS. If the version of LFS is 1.1, the Last End LSN value is used instead of the Last LSN value, because the buffer logging page does not store the Last LSN value in the Last LSN field of the header. It instead stores the file offset value that represents the location in the $LogFile where the data of the buffer page are applied later. The file offset value is not related to the records in the page and is not used for record carving in the subsequent process. Therefore, if an uncarved page is found in the buffer logging area, the record carving signature is used to search forward in units of eight bytes from the end of the page to find the last record among those that do not go beyond the page. If this record is found, its This LSN value is the Last End LSN value of the uncarved page. The record carving signature used in this process is generated using the sequence number obtained from the carved logging pages, as described above. Algorithm 4 is a pseudocode for obtaining the Last End LSN value by searching for the last record in the page.

Algorithm 4 Obtaining the Last End LSN in Uncarved Pages
1: function CheckPageForLastEndLSN(current_file_offset, seq_num[])
2:   page_buffer[] ← GetPageData(current_file_offset)
3:   current_position ← 0xFC8
4:   detection_position ← -1
5:   while current_position >= 0 do
6:     record_header ← page_buffer[current_position]
7:     this_lsn_seq_num ← record_header.this_lsn >> offset_num_bits
8:     for seq_num in seq_num[] do
9:       if this_lsn_seq_num == seq_num then
10:        pre_lsn_seq_num ← record_header.previous_lsn >> offset_num_bits
11:        if pre_lsn_seq_num == seq_num || record_header.previous_lsn == 0 then
12:          undo_lsn_seq_num ← record_header.undo_lsn >> offset_num_bits
13:          if undo_lsn_seq_num == seq_num || record_header.undo_lsn == 0 then
14:            if record_header.record_type == 0x1 || record_header.record_type == 0x2 then
15:              if record_header[0x31] == 0x0 || record_header[0x33] == 0x0 then
16:                if current_position + 0x30 + record_header.client_data_length < 0x1000 then
17:                  last_end_lsn ← record_header.this_lsn
18:                  return last_end_lsn
19:                end if
20:              end if
21:            end if
22:          end if
23:        end if
24:      end if
25:    end for
26:    current_position ← current_position - 0x8
27:  end while
28: end function

Once the Last End LSN value has been obtained in this way, a virtual header is created and the Last End LSN is set to recover the damaged header of the uncarved page.

In case of LFS 2.0, virtual headers of the uncarved logging pages are created in the same way as in the normal logging area to execute recovery, because the Last LSN value is used in the header of the buffer logging page. As such, the recovered buffer logging pages in each LFS version are added to the list of buffer logging pages.

2) IN CASE NO RESTART PAGE IS CARVED
If the restart page is not carved, the exact ranges of the buffer logging and normal logging areas are unknown because the size of the $LogFile cannot be determined. Accordingly, uncarved pages can be recovered only in the area between the first and the last of the carved logging pages. When the area in which the recovery operation is to be performed is designated in this way, the subsequent recovery process is the same as when the restart page is carved, as described above.

F. STEP 4: SORTING LOGGING PAGES
Once the uncarved logging pages have been recovered and information on the buffer and normal logging pages has been collected, the logging pages should be sorted in order for the following reasons: In some cases, a $LogFile record is located between logging pages, or may use multiple logging pages owing to its large size. The logging pages should be analyzed in order in this case to acquire the data of such $LogFile records. In addition, when a storage device formatted with NTFS is unmounted and remounted, the structure of circulation of the normal logging area of $LogFile may be broken and the order of logging pages may become mixed. The logging pages should be analyzed in order in this case to obtain as much data as possible (see Fig. 11).

FIGURE 11. Normal circulation vs. corrupted circulation.

The method for sorting logging pages in order is as follows: While sequentially searching for logging pages in the normal logging area, pages whose Last LSN values increase or remain the same are grouped together. Then, each page group is sorted in order based on the largest Last LSN value in the group. Finally, the buffer logging pages are sorted according to the version of LFS. The largest Last LSN value of the normal logging area is used as a reference value for sorting (see Fig. 12).

In case of LFS 1.1, the Last End LSN value is used because the Last LSN values of the two buffer logging pages cannot be used. If the Last End LSN values of both buffer logging pages are less than the reference value, the data of both buffer logging pages are already reflected in the normal logging area; therefore, the two buffer pages are not sorted. If the Last End LSN value of one of the two buffer logging pages is greater than the reference value, only the corresponding buffer logging page is sorted. Finally, if the Last End LSN values of both buffer logging pages are greater than the reference value, only the buffer logging page with the larger Last End LSN value is sorted.

In case of LFS 2.0, the buffer logging pages with a Last LSN value smaller than the reference value are excluded. The remaining pages are sorted based on their Last LSN values and grouped together. Finally, the group of buffer logging pages is sorted.

FIGURE 13. Carving $LogFile records.
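The grouping-and-sorting step for the normal logging area can be sketched as follows (illustrative Python; pages is assumed to be the physically ordered list of (page_address, last_lsn) pairs):

def sort_normal_logging_pages(pages):
    # Split the physically ordered pages into runs whose Last LSN values
    # remain the same or increase; a drop in Last LSN starts a new run.
    groups, current = [], []
    for page in pages:
        if current and page[1] < current[-1][1]:
            groups.append(current)
            current = []
        current.append(page)
    if current:
        groups.append(current)
    # Order the runs by their largest Last LSN; the overall maximum becomes
    # the reference value used to sort the buffer logging pages.
    groups.sort(key=lambda group: max(lsn for _, lsn in group))
    ordered = [page for group in groups for page in group]
    reference_lsn = max(lsn for _, lsn in ordered) if ordered else 0
    return ordered, reference_lsn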
The $LogFile data recovery module receives a disk or volume image file as input, and each of its sub-modules operates to finally output the $LogFile record data (see Fig. 15).

FIGURE 15. $LogFile data recovery module.

The developed module was added to the NTFS Log Tracker v1.8 [17] and the NTFS Data Tracker v1.1 [19]. To use the functions of the module in each tool, the path of the disk/volume image file needs to be input as the path of the source file for carving, as shown in Fig. 16.

FIGURE 16. Input interface of disk/volume image file in each tool.

Each tool uses the record data received from the $LogFile data recovery module for file event analysis and to analyze the file data history.

The NTFS Log Tracker and NTFS Data Tracker with the record carving-based $LogFile data recovery module can be downloaded free of charge.1

1 URL: https://sites.google.com/site/forensicnote/ntfs-log-tracker, https://sites.google.com/site/forensicnote/ntfs-data-tracker

B. EVALUATION
We compared the performance in recovering $LogFile data between the NTFS Log Tracker and NTFS Data Tracker, which use the $LogFile data recovery module developed in this paper, and existing $LogFile recovery tools, namely Bulk Extractor v2.0 and X-Ways Forensics v19.9 SR-4.

1) DATA SET
The test image files used for performance evaluation were created by storing the data of a $LogFile (64 MB), to which a damage scenario had been applied, in the middle of an image file (1 GB), aligned to the sector layout (512 bytes).

The scenarios of damage to the $LogFile data considered were as follows: The first was the case where the $LogFile data had not been corrupted at all; we tested whether the $LogFile data could be recovered in case the $LogFile could not be accessed normally due to damage to the file system. The test image file (test01.dd) created in this scenario was used as a baseline dataset for performance evaluation, because all $LogFile data within the file could be recovered by both the existing recovery methods and the recovery method proposed in this paper. It could therefore be confirmed whether the existing and proposed recovery methods recovered the $LogFile data properly as damage scenarios were added on the basis of this image file. The second was the case where the header of the restart page had been damaged; it was divided into the case where the header of the first restart page had been damaged for each LFS version, the case where the header of the second restart page had been damaged for each LFS version, and the case where the headers of all restart pages had been damaged. These scenarios were run to determine whether the different structures of the $LogFile for each version of the LFS could be identified and the $LogFile data could be recovered. The third case considered was one where the header of the buffer logging page had been damaged. In this case, 50% of the page headers in the buffer logging area for each version of the LFS were damaged. These scenarios were run to determine whether buffer logging pages with damaged headers could be recovered according to the characteristics of the different buffer logging pages for each LFS version, and whether the $LogFile data could be recovered. The fourth scenario was one where the header of a normal logging page had been damaged. In this case, because there was no difference between the LFS versions, 50% of the page headers in the normal logging area were damaged only for LFS 2.0. This scenario was run to determine whether the $LogFile data could be restored by recovering the pages with damaged headers in the normal logging area. The fifth was the case where the data of the normal logging pages had been damaged. In this case, there was no difference between the LFS versions, and thus only LFS 2.0 was considered. One record within the data of the pages corresponding to 50% of the normal logging area was selected and damaged. This scenario was run to determine whether the $LogFile data could be recovered by carving records in the remaining area, except for the damaged area. The last scenario was one where the header and data of pages in the normal logging area had been simultaneously damaged; the fourth and fifth scenarios were applied at the same time. This scenario was run to determine whether the logging pages with damaged headers could be recovered and whether records in the areas other than the damaged areas could be carved to recover the $LogFile data.

Twelve image files were used for the evaluation, and the LFS version, damage-related scenario, and details of each image are shown in Table 4.
In each scenario, the header was corrupted by zeroizing the initial 32 bytes of data, including the magic number, and a record was corrupted by zeroizing the header and data of the record. In case of the test11.dd and test12.dd image files, one record that does not go over the page was selected in each of the 50% of pages in the normal logging area and zeroized. However, depending on the page, there were cases where there was no record header, and only record data, in the page. The actual number of zero-filled records was thus 4,359 in the test11.dd image file and 4,305 in the test12.dd image file.
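For reference, a header-damage scenario of this kind can be reproduced with a short sketch (ours; the path and offset are placeholder values):

def zeroize_page_header(image_path, page_offset, length=32):
    # Overwrites the first 32 bytes of a page, including the ''RSTR'' or
    # ''RCRD'' magic number, as in the damage scenarios described above.
    with open(image_path, "r+b") as image:
        image.seek(page_offset)
        image.write(b"\x00" * length)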
2) RESULT
The performance of each tool was assessed differently because the results of recovery of each tool were different. In case of X-Ways Forensics v19.9 SR-4, which performs file carving, the hash values of the file carved from the test image file were compared with those of the $LogFile used in the damage scenarios to check whether the file had been carved normally. In case of Bulk Extractor v2.0, which performs page carving, the number of pages carved from the test image file was compared with the number of pages in the original $LogFile to check whether all pages had been carved normally. Finally, in case of the NTFS Log Tracker and NTFS Data Tracker, the number of records that the tools analyzed in the test image file was compared with the number of records remaining in the $LogFile, to which the damage scenario had been applied, to check whether all records to be analyzed had in fact been carved.

Table 5 shows the results of the performance evaluation of each tool on the test image files. In the results, ''O'' means that all $LogFile data that could be obtained from the test image were successfully acquired, and ''△'' means that not all data were acquired but those from the logging area were. Finally, ''X'' means that the data of the logging area were not properly obtained owing to structural damage.

X-Ways Forensics v19.9 SR-4 successfully performed file carving on the test01.dd image file, which had sustained no damage, but if the header of the restart page or a logging page had been damaged, it was unable to carve the file normally. However, it successfully performed file carving on the test11.dd image file, in which only records were damaged.

Bulk Extractor v2.0 was able to carve all pages of the test01.dd image file, which had sustained no damage. In case the header of the restart page had been damaged (test02.dd~test07.dd), the carving of all logging pages except for the restart page was successful. In addition, if only records had been damaged, all logging pages were carved well. However, if the header of a logging page had been damaged, the corresponding page was not carved properly.
Finally, we confirmed the importance of recovering $LogFile data in a situation where the file system was damaged due to an accident/disaster.

The results of this study should be useful for digital forensic investigators in recovering $LogFile data at the time of system shutdown in the case of an accident/disaster or cyber terrorism, so that they can analyze the causes of the event, reconstruct it, and devise methods to prevent it in the future. Journal data, a type of file system metadata, contain important information that can be used to accurately identify events that occurred just before the system was stopped. In particular, the recovery of journal data is a very important task in situations where the file system is damaged by accidents/disasters or cyber terrorism. In future studies, we plan to research the forensic recovery of journal data in Ext4, which is among the most used file systems along with NTFS.

APPENDIX
See Table 6.
REFERENCES
[1] B. Carrier, File System Forensic Analysis. Boston, MA, USA: Addison-Wesley Professional, 2005.
[2] G.-S. Cho, ''NTFS directory index analysis for computer forensics,'' in Proc. 9th Int. Conf. Innov. Mobile Internet Services Ubiquitous Comput., Santa Catarina, Brazil, Jul. 2015, pp. 441–446.
[3] K. H. Hansen and F. Toolan, ''Decoding the APFS file system,'' Digit. Invest., vol. 22, pp. 107–132, Sep. 2017.
[4] K. D. Fairbanks, ''An analysis of Ext4 for digital forensics,'' Digit. Invest., vol. 9, pp. 118–130, Aug. 2012.
[5] X. Lin, ''Deleted file recovery in NTFS,'' in Introductory Computer Forensics. Cham, Switzerland: Springer, Nov. 2018, pp. 199–210.
[6] J. Plum and A. Dewald, ''Forensic APFS file recovery,'' in Proc. 13th Int. Conf. Availability, Rel. Secur., Hamburg, Germany, 2018, pp. 1–10.
[7] D. Kim, J. Park, K. G. Lee, and S. Lee, ''Forensic analysis of Android phone using Ext4 file system journal log,'' in Future Information Technology, Application, and Service (Lecture Notes in Electrical Engineering), vol. 1. Berlin, Germany: Springer, Jun. 2012, pp. 435–446.
[8] Sinking of MV Sewol. Accessed: May 26, 2022. [Online]. Available: https://en.wikipedia.org/wiki/Sinking_of_MV_Sewol
[9] 2013 South Korea Cyberattack. Accessed: May 26, 2022. [Online]. Available: https://en.wikipedia.org/wiki/2013_South_Korea_cyberattack
[10] J. Oh. (2013). Advanced $UsnJrnl Forensics. Accessed: May 26, 2022. [Online]. Available: http://forensicinsight.org/wp-content/uploads/2013/07/F-INSIGHT-Advanced-UsnJrnl-Forensics-English.pdf
[11] M. Fuchs. (2018). MFTEntryCarver. Accessed: May 26, 2022. [Online]. Available: https://github.com/cyb3rfox/MFTEntryCarver
[12] H. Segev. (2021). INDXRipper. Accessed: May 26, 2022. [Online]. Available: https://github.com/harelsegev/INDXRipper
[13] LSoft Technologies. APFS Recovery Methodologies. Accessed: May 26, 2022. [Online]. Available: https://www.ntfs.com/apfs-recovery.htm
[14] A. Dewald and S. Seufert, ''AFEIC: Advanced forensic Ext4 inode carving,'' Digit. Invest., vol. 20, pp. 83–91, Mar. 2017.
[15] X-Ways Forensics. Accessed: May 26, 2022. [Online]. Available: http://www.x-ways.net/winhex/manual.pdf
[16] S. Garfinkel. (2012). BulkExtractor. Accessed: May 26, 2022. [Online]. Available: https://github.com/simsong/bulk_extractor
[17] J. Oh. (2013). NTFS Log Tracker. Accessed: May 26, 2022. [Online]. Available: http://forensicinsight.org/wp-content/uploads/2013/06/F-INSIGHT-NTFS-Log-Tracker-English.pdf
[18] D. Cowen. (2013). NTFS Triforce: A Deeper Look Inside the Artifacts. Accessed: May 26, 2022. [Online]. Available: https://www.hecfblog.com/2013/01/ntfs-triforce-deeper-look-inside.html
[19] J. Oh, S. Lee, and H. Hwang, ''NTFS data tracker: Tracking file data history based on $LogFile,'' Digit. Invest., vol. 39, Dec. 2021, Art. no. 301309.
[20] R. Nordvik, K. Porter, F. Toolan, S. Axelsson, and K. Franke, ''Generic metadata time carving,'' Forensic Sci. Int., Digit. Invest., vol. 33, Jul. 2020, Art. no. 301005.
[21] K. Porter, R. Nordvik, F. Toolan, and S. Axelsson, ''Timestamp prefix carving for filesystem metadata extraction,'' Forensic Sci. Int., Digit. Invest., vol. 38, Sep. 2021, Art. no. 301266.
[22] M. Russinovich, D. Solomon, and A. Ionescu, Windows Internals, Part 2, 6th ed. Unterschleissheim, Germany: Microsoft Press, 2012.
[23] NTFS Documentation: File-$LogFile (2). Accessed: May 26, 2022. [Online]. Available: https://flatcap.github.io/linux-ntfs/ntfs/files/logfile.html
[24] Cyberterrorism. Accessed: May 26, 2022. [Online]. Available: https://en.wikipedia.org/wiki/Cyberterrorism
[25] M. D. Martin. Tracing the Lineage of DarkSeoul. Accessed: May 26, 2022. [Online]. Available: https://www.giac.org/paper/gsec/31524/tracing-lineage-darkseoul/126346
[26] J. Kim. 3.20 Cyber-Terror Form Recovery Perspectives. Accessed: May 26, 2022. [Online]. Available: http://forensic-proof.com/archives/5111
[27] Msuhanov. How the $LogFile Works? Accessed: May 26, 2022. [Online]. Available: https://dfir.ru/2019/02/16/how-the-logfile-works

JUNGHOON OH received the B.S. degree from the Division of Computer, Information Communication Engineering, Dongguk University, in 2010. He is currently pursuing the Ph.D. degree with the Graduate School of Information Security, Korea University. His research interests include digital forensics, filesystem forensics, incident response, and artificial intelligence.

SANGJIN LEE received the Ph.D. degree from the Department of Mathematics, Korea University, in 1994. From 1989 to 1999, he was a Senior Researcher at the Electronics and Telecommunications Research Institute, South Korea. He has been running the Digital Forensic Research Center, Korea University, since 2008, where he is currently the President of the Division of Information Security. He has authored or coauthored over 130 papers in various archival journals and conference proceedings and over 200 articles in domestic journals. His research interests include digital forensics, data processing, forensic framework, and incident response.

HYUNUK HWANG (Member, IEEE) received the Ph.D. degree from the Department of Information Security, Chonnam National University, in 2004. He is currently the Head of Department of the Attached Institute of ETRI. His research interests include digital forensics, vulnerability verification, malware, and artificial intelligence.