Chapter 5 File Management
File System
7.1. Introduction
In computing, a file system (or filesystem) is a type of data store which can be used to store,
retrieve and update a set of files. The term could refer to the abstract data structures used to
define files, or to the actual software or firmware components that implement the abstract ideas.
The file system manages access to both the content of files and the metadata about those files. It
is responsible for arranging storage space; reliability, efficiency, and tuning with regard to the
physical storage medium are important design considerations.
The file system is responsible for organizing files and directories, and keeping track of which
areas of the media belong to which file and which are not being used.
File system fragmentation occurs when unused space or single files are not contiguous. As a
file system is used, files are created, modified and deleted. When a file is created the file system
allocates space for the data. Some file systems permit or require specifying an initial space
allocation and subsequent incremental allocations as the file grows. As files are deleted, the space
they were allocated is eventually considered available for use by other files. This creates
alternating used and unused areas of various sizes, which is known as free space fragmentation. When a file
is created and there is no area of contiguous space large enough to hold it, the file must be stored in fragments.
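The effect described above can be sketched with a toy model. This is a minimal illustration, assuming a disk represented as a Python list of 12 blocks; the names `allocate`, `delete` and `free_extents` are hypothetical and do not come from any real file system.

```python
def allocate(disk, name, size):
    """First-fit: place `name` in the first run of `size` free blocks."""
    run = 0
    for i, owner in enumerate(disk):
        run = run + 1 if owner is None else 0
        if run == size:
            start = i - size + 1
            disk[start:i + 1] = [name] * size
            return start
    return None  # no contiguous run is large enough


def delete(disk, name):
    """Mark every block owned by `name` as free."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def free_extents(disk):
    """Return (start, length) for each run of free blocks."""
    extents, start = [], None
    for i, owner in enumerate(disk + ["end"]):  # sentinel closes a trailing run
        if owner is None and start is None:
            start = i
        elif owner is not None and start is not None:
            extents.append((start, i - start))
            start = None
    return extents


disk = [None] * 12
for name, size in [("a", 3), ("b", 2), ("c", 3), ("d", 2)]:
    allocate(disk, name, size)
delete(disk, "b")
delete(disk, "d")
print(free_extents(disk))      # free space is now split into separate runs
print(allocate(disk, "e", 5))  # None: 6 blocks are free, but not contiguously
```

In this sketch a 5-block file cannot be placed even though 6 blocks are free, which is the situation in which a real file system would store the file in fragments.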
7.1.1. File
A file is a named collection of related information that is recorded on secondary storage such as
magnetic disks, magnetic tapes and optical disks. In general, a file is a sequence of bits, bytes,
lines or records whose meaning is defined by the file's creator and user.
7.1.2. File Structure
A file structure should follow a required format that the operating system can
understand.
File type refers to the ability of the operating system to distinguish different types of files, such as
text files, source files, executable files and object files. Many operating systems support many
types of files. Operating systems such as MS-DOS and UNIX have the following types of files:
1. Ordinary files
These are the files that contain user information.
These may contain text, databases or executable programs.
The user can apply various operations to such files, such as adding, modifying or deleting
content, or even removing the entire file.
2. Directory files
These files contain a list of file names and other information related to those files.
3. Special files
These files are also known as device files. They represent physical devices such as disks,
terminals, printers, networks and tape drives. The special files are of two types:
Character special files − data is handled character by character, as in the case of terminals and printers.
Block special files − data is handled in blocks, as in the case of disks and tapes.
The file extension indicates what kind of file it is: its “format” or “type.” For instance, the file
extension .exe refers to an "executable" file--in other words, an application. The file
extension .html indicates a Hypertext Markup Language file--in other words, a web page. In My
Computer or Windows Explorer, double-clicking a file opens it in the associated application,
provided the file extension matches the file's format. Some common file extensions:
.doc – Microsoft Word document
.wpd – WordPerfect Document
.txt – Plain text document
.htm, .html – A plain text document with added code that enables it to be read on the
World Wide Web
.jpg – An image file
.gif – An image file
.exe – An executable file, meaning an application/program/piece of software
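As a small illustration of how software can use the extension to guess a file's type, the sketch below looks an extension up in a table. The table `KNOWN_TYPES` and the function `describe` are hypothetical names built only from the list above; real operating systems keep much larger associations.

```python
import os.path

# Illustrative table built from the extensions listed above.
KNOWN_TYPES = {
    ".doc": "Microsoft Word document",
    ".wpd": "WordPerfect document",
    ".txt": "Plain text document",
    ".htm": "Web page",
    ".html": "Web page",
    ".jpg": "Image file",
    ".gif": "Image file",
    ".exe": "Executable file",
}


def describe(filename):
    """Return a description based only on the text after the last dot."""
    _, ext = os.path.splitext(filename)
    return KNOWN_TYPES.get(ext.lower(), "unknown type")


print(describe("report.DOC"))
print(describe("README"))
```

Note that the extension is only a naming convention: renaming a file does not change its contents.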
Filename: The name of a file, with or without its file extension.
File Size: The size of a file measured in bytes. A floppy disk holds about 1.44 MB; a Zip disk
holds 100 MB or 250 MB; a CD holds about 700 MB; a DVD holds about 4,700 MB.
7.2. File Systems
Provide a means to store data organized as files as well as a collection of functions that
can be performed on files
Maintain a set of attributes associated with the file
Typical operations include:
Create
Delete
Open
Close
Read
Write
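The typical operations listed above can be sketched with Python's built-in file API. The file name `notes.txt` is hypothetical and the file is placed in a temporary directory so the example cleans up after itself.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name

f = open(path, "w")        # Create and Open (mode "w" creates the file)
f.write("first line\n")    # Write
f.close()                  # Close

with open(path) as f:      # Open for reading; closed automatically on exit
    content = f.read()     # Read

os.remove(path)            # Delete
existed_after = os.path.exists(path)
```

Each call is passed on by the language runtime to the operating system, which carries out the operation through the file system.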
7.2.1. File Attributes
Information about files is kept in the directory structure, which is maintained on the disk.
Four terms are commonly used when discussing files: field, record, file, and database.
Files can be structured as a collection of records or as a sequence of bytes. UNIX, Linux,
Windows and Mac OS treat files as a sequence of bytes.
Field
basic element of data
contains a single value
fixed or variable length
File
collection of similar records
treated as a single entity
may be referenced by name
access control restrictions usually apply at the file level
Database
collection of related data
relationships among elements of data are explicit
designed for use by a number of different applications
consists of one or more types of files
Record
Collection of related fields that can be treated as a unit by some application program
One field is the key – a unique identifier
A sequential access is that in which the records are accessed in some sequence, i.e., the
information in the file is processed in order, one record after the other. This access method is the
most primitive one. Example: Compilers usually access files in this fashion.
With direct (random) access, the records need not be in any sequence within the file, and they
need not be in adjacent locations on the storage medium.
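The difference between the two access methods can be sketched with fixed-length records. This is a minimal example under assumptions: the record layout (a 4-byte integer key followed by a 12-byte name field), the file name `people.dat` and both function names are hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical record layout: a 4-byte integer key and a 12-byte name field.
RECORD = struct.Struct("<i12s")

path = os.path.join(tempfile.mkdtemp(), "people.dat")
with open(path, "wb") as f:
    for key, name in [(101, b"Ana"), (102, b"Ben"), (103, b"Kim")]:
        f.write(RECORD.pack(key, name))  # struct pads the name with NUL bytes


def read_sequential(path):
    """Process every record in order, one after the other."""
    records = []
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            key, name = RECORD.unpack(chunk)
            records.append((key, name.rstrip(b"\0").decode()))
    return records


def read_direct(path, index):
    """Jump straight to record number `index` without reading earlier ones."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        key, name = RECORD.unpack(f.read(RECORD.size))
    return key, name.rstrip(b"\0").decode()


print(read_sequential(path))
print(read_direct(path, 2))
```

Because every record has the same size, the position of record `index` is simply `index * RECORD.size`, which is what makes the direct jump possible.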
Files are allocated disk space by the operating system. Operating systems use the following three
main ways to allocate disk space to files:
Contiguous Allocation
Linked Allocation
Indexed Allocation
A disk file system takes advantage of the ability of disk storage media to randomly address data
in a short amount of time. Additional considerations include the speed of accessing data
following that initially requested and the anticipation that the following data may also be
requested. This permits multiple users (or processes) access to various data on the disk without
regard to the sequential location of the data. Some disk file systems are journaling file
systems or versioning file systems.
A network file system is a file system that acts as a client for a remote file access protocol,
providing access to files on a server. Examples of network file systems include clients for
the NFS, AFS and SMB protocols, and file-system-like clients for FTP and WebDAV.
A special file system presents non-file elements of an operating system as files so they can be
acted on using file system APIs. This is most commonly done in Unix-like operating systems,
but devices are given file names in some non-Unix-like operating systems as well.
Naming - collisions
Grouping capability – not possible except by user
3. Three Level Directory
Advantages
Efficient searching – current working directory
Naming - same file name in a different directory
Grouping capability
Disadvantages
Structural complexity
7.6.1. Operations Performed on Directory
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
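The directory operations listed above can be sketched with Python's standard `pathlib` and `os` modules. All file and directory names here are hypothetical, and the example works inside a temporary scratch directory.

```python
import os
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())                 # scratch directory for the demo
(root / "docs").mkdir()
(root / "docs" / "draft.txt").write_text("hi")  # create a file
(root / "docs" / "draft.txt").rename(root / "docs" / "final.txt")  # rename a file
(root / "old.tmp").write_text("")
(root / "old.tmp").unlink()                     # delete a file

listing = sorted(p.name for p in root.iterdir())    # list a directory
matches = list(root.rglob("final.txt"))             # search for a file
visited = []
for dirpath, dirnames, filenames in os.walk(root):  # traverse the file system
    visited.extend(filenames)
```

Here `listing` holds only the top-level entries, while the search and the traversal descend into subdirectories.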
Organize a directory (logically) based on the following criteria:
Efficiency – locating a file quickly
Naming – convenient to users
Two users can have same name for different files
The same file can have several different names
Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …)
7.7. File-System techniques
7.7.1. File System mounting
A file system must be mounted before it is available to processes on the system
The OS is given the name of the device and the mount point (location within file
structure at which files attach).
OS verifies that the device contains a valid file system.
OS notes in its directory structure that a file system is mounted at the specified
mount point.
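The mounting steps above can be sketched as a toy mount table. This is only an illustration under assumptions: the device names, the `VALID_DEVICES` set standing in for the validity check, and the functions `mount` and `resolve` are all hypothetical, and a real operating system keeps this information inside the kernel.

```python
import posixpath

mount_table = {"/": "disk0"}       # the root file system is always present
VALID_DEVICES = {"disk0", "usb1"}  # stand-in for "contains a valid file system"


def mount(device, mount_point):
    """Record that `device` is attached at `mount_point`."""
    if device not in VALID_DEVICES:
        raise ValueError(f"{device} does not contain a valid file system")
    mount_table[posixpath.normpath(mount_point)] = device


def resolve(path):
    """Return (device, path relative to that device's root)."""
    path = posixpath.normpath(path)
    # The longest mount point that is a prefix of the path wins.
    for point in sorted(mount_table, key=len, reverse=True):
        if path == point or path.startswith(point.rstrip("/") + "/"):
            rel = path[len(point):].lstrip("/")
            return mount_table[point], "/" + rel
    raise FileNotFoundError(path)


mount("usb1", "/mnt/usb")
print(resolve("/mnt/usb/photos/a.jpg"))
print(resolve("/etc/hosts"))
```

After the mount, paths under `/mnt/usb` are served by the second device, while every other path still resolves to the root device.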
7.7.2. Allocation of Disk Space
Low level access methods depend upon the disk allocation scheme used to store file data
1. Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk.
Simple - only starting location (block #) and length
(number of blocks) are required.
Suits sequential or direct access.
Fast (very little head movement) and easy to recover in the
event of system crash.
Problem: wasteful of space, since free gaps between files may be too small to reuse (external fragmentation).
2. Linked List Allocation
Each file is a linked list of disk blocks
Blocks may be scattered anywhere on the disk.
Each node in list can be a fixed size physical block or a contiguous
collection of blocks.
3. Block Allocation
Simple - need only starting address.
Free-space management system - space efficient.
Can grow in middle and at ends. No estimation of size necessary.
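The difference between contiguous and linked allocation can be sketched as follows. This is a simplified model, not the layout of any particular file system; the names `contiguous_block` and `LinkedDisk` are hypothetical.

```python
def contiguous_block(start, length, i):
    """Contiguous allocation: block i of a file lies at start + i."""
    if not 0 <= i < length:
        raise IndexError("block number outside the file")
    return start + i


class LinkedDisk:
    """Toy disk where each block stores the number of the file's next block."""

    def __init__(self, n_blocks):
        self.next = [None] * n_blocks      # per-block link to the next block
        self.free = list(range(n_blocks))  # free-block list
        self.start = {}                    # directory: file name -> first block

    def append_block(self, name):
        """Grow a file by one block taken from anywhere on the disk."""
        block = self.free.pop(0)
        self.next[block] = -1              # -1 marks the end of the file
        if name not in self.start:
            self.start[name] = block
        else:
            last = self.blocks(name)[-1]
            self.next[last] = block
        return block

    def blocks(self, name):
        """Follow the chain from the starting block to the end of the file."""
        chain, b = [], self.start[name]
        while b != -1:
            chain.append(b)
            b = self.next[b]
        return chain


disk = LinkedDisk(8)
disk.append_block("a")
disk.append_block("a")
disk.append_block("b")
disk.append_block("a")  # "a" grows although the block after it is taken
print(disk.blocks("a"))
print(contiguous_block(start=10, length=4, i=2))
```

With contiguous allocation any block is found by simple arithmetic, whereas the linked file needs only its starting block in the directory but must follow the chain to reach later blocks.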
7.7.3. Disk Defragmentation
Reorganize the blocks on the disk so that each file is (mostly) contiguous
Link or FAT organization preserved
Purpose:
To reduce disk arm movement during sequential accesses
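A minimal sketch of the idea, assuming a disk modelled as a list in which each entry names the file that owns the block (or `None` if free), and assuming a file's blocks are read in the order they appear on the disk. The functions `defragment` and `head_jumps` are hypothetical.

```python
def defragment(disk):
    """Return a layout with each file's blocks contiguous and free space last."""
    order = []  # files in order of first appearance on the disk
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)
    packed = [owner for name in order for owner in disk if owner == name]
    return packed + [None] * (len(disk) - len(packed))


def head_jumps(disk, name):
    """Count non-adjacent steps when reading the file's blocks in order."""
    positions = [i for i, owner in enumerate(disk) if owner == name]
    return sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)


before = ["a", "b", "a", None, "b", "a", None, "b"]
after = defragment(before)
print(head_jumps(before, "a"), head_jumps(after, "a"))
```

The jump count is a rough stand-in for disk arm movement: a fragmented file forces the head to skip over other blocks, while a contiguous file is read in one sweep.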
7.8. File Management Systems
A file management system is the set of system software that provides services to users and
applications in the use of files. The objectives of a file management system are:
To meet the data management needs and requirements of the user
To guarantee, to the extent possible, that the data in the file are valid.
To optimize performance
To provide I/O support for a variety of storage device types.
To minimize or eliminate the potential for lost or destroyed data.
To provide a standardized set of I/O interface routines to user processes.
Examples of file systems include ISO 9660 and those of CP/M, MS-DOS, Windows 98, and UNIX. These
differ in many ways, including how they keep track of which blocks go with which file, directory
structure, and management of free disk space.
File Protection
File owner/creator should be able to control
what can be done
by whom
Types of access
read
write
execute
append
delete
list
Access Lists and Groups
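An access list with groups can be sketched as a lookup from a class of user to the set of permitted access types. Everything here is hypothetical (the users, the group `staff`, and the rule that the first matching class decides); it illustrates the idea rather than any specific operating system's scheme.

```python
FILE_OWNER = "alice"   # hypothetical owner of the file
FILE_GROUP = "staff"   # hypothetical group attached to the file
GROUPS = {"bob": {"staff"}, "eve": set()}  # illustrative group membership

# Access list: which types of access each class of user is granted.
ACL = {
    "owner": {"read", "write", "execute", "append", "delete", "list"},
    "group": {"read", "list"},
    "other": {"read"},
}


def allowed(user, operation):
    """Decide by the first class the user falls into: owner, group, other."""
    if user == FILE_OWNER:
        return operation in ACL["owner"]
    if FILE_GROUP in GROUPS.get(user, set()):
        return operation in ACL["group"]
    return operation in ACL["other"]


print(allowed("alice", "delete"), allowed("bob", "write"), allowed("eve", "read"))
```

Grouping users keeps the list short: the owner decides once what a whole group may do instead of listing every user separately.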
N.B:- The Boot Process
A memory-mapped file contains the contents of a file in virtual memory. This mapping between
a file and memory space enables an application, including multiple processes, to modify the file
by reading and writing directly to the memory.
There are two types of memory-mapped files:
1. Persisted memory-mapped files: Persisted files are memory-mapped files that are associated
with a source file on a disk. When the last process has finished working with the file, the data is
saved to the source file on the disk. These memory-mapped files are suitable for working with
extremely large source files.
2. Non-persisted memory-mapped files: Non-persisted files are memory-mapped files that are
not associated with a file on a disk. When the last process has finished working with the file, the
data is lost and the file is reclaimed by garbage collection. These files are suitable for creating
shared memory for inter-process communications (IPC).
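Both kinds of mapping can be sketched with Python's standard `mmap` module. The file name `data.bin` is hypothetical. Note one difference from the description above: in this sketch the anonymous mapping is released when it is closed, not by garbage collection.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"hello world")

# Persisted mapping: backed by a source file on disk.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
        mm[0:5] = b"HELLO"                # modify the file through memory
        mm.flush()                        # push the change to the file

with open(path, "rb") as f:
    persisted = f.read()

# Non-persisted mapping: anonymous memory with no file behind it.
with mmap.mmap(-1, 16) as anon:
    anon[0:4] = b"temp"
    scratch = bytes(anon[0:4])
# After the block ends, the anonymous data is gone.
```

The change made through the first mapping survives in the file, while the contents of the second mapping exist only while it is open.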
The term backup has become synonymous with data protection over the past several decades and
may be accomplished via several methods. Backup software applications reduce the complexity
of performing backup and recovery operations. Backing up data is only one part of a disaster
protection plan, and may not provide the level of data and disaster recovery capabilities desired
without careful design and testing.
Backup applications have long offered several types of backup operations. The most common
backup types are a full backup, incremental backup and differential backup. Other backup types
include synthetic full backups and mirroring.
In the debate over cloud vs. local backup, there are some types of backup that are better in
certain locations. If you're performing cloud backup, incremental backups are generally a better
fit because they consume fewer resources. You might start out with a full backup in the cloud
and then shift to incremental backups. Mirror backup, though, is typically more of an on-
premises approach and often involves disks.
The most basic and complete type of backup operation is a full backup. As the name implies, this
type of backup makes a copy of all data to another set of media, such as a disk or tape. The
primary advantage to performing a full backup during every operation is that a complete copy of
all data is available with a single set of media. This results in a minimal time to restore data, a
metric known as a recovery time objective. However, the disadvantages are that it takes longer to
perform a full backup than other types (sometimes by a factor of 10 or more), and it requires
more storage space.
Thus, full backups are typically run only periodically. Data centers that have a small amount of
data (or critical applications) may choose to run a full backup daily, or even more often in some
cases. Typically, backup operations employ a full backup in combination with either incremental
or differential backups.
An incremental backup operation will result in copying only the data that has changed since the
last backup operation of any type. An organization typically uses the modified time stamp on
files and compares it to the time stamp of the last backup. Backup applications track and record
the date and time that backup operations occur in order to track files modified since these
operations.
Because an incremental backup will only copy data since the last backup of any type, an
organization may run it as often as desired, with only the most recent changes stored. The benefit
of an incremental backup is that it copies a smaller amount of data than a full. Thus, these
operations will complete faster, and require less media to store the backup.
A differential backup operation is similar to an incremental the first time it is performed, in that
it will copy all data changed from the previous backup. However, each time it is run afterwards,
it will continue to copy all data changed since the previous full backup. Thus, it will store more
data than an incremental on subsequent operations, although typically far less than a full backup.
Moreover, differential backups require more space and time to complete than incremental
backups, although less than full backups. The different types of backup compare as follows.
As shown in "A comparison of different types of backup," above, each process works differently.
An organization must run a full backup at least once. Afterwards, it is possible to run either
another full, an incremental or a differential backup. The first partial backup performed, either
a differential or an incremental, will back up the same data. By the third backup operation, the data
that is backed up with an incremental is limited to the changes since the last incremental. In
comparison, the third backup with a differential will back up all changes since the first full
backup, which was "Backup 1."
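The selection rule for each backup type can be sketched by comparing modification time stamps, as described above. The function `files_to_copy`, the file names and the integer time stamps are hypothetical; real backup applications track more state than this.

```python
def files_to_copy(kind, mtimes, last_full, last_backup):
    """Pick files for a backup run by comparing modification time stamps.

    mtimes      -- {file name: time of last modification}
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any type
    """
    if kind == "full":
        return sorted(mtimes)
    cutoff = last_full if kind == "differential" else last_backup
    return sorted(name for name, t in mtimes.items() if t > cutoff)


# Backup 1 is a full backup at time 100; backup 2 runs at 110; backup 3 at 120.
at_backup_2 = {"a.txt": 105, "b.txt": 40, "c.txt": 50}
at_backup_3 = {"a.txt": 105, "b.txt": 115, "c.txt": 50}

second_incr = files_to_copy("incremental", at_backup_2, last_full=100, last_backup=100)
second_diff = files_to_copy("differential", at_backup_2, last_full=100, last_backup=100)
third_incr = files_to_copy("incremental", at_backup_3, last_full=100, last_backup=110)
third_diff = files_to_copy("differential", at_backup_3, last_full=100, last_backup=110)
```

In this sketch the second backup copies the same file under either scheme, while at the third backup the incremental copies only the newest change and the differential copies everything changed since the full backup.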
From these three primary types of backup, it is possible to develop an approach for
comprehensive data protection. An organization often uses one of the following approaches:
Full daily
Full weekly + differential daily
Full weekly + incremental daily
Many considerations will affect the choice of the optimal backup strategy. Typically, each
alternative and strategy choice involves making tradeoffs between performance, data protection
levels, total amount of data retained and cost. In "A backup strategy's impact on space" below,
the media capacity requirements and media required for recovery are shown for three typical
backup strategies. These calculations presume 20 TB of total data, with 5% of the data changing
daily, and no increase in total storage during the period. The calculations are based on 22
working days in a month and a one-month retention period for data.
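The arithmetic behind such a comparison can be sketched as follows. The 20 TB, 5% and 22-day figures come from the text; the rest are assumptions of this sketch: a 5-day working week with a full backup on the first day of each week, and daily changes that never touch the same data twice. The results follow from these assumptions and are not taken from the table referred to in the text.

```python
TOTAL_TB = 20      # total data, from the text
CHANGE_PCT = 5     # percent of the data changing daily, from the text
WORKING_DAYS = 22  # backups retained for one month, from the text
DAYS_PER_WEEK = 5  # assumption: full backup on the first working day of each week


def media_needed(strategy):
    """Total TB stored over the month for 'full', 'incremental' or 'differential'."""
    daily_change = TOTAL_TB * CHANGE_PCT / 100
    total = 0.0
    for day in range(WORKING_DAYS):
        days_since_full = day % DAYS_PER_WEEK
        if strategy == "full" or days_since_full == 0:
            total += TOTAL_TB
        elif strategy == "incremental":
            total += daily_change
        else:  # differential: everything changed since the last full backup
            total += daily_change * days_since_full
    return total


for strategy in ("full", "differential", "incremental"):
    print(strategy, media_needed(strategy), "TB")
```

Under these assumptions the ordering is full daily, then weekly full plus daily differential, then weekly full plus daily incremental, from most to least capacity.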
As shown above, performing a full backup daily requires the most space, and will also
take the most time. However, more total copies of data are available, and fewer pieces
of media are required to perform a restore operation. As a result, implementing this backup
policy has a higher tolerance to disasters, and provides the least time to restore, since any piece
of data required will be located on at most one backup set.
As an alternative, performing a full backup weekly, coupled with running incremental backups
daily, will deliver the shortest backup time during weekdays and use the least amount of storage
space. However, there are fewer copies of data available and restore time is the longest, since an
organization may need to use six sets of media to recover the necessary information. If data is
needed from data backed up on Wednesday, the Sunday full backup, plus the Monday, Tuesday
and Wednesday incremental media sets, are required. This can dramatically increase recovery
times, and requires that each media set work properly; a failure in one backup set can impact the
entire restoration.
Running a weekly full backup plus daily differential backups delivers results in between the
other alternatives. Namely, more backup media sets are required to restore than with a daily full
policy, although fewer than with a daily incremental policy. Also, the restore time is less than
using daily incremental backups, and more than daily full backups. In order to restore data from
a particular day, at most two media sets are required, diminishing the time needed to recover and
the potential for problems with an unreadable backup set.
One of the main drawbacks of mirror backup, though, is the amount of storage space required. With that extra
storage, organizations should be wary of cost increases and maintenance needs. In addition, if
there's a problem in the source data set, such as a corruption or deletion, the mirror backup
experiences the same. As a result, it's a good idea not to rely on mirror backups for all your data
protection needs, and to have other types of backup for the data. You'll want to follow the 3-2-1
rule of backup, which includes three copies of data on two different media, with one copy off
site.
One specific kind of mirror, disk mirroring, is also known as RAID 1. This process replicates
data to two or more disks. Disk mirroring is a strong option for data that needs high availability
because of its quick recovery time. It's also helpful for disaster recovery because of its immediate
failover capability. Disk mirroring requires at least two physical drives. If one drive fails, an
organization can use the mirror copy. While disk mirroring offers comprehensive data protection,
it requires a lot of storage capacity.
The benefits and drawbacks of the backup types are described below:
Type of backup | Benefits | Drawbacks
Full | Provides full copy of datasets; offers arguably best protection | Time-consuming; requires lots of storage space
Most of the advanced types of backup such as synthetic full, mirror and continuous data
protection require disk storage as the backup target. A synthetic full simply reconstructs the full
backup image using all required incremental backups or the differential backup on disk. This
synthetic full may then be stored to tape for offsite storage, with the advantage being reduced
restoration time. Finally, continuous data protection enables a greater number of restoration
points than traditional backup options.
When deciding which type of backup strategy to use, the question is when to use each, and how
these options should be combined with testing to meet the overall business cost, performance and
availability goals.
The purpose of most backups is to create a copy of data so that a particular file or application
may be restored after data loss, corruption or deletion, or a disaster strikes. Thus, backup is not
the goal, but rather it is one means to accomplish the goal of protecting data. Testing backups is
just as important as backing up and restoring data. Again, the point of backing up data is to
enable restoration of data at a later point in time. Without periodic testing, it is impossible to
guarantee that the goal of protecting data is being met.