Chapter 5 File Management

Uploaded by zenebeshambel4
Copyright © All Rights Reserved
Chapter Five

File System
7.1. Introduction
In computing, a file system (or filesystem) is a type of data store that can be used to store,
retrieve and update a set of files. The term can refer either to the abstract data structures used to
define files or to the actual software or firmware components that implement those abstract ideas.

The file system manages access to both the content of files and the metadata about those files. It
is responsible for arranging storage space; reliability, efficiency, and tuning with regard to the
physical storage medium are important design considerations.

The file system is responsible for organizing files and directories, and keeping track of which
areas of the media belong to which file and which are not being used.

File system fragmentation occurs when unused space or single files are not contiguous. As a
file system is used, files are created, modified and deleted. When a file is created, the file system
allocates space for its data. Some file systems permit or require specifying an initial space
allocation and subsequent incremental allocations as the file grows. As files are deleted, the space
they were allocated is eventually considered available for use by other files. This creates
alternating used and unused areas of various sizes; this is free space fragmentation. When a file
is created and there is no contiguous free area large enough for its initial allocation, the space
must be assigned in fragments.
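The following toy simulation illustrates free space fragmentation. It is a minimal sketch, assuming a hypothetical 12-block disk modelled as a Python list and first-fit placement; the function names are illustrative, not any real file system's API. After two files are deleted, six blocks are free, yet a five-block file cannot be placed contiguously.

```python
# Toy disk model (hypothetical): one list entry per block, None = free block.

def allocate_contiguous(disk, name, size):
    """First fit: give `name` the first run of `size` free blocks.
    Returns True on success, False if no single free run is long enough."""
    run_start, run_len = 0, 0
    for i, owner in enumerate(disk):
        if owner is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == size:
                for j in range(run_start, run_start + size):
                    disk[j] = name
                return True
        else:
            run_len = 0
    return False


def delete(disk, name):
    """Free every block owned by `name`."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def free_runs(disk):
    """Lengths of the separate runs of free blocks (the free-space fragments)."""
    runs, length = [], 0
    for owner in disk:
        if owner is None:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return runs


disk = [None] * 12
for name in ("A", "B", "C", "D"):
    allocate_contiguous(disk, name, 3)     # A, B, C, D fill the disk in order
delete(disk, "A")
delete(disk, "C")
print(free_runs(disk))                    # [3, 3]: 6 free blocks in two pieces
print(allocate_contiguous(disk, "E", 5))  # False: no contiguous run of 5 blocks
```

A file system that allows fragments would instead split the new file across the two free runs, which is the situation the paragraph above describes.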

7.1.1. File

A file is a named collection of related information that is recorded on secondary storage such as
magnetic disks, magnetic tapes and optical disks. In general, a file is a sequence of bits, bytes,
lines or records whose meaning is defined by the file's creator and user.

 Desirable properties of files:


 Long-term existence :- files are stored on disk or other secondary storage and
do not disappear when a user logs off
 Sharable between processes :- files have names and can have associated
access permissions that permit controlled sharing
 Structure:- files can be organized into hierarchical or more complex structure
to reflect the relationships among files

7.1.2. File Structure

A file's structure should follow a format that the operating system can
understand.

 A file has a certain defined structure according to its type.


 A text file is a sequence of characters organized into lines.
 A source file is a sequence of procedures and functions.
 An object file is a sequence of bytes organized into blocks that are understandable by the
machine.
 When an operating system defines different file structures, it must also contain the code to
support these file structures. UNIX and MS-DOS support a minimal number of file structures.

7.1.3. File Type

File type refers to the ability of the operating system to distinguish different types of files, such as
text files, source files, executable files and object files. Many operating systems support many
types of files. Operating systems like MS-DOS and UNIX have the following types of files:

1. Ordinary files
 These are the files that contain user information.
 These may have text, databases or executable program.
 The user can apply various operations on such files like add, modify, delete or even
remove the entire file.

2. Directory files
 These files contain a list of file names and other information related to these files.

3. Special files

These files are also known as device files. These files represent physical devices such as disks,
terminals, printers, networks and tape drives. The special files are of two types:

 Character special files − data is handled character by character, as in the case of terminals or
printers.
 Block special files − data is handled in blocks, as in the case of disks and tapes.
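On UNIX-like systems the file type is recorded in the mode bits returned by stat(). The sketch below uses Python's standard os and stat modules to classify a path as an ordinary, directory or special file; the function name file_kind is hypothetical, and the /dev/null line assumes a UNIX-like system, where that path is a character special file.

```python
import os
import stat
import tempfile


def file_kind(path):
    """Classify a path the way UNIX does, using the mode bits from lstat()."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return "ordinary file"
    if stat.S_ISDIR(mode):
        return "directory file"
    if stat.S_ISCHR(mode):
        return "character special file"
    if stat.S_ISBLK(mode):
        return "block special file"
    return "other (link, pipe, socket, ...)"


with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "notes.txt")
    with open(f, "w") as fh:
        fh.write("hello\n")
    print(file_kind(f))   # ordinary file
    print(file_kind(d))   # directory file

if os.path.exists("/dev/null"):
    print(file_kind("/dev/null"))   # character special file on UNIX-like systems
```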

7.1.4. File extension

The file extension indicates what kind of file it is: its “format” or “type.” For instance, the file
extension .exe refers to an "executable" file--in other words, an application. The file
extension .html indicates a Hypertext Markup Language file--in other words, a web page. In My
Computer or Windows Explorer, double-clicking a file opens it with the application associated with
its file extension. Some common file extensions:
 .doc – Microsoft Word document
 .wpd – WordPerfect Document
 .txt – Plain text document
 .htm, .html – A plain text document with added code that enables it to be read on the
World Wide Web
 .jpg – An image file
 .gif – An image file
 .exe – An executable file, meaning an application/program/piece of software
Filename: The name of a file, with or without its file extension.
File Size: The size of a file, measured in bytes. A floppy disk holds about 1.44 MB; a Zip disk
holds 100 MB or 250 MB; a CD holds about 700 MB; a DVD holds about 4,700 MB.
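An extension is simply the part of the filename after the last dot, and the file's type is found by looking that extension up. A minimal sketch, assuming a hypothetical lookup table built from the list above; real systems such as Windows keep their own association tables, which this does not model.

```python
import os

# Hypothetical lookup table; the descriptions come from the list above.
KNOWN_EXTENSIONS = {
    ".doc": "Microsoft Word document",
    ".wpd": "WordPerfect document",
    ".txt": "Plain text document",
    ".htm": "Web page (HTML)",
    ".html": "Web page (HTML)",
    ".jpg": "Image file",
    ".gif": "Image file",
    ".exe": "Executable file",
}


def describe(filename):
    """Split a filename into base name and extension, then look up the type."""
    base, ext = os.path.splitext(filename)
    return base, ext, KNOWN_EXTENSIONS.get(ext.lower(), "Unknown type")


print(describe("report.DOC"))   # ('report', '.DOC', 'Microsoft Word document')
print(describe("index.html"))   # ('index', '.html', 'Web page (HTML)')
print(describe("README"))       # ('README', '', 'Unknown type')
```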
7.2. File Systems

 Provide a means to store data organized as files as well as a collection of functions that
can be performed on files
 Maintain a set of attributes associated with the file
 Typical operations include:
 Create
 Delete
 Open
 Close
 Read
 Write
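Each of these operations corresponds to a system call. As a sketch, assuming a POSIX-style system, Python's low-level os functions expose them almost one-to-one: os.open with O_CREAT creates and opens a file, os.write and os.read transfer bytes, os.close releases the descriptor and os.remove deletes the file.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")

# Create (O_CREAT) and open the file for writing.
fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
os.write(fd, b"hello, file system\n")  # Write
os.close(fd)                           # Close

fd = os.open(path, os.O_RDONLY)        # Open
data = os.read(fd, 100)                # Read (at most 100 bytes)
os.close(fd)                           # Close
print(data)                            # b'hello, file system\n'

os.remove(path)                        # Delete
os.rmdir(workdir)
print(os.path.exists(path))            # False
```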

7.2.1. File Attributes

 Name – only information kept in human-readable form


 Identifier – unique tag (number) identifies file within file system
 Type – needed for systems that support different types
 Location – pointer to file location on device
 Size – current file size
 Protection – controls who can do reading, writing, executing
 Time, date, and user identification – data for protection, security, and usage
monitoring
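Most of these attributes can be read back with a single stat() call. The sketch below collects them with Python's os.stat; it assumes a UNIX-like system (on Windows, st_ino and st_uid carry less meaning). Location is not shown, because the file system keeps block addresses internally and stat() does not expose them.

```python
import os
import stat
import tempfile
import time

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as fh:
    fh.write(b"abc")
    path = fh.name

info = os.stat(path)
attributes = {
    "name": os.path.basename(path),                 # Name
    "identifier": info.st_ino,                      # Identifier (inode number on UNIX)
    "type": "regular file" if stat.S_ISREG(info.st_mode) else "other",  # Type
    "size": info.st_size,                           # Size in bytes
    "protection": stat.filemode(info.st_mode),      # Protection, e.g. '-rw-------'
    "owner uid": info.st_uid,                       # User identification
    "modified": time.ctime(info.st_mtime),          # Time and date
}
for key, value in attributes.items():
    print(f"{key:>10}: {value}")
os.remove(path)
```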

Information about files is kept in the directory structure, which is maintained on the disk.
Four terms are commonly used when discussing files: field, record, file and database.
Files can be structured as a collection of records or as a sequence of bytes. UNIX, Linux,
Windows and Mac OS treat a file as a sequence of bytes.

Field
 basic element of data
 contains a single value
 fixed or variable length
Record
 Collection of related fields that can be treated as a unit by some application program
 One field is the key – a unique identifier
File
 collection of similar records
 treated as a single entity
 may be referenced by name
 access control restrictions usually apply at the file level
Database
 collection of related data
 relationships among elements of data are explicit
 designed for use by a number of different applications
 consists of one or more types of files
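To make the field, record and key terminology concrete, here is a sketch of a fixed-length record packed with Python's struct module. The record layout (a 4-byte id used as the key, a 20-byte name and a 4-byte gpa) is a hypothetical example, and the "file" is just a byte string, matching the view of a file as a sequence of bytes.

```python
import struct

# Hypothetical fixed-length student record with three fields:
#   id   : 4-byte unsigned int (the key field)
#   name : 20-byte string, padded with NUL bytes
#   gpa  : 4-byte float
RECORD = struct.Struct("<I20sf")   # '<' = little-endian, no padding between fields


def pack(student_id, name, gpa):
    return RECORD.pack(student_id, name.encode("utf-8"), gpa)


def unpack(raw):
    student_id, name, gpa = RECORD.unpack(raw)
    return student_id, name.rstrip(b"\0").decode("utf-8"), gpa


# The "file": records stored back to back as one sequence of bytes.
data = pack(101, "Abebe", 3.5) + pack(102, "Sara", 3.75)
print(RECORD.size)                                  # 28 bytes per record
print(len(data) // RECORD.size)                     # 2 records
print(unpack(data[RECORD.size:2 * RECORD.size]))    # (102, 'Sara', 3.75)
```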

7.3. File Access Mechanisms


File access mechanism refers to the manner in which the records of a file may be accessed. There
are several ways to access files:
 Sequential access
 Direct/Random access
 Indexed sequential access

7.3.1. Sequential access

In sequential access, the records are accessed in order, i.e., the information in the file is
processed one record after the other. This access method is the most primitive one.
Example: compilers usually access files in this fashion.
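A sketch of sequential access over fixed-length records, assuming a hypothetical 24-byte record (a 4-byte key plus a 20-byte name) and an in-memory io.BytesIO object standing in for a real file. Each read continues from where the previous one stopped.

```python
import io
import struct

RECORD = struct.Struct("<I20s")   # hypothetical record: 4-byte key + 20-byte name


def read_sequentially(f):
    """Yield records one after another, from the current position to end of file."""
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:      # end of file (or a truncated last record)
            return
        key, name = RECORD.unpack(raw)
        yield key, name.rstrip(b"\0").decode()


records = [(1, "alpha"), (2, "beta"), (3, "gamma")]
data_file = io.BytesIO(b"".join(RECORD.pack(k, n.encode()) for k, n in records))
for record in read_sequentially(data_file):
    print(record)                       # (1, 'alpha'), then (2, 'beta'), then (3, 'gamma')
```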

7.3.2. Direct/Random access

 Random access file organization provides direct access to records.

 Each record has its own address in the file, with the help of which it can be directly
accessed for reading or writing.

 The records need not be in any sequence within the file, and they need not be in
adjacent locations on the storage medium.
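With fixed-length records, a record's address is simply its number times the record size, so the program can seek straight to it. A minimal sketch, assuming hypothetical 24-byte records in an in-memory io.BytesIO file; read_record and write_record are illustrative names.

```python
import io
import struct

RECORD = struct.Struct("<I20s")   # hypothetical record: 4-byte key + 20-byte name


def read_record(f, n):
    """Direct access: jump straight to record n (0-based) without reading records 0..n-1."""
    f.seek(n * RECORD.size)           # the record's address is n * record size
    key, name = RECORD.unpack(f.read(RECORD.size))
    return key, name.rstrip(b"\0").decode()


def write_record(f, n, key, name):
    f.seek(n * RECORD.size)
    f.write(RECORD.pack(key, name.encode()))


data_file = io.BytesIO(bytes(5 * RECORD.size))   # room for 5 empty records
write_record(data_file, 4, 40, "last")           # records can be written in any order
write_record(data_file, 1, 10, "second")
print(read_record(data_file, 4))   # (40, 'last')
print(read_record(data_file, 1))   # (10, 'second')
```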

7.3.3. Indexed sequential access

 This mechanism is built on top of sequential access.

 An index is created for each file, which contains pointers to various blocks.
 The index is searched sequentially, and the pointer found there is used to access the file directly.

7.4. Space Allocation

The operating system allocates disk space to files. Operating systems use the following three
main ways to allocate disk space to files:
 Contiguous Allocation
 Linked Allocation
 Indexed Allocation

7.4.1. Contiguous Allocation

 Each file occupies a contiguous address space on disk.


 Assigned disk address is in linear order.
 Easy to implement.
 External fragmentation is a major issue with this type of allocation technique.
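A minimal sketch of contiguous allocation, assuming a hypothetical 10-block disk and first-fit placement. The directory entry needs only a start block and a length, which makes direct access trivial (block i is at start + i), but after a deletion the free space may be too scattered for a new file.

```python
class ContiguousDisk:
    """Toy model: each file gets one run of consecutive blocks (first fit)."""

    def __init__(self, n_blocks):
        self.free = [True] * n_blocks
        self.directory = {}              # name -> (start block, length)

    def create(self, name, length):
        run = 0
        for b, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == length:
                start = b - length + 1
                for i in range(start, start + length):
                    self.free[i] = False
                self.directory[name] = (start, length)
                return start
        raise OSError(f"no contiguous run of {length} free blocks")

    def delete(self, name):
        start, length = self.directory.pop(name)
        for i in range(start, start + length):
            self.free[i] = True

    def block_address(self, name, i):
        """Direct access: logical block i of the file lives at start + i."""
        start, length = self.directory[name]
        if not 0 <= i < length:
            raise IndexError(i)
        return start + i


disk = ContiguousDisk(10)
disk.create("a", 4)                 # blocks 0-3
disk.create("b", 3)                 # blocks 4-6
disk.delete("a")                    # frees 0-3: a hole of 4 blocks
print(disk.block_address("b", 2))   # 6
try:
    disk.create("c", 5)             # 7 blocks are free, but not 5 in a row
except OSError as e:
    print(e)                        # no contiguous run of 5 free blocks
```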

7.4.2. Linked Allocation

 Each file carries a list of links to disk blocks.


 Directory contains link / pointer to first block of a file.
 No external fragmentation
 Works well for sequentially accessed files.
 Inefficient for directly accessed files.
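The following sketch models linked allocation with a next-block pointer per block, on a hypothetical 8-block disk; the class and method names are illustrative. Any free block can be used, so there is no external fragmentation, but reaching block i requires following i pointers.

```python
NIL = -1   # "no next block": marks the end of a file's chain


class LinkedDisk:
    """Toy linked allocation: every block holds data plus the number of the next block."""

    def __init__(self, n_blocks):
        self.data = [None] * n_blocks
        self.next = [NIL] * n_blocks
        self.free = list(range(n_blocks))
        self.directory = {}                 # file name -> first block (NIL if empty)

    def create(self, name, chunks):
        chunks = list(chunks)
        if len(chunks) > len(self.free):
            raise OSError("disk full")
        self.directory[name] = NIL
        prev = NIL
        for chunk in chunks:
            b = self.free.pop()             # any free block will do
            self.data[b], self.next[b] = chunk, NIL
            if prev == NIL:
                self.directory[name] = b    # directory points at the first block
            else:
                self.next[prev] = b         # previous block points at this one
            prev = b

    def read_all(self, name):
        """Sequential access: follow the chain from the first block."""
        out, b = [], self.directory[name]
        while b != NIL:
            out.append(self.data[b])
            b = self.next[b]
        return out

    def read_block(self, name, i):
        """Direct access is slow: reaching block i means following i pointers."""
        b = self.directory[name]
        for _ in range(i):
            if b == NIL:
                break
            b = self.next[b]
        if b == NIL:
            raise IndexError(i)
        return self.data[b]


disk = LinkedDisk(8)
disk.create("f", ["A", "B", "C"])
print(disk.directory["f"])      # 7: blocks are taken from the end of the free list
print(disk.read_all("f"))       # ['A', 'B', 'C']
print(disk.read_block("f", 2))  # C
```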

7.4.3. Indexed Allocation

 Provides solutions to problems of contiguous and linked allocation.


 An index block is created that holds all the pointers to a file's disk blocks.
 Each file has its own index block which stores the addresses of disk space occupied by
the file.
 Directory contains the addresses of index blocks of files.
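A sketch of indexed allocation on a hypothetical 16-block disk: the file's index block stores its list of data-block pointers, and the directory stores the index block's number. Direct access then needs only one lookup in the index block. Names are illustrative, and the index block is modelled as a Python list rather than an on-disk array of block numbers.

```python
class IndexedDisk:
    """Toy indexed allocation: each file has an index block listing its data blocks."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks
        self.free = list(range(n_blocks))
        self.directory = {}              # name -> number of the file's index block

    def create(self, name, chunks):
        chunks = list(chunks)
        if len(chunks) + 1 > len(self.free):     # +1 for the index block itself
            raise OSError("disk full")
        index_block = self.free.pop(0)
        pointers = []
        for chunk in chunks:
            b = self.free.pop(0)
            self.blocks[b] = chunk
            pointers.append(b)
        self.blocks[index_block] = pointers      # the index block holds only pointers
        self.directory[name] = index_block

    def read_block(self, name, i):
        """Direct access: one lookup in the index block, then one data-block read."""
        pointers = self.blocks[self.directory[name]]
        return self.blocks[pointers[i]]


disk = IndexedDisk(16)
disk.create("f", ["A", "B", "C"])
print(disk.blocks[disk.directory["f"]])  # [1, 2, 3]: pointers stored in index block 0
print(disk.read_block("f", 2))           # C
```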

7.5. Types of file systems


File system types can be classified into disk/tape file systems, network file systems and special-
purpose file systems.

7.5.1. Disk file systems

A disk file system takes advantage of the ability of disk storage media to randomly address data
in a short amount of time. Additional considerations include the speed of accessing data
following that initially requested and the anticipation that the following data may also be
requested. This permits multiple users (or processes) to access various data on the disk without
regard to the sequential location of the data. Some disk file systems are journaling file
systems or versioning file systems.

7.5.2. Network file systems

A network file system is a file system that acts as a client for a remote file access protocol,
providing access to files on a server. Examples of network file systems include clients for
the NFS, AFS and SMB protocols, and file-system-like clients for FTP and WebDAV.

7.5.3. Special file systems

A special file system presents non-file elements of an operating system as files so they can be
acted on using file system APIs. This is most commonly done in Unix-like operating systems,
but devices are given file names in some non-Unix-like operating systems as well.

7.6. Content and structure of Directories


File systems typically have directories (also called folders) which allow the user to group files
into separate collections. This may be implemented by associating the file name with an index in
a table of contents or an inode in a Unix-like file system. Directory structures may be flat (i.e.
linear), or allow hierarchies where directories may contain subdirectories. The first file system to
support arbitrary hierarchies of directories was used in the Multics operating system.
1. Single Level Directory
 A single directory for all users
 Naming Problem and Grouping Problem
 As the number of files increases, difficult to remember unique names
 As the number of users increase, users must have unique names.
2. Two Level Directory
 Separate directory for each user
 Advantages
 Efficient searching
 Naming - same file name in a different directory; path name
 Disadvantages

 Naming - collisions
 Grouping capability – not possible except by user
3. Tree-Structured Directory
 Advantages
 Efficient searching – current working directory
 Naming - same file name in a different directory
 Grouping capability
 Disadvantages
 Structural complexity
7.6.1. Operations Performed on Directory
 Search for a file
 Create a file
 Delete a file
 List a directory
 Rename a file
 Traverse the file system
A directory is organized (logically) to meet the following criteria:
 Efficiency – locating a file quickly
 Naming – convenient to users
 Two users can have same name for different files
 The same file can have several different names
 Grouping – logical grouping of files by properties (e.g., all Java programs, all games, …)
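A toy tree-structured directory supporting the operations listed above (create, search, delete, list, rename and traverse). It is a sketch, assuming directories are nested Python dictionaries and files are strings; TreeFS and its methods are hypothetical names. Two users can each have a file called notes.txt because the files live in different directories.

```python
class TreeFS:
    """Toy tree-structured directory: a directory is a dict, a file is a str."""

    def __init__(self):
        self.root = {}

    def _dir(self, path, create=False):
        """Resolve an absolute path such as '/home/abebe' to its directory dict."""
        node = self.root
        for part in filter(None, path.split("/")):
            if part not in node:
                if not create:
                    raise FileNotFoundError(path)
                node[part] = {}
            node = node[part]
            if not isinstance(node, dict):
                raise NotADirectoryError(path)
        return node

    def create(self, dirpath, name, contents=""):
        self._dir(dirpath, create=True)[name] = contents

    def search(self, dirpath, name):
        return name in self._dir(dirpath)

    def delete(self, dirpath, name):
        del self._dir(dirpath)[name]

    def listdir(self, dirpath):
        return sorted(self._dir(dirpath))

    def rename(self, dirpath, old, new):
        entries = self._dir(dirpath)
        entries[new] = entries.pop(old)

    def traverse(self, node=None, prefix=""):
        """Yield every path in the tree, depth first."""
        if node is None:
            node = self.root
        for name in sorted(node):
            path = f"{prefix}/{name}"
            yield path
            if isinstance(node[name], dict):
                yield from self.traverse(node[name], path)


fs = TreeFS()
fs.create("/home/abebe", "notes.txt")
fs.create("/home/sara", "notes.txt")        # same name in another directory: no clash
fs.rename("/home/sara", "notes.txt", "todo.txt")
print(fs.listdir("/home"))                  # ['abebe', 'sara']
print(fs.search("/home/sara", "todo.txt"))  # True
print(list(fs.traverse()))
# ['/home', '/home/abebe', '/home/abebe/notes.txt', '/home/sara', '/home/sara/todo.txt']
```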
7.7. File-System techniques
7.7.1. File System mounting
A file system must be mounted before it can be made available to processes on the system.
 The OS is given the name of the device and the mount point (location within file
structure at which files attach).
 OS verifies that the device contains a valid file system.
 OS notes in its directory structure that a file system is mounted at the specified
mount point.
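Mounting itself needs administrator privileges and is done with system tools, so the sketch below does not mount anything. It only shows the result described above: walking up from a path with Python's os.path.ismount finds the mount point at which that path's file system is attached. The helper name mount_point is hypothetical.

```python
import os


def mount_point(path):
    """Walk up from `path` until reaching a directory that is a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:        # reached the top of the tree
            break
        path = parent
    return path


print(mount_point(os.getcwd()))   # e.g. '/' or '/home', depending on the system
```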

7.7.2. Allocation of Disk Space
Low level access methods depend upon the disk allocation scheme used to store file data
1. Contiguous Allocation
 Each file occupies a set of contiguous blocks on the disk.
 Simple - only starting location (block #) and length
(number of blocks) are required.
 Suits sequential or direct access.
 Fast (very little head movement) and easy to recover in the
event of system crash.
 Problems :- Wasteful of space
2. Linked List Allocation
 Each file is a linked list of disk blocks
 Blocks may be scattered anywhere on the disk.
 Each node in list can be a fixed size physical block or a contiguous
collection of blocks.
3. Block Allocation
 Simple - need only starting address.
 Free-space management system - space efficient.
 Can grow in middle and at ends. No estimation of size necessary.
7.7.3. Disk Defragmentation
 Reorganize blocks on the disk so that each file is (mostly) contiguous
 Link or FAT organization preserved
 Purpose:
 To reduce disk arm movement during sequential
accesses
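A sketch of what defragmentation achieves, assuming a hypothetical block map in which each entry is (file, block number) or None for a free block. It computes the compacted layout only; a real defragmenter must also move the data and update the link or FAT entries safely, which this does not model.

```python
def defragment(disk):
    """Return a compacted block map: each file's blocks together and in order,
    files in order of first appearance, and all free blocks (None) at the end.
    Each entry of `disk` is a (file, block_number) pair or None."""
    files = {}
    for entry in disk:
        if entry is not None:
            files.setdefault(entry[0], []).append(entry)
    compacted = []
    for blocks in files.values():
        compacted.extend(sorted(blocks, key=lambda e: e[1]))
    return compacted + [None] * (len(disk) - len(compacted))


disk = [("A", 0), None, ("B", 0), ("A", 1), None, ("B", 1), ("A", 2), None]
print(defragment(disk))
# [('A', 0), ('A', 1), ('A', 2), ('B', 0), ('B', 1), None, None, None]
```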
7.8. File Management Systems
A file management system is that set of system software that provides services to users and
applications in the use of files. The objectives of a file management system are:
 To meet the data management needs and requirements of the user
 To guarantee, to the extent possible, that the data in the file are valid.
 To optimize performance
 To provide I/O support for a variety of storage device types.
 To minimize or eliminate the potential for lost or destroyed data.
 To provide a standardized set of I/O interface routines to user processes.
Examples of file systems include ISO 9660, CP/M, MS-DOS, Windows 98, and UNIX. These
differ in many ways, including how they keep track of which blocks go with which file, directory
structure, and management of free disk space.

File Protection
 File owner/creator should be able to control
 what can be done
 by whom
 Types of access

 read
 write
 execute
 append
 delete
 list
 Access Lists and Groups
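On UNIX-like systems, the basic form of this protection is a set of read, write and execute bits for the owner, the owner's group and all other users. The sketch below sets those bits with Python's os.chmod and checks them with os.access; it assumes a POSIX system and does not cover full access control lists.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Owner may read and write, group may read, others get nothing (mode 0o640).
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
mode = stat.filemode(os.stat(path).st_mode)
can_read = os.access(path, os.R_OK)     # may the current user read it?
can_exec = os.access(path, os.X_OK)     # may the current user execute it?
os.remove(path)

print(mode)       # -rw-r----- on a POSIX system
print(can_read)   # True (we are the owner)
print(can_exec)   # False (no execute bit is set)
```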

N.B:- The Boot Process

7.9. Memory-mapped file

A memory-mapped file contains the contents of a file in virtual memory. This mapping between
a file and memory space enables an application, including multiple processes, to modify the file
by reading and writing directly to the memory.
There are two types of memory-mapped files:
1. Persisted memory-mapped files: Persisted files are memory-mapped files that are associated
with a source file on a disk. When the last process has finished working with the file, the data is
saved to the source file on the disk. These memory-mapped files are suitable for working with
extremely large source files.

2. Non-persisted memory-mapped files: Non-persisted files are memory-mapped files that are
not associated with a file on a disk. When the last process has finished working with the file, the
data is lost and the file is reclaimed by garbage collection. These files are suitable for creating
shared memory for inter-process communications (IPC).
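Python's standard mmap module can show both kinds. The sketch assumes a temporary file for the persisted case and uses an anonymous mapping (file descriptor -1) for the non-persisted case. Here the anonymous mapping is used by a single process only; sharing it with another process would require, for example, creating it before fork() on UNIX, which is not shown.

```python
import mmap
import os
import tempfile

# 1. Persisted: the mapping is backed by a file on disk.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"hello world")
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:   # length 0 = map the whole file
        m[0:5] = b"HELLO"                 # an ordinary memory write...
        m.flush()                         # ...pushed back to the file on disk
with open(path, "rb") as f:
    persisted = f.read()
print(persisted)                          # b'HELLO world'

# 2. Non-persisted: an anonymous mapping with no file behind it.
with mmap.mmap(-1, 16) as shm:
    shm.write(b"ipc message")
    shm.seek(0)
    anonymous = shm.read(11)
print(anonymous)                          # b'ipc message'
```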

7.10. Backup strategies

The term backup has become synonymous with data protection over the past several decades and
may be accomplished via several methods. Backup software applications reduce the complexity
of performing backup and recovery operations. Backing up data is only one part of a disaster
protection plan, and may not provide the level of data and disaster recovery capabilities desired

without careful design and testing.

Backup applications have long offered several types of backup operations. The most common
backup types are a full backup, incremental backup and differential backup. Other backup types
include synthetic full backups and mirroring.

In the debate over cloud vs. local backup, there are some types of backup that are better in
certain locations. If you're performing cloud backup, incremental backups are generally a better
fit because they consume fewer resources. You might start out with a full backup in the cloud
and then shift to incremental backups. Mirror backup, though, is typically more of an on-
premises approach and often involves disks.

7.10.1. Full backups

The most basic and complete type of backup operation is a full backup. As the name implies, this
type of backup makes a copy of all data to another set of media, such as a disk or tape. The
primary advantage to performing a full backup during every operation is that a complete copy of
all data is available with a single set of media. This results in a minimal time to restore data, a
metric known as a recovery time objective. However, the disadvantages are that it takes longer to
perform a full backup than other types (sometimes by a factor of 10 or more), and it requires
more storage space.

Thus, full backups are typically run only periodically. Data centers that have a small amount of
data (or critical applications) may choose to run a full backup daily, or even more often in some
cases. Typically, backup operations employ a full backup in combination with either incremental
or differential backups.

7.10.2. Incremental backups

An incremental backup operation will result in copying only the data that has changed since the
last backup operation of any type. An organization typically uses the modified time stamp on
files and compares it to the time stamp of the last backup. Backup applications track and record
the date and time that backup operations occur in order to track files modified since these
operations.

Because an incremental backup only copies data changed since the last backup of any type, an
organization may run it as often as desired, with only the most recent changes stored. The benefit
of an incremental backup is that it copies a smaller amount of data than a full backup. Thus, these
operations complete faster and require less media to store the backup.
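The timestamp comparison described above can be sketched as follows. It assumes the backup application has recorded last_backup as a Unix timestamp; files_changed_since is a hypothetical helper, and the modification times are set explicitly with os.utime to simulate one file changing after the backup.

```python
import os
import tempfile
import time


def files_changed_since(root, since):
    """Paths under `root` whose modification time is later than `since` (seconds since epoch)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                changed.append(path)
    return sorted(changed)


root = tempfile.mkdtemp()
a_path, b_path = os.path.join(root, "a.txt"), os.path.join(root, "b.txt")
for p in (a_path, b_path):
    with open(p, "w") as f:
        f.write("v1")

last_backup = time.time()                        # recorded when the last backup ran
os.utime(a_path, (last_backup - 3600,) * 2)      # untouched since before that backup
os.utime(b_path, (last_backup + 60,) * 2)        # modified after it

print([os.path.basename(p) for p in files_changed_since(root, last_backup)])  # ['b.txt']
```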

7.10.3. Differential backups

A differential backup operation is similar to an incremental backup the first time it is performed, in that
it will copy all data changed since the previous backup. However, each time it is run afterwards,
it will continue to copy all data changed since the previous full backup. Thus, it will store more
data than an incremental backup on subsequent operations, although typically far less than a full backup.
As a result, differential backups require more space and time to complete than incremental
backups, although less than full backups. The following table compares the different
types of backup:

A comparison of different types of backup

Type/backup | Full     | Mirror            | Incremental           | Differential
Backup 1    | All data | All data selected | –                     | –
Backup 2    | All data | All data selected | Changes from Backup 1 | Changes from Backup 1
Backup 3    | All data | All data selected | Changes from Backup 2 | Changes from Backup 1
Backup 4    | All data | All data selected | Changes from Backup 3 | Changes from Backup 1

As shown in "A comparison of different types of backup," above, each process works differently.
An organization must run a full backup at least once. Afterwards, it is possible to run either
another full, an incremental or a differential backup. The first partial backup performed, whether
differential or incremental, will back up the same data. By the third backup operation, the data
that is backed up with an incremental is limited to the changes since the last incremental. In
comparison, the third backup with a differential will back up all changes since the first full
backup, which was "Backup 1."
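The pattern in the comparison table can be reproduced with a small simulation. A minimal sketch, assuming backup 1 is always a full backup and that each later backup is preceded by one hypothetical change (x, y, then z); the mirror column is not modelled, and backup_contents is an illustrative name.

```python
def backup_contents(policy, changes):
    """changes[i] = items modified between backup i+1 and backup i+2.
    Backup 1 is always a full backup. Returns what each backup copies."""
    result = ["all data"]
    since_full = set()                       # changes accumulated since the last full
    for changed in changes:
        since_full |= changed
        if policy == "full":
            result.append("all data")
            since_full = set()
        elif policy == "incremental":
            result.append(sorted(changed))       # only changes since the last backup
        elif policy == "differential":
            result.append(sorted(since_full))    # all changes since the last full
        else:
            raise ValueError(f"unknown policy: {policy}")
    return result


changes = [{"x"}, {"y"}, {"z"}]   # hypothetical edits before backups 2, 3 and 4
for policy in ("full", "incremental", "differential"):
    print(policy, backup_contents(policy, changes))
# full ['all data', 'all data', 'all data', 'all data']
# incremental ['all data', ['x'], ['y'], ['z']]
# differential ['all data', ['x'], ['x', 'y'], ['x', 'y', 'z']]
```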
From these three primary types of backup, it is possible to develop an approach for
comprehensive data protection. An organization often uses one of the following approaches:
 Full daily
 Full weekly + differential daily
 Full weekly + incremental daily

Many considerations will affect the choice of the optimal backup strategy. Typically, each
alternative and strategy choice involves making tradeoffs between performance, data protection
levels, total amount of data retained and cost. In "A backup strategy's impact on space" below,
the media capacity requirements and media required for recovery are shown for three typical
backup strategies. These calculations presume 20 TB of total data, with 5% of the data changing
daily, and no increase in total storage during the period. The calculations are based on 22
working days in a month and a one-month retention period for data.

A Backup Strategy’s Impact on Space

Common backup scenario | Media space required for one month (20 TB @ 5% daily rate of change) | Media required for recovery
Full daily (weekdays) | Space for 22 daily fulls: (22 × 20 TB) = 440 TB | Most recent backup only
Full (weekly) + differential (weekdays) | Fulls, plus most recent differential since full: (5 × 20 TB) + (22 × 5% × 20 TB) = 122 TB | Most recent full + most recent differential
Full (weekly) + incremental (weekdays) | Fulls, plus all incrementals since weekly full: (5 × 20 TB) + (22 × 5% × 20 TB) = 122 TB | Most recent full + all incrementals since full
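The arithmetic in the table can be checked directly. This sketch follows the table's own simplification that each weekday partial backup holds 5% of 20 TB (1 TB); a differential that accumulates different changes each day would be larger, as section 7.10.3 notes. The constant names are illustrative.

```python
TOTAL_TB = 20            # total data, from the scenario above
CHANGE_PERCENT = 5       # percent of the data that changes each working day
WORKING_DAYS = 22        # working days in the one-month retention period
WEEKLY_FULLS = 5         # weekly full backups held during that month

daily_partial_tb = TOTAL_TB * CHANGE_PERCENT / 100            # 1.0 TB per partial backup

full_daily = WORKING_DAYS * TOTAL_TB                           # 22 x 20 TB
weekly_full_plus_partials = WEEKLY_FULLS * TOTAL_TB + WORKING_DAYS * daily_partial_tb

print(full_daily)                  # 440
print(weekly_full_plus_partials)   # 122.0
```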

As shown above, performing a full backup daily requires the most space and also takes the most
time. However, more total copies of data are available, and fewer pieces
of media are required to perform a restore operation. As a result, implementing this backup
policy has a higher tolerance to disasters, and provides the least time to restore, since any piece
of data required will be located on at most one backup set.

As an alternative, performing a full backup weekly, coupled with running incremental backups
daily, will deliver the shortest backup time during weekdays and use the least amount of storage
space. However, there are fewer copies of data available and restore time is the longest, since an
organization may need to use six sets of media to recover the necessary information. If data
backed up on Wednesday is needed, the Sunday full backup plus the Monday, Tuesday
and Wednesday incremental media sets are required. This can dramatically increase recovery
times, and requires that each media set work properly; a failure in one backup set can impact the
entire restoration.

Running a weekly full backup plus daily differential backups delivers results between those of the
other alternatives. Namely, more backup media sets are required to restore than with a daily full
policy, although fewer than with a daily incremental policy. Also, the restore time is less than
using daily incremental backups, and more than daily full backups. In order to restore data from
a particular day, at most two media sets are required, diminishing the time needed to recover and
the potential for problems with an unreadable backup set.
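The number of media sets needed for a restore can be expressed as a small function. It assumes the week described above (a Sunday full backup followed by weekday partial backups); media_to_restore and the policy strings are hypothetical labels.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]   # Sunday full, then weekday backups


def media_to_restore(policy, day):
    """Which backup sets are needed to restore data as of `day`."""
    i = DAYS.index(day)
    if policy == "full daily":
        return [f"{day} full"]
    if i == 0:
        return ["Sun full"]
    if policy == "weekly full + incremental":
        return ["Sun full"] + [f"{d} incremental" for d in DAYS[1:i + 1]]
    if policy == "weekly full + differential":
        return ["Sun full", f"{day} differential"]
    raise ValueError(f"unknown policy: {policy}")


print(media_to_restore("weekly full + incremental", "Wed"))
# ['Sun full', 'Mon incremental', 'Tue incremental', 'Wed incremental']
print(media_to_restore("weekly full + differential", "Wed"))
# ['Sun full', 'Wed differential']
print(len(media_to_restore("weekly full + incremental", "Fri")))   # 6
```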

7.10.4. Mirror backups


A mirror backup is comparable to a full backup. According to a blog from backup vendor
Nakivo, "This backup type creates an exact copy of the source data set, but only the latest data
version is stored in the backup repository with no track of different versions of the files." The
backup is a mirror of the source data, thus the name. All the different backed up files are stored
separately, like they are in the source. One of the benefits of mirror backup is a fast recovery
time. It's also easy to access individual backed up files.

One of the main drawbacks, though, is the amount of storage space required. With that extra
storage, organizations should be wary of cost increases and maintenance needs. In addition, if
there's a problem in the source data set, such as a corruption or deletion, the mirror backup
experiences the same. As a result, it's a good idea not to rely on mirror backups for all your data
protection needs, and to have other types of backup for the data. You'll want to follow the 3-2-1
rule of backup, which includes three copies of data on two different media, with one copy off
site.
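A sketch of a file-level mirror using only Python's standard library. It assumes the mirror is kept in sync by copying everything in the source and deleting anything the source no longer has; mirror is a hypothetical helper. The usage shows the drawback described above: an accidental deletion in the source is faithfully reproduced in the mirror.

```python
import os
import shutil
import tempfile


def mirror(src, dst):
    """Make directory `dst` an exact copy of `src`.
    Files are copied every time (a real tool would skip unchanged ones), and
    anything missing from `src` is deleted from `dst`, so only the latest state is kept."""
    os.makedirs(dst, exist_ok=True)
    src_names = set(os.listdir(src))
    for name in os.listdir(dst):                 # remove what the source no longer has
        if name not in src_names:
            target = os.path.join(dst, name)
            if os.path.isdir(target):
                shutil.rmtree(target)
            else:
                os.remove(target)
    for name in src_names:                       # copy everything the source has
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if os.path.isdir(s):
            if os.path.isfile(d):
                os.remove(d)
            mirror(s, d)
        else:
            if os.path.isdir(d):
                shutil.rmtree(d)
            shutil.copy2(s, d)                   # copies contents and timestamps


src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(src, name), "w") as f:
        f.write(name)
mirror(src, dst)
os.remove(os.path.join(src, "a.txt"))   # an accidental deletion in the source...
mirror(src, dst)
print(sorted(os.listdir(dst)))          # ['b.txt']: ...is reproduced in the mirror
```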
One specific kind of mirror, disk mirroring, is also known as RAID 1. This process replicates
data to two or more disks. Disk mirroring is a strong option for data that needs high availability
because of its quick recovery time. It's also helpful for disaster recovery because of its immediate
failover capability. Disk mirroring requires at least two physical drives. If one drive fails, an
organization can use the mirror copy. While disk mirroring offers comprehensive data protection,
it requires a lot of storage capacity.

The benefits and drawbacks of the backup types are summarized below:

Type of backup | Benefits | Drawbacks
Full | Provides a full copy of datasets; offers arguably the best protection | Time-consuming; requires lots of storage space
Incremental | Less time and storage space than full | Time-consuming to restore; needs all the backups in the backup chain to restore
Differential | Shorter restore time than incremental | Can grow much bigger than incremental
Synthetic full | Reduced restore time; less bandwidth usage | Newer, so not as well known
Incremental-forever | Availability of data; automated restoration process | Newer, so not as well known; needs all the backups in the backup chain to restore

Most of the advanced types of backup such as synthetic full, mirror and continuous data
protection require disk storage as the backup target. A synthetic full simply reconstructs the full
backup image using all required incremental backups or the differential backup on disk. This
synthetic full may then be stored to tape for offsite storage, with the advantage being reduced
restoration time. Finally, continuous data protection enables a greater number of restoration
points than traditional backup options.
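A synthetic full can be sketched as applying each incremental, oldest first, on top of the last full image. The sketch assumes each backup is a dictionary from file name to contents and that a deletion is recorded as None, which is a hypothetical convention; the same function also works with a single differential in place of the list of incrementals.

```python
def synthetic_full(full, incrementals):
    """Build a new full backup image from the last full plus later incrementals.
    Each backup maps file name -> contents; None marks a file deleted since the
    previous backup (a convention assumed by this sketch)."""
    image = dict(full)                    # copy, so the original full is untouched
    for inc in incrementals:              # apply oldest first
        for name, contents in inc.items():
            if contents is None:
                image.pop(name, None)
            else:
                image[name] = contents
    return image


full = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
mon = {"a.txt": "v2"}
tue = {"b.txt": None, "d.txt": "v1"}      # b deleted, d created
print(synthetic_full(full, [mon, tue]))
# {'a.txt': 'v2', 'c.txt': 'v1', 'd.txt': 'v1'}
```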

When deciding which type of backup strategy to use, the question is when to use each, and how
these options should be combined with testing to meet the overall business cost, performance and
availability goals.

The purpose of most backups is to create a copy of data so that a particular file or application
may be restored after data loss, corruption or deletion, or a disaster strikes. Thus, backup is not
the goal, but rather it is one means to accomplish the goal of protecting data. Testing backups is
just as important as backing up and restoring data. Again, the point of backing up data is to
enable restoration of data at a later point in time. Without periodic testing, it is impossible to
guarantee that the goal of protecting data is being met.
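One simple way to test a backup is to restore it to a separate location and compare checksums with the originals. A minimal sketch, assuming the originals have not changed since the backup and that only top-level files are checked; verify_restore is a hypothetical helper built on Python's hashlib.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Checksum a file in 64 KiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return names of files whose restored copy is missing or differs from the original."""
    problems = []
    for name in sorted(os.listdir(original_dir)):
        original = os.path.join(original_dir, name)
        if not os.path.isfile(original):
            continue                       # this sketch checks top-level files only
        restored = os.path.join(restored_dir, name)
        if not os.path.isfile(restored) or sha256_of(restored) != sha256_of(original):
            problems.append(name)
    return problems


src, restored = tempfile.mkdtemp(), tempfile.mkdtemp()
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    with open(os.path.join(src, name), "w") as f:
        f.write(text)
    shutil.copy(os.path.join(src, name), restored)   # stand-in for a test restore
with open(os.path.join(restored, "b.txt"), "w") as f:
    f.write("corrupted")                             # simulate a damaged backup
print(verify_restore(src, restored))                 # ['b.txt']
```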
