OS Chapter-5 Handout os

Chapter 5 discusses the file concept in operating systems, detailing how files are logical units of storage that abstract physical storage properties. It covers file attributes, operations, access methods, and directory structures, highlighting the importance of organization and management of files within a system. The chapter also explains various directory structures, including single-level, two-level, tree-structured, and acyclic graph directories, along with the process of mounting file systems.

Uploaded by

Robel
Copyright
© © All Rights Reserved

Chapter- 5

File Concept & Implementing File Systems


File Concept
Computers can store information on various storage media such as magnetic disks, magnetic tapes and optical
disks. OS provides a uniform logical view of information storage. OS abstracts from the physical properties of its
storage devices to define a logical storage unit called a file. Files are mapped by OS onto physical devices. These
storage devices are nonvolatile, so the contents persist through power failures and system reboots. A file is a
named collection of related information that is recorded on secondary storage. A file is the smallest allotment of
logical secondary storage; that is, data cannot be written to secondary storage unless it is within a file. Files
represent programs and data. Data files may be numeric, alphabetic, alphanumeric or binary. Files may be free
form such as text files or may be formatted rigidly. A file is a sequence of bits, bytes, lines or records. Information
in a file is defined by its creator. Many different types of information may be stored in a file – source programs,
object programs, executable programs, numeric data, text etc. A file has a certain defined structure which depends
on its type.

Text file: sequence of characters organized into lines.
Source file: sequence of subroutines and functions, each of which is further organized as declarations followed by
executable statements.
Object file: sequence of bytes organized into blocks understandable by the system’s linker.
Executable file: series of code sections that the loader can bring into memory and execute.

File Attributes
A file is referred to by its name. A name is usually a string of characters. When a file is named, it becomes
independent of the process, the user and even the system that created it. A file’s attributes vary from one OS to
another but consist of these –

• Name: the symbolic file name is the only information kept in human-readable form.
• Identifier: a number which identifies the file within the file system; it is the non-human-readable name for the
file.
• Type: this information is needed for systems that support different types of files.
• Location: this information is a pointer to a device and to the location of the file on that device.
• Size: the current size of the file.
• Protection: access control information determines who can do reading, writing, executing etc.
• Time, date and user identification: this information may be kept for creation, last modification and last use.

The information about all files is kept in the directory structure which resides on secondary storage. A
directory entry consists of the file’s name and its unique identifier. The identifier in turn locates the other file
attributes.

File Operations
A file is an abstract data type. OS can provide system calls to create, write, read, reposition, delete and truncate
files.
• Creating a file – First, space in the file system must be found for the file. Second, an entry for the new file
must be made in the directory.
• Writing a file – To write a file, specify both the name of the file and the information to be written to the
file. The system must keep a write pointer to the location in the file where the next write is to take place.
• Reading a file – To read from a file, the directory is searched for the associated entry and the system needs to
keep a read pointer to the location in the file where the next read is to take place. Because a process is
usually either reading from or writing to a file, the current operation location can be kept as a per-process
current file position pointer.

1 Operating Systems Chapter- 5: File Concepts


• Repositioning within a file – The directory is searched for the appropriate entry and the current file position
pointer is repositioned to a given value. This operation is also known as a file seek.
• Deleting a file – To delete a file, search the directory for the named file. When found, release all file
space and erase the directory entry.
• Truncating a file – The user may want to erase the contents of a file but keep its attributes. This function
allows all attributes to remain unchanged except for file length.

Other common operations include appending new information to the end of an existing file and renaming an
existing file. We may also need operations that allow the user to get and set the various attributes of a file.
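
The operations above can be tried out with a short Python sketch. It assumes a POSIX-like system and uses Python's os module as a thin wrapper over the underlying system calls; the file names are hypothetical and everything happens in a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")

# Create: make a directory entry and open the file for reading and writing.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Write: each write advances the current file position pointer.
os.write(fd, b"hello world")

# Reposition (seek): move the pointer back to the start of the file.
os.lseek(fd, 0, os.SEEK_SET)

# Read: reading starts at the current position, so this returns 5 bytes.
first = os.read(fd, 5)

# Truncate: erase the contents but keep the file and its other attributes.
os.ftruncate(fd, 0)
size_after_truncate = os.fstat(fd).st_size
os.close(fd)

# Append and rename, the other common operations.
with open(path, "ab") as f:
    f.write(b"appended")
new_path = os.path.join(workdir, "renamed.txt")
os.rename(path, new_path)
with open(new_path, "rb") as f:
    contents = f.read()

# Delete: erase the directory entry and release the file's space.
os.remove(new_path)
os.rmdir(workdir)
deleted = not os.path.exists(new_path)
```

Note that the read and the write share one position pointer, which is why the seek is needed before reading back what was just written.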

Most of the file operations mentioned involve searching the directory for the entry associated with the named file.
To avoid this constant search, many systems require that an open() system call be made before a file is first used
actively. The OS keeps a small table called the open file table containing information about all open files.

When a file operation is requested, the file is specified via an index into this table so no searching is required.
When the file is no longer being actively used, it is closed by the process and the OS removes its entry from the
open file table. Create and delete are system calls that work with closed files.

The open() operation takes a file name and searches the directory, copying the directory entry into the open file
table. The open() call can also accept access mode information – create, read-only, read-write, append-only,
etc. This mode is checked against the file’s permissions. If the requested mode is allowed, the file is opened for the
process. The open() system call returns a pointer to the entry in the open file table. This pointer is used in all I/O
operations, avoiding any further searching and simplifying the system call interface.

OS uses two levels of internal tables – a per process table and a system wide table. The per process table tracks all
files that a process has open. Stored in this table is information regarding the use of the file by the process.

Each entry in the per process table points to a system wide open file table. The system wide table contains process
independent information. Once a file has been opened by one process, the system wide table includes an entry for
the file. The open file table also has an open count associated with each file to indicate how many processes have
the file open.
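
The relationship between the two tables can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the class and field names are invented for illustration, and a real kernel stores far more per entry (a copy of the directory entry, disk location and so on).

```python
class SystemOpenFileTable:
    """System wide table: one entry per open file, with an open count."""

    def __init__(self):
        self.entries = {}  # file name -> {"open_count": int}

    def acquire(self, name):
        entry = self.entries.setdefault(name, {"open_count": 0})
        entry["open_count"] += 1

    def release(self, name):
        entry = self.entries[name]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:
            del self.entries[name]  # last close: the entry can be reused


class Process:
    """Per process table: each slot holds the process's own position and mode."""

    def __init__(self, system_table):
        self.system_table = system_table
        self.open_files = []

    def open(self, name, mode):
        self.system_table.acquire(name)
        self.open_files.append({"name": name, "pos": 0, "mode": mode})
        return len(self.open_files) - 1  # index used in later operations

    def close(self, index):
        slot = self.open_files[index]
        self.open_files[index] = None
        self.system_table.release(slot["name"])


table = SystemOpenFileTable()
p1, p2 = Process(table), Process(table)
i1 = p1.open("report.txt", "r")
i2 = p2.open("report.txt", "rw")
count_while_shared = table.entries["report.txt"]["open_count"]
p1.close(i1)
p2.close(i2)
entry_removed = "report.txt" not in table.entries
```

Two processes opening the same file share one system wide entry, while each keeps its own position and access mode.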

To summarize, several pieces of information are associated with an open file.

• File pointer – The system must keep track of the last read-write location as a current file position pointer.
• File open count – As files are closed, the OS must reuse its open file table entries or it could run out of space in
the table. The file open counter tracks the number of opens and closes and reaches zero on the last close.
• Disk location of the file – The information needed to locate the file on disk is kept in memory so that the system
does not have to read it from disk for each operation.
• Access rights – Each process opens a file in an access mode. This information is stored in the per-process
table so the OS can allow or deny subsequent I/O requests.

Some OS’s provide facilities for locking an open file. File locks allow one process to lock a file and prevent other
processes from gaining access to it. File locks are useful for files that are shared by several processes. A shared
lock is where several processes can acquire the lock concurrently. An exclusive lock is where only one process at
a time can acquire such a lock.

Also, some OS’s may provide either mandatory or advisory file locking mechanisms. If a lock is mandatory,
then once a process acquires an exclusive lock, the OS will prevent any other process from accessing the
locked file; that is, the OS itself ensures locking integrity. For advisory locking, it is up to software
developers to ensure that locks are appropriately acquired and released.
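
Shared, exclusive and advisory locking can be observed with a short sketch. It assumes a Unix-like system where Python's fcntl.flock is available (it is not on Windows) and a local file system that supports such locks. Two independent opens of the same file stand in for two processes.

```python
import fcntl
import os
import tempfile

fd_a, path = tempfile.mkstemp()
fd_b = os.open(path, os.O_RDWR)  # a second, independent open of the same file

# Shared lock: several holders can acquire it at the same time.
fcntl.flock(fd_a, fcntl.LOCK_SH)
fcntl.flock(fd_b, fcntl.LOCK_SH | fcntl.LOCK_NB)
shared_ok = True
fcntl.flock(fd_b, fcntl.LOCK_UN)

# Exclusive lock: a second holder is refused while the first keeps the lock.
fcntl.flock(fd_a, fcntl.LOCK_EX)  # upgrades fd_a's shared lock to exclusive
try:
    fcntl.flock(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB)
    exclusive_refused = False
except OSError:
    exclusive_refused = True

# Advisory: nothing stops fd_b from writing without holding the lock.
written = os.write(fd_b, b"ignored the lock")

fcntl.flock(fd_a, fcntl.LOCK_UN)
os.close(fd_a)
os.close(fd_b)
os.remove(path)
```

The final write succeeds even though the other descriptor holds an exclusive lock, which is exactly what "advisory" means: cooperating programs must check the lock themselves.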

File types
A common technique for implementing file types is to include the type as part of the file name. The name is
split into two parts – a name and an extension separated by a period character. The system uses the extension
to indicate the type of the file and the type of operations that can be done on that file.
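
As a minimal sketch of this technique, the extension can be split off the name and looked up in a table. The mapping below is hypothetical; real systems recognize different extensions and some ignore them entirely.

```python
import os.path

# Hypothetical extension-to-type table, for illustration only.
KNOWN_TYPES = {
    ".c": "source",
    ".o": "object",
    ".txt": "text",
    ".exe": "executable",
}


def file_type(name):
    """Split the name at the last period and look up the extension."""
    base, extension = os.path.splitext(name)
    return KNOWN_TYPES.get(extension.lower(), "unknown")
```

For example, file_type("main.c") gives "source", while a name with no extension such as "README" gives "unknown".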

File structure
File types can be used to indicate the internal structure of the file. Source and object files have structures that
match the expectations of the programs that read them. Certain files conform to a required structure that is
understood by the OS. The disadvantage of having the OS support multiple file structures is that the resulting
OS becomes large and cumbersome: if the OS defines five different file structures, it needs to contain the code to
support all of them. Hence some OS’s impose only a minimal number of file structures. Mac OS, for example,
supports a minimal number of file structures. It expects files to contain two parts – a resource fork and a data
fork. The resource fork contains information of interest to the user. The data fork contains program code or
data – traditional file contents.

Internal file structure

Internally, locating an offset within a file can be complicated for the OS. Disk systems have a well defined block
size determined by the size of the sector. All disk I/O is performed in units of one block and all blocks are the same
size. Since it is unlikely that the physical record size will exactly match the length of the desired logical record,
and logical records may even vary in length, packing a number of logical records into physical blocks is a
solution. The logical record size, physical block size and packing technique determine how many logical records
are in each physical block. The packing can be done either by the user’s application program or by the OS. Hence
the file may be considered to be a sequence of blocks. All the basic I/O functions operate in terms of blocks.
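
The packing arithmetic can be sketched as follows. The sketch assumes fixed length records that never span two blocks, which is only one possible packing technique; the function names and the sizes used are illustrative.

```python
def records_per_block(block_size, record_size):
    """Whole records that fit in one block when records do not span blocks."""
    return block_size // record_size


def locate_record(record_number, block_size, record_size):
    """Map a logical record number to (block number, byte offset in block)."""
    per_block = records_per_block(block_size, record_size)
    block = record_number // per_block
    offset = (record_number % per_block) * record_size
    return block, offset


# Example: 512-byte blocks holding 100-byte logical records.
per_block = records_per_block(512, 100)
wasted_bytes = 512 - per_block * 100
where = locate_record(7, 512, 100)
```

With these sizes five records fit in each block and 12 bytes per block are left unused, and record 7 (counting from 0) lies in block 1 at offset 200.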

Access methods
Files store information. When it is used, this information must be accessed and read into computer memory. The
information in the file can be accessed in several ways. They are –

• Sequential access: The simplest method. Information in the file is processed in order, one record after
the other. This method is based on a tape model of a file and works as well on sequential access devices as
it does on random access devices.

• Direct access: Another method is direct access or relative access. A file is made up of fixed length logical
records that allow programs to read and write records rapidly in no particular order. The direct access
method is based on a disk model of a file since disks allow random access to any file block. Direct access
files are of great use for immediate access to large amounts of information. In this method, file operations
must be modified to include the block number as a parameter.

• Other access methods: Other access methods can be built on top of a direct access method. These
methods generally involve the construction of an index for the file. This index contains pointers to the
various blocks. To find a record in the file, first search the index and then use the pointer to access the file
directly and to find the desired record.
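
The three access methods can be compared on a tiny in-memory file. This is a sketch that assumes fixed length 8-byte records; the record contents and the index keys are made up for illustration.

```python
import io

RECORD_SIZE = 8  # assumed fixed record length in bytes

# An in-memory stand-in for a file of four fixed length records.
f = io.BytesIO(b"record-0record-1record-2record-3")


def read_next(f):
    """Sequential access: read the record at the current position."""
    return f.read(RECORD_SIZE)


def read_direct(f, n):
    """Direct access: the record number is a parameter of the operation."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE)


sequential = [read_next(f), read_next(f)]
direct = read_direct(f, 3)

# Indexed access: search the index first, then use direct access.
index = {"alice": 2, "bob": 0}  # hypothetical key -> record number
indexed = read_direct(f, index["alice"])
```

Sequential reads need no record number because the position pointer advances by itself, whereas direct and indexed access compute the position from the record number.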

Directory Structure
Systems may have zero or more file systems and the file systems may be of varying types. Organizing millions of
files involves the use of directories.

Storage Structure
A disk can be used in its entirety for a file system. But at times,
it is desirable to place multiple file systems on a disk or to use parts of a disk for a file system and other parts for
other things. These parts are known variously as partitions, slices or minidisks. A file system can be created on
each of these parts of the disk. These parts can be combined together to form larger structures known as volumes
and file systems can be created on these too. Each volume can be thought of as a virtual disk. Volumes can also
store multiple OS’s, allowing a system to boot and run more than one. Each volume that contains a file system must
also contain information about the files in the system. This information is kept in entries in a device directory or
volume table of contents. The device directory (or simply the directory) records information for all files on that
volume.
Directory Overview
The directory can be viewed as a symbol table that translates file names into their directory entries. The operations
that can be performed on the directory are:
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system

Single level directory
The simplest directory structure is the single level directory. All files are contained in the same directory which is
easy to support and understand. But this implementation has limitations when the number of files increases or
when the system has more than one user. Since all files are in the same directory, all file names must be unique.
Keeping track of so many files is a difficult task. A single user on a single level directory may find it difficult to
remember the names of all the files as the number of files increases.

Two level directory


In the two level directory structure, each user has his own user file directory (UFD). The UFD’s have similar
structures but each lists only the files of a single user. When a user job starts or a user logs in, the system’s master
file directory (MFD) is searched. The MFD is indexed by user name or account number and each entry points to
the UFD for that user. When a user refers to a particular file, only his own UFD is searched. Different users may
have files with the same name as long as all the file names within each UFD are unique.

Root of the tree is MFD. Its direct descendants are UFDs. The descendants of the UFDs are the files themselves.
The files are the leaves of the tree.

The sequence of directories searched when a file is named is called the search path.

Although the two level directory structure solves the name collision problem, it still has disadvantages. This
structure isolates one user from another. Isolation is an advantage when the users are completely independent but a
disadvantage when the users want to cooperate on some task and to access one another’s files.

Tree Structured Directories


Here, we extend the two level directories to a tree of arbitrary height. This generalization allows users to create
their own subdirectories and to organize their files accordingly. A tree is the most common directory structure. The
tree has a root directory and every file in the system has a unique path name. A directory contains a set of files or
subdirectories. All directories have the same internal format. One bit in each directory entry defines the entry as a
file (0) or as a subdirectory (1).

Each process has a current directory. The current directory should contain most of the files that are of current
interest to the process.

Path names can be of two types – absolute and relative. An absolute path name begins at the root and follows a
path down to the specified file giving the directory names on the path. A relative path name defines a path from the
current directory.
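
How a relative path name is turned into an absolute one can be sketched with Python's posixpath module. The sketch assumes Unix-style names with "/" as the separator; the directory names are hypothetical.

```python
import posixpath


def resolve(path, current_directory):
    """Return the absolute path name for an absolute or relative path name."""
    if posixpath.isabs(path):
        # Absolute: already starts at the root.
        return posixpath.normpath(path)
    # Relative: interpreted starting from the current directory.
    return posixpath.normpath(posixpath.join(current_directory, path))
```

For example, with "/home/user" as the current directory, the relative name "docs/ch5.txt" resolves to "/home/user/docs/ch5.txt", while an absolute name is returned unchanged whatever the current directory is.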

Deletion of a directory under a tree structured directory – If a directory is empty, its entry in the directory that
contains it can simply be deleted. If the directory to be deleted is not empty, then one of two approaches is used –

• The user must first delete all the files and subdirectories in that directory.
• When a request is made to delete a directory, all of that directory’s files and subdirectories are also deleted.

A path to a file in a tree structured directory can be longer than a path in a two level directory.

Acyclic graph directories


A tree structure prohibits the sharing of files and directories. An acyclic graph, i.e. a graph with no cycles, allows
directories to share subdirectories and files. The same file or subdirectory may be in two different directories.

With a shared file, only one actual file exists. Sharing is particularly important for subdirectories. Shared files and
subdirectories can be implemented in several ways. One way is to create a new directory entry called a link. A link
is a pointer to another file or subdirectory. Another approach in implementing shared files is to duplicate all
information about them in both sharing directories.

An acyclic graph directory structure is more flexible than a tree structure but it is also more complex. Several
problems may arise, such as a file having multiple absolute path names, or deciding when the space of a shared file
can be released on deletion.
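
One way the deletion problem is handled in practice can be observed with Unix hard links, where the file keeps a count of the directory entries that refer to it. The sketch below assumes a file system that supports hard links; it is an illustration of one scheme, not the only way to implement sharing, and the file names are hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "shared.txt")
alias = os.path.join(workdir, "alias.txt")

with open(original, "w") as f:
    f.write("one copy of the data")

# A second directory entry for the same file: only one actual file exists.
os.link(original, alias)
links_with_alias = os.stat(original).st_nlink

# Deleting one name removes that entry only; the data is still reachable.
os.remove(original)
with open(alias) as f:
    still_readable = f.read()
links_after_delete = os.stat(alias).st_nlink

# Removing the last name lets the system release the file's space.
os.remove(alias)
os.rmdir(workdir)
```

The link count drops from 2 to 1 when the first name is removed, and the space is released only when it reaches zero.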

General graph directory

A problem with using an acyclic graph structure is ensuring that there are no cycles.

The primary advantage of an acyclic graph is the relative simplicity of the algorithms to traverse the graph and to
determine when there are no more references to a file. If cycles are allowed to exist in the directory, a traversal
must avoid searching any component twice. A similar problem exists when we are trying to determine when a file can be
deleted. The difficulty is to avoid cycles as new links are added to the structure.
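
Both concerns can be sketched on a directory graph stored as a dictionary. This is an illustrative sketch: the graph representation and the function names are assumptions, and the cycle check shown (refuse a link if the parent is already reachable from the child) is one simple approach among several.

```python
def traverse(graph, start):
    """Depth-first walk that visits each directory once, even with cycles."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already searched: do not search it twice
        visited.add(node)
        order.append(node)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


def creates_cycle(graph, parent, child):
    """A new link parent -> child closes a cycle if parent is reachable from child."""
    return parent in traverse(graph, child)


# A general graph: directory "c" links back to the root.
cyclic = {"/": ["a", "b"], "a": ["c"], "b": [], "c": ["/"]}
walk = traverse(cyclic, "/")

# An acyclic graph, checked before new links are added.
acyclic = {"/": ["a", "b"], "a": ["c"], "b": [], "c": []}
back_link_rejected = creates_cycle(acyclic, "c", "/")
side_link_allowed = not creates_cycle(acyclic, "b", "c")
```

The visited set is what keeps the walk finite on the cyclic graph; without it the link from "c" back to the root would be followed forever.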

File System Mounting
A file system must be mounted before it can be available to processes on the system. OS is given the name of the
device and a mount point – the location within the file structure where the file system is to be attached. This mount
point is an empty directory. Next, OS verifies that the device contains a valid file system. It does so by asking the
device driver to read the device directory and verifying that the directory has the expected format. Finally OS notes
in its directory structure that a file system is mounted at the specified mount point.
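
The mount steps can be sketched with a small in-memory mount table. Everything here is hypothetical: the device names, the validity flag standing in for the check of the device directory, and the longest-prefix rule used to decide which mounted file system a path belongs to.

```python
mount_table = {}  # mount point -> device name


def mount(device, mount_point, has_valid_file_system=True):
    """Record that a file system is attached at the given mount point."""
    if not has_valid_file_system:
        raise ValueError(f"{device} does not contain a valid file system")
    mount_table[mount_point] = device


def device_for(path):
    """Pick the mounted file system whose mount point is the longest prefix."""
    best = None
    for point in mount_table:
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            if best is None or len(point) > len(best):
                best = point
    return mount_table[best] if best is not None else None


mount("/dev/sda1", "/")
mount("/dev/sdb1", "/home")
```

After these two mounts, a path under /home is served by the second device and every other path by the first.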

File Sharing
File sharing is desirable for users who want to collaborate and to reduce the effort required to achieve a computing
goal.
Multiple users
When an OS accommodates multiple users, the issues of file sharing, file naming and file protection become
preeminent. System mediates file sharing. The system can either allow a user to access the files of other users by
default or require that a user specifically grant access to the files.

Remote File Systems


Networking allows sharing of resources spread across a campus or even around the world. One obvious resource to
share is data in the form of files.

Client Server Model


Remote file systems allow a computer to mount one or more file systems from one or more remote machines. Here
the machine containing the files is the server and the machine seeking access to the files is the client. A server can
serve multiple clients and a client can use multiple servers depending on the implementation details of a given
client server facility. Once the remote file system is mounted, file operation requests are sent on behalf of the user
across the network to the server via the DFS protocol.

Distributed Information Systems
To make client server systems easier to manage, distributed information systems also known as distributed naming
services provide unified access to the information needed for remote computing. The domain name system
provides host name to network address translations for the entire Internet.

Distributed information systems used by some companies –

Sun Microsystems - Network Information Service or NIS


Microsoft - Common internet file system or CIFS

Failure Modes
Local file systems can fail for a variety of reasons including failure of the disk containing the file system,
corruption of the directory structure or other disk management information, disk controller failure, cable failure and
host adapter failure. User or system administrator failure can also cause files to be lost or entire directories or
volumes to be deleted. Many of these failures will cause a host to crash and an error condition to be displayed, and
human intervention will be required to repair the damage. Remote file systems have even more failure modes. In
the case of networks, the network can be interrupted between two hosts. Such interruption can result from
hardware failure, poor hardware configuration or networking implementation issues. To recover from a failure,
some kind of state information may be maintained on both the client and the server.

Consistency semantics
These represent an important criterion for evaluating any file system that supports file sharing. These semantics
specify how multiple users of a system are to access a shared file simultaneously. These are typically implemented
as code with the file system.

Protection
When information is stored in a computer system, it should be kept safe from physical damage (reliability) and
improper access (protection). Reliability is provided by duplicate copies of files. Protection can be provided in
many ways such as physically removing the floppy disks and locking them up.

Types of Access
Complete protection of files can be provided by prohibiting access altogether; at the other extreme, free access can
be provided with no protection at all. Systems that do not permit access to the files of other users do not need
protection. Both these approaches are too extreme for general use. Hence controlled access is required.
Protection mechanisms provide controlled access by limiting the types of file access that can be made. Access is
permitted or denied depending on many factors. Several different types of operations may be controlled:

• Read
• Write
• Execute
• Append
• Delete
• List

Other operations such as renaming, copying etc. may also be controlled.

Access Control
The most common approach to the protection problem is to make access dependent on the identity of the user. The
most general scheme to implement identity- dependent access is to associate with each file and directory an access-
control list (ACL) specifying user names and the types of access allowed for each user.

This approach has the advantage of enabling complex access methodologies. The main problem with access lists is
their length. To condense the length of the access control list, many systems recognize three classifications of users
in connection with each file:

a) Owner – user who created the file.
b) Group – set of users who are sharing the file and need similar access.
c) Universe – all other users in the system.
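
The three-class scheme can be sketched using Unix-style permission bits, where each class gets its own read, write and execute bits. This is an assumption made for illustration (the handout does not name a particular system), and the user and group names are hypothetical.

```python
import stat


def may_access(mode, file_owner, file_group, user, user_groups, want):
    """Check one access type ('r', 'w' or 'x') against a permission word."""
    owner_bit, group_bit, universe_bit = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if user == file_owner:
        return bool(mode & owner_bit)      # owner class
    if file_group in user_groups:
        return bool(mode & group_bit)      # group class
    return bool(mode & universe_bit)       # universe: everyone else


# Mode 0o640: owner may read and write, group may read, universe has no access.
MODE = 0o640
owner_can_write = may_access(MODE, "sara", "staff", "sara", {"staff"}, "w")
group_can_read = may_access(MODE, "sara", "staff", "jim", {"staff"}, "r")
group_can_write = may_access(MODE, "sara", "staff", "jim", {"staff"}, "w")
other_can_read = may_access(MODE, "sara", "staff", "guest", set(), "r")
```

Nine bits per file replace an arbitrarily long access list, at the cost of being able to name only one owner and one group.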

Other Protection Approaches


Another approach to the protection problem is to associate a password with each file. If the passwords are chosen
randomly and changed often, this scheme may be effective in limiting access to a file. Use of passwords has
certain disadvantages:

1. The number of passwords that a user needs to remember may become large making the scheme impractical.
2. If only one password is used for all the files, then once it is discovered, all files are accessible.

Implementing File Systems
The file system provides the mechanism for online storage of and access to file contents, including data and
programs. The file system resides permanently on secondary storage, which is designed to hold a large amount of
data permanently.

File System Structure


Disks provide the bulk of secondary storage on which a file system is maintained. They have two characteristics
that make them a convenient medium for storing multiple files:

• A disk can be rewritten in place; it is possible to read a block from the disk, modify the block and write it back
into the same place.
• A disk can access directly any given block of information it contains. It is simple to access any file sequentially
or randomly, and switching from one file to another requires only moving the read-write heads and waiting for the
disk to rotate.

To improve I/O efficiency, I/O transfers between memory and disk are performed in units of blocks. Each block
has one or more sectors. To provide efficient and convenient access to the disk, the OS imposes one or more file
systems to allow the data to be stored, located and retrieved easily. The file system is composed of many different
levels. Each level in the design uses the features of lower levels to create new features for use by higher levels.

The lowest level, I/O control, consists of device drivers and interrupt handlers to transfer information between the
main memory and the disk system. The basic file system needs only to issue generic commands to the appropriate device
driver to read and write physical blocks on the disk.

The file organization module knows about files and their logical blocks as well as physical blocks.

The logical file system manages metadata information. Metadata includes all of the file system structure except the
actual data.

A file control block contains information about the file including ownership, permissions and location of the file
contents.

File System Implementation

OS’s implement open( ) and close( ) system calls for processes to request access to file contents.

Overview
Several on-disk and in-memory structures are used to implement a file system. These structures vary depending on
the OS and the file system. On disk, the file system may contain information such as:

• Boot control block - In UFS, it is called the boot block; in NTFS it is the partition boot sector.
• Volume control block - In UFS, it is called a super block; in NTFS it is stored in the master file table.
• A directory structure per file system is used to organize the files. In UFS, this includes file names and
associated inode numbers. In NTFS, it is stored in the master file table.
• A per-file FCB contains many details about the file, including file permissions, ownership, size and location
of data blocks. In UFS, it is called the inode. In NTFS this is stored within the master file table, which uses
a relational database structure.

The in-memory structures may include the ones described below –

• An in-memory mount table contains information about each mounted volume.
• An in-memory directory structure cache holds the directory information of recently accessed directories.
• The system wide open file table contains a copy of the FCB of each open file.
• The per-process open file table contains a pointer to the appropriate entry in the system wide open file table.

Partitions and Mounting


The layout of a disk can have many variations depending on the OS. A disk can be sliced into multiple partitions or
a volume can span multiple partitions on multiple disks. Each partition can be either raw containing no file system
or may contain a file system. Raw disk is used where no file system is appropriate. The root partition which
contains OS kernel and sometimes other system files is mounted at boot time. As part of successful mount
operation, OS verifies that the device contains a valid file system. OS finally notes in its in-memory mount table
structure that a file system is mounted along with the type of the file system.

Virtual File Systems


An obvious but suboptimal method of implementing multiple types of file systems is to write directory and file
routines for each type. Instead, most operating systems use object oriented techniques to simplify, organize and
modularize the implementation. Data structures and procedures are used to isolate the basic system call functionality
from the implementation details. Thus, file system implementation consists of three major layers –

The first layer is the file system interface based on system calls and on file descriptors. The second layer is called
the virtual file system (VFS) layer, which serves two important functions:

• It separates file system generic operations from their implementation by defining a clean VFS interface.
• It provides a mechanism for uniquely representing a file throughout a network. VFS is based on a file
representation structure called a vnode that contains a numerical designator for a network wide unique file.

Thus, VFS distinguishes local files from remote ones and local files are further distinguished according to their file
system types. The third layer implements each file system type or the remote file system protocol.
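
The layering can be sketched with an abstract interface and two implementations behind it. This is only an illustration of the object oriented idea: the class names, the prefix-based dispatch and the dictionary "storage" are all invented, and a real VFS works with vnodes rather than path prefixes.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The generic interface that every concrete file system implements."""

    @abstractmethod
    def read(self, name):
        """Return the contents of the named file."""


class LocalFileSystem(FileSystem):
    def __init__(self, files):
        self.files = files

    def read(self, name):
        return self.files[name]


class RemoteFileSystem(FileSystem):
    def __init__(self, server_files):
        self.server_files = server_files

    def read(self, name):
        # A real implementation would send a request over the network here.
        return self.server_files[name]


class VirtualFileSystem:
    """Dispatches a generic read to whichever file system owns the path."""

    def __init__(self):
        self.mounts = {}

    def mount(self, prefix, file_system):
        self.mounts[prefix] = file_system

    def read(self, path):
        for prefix, file_system in self.mounts.items():
            if path.startswith(prefix):
                return file_system.read(path[len(prefix):])
        raise FileNotFoundError(path)


vfs = VirtualFileSystem()
vfs.mount("/local/", LocalFileSystem({"a.txt": "local data"}))
vfs.mount("/remote/", RemoteFileSystem({"b.txt": "remote data"}))
local_result = vfs.read("/local/a.txt")
remote_result = vfs.read("/remote/b.txt")
```

The caller uses the same read operation for both files and never needs to know which one is remote.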

Directory Implementation

The selection of directory allocation and directory management algorithms significantly affects the efficiency,
performance and reliability of the file system.

Linear List
The simplest method of implementing a directory is to use a linear list of file names with pointers to the data
blocks. This method is simple to program but time consuming to execute. The real disadvantage of a linear list of
directory entries is that finding a file requires a linear search.

Hash Table
Another data structure used for a file directory is a hash table. With this method, a linear list stores the directory
entries but a hash data structure is also used. The hash table takes a value computed from the file name and returns
a pointer to the file name in the linear list. The major difficulties with a hash table are its generally fixed size and
the dependence of the hash function on that size.
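
The two directory implementations can be compared in a short sketch. The directory entries, the table size and the byte-sum hash function are assumptions chosen to keep the example small, and collisions are handled by chaining, which is one common choice.

```python
# The linear list of directory entries: (file name, first data block).
entries = [("a.txt", 12), ("b.txt", 40), ("c.txt", 7)]


def linear_lookup(name):
    """Linear list: compare against every entry until the name is found."""
    for entry_name, block in entries:
        if entry_name == name:
            return block
    return None


TABLE_SIZE = 8  # fixed size; the hash function depends on it


def bucket_of(name):
    """Toy hash function: sum of the name's bytes modulo the table size."""
    return sum(name.encode()) % TABLE_SIZE


# Each bucket holds positions in the linear list (chaining on collision).
hash_table = [[] for _ in range(TABLE_SIZE)]
for position, (name, _) in enumerate(entries):
    hash_table[bucket_of(name)].append(position)


def hashed_lookup(name):
    """Hash table: go straight to one bucket, then check only its entries."""
    for position in hash_table[bucket_of(name)]:
        entry_name, block = entries[position]
        if entry_name == name:
            return block
    return None
```

Growing the directory beyond what the table handles well means choosing a new TABLE_SIZE and rehashing every entry, which is the fixed-size difficulty mentioned above.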

Allocation Methods

The direct access nature of disks allows flexibility in the implementation of files. The main problem here is how to
allocate space to these files so that disk space is utilized effectively and files can be accessed quickly. Three major
methods of allocating disk space are:
• Contiguous
• Linked
• Indexed

Free Space Management

Since disk space is limited, we should reuse the space from deleted files for new files. To keep track of free disk
space, the system maintains a free space list. The free space list records all free disk blocks – those not allocated
to some file or directory. This free space list can be implemented as one of the following:

a) Bit vector – the free space list is implemented as a bit map or a bit vector. Each block is represented by one bit. If
the block is free, the bit is 1; if the block is allocated, the bit is 0. The main advantage of this approach is its relative
simplicity and its efficiency in finding the first free block or n consecutive free blocks on the disk. The block
number of the first free block is calculated as
(number of bits per word) * (number of 0-value words) + (offset of first 1 bit)
b) Linked list – Another approach to free space management is to link together all the free disk blocks, keeping a
pointer to the first free block in a special location on the disk and caching it in memory. The first block contains a
pointer to the next free disk block, and so on.
c) Grouping – A modification of the free list approach is to store the addresses of n free blocks in the first free
block. The first n-1 of these blocks are actually free; the last one contains the addresses of another n free blocks.
d) Counting – Another approach is to take advantage of the fact that several contiguous blocks may be allocated
or freed simultaneously when space is allocated with the contiguous allocation algorithm or clustering. Rather than
keeping a list of n free block addresses, the system keeps the address of the first free block and the number of free
contiguous blocks that follow it.
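
The bit vector calculation and the counting representation can be worked through in a short sketch. It assumes 8-bit words in which the most significant bit stands for the lowest-numbered block; both the word size and the bit order are assumptions, and the function names are illustrative.

```python
BITS_PER_WORD = 8  # assumed word size; each word value is below 2**8


def first_free_block(words):
    """Skip 0-value words (all allocated), then find the first 1 bit."""
    for word_index, word in enumerate(words):
        if word != 0:
            # Offset of the first 1 bit, counting from the most significant bit.
            offset = BITS_PER_WORD - word.bit_length()
            return BITS_PER_WORD * word_index + offset
    return None  # no free block


def to_runs(free_blocks):
    """Counting: store (first free block, number of contiguous free blocks)."""
    runs = []
    for block in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == block:
            runs[-1][1] += 1  # extends the current run
        else:
            runs.append([block, 1])
    return [tuple(run) for run in runs]


# Two all-allocated words, then a word whose first 1 bit is at offset 2.
block = first_free_block([0b00000000, 0b00000000, 0b00100000])
runs = to_runs([2, 3, 4, 8, 17, 18])
```

Here the first free block is 8 * 2 + 2 = 18, and six free block numbers collapse into three (address, count) pairs.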

Performance

Most disk controllers include local memory to form an on board cache that is large enough to store entire tracks at
a time. Once a seek is performed, the track is read into the disk cache starting at the sector under the disk head. The
disk controller then transfers any requested sectors to the OS. Some systems maintain a separate section of main memory
for a buffer cache where blocks are kept under the assumption that they will be used again. Other systems cache
file data using a page cache. The page cache uses virtual memory techniques to cache file data as pages rather than
as file system oriented blocks. Caching file data using virtual addresses is more efficient than caching through
physical disk blocks as accesses interface with virtual memory rather than the file system. Several systems use
page caching to cache both process pages and file data. This is known as unified virtual memory.
