OS Chapter-5 Handout os
File Attributes
A file is referred to by its name. A name is usually a string of characters. When a file is named, it becomes
independent of the process, the user and even the system that created it. A file’s attributes vary from one OS to
another but consist of these –
Name: symbolic file name is the only information kept in human readable form.
Identifier: number which identifies the file within the file system; it is the non human readable name for the
file.
Type: this information is needed for systems that support different types of files.
Location: this information is a pointer to a device and to the location of the file on that device.
Size: the current size of the file.
Protection: Access control information determines who can do reading, writing, executing etc.
Time, date and user identification: This information may be kept for creation, last modification and last use.
The information about all files is kept in the directory structure which resides on secondary storage. A
directory entry consists of the file’s name and its unique identifier. The identifier in turn locates the other file
attributes.
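The two-step lookup described above (name to identifier, then identifier to the remaining attributes) can be sketched as follows. This is a minimal illustration in Python, not how any particular OS stores its directory; the table names, field names and sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FileAttributes:
    """Attributes reached through the identifier (field names are hypothetical)."""
    file_type: str
    location: int     # assumed: starting block on the device
    size: int         # current size in bytes
    protection: str   # e.g. "rw-r--r--"
    owner: str


# Directory entry: symbolic name -> unique identifier.
directory = {"notes.txt": 17, "a.out": 42}

# The identifier in turn locates the other attributes.
attribute_table = {
    17: FileAttributes("text", 1200, 2048, "rw-r--r--", "alice"),
    42: FileAttributes("executable", 3400, 9000, "rwxr-xr-x", "bob"),
}


def lookup(name: str) -> FileAttributes:
    """Resolve a file name to its attributes via the identifier."""
    identifier = directory[name]
    return attribute_table[identifier]
```

For instance, lookup("notes.txt") first finds identifier 17 in the directory and then returns the attribute record stored under 17.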
File Operations
A file is an abstract data type. OS can provide system calls to create, write, read, reposition, delete and truncate
files.
Creating a file – First space in the file system must be found for the file. Second, an entry for the new file
must be made in the directory.
Writing a file – To write a file, specify both the name of the file and the information to be written to the
file. The system must keep a write pointer to the location in the file where the next write is to take place.
Reading a file – To read from a file, directory is searched for the associated entry and the system needs to
keep a read pointer to the location in the file where the next read is to take place. Because a process is
either reading from or writing to a file, the current operation location can be kept as a per process current
file position pointer.
Other common operations include appending new information to the end of an existing file and renaming an
existing file. We may also need operations that allow the user to get and set the various attributes of a file.
Most of the file operations mentioned involve searching the directory for the entry associated with the named file.
To avoid this constant search, many systems require that an open() system call be made before a file is first used
actively. The OS keeps a small table called the open file table containing information about all open files.
When a file operation is requested, the file is specified via an index into this table so no searching is required.
When the file is no longer being actively used, it is closed by the process and the OS removes its entry from the
open file table. Create and delete are system calls that work with closed files.
The open() operation takes a file name and searches the directory, copying the directory entry into the open file
table. The open() call can also accept access mode information: create, read-only, read-write, append-only,
etc. This mode is checked against the file’s permissions. If the requested mode is allowed, the file is opened for the
process. The open() system call returns a pointer to the entry in the open file table. This pointer is used in all I/O
operations, avoiding any further searching and simplifying the system call interface.
OS uses two levels of internal tables – a per process table and a system wide table. The per process table tracks all
files that a process has open. Stored in this table is information regarding the use of the file by the process.
Each entry in the per process table points to a system wide open file table. The system wide table contains process
independent information. Once a file has been opened by one process, the system wide table includes an entry for
the file. The open file table also has an open count associated with each file to indicate how many processes have
the file open.
File pointer – The system must keep track of the last read-write location as a current file position pointer.
File open count – As files are closed, OS must reuse its open file entries or it could run out of space in the table.
File open counter tracks the number of opens and closes and reaches zero on the last close.
Disk location of the file – The information needed to locate the file on disk is kept in memory so that the system
does not have to read it from disk for each operation.
Access rights – Each process opens a file in an access mode. This information is stored in the per process
table so the OS can allow or deny subsequent I/O requests.
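The two levels of tables and the open count can be sketched as below. This is a simplified model under stated assumptions, not a real kernel structure: the class names, the dictionary-based tables and the integer descriptors are all hypothetical, and permission checking is omitted.

```python
class SystemWideEntry:
    """Process-independent information about one open file."""

    def __init__(self, name, disk_location):
        self.name = name
        self.disk_location = disk_location  # kept in memory while the file is open
        self.open_count = 0                 # number of processes that have it open


class OpenFileTables:
    def __init__(self, directory):
        self.directory = directory    # assumed: file name -> disk location
        self.system_table = {}        # file name -> SystemWideEntry
        self.process_tables = {}      # pid -> {descriptor: per-process slot}

    def open(self, pid, name, mode):
        entry = self.system_table.get(name)
        if entry is None:
            # The directory is searched only on the first open of the file.
            entry = SystemWideEntry(name, self.directory[name])
            self.system_table[name] = entry
        entry.open_count += 1
        table = self.process_tables.setdefault(pid, {})
        descriptor = 0
        while descriptor in table:    # pick the lowest unused index
            descriptor += 1
        # The per-process slot holds the access mode and the file pointer.
        table[descriptor] = {"entry": entry, "mode": mode, "position": 0}
        return descriptor

    def close(self, pid, descriptor):
        slot = self.process_tables[pid].pop(descriptor)
        entry = slot["entry"]
        entry.open_count -= 1
        if entry.open_count == 0:
            # Last close: the system-wide entry can be reused.
            del self.system_table[entry.name]
```

If two processes open the same file, the system-wide table holds a single entry with an open count of 2, and the entry disappears only after both have closed it.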
Some OS’s provide facilities for locking an open file. File locks allow one process to lock a file and prevent other
processes from gaining access to it. File locks are useful for files that are shared by several processes. A shared
lock is where several processes can acquire the lock concurrently. An exclusive lock is where only one process at
a time can acquire such a lock.
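Also some OS’s may provide either mandatory or advisory file locking mechanisms. If a lock is mandatory,
then once a process acquires an exclusive lock, the OS will prevent any other process from accessing the
locked file. If the lock is advisory, the OS does not prevent such access and it is up to the programs to acquire
and release locks appropriately. In other words, if the lock scheme is mandatory, the OS ensures locking
integrity; if it is advisory, the software developers must.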
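The difference between shared and exclusive locks can be illustrated with a small in-memory model. This is only a sketch of the rules, assuming a hypothetical FileLock class keyed by process id; it does not call any real OS locking facility and it never blocks, it simply reports whether the request would be granted.

```python
class FileLock:
    """Toy lock state for one file (hypothetical, non-blocking)."""

    def __init__(self):
        self.shared_holders = set()   # pids holding the shared lock
        self.exclusive_holder = None  # pid holding the exclusive lock, if any

    def acquire_shared(self, pid):
        # Several processes may hold a shared lock concurrently,
        # but not while another process holds the exclusive lock.
        if self.exclusive_holder is not None:
            return False
        self.shared_holders.add(pid)
        return True

    def acquire_exclusive(self, pid):
        # Only one process at a time, and no shared holders may remain.
        if self.exclusive_holder is not None or self.shared_holders:
            return False
        self.exclusive_holder = pid
        return True

    def release(self, pid):
        self.shared_holders.discard(pid)
        if self.exclusive_holder == pid:
            self.exclusive_holder = None
```

Under a mandatory scheme the OS itself would apply checks like these on every access; under an advisory scheme each program is expected to call them voluntarily.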
File types
A common technique for implementing file types is to include the type as part of the file name. The name is
split into two parts – a name and an extension separated by a period character. The system uses the extension
to indicate the type of the file and the type of operations that can be done on that file.
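As a small illustration of the name-plus-extension technique, the sketch below splits a file name at the last period and looks the extension up in a table. The mapping itself is a made-up example, not a list used by any specific OS.

```python
import os.path

# Hypothetical extension table for illustration only.
TYPE_BY_EXTENSION = {
    ".c": "source code",
    ".o": "object",
    ".txt": "text",
    ".exe": "executable",
}


def file_type(name):
    """Return the type suggested by the extension, or "unknown"."""
    _, extension = os.path.splitext(name)
    return TYPE_BY_EXTENSION.get(extension.lower(), "unknown")
```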
File structure
File types can be used to indicate the internal structure of the file. Source and object files have structures that
match the expectations of the programs that read them. Certain files conform to a required structure that is
understood by OS. But the disadvantage of having the OS support multiple file structures is that the resulting
size of the OS is cumbersome. If the OS contains five different file structures, it needs to contain the code to
support these file structures. Hence some OS’s impose a minimal number of file structures. The Macintosh OS also
supports a minimal number of file structures. It expects files to contain two parts – a resource fork and a data
fork. The resource fork contains information of interest to the user. The data fork contains program code or
data – traditional file contents.
Internally locating an offset within a file can be complicated for the OS. Disk systems have a well defined block
size determined by the size of the sector. All disk I/O is performed in units of one block and all blocks are the same
size.
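Because disk I/O happens in whole blocks, a byte offset within a file has to be converted into a block number and a position inside that block. A minimal sketch of the arithmetic, assuming a block size of 512 bytes (the value is an assumption chosen for the example):

```python
BLOCK_SIZE = 512  # assumed block size in bytes


def locate(offset, block_size=BLOCK_SIZE):
    """Return (logical block number, byte position inside that block)."""
    return divmod(offset, block_size)
```

For example, byte 1300 lies in logical block 2 at position 276, since 1300 = 2 × 512 + 276.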
Access methods
Files store information. When it is used, this information must be accessed and read into computer memory. The
information in the file can be accessed in several ways. They are –
Sequential access: Simplest method. Information in the file is processed in order, that is, one record after
the other. This method is based on a tape model of a file and works as well on sequential access devices as
it does on random access devices.
Direct access: Another method is direct access or relative access. A file is made up of fixed length logical
records that allow programs to read and write records rapidly in no particular order. The direct access
method is based on a disk model of a file since disks allow random access to any file block. Direct access
files are of great use for immediate access to large amounts of information. In this method, file operations
must be modified to include block number as a parameter.
Other Access Methods: Other access methods can be built on top of a direct access method. These
methods generally involve the construction of an index for the file. This index contains pointers to the
various blocks. To find a record in the file, first search the index and then use the pointer to access the file
directly and to find the desired record.
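The three access methods can be contrasted on a file of fixed length records. The sketch below uses an in-memory byte stream in place of a disk file; the record size, the RecordFile class and the sample index are assumptions made for illustration.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes


def make_record(text):
    """Pad a short ASCII string to exactly one record."""
    return text.encode("ascii").ljust(RECORD_SIZE, b" ")


class RecordFile:
    def __init__(self, stream):
        self.stream = stream

    def read_next(self):
        """Sequential access: read the record at the current position."""
        data = self.stream.read(RECORD_SIZE)
        return data if len(data) == RECORD_SIZE else None

    def read_record(self, n):
        """Direct access: the record number is passed as a parameter."""
        self.stream.seek(n * RECORD_SIZE)
        return self.read_next()


records = RecordFile(
    io.BytesIO(b"".join(make_record(f"record-{i}") for i in range(5)))
)

# Indexed access built on top of direct access: key -> record number.
index = {"record-3": 3}


def find(key):
    """Search the index first, then read the record directly."""
    return records.read_record(index[key])
```

Note that a direct read repositions the file, so a following read_next() continues from the record after the one just fetched.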
Directory Structure
Systems may have zero or more file systems and the file systems may be of varying types. Organizing millions of
files involves use of directories.
Storage Structure
A disk can be used in its entirety for a file system. But at times,
it is desirable to place multiple file systems on a disk or to use parts of a disk for a file system and other parts for
other things. These parts are known variously as partitions, slices or minidisks. A file system can be created on
each of these parts of the disk. These parts can be combined together to form larger structures known as volumes
and file systems can be created on these too. Each volume can be thought of as a virtual disk. Volumes can also
store multiple OS’s allowing a system to boot and run more than one. Each volume that contains a file system must
also contain information about the files in the system. This information is kept in entries in a device directory or
volume table of contents. The device directory/directory records information for all files on that volume.
Operating Systems Chapter 5: File Concepts
Directory Overview
The directory can be viewed as a symbol table that translates file names into their directory entries. The operations
that can be performed on the directory are:
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
In a two level directory, the root of the tree is the master file directory (MFD). Its direct descendants are the
user file directories (UFDs). The descendants of the UFDs are the files themselves.
The files are the leaves of the tree.
The sequence of directories searched when a file is named is called the search path.
Although the two level directory structure solves the name collision problem, it still has disadvantages. This
structure isolates one user from another. Isolation is an advantage when the users are completely independent but a
disadvantage when the users want to cooperate on some task and to access one another’s files.
Path names can be of two types – absolute and relative. An absolute path name begins at the root and follows a
path down to the specified file giving the directory names on the path. A relative path name defines a path from the
current directory.
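The difference between the two kinds of path name can be shown with Python's standard posixpath module. The helper name resolve and the sample directory names are assumptions for the example.

```python
import posixpath


def resolve(path, current_directory):
    """Turn a path name into an absolute one.

    An absolute path already starts at the root; a relative path is
    interpreted starting from the current directory.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(current_directory, path))
```

With the current directory /spell/mail, the relative name prt/first and the absolute name /spell/mail/prt/first refer to the same file.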
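Deletion of a directory under a tree structured directory – If a directory is empty, its entry in the directory that contains
it can simply be deleted. If the directory to be deleted is not empty, one of two approaches is used: either the system
refuses to delete the directory until the user has emptied it, or (as with the UNIX rm -r command) the request deletes
the directory together with all its files and subdirectories.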
A path to a file in a tree structured directory can be longer than a path in a two level directory.
With a shared file, only one actual file exists. Sharing is particularly important for subdirectories. Shared files and
subdirectories can be implemented in several ways. One way is to create a new directory entry called a link. A link
is a pointer to another file or subdirectory. Another approach in implementing shared files is to duplicate all
information about them in both sharing directories.
A problem with using an acyclic graph structure is ensuring that there are no cycles.
The primary advantage of an acyclic graph is the relative simplicity of the algorithms to traverse the graph and to
determine when there are no more references to a file. If cycles are allowed to exist in the directory, a search must
avoid visiting any component twice, both for correctness and for performance. A similar problem exists when we are
trying to determine when a file can be deleted: with cycles, the reference count may be nonzero even when the file can
no longer be reached. The difficulty is to avoid cycles as new links are added to the structure.
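One common way to avoid searching a component twice is to remember what has already been visited. The sketch below is a generic depth first traversal with a visited set, shown on a made-up directory graph that contains both a shared entry and a cycle; it is an illustration of the idea rather than the algorithm of any particular file system.

```python
def traverse(graph, root):
    """Visit every reachable entry once, even if links form a cycle."""
    visited_order = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already searched: skip instead of looping forever
        seen.add(node)
        visited_order.append(node)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(graph.get(node, [])))
    return visited_order


# Hypothetical structure: "shared" is linked from two directories and
# "b" links back to "root", which creates a cycle.
graph = {
    "root": ["a", "b"],
    "a": ["shared"],
    "b": ["shared", "root"],
    "shared": [],
}
```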
File Sharing
File sharing is desirable for users who want to collaborate and to reduce the effort required to achieve a computing
goal.
Multiple users
When an OS accommodates multiple users, the issues of file sharing, file naming and file protection become
preeminent. System mediates file sharing. The system can either allow a user to access the files of other users by
default or require that a user specifically grant access to the files.
Failure Modes
Local file systems can fail for a variety of reasons including failure of the disk containing the file system,
corruption of the directory structure or other disk management information, disk controller failure, cable failure and
host adapter failure. User or system administrator failure can also cause files to be lost or entire directories or
volumes to be deleted. Many of these failures will cause a host to crash and an error condition to be displayed and
human intervention will be required to repair the damage. Remote file systems have even more failure modes. In
the case of networks, the network can be interrupted between two hosts. Such interruption can result from
hardware failure, poor hardware configuration or networking implementation issues. To recover from a failure,
some kind of state information may be maintained on both the client and server.
Consistency semantics
These represent an important criterion for evaluating any file system that supports file sharing. These semantics
specify how multiple users of a system are to access a shared file simultaneously. These are typically implemented
as code with the file system.
Types of Access
Systems that do not permit access to the files of other users do not need protection, so complete protection can be
provided by prohibiting access. At the other extreme, free access can be provided with no protection at all. Both
these approaches are too extreme for general use. Hence controlled access is required.
Protection mechanisms provide controlled access by limiting the types of file access that can be made. Access is
permitted or denied depending on many factors. Several different types of operations may be controlled:
Read
Write
Execute
Append
Delete
List
Access Control
The most common approach to the protection problem is to make access dependent on the identity of the user. The
most general scheme to implement identity- dependent access is to associate with each file and directory an access-
control list (ACL) specifying user names and the types of access allowed for each user.
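An access control list check can be sketched as a lookup from file to user to the set of permitted operations. The data layout, the function name and the sample users below are assumptions for illustration; real systems store ACLs with the file's metadata.

```python
# Hypothetical ACLs: file name -> {user name -> allowed access types}.
acl = {
    "report.txt": {
        "alice": {"read", "write"},
        "bob": {"read"},
    },
}


def is_allowed(user, filename, operation):
    """Grant the request only if the user is listed with that access type."""
    return operation in acl.get(filename, {}).get(user, set())
```

A user who does not appear in the list for a file is denied every operation on it.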
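This approach has the advantage of enabling complex access methodologies. The main problem with access lists is
their length. To condense the length of the access control list, many systems recognize three classifications of users
in connection with each file: the owner (the user who created the file), the group (a set of users who share the file
and need similar access) and the universe (all other users in the system).
Another approach to protection is to associate a password with each file. This scheme has disadvantages:
1. The number of passwords that a user needs to remember may become large, making the scheme impractical.
2. If only one password is used for all the files, then once it is discovered, all files are accessible.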
A disk can be rewritten in place; it is possible to read a block from the disk, modify the block and write it back into
the same place.
A disk can directly access any given block of information it contains. It is simple to access any file sequentially or
randomly and switching from one file to another requires only moving the read-write heads and waiting for the
disk to rotate.
To improve I/O efficiency, I/O transfers between memory and disk are performed in units of blocks. Each block
has one or more sectors. To provide efficient and convenient access to the disk, OS imposes one or more file
systems to allow the data to be stored, located and retrieved easily. The file system is composed of many different
levels –
Each level in the design uses the features of lower levels to create new features for use by higher levels.
The lowest level, I/O control, consists of device drivers and interrupt handlers to transfer information between the
main memory and the disk system. The basic file system needs only to issue generic commands to the appropriate device
driver to read and write physical blocks on the disk.
The file organization module knows about files and their logical blocks as well as physical blocks.
The logical file system manages metadata information. Metadata includes all of the file system structure except the
actual data.
A file control block contains information about the file including ownership, permissions and location of the file
contents.
OS’s implement open() and close() system calls for processes to request access to file contents.
Overview
Several on disk and in memory structures are used to implement a file system. These structures vary depending on
the OS and the file system. File system may contain information such as:
Boot control block - In UFS, it is called the boot block; in NTFS it is the partition boot sector.
Volume control block - In UFS, it is called a superblock; in NTFS it is stored in the master file table.
A directory structure per file system is used to organize the files. In UFS, this includes file names and
associated inode numbers. In NTFS, it is stored in the master file table.
A per-file FCB contains many details about the file, including file permissions, ownership, size and location
of data blocks. In UFS, it is called the inode. In NTFS this is stored within the master file table which uses
a relational database structure.
An in memory mount table contains information about each mounted volume.
An in memory directory structure cache holds the directory information of recently accessed directories.
The system wide open file table contains a copy of the FCB of each open file
The per process open file table contains a pointer to the appropriate entry in the system wide open file table.
The virtual file system (VFS) layer separates file system generic operations from their implementation by defining a
clean VFS interface.
VFS provides a mechanism for uniquely representing a file throughout a network. VFS is based on a file
representation structure called a vnode that contains a numerical designator for a network wide unique file.
Thus, VFS distinguishes local files from remote ones and local files are further distinguished according to their file
system types.
Directory Implementation
The selection of directory allocation and directory management algorithms significantly affects the efficiency,
performance and reliability of the file system.
Linear List
The simplest method of implementing a directory is to use a linear list of file names with pointers to the data
blocks. This method is simple to program but time consuming to execute. The real disadvantage of a linear list of
directory entries is that finding a file requires a linear search.
Hash Table
Another data structure used for a file directory is a hash table. With this method, a linear list stores the directory
entries but a hash data structure is also used. The hash table takes a value computed from the file name and returns
a pointer to the file name in the linear list. The major difficulties with a hash table are its generally fixed size and
the dependence of the hash function on that size.
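The two directory implementations can be compared side by side. In this sketch the hashed directory keeps the same linear list and adds a fixed size table of buckets that point into it, as described above. The class names, the table size of 8 and the simple byte-sum hash function are assumptions chosen to keep the example short.

```python
class LinearDirectory:
    """Directory as a linear list of (file name, first data block)."""

    def __init__(self):
        self.entries = []

    def add(self, name, block):
        self.entries.append((name, block))

    def search(self, name):
        # Linear search: may examine every entry.
        for entry_name, block in self.entries:
            if entry_name == name:
                return block
        return None


class HashedDirectory(LinearDirectory):
    """Linear list plus a fixed size hash table of positions in the list."""

    TABLE_SIZE = 8  # assumed; the hash function depends on this size

    def __init__(self):
        super().__init__()
        self.buckets = [[] for _ in range(self.TABLE_SIZE)]

    def _hash(self, name):
        # Deliberately simple hash: sum of the name's bytes modulo table size.
        return sum(name.encode("utf-8")) % self.TABLE_SIZE

    def add(self, name, block):
        super().add(name, block)
        self.buckets[self._hash(name)].append(len(self.entries) - 1)

    def search(self, name):
        # Only entries in one bucket are examined; collisions are chained.
        for position in self.buckets[self._hash(name)]:
            entry_name, block = self.entries[position]
            if entry_name == name:
                return block
        return None
```

Because the table size is fixed, growing the directory well beyond it makes the buckets long again, which is the size dependence noted above.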
The direct access nature of disks allows flexibility in the implementation of files. The main problem here is how to
allocate space to these files so that disk space is utilized effectively and files can be accessed quickly. Three major
methods of allocating disk space are:
Contiguous
Linked
Indexed
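The three methods differ in how a logical block number of a file is turned into a physical disk block. The sketch below shows one common formulation of each mapping; the function names and the sample block numbers are assumptions, and details such as free space handling are left out.

```python
def contiguous_block(start, logical):
    """Contiguous: the file occupies consecutive blocks from `start`."""
    return start + logical


def linked_block(next_pointer, start, logical):
    """Linked: each block stores the number of the next block in the file."""
    block = start
    for _ in range(logical):
        block = next_pointer[block]  # follow the chain one link at a time
    return block


def indexed_block(index_block, logical):
    """Indexed: an index block lists the file's blocks in order."""
    return index_block[logical]


# Hypothetical file stored in blocks 9 -> 16 -> 1 -> 10 -> 25.
next_pointer = {9: 16, 16: 1, 1: 10, 10: 25}
index_block = [9, 16, 1, 10, 25]
```

Reaching logical block 3 takes one addition with contiguous allocation, three pointer hops with linked allocation and one table lookup with indexed allocation.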
Since disk space is limited, we should reuse the space from deleted files for new files. To keep track of free disk
space, the system maintains a free space list. The free space list records all free disk blocks – those not allocated
to some file or directory. This free space list can be implemented as one of the following:
a) Bit vector – The free space list is implemented as a bit map or a bit vector. Each block is represented by one bit. If
the block is free, the bit is 1; if the block is allocated, the bit is 0. The main advantage of this approach is its relative
simplicity and its efficiency in finding the first free block or n consecutive free blocks on the disk. To find the first
free block, the system scans the bit map word by word for the first non-zero word and then finds the first 1 bit in it.
The block number is then (number of bits per word) × (number of 0-value words) + offset of first 1 bit
b) Linked list – Another approach to free space management is to link together all the free disk blocks keeping a
pointer to the first free block in a special location on the disk and caching it in memory. The first block contains a
pointer to the next free disk block.
c) Grouping – A modification of the free list approach is to store the addresses of n free blocks in the first free
block.
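d) Counting – Another approach is to take advantage of the fact that several contiguous blocks may be allocated
or freed simultaneously when space is allocated with the contiguous allocation algorithm or clustering. Rather than
keeping a list of n free disk addresses, the system keeps the address of the first free block and the number of free
contiguous blocks that follow it.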
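The bit vector calculation in item a) and the counting representation in item d) can both be sketched in a few lines. The example assumes 8 bit words with block 0 stored in the most significant bit of word 0; the word size, bit order and function names are assumptions made for illustration.

```python
BITS_PER_WORD = 8  # assumed word size


def first_free_block(words, bits_per_word=BITS_PER_WORD):
    """Bit vector: 1 means free. Skip all-zero words, then find the first 1."""
    for zero_words, word in enumerate(words):
        if word != 0:
            for offset in range(bits_per_word):
                mask = 1 << (bits_per_word - 1 - offset)
                if word & mask:
                    # (bits per word) * (number of 0-value words) + offset
                    return bits_per_word * zero_words + offset
    return None  # no free block on the disk


def to_extents(free_blocks):
    """Counting: store (first free block, number of contiguous free blocks)."""
    extents = []
    for block in sorted(free_blocks):
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, count = extents[-1]
            extents[-1] = (start, count + 1)  # extend the current run
        else:
            extents.append((block, 1))        # start a new run
    return extents
```

With the words 00000000, 00000000, 00101000 the first two words are zero and the first 1 bit of the third word is at offset 2, so the first free block is 8 × 2 + 2 = 18. Likewise the free blocks 2, 3, 4, 5, 8, 9 and 17 collapse into the three pairs (2, 4), (8, 2) and (17, 1).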
Performance
Most disk controllers include local memory to form an on board cache that is large enough to store entire tracks at
a time. Once a seek is performed, the track is read into the disk cache starting at the sector under the disk head. The
disk controller then transfers any sector requests to OS. Some systems maintain a separate section of main memory
for a buffer cache where blocks are kept under the assumption that they will be used again. Other systems cache
file data using a page cache. The page cache uses virtual memory techniques to cache file data as pages rather than
as file system oriented blocks. Caching file data using virtual addresses is more efficient than caching through
physical disk blocks as accesses interface with virtual memory rather than the file system. Several systems use
page caching to cache both process pages and file data. This is known as unified virtual memory.