Unit-IV - Storage Management
Mass Storage System - Disk Structure - Disk Scheduling and Management; File-System Interface - File
concept - Access methods - Directory Structure - Directory organization - File system mounting - File
Sharing and Protection; File System Implementation - File System Structure - Directory implementation
- Allocation Methods - Free Space Management; I/O Systems - I/O Hardware, Application I/O interface,
Kernel I/O subsystem.
● One use for SSDs is in storage arrays, where they hold file-system metadata that requires high
performance.
● SSDs are also used in some laptop computers to make them smaller, faster, and more energy-
efficient.
● Because SSDs can be much faster than magnetic disk drives, standard bus interfaces can impose a
major limit on throughput.
● Some SSDs are designed to connect directly to the system bus (PCI, for example).
● SSDs are changing other traditional aspects of computer design as well.
● Some systems use them as a direct replacement for disk drives, while others use them as a new cache
tier, moving data between magnetic disks, SSDs, and memory to optimize performance.
Magnetic Tapes
● Magnetic tape was used as an early secondary-storage medium.
● Tapes are used mainly for backup, for storage of infrequently used information, and as a medium for
transferring information from one system to another.
● A tape is kept in a spool and is wound or rewound past a read-write head.
● Moving to the correct spot on a tape can take minutes, but once positioned, tape drives can write
data at speeds comparable to disk drives.
● Tape capacities vary greatly, depending on the particular kind of tape drive, with current capacities
exceeding several terabytes.
● Some tapes have built-in compression that can more than double the effective storage capacity.
DISK SCHEDULING
Some important terms related to disk scheduling:
Seek Time
Seek time is the time taken to move the disk arm to the specified track where the read/write request will be
serviced.
Rotational Latency
It is the time taken for the desired sector to rotate into position under the read/write head.
Transfer Time
It is the time taken to transfer the data.
Disk Access Time
Disk access time is given as,
Disk Access Time = Rotational Latency + Seek Time + Transfer Time
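As a quick illustration of the formula above, here is a minimal Python sketch; the millisecond figures are hypothetical, not taken from the text:

```python
def disk_access_time(seek_ms, rotational_latency_ms, transfer_ms):
    """Total time to service one request: seek + rotational latency + transfer."""
    return seek_ms + rotational_latency_ms + transfer_ms

# Hypothetical figures: 4 ms seek, 3 ms rotational latency, 0.5 ms transfer.
print(disk_access_time(4.0, 3.0, 0.5))  # 7.5 ms
```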
Disk Scheduling
Disk scheduling is done by operating systems to schedule I/O requests arriving for the disk. Disk scheduling
is also known as I/O scheduling. Disk scheduling is important because:
● Multiple I/O requests may arrive from different processes, and only one I/O request can be served at a
time by the disk controller. Thus, other I/O requests need to wait in the waiting queue and need to be
scheduled.
● Two or more requests may be far from each other, which results in greater disk arm movement.
● Hard drives are one of the slowest parts of the computer system and thus need to be accessed in an
efficient manner.
There are many disk scheduling algorithms, each determining the order in which requests are serviced and
hence the total head movement. The main types of disk scheduling algorithms are:
FCFS Scheduling
It is the simplest disk scheduling algorithm. It services the I/O requests in the order in which they arrive.
There is no starvation in this algorithm, every request is serviced.
Example: A disk queue holds requests for I/O to blocks on cylinders 98, 183, 37, 122, 14, 124, 65, and 67,
in that order, and the head starts at cylinder 53.
● If the disk head is initially at cylinder 53, it will first move from 53 to 98, then to 183, 37, 122, 14,
124, 65, and finally to 67, for a total head movement of 640 cylinders.
● The wild swing from 122 to 14 and then back to 124 illustrates the problem with this schedule.
● If the requests for cylinders 37 and 14 could be serviced together, before or after the requests for 122
and 124, the total head movement could be decreased substantially, and performance could be
thereby improved.
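The total head movement in the example can be computed with a short Python sketch (the function name `fcfs` is ours, chosen for illustration):

```python
def fcfs(head, requests):
    """Total cylinders traversed when requests are serviced in arrival order."""
    total = 0
    for r in requests:
        total += abs(r - head)  # move the head to the next request
        head = r
    return total

print(fcfs(53, [98, 183, 37, 122, 14, 124, 65, 67]))  # 640 cylinders
```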
SSTF Scheduling
Shortest seek time first (SSTF) algorithm selects the disk I/O request which requires the least disk arm
movement from its current position regardless of the direction. It reduces the total seek time as compared to
FCFS.
It allows the head to move to the closest track in the service queue.
Example: Consider the same request queue with the head starting at cylinder 53. The closest request to the
initial head position is at cylinder 65, so it is serviced first.
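A sketch of SSTF for the same example (the function name is ours); it repeatedly picks the pending request nearest the current head position:

```python
def sstf(head, requests):
    """Service the pending request closest to the current head position."""
    pending = list(requests)
    total = 0
    order = []
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
        order.append(nearest)
    return total, order

total, order = sstf(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)  # [65, 67, 37, 14, 98, 122, 124, 183]
print(total)  # 236 cylinders
```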
SCAN Scheduling
It is also called the Elevator Algorithm. In this algorithm, the disk arm moves in a particular direction until
it reaches the end of the disk, satisfying all the requests in its path, and then it reverses and moves in the
oppositeite direction, satisfying the requests in that path.
It works the way an elevator does: the elevator moves in one direction until it reaches the last floor in that
direction and then turns back.
● Assuming that the disk arm is moving toward 0 and that the initial head position is again 53, the head
will next service 37 and then 14.
● At cylinder 0, the arm will reverse and will move toward the other end of the disk, servicing the
requests at 65, 67, 98, 122, 124, and 183.
● If a request arrives in the queue just in front of the head, it will be serviced almost immediately; a
request arriving just behind the head will have to wait until the arm moves to the end of the disk,
reverses direction, and comes back.
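The example above can be sketched in Python; we assume cylinders numbered from 0 and an initial sweep toward cylinder 0, matching the text (the function name is ours):

```python
def scan(head, requests):
    """SCAN (elevator): sweep toward cylinder 0, reverse, sweep to the other end."""
    lower = sorted((r for r in requests if r < head), reverse=True)
    upper = sorted(r for r in requests if r >= head)
    total, pos = 0, head
    for r in lower:          # service requests on the way down
        total += pos - r
        pos = r
    total += pos             # continue to cylinder 0 before reversing
    pos = 0
    for r in upper:          # service the rest on the way back up
        total += r - pos
        pos = r
    return total, lower + upper

total, order = scan(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)  # [37, 14, 65, 67, 98, 122, 124, 183]
print(total)  # 236 cylinders
```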
C-SCAN Scheduling
In the C-SCAN algorithm, the disk arm moves in a particular direction, servicing requests until it reaches
the last cylinder. It then jumps to the first cylinder at the opposite end without servicing any requests along
the way, and resumes moving in the same direction, servicing the remaining requests.
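A sketch of C-SCAN for the same request queue; we assume a 200-cylinder disk (0-199) and count the wrap-around jump as head movement (the function name and the `last_cylinder` parameter are ours):

```python
def c_scan(head, requests, last_cylinder=199):
    """C-SCAN: sweep up to the last cylinder, jump to cylinder 0, sweep up again."""
    upper = sorted(r for r in requests if r >= head)
    lower = sorted(r for r in requests if r < head)
    total, pos = 0, head
    for r in upper:                   # service requests on the upward sweep
        total += r - pos
        pos = r
    if lower:
        total += last_cylinder - pos  # finish the sweep to the last cylinder
        total += last_cylinder        # jump back to cylinder 0
        pos = 0
        for r in lower:               # service the remaining requests
            total += r - pos
            pos = r
    return total, upper + lower

total, order = c_scan(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(total)  # 382 cylinders (the wrap-around jump counted as movement)
```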
C Look Scheduling
The C-LOOK algorithm is similar to the C-SCAN algorithm to some extent. In this algorithm, the disk arm
moves outward, servicing requests until it reaches the highest request cylinder. It then jumps to the lowest
request cylinder without servicing any requests along the way, and again starts moving outward, servicing
the remaining requests.
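C-LOOK differs from C-SCAN only in stopping at the highest and lowest pending requests rather than the disk ends; a sketch (the function name is ours):

```python
def c_look(head, requests):
    """C-LOOK: sweep up to the highest request, jump to the lowest, sweep up again."""
    upper = sorted(r for r in requests if r >= head)
    lower = sorted(r for r in requests if r < head)
    total, pos = 0, head
    for r in upper:              # service requests on the upward sweep
        total += r - pos
        pos = r
    if lower:
        total += pos - lower[0]  # jump straight to the lowest pending request
        pos = lower[0]
        for r in lower[1:]:      # service the remaining requests going up
            total += r - pos
            pos = r
    return total, upper + lower

total, order = c_look(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(total)  # 322 cylinders
```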
The selection of the disk scheduling algorithm depends on several factors, including the characteristics of
the workload, system performance requirements, and the available resources. The operating system must
evaluate the different disk scheduling algorithms based on the evaluation criteria and select the algorithm
that best meets the system's requirements.
Advantages
Improved performance − Disk scheduling algorithms ensure that data is accessed in the most efficient
manner possible, which improves the performance of the system.
Fairness − Disk scheduling algorithms ensure that all requests for data access are treated fairly and given a
chance to be processed.
Mr. N.Thanigaivel, AP/CSE 6
UNIT IV STORAGE MANAGEMENT CSE
Reduced disk fragmentation − Disk scheduling algorithms can help to reduce disk fragmentation by
accessing data in a more organized manner.
Disadvantages
Overhead − Disk scheduling algorithms can create overhead and delay in processing data access requests,
which can reduce overall system performance.
Complexity − Some disk scheduling algorithms can be complex and difficult to understand, which may
make it difficult to optimize system performance.
Risk of starvation − Disk scheduling algorithms can result in starvation of certain requests, which can lead
to inefficiencies and reduced system performance.
DISK MANAGEMENT
The operating system is also responsible for several other aspects of disk management.
Disk Formatting
● Before a disk can be used, it has to be low-level formatted, which means laying down all of the headers
and trailers demarcating the beginning and end of each sector. Included in the header and trailer are the
linear sector number and error-correcting codes (ECC), which allow damaged sectors not only to be
detected but, in many cases, for the damaged data to be recovered (depending on the extent of the
damage). Sector sizes are traditionally 512 bytes but may be larger, particularly in larger drives.
● ECC calculation is performed with every disk read or write, and if damage is detected but the data is
recoverable, then a soft error has occurred. Soft errors are generally handled by the on-board disk
controller, and never seen by the OS.
● Once the disk is low-level formatted, the next step is to partition the drive into one or more separate
partitions. This step must be completed even if the disk is to be used as a single large partition, so that
the partition table can be written to the beginning of the disk.
● After partitioning, then the filesystems must be logically formatted, which involves laying down the
master directory information (FAT table or inode structure), initializing free lists, and creating at least
the root directory of the filesystem. (Disk partitions which are to be used as raw devices are not
logically formatted. This saves the overhead and disk space of the filesystem structure, but requires that
the application program manage its own disk storage requirements).
Boot Block
● Computer ROM contains a bootstrap program (OS independent) with just enough code to find the first
sector on the first hard drive on the first controller, load that sector into memory, and transfer control
over to it. (The ROM bootstrap program may look in floppy and/or CD drives before accessing the hard
drive, and is smart enough to recognize whether it has found valid boot code or not).
● The first sector on the hard drive is known as the Master Boot Record, MBR and contains a very small
amount of code in addition to the partition table. The partition table documents how the disk is
partitioned into logical disks, and indicates specifically which partition is the active or boot partition.
● The boot program then looks to the active partition to find an operating system, possibly loading up a
slightly larger / more advanced boot program along the way.
● In a dual-boot (or larger multi-boot) system, the user may be given a choice of which operating system
to boot, with a default action to be taken in the event of no response within some time frame.
● Once the kernel is found by the boot program, it is loaded into memory and then control is transferred
over to the OS. The kernel will normally continue the boot process by initializing all important kernel
data structures, launching important system services (e.g. network daemons, sched, init, etc.), and
finally providing one or more login prompts. Boot options at this stage may include single-user a.k.a.
maintenance or safe modes, in which very few system services are started - These modes are designed
for system administrators to repair problems or otherwise maintain the system.
Bad Blocks
● No disk can be manufactured to 100% perfection, and all physical objects wear out over time. For these
reasons all disks are shipped with a few bad blocks, and additional blocks can be expected to go bad
slowly over time. If a large number of blocks go bad then the entire disk will need to be replaced, but a
few here and there can be handled through other means.
● In the old days, bad blocks had to be checked for manually. Formatting of the disk or running certain
disk-analysis tools would identify bad blocks, and attempt to read the data off of them one last time
through repeated tries. Then the bad blocks would be mapped out and taken out of future service.
Sometimes the data could be recovered, and sometimes it was lost forever. (Disk analysis tools could be
either destructive or non-destructive).
● Modern disk controllers make much better use of the error-correcting codes, so that bad blocks can be
detected earlier and the data usually recovered. (Recall that blocks are tested with every write as well as
with every read, so often errors can be detected before the write operation is complete and the data
simply written to a different sector instead).
● Note that re-mapping of sectors from their normal linear progression can throw off the disk scheduling
optimization of the OS, especially if the replacement sector is physically far away from the sector it is
replacing. For this reason most disks normally keep a few spare sectors on each cylinder, as well as at
least one spare cylinder. Whenever possible a bad sector will be mapped to another sector on the same
cylinder, or at least a cylinder as close as possible. Sector slipping may also be performed, in which all
sectors between the bad sector and the replacement sector are moved down by one, so that the linear
progression of sector numbers can be maintained.
● If the data on a bad block cannot be recovered, then a hard error has occurred, which requires
replacing the file(s) from backups, or rebuilding them from scratch.
FILE CONCEPT
A file is a named collection of related information that is recorded on secondary storage such as magnetic
disks, magnetic tapes and optical disks.
A file is a sequence of bits, bytes, lines, or records whose meaning is defined by the file's creator and user.
The collection of files is known as Directory.
The collection of directories at the different levels is known as File System.
File Structure
A file has a certain defined structure according to its type.
● A text file is a sequence of characters organized into lines - txt, doc
● A source file is a sequence of procedures and functions - c, cpp, java
● An object file is a sequence of bytes organized into blocks that are understandable by the machine -
obj, o
● An executable file is a ready-to-run machine-language program - exe, com, bin
● A batch file contains commands to the command interpreter - bat, sh
● A word-processor file holds text in various word-processor formats - wp, tex, rtf, doc
● An archive file groups related files into one compressed file - arc, zip, tar
● A multimedia file contains audio/video information - mpeg, mov, rm, mp3, avi
File Attributes
A file’s attributes vary from one operating system to another but typically consist of these:
● Name - The symbolic file name is the only information kept in human-readable form.
● Identifier- This unique tag, usually a number, identifies the file within the file system; it is the non-
human-readable name for the file.
● Type - This information is needed for systems that support different types of files.
● Location - This information is a pointer to a device and to the location of the file on that device.
● Size - The current size of the file (in bytes, words, or blocks) and possibly the maximum allowed size
are included in this attribute.
● Protection - Access-control information determines who can read, write, or execute the file, and so
on.
● Time, date, and user identification - This information may be kept for creation, last modification,
and last use. These data can be useful for protection, security, and usage monitoring.
File Operations
A file is an abstract data type. The operations that can be performed on files are:
● Create - Creating the file is the most fundamental operation on a file. Different types of files are
created by different methods; for example, text editors are used to create text files, word processors
are used to create word files, and image editors are used to create image files.
● Write - Writing to a file is different from creating it. The OS maintains a write pointer for every
file, which points to the position in the file at which the data is to be written.
● Read - Every file is opened in one of three modes: read, write, or append. A read pointer is
maintained by the OS, pointing to the position up to which the data has been read.
● Re-position - Re-positioning is simply moving the file pointer forward or backward depending on
the user's requirement. It is also called seeking.
● Delete - Deleting the file removes not only all the data stored inside the file but also all the
attributes of the file. The space allocated to the file then becomes available and can be
allocated to other files.
● Truncate - Truncating erases the contents of a file while keeping its attributes. The file itself is not
deleted; only the information stored inside it is released, and its length is reset.
Sequential Access
Most operating systems access files sequentially.
In sequential access, the OS reads the file word by word. A pointer is maintained that initially points to the
base address of the file. If the user wants to read the first word of the file, the pointer provides that word to
the user and advances by one word. This process continues till the end of the file.
Direct Access
Direct access is mostly required in the case of database systems. In most cases, we need filtered
information from the database, and sequential access can be very slow and inefficient for this.
Suppose every block of storage stores 4 records and we know that the record we need is stored in the 10th
block. In that case, sequential access is wasteful because it traverses all the preceding blocks in order to
reach the needed record.
Direct access gives the required result directly, even though the operating system has to perform some
complex tasks, such as determining the desired block number. Direct access is generally used in database
applications.
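The block-number arithmetic behind direct access can be sketched as follows; the figure of 4 records per block comes from the example above, while the 0-based numbering and function names are our assumptions:

```python
RECORDS_PER_BLOCK = 4   # from the example above

def block_of(record_number):
    """Logical block that holds a given record (records numbered from 0)."""
    return record_number // RECORDS_PER_BLOCK

def offset_in_block(record_number):
    """Position of the record within its block."""
    return record_number % RECORDS_PER_BLOCK

# Record 38 lives in block 9 (the 10th block), at offset 2 within it.
print(block_of(38), offset_in_block(38))  # 9 2
```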
Indexed Access
If a file can be sorted on any of its fields, then an index can be assigned to a group of certain records.
A particular record can then be accessed by its index. The index is nothing but the address of a record
in the file.
With indexed access, searching in a large database becomes very quick and easy, but some extra space in
memory is needed to store the index.
DIRECTORY STRUCTURE
A collection of files is called a directory.
Directory can be defined as the listing of the related files on the disk. The directory contains information
about the files, including attributes, location and ownership.
A hard disk can be divided into the number of partitions of different sizes. The partitions are also called
volumes or mini disks.
Each partition must have at least one directory in which, all the files of the partition can be listed. A
directory entry is maintained for each file in the directory which stores all the information related to that file.
A directory can be viewed as a file that contains the metadata of a collection of files.
Directory operations:
File creation - New files need to be created and added to the directory.
Search for a file - We need to be able to search a directory for a particular file by name.
File deletion - When a file is no longer needed, we want to be able to remove it from the directory.
Renaming a file - Because the name of a file represents its contents to its users, we must be able to change
the name when the contents or use of the file changes. Renaming a file may also allow its position within
the directory structure to be changed.
Traversing the file system - We may wish to access every directory and every file within a directory
structure. For reliability, it is a good idea to save the contents and structure of the entire file system at
regular intervals.
Listing of files - We need to be able to list the files in a directory and the contents of the directory entry for
each file in the list.
Advantages:
Efficiency: A file can be located more quickly.
Naming: It becomes convenient for users, as two users can have the same name for different files or
different names for the same file.
Grouping: Logical grouping of files can be done by properties e.g. all java programs, all games etc.
DIRECTORY ORGANIZATION
Structures of Directory
A directory is a container that is used to hold folders and files. It organizes files and folders in a
hierarchical manner.
Single-Level Directory
A single-level directory is the simplest directory structure. In it, all files are contained in the same
directory, which makes it easy to support and understand.
A single-level directory has a significant limitation, however, when the number of files increases or when
the system has more than one user. Since all the files are in the same directory, they must have unique
names. If two users name their data file test, the unique-name rule is violated.
Advantages:
● Since it is a single directory, its implementation is very easy.
● If the directory holds only a few files, searching will be faster.
● Operations like file creation, searching, deletion, and updating are very easy in such a directory
structure.
Disadvantages:
● There is a chance of name collision, because two files cannot have the same name.
● Searching becomes time-consuming if the directory grows large.
● Files of the same type cannot be grouped together.
Two-level directory
As we have seen, a single-level directory often leads to confusion over file names among different users.
The solution to this problem is to create a separate directory for each user.
In the two-level directory structure, each user has their own user file directory (UFD). The UFDs have
similar structures, but each lists only the files of a single user. The system's master file directory (MFD) is
searched whenever a user logs in. The MFD is indexed by user name or account number, and each entry
points to the UFD for that user.
Advantages:
● We can give a full path like /User-name/directory-name/.
● Different users can have the same directory as well as file names.
● Searching for files becomes easier due to path names and user grouping.
Disadvantages:
● A user is not allowed to share files with other users.
● It is still not very scalable; two files of the same type cannot be grouped together for the same user.
Tree-structured directory
Once we have seen a two-level directory as a tree of height 2, the natural generalization is to extend the
directory structure to a tree of arbitrary height.
This generalization allows users to create their own subdirectories and to organize their files accordingly.
A tree structure is the most common directory structure. The tree has a root directory, and every file in the
system has a unique path.
Advantages:
● We can share files.
● Searching is easy because every file has a unique path.
Disadvantages:
● Files are shared via links, which can create problems when a file is deleted.
● If the link is a soft link, then after deleting the file we are left with a dangling pointer.
● In the case of a hard link, to delete a file we have to delete every reference associated with it.
General graph directory
A general graph directory allows links that form cycles, so multiple directories can point to the same files
or subdirectories.
Advantages:
● It allows cycles.
● It is more flexible than other directory structures.
Disadvantages:
● It is more costly than the others.
● It needs garbage collection.
When you mount a file system, any files or directories in the underlying mount point directory are
unavailable as long as the file system is mounted. These files are not permanently affected by the mounting
process, and they become available again when the file system is unmounted. However, mount directories
are typically empty, because you usually do not want to obscure existing files.
For example, the figure below shows a local file system, starting with a root (/) file system and
subdirectories sbin, etc, and opt.
● Servers commonly restrict mount permission to certain trusted systems only. Spoofing ( a computer
pretending to be a different computer ) is a potential security risk.
● Servers may restrict remote access to read-only.
● Servers restrict which filesystems may be remotely mounted. Generally the information within those
subsystems is limited, relatively public, and protected by frequent backups.
The NFS ( Network File System ) is a classic example of such a system.
Failure Modes
● When a local disk file is unavailable, the result is generally known immediately and is generally
non-recoverable. The only reasonable response is for the request to fail.
● However when a remote file is unavailable, there are many possible reasons, and whether or not it is
unrecoverable is not readily apparent. Hence most remote access systems allow for blocking or
delayed response, in the hopes that the remote system (or the network) will come back up eventually.
Consistency Semantics
Consistency Semantics deals with the consistency between the views of shared files on a networked system.
When one user changes the file, when do other users see the changes?
At first glance this appears to have all of the synchronization issues discussed in Chapter 6. Unfortunately
the long delays involved in network operations prohibit the use of atomic operations as discussed in that
chapter.
UNIX Semantics
The UNIX file system uses the following semantics:
● Writes to an open file are immediately visible to any other user who has the file open.
● One implementation uses a shared location pointer, which is adjusted for all sharing users.
The file is associated with a single exclusive physical resource, which may delay some accesses.
Session Semantics
The Andrew File System, AFS uses the following semantics:
● Writes to an open file are not immediately visible to other users.
● When a file is closed, any changes made become available only to users who open the file at a later
time.
According to these semantics, a file can be associated with multiple ( possibly different ) views. Almost no
constraints are imposed on scheduling accesses. No user is delayed in reading or writing their personal copy
of the file.
Immutable-Shared-Files Semantics
Under this system, when a file is declared as shared by its creator, it becomes immutable and the name
cannot be re-used for any other resource. Hence it becomes read-only, and shared access is simple.
FILE PROTECTION
When information is stored in a computer system, we want to keep it safe from physical damage (the issue
of reliability) and improper access (the issue of protection). Reliability is generally provided by duplicate
copies of files.
Many computers have systems programs that automatically (or through computer-operator intervention)
copy disk files to tape at regular intervals (once per day or week or month) to maintain a copy should a file
system be accidentally destroyed.
File systems can be damaged by hardware problems (such as errors in reading or writing), power surges or
failures, head crashes, dirt, temperature extremes, and vandalism.
Files may be deleted accidentally.
Protection can be provided in many ways. For a single-user laptop system, we might provide protection by
locking the computer in a desk drawer or file cabinet.
In a larger multiuser system, however, other mechanisms are needed.
Types of Access
Protection mechanisms provide controlled access by limiting the types of file access that can be made.
Access is permitted or denied depending on several factors, one of which is the type of access requested.
Several different types of operations may be controlled:
Read: Read from the file.
Write: Write or rewrite the file.
Execute: Load the file into memory and execute it.
Append: Write new information at the end of the file.
Delete: Delete the file and free its space for possible reuse.
List: List the name and attributes of the file.
Access Control
The most general scheme to implement identity-dependent access is to associate with each file and directory
an access-control list (ACL) specifying user names and the types of access allowed for each user.
When a user requests access to a particular file, the operating system checks the access list associated with
that file.
If that user is listed for the requested access, the access is allowed. Otherwise, a protection violation occurs,
and the user job is denied access to the file.
Three classifications of users
Owner: The user who created the file is the owner. Its value is 7.
Read Write Execute
1    1     1
Group: A set of users who are sharing the file and need similar access is a group. Its value is 6.
Read Write Execute
1    1     0
Others: All other users of the system. Its value is 1.
Read Write Execute
0    0     1
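The numeric values above follow the usual UNIX-style weighting of read = 4, write = 2, execute = 1; a small sketch (the function name is ours):

```python
def rwx_value(read, write, execute):
    """Numeric value of a permission triple: read=4, write=2, execute=1."""
    return 4 * read + 2 * write + 1 * execute

print(rwx_value(1, 1, 1))  # 7 -> owner  (read, write, execute)
print(rwx_value(1, 1, 0))  # 6 -> group  (read, write)
print(rwx_value(0, 0, 1))  # 1 -> others (execute only)
```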
Other Protection Approaches
Another approach to the protection problem is to associate a password with each file. If the passwords are
chosen randomly and changed often, this scheme may be effective in limiting access to a file.
The use of passwords has a few disadvantages:
● First, the number of passwords that a user needs to remember may become large, making the scheme
impractical.
● Second, if only one password is used for all the files, then once it is discovered, all files are
accessible.
FILE SYSTEM IMPLEMENTATION
File systems store several important data structures on the disk:
A boot-control block, ( per volume ) a.k.a. the boot block in UNIX or the partition boot sector in Windows
contains information about how to boot the system off of this disk. This will generally be the first sector of
the volume if there is a bootable system loaded on that volume, or the block will be left vacant otherwise.
A volume control block, ( per volume ) a.k.a. the superblock in UNIX or the master file table in NTFS,
which contains volume details such as the number of blocks in the filesystem, the block size, and pointers
to free blocks and free FCBs.
A directory structure ( per file system ), containing file names and pointers to corresponding FCBs. UNIX
uses inode numbers, and NTFS uses a master file table.
The File Control Block, FCB, ( per file ) containing details about ownership, size, permissions, dates, etc.
UNIX stores this information in inodes, and NTFS in the master file table as a relational database structure.
When a new file is created, a new FCB is allocated and filled out with important information regarding the
new file. The appropriate directory is modified with the new file name and FCB information.
When a file is accessed by a program, the open( ) system call reads in the FCB information from disk,
and stores it in the system-wide open file table. An entry is added to the per-process open file table
referencing the system-wide table, and an index into the per-process table is returned by the open( ) system
call. UNIX refers to this index as a file descriptor, and Windows refers to it as a file handle.
If another process already has a file open when a new request comes in for the same file, and it is sharable,
then a counter in the system-wide table is incremented and the per-process table is adjusted to point to the
existing entry in the system-wide table.
When a file is closed, the per-process table entry is freed, and the counter in the system-wide table is
decremented. If that counter reaches zero, then the system wide table is also freed. Any data currently stored
in memory cache for this file is written out to disk if necessary.
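The interaction between the per-process and system-wide open-file tables described above can be modeled with a toy Python class. This is a single-process simplification for illustration; the class, field, and file names are ours, not a real OS API:

```python
class OpenFileTables:
    """Toy model of the system-wide and per-process open-file tables."""

    def __init__(self):
        self.system_wide = {}   # file name -> {"fcb": ..., "count": open count}
        self.per_process = []   # file descriptor (index) -> file name

    def open(self, name):
        entry = self.system_wide.get(name)
        if entry is None:
            # First open: read the FCB from disk (simulated here) into the table.
            self.system_wide[name] = {"fcb": {"name": name}, "count": 1}
        else:
            entry["count"] += 1           # file already open; share the entry
        self.per_process.append(name)
        return len(self.per_process) - 1  # the "file descriptor"

    def close(self, fd):
        name = self.per_process[fd]
        self.per_process[fd] = None       # free the per-process entry
        entry = self.system_wide[name]
        entry["count"] -= 1
        if entry["count"] == 0:           # last close frees the shared entry
            del self.system_wide[name]

tables = OpenFileTables()
fd1 = tables.open("report.txt")
fd2 = tables.open("report.txt")  # second open shares the system-wide entry
print(tables.system_wide["report.txt"]["count"])  # 2
```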
In-memory file system structures: (a) file open; (b) file read.
At mount time, file systems may be checked for errors or inconsistencies, either because they are flagged
as not having been closed properly the last time they were used, or just on general principles. Filesystems
may be mounted either automatically or manually.
Most operating systems use a layered approach for every task, including file systems. Every layer of the
file system is responsible for some of the activities.
The figure below shows how the file system is divided into different layers, along with the functionality of
each layer.
When an application program asks for a file, the request is first directed to the logical file system. The
logical file system contains the metadata of the file and the directory structure. If the application program
doesn't have the required permissions for the file, this layer throws an error. The logical file system also
verifies the path to the file.
Generally, files are divided into various logical blocks. Files are stored in, and retrieved from, the hard
disk, which is divided into various tracks and sectors. Therefore, in order to store and retrieve files, the
logical blocks need to be mapped to physical blocks. This mapping is done by the file organization
module. It is also responsible for free-space management.
Once the file organization module has decided which physical block the application program needs, it
passes this information to the basic file system. The basic file system is responsible for issuing the
commands to I/O control in order to fetch those blocks.
I/O controls contain the codes by using which it can access hard disk. These codes are known as device
drivers. I/O controls are also responsible for handling interrupts.
In a linear-list implementation of a directory, the list must be searched in the case of every operation
(creation, deletion, updating, etc.) on the files; therefore the system becomes inefficient.
To create a new file, we must first search the directory to be sure that no existing file has the same name.
Then, we add a new entry at the end of the directory.
To delete a file, we search the directory for the named file and then release the space allocated to it.
To reuse the directory entry, we can do one of several things. We can mark the entry as unused (by
assigning it a special name, such as an all-blank name, or by including a used–unused bit in each entry), or
we can attach it to a list of free directory entries.
Hash Table
To overcome the drawbacks of the singly linked list implementation of directories, there is an alternative
approach: the hash table. This approach uses a hash table along with the linked lists.
A key-value pair for each file in the directory is generated and stored in the hash table. The key is
determined by applying the hash function to the file name, while the value points to the corresponding file
stored in the directory.
Now searching becomes efficient, since the entire list need not be searched on every operation. Only the
hash table entries are checked using the key, and if an entry is found, the corresponding file is fetched
using the value.
Contiguous Allocation
If the blocks are allocated to the file in such a way that all the logical blocks of the file get
contiguous physical blocks on the hard disk, then such an allocation scheme is known as contiguous allocation.
In the image shown below, there are three files in the directory. The starting block and the length of each file
are mentioned in the table. We can check in the table that the contiguous blocks are assigned to each file as
per its need.
Advantages
● It is simple to implement.
● We will get Excellent read performance.
● Supports Random Access into files.
Disadvantages
● The disk will become fragmented.
● It may be difficult to have a file grow.
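The random-access property follows from simple arithmetic: the physical block is just the starting block plus the logical block number. A minimal sketch (file names, starting blocks, and lengths are illustrative):

```python
# Sketch of contiguous allocation: the directory stores only (start, length)
# per file, and any logical block maps to a physical block by addition.

directory = {                  # name -> (starting block, length in blocks)
    "mail": (19, 6),
    "list": (28, 4),
}

def physical_block(name, logical_block):
    start, length = directory[name]
    if not 0 <= logical_block < length:
        raise IndexError("block outside file")
    return start + logical_block   # direct (random) access in O(1)

print(physical_block("mail", 3))   # -> 22
```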
Linked Allocation
Linked list allocation solves the problems of contiguous allocation. In linked list allocation, each file
is considered as a linked list of disk blocks. However, the disk blocks allocated to a particular file need
not be contiguous on the disk. Each disk block allocated to a file contains a pointer to the
next disk block allocated to the same file.
Advantages
● There is no external fragmentation with linked allocation.
● Any free block can be utilized in order to satisfy the file block requests.
● File can continue to grow as long as the free blocks are available.
● Directory entry will only contain the starting block address.
Disadvantages
● Random Access is not provided.
● Pointers require some space in the disk blocks.
● If any pointer in the linked list is broken, the remainder of the file is lost.
● Reaching a given block requires traversing every block before it.
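The sequential-access cost can be seen in a small sketch, where reading logical block k means following k pointers (block numbers and the file name are illustrative):

```python
# Sketch of linked allocation: each disk block stores a pointer to the next
# block of the same file (-1 marks the end). The directory keeps only the
# starting block, so reaching logical block k requires chasing k pointers,
# which is why random access is poor.

NEXT = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}   # per-block "next" pointers
directory = {"jeep": 9}                         # name -> starting block

def physical_block(name, logical_block):
    block = directory[name]
    for _ in range(logical_block):              # O(k) pointer chasing
        block = NEXT[block]
        if block == -1:
            raise IndexError("block outside file")
    return block

print(physical_block("jeep", 3))   # -> 10
```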
Indexed Allocation
Mr. N.Thanigaivel, AP/CSE 27
UNIT IV STORAGE MANAGEMENT CSE
Instead of maintaining a file allocation table of all the disk pointers, the indexed allocation scheme stores
all the disk pointers for a file in one block, called the index block. The index block doesn't hold file data,
but it holds the pointers to all the disk blocks allocated to that particular file. The directory entry
contains only the index block address.
Advantages
● Supports direct access
● A bad data block causes the loss of only that block.
Disadvantages
● A bad index block can cause the loss of the entire file.
● The size of a file depends upon the number of pointers an index block can hold.
● Having an index block for a small file wastes space.
● More pointer overhead.
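A minimal sketch of the lookup path (all block numbers are illustrative): the directory yields the index block, and the index block yields the data block in a single step, which is what gives indexed allocation its direct-access property.

```python
# Sketch of indexed allocation: the directory stores only the index block
# address, and the index block holds one pointer per data block of the file,
# so any logical block is reached with one extra lookup.

index_blocks = {19: [9, 16, 1, 10, 25]}   # index block -> data-block pointers
directory = {"jeep": 19}                   # name -> index block address

def physical_block(name, logical_block):
    pointers = index_blocks[directory[name]]
    return pointers[logical_block]         # direct access via the index

print(physical_block("jeep", 2))   # -> 1
```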
Bit Vector
Frequently, the free-space list is implemented as a bit map or bit vector.
Each block is represented by 1 bit.
If the block is free, the bit is 1; if the block is allocated, the bit is 0.
Example:
Consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free and the rest of the
blocks are allocated.
The free-space bit map would be 001111001111110001100000011100000...
The main advantage of this approach is its relative simplicity and its efficiency in finding the first free
block, or n consecutive free blocks, on the disk.
The first non-0 word is scanned for the first 1 bit, which is the location of the first free block.
The calculation of the block number is
(number of bits per word) × (number of 0-value words) + offset of first 1 bit.
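The formula can be exercised with a small sketch that holds the bitmap as 8-bit words and skips 0-valued words; by convention here a free block is a 1 bit, and bit 0 is taken as the most significant bit of each word:

```python
# Sketch of the bit-vector scan: words equal to 0 contain no free blocks
# (free = 1) and are skipped whole. The block number then follows the formula
#   (bits per word) * (number of 0-valued words) + offset of first 1 bit.

BITS_PER_WORD = 8

def first_free_block(words):
    for i, w in enumerate(words):
        if w != 0:                                    # first word with a 1 bit
            offset = BITS_PER_WORD - w.bit_length()   # bit 0 = MSB of the word
            return BITS_PER_WORD * i + offset
    return -1                                         # no free block

# Blocks 2-5, 8-13, 17, 18, and 25-27 free, as in the example above:
words = [0b00111100, 0b11111100, 0b01100000, 0b00111000]
print(first_free_block(words))   # -> 2
```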
Linked List
Another approach to free-space management is to link together all the free disk blocks, keeping a pointer to
the first free block in a special location on the disk and caching it in memory.
This first block contains a pointer to the next free disk block, and so on.
In our example, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 were free and the rest of the blocks
were allocated. In this situation, we would keep a pointer to block 2 as the first free block. Block 2 would
contain a pointer to block 3, which would point to block 4, which would point to block 5, which would point
to block 8, and so on.
However, this scheme is not efficient; to traverse the list, we must read each block, which requires
substantial I/O time. The FAT method incorporates free-block accounting into the file-allocation data
structure, so no separate method is needed.
Grouping
A modification of the free-list approach is to store the addresses of n free blocks in the first free block.
The first n-1 of these blocks are actually free.
The last block contains the addresses of another n free blocks, and so on.
The importance of this implementation is that the addresses of a large number of free blocks can be found
quickly.
Counting
We can keep the address of the first free block and the number n of free contiguous blocks that follow the
first block.
Each entry in the free-space list then consists of a disk address and a count.
Although each entry requires more space than would a simple disk address, the overall list will be shorter, as
long as the count is generally greater than 1.
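The compression step can be sketched as follows, using the free blocks from the running example; the run-length encoding of contiguous runs is the whole idea:

```python
# Sketch of the counting scheme: a sorted list of free blocks is compressed
# into (first block, count) pairs, one entry per contiguous run.

def to_counting_list(free_blocks):
    runs = []
    for b in sorted(free_blocks):
        if runs and b == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend current run
        else:
            runs.append((b, 1))                          # start a new run
    return runs

# The 15 free blocks from the running example compress to four entries:
free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
print(to_counting_list(free))   # -> [(2, 4), (8, 6), (17, 2), (25, 3)]
```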
Some systems use variable size clusters depending on the file size.
The more data that is stored in a directory (e.g. last access time), the more often the directory blocks have to
be re-written.
As technology advances, addressing schemes have had to grow as well.
Kernel table sizes used to be fixed and could only be changed by rebuilding the kernel. Modern tables are
dynamically allocated, but that requires more complicated algorithms for accessing them.
Performance
Disk controllers generally include on-board caching. When a seek is requested, the heads are moved into
place, and then an entire track is read, starting from whatever sector is currently under the heads (reducing
latency). The requested sector is returned and the unrequested portion of the track is cached in the disk's
electronics.
Some OSes cache disk blocks they expect to need again in a buffer cache.
A page cache connected to the virtual memory system is actually more efficient as memory addresses do
not need to be converted to disk block addresses and back again.
Some systems (Solaris, Linux, Windows 2000, NT, XP) use page caching for both process pages and file
data in a unified virtual memory.
Diagram shows the advantages of the unified buffer cache found in some versions of UNIX and Linux -
Data does not need to be stored twice, and problems of inconsistent buffer information are avoided.
I/O Without a Unified Buffer Cache I/O Using a Unified Buffer Cache
Page replacement strategies can be complicated with a unified cache, as one needs to decide whether to
replace process or file pages, and how many pages to guarantee to each category of pages. Solaris, for
example, has gone through many variations, resulting in priority paging giving process pages priority over
file I/O pages, and setting limits so that neither can knock the other completely out of memory.
Another issue affecting performance is the question of whether to implement synchronous writes or
asynchronous writes.
Synchronous writes occur in the order in which the disk subsystem receives them, without caching.
Asynchronous writes are cached, allowing the disk subsystem to schedule writes in a more efficient order.
Metadata writes are often done synchronously. Some systems support flags to the open call requiring that
writes be synchronous, for example for the benefit of database systems that require their writes be performed
in a required order.
The type of file access can also have an impact on optimal page replacement policies. Sequential access files
often take advantage of two special policies:
● Free-behind frees up a page as soon as the next page in the file is requested, with the assumption
that we are now done with the old page and won't need it again for a long time.
● Read-ahead reads the requested page and several subsequent pages at the same time, with the
assumption that those pages will be needed in the near future.
The caching system and asynchronous writes speed up disk writes considerably, because the disk subsystem
can schedule physical writes to the disk to minimize head movement and disk seek times.
Consistency Checking
A Consistency Checker (fsck in UNIX, chkdsk or scandisk in Windows) is often run at boot time or mount
time, particularly if a filesystem was not closed down properly. Some of the problems that these tools look
for include:
● Disk blocks allocated to files and also listed on the free list.
● Disk blocks neither allocated to files nor on the free list.
● Disk blocks allocated to more than one file.
● The number of disk blocks allocated to a file inconsistent with the file's stated size.
● Properly allocated files / inodes which do not appear in any directory entry.
● Link counts for an inode not matching the number of references to that inode in the directory
structure.
● Two or more identical file names in the same directory.
● Illegally linked directories, e.g. cyclical relationships where those are not allowed, or files/directories
that are not accessible from the root of the directory tree.
● Consistency checkers will often collect questionable disk blocks into new files with names such as
chk00001.dat. These files may contain valuable information that would otherwise be lost, but in most
cases they can be safely deleted, (returning those disk blocks to the free list).
UNIX caches directory information for reads, but any changes that affect space allocation or metadata
changes are written synchronously, before any of the corresponding data blocks are written to.
The three major jobs of a computer are input, output, and processing. In many cases, the most important
job is input/output, and the processing is simply incidental.
For example, when you browse a web page or edit a file, your immediate attention is on reading or entering
information, not on computing an answer.
The primary role of the operating system in computer Input / Output is to manage and organize I/O
operations and all I/O devices.
Overview
The control of the various devices that are connected to the computer is a key concern of operating-system
designers. Because I/O devices vary so widely in their functionality and speed (for example, a mouse,
a hard disk, and a CD-ROM), varied methods are required for controlling them. These methods form the I/O
subsystem of the kernel, which separates the rest of the kernel from the complications of managing I/O
devices.
I/O Hardware
I/O devices can be roughly categorized as storage, communications, user-interface, and other.
Devices communicate with the computer via signals sent over wires or through the air.
Devices connect with the computer via ports, e.g. a serial or parallel port.
A common set of wires connecting multiple devices is termed a bus.
Buses include rigid protocols for the types of messages that can be sent across the bus and the procedures for
resolving contention issues.
One way of communicating with devices is through registers associated with each port. Registers
may be one to four bytes in size, and may typically include (a subset of) the following four:
● The data-in register is read by the host to get input from the device.
● The data-out register is written by the host to send output.
● The status register has bits read by the host to ascertain the status of the device, such as
idle, ready for input, busy, error, transaction complete, etc.
● The control register has bits written by the host to issue commands or to change settings
of the device such as parity checking, word length, or full- versus half-duplex operation.
Polling
One simple means of device handshaking involves polling:
● The host repeatedly checks the busy bit on the device until it becomes clear.
● The host writes a byte of data into the data-out register, and sets the write bit in the command
register (in either order).
● The host sets the command ready bit in the command register to notify the device of the pending
command.
● When the device controller sees the command-ready bit set, it first sets the busy bit.
● Then the device controller reads the command register, sees the write bit set, reads the byte of data
from the data-out register, and does the I/O to the device.
● When finished, the controller clears the command-ready bit and the busy bit to signal completion.
Polling can be very fast and efficient, if both the device and the controller are fast and if there is
significant data to transfer. It becomes inefficient, however, if the host must wait a long time in the busy
loop waiting for the device, or if frequent checks need to be made for data that is infrequently there.
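The handshake above can be sketched with the device simulated in software; the register and bit names follow the description and are not a real device API:

```python
# Sketch of the polling handshake, with the device controller simulated by a
# plain object. host_write() plays the host's side; tick() plays the device's.

class Device:
    def __init__(self):
        self.busy = False
        self.command_ready = False
        self.data_out = None
        self.received = []

    def tick(self):                        # device controller's side
        if self.command_ready:
            self.busy = True               # sees command-ready, sets busy
            self.received.append(self.data_out)   # consumes the byte
            self.command_ready = False
            self.busy = False              # done: clears busy

def host_write(dev, byte):
    while dev.busy:                        # busy-wait until the device is idle
        dev.tick()
    dev.data_out = byte                    # write the data-out register
    dev.command_ready = True               # notify device of pending command
    dev.tick()                             # (device services the command)

dev = Device()
for b in b"hi":
    host_write(dev, b)
print(bytes(dev.received))   # -> b'hi'
```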
Interrupts
Interrupts allow devices to notify the CPU when they have data to transfer or when an operation is
complete, allowing the CPU to perform other duties when no I/O transfers need its immediate attention.
The CPU has an interrupt-request line that is sensed after every instruction.
A device's controller raises an interrupt by asserting a signal on the interrupt request line.
The CPU then performs a state save, and transfers control to the interrupt handler routine at a fixed
address in memory. (The CPU catches the interrupt and dispatches the interrupt handler).
The interrupt handler determines the cause of the interrupt, performs the necessary processing, performs a
state restore, and executes a return from interrupt instruction to return control to the CPU. (The interrupt
handler clears the interrupt by servicing the device).
The above description is adequate for simple interrupt-driven I/O, but there are three needs in modern
computing which complicate the picture:
● The need to defer interrupt handling during critical processing,
● The need to determine which interrupt handler to invoke, without having to poll all devices to see
which one raised the interrupt, and
● The need for multi-level interrupts, so that high-priority interrupts can pre-empt the processing of
low-priority ones.
Most devices can be characterized as either block I/O, character I/O, memory-mapped file access, or
network sockets. A few devices are special, such as the time-of-day clock and the system timer.
Most OSes also have an escape, or back door, which allows applications to send commands directly to
device drivers if needed. In UNIX this is the ioctl() system call (I/O control). ioctl() takes three arguments:
the file descriptor for the device driver being accessed, an integer indicating the desired function to be
performed, and an address used for communicating or transferring additional information.
Block and Character Devices
Block devices are accessed a block at a time, and are indicated by a "b" as the first character in a long listing
on UNIX systems. Operations supported include read( ), write( ), and seek( ).
● Accessing blocks on a hard drive directly (without going through the filesystem structure) is called
raw I/O, and can speed up certain operations by bypassing the buffering and locking normally
conducted by the OS. (It then becomes the application's responsibility to manage those issues).
● A new alternative is direct I/O, which uses the normal filesystem access, but which disables
buffering and locking operations.
Memory-mapped file I/O can be layered on top of block-device drivers.
● Rather than reading in the entire file, it is mapped to a range of memory addresses, and then paged
into memory as needed using the virtual memory system.
● Access to the file is then accomplished through normal memory accesses, rather than through read( )
and write( ) system calls. This approach is commonly used for executable program code.
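A minimal sketch of memory-mapped file I/O using Python's mmap module; the file path is a throwaway temporary file created just for the demonstration:

```python
# Sketch of memory-mapped file I/O: the file is mapped into the address
# space and accessed like a byte array instead of via read()/write() calls.

import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:   # map the whole file
        print(m[0:5])                     # -> b'hello' (plain slicing, no read())
        m[0:5] = b"HELLO"                 # in-place update through the mapping

with open(path, "rb") as f:
    print(f.read())   # -> b'HELLO world'
```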
Character devices are accessed one byte at a time, and are indicated by a "c" in UNIX long listings.
Supported operations include get( ) and put( ), with more advanced functionality such as reading an entire
line supported by higher-level library routines.
Network Devices
Because network access is inherently different from local disk access, most systems provide a separate
interface for network devices.
One common and popular interface is the socket interface, which acts like a cable or pipeline connecting
two networked entities. Data can be put into the socket at one end, and read out sequentially at the other end.
Sockets are normally full-duplex, allowing for bi-directional data transfer.
The select( ) system call allows servers (or other applications) to identify sockets which have data waiting,
without having to poll all available sockets.
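A minimal sketch of select() in Python, with a socket pair standing in for two networked entities:

```python
# Sketch of select(): only the socket with data waiting is reported,
# so the server never has to poll sockets that have nothing to read.

import select
import socket

a, b = socket.socketpair()          # full-duplex, like a connected socket
a.sendall(b"ping")

readable, _, _ = select.select([a, b], [], [], 1.0)
print(readable == [b])              # only b has data waiting -> True
print(b.recv(4))                    # -> b'ping'
a.close()
b.close()
```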
On most systems the system clock is implemented by counting interrupts generated by the programmable interval timer (PIT).
Unfortunately this is limited in its resolution to the interrupt frequency of the PIT, and may be subject to
some drift over time. An alternate approach is to provide direct access to a high frequency hardware counter,
which provides much higher resolution and accuracy, but which does not support interrupts.
I/O Scheduling
Scheduling I/O requests can greatly improve overall efficiency. Priorities can also play a part in request
scheduling.
Buffering and caching can also help, and can allow for more flexible scheduling options.
On systems with many devices, separate request queues are often kept for each device:
Device-status table
Buffering
Buffering of I/O is performed for (at least) three major reasons:
● Speed differences between two devices. A slow device may write data into a buffer, and when the
buffer is full, the entire buffer is sent to the fast device all at once. So that the slow device still has
somewhere to write while this is going on, a second buffer is used, and the two buffers alternate as
each becomes full. This is known as double buffering. (Double buffering is often used in (animated)
graphics, so that one screen image can be generated in a buffer while the other (completed) buffer is
displayed on the screen. This prevents the user from ever seeing any half-finished screen images).
● Data transfer size differences. Buffers are used in particular in networking systems to break
messages up into smaller packets for transfer, and then for re-assembly at the receiving side.
● To support copy semantics. For example, when an application makes a request for a disk write, the
data is copied from the user's memory area into a kernel buffer. Now the application can change their
copy of the data, but the data which eventually gets written out to disk is the version of the data at the
time the write request was made.
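Double buffering can be sketched as follows; the buffer size and data are illustrative, and the "fast device" is modeled by a byte string that receives whole buffers at once:

```python
# Sketch of double buffering: the slow producer fills one buffer while the
# previously filled buffer is handed to the fast consumer in one burst, then
# the two buffers swap roles.

BUF_SIZE = 4

def double_buffered_copy(data):
    buffers = [bytearray(), bytearray()]
    active = 0                          # buffer the slow device writes into
    sent = bytearray()                  # what the fast device has received
    for byte in data:
        buffers[active].append(byte)
        if len(buffers[active]) == BUF_SIZE:
            sent += buffers[active]     # full buffer goes out all at once
            buffers[active].clear()
            active = 1 - active         # swap: keep writing into the other
    sent += buffers[active]             # flush the partial final buffer
    return bytes(sent)

print(double_buffered_copy(b"abcdefghij"))   # -> b'abcdefghij'
```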
Caching
Caching involves keeping a copy of data in a faster-access location than where the data is normally stored.
Buffering and caching are very similar, except that a buffer may hold the only copy of a given data item,
whereas a cache is just a duplicate copy of some other data stored elsewhere.
Buffering and caching go hand-in-hand, and often the same storage space may be used for both purposes.
For example, after a buffer is written to disk, then the copy in memory can be used as a cached copy, (until
that buffer is needed for other purposes).
Error Handling
I/O requests can fail for many reasons, either transient (e.g. a buffer overflow) or permanent (e.g. a disk crash).
I/O requests usually return an error bit (or more) indicating the problem. UNIX systems also set the global
variable errno to one of a hundred or so well-defined values to indicate the specific error that has occurred.
Some devices, such as SCSI devices, are capable of providing much more detailed information about errors,
and even keep an on-board error log that can be requested by the host.
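The errno mechanism can be seen from Python, where a failed request raises OSError carrying the same well-defined code that C programs read from the global variable errno (the path below is deliberately nonexistent):

```python
# Sketch of errno-style error reporting: a failed system call surfaces as
# an OSError whose errno attribute is one of the well-defined error codes.

import errno
import os

try:
    os.open("/no/such/path/at-all", os.O_RDONLY)
except OSError as e:
    print(e.errno == errno.ENOENT)       # -> True
    print(os.strerror(e.errno))          # e.g. 'No such file or directory'
```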
I/O Protection
The I/O system must protect against either accidental or deliberate erroneous I/O.
User applications are not allowed to perform I/O in user mode - All I/O requests are handled through system
calls that must be performed in kernel mode.
Memory mapped areas and I/O ports must be protected by the memory management system, but access to
these areas cannot be totally denied to user programs. (Video games and some other applications need to be
able to write directly to video memory for optimal performance for example). Instead the memory protection
system restricts access so that only one process at a time can access particular parts of memory, such as the
portion of the screen memory corresponding to a particular window.
Windows NT carries the object-orientation one step further, implementing I/O as a message-passing system
from the source through various intermediaries to the device.
A series of lookup tables and mappings makes the access of different devices flexible, and somewhat
transparent to users.
1.What is meant by Seek Time?
It is the time taken for the disk arm to move the heads to the cylinder containing the desired sector.
2.What is meant by Rotational Latency? (May/June 2012)(Nov/Dec 2010)
It is defined as the additional time waiting for the disk to rotate the desired sector to the disk head.
3.What is meant by Low-level formatting?
Low-level formatting fills the disk with a special data structure for each sector. The data structure for a
sector typically consists of a header, a data area, and a trailer.
4.What is meant by Swap-Space Management?
It is a low-level task of the operating system. Efficient management of the swap space is called swap-space
management. Swap space is the space needed for the entire process image, including the code and
data segments.
5.What is meant by Disk Scheduling?
Disk scheduling is the process of deciding the order in which pending disk I/O requests are serviced. In a
multiprogrammed system, many processes try to read or write records on the disk at the same time, so disk
scheduling is necessary to arbitrate among them.
6.Why Disk Scheduling necessary? (April/May 2010)
Disk scheduling is necessary to arbitrate among the many processes that try to read or write records on the
disk at the same time.
7.What are the characteristics of Disk Scheduling?
•Throughput
•Mean Response Time
•Variance of Response time
8.What are the different types of Disk Scheduling? (May/June 2014)
Some of the Disk Scheduling are (i) SSTF Scheduling (ii) FCFS Scheduling (iii) SCAN Scheduling
(iv) C-SCAN Scheduling (v) LOOK Scheduling (vi) C-LOOK Scheduling
9.What is meant by SSTF Scheduling?
The SSTF algorithm selects the request with the minimum seek time from the current head position; that is,
SSTF chooses the pending request closest to the current head position.
10.What is meant by FCFS Scheduling?
It is the simplest form of disk scheduling. This algorithm always serves the request that arrives first, and it
does not provide the fastest service.
11.What is meant by SCAN scheduling?
In the SCAN algorithm, the disk arm starts at one end of the disk and moves toward the other end of the
disk. At the other end, the direction of head movement is reversed and servicing continues across the disk.
12. What is meant by C-SCAN Scheduling?
C-SCAN means Circular SCAN algorithm. This scheduling is a variant of SCAN designed to provide a
more uniform wait time. It essentially treats the cylinders as a circular list that wraps around from the final
cylinder to the first one.
13.Define Throughput.
It is defined as the number of requests serviced per unit time.
25.Which scheduling algorithm would be best to optimize the performance of a RAM disk?
(Nov/Dec 2011)
FCFS (First-Come, First-Served), because a RAM disk has uniform access time and no moving disk head, so there is no seek time to optimize.
26.Write the three basic functions which are provided by the hardware clocks and timers.
(April/May 2011)
The three basic functions are: (1) give the current time, (2) give the elapsed time, and (3) set a timer to
trigger operation X at time T.
There would be multiple paths to the same file, which could confuse users or encourage mistakes (deleting
a file with one path deletes the file in all the other paths).
35.Why must the bitmap for file allocation be kept on mass storage, rather than in main memory?
In case of a system crash (memory failure), the free-space list would not be lost, as it would be if the bitmap
had been stored only in main memory.
36.Consider a system that supports the strategies of contiguous, linked, and indexed allocation.
What criteria should be used in deciding which strategy is best utilized for a particular file?
•Contiguous—if file is usually accessed sequentially, if file is relatively small.
•Linked—if file is large and usually accessed sequentially.
•Indexed—if file is large and usually accessed randomly.
● Dimensions
10.Explain in detail about Kernel I/O Subsystem.
● I/O Scheduling
● Buffering
● Caching
● Spooling
● Device reservation
● Error handling