M5 File System
Chapter 1
File System
File Concept
• File Attributes
Different operating systems keep track of different file attributes,
including:
– Name
– Identifier ( e.g. inode number )
– Type - Text, executable, other binary, etc.
– Location - on the hard drive.
– Size
– Protection
– Time & Date
– User ID
• File Operations
The file ADT supports many common operations:
– Creating a file, writing a file, reading a file, repositioning within a file, deleting a
file, truncating a file.
• Most OSes require that files be opened before access and closed after all access is
complete. Information about currently open files is stored in an open file table,
containing for example:
– File pointer - records the current position in the file, for the next read or write
access.
– File-open count - How many times has the current file been opened
( simultaneously by different processes ) and not yet closed? When this counter
reaches zero the file can be removed from the table.
– Disk location of the file.
– Access rights
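The open file table described above can be sketched in a few lines. This is only an illustration: the class names, the `notes.txt` file and the block number 120 are hypothetical, and the sketch keeps a single table, whereas real systems usually keep the file pointer per process.

```python
from dataclasses import dataclass


@dataclass
class OpenFileEntry:
    """One row of a (hypothetical) open file table."""
    disk_location: int      # where the file lives on disk
    access_rights: str      # e.g. "r" or "rw"
    file_pointer: int = 0   # position for the next read or write
    open_count: int = 0     # opens not yet matched by a close


class OpenFileTable:
    def __init__(self):
        self.entries = {}   # file name -> OpenFileEntry

    def open(self, name, disk_location, access_rights="r"):
        # Reuse the entry if the file is already open elsewhere.
        entry = self.entries.setdefault(
            name, OpenFileEntry(disk_location, access_rights))
        entry.open_count += 1
        return entry

    def close(self, name):
        entry = self.entries[name]
        entry.open_count -= 1
        if entry.open_count == 0:
            # Last close: the entry can be removed from the table.
            del self.entries[name]


table = OpenFileTable()
table.open("notes.txt", disk_location=120)
table.open("notes.txt", disk_location=120)   # opened a second time
table.close("notes.txt")
still_open = "notes.txt" in table.entries    # one open remains
table.close("notes.txt")
removed = "notes.txt" not in table.entries   # count reached zero
```

The entry survives the first close and disappears only when the file-open count reaches zero.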
• The mount command is given a file system to mount and a mount point ( directory ) on
which to attach it.
• Once a file system is mounted onto a mount point, any further references to that directory
actually refer to the root of the mounted file system.
• Any files ( or sub-directories ) that had been stored in the mount point directory prior to
mounting the new file system are now hidden by the mounted file system, and are no
longer available. For this reason some systems only allow mounting onto empty
directories.
• File systems can only be mounted by root, unless root has previously configured certain
file systems to be mountable onto certain pre-determined mount points. ( E.g. root may
allow users to mount floppy file systems to /mnt or something like it. ) Anyone can run the
mount command to see what file systems are currently mounted.
• At the lowest layer are the physical devices, consisting of the magnetic
media, motors & controls, and the electronics connected to them and
controlling them. Modern disks put more and more of the electronic controls
directly on the disk drive itself, leaving relatively little work for the disk
controller card to perform.
• The basic file system level works directly with the device drivers in terms
of retrieving and storing raw blocks of data, without any consideration for
what is in each block. Depending on the system, blocks may be referred to
with a single block number ( e.g. block # 234234 ), or with
head-sector-cylinder combinations.
File-System Structure
• The file organization module knows about files and their
logical blocks, and how they map to physical blocks on
the disk.
• In addition to translating from logical to physical blocks,
the file organization module also maintains the list of free
blocks, and allocates free blocks to files as needed.
• The logical file system deals with all of the metadata
associated with a file ( UID, GID, mode, dates, etc. ), i.e.
everything about the file except the data itself.
• This level manages the directory structure and the
mapping of file names to file control blocks ( FCBs ),
which contain all of the metadata as well as the block
number information for finding the data on the disk.
Directory Implementation
• Directories need to be fast to search, insert, and delete,
with a minimum of wasted disk space.
1 Linear List
• A linear list is the simplest and easiest directory structure
to set up, but it does have some drawbacks.
• Finding a file ( or verifying one does not already exist upon
creation ) requires a linear search.
• Deletions can be done by moving all entries, flagging an
entry as deleted, or by moving the last entry into the newly
vacant position.
• Sorting the list makes searches faster, at the expense of
more complex insertions and deletions.
• A linked list makes insertions and deletions into a sorted
list easier, with overhead for the links.
• More complex data structures, such as B-trees, could also
be considered.
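A minimal sketch of an unsorted linear-list directory follows. It uses the "move the last entry into the vacant position" deletion strategy from the list above. The class name and the sample entries are hypothetical.

```python
class LinearDirectory:
    """Unsorted linear list of (file name, FCB number) entries."""

    def __init__(self):
        self.entries = []

    def find(self, name):
        # Linear search: every lookup may touch every entry.
        for index, (entry_name, _) in enumerate(self.entries):
            if entry_name == name:
                return index
        return -1

    def create(self, name, fcb):
        # Creation must first verify the name does not already exist.
        if self.find(name) != -1:
            raise FileExistsError(name)
        self.entries.append((name, fcb))

    def delete(self, name):
        index = self.find(name)
        if index == -1:
            raise FileNotFoundError(name)
        # Move the last entry into the vacated slot, then shrink the list.
        self.entries[index] = self.entries[-1]
        self.entries.pop()


directory = LinearDirectory()
directory.create("a.txt", 1)
directory.create("b.txt", 2)
directory.create("c.txt", 3)
directory.delete("a.txt")
remaining = directory.entries   # [("c.txt", 3), ("b.txt", 2)]
```

Deletion avoids shifting every entry, at the cost of not preserving order.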
• The real disadvantage of a linear list of
directory entries is that finding a file requires
a linear search. Directory information is used
frequently, and users will notice if access to it
is slow.
2 Hash Table
• A hash table can also be used to speed up searches.
• Hash tables are generally implemented in addition to a
linear or other structure.
• The hash table takes a value computed from the file
name and returns a pointer to the file name in the
linear list. Therefore, it can greatly decrease the
directory search time. Insertion and deletion are also
fairly straightforward, although some provision must
be made for collisions—situations in which two file
names hash to the same location.
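The idea can be sketched as a hash table layered over the linear list. Collisions are handled here by chaining, which is one possible provision among several. The class name, the deliberately weak hash function and the sample names are assumptions chosen so that a collision is easy to see.

```python
class HashedDirectory:
    """Linear list of entries plus a hash table of indices into it."""

    def __init__(self, bucket_count=8):
        self.entries = []                                # (name, fcb) pairs
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, name):
        # Toy hash (sum of bytes), chosen so "ab" and "ba" collide.
        return self.buckets[sum(name.encode("utf-8")) % len(self.buckets)]

    def lookup(self, name):
        # Only the entries chained in one bucket are examined.
        for index in self._bucket(name):
            if self.entries[index][0] == name:
                return self.entries[index][1]
        return None

    def insert(self, name, fcb):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, fcb))
        self._bucket(name).append(len(self.entries) - 1)


directory = HashedDirectory()
directory.insert("ab", 1)
directory.insert("ba", 2)            # same bucket as "ab"
found_ab = directory.lookup("ab")    # 1
found_ba = directory.lookup("ba")    # 2
missing = directory.lookup("zz")     # None
```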
Allocation Methods
1. Contiguous
2. Linked
3. Indexed
Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk.
• For example, if a file requires n blocks and is given block b
as its starting location, then the blocks assigned to the file are
b, b+1, b+2, ..., b+n-1.
• Given the starting block address and the length of the file ( in
blocks ), we can therefore determine every block occupied by the file.
The directory entry for a file with contiguous allocation contains:
• Address of starting block
• Length of the allocated portion.
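As a small illustration, the blocks of a file can be derived from the two fields of its directory entry. The function names and the sample values (start 14, length 3) are hypothetical.

```python
def contiguous_blocks(start, length):
    """Blocks b, b+1, ..., b+n-1 occupied by a contiguous file."""
    return list(range(start, start + length))


def block_address(start, length, k):
    """Disk address of logical block k (counting from 0)."""
    if not 0 <= k < length:
        raise IndexError("block index outside the file")
    return start + k


blocks = contiguous_blocks(14, 3)   # [14, 15, 16]
third = block_address(14, 3, 2)     # 16
```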
Advantages:
• Both sequential and direct access are supported.
• For direct access, the address of the kth block of a file
that starts at block b ( counting from k = 0 ) is simply b+k.
• Access is extremely fast, since the number of seeks is
minimal when a file's blocks are contiguous.
Disadvantages:
• This method suffers from both internal and external
fragmentation, which makes it inefficient in terms of disk space
utilization.
• Increasing file size is difficult because it depends on
contiguous free blocks being available at that moment.
Linked Allocation
• In this scheme, each file is a linked list of disk
blocks which need not be contiguous.
• The disk blocks can be scattered anywhere on the
disk.
• The directory entry contains a pointer to the
starting and the ending file block. Each block
contains a pointer to the next block occupied by
the file.
• The file ‘abc’ in the following image shows how the
blocks can be scattered across the disk. The last block
(25) contains -1, indicating a null pointer: it
does not point to any other block.
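Following the chain of pointers can be sketched as below. The pointer table is hypothetical. Only the final block 25 holding -1 comes from the example above; the other block numbers are assumptions for illustration.

```python
def linked_file_blocks(next_pointer, start):
    """Follow per-block next pointers from the start block until -1."""
    blocks = []
    block = start
    while block != -1:
        blocks.append(block)
        block = next_pointer[block]   # pointer stored inside the block
    return blocks


# Hypothetical layout: block number -> next block of the same file.
next_pointer = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}
chain = linked_file_blocks(next_pointer, start=9)   # [9, 16, 1, 10, 25]
```

Reaching the kth block means walking k pointers, which is why this scheme offers no direct access.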
Advantages:
• This is very flexible in terms of file size. File
size can be increased easily since the system
does not have to look for a contiguous chunk
of disk space.
• This method does not suffer from external
fragmentation. This makes it relatively better
in terms of disk space utilization.
Disadvantages:
• Because the file blocks are distributed randomly on the
disk, a large number of seeks are needed to access
every block individually. This makes linked allocation
slower.
• It does not support random or direct access. We
cannot directly access the blocks of a file. Block k of a file
can only be reached by traversing k blocks sequentially
( sequential access ) from the starting block of the file
via the block pointers.
• Pointers required in the linked allocation incur some
extra overhead.
Indexed Allocation
• In this scheme, a special block known as
the Index block contains the pointers to all
the blocks occupied by a file.
• Each file has its own index block. The ith entry
in the index block contains the disk address of
the ith file block.
• The directory entry contains the address of
the index block as shown in the image:
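A minimal sketch of a lookup through an index block follows. The disk is modelled as a dictionary, and all block numbers and contents are hypothetical.

```python
def read_file_block(disk, index_block_address, i):
    """Return the contents of the ith block of a file (i counts from 0)."""
    index_block = disk[index_block_address]   # list of data block addresses
    return disk[index_block[i]]               # one extra lookup, no traversal


# Hypothetical disk: block 19 is the index block of one file.
disk = {
    19: [9, 16, 1],
    9: "first block",
    16: "second block",
    1: "third block",
}
data = read_file_block(disk, index_block_address=19, i=2)   # "third block"
```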
Advantages:
• This supports direct access to the blocks occupied by the
file and therefore provides fast access to the file blocks.
• It overcomes the problem of external fragmentation.
Disadvantages:
• The pointer overhead for indexed allocation is greater than
for linked allocation.
• For very small files, say files that span only 2-3 blocks,
indexed allocation still dedicates one entire block ( the index
block ) to the pointers, which is inefficient in terms of
disk space utilization.
• In linked allocation, by contrast, we lose the space of only 1
pointer per block.
For files that are very large, a single index block may not be
able to hold all the pointers.
The following mechanisms can be used to resolve this:
1. Linked scheme: This scheme links two or more index
blocks together for holding the pointers.
• Every index block then contains a pointer to ( the
address of ) the next index block.
2. Multilevel index: In this policy, a first-level index block
points to second-level index blocks, which in
turn point to the disk blocks occupied by the file.
• This can be extended to 3 or more levels depending on
the maximum file size.
3. Combined Scheme:
• In this scheme, a special block called the inode ( index
node ) holds information about the file such as its
size, ownership and permissions. In UNIX the file name is kept in the
directory entry, not in the inode. The remaining space of the inode is
used to store the addresses of the disk blocks that contain the actual
file.
• The first few of these pointers in the inode point to direct
blocks, i.e. the pointers contain the addresses of the disk blocks
that contain the data of the file.
• The next few pointers point to indirect blocks. Indirect blocks
may be single indirect, double indirect or triple indirect.
• A single indirect block is a disk block that does not contain
file data but the disk addresses of the blocks that contain the file
data. Similarly, a double indirect block does not contain file
data but the disk addresses of blocks that contain the addresses
of the blocks containing the file data.
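The reach of the combined scheme can be estimated with a short calculation. The defaults below (12 direct pointers and one single, one double and one triple indirect pointer) are a common textbook configuration, assumed here for illustration only. They are not stated in the slides.

```python
def max_file_size(block_size, pointer_size,
                  direct=12, single=1, double=1, triple=1):
    """Largest file (in bytes) addressable from one inode."""
    per_block = block_size // pointer_size   # pointers per indirect block
    data_blocks = (direct
                   + single * per_block
                   + double * per_block ** 2
                   + triple * per_block ** 3)
    return data_blocks * block_size


# Tiny hypothetical case: 16-byte blocks, 4-byte pointers,
# 2 direct pointers and 1 single indirect pointer only.
small = max_file_size(16, 4, direct=2, single=1, double=0, triple=0)  # 96

# Assumed typical case: 4 KB blocks and 4-byte pointers.
large = max_file_size(4096, 4)
```

In the tiny case each indirect block holds 4 pointers, so the file can have 2 + 4 = 6 data blocks of 16 bytes each.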
Free-Space Management
4. Counting
• When there are multiple contiguous blocks of free space then the system
can keep track of the starting address of the group and the number of
contiguous free blocks.
• As long as the average length of a contiguous group of free blocks is
greater than two this offers a savings in space needed for the free list.
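The counting representation can be sketched as a conversion from a plain list of free block numbers into (start, count) pairs. The function name and the sample free list are hypothetical.

```python
def to_counting(free_blocks):
    """Compress free block numbers into (start, count) runs."""
    runs = []
    for block in sorted(free_blocks):
        if runs and block == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += 1          # block extends the current run
        else:
            runs.append([block, 1])   # block starts a new run
    return [tuple(run) for run in runs]


free_blocks = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
runs = to_counting(free_blocks)   # [(2, 4), (8, 6), (17, 2), (25, 3)]
```

Fifteen block numbers shrink to four pairs, i.e. eight stored values.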
5. Space Maps
A space map is a fairly straightforward log of the free / allocated space for a
region. Records are appended to the space map as space becomes freed /
allocated from the region.
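A much simplified model of such a log is sketched below. The record format `(operation, start, length)` and the sample records are assumptions for illustration and do not reflect any particular file system's on-disk layout.

```python
def replay_space_map(log):
    """Rebuild the set of free blocks by replaying the log in order."""
    free = set()
    for operation, start, length in log:
        blocks = range(start, start + length)
        if operation == "free":
            free.update(blocks)
        else:                          # "alloc"
            free.difference_update(blocks)
    return free


# Records are only ever appended, never rewritten in place.
log = [("free", 0, 8)]
log.append(("alloc", 2, 3))
log.append(("free", 3, 1))
free_now = replay_space_map(log)   # {0, 1, 3, 5, 6, 7}
```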
Efficiency and Performance
Efficiency
• UNIX pre-allocates inodes, which occupy space even before
any files are created.
• UNIX also distributes inodes across the disk, and tries to store
data files near their inode, to reduce the distance of disk seeks
between the inodes and the data.
• Some systems use variable size clusters depending on the file
size.
• The more data that is stored in a directory ( e.g. last access
time ), the more often the directory blocks have to be re-
written.
• Kernel table sizes used to be fixed, and could only be changed
by rebuilding the kernel. Modern tables are dynamically
allocated, but that requires more complicated algorithms for
accessing them.
Performance
a. Disk controllers generally include on-board caching.
b. When a seek is requested, the heads are moved into place, and
then an entire track is read, starting from whatever sector is
currently under the heads ( reducing latency ).
c. The requested sector is returned and the unrequested portion
of the track is cached in the disk's electronics.
d. Some OSes cache disk blocks they expect to need again in
a buffer cache.
e. A page cache connected to the virtual memory system is
actually more efficient as memory addresses do not need to be
converted to disk block addresses and back again.
DISK SCHEDULING
• Disk scheduling is a technique used by
the operating system to schedule
multiple requests for accessing the disk.
Rotational Latency: Rotational latency is the time taken for the desired sector of the disk
to rotate into position under the read/write heads. A disk
scheduling algorithm that gives lower rotational latency is better.
Transfer Time: Transfer time is the time to transfer the data. It depends on
the rotational speed of the disk and the number of bytes to be transferred.
Disk Response Time: Response time is the time a request spends
waiting to perform its I/O operation. Average response time is the
mean of the response times of all requests.
Disadvantages :
Input: Request sequence = {176, 79, 34, 60, 92, 11, 41, 114}
Initial head position = 50
Apply FCFS Disk Scheduling algorithm.
Write the seek sequence and calculate the total number of
seek operations
Seek sequence: 50 → 176 → 79 → 34 → 60 → 92 → 11 → 41 → 114
Total seek operations = (176-50)+(176-79)+(79-34)+(60-34)+(92-60)+(92-11)+(41-11)+(114-41) = 510
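The FCFS calculation above can be checked with a short sketch. The function name is hypothetical.

```python
def fcfs(requests, head):
    """Serve requests strictly in arrival order and total the head movement."""
    total = 0
    position = head
    for cylinder in requests:
        total += abs(cylinder - position)
        position = cylinder
    return list(requests), total


requests = [176, 79, 34, 60, 92, 11, 41, 114]
order, total = fcfs(requests, head=50)   # total == 510
```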
SSTF Disk Scheduling Algorithm-
Disadvantages –
1. In SSTF there is an overhead of finding out the closest
request.
2. Starvation may occur for requests far from head.
3. In SSTF high variance is present in response time and
waiting time.
4. Frequent switching of the Head’s direction slows the
algorithm.
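SSTF ( shortest seek time first ) always services the pending request closest to the current head position. A minimal sketch follows, reusing the request sequence from the FCFS problem as an assumed input. The function name is hypothetical, and ties are broken by arrival order.

```python
def sstf(requests, head):
    """Repeatedly serve the pending request nearest to the head."""
    pending = list(requests)
    order = []
    total = 0
    position = head
    while pending:
        # This search is the overhead of finding the closest request.
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        total += abs(nearest - position)
        position = nearest
        order.append(nearest)
    return order, total


requests = [176, 79, 34, 60, 92, 11, 41, 114]
order, total = sstf(requests, head=50)
# order == [41, 34, 11, 60, 79, 92, 114, 176], total == 204
```

Request 176 is served last here. If nearer requests kept arriving, it could be postponed indefinitely, which is the starvation risk.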
SCAN Disk Scheduling Algorithm-
• As the name suggests, this algorithm scans all the cylinders of the disk
back and forth.
• The head starts from one end of the disk and moves towards the other end,
servicing all the requests in between.
• After reaching the other end, the head reverses its direction and moves
towards the starting end, servicing all the requests in between.
• The same process repeats.
NOTE-
• SCAN Algorithm is also called as Elevator Algorithm.
• This is because its working resembles the working of an elevator.
Advantages –
1. Scan scheduling algorithm is simple and easy to
understand and implement.
2. Starvation is avoided in SCAN algorithm.
3. Low variance occurs in waiting time and response
time.
Disadvantages –
1. Long waiting time occurs for the cylinders that have
just been visited by the head.
2. In SCAN the head moves till the end of the disk
despite the absence of requests to be serviced.
Problem 1
• Consider a disk queue with requests for I/O to blocks
on cylinders 98, 183, 41, 122, 14, 124, 65, 67.
Input: Request sequence = {176, 79, 34, 60, 92, 11, 41, 114}
Initial head position = 50
Direction = left
Apply the SCAN algorithm to find the seek sequence and the total seek count.
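One way to work this problem is sketched below. The problem does not state the disk size, so the sketch assumes cylinders 0 to 199 ( the `disk_size` parameter ). For the leftward direction only cylinder 0 matters. The function name is hypothetical.

```python
def scan(requests, head, direction="left", disk_size=200):
    """SCAN: sweep to the end of the disk, then reverse."""
    lower = sorted((c for c in requests if c < head), reverse=True)
    upper = sorted(c for c in requests if c >= head)
    if direction == "left":
        order = lower + [0] + upper               # travel all the way to 0
    else:
        order = upper + [disk_size - 1] + lower   # travel to the last cylinder
    total = 0
    position = head
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return order, total


requests = [176, 79, 34, 60, 92, 11, 41, 114]
order, total = scan(requests, head=50, direction="left")
# order == [41, 34, 11, 0, 60, 79, 92, 114, 176], total == 226
```

The total is (50 - 0) + (176 - 0) = 226. The head travels to cylinder 0 even though the lowest request is at 11.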
Circular SCAN (C-SCAN)
• Circular SCAN (C-SCAN) scheduling algorithm is a
modified version of SCAN disk scheduling algorithm
that deals with the inefficiency of SCAN algorithm by
servicing the requests more uniformly.
• Like SCAN (Elevator Algorithm) C-SCAN moves the
head from one end servicing all the requests to the
other end.
• However, as soon as the head reaches the other end,
it immediately returns to the beginning of the disk
without servicing any requests on the return trip, and
starts servicing again once it reaches the beginning.
• This is also known as the “Circular Elevator Algorithm”,
as it essentially treats the cylinders as a circular list
that wraps around from the final cylinder to the first
one.
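A minimal sketch of C-SCAN follows, under stated assumptions: cylinders are numbered 0 to 199, the head sweeps towards the higher cylinders, and the return trip is counted as head movement ( conventions differ on this ). The request sequence is reused from the earlier problems, and the function name is hypothetical.

```python
def c_scan(requests, head, disk_size=200):
    """C-SCAN: sweep upward to the end, jump to cylinder 0, sweep upward again."""
    upper = sorted(c for c in requests if c >= head)
    lower = sorted(c for c in requests if c < head)
    # The two end cylinders are visited, but nothing is serviced on the jump.
    order = upper + [disk_size - 1, 0] + lower
    total = 0
    position = head
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return order, total


requests = [176, 79, 34, 60, 92, 11, 41, 114]
order, total = c_scan(requests, head=50)
# order == [60, 79, 92, 114, 176, 199, 0, 11, 34, 41], total == 389
```

The total is (199 - 50) + (199 - 0) + (41 - 0) = 389.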
Problem 1
Disadvantages –
1. C-SCAN causes more seek movement than the SCAN
algorithm.
2. In C-SCAN, even if there are no requests left to be serviced, the head
still travels to the end of the disk, as in SCAN and unlike the LOOK algorithm.
LOOK Algorithm
• LOOK Algorithm is an improved version of the
SCAN Algorithm.
• Head starts from the first request at one end of
the disk and moves towards the last request at
the other end servicing all the requests in
between.
• After reaching the last request at the other end,
head reverses its direction.
• It then returns to the first request at the starting
end servicing all the requests in between.
• The same process repeats.
The main difference between SCAN
Algorithm and LOOK Algorithm is-
• SCAN Algorithm scans all the cylinders of
the disk starting from one end to the other
end even if there are no requests at the
ends.
• LOOK Algorithm scans only the cylinders
between the first request at
one end and the last request at the other
end.
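The difference from SCAN shows up clearly in a sketch: no end-of-disk cylinder is added to the path. The request sequence is reused from the earlier problems as an assumed input, and the function name is hypothetical.

```python
def look(requests, head, direction="left"):
    """LOOK: sweep only as far as the last request, then reverse."""
    lower = sorted((c for c in requests if c < head), reverse=True)
    upper = sorted(c for c in requests if c >= head)
    order = lower + upper if direction == "left" else upper + lower
    total = 0
    position = head
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return order, total


requests = [176, 79, 34, 60, 92, 11, 41, 114]
order, total = look(requests, head=50, direction="left")
# order == [41, 34, 11, 60, 79, 92, 114, 176], total == 204
```

The total is (50 - 11) + (176 - 11) = 204. For the same input, SCAN moving left needs 226 because it continues to cylinder 0.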
Problem 2
Advantages :
• If there are no requests left to be serviced, the head does not move to the
end of the disk, unlike in the SCAN algorithm.
• Better performance is provided compared to SCAN Algorithm.
• Starvation is avoided in LOOK scheduling algorithm.
• Low variance is provided in waiting time and response time.
Disadvantages :
C-LOOK Algorithm
• Head starts from the first request at one end of the disk and moves
towards the last request at the other end servicing all the requests in
between.
• After reaching the last request at the other end, head reverses its
direction.
• It then returns to the first request at the starting end without servicing
any requests in between.
Advantages –
1. In C-LOOK the head does not have to move till the end of the disk if
there are no requests to be serviced.
2. There is less waiting time for the cylinders which are just visited by the
head in C-LOOK.
3. C-LOOK provides better performance when compared to LOOK
Algorithm.
4. Starvation is avoided in C-LOOK.
5. Low variance is provided in waiting time and response time.
Disadvantages –
1. In C-LOOK an overhead of finding the end requests is present.
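C-LOOK combines the two ideas. It sweeps in one direction only as far as the last request, then jumps back to the first request at the other end and sweeps again. The sketch below assumes an upward sweep and counts the jump as head movement. The request sequence is reused from the earlier problems, and the function name is hypothetical.

```python
def c_look(requests, head):
    """C-LOOK: sweep upward to the last request, jump to the lowest, repeat."""
    upper = sorted(c for c in requests if c >= head)
    lower = sorted(c for c in requests if c < head)
    # Finding these end requests is the overhead noted above.
    order = upper + lower
    total = 0
    position = head
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return order, total


requests = [176, 79, 34, 60, 92, 11, 41, 114]
order, total = c_look(requests, head=50)
# order == [60, 79, 92, 114, 176, 11, 34, 41], total == 321
```

The total is (176 - 50) + (176 - 11) + (41 - 11) = 321.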