Chapter 5
5.1. File system
contents
o File concept
o File naming
o File type
o File access
o File attribute
o File operation
o File structure
o Directory
o Directory structure
o Directory operation
o File system implementation
o Implementing directory
File concept
o All computer applications need to store and
retrieve information.
o While a process is running, it can store a limited
amount of information within its own address
space.
o But the following requirements yields for long-
term information storage:
It must be possible to store a very large amount of
information.
The information must survive the termination of the
process using it.
Multiple processes must be able to access the
information concurrently.
o Magnetic disks have been used for years for this
File concept(cont……)
o disk support two operations: Read block and Write block.
o with these two operations one could, in principle, solve
the long-term storage problem.
o Just a few of the questions that quickly arise are:
How do you find information?
How do you keep one user from reading another user's
data?
How do you know which blocks are free?.
o So to solve this problem the operating system use a
new abstraction called the file which is logical units of
information created by processes
o As a whole, that part of the operating system dealing
with files is known as the file system
File naming
o Files are an abstraction mechanism and provide a
way to store information on the disk and read it back
later.
o The most important characteristic of any abstraction
mechanism is the way the objects being managed are
named.
o when a process creates a file, it gives the file a name.
When the process terminates, the file continues to
exist and can be accessed by other processes using
its name.
o The exact rules for file naming vary somewhat from
system to system, but all current operating systems
allow strings of one to eight letters as legal file
names.
File naming (con’t… )
o Frequently digits and special characters are also
permitted.
o Some file systems distinguish between upper and
lower case letters, whereas others do not. UNIX falls
in the first category; MS-DOS falls in the second.
o Many operating systems support two-part file names,
with the two parts separated by a period, as in prog.c.
o In some systems (e.g., UNIX), file extensions are just
conventions and are not enforced by the operating
system.
o In contrast, Windows is aware of the extensions and
assigns meaning to them.
File type
o Many operating systems support several types of files.
o UNIX and Windows, for example, have regular files and
directories.
o UNIX also has character and block special files.
o Regular files: are the ones that contain user
information. They are either ASCII files or binary files.
o Directories: are system files for maintaining the
structure of the file system.
o Character special files: are related to input/output and
used to model serial I/0 devices, such as terminals,
printers, and networks.
o Block special files: are used to model disks. In this
chapter we will be primarily interested in regular files.
File access
o File can be access either sequentially or
randomly.
o Sequential access: a process could read all the
bytes or records in a file in order, starting at the
beginning, but could not skip around and read
them out of order.
o Random access: Files bytes or records can be
read in any order. Used when the storage medium
was magnetic disk. Essential for many
applications.
o This method is used in UNIX and Windows.
File attribute
o Every file has a name and its data.
o In addition, all operating systems associate other
information with each file, for example, the date
and time the file was last modified and the file's
size.
o We will call these extra items the file's attributes ,
Some people call them metadata.
o The list of attributes varies considerably from
system to system.
File attribute(con’t…)
File operation
o Different systems provide different operations to
allow storage and retrieval.
o Create: The file is created with no data. The
purpose of the call is to announce that the file is
coming and to set some of the attributes.
o Delete: When the file is no longer needed, it has
to be deleted to free up disk space.
o Open: Before using a file, a process must open it.
The purpose of the open call is to allow the
system to fetch the attributes and list of disk
addresses into main memory for rapid access on
latter caller.
File operation(con’t…)
o Close: When all the access are finished, the
attributes and disk addresses are no longer
needed, so the file should be closed to free up
internal table space.
o Read: Data are read from file. Usually, the bytes
come from the current position. The caller must
specify how much data are needed and must also
provide a buffer to put them in.
o Write: Data are written to the file again, usually at
the current position. If the current position is the
end of the file, the file’s size increases. If the
current position is in the middle of the file,
existing data are and lost forever.
File operation(con’t…)
o Append: This call is a restricted in the form of
write. It can only add data to the end of the file.
o Seek: For random access files, a method is
needed to specify from where to take the data.
One common approach is a system call, seek,
that repositions the file pointer to a specific place
in the file. After this call has completed, data can
be read from, or written to, that position.
o Get Attributes: Processes often need to read file
attributes to do their work.
File operation(con’t…)
o Set Attributes: Some of the attributes are user
settable and can be changed after the file has
been created. This system call makes that
possible.
o The protection mode information is an
obvious example.
o Most of the flags also fall in this category.
o Rename: It frequently happens that a user needs
to change the name of an existing file. This
system call makes that possible
File structure
o Files can be structured in any of several ways.
o Three common possibilities are:
1. Byte sequence: All it sees are bytes. Any meaning must be
imposed by user-level programs.
– Both UNIX and Windows use this approach.
2. Record sequence: Central to the idea of a file being a sequence
of records is the idea that the read operation returns one record
and the write operation overwrites or appends one record.
• Used in database system.
3. Tree: File consists of tree not necessarily all the same length,
each containing a key field in a fixed position in the record. The
tree is sorted on the key field, to allow rapid searching for a
particular key.
File structure (con’t….)
(a) Byte sequence. (b) Record
sequence. (c) Tree.
Directory
o To keep track of files, file systems normally
have directories or folders, which in many
systems are themselves files and
containing information about all files.
Directory structure
o Defines the organization or logical structure of
directory.
Single-Level Directory Systems:
o The simplest form of directory system is having
one directory containing all the files.
o they require unique name and Sometimes it is
called the root directory.
o The advantages of this scheme are its simplicity
and the ability to locate files quickly-there is only
one place to look, after all.
o It is often used on simple embedded devices such
as telephones, digital cameras, and some portable
music players.
Directory structure(con’t…)
Hierarchical Directory Systems:
o With this approach, there can be as many directories as are
needed to group the files in natural ways.
o When the file system is organized as a directory tree, some
way is needed for specifying file names.
o Two different methods are commonly used.
o In the first method, each file is given an absolute path name consisting
of the path from the root directory to the file.
o Example : /usr/ast/mailbox
o The other kind of name is the relative path name. This is used
in conjunction with the concept of the working directory (also
called the current directory). A user can designate one
directory as the current working directory
o For example, if the current working directory is /usr/ast, then
the file whose absolute path is /usr/ast/mailbox can be
referenced simply as mailbox.
Directory operation
o They allowed system calls for managing directories exhibit
more variation from system to system than system calls for
files.
o Create: A directory is created. It is empty except for dot and
dot dot, which are put there automatically by the system .
o Delete: A directory is deleted. Only an empty directory can
be deleted.
o Opendir: Directories can be read. Before a directory can be
read, it must be opened, analogous to opening and reading
a file.
o Closedir: When a directory has been read, it should be
closed to free up internal table space.
Directory operation(con’t…)
o Readdir: This call returns the next entry in an open directory, no
matter which of the possible directory structures is being used.
o Rename: In many respects, directories are just like files and can
be renamed the same way files can be.
o Link: Linking is a technique that allows a file to appear in more
than one directory. This system call specifies an existing file and a
path name, and creates a link from the existing file to the name
specified by the path
track of the number of directory entries containing the file), is
sometimes called a hard link.
o Unlink: A directory entry is removed. If the file being unlinked is
only present in one directory (the normal case), it is removed from
the file system. If it is present in multiple directories, only the path
name specified is removed. The others remain.
o In UNIX, the system call for deleting files (discussed earlier) is, in
fact, unlink.
File system implementation
o Users are concerned with how files are
named, what operations are allowed on
them, what the directory tree looks like,
and similar interface issues.
o Implementers are interested in how files
and directories are stored, how disk space
is managed, and how to make everything
work efficiently and reliably
File system layout
o Most disks can be divided up into one or more
partitions, with independent file systems on each
partition.
o Sector 0 of the disk is called the MBR (Master
Boot Record) and is used to boot the computer.
o The end of the MBR contains the partition table
and gives the starting and ending addresses of
each partition.
o One of the partitions in the table is marked as
active
o When the computer is booted, the BIOS reads in
and executes the MBR.
o The first thing the MBR program does is locate
the active partition, read in its first block, called
the boot block, and execute it.
File system layout(con’t…)
o The program in the boot block loads the operating
system contained in that partition.
o every partition starts with a boot block, even if it
does not contain a bootable operating system.
o Often the file system will contain some of the
items
o superblock : contains all the key parameters
about the file system and is read into memory
when the computer is booted or the file system is
first touched.
o Typical information in the superblock includes: a
magic number, to identify the file system type,
the number of blocks, in the file system, and
other key administrative information.
File system layout(con’t…)
o Free space management: includes information
about free blocks in the file system, for example
in the form of a bitmap or a list of pointers. This
might be followed by the
o i-nodes :an array of data structures, one per file,
telling all about the file.
o root directory : which contains the top of the file
system tree.
o Finally, the remainder of the disk contains all the
other directories and files.
File system layout(con’t…)
A possible file system layout.
Implementing files
o Probably the most important issue in implementing file
storage is keeping track of which disk blocks go with which
file.
o Various methods are used in different operating systems.
o Contiguous allocation
o Linked list allocation
o i-node
Contiguous allocation
o The simplest allocation scheme is to store each file as a
contiguous run of disk blocks.
o it is simple to implement because keeping track of where a
file's blocks are is reduced to remembering two numbers:
the disk address of the first block and the number of blocks
in the file
o the read performance is excellent because the entire file
can be read from the disk in a single operation.
o Only one seek is needed (to the first block). After that, no
more seeks or rotational delays are needed, so data come
in at the full bandwidth of the disk.
o Thus contiguous allocation is simple to implement and has
high performance.
o contiguous allocation also has a fairly significant drawback:
over the course of time, the disk becomes fragmented.
Linked list allocation
o keep each one as a linked list of disk blocks.
o The first word of each block is used as a pointer
to the next one and the rest of the block is for
data.
o Unlike contiguous allocation, every disk block can
be used in this method. No space is lost to disk
fragmentation (except for internal fragmentation
in the last block).
o it is sufficient for the directory entry to merely
store the disk address of the first block. The rest
can be found starting there.
o although reading a file sequentially is
straightforward, random access is extremely slow.
Linked list allocation(con’t…)
o the amount of data storage in a block is no longer a
power of two because the pointer takes up a few
bytes.
o Both disadvantages of the linked list allocation can be
eliminated by taking the pointer word from each disk
block and putting it in a table in memory.
i-node
o method for keeping track of which blocks belong
to which file is to associate with each file a data
structure called an i-node (index-node)
o lists the attributes and disk addresses of the file's
blocks.
o The big advantage of this scheme over linked files
using an in-memory table is that the i-node need
only be in memory when the corresponding file is
open.
Implementing directory
o The main function of the directory system is to
map the ASCII name of the file onto the
information needed to locate the data.
o a directory consists of a list of fixed-size entries,
one per file, containing a (fixed-length) file name,
a structure of the file attributes, and one or more
disk addresses (up to some maximum) telling
where the disk blocks are.
Implementing directory(con’t…)
o Nearly all modem operating systems support
longer, variable-length file names.
How can these be implemented?
o The simplest approach is to set a limit on file
name length, typically 255 characters
o One alternative is to give up the idea that all
directory entries are the same size.
o With this method, each directory entry contains a
fixed portion, typically starting with the length of
the entry, and then followed by data with a fixed
format and other.
o A disadvantage of this method is that when a file
is removed, a variable-sized gap is introduced
into the directory into which the next file to be
entered may not fit.
Implementing directory(con’t…)
o Another way to handle variable-length names is to make the
directory entries themselves all fixed length and keep the
file names together in a heap at the end of the directory.
o This method has the advantage that when an entry is
removed, the next file entered will always fit there.
o In all of the designs so far, directories are searched linearly
from beginning to end when a file name has to be looked
up.
o One way to speed up the search is to use a hash table in
each directory.
o The table entry corresponding to the hash code is
inspected.
Implementing directory(con’t…)
o If it is unused, a pointer is placed there to the file
entry.
o If that slot is already in use, a linked list is
constructed, headed at the table entry and
threading through all entries with the same hash
value.
o A different way to speed up searching large
directories is to cache the results of searches.
nd
e e
Th