
File Systems and Disk Management Overview

The document discusses file systems and I/O systems, focusing on mass storage structures, disk mechanisms, and various disk scheduling algorithms. It covers the characteristics and operations of files, file access methods, and directory structures. Additionally, it addresses disk management, swap-space management, and the booting process in Windows.

Uploaded by

abiramipriyacse

UNIT -4

File Systems and I/O Systems


Mass Storage Structure
• Magnetic disks provide the bulk of secondary storage in modern
computers
– A magnetic disk drive consists of a number of platters (disks)
coated in a magnetic material.
– Disks can be removable.
– The disk drive is attached to the computer via an I/O bus.
– The host controller in the computer uses the bus to talk to the disk
controller built into the drive.
– Transfer rate is the rate at which data flow between the drive and
the computer.
Moving-head Disk Mechanism
Hard Disks
• Disk platters range from 1.8" to 14" in diameter
• Commonly 3.5”, 2.5”, and 1.8”
• Today, disk drives typically have buffer capacities between 2
MB and 16 MB.
• The maximum recording storage density on any track is 200
bits/cm and minimum spacing between tracks is 0.25 mm
– Positioning time (random-access time) is the time to move the disk
arm to the desired cylinder (seek time) plus the time for the desired
sector to rotate under the head (rotational latency).
– Seek time is the time required by the read/write
head to move from one track to another.
– A head crash results from the disk head making contact with the
disk surface.
Hard Disks…
• A hard disk/drive unit comes with a set rotation speed
varying from 4,200 revolutions per minute to 15,000 rpm.
• Most laptop and desktop PCs use hard disks that fall
between 5,400 rpm and 7,200 rpm.
• Range from 30GB to 3TB per drive
Solid-State Disks
• Nonvolatile memory used like a hard drive.
• Can be more reliable than HDDs
• More expensive per MB
• May have a shorter life span
• Less capacity
• But much faster
• Busses can be too slow
• No moving parts, so no seek time or rotational
latency
Magnetic Tape
• Was early secondary-storage medium

• Relatively permanent and holds large quantities of data

• Access time is slow

• Random access 1000 times slower than disk

• Mainly used for backup, storage of infrequently-used data.

• 200GB to 1.5TB typical storage


Disk Structure
• Disk drives are addressed as large 1-dimensional arrays of logical
blocks, where the logical block is the smallest unit of transfer.
• The 1-dimensional array of logical blocks is mapped into the
sectors of the disk sequentially.
– Sector 0 is the first sector of the first track on the outermost
cylinder
– Mapping proceeds in order through that track, then the rest of
the tracks in that cylinder, and then through the rest of the
cylinders from outermost to innermost
– With this mapping, translating a logical block number to a physical
address (cylinder, track, sector) should be easy
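The sequential mapping of logical blocks onto sectors can be sketched as follows. This is a minimal illustration, not how any particular drive does it: it assumes an idealized geometry with the same number of sectors on every track and no bad sectors, and the function and parameter names are hypothetical.

```python
def block_to_chs(block, heads, sectors_per_track):
    """Map a logical block number to (cylinder, head, sector), all 0-based.

    Assumes an idealized disk: every track holds the same number of
    sectors and cylinder 0 is the outermost cylinder.
    """
    blocks_per_cylinder = heads * sectors_per_track
    cylinder, remainder = divmod(block, blocks_per_cylinder)
    head, sector = divmod(remainder, sectors_per_track)
    return cylinder, head, sector


def chs_to_block(cylinder, head, sector, heads, sectors_per_track):
    """Inverse mapping: (cylinder, head, sector) back to a logical block."""
    return (cylinder * heads + head) * sectors_per_track + sector


# Hypothetical geometry: 4 recording surfaces, 63 sectors per track.
print(block_to_chs(0, heads=4, sectors_per_track=63))    # (0, 0, 0)
print(block_to_chs(63, heads=4, sectors_per_track=63))   # (0, 1, 0)
print(block_to_chs(252, heads=4, sectors_per_track=63))  # (1, 0, 0)
```

Block 63 is the first sector of the second track in cylinder 0, and block 252 is the first sector of cylinder 1, which matches the order "through that track, then the rest of the tracks in that cylinder, then the next cylinder".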
Disk Scheduling

• The operating system is responsible for using hardware
efficiently. For disk drives, this means having a fast
access time and high disk bandwidth.
• Minimize seek time
• Seek time ≈ seek distance
• Disk bandwidth is the total number of bytes transferred,
divided by the total time between the first request for
service and the completion of the last transfer.
Disk Scheduling (Cont.)
• There are many sources of disk I/O requests
– OS
– System processes
– User processes
• An I/O request includes input or output mode, disk address, memory
address, and number of sectors to transfer
• OS maintains a queue of requests, per disk or device
• An idle disk can immediately work on an I/O request; a busy disk means
the request must wait in the queue.
Disk Scheduling (Cont.)
• Note that drive controllers have small buffers and can manage a
queue of I/O requests (of varying “depth”)
• Several algorithms exist to schedule the servicing of disk I/O requests
• The analysis is true for one or many platters
• The FCFS algorithm, shown next, is simple to implement, but it does not
provide the fastest service.
• Not suitable for heavy loads.
• Consider the request queue 98, 183, 37, 122, 14, 124, 65, 67, with the head
initially at track 53. The disk has 200 tracks (numbered 0 to 199).
First Come First Serve (FCFS)
Illustration shows total head movement of 640 cylinders. Calculate
the average seek length.

Next track accessed    Number of tracks traversed
98                     45
183                    85
37                     146
122                    85
14                     108
124                    110
65                     59
67                     2

• Average seek length = (45+85+146+85+108+110+59+2) / 8 = 640/8 = 80
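The FCFS arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the queue and starting track are the ones given in the example.

```python
def fcfs_head_movement(start, requests):
    """Total tracks traversed when requests are served in arrival order."""
    total = 0
    position = start
    for track in requests:
        total += abs(track - position)
        position = track
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
total = fcfs_head_movement(53, queue)
print(total, total / len(queue))  # 640 80.0
```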


SSTF
• Shortest Seek Time First (SSTF) selects the request with the
minimum seek time from the current head position.
• SSTF scheduling is a form of SJF scheduling; it may
cause starvation of some requests.
• Its throughput is better than FCFS.
• Reduces the average response time.
• Fewer head movements.


SSTF
Illustration shows total head movement of 236 cylinders

Next track accessed    Number of tracks traversed
65                     12
67                     2
37                     30
14                     23
98                     84
122                    24
124                    2
183                    59

Average seek length = (12+2+30+23+84+24+2+59) / 8 = 236/8 = 29.5
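A minimal sketch of SSTF selection, assuming the same queue and a head starting at track 53. The function names are hypothetical, and ties (which do not occur in this example) are broken in favor of the request that arrived first.

```python
def sstf_order(start, requests):
    """Return the order in which SSTF services the requests."""
    pending = list(requests)
    position = start
    order = []
    while pending:
        # Pick the pending request closest to the current head position.
        nearest = min(pending, key=lambda track: abs(track - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def head_movement(start, stops):
    """Total tracks traversed when the head visits `stops` in order."""
    total, position = 0, start
    for track in stops:
        total += abs(track - position)
        position = track
    return total


order = sstf_order(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)                     # [65, 67, 37, 14, 98, 122, 124, 183]
print(head_movement(53, order))  # 236
```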
SCAN
• The disk arm starts at one end of the disk and moves toward the other
end, servicing requests until it gets to the other end of the disk, where
the head movement is reversed and servicing continues.
• The SCAN algorithm is sometimes called the elevator algorithm.
• However, if requests are uniformly distributed, the heaviest density of
pending requests is at the other end of the disk when the head reverses,
and those requests wait the longest.
• Eliminates starvation.
• Throughput is similar to SSTF.
• Needs a direction bit.


SCAN
Illustration shows total head movement of 236 cylinders (head at 53,
initially moving toward track 0)

Next track accessed    Number of tracks traversed
37                     16
14                     23
65                     14 + 65 = 79 (14 to 0, then 0 to 65)
67                     2
98                     31
122                    24
124                    2
183                    59

Average seek length = (16+23+79+2+31+24+2+59) / 8 = 236/8 = 29.5
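A minimal sketch of SCAN for this example, assuming the head starts at track 53 and initially moves toward track 0. The function names are hypothetical; the sketch also assumes the head only travels to the end of the disk when requests remain on the other side.

```python
def head_movement(start, stops):
    """Total tracks traversed when the head visits `stops` in order."""
    total, position = 0, start
    for track in stops:
        total += abs(track - position)
        position = track
    return total


def scan_toward_low(start, requests, low=0):
    """SCAN with the head initially moving toward track `low`.

    Returns (service_order, total_head_movement).
    """
    below = sorted((t for t in requests if t <= start), reverse=True)
    above = sorted(t for t in requests if t > start)
    # Path of the head: down through `below`, on to the end of the disk,
    # then back up through `above`.
    path = below + ([low] + above if above else [])
    return below + above, head_movement(start, path)


order, total = scan_toward_low(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)                      # [37, 14, 65, 67, 98, 122, 124, 183]
print(total, total / len(order))  # 236 29.5
```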
C-SCAN
• Circular SCAN (C-SCAN) provides a more uniform wait time than SCAN.

• The head moves from one end of the disk to the other, servicing
requests as it goes
– When it reaches the other end, however, it immediately returns to the
beginning of the disk, without servicing any requests on the return trip

• Treats the cylinders as a circular list that wraps around from the
last cylinder to the first one.
C-SCAN (Cont.)
Next track accessed    Number of tracks traversed
65                     12
67                     2
98                     31
122                    24
124                    2
183                    59
14                     16 + 199 + 14 = 229 (183 to 199, 199 to 0, then 0 to 14)
37                     23

Average seek length = (12+2+31+24+2+59+229+23) / 8 = 382/8 = 47.75
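A minimal sketch of C-SCAN for this example, assuming the head starts at track 53 and moves toward the highest track. With tracks numbered 0 to 199, the sweep from 183 to 199 is counted as 16 tracks and the return from 199 to 0 as 199 tracks. The function names are hypothetical.

```python
def head_movement(start, stops):
    """Total tracks traversed when the head visits `stops` in order."""
    total, position = 0, start
    for track in stops:
        total += abs(track - position)
        position = track
    return total


def c_scan_upward(start, requests, low=0, high=199):
    """C-SCAN with the head moving toward `high`, then wrapping to `low`.

    Returns (service_order, total_head_movement). The return sweep is
    counted as head movement even though no request is serviced on it.
    """
    above = sorted(t for t in requests if t >= start)
    below = sorted(t for t in requests if t < start)
    path = above + ([high, low] + below if below else [])
    return above + below, head_movement(start, path)


order, total = c_scan_upward(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)                      # [65, 67, 98, 122, 124, 183, 14, 37]
print(total, total / len(order))  # 382 47.75
```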
LOOK
• LOOK is a version of SCAN.
• Start the head moving in one direction.
• Satisfy the requests in that direction; when there are no more
requests in that direction, reverse direction
and service the requests on the way back.
LOOK
Next track accessed    Number of tracks traversed
65                     12
67                     2
98                     31
122                    24
124                    2
183                    59
37                     146
14                     23

Average seek length = (12+2+31+24+2+59+146+23) / 8 = 299/8 = 37.375
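A minimal sketch of LOOK for this example, assuming the head starts at track 53 and initially moves toward higher track numbers. The function names are hypothetical.

```python
def head_movement(start, stops):
    """Total tracks traversed when the head visits `stops` in order."""
    total, position = 0, start
    for track in stops:
        total += abs(track - position)
        position = track
    return total


def look_upward(start, requests):
    """LOOK: go up only as far as the last request, then reverse."""
    above = sorted(t for t in requests if t >= start)
    below = sorted((t for t in requests if t < start), reverse=True)
    order = above + below
    return order, head_movement(start, order)


order, total = look_upward(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)                      # [65, 67, 98, 122, 124, 183, 37, 14]
print(total, total / len(order))  # 299 37.375
```

Unlike SCAN, the head never visits the end of the disk unless a request is actually there, so the path is simply the service order.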
C-LOOK
• C-LOOK is a version of C-SCAN.
• Start the head moving in one direction.
• Satisfy the requests in that direction; when there are no more
requests in that direction, jump back to the pending request
nearest the opposite end of the disk and continue servicing
in the original direction.
C-LOOK
Next track accessed    Number of tracks traversed
65                     12
67                     2
98                     31
122                    24
124                    2
183                    59
14                     183 - 14 = 169
37                     23

Average seek length = (12+2+31+24+2+59+169+23) / 8 = 322/8 = 40.25
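A minimal sketch of C-LOOK for this example, assuming the head starts at track 53 and services requests only while moving toward higher track numbers. The function names are hypothetical.

```python
def head_movement(start, stops):
    """Total tracks traversed when the head visits `stops` in order."""
    total, position = 0, start
    for track in stops:
        total += abs(track - position)
        position = track
    return total


def c_look_upward(start, requests):
    """C-LOOK: sweep up to the last request, then jump to the lowest one."""
    above = sorted(t for t in requests if t >= start)
    below = sorted(t for t in requests if t < start)
    order = above + below
    return order, head_movement(start, order)


order, total = c_look_upward(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)                      # [65, 67, 98, 122, 124, 183, 14, 37]
print(total, total / len(order))  # 322 40.25
```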
Selecting a Disk-Scheduling Algorithm

• SSTF is common and has a natural appeal.


• SCAN and C-SCAN perform better for systems that place a
heavy load on the disk
• Performance depends on the number and types of
requests
• Requests for disk service can be influenced by the file-
allocation method.
• The disk-scheduling algorithm should be written as a
separate module of the operating system, allowing it to be
replaced with a different algorithm if necessary
• Either SSTF or LOOK is a reasonable choice for the default
algorithm
Disk Management
• Low-level formatting, or physical formatting — Divides a
disk into sectors that the disk controller can read and
write
• To use a disk to hold files, the operating system still
needs to record its own data structures on the disk
– Partition the disk into one or more groups of cylinders.
– Logical formatting or “making a file system”
– To increase efficiency most file systems group blocks
into clusters
• Disk I/O done in blocks
• File I/O done in clusters
Booting from a Disk in Windows
• Boot block initializes system
– The bootstrap is stored in ROM
– Bootstrap loader program
• Windows places its boot code in the first sector on the hard disk
(master boot record).
• The disk is divided into one or more partitions, with one designated as the
boot partition, which contains the OS and device drivers.
• Methods such as sector sparing are used to handle bad blocks.
• A bad block is an area of the storage medium that is no longer reliable
for storing data because it is damaged or corrupted.
• Low-level formatting sets aside spare sectors that are not visible to the
operating system. The controller can be told to replace each
bad sector logically with one of the spare sectors. This scheme is
called sector sparing.
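Sector sparing amounts to a small remapping table kept by the controller. The sketch below is a toy model under stated assumptions: the class and method names are hypothetical, spare sectors are numbered directly after the visible ones, and spares are handed out in order.

```python
class SparingController:
    """Toy model of a disk controller that remaps bad sectors to spares."""

    def __init__(self, visible_sectors, spare_sectors):
        self.visible_sectors = visible_sectors
        # Spare sectors sit after the visible ones and are hidden from the OS.
        self.free_spares = list(
            range(visible_sectors, visible_sectors + spare_sectors))
        self.remap = {}  # bad logical sector -> spare physical sector

    def mark_bad(self, sector):
        """Logically replace a bad sector with the next free spare."""
        if sector in self.remap:
            return self.remap[sector]
        if not self.free_spares:
            raise RuntimeError("no spare sectors left")
        self.remap[sector] = self.free_spares.pop(0)
        return self.remap[sector]

    def resolve(self, sector):
        """Translate the sector the OS asked for into the one actually used."""
        if not 0 <= sector < self.visible_sectors:
            raise ValueError("sector is not visible to the operating system")
        return self.remap.get(sector, sector)


controller = SparingController(visible_sectors=100, spare_sectors=2)
controller.mark_bad(17)
print(controller.resolve(17))  # 100
print(controller.resolve(18))  # 18
```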
Swap-Space Management
• Swap-space - Virtual memory uses disk space as an extension of
main memory
– Less common now due to memory capacity increases
• Swap-space management
– Kernel uses swap maps to track swap-space use
– OS allocates swap space only when a dirty page is forced out of
physical memory, not when the virtual memory page is first
created
• File data written to swap space until write to file system
requested
• Other dirty pages go to swap space due to no other home
• Text segment pages thrown out and reread from the file
system as needed
• Some systems allow multiple swap spaces
File System Storage
Files
• A file is a named collection of related information that is
recorded on secondary storage.
• The operating system maps this logical storage unit to the
physical view of information storage.
• A file may have the following characteristics
– File Attributes
– File Operations
– File Types
– File Structures
– Internal Files
File Attributes
• File Name: The symbolic name, i.e., the human-readable file attribute.
• Identifier: A unique number assigned to each file for identification
purposes.
• File Type: Some systems recognize various file types.
• For example, .exe, .bin, ...
• File Location: A pointer to the device and to the location of the file on
that device.
• File Size: The current size of a file, or the maximum allowed size.
• File Protection: Access-control information (Read, Read-Write, ...).
• File Date, Time, Owner, etc.
File Operations
A file can be considered as an abstract data type that has data
and accompanying operations.
• Creating a file
• Writing a file
• Reading a file
• Repositioning within a file
• Deleting a file
• Truncating a file

• Other operations (e.g., appending a file, renaming a file)


File Operations
[Figure: the process open-file table, the system-wide open-file table, and one file on disk. Labels: file index, file pointer, file open count, disk location, access right.]
File Structure

• Some systems support specific file types that have special file
structures.

• For example, files that contain binary executables.

• An operating system becomes more complex when more file


types (i.e., file structures) are supported.

• In general, the number of supported file types is kept to minimum.


File Access Methods

• Access method: Defines how a file is used.
• There are three popular methods:
• Sequential access
• Direct access
• Indexed access
Sequential Access Method
• With the sequential access method, the file is processed in order, one
record after the other.
• If p is the file pointer, the next record to be accessed is either p+1 or p-1
(i.e., backspace).
[Figure: a sequential-access file showing the beginning, the current record, the next record, and the end of file, with rewind and read/write operations.]
Direct Access Method
• A file is made up of fixed-length logical records.
• The direct access method uses a record number to identify each
record. For example, read rec 0, write rec 100, seek rec 75, etc.
• Some systems may use a key field to access a record (e.g., read
rec “Age=24” or write rec “Name=Dow”). This is usually
achieved using hashing.
• Since records can be accessed in random order, direct access is
also referred to as random access.
• The direct access method can simulate sequential access.
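With fixed-length records, record number n starts at byte offset n × record size, so a direct read is one seek followed by one read. The sketch below is illustrative only: the 16-byte record size, the helper names, and the sample records are assumptions, and an in-memory buffer stands in for a file on disk.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length, in bytes


def write_record(f, n, data):
    """Write `data` as record number n (0-based), padded to RECORD_SIZE."""
    if len(data) > RECORD_SIZE:
        raise ValueError("record too long")
    f.seek(n * RECORD_SIZE)
    f.write(data.ljust(RECORD_SIZE, b"\0"))


def read_record(f, n):
    """Read record number n by seeking straight to its byte offset."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\0")


f = io.BytesIO()  # in-memory stand-in for a file
for n, name in enumerate([b"Adams", b"Dow", b"Smith", b"Swell"]):
    write_record(f, n, name)

# Direct (random) access: any record, in any order.
print(read_record(f, 2))  # b'Smith'
print(read_record(f, 0))  # b'Adams'

# Sequential access simulated on top of direct access: read rec 0, 1, 2, ...
print([read_record(f, n) for n in range(4)])
```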
Indexed Access Method
• With the indexed access method, a file is sorted in ascending
order based on one or more keys.
• Each disk block may contain a number of fixed-length logical
records.
• An index table stores one key per block (such as the first key in the
block) together with the location of that block.
• We can search the index table to locate the block that contains
the desired record. Then, search the block to find the desired
record.
• This resembles a one-level B-tree or B+ tree.
• In a B-tree, data are stored in both leaf and internal nodes. In a B+ tree, data are stored only in leaf nodes.
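[Figure: an index table keyed by last name (Adams, Arthur, Ashcroft, ..., Smith) gives a logical record number for each key. The Ashcroft entry points to the data-file block holding Ashcroft, Asher, Atkins; the Smith entry points to the block holding Smith, Sweeny, Swell. The index table is stored in physical memory.]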
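The two-step lookup (search the index table, then search the block) can be sketched as below. The sketch assumes the index holds the first key of each block; the names `blocks`, `index_table`, and `find_record` are hypothetical. The Ashcroft and Smith blocks follow the example figure, while the Adams and Arthur blocks are shown with a single record each because their other records are not given.

```python
from bisect import bisect_right

# Data file: blocks of records, sorted by key across the whole file.
blocks = [
    ["Adams"],
    ["Arthur"],
    ["Ashcroft", "Asher", "Atkins"],
    ["Smith", "Sweeny", "Swell"],
]

# Index table kept in memory: the first key of every block.
index_table = [block[0] for block in blocks]


def find_record(key):
    """Return (block number, position in block), or None if absent."""
    # Step 1: binary-search the index for the last block whose first key
    # is less than or equal to the search key.
    block_number = bisect_right(index_table, key) - 1
    if block_number < 0:
        return None
    # Step 2: search inside that one block.
    block = blocks[block_number]
    if key in block:
        return block_number, block.index(key)
    return None


print(find_record("Asher"))  # (2, 1)
print(find_record("Swell"))  # (3, 2)
print(find_record("Baker"))  # None
```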
Directory Structure
• A large disk may be divided into partitions, also called mini
disks or volumes.
• Each partition contains information about files within it. This
information is stored in entries of a device directory or volume
table of contents (VTOC).
• The device directory, or directory for short, stores the name,
location, size, type, access method, etc. of each file.
• Operations performed on a directory: search for a file, create a file,
delete a file, rename a file, traverse the file system, etc.
Directory Structure

There are five commonly used directory structures:


• Single-Level Directory
• Two-Level Directory
• Tree-Structured Directories
• Acyclic-Graph Directories
• General Graph Directories


Single-Level Directory

• All files are contained in the same directory.

• It is difficult to maintain file name uniqueness.

• Early versions of MS-DOS used this directory structure.


Two-Level Directory
• This is an extension of the single-level directory for multi-user
systems.
• Each user has his/her own user file directory.
• The system’s master file directory is searched for the user
directory when a user job starts.
• Early CP/M-80 multi-user systems used this structure.
Two-Level Directory
• To locate a file, a path name is used.
• For example, /user2/bo is the file bo of user 2.
• Different systems use different path names.
• For example, under MS-DOS it may be C:\user2\bo.
• The directory of a special user, say user 0, may contain all
system files.
Tree-Structured Directory
• Each directory or subdirectory contains files and subdirectories, and
forms a tree.
• Directories are special files.

/bin/mail/prog/spell
Acyclic-Graph Directory:

• This type of directory allows a
file/directory to be shared by
multiple directories.
• This is different from having two copies of
the same file or directory.
• An acyclic-graph directory is more
flexible than a simple tree structure.
However, it is more complex.
• For example, the file count is shared by the directories
dict and spell.
General Graph Directory
• It is easy to traverse the directories of a tree or an acyclic directory system.
• However, if links are added arbitrarily, the directory graph becomes
arbitrary and may contain cycles.
[Figure: a general graph directory containing a cycle.]
General Graph Directory
• We use a reference count to decide when a file can be deleted.
• In a cycle, due to self-reference, the reference count may be
non-zero even when it is no longer possible to refer to a file or
directory.
• Thus, garbage collection may be needed. A garbage
collector traverses the directory and marks files and
directories that can be accessed.
• A second round removes those inaccessible items.
• To avoid this time-consuming task, a system can check whether a cycle
would be created when a link is made.
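Both ideas (mark-and-sweep garbage collection, and checking for a cycle before a link is made) can be sketched on a directory graph stored as a dictionary. This is a simplified model: the graph layout, the names, and the example entries are hypothetical, and a real file system would walk on-disk directory entries instead.

```python
def reachable(graph, root):
    """Mark phase: every file/directory that can be reached from `root`."""
    marked = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in marked:
            marked.add(node)
            stack.extend(graph.get(node, []))
    return marked


def collect_garbage(graph, root):
    """Sweep phase: remove entries that were not marked; return them."""
    garbage = set(graph) - reachable(graph, root)
    for node in garbage:
        del graph[node]
    return garbage


def link_creates_cycle(graph, parent, child):
    """True if adding the link parent -> child would close a cycle."""
    return parent in reachable(graph, child)


# "x" and "y" refer to each other, so each has reference count 1,
# yet neither can be reached from the root directory "/".
graph = {"/": ["a"], "a": ["f"], "f": [], "x": ["y"], "y": ["x"]}

print(link_creates_cycle(graph, "a", "/"))  # True
print(link_creates_cycle(graph, "/", "f"))  # False
print(sorted(collect_garbage(graph, "/")))  # ['x', 'y']
```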
File allocation methods
• Files are allocated disk space by the operating system.
• Allocation methods aim for
• Effective disk space utilization
• Fast file access
• Operating systems use the following three main ways
to allocate disk space to files.
– Contiguous Allocation
– Linked Allocation
– Indexed Allocation
Contiguous allocation
• Each file occupies a set of contiguous blocks on the
disk.
• Simple – only starting location (block ) and length
(number of blocks) are required.
• Random access.
• Can be used for both sequential and direct files.
• Wasteful of space (dynamic storage-allocation
problem)
• Files cannot grow.
Contiguous Allocation of disk spaces
Linked Allocation
Linked allocation – each file has a linked list of blocks
• File ends at nil pointer
• Each block contains pointer to next block
• No external fragmentation, so no compaction is needed
• Free-space management system is called when a new block is needed
• Efficiency can be improved by clustering blocks into groups, but this
increases internal fragmentation
• Reliability can be a problem
• Locating a block can take many I/Os and disk seeks.
• Simple – need only starting address
• No random access
FAT (File Allocation Table) variation
• Beginning of volume has table, indexed by block number
• Much like a linked list, but faster on disk and cacheable
• New block allocation simple
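In the FAT variation, the "next block" pointers live in one table indexed by block number instead of inside the data blocks. The sketch below follows a file's chain through such a table; the end-of-chain marker, the block numbers, and the function name are illustrative assumptions, and a dictionary stands in for the on-disk table.

```python
END_OF_CHAIN = -1  # assumed marker for "no next block"


def file_blocks(fat, start_block):
    """Follow the FAT chain from `start_block`; return the block list."""
    blocks = []
    seen = set()
    block = start_block
    while block != END_OF_CHAIN:
        if block in seen:
            raise ValueError("corrupt FAT: chain loops back on itself")
        seen.add(block)
        blocks.append(block)
        block = fat[block]  # table entry gives the next block of the file
    return blocks


# Illustrative table: the directory entry says the file starts at block 217.
fat = {217: 618, 618: 339, 339: END_OF_CHAIN}
print(file_blocks(fat, 217))  # [217, 618, 339]
```

Because the table is small and can be cached in memory, finding the nth block needs table lookups only, not a disk read per pointer.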
Linked Allocation
File Allocation Table
Indexed Allocation Methods
• Each file has its own index block(s) of pointers to its data
blocks
• Logical view

• Need index table


• Random access
• Dynamic access without external fragmentation, but with the
overhead of an index block.
• Only 1 block for index table
Indexed Allocation
Indexed Allocation - Mapping
Best method depends on file access type
• Contiguous is great for sequential and random
• Linked is good for sequential, not random
• Indexed is more complex, because a single block access could
require reading the index block first.
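The trade-off shows up when logical block i of a file is translated to a physical block. The sketch below returns the physical block together with the number of block reads needed to reach it, assuming nothing is cached and a single-level index. The function names and block numbers are illustrative.

```python
def contiguous_lookup(start, length, i):
    """Contiguous: pure arithmetic, then one read of the data block."""
    if not 0 <= i < length:
        raise IndexError("block outside the file")
    return start + i, 1


def linked_lookup(next_block, start, i):
    """Linked: every block before block i must be read to find its pointer."""
    block = start
    for _ in range(i):
        block = next_block[block]
    return block, i + 1


def indexed_lookup(index_block, i):
    """Indexed: read the index block, then the data block."""
    return index_block[i], 2


# The same 5-block file under linked and indexed allocation.
chain = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
index_block = [9, 16, 1, 10, 25]

print(contiguous_lookup(14, 5, 3))     # (17, 1)
print(linked_lookup(chain, 9, 3))      # (10, 4)
print(indexed_lookup(index_block, 3))  # (10, 2)
```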
File system mounting
• Mounting is a process by which the operating system makes files and
directories on a storage device (such as a hard drive, CD-ROM, or network
share) available for users to access via the computer's file system.
• The opposite process is called unmounting.
• Here, the operating system cuts off all user access to files and directories
on the mount point,
• writes the remaining queue of user data to the storage device, refreshes
file system metadata, then relinquishes access to the device, making
the storage device safe for removal.
File system mounting...
• A mount point is a location (a directory) in the currently accessible
file system at which the mounted file system is attached.
• Many different types of storage exist, including magnetic, optical,
and semiconductor (solid-state) drives.
• Each different file system provides the host operating system
with metadata so that it knows how to read and write data.
• When the medium is mounted, these metadata are read by the
operating system so that it can use the storage.
File system mounting...
• In order to access a file system in Linux, we first need
to mount it.
• Mounting a file system simply means making the
particular file system accessible at a certain point in
the Linux directory tree.
• When mounting a file system it does not matter if
the file system is a hard disk partition, CDROM,
floppy, or USB storage device.
• Mounting is the attaching of an additional file system
to the currently accessible file system of a computer.
File system mounting...
File Sharing
• Sharing of files in multi-user systems is desirable.
• Sharing may be done through a protection scheme.
• On distributed systems, files may be shared across a
network.
• Network File System (NFS) is a common distributed file-sharing
method.
File Sharing – Multiple Users
• User IDs identify users and are used to assign permissions and
protections per user.
• Group IDs allow users to be in groups and permit group
access rights.
File Sharing – Remote File Systems
• Uses networking to allow file system access between systems.
• Manually done through programs like FTP.
• The client-server model allows clients to mount remote file systems from
servers.
• A server can serve multiple clients.
• Client identification is complicated in a distributed environment.
• NFS is the standard UNIX client-server file-sharing protocol.
• The Network File System (NFS) is a client/server application that lets a
computer user view and optionally store and update files on a remote
computer as though they were on the user's own computer.
• CIFS (Common Internet File System) is the standard Windows protocol.
• Standard operating system file calls are translated into remote calls.
• Distributed information systems (distributed naming services) such as
LDAP (Lightweight Directory Access Protocol), DNS, NIS, and Active Directory
implement unified access to information needed for remote computing.
File Protection

• We can keep files safe from physical damage (i.e.,
reliability) and improper access (i.e., protection).
• Reliability is generally provided by backup.
• The need for file protection is a direct result of the ability to
access files.
• Access control may provide complete protection by denying
all access.
• Or, the access may be controlled.


File Protection: Types of Access
• Access control may be implemented by limiting the types of file access that
can be made.
• The types of access may be
– Read: read from the file
– Write: write or rewrite the file
– Execute: load the file into memory and execute it
– Append: write new info at the end of a file
– Delete: delete a file
– List: list the name and attributes of the file


File Protection: Access Control

• The most commonly used approach is to make the access
dependent on the identity of the user.
• Each file and directory is associated with an access matrix
specifying the user name and the types of permitted
access.
• When a user makes a request to access a file or a
directory, his/her identity is compared against the
information stored in the access matrix.
File Protection: Access Control

Access Matrix

          File 1     File 2     File 3     File 4     Account 1        Account 2
User A    Own R W    -          Own R W    -          Inquiry Credit   -
User B    R          Own R W    W          R          Inquiry Debit    Inquiry Credit
User C    R W        R          -          Own R W    -                Inquiry Debit
File Protection: Access Control
Access-control Lists
• In practice, the access matrix is sparse.
• The matrix can be decomposed into columns (files), yielding
access-control lists (ACL).
• However, this list can be very long!

File 1: A (Own, R, W), B (R), C (R, W)
File 2: B (Own, R, W), C (R)
File 3: A (Own, R, W), B (W)
File 4: B (R), C (Own, R, W)
File Protection: Access Control

Capability Lists
• Decomposition by rows (users) yields capability tickets.
• Each user has a number of tickets for file/directory access.

User A: File 1 (Own, R, W), File 3 (Own, R, W)
User B: File 1 (R), File 2 (Own, R, W), File 3 (W), File 4 (R)
User C: File 1 (R, W), File 2 (R), File 4 (Own, R, W)
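The two decompositions of the access matrix can be sketched with nested dictionaries. The rights below are the file entries from the access matrix in this section (the account columns are left out to keep the sketch short); the function names are hypothetical.

```python
# Access matrix: user -> object -> set of rights.
matrix = {
    "A": {"File 1": {"Own", "R", "W"}, "File 3": {"Own", "R", "W"}},
    "B": {"File 1": {"R"}, "File 2": {"Own", "R", "W"},
          "File 3": {"W"}, "File 4": {"R"}},
    "C": {"File 1": {"R", "W"}, "File 2": {"R"},
          "File 4": {"Own", "R", "W"}},
}


def access_control_lists(matrix):
    """Decompose by columns: object -> {user: rights}."""
    acls = {}
    for user, row in matrix.items():
        for obj, rights in row.items():
            acls.setdefault(obj, {})[user] = rights
    return acls


def capability_list(matrix, user):
    """Decompose by rows: the tickets held by one user."""
    return matrix.get(user, {})


def is_allowed(matrix, user, obj, right):
    """Check a request against the matrix; empty cells grant nothing."""
    return right in matrix.get(user, {}).get(obj, set())


acls = access_control_lists(matrix)
print(sorted(acls["File 3"]))                  # ['A', 'B']
print(sorted(capability_list(matrix, "A")))    # ['File 1', 'File 3']
print(is_allowed(matrix, "B", "File 3", "R"))  # False
print(is_allowed(matrix, "C", "File 4", "W"))  # True
```

Storing only the non-empty cells is what makes both forms practical for a sparse matrix.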
RAID Structure
• RAID – redundant array of inexpensive disks.
• RAID is a technology that is used to increase the performance and
reliability of data storage.
• A RAID system consists of two or more disks working in parallel.
• Multiple disk drives provide reliability via redundancy.
• Frequently combined with nonvolatile RAM (NVRAM) to cache the RAID
array and improve write performance.
• This write-back cache is protected from data loss during power failures.
• RAID is arranged into levels; levels 0 through 6 are described here.
• Several improvements in disk-use techniques involve the use of multiple
disks working cooperatively.
RAID (Cont)
• Disk striping uses a group of disks as one storage unit

• RAID schemes improve performance and improve the reliability of


the storage system by storing redundant data
– Mirroring or shadowing (RAID 1) keeps duplicate of each disk

– Block interleaved parity (RAID 4, 5, 6) uses much less


redundancy
• RAID within a storage array can still fail if the array fails, so
automatic replication of the data between arrays is common
RAID Levels
Raid Level 0

Raid Level 1

Raid Level 2

Raid Level 3

Raid Level 4

Raid Level 5

Raid Level 6
RAID Levels
RAID Level 0
• RAID 0 (also known as a stripe set or striped volume) splits
("stripes") data evenly across two or more disks, without parity
information, redundancy, or fault tolerance.
• Since RAID 0 provides no fault tolerance or redundancy, the
failure of one drive will cause the entire array to fail; as a result
of having data striped across all disks, the failure will result in
total data loss.
• This configuration is typically implemented having speed as the
intended goal.
• RAID 0 is normally used to increase performance, although it can
also be used as a way to create a large logical volume out of
two or more physical disks.
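Striping can be pictured as dealing logical blocks out to the disks in round-robin order. The sketch below assumes a stripe unit of exactly one block; the function name is hypothetical, and real arrays usually use larger stripe units.

```python
def stripe_location(logical_block, num_disks):
    """Return (disk number, block number on that disk) for RAID 0."""
    disk = logical_block % num_disks
    block_on_disk = logical_block // num_disks
    return disk, block_on_disk


# Six consecutive logical blocks spread over a 3-disk array.
print([stripe_location(b, 3) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Consecutive blocks land on different disks, so a large sequential transfer keeps all disks busy at once; it also means every disk holds part of every large file, which is why one failed drive loses the whole array.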
RAID Level 1
• RAID 1 consists of an exact copy (or mirror) of a set of
data on two or more disks; a classic RAID 1 mirrored
pair contains two disks.
• This configuration offers no parity, striping, or spanning
of disk space across multiple disks, since the data is
mirrored on all disks belonging to the array, and the
array can only be as big as the smallest member disk.
• This layout is useful when read performance or reliability
is more important than write performance or the
resulting data storage capacity.
RAID Level 2
• RAID 2, which is rarely used in
practice, stripes data at the bit
(rather than block) level, and uses a
Hamming code for error correction.
• The disks are synchronized by the
controller to spin at the same
angular orientation (they reach
index at the same time), so it generally
cannot service multiple requests
simultaneously. It is the only original
level of RAID that is not currently
used.
• Extremely high data transfer rates
are possible.
RAID Level 3
• RAID 3 consists of byte-level striping with dedicated parity. It stripes the
data across multiple disks. Parity is generated for
each stripe and stored on a separate dedicated
disk.
• This level can survive a single disk failure.
• One of the characteristics of RAID 3 is that it generally
cannot service multiple requests simultaneously, because
any I/O operation requires activity on
every disk and usually requires synchronized
spindles.
RAID Level 4
• RAID 4 consists of block-level striping with a dedicated parity disk.
• In this level, entire blocks of data are
written onto the data disks, and the
parity is generated and stored on a
separate dedicated disk.
• This level can survive at most one disk
failure.
• If more than one disk fails,
there is no way to recover the data.
• Both RAID 3 and RAID 4 require at least
three disks.
RAID Level 5
• RAID 5 consists of block-level striping
with distributed parity.
• Unlike in RAID 4, parity information is
distributed among the drives.
• It requires that all drives but one be
present to operate.
• Upon failure of a single drive,
subsequent reads can be calculated
from the distributed parity such that
no data is lost.
• RAID 5 requires at least three disks.
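The parity used by the block-interleaved levels is a bytewise XOR of the data blocks in a stripe, so any one missing block equals the XOR of all the others. The sketch below shows only that calculation for a single stripe; it does not model how RAID 5 rotates the parity block among the drives. The function name and sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-disk array: three data blocks plus one parity block.
data = [b"disk", b"ABCD", b"1234"]
parity = xor_blocks(data)

# The drive holding data[1] fails: rebuild its block from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'ABCD'
```

With a second failure in the same stripe there are two unknowns and only one parity equation, which is why single-parity levels cannot recover from it.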
RAID Level 6
• RAID 6 extends RAID 5 by
adding another parity
block; thus, it uses block-level
striping with two
parity blocks distributed
across all member disks.
