File Systems

The document discusses the structure and operations of file systems, covering concepts such as file attributes, access methods, and directory structures. It explains various file operations, file sharing mechanisms, and protection schemes, as well as the implementation of file systems across different operating systems. Additionally, it highlights the importance of file system mounting and the organization of files within a disk structure.
NATIONAL ECONOMICS UNIVERSITY

SCHOOL OF INFORMATION TECHNOLOGY AND DIGITAL ECONOMICS

CHAPTER 5
FILE-SYSTEM
OUTLINE

¡ File Concept
¡ Access Methods
¡ Disk and Directory Structure
¡ File-System Mounting
¡ File Sharing
¡ Protection
FILE CONCEPT

¡ Contiguous logical address space


¡ Types:
¡ Data
¡ numeric
¡ character
¡ binary

¡ Program

¡ Contents defined by file’s creator


¡ Many types
¡ Consider text file, source file, executable file
FILE ATTRIBUTES

¡ Name – only information kept in human-readable form


¡ Identifier – unique tag (number) identifies file within file system
¡ Type – needed for systems that support different types
¡ Location – pointer to file location on device
¡ Size – current file size
¡ Protection – controls who can do reading, writing, executing
¡ Time, date, and user identification – data for protection, security, and usage monitoring
¡ Information about files is kept in the directory structure, which is maintained on the disk
¡ Many variations, including extended file attributes such as file checksum
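As a rough illustration of the attributes listed above, the sketch below asks the Java NIO API for the attributes the platform keeps for a path. The class name FileAttributesDemo and the method describe are made up for this example; the file key plays the role of the identifier and may be null on some platforms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical demo class: prints a few of the attributes discussed on the slide.
class FileAttributesDemo {
    // Returns a one-line summary of the attributes reported for the given path.
    static String describe(Path file) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            return "name=" + file.getFileName()
                    + " size=" + attrs.size()
                    + " directory=" + attrs.isDirectory()
                    + " modified=" + attrs.lastModifiedTime()
                    + " identifier=" + attrs.fileKey(); // may be null on some platforms
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Describe the path given on the command line, or the current directory.
        System.out.println(describe(Paths.get(args.length > 0 ? args[0] : ".")));
    }
}
```

Protection information (owner, permissions) is not part of BasicFileAttributes; on POSIX systems it would come from a different attribute view.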
FILE INFO WINDOW ON MAC OS X

FILE OPERATIONS

¡ File is an abstract data type


¡ Create
¡ Write – at write pointer location
¡ Read – at read pointer location
¡ Reposition within file - seek
¡ Delete
¡ Truncate
¡ Open(Fi) – search the directory structure on disk for entry Fi, and move the content of entry
to memory
¡ Close (Fi) – move the content of entry Fi in memory to directory structure on disk
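The operations above can be tried out with java.io.RandomAccessFile. This is only a sketch on a temporary scratch file; the class name FileOperationsDemo is invented here. Note that RandomAccessFile keeps a single file pointer shared by reads and writes, rather than separate read and write pointers.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one pass through the basic file operations.
class FileOperationsDemo {
    // Returns the five bytes read back after repositioning to offset 6.
    static String run() {
        try {
            File f = File.createTempFile("ops-demo", ".dat");              // create
            String readBack;
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {   // open
                raf.write("hello world".getBytes(StandardCharsets.US_ASCII)); // write at the file pointer
                raf.seek(6);                                               // reposition within the file
                byte[] buf = new byte[5];
                raf.readFully(buf);                                        // read at the file pointer
                readBack = new String(buf, StandardCharsets.US_ASCII);
                raf.setLength(5);                                          // truncate to 5 bytes
            }                                                              // close (end of try block)
            if (!f.delete()) {                                             // delete
                throw new IOException("could not delete " + f);
            }
            return readBack;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```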
OPEN FILES

¡ Several pieces of data are needed to manage open files:


¡ Open-file table: tracks open files
¡ File pointer: pointer to last read/write location, per process that has the file open
¡ File-open count: indicates how many processes have the file open – to allow removal of data from
the open-file table when the last process closes it
¡ Disk location of the file: cache (in main memory) of data-access information
¡ Access rights: per-process access mode information

OPEN FILE LOCKING

¡ Provided by some operating systems and file systems


¡ Similar to reader-writer locks
¡ Shared lock similar to reader lock – several processes can acquire concurrently
¡ Exclusive lock similar to writer lock

¡ Mediates access to a file


¡ Mandatory or advisory:
¡ Mandatory – access is denied depending on locks held and requested
¡ Advisory – processes can find status of locks and decide what to do

FILE LOCKING EXAMPLE – JAVA API

¡ In the Java API, acquiring a lock requires first obtaining the FileChannel for the file to
be locked.
¡ The lock() method of the FileChannel is used to acquire the lock.
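¡ The API of the lock() method is
FileLock lock(long position, long size, boolean shared)
¡ position is the byte offset at which the locked region starts; size is the length of the region in bytes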
¡ Setting shared to true is for shared locks; setting shared to false acquires the
lock exclusively.
¡ The lock is released by invoking the release() method of the FileLock returned by the lock()
operation.
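FILE LOCKING EXAMPLE – JAVA API

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockingExample {
    public static final boolean EXCLUSIVE = false;
    public static final boolean SHARED = true;

    public static void main(String[] args) throws IOException {
        // "file.txt" is a placeholder name; the file is assumed to already hold data
        try (RandomAccessFile raf = new RandomAccessFile("file.txt", "rw")) {
            // get the channel for the file
            FileChannel ch = raf.getChannel();
            long half = raf.length() / 2;
            // this locks the first half of the file - exclusive
            FileLock exclusiveLock = ch.lock(0, half, EXCLUSIVE);
            try {
                /* Now modify the data . . . */
            } finally {
                exclusiveLock.release(); // release the lock
            }
            // this locks the second half of the file - shared
            // (the second argument is the size of the region, not its end offset)
            FileLock sharedLock = ch.lock(half, raf.length() - half, SHARED);
            try {
                /* Now read the data . . . */
            } finally {
                sharedLock.release(); // release the lock
            }
        }
    }
}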
FILE TYPES – NAME, EXTENSION

FILE STRUCTURE

¡ File types also can be used to indicate the internal structure of the file.
¡ None - sequence of words, bytes
¡ Simple record structure
¡ Lines
¡ Fixed length
¡ Variable length
¡ Complex Structures
¡ Formatted document
¡ Relocatable load file
¡ Who decides:
¡ Operating system
¡ Program
ACCESS METHODS

¡ Files store information.


¡ When it is used, this information must be accessed and read into computer memory.
¡ The information in the file can be accessed in several ways.
¡ Sequential Access
¡ Direct Access
¡ Other Access Methods

SEQUENTIAL-ACCESS FILE

¡ The simplest access method


¡ Information in the file is processed in order

¡ A read operation - read next() - reads the next portion of the file and automatically
advances a file pointer, which tracks the I/O location.
¡ Similarly, the write operation - write next() - appends to the end of the file and
advances to the end of the newly written material (the new end of file).
DIRECT ACCESS (OR RELATIVE ACCESS)

¡ A file is made up of fixed-length logical records that allow programs to read and write
records rapidly in no particular order
¡ Databases are often of this type
read n
write n
position to n
read next
write next
rewrite n
¡ read(n), where n is the block number

SIMULATION OF SEQUENTIAL ACCESS ON DIRECT-ACCESS FILE

¡ Not all operating systems support both sequential and direct access for files.
¡ Some systems allow only sequential file access or only direct access
¡ Some systems require defining as sequential or direct when it is created
¡ We can easily simulate sequential access on a direct-access file by simply keeping a
variable cp that defines our current position: read next becomes "read cp; cp = cp + 1", and
write next becomes "write cp; cp = cp + 1".
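A minimal sketch of this simulation follows. The class DirectAccessFile is hypothetical and keeps its blocks in memory instead of on a disk; only the variable cp comes from the slide.

```java
// Hypothetical in-memory stand-in for a file of fixed-length blocks.
class DirectAccessFile {
    private final byte[][] blocks; // blocks[n] is relative block n
    private int cp = 0;            // current position used by the sequential operations

    DirectAccessFile(int blockCount, int blockSize) {
        blocks = new byte[blockCount][blockSize];
    }

    // Direct access: read or write relative block n in any order.
    byte[] read(int n) {
        return blocks[n].clone();
    }

    void write(int n, byte[] data) {
        System.arraycopy(data, 0, blocks[n], 0, Math.min(data.length, blocks[n].length));
    }

    // Sequential access simulated on top of the direct operations.
    void reset() {
        cp = 0;
    }

    byte[] readNext() {
        byte[] block = read(cp); // read cp
        cp = cp + 1;             // cp = cp + 1
        return block;
    }

    void writeNext(byte[] data) {
        write(cp, data);         // write cp
        cp = cp + 1;             // cp = cp + 1
    }

    int position() {
        return cp;
    }

    public static void main(String[] args) {
        DirectAccessFile f = new DirectAccessFile(4, 2);
        f.writeNext(new byte[] {1, 1});
        f.writeNext(new byte[] {2, 2});
        f.reset();
        System.out.println(f.readNext()[0] + " " + f.readNext()[0]);
    }
}
```

The reverse direction, simulating direct access on a sequential-only file, would need a rewind and a scan for each request, which is why it is not done in practice.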

OTHER ACCESS METHODS

¡ Can be built on top of base methods


¡ Generally involve creation of an index for the file
¡ Keep index in memory for fast determination of location of data to be operated on (consider UPC
code plus record of data about that item)
¡ If the index is too large, keep an index (in memory) of the index (on disk)
¡ IBM indexed sequential-access method (ISAM)
¡ Small master index, points to disk blocks of secondary index
¡ File kept sorted on a defined key
¡ All done by the OS
¡ VMS operating system provides index and relative files as another example (see next slide)

EXAMPLE OF INDEX AND RELATIVE FILES

DISK STRUCTURE

¡ Disk can be subdivided into partitions


¡ Disks or partitions can be RAID protected against failure
¡ Disk or partition can be used raw – without a file system, or formatted with a file
system
¡ Entity containing file system known as a volume
¡ Each volume containing file system also tracks that file system’s info in device
directory

A TYPICAL FILE-SYSTEM ORGANIZATION

EXAMPLE: SOLARIS FILE SYSTEMS

¡ A typical Solaris system may have dozens of file systems of a dozen different types:

TYPES OF FILE SYSTEMS

¡ We mostly talk of general-purpose file systems


¡ But systems frequently have many file systems, some general and some special purpose
¡ Consider Solaris, which has:
¡ tmpfs – memory-based volatile FS for fast, temporary I/O
¡ objfs – interface into kernel memory to get kernel symbols for debugging
¡ ctfs – contract file system for managing daemons
¡ lofs – loopback file system allows one FS to be accessed in place of another
¡ procfs – kernel interface to process structures
¡ ufs, zfs – general purpose file systems

OPERATIONS PERFORMED ON DIRECTORY

¡ The directory can be viewed as a symbol table that translates file names into their
directory entries.
¡ Search for a file
¡ Create a file
¡ Delete a file
¡ List a directory
¡ Rename a file
¡ Traverse the file system
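The symbol-table view can be sketched with an ordinary map from file names to directory entries. The class SimpleDirectory is hypothetical, and each entry is reduced to a made-up integer identifier; a real entry would carry the file's attributes or a pointer to them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single directory modeled as a symbol table.
class SimpleDirectory {
    // file name -> directory entry (here only an invented file identifier)
    private final Map<String, Integer> entries = new TreeMap<>();
    private int nextId = 1;

    // Search for a file; returns null when the name is not present.
    Integer search(String name) {
        return entries.get(name);
    }

    // Create a file and return its new identifier.
    int create(String name) {
        if (entries.containsKey(name)) {
            throw new IllegalArgumentException("name already exists: " + name);
        }
        entries.put(name, nextId);
        return nextId++;
    }

    // Delete a file; returns false when the name is not present.
    boolean delete(String name) {
        return entries.remove(name) != null;
    }

    // List the directory (names in sorted order because of the TreeMap).
    List<String> list() {
        return new ArrayList<>(entries.keySet());
    }

    // Rename a file while keeping its entry.
    boolean rename(String oldName, String newName) {
        if (!entries.containsKey(oldName) || entries.containsKey(newName)) {
            return false;
        }
        entries.put(newName, entries.remove(oldName));
        return true;
    }

    public static void main(String[] args) {
        SimpleDirectory d = new SimpleDirectory();
        d.create("a");
        d.create("b");
        d.rename("a", "c");
        System.out.println(d.list());
    }
}
```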

LOGICAL STRUCTURE OF A DIRECTORY

¡ The most common schemes for defining the logical structure of a directory:
¡ Single-Level Directory
¡ Two-Level Directory
¡ Tree-Structured Directories
¡ Acyclic-Graph Directories
¡ General Graph Directory

SINGLE-LEVEL DIRECTORY

¡ A single directory for all users

¡ Naming problem
¡ Grouping problem
TWO-LEVEL DIRECTORY

¡ Each user has his own user file directory (UFD)

¡ Solves the name-collision problem


¡ Different users can have files with the same name
¡ Efficient searching
¡ No grouping capability
TREE-STRUCTURED DIRECTORIES

TREE-STRUCTURED DIRECTORIES (CONT.)

¡ Efficient searching
¡ Grouping Capability
¡ In normal use, each process has a current directory that contains most of the files
that are of current interest to the process
¡ Change directories by calling change_directory()
¡ From one change_directory() system call to the next, all open() system calls search
the new current directory.

TREE-STRUCTURED DIRECTORIES (CONT)

¡ Absolute or relative path name:


¡ An absolute path name begins at the root and follows a path down to the specified
file, giving the directory names on the path.
¡ A relative path name defines a path from the current directory.
¡ For example:
¡ The current directory: root/spell/mail, then
¡ the relative path: prt/first
¡ the absolute path name: root/spell/mail/prt/first.
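The example above can be reproduced with java.nio.file.Path. In this sketch the root is written as "/" (a Unix-style assumption, where the slide writes "root"), and the class name PathNamesDemo is invented.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class: turns a relative path name into an absolute one.
class PathNamesDemo {
    // Resolves a relative path name against the current directory and
    // removes any "." or ".." components from the result.
    static Path toAbsolute(Path currentDirectory, Path relative) {
        return currentDirectory.resolve(relative).normalize();
    }

    public static void main(String[] args) {
        Path current = Paths.get("/spell/mail");
        System.out.println(toAbsolute(current, Paths.get("prt/first")));
    }
}
```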

TREE-STRUCTURED DIRECTORIES (CONT)

¡ Creating a new file is done in current directory


¡ Delete a file:
rm <file-name>
¡ Creating a new subdirectory:
mkdir <dir-name>
Example: if in current directory /mail
mkdir count

Deleting “mail” ⇒ deleting the entire subtree rooted by “mail”

ACYCLIC-GRAPH DIRECTORIES

¡ Have shared subdirectories and files: a graph with no cycles


ACYCLIC-GRAPH DIRECTORIES

¡ Shared files and subdirectories can be implemented in several ways:


¡ A common way, exemplified by many of the UNIX systems, is to create a new directory entry called
a link.
¡ Another common approach to implementing shared files is simply to duplicate all information
about them in both sharing directories

FILE SYSTEM MOUNTING

¡ A file system must be mounted before it can be accessed


¡ An unmounted file system (e.g., Fig. 11-11(b)) is mounted at a mount point

[Figure: two directory trees, (a) and (b); node labels include users, bill, fred, sue, jane, help, doc, prog]
MOUNT POINT

FILE SHARING

¡ Sharing of files on multi-user systems is desirable


¡ Sharing may be done through a protection scheme
¡ On distributed systems, files may be shared across a network
¡ Network File System (NFS) is a common distributed file-sharing method
¡ On a multi-user system:
¡ User IDs identify users, allowing permissions and protections to be per-user
¡ Group IDs allow users to be in groups, permitting group access rights
¡ Owner of a file / directory
¡ Group of a file / directory

FILE SHARING – REMOTE FILE SYSTEMS

¡ Uses networking to allow file system access between systems


¡ Manually via programs like FTP
¡ Automatically, seamlessly using distributed file systems
¡ Semi-automatically via the World Wide Web

¡ Client-server model allows clients to mount remote file systems from servers
¡ Server can serve multiple clients
¡ Client and user-on-client identification is insecure or complicated
¡ NFS (Network File System) is the standard UNIX client-server file sharing protocol
¡ CIFS (Common Internet File System) is the standard Windows protocol
¡ Standard operating system file calls are translated into remote calls
PROTECTION

¡ File owner/creator should be able to control:


¡ what can be done
¡ by whom

¡ Types of access
¡ Read
¡ Write
¡ Execute
¡ Append
¡ Delete
¡ List
ACCESS LISTS AND GROUPS

¡ Mode of access: read, write, execute


¡ Three classes of users on Unix / Linux
                      RWX
a) owner access   7 ⇒ 111
b) group access   6 ⇒ 110
c) public access  1 ⇒ 001

¡ Ask manager to create a group (unique name), say G, and add some users to the
group.
¡ For a particular file (say game) or subdirectory, define an appropriate access.
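The three octal digits above (7, 6, 1) map directly onto read/write/execute bits. The sketch below decodes such a mode into the familiar rwx string; the class UnixMode and its method names are invented for illustration.

```java
// Hypothetical helper: decodes a Unix-style octal mode into rwx notation.
class UnixMode {
    // Converts one octal digit (0-7) into its three-letter form, e.g. 6 -> "rw-".
    static String triplet(int digit) {
        return ((digit & 4) != 0 ? "r" : "-")
             + ((digit & 2) != 0 ? "w" : "-")
             + ((digit & 1) != 0 ? "x" : "-");
    }

    // Converts a three-digit octal mode (owner, group, public) into nine letters.
    static String symbolic(int mode) {
        return triplet((mode >> 6) & 7)   // owner access
             + triplet((mode >> 3) & 7)   // group access
             + triplet(mode & 7);         // public access
    }

    public static void main(String[] args) {
        // A leading 0 makes this an octal literal in Java: owner 7, group 6, public 1.
        System.out.println(symbolic(0761));
    }
}
```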
A SAMPLE UNIX DIRECTORY LISTING

WINDOWS 7 ACCESS-CONTROL LIST MANAGEMENT

OUTLINE

¡ File-System Structure
¡ File-System Implementation
¡ Directory Implementation
¡ Allocation Methods
¡ Free-Space Management
¡ Efficiency and Performance
¡ Recovery

BACKGROUND

¡ To improve I/O efficiency, I/O transfers between memory and disk are performed in
units of blocks; each block is one or more sectors (n sectors = n × 512 bytes, assuming 512-byte sectors)
¡ File system resides on secondary storage (disks)
¡ Provides user interface to storage, mapping logical to physical
¡ Provides efficient and convenient access to disk by allowing data to be stored, located, and retrieved
easily
¡ File control block (inode in UNIX) – storage structure consisting of information
about a file
¡ Device driver controls the physical device

FILE SYSTEM LAYERS

¡ File system organized into layers


¡ Each level in the design uses the features of lower
levels to create new features for use by higher
levels.
¡ Logical layers can be implemented by any coding
method according to OS designer

FILE SYSTEM LAYERS (CONT.)

¡ Logical file system manages metadata information


¡ manages the directory structure to provide the file-organization module with the information the
latter needs, given a symbolic file name
¡ maintains file structure via file-control blocks (FCBs), which contain information about the file,
including ownership, permissions, and location of the file contents
¡ File organization module understands files, logical address, and physical blocks
¡ Translates logical block # to physical block #
¡ Manages free space, disk allocation

FILE SYSTEM LAYERS

¡ Basic file system


¡ Given a command like “retrieve block 123”, translates it into a request to the device driver (for example, drive 1, cylinder
73, track 2, sector 10)
¡ Also manages memory buffers and caches (allocation, freeing, replacement)
¡ Buffers hold data in transit
¡ Caches hold frequently used data

¡ Device drivers manage I/O devices at the I/O control layer


¡ Given commands like “read drive1, cylinder 72, track 2, sector 10, into memory location 1060”,
output low-level hardware-specific commands to the hardware controller

FILE SYSTEM

¡ Many file systems, sometimes many within an operating system, each with its own
format:
¡ CD-ROM is ISO 9660
¡ Unix has UFS, FFS
¡ Windows has FAT, FAT32, NTFS as well as floppy, CD, DVD, and Blu-ray formats
¡ Linux has more than 40 types; the standard Linux file system is known as the extended file system,
with the most common versions being ext3 and ext4
¡ New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE

FILE-SYSTEM IMPLEMENTATION

¡ To create a new file, an application program calls the logical file system.
¡ It allocates a new File Control Block (FCB), which contains many details about the
file, such as its ownership, permissions, and the location of its contents.

OPEN() FILE

¡ The open() call passes a file name to the logical file system.
¡ The open() system call first searches the system-wide open-file table to see if the
file is already in use by another process.
¡ If it is, a per-process open-file table entry is created pointing to the existing system-wide open-file
table. This algorithm can save substantial overhead.
¡ If the file is not already open, the directory structure is searched for the given file name. Parts of
the directory structure are usually cached in memory to speed directory operations.

OPEN() FILE

¡ Once the file is found, the FCB is copied into a system-wide open-file table in
memory.
¡ This table not only stores the FCB but also tracks the number of processes that have
the file open
READ() & WRITE() FILE

CLOSE() FILE

¡ When a process closes the file, the per-process table entry is removed, and the
system-wide entry’s open count is decremented.
¡ When all users that have opened the file close it, any updated metadata is copied back
to the disk-based directory structure, and the system-wide open-file table entry is
removed.
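The interplay of the two tables during open() and close() can be sketched as follows. Everything here is a simplification made up for illustration: the class OpenFileTables, the integer process IDs and file descriptors, and a SystemEntry that stands in for the cached FCB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the system-wide and per-process open-file tables.
class OpenFileTables {
    // System-wide entry: stand-in for the in-memory FCB plus the open count.
    static final class SystemEntry {
        final String name;
        int openCount = 0;

        SystemEntry(String name) {
            this.name = name;
        }
    }

    private final Map<String, SystemEntry> systemWide = new HashMap<>();
    // process id -> (file descriptor -> system-wide entry)
    private final Map<Integer, Map<Integer, SystemEntry>> perProcess = new HashMap<>();
    private int nextFd = 3;

    // open(): reuse the system-wide entry if the file is already open,
    // otherwise create it (this is where the directory would be searched).
    int open(int pid, String name) {
        SystemEntry entry = systemWide.computeIfAbsent(name, SystemEntry::new);
        entry.openCount++;
        int fd = nextFd++;
        perProcess.computeIfAbsent(pid, k -> new HashMap<>()).put(fd, entry);
        return fd;
    }

    // close(): remove the per-process entry and decrement the open count;
    // the last close removes the system-wide entry as well.
    void close(int pid, int fd) {
        Map<Integer, SystemEntry> table = perProcess.get(pid);
        SystemEntry entry = (table == null) ? null : table.remove(fd);
        if (entry == null) {
            throw new IllegalArgumentException("descriptor not open: " + fd);
        }
        if (--entry.openCount == 0) {
            systemWide.remove(entry.name); // updated metadata would be written back here
        }
    }

    int openCount(String name) {
        SystemEntry entry = systemWide.get(name);
        return entry == null ? 0 : entry.openCount;
    }

    public static void main(String[] args) {
        OpenFileTables tables = new OpenFileTables();
        int fd = tables.open(1, "notes.txt"); // hypothetical file name
        tables.open(2, "notes.txt");
        tables.close(1, fd);
        System.out.println(tables.openCount("notes.txt"));
    }
}
```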

ALLOCATION METHODS

¡ Three major methods of allocating disk space are in wide use:


1. Contiguous
2. Linked
3. Indexed
¡ Although some systems support all three, it is more common for a system to use one
method for all files within a file-system type.

CONTIGUOUS ALLOCATION

¡ An allocation method refers to how disk blocks are allocated for files:
¡ Contiguous allocation – each file occupies set of contiguous blocks
¡ Best performance in most cases
¡ Simple – only starting location (block #) and length (number of blocks) are required
¡ Problems include finding space for file, knowing file size, external fragmentation, need for
compaction off-line (downtime) or on-line

CONTIGUOUS ALLOCATION (CONT)

¡ Mapping from logical to physical
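With contiguous allocation the mapping reduces to a division: the quotient of the logical address by the block size selects the block within the file, and the remainder is the displacement inside that block. The sketch below assumes this usual quotient/remainder scheme; the class ContiguousFile and the sample numbers are hypothetical.

```java
// Hypothetical model of a contiguously allocated file.
class ContiguousFile {
    private final long startBlock;     // first disk block of the file
    private final long lengthInBlocks; // number of blocks the file occupies
    private final int blockSize;       // bytes per block

    ContiguousFile(long startBlock, long lengthInBlocks, int blockSize) {
        this.startBlock = startBlock;
        this.lengthInBlocks = lengthInBlocks;
        this.blockSize = blockSize;
    }

    // Disk block holding the given logical byte address: start + quotient.
    long physicalBlock(long logicalAddress) {
        long q = logicalAddress / blockSize; // block index within the file
        if (logicalAddress < 0 || q >= lengthInBlocks) {
            throw new IndexOutOfBoundsException("address outside file: " + logicalAddress);
        }
        return startBlock + q;
    }

    // Displacement of the byte inside its block: the remainder.
    int offsetInBlock(long logicalAddress) {
        return (int) (logicalAddress % blockSize);
    }

    public static void main(String[] args) {
        ContiguousFile f = new ContiguousFile(14, 3, 512);
        System.out.println(f.physicalBlock(1030) + " " + f.offsetInBlock(1030));
    }
}
```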

CONTIGUOUS ALLOCATION (CONT)

¡ Problems:
¡ finding space for a file (how much space will the file need?)
¡ external fragmentation

EXTENT-BASED SYSTEMS

¡ Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation
scheme
¡ Extent-based file systems allocate disk blocks in extents
¡ An extent is a contiguous set of disk blocks
¡ Extents are allocated for file allocation
¡ A file consists of one or more extents

LINKED ALLOCATION

¡ Linked allocation – each file is a linked list of blocks
¡ File ends at nil pointer
¡ Each block contains pointer to next block
¡ No external fragmentation, so no compaction is needed
¡ Free space management system called when new block needed

LINKED ALLOCATION

¡ With linked allocation, each file is a linked list of disk blocks; the disk blocks may be
scattered anywhere on the disk
¡ The directory contains a pointer to the first and last blocks of the file.
¡ For example, a file of five blocks might start at block 9 and continue at block 16, then
block 1, then block 10, and finally block 25.
¡ Each block contains a pointer to the next block.
¡ If each block is 512 bytes in size, and a disk address (the pointer) requires 4 bytes,
then the user sees blocks of 508 bytes.

LINKED ALLOCATION

LINKED ALLOCATION

¡ Create a new file -> create a new entry in the directory.


¡ Each directory entry has a pointer to the first disk block of the file.
¡ This pointer is initialized to null (the end-of-list pointer value) to signify an empty file.
¡ The size field is also set to 0.
¡ A write to the file causes the free-space management system to find a free block, and
this new block is written to and is linked to the end of the file.
¡ To read a file, we simply read blocks by following the pointers from block to block.
¡ The size of a file need not be declared when the file is created.
¡ A file can continue to grow as long as free blocks are available.
LINKED ALLOCATION

¡ Linked allocation has disadvantages:


¡ It can be used effectively only for sequential-access files. To find the ith block of a file, we must start at
the beginning of that file and follow the pointers until we get to the ith block.
¡ Space is required for the pointers. If a pointer requires 4 bytes out of a 512-byte block, then 0.78%
of the disk is being used for pointers, rather than for information.
¡ The usual solution to this problem is to collect blocks into multiples, called clusters,
and to allocate clusters rather than blocks
¡ The cost of this approach is an increase in internal fragmentation

LINKED ALLOCATION

¡ Another problem of linked allocation is reliability.


¡ Recall that the files are linked together by pointers scattered all over the disk, and
consider what would happen if a pointer were lost or damaged.
¡ A bug in the operating-system software or a disk hardware failure might result in
picking up the wrong pointer.
¡ This error could in turn result in linking into the free-space list or into another file.
¡ One partial solution is to use doubly linked lists, and another is to store the file name
and relative block number in each block.
¡ However, these schemes require even more overhead for each file
ALLOCATION METHODS – LINKED (CONT.)

¡ An important variation on linked allocation is the use of a file-allocation table (FAT)
¡ Used by the MS-DOS operating system
¡ Benefit: random-access time is improved, because the location of any block can be found by reading the FAT
¡ Problem: can result in a significant number of disk head seeks, unless the FAT is cached
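The idea can be sketched with an integer array in which entry b holds the number of the block that follows block b in its file. The chain used here (9, 16, 1, 10, 25) is the five-block example from the linked-allocation slide; the class FatDemo, the end-of-file marker -1, and the table size of 32 are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory file-allocation table.
class FatDemo {
    static final int END_OF_FILE = -1; // assumed marker for the last block of a file

    // Follows the chain from firstBlock and returns the file's blocks in order.
    static List<Integer> blocksOf(int[] fat, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        for (int b = firstBlock; b != END_OF_FILE; b = fat[b]) {
            blocks.add(b);
        }
        return blocks;
    }

    // Finds the disk block holding the i-th block (0-based) of the file.
    // Only the table is consulted; no data block has to be read on the way.
    static int blockAt(int[] fat, int firstBlock, int i) {
        int b = firstBlock;
        for (int k = 0; k < i; k++) {
            b = fat[b];
            if (b == END_OF_FILE) {
                throw new IndexOutOfBoundsException("file has no block " + i);
            }
        }
        return b;
    }

    // Builds a small table containing the single example file.
    static int[] exampleFat() {
        int[] fat = new int[32];
        Arrays.fill(fat, END_OF_FILE); // simplification: a real FAT marks unused blocks as free
        fat[9] = 16;
        fat[16] = 1;
        fat[1] = 10;
        fat[10] = 25;                  // fat[25] stays END_OF_FILE
        return fat;
    }

    public static void main(String[] args) {
        System.out.println(blocksOf(exampleFat(), 9));
    }
}
```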

INDEXED ALLOCATION

¡ Each file has its own index block(s) of pointers to its data blocks
¡ Logical view:

index table
INDEXED ALLOCATION

¡ The directory contains the address of the index block.


¡ To find and read the ith block, we use the pointer in the ith index-block entry.
¡ This scheme is similar to the paging scheme
¡ When the file is created, all pointers in the index block are set to null.
¡ When the ith block is first written, a block is obtained from the free-space manager, and its address is
put in the ith index-block entry.
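A small sketch of this behavior: the index block starts out with every pointer null, a lookup of block i is a single array access, and a data block is obtained only when block i is first written. The class IndexedFile is hypothetical, and the free-space manager is reduced to a counter that hands out consecutive block numbers.

```java
import java.util.Arrays;

// Hypothetical model of one file under indexed allocation.
class IndexedFile {
    static final int NULL = -1;      // assumed encoding of a null pointer

    private final int[] indexBlock;  // entry i points to the file's i-th data block
    private int nextFreeBlock;       // stand-in for the free-space manager

    IndexedFile(int indexEntries, int firstFreeBlock) {
        indexBlock = new int[indexEntries];
        Arrays.fill(indexBlock, NULL); // file creation: all pointers are null
        nextFreeBlock = firstFreeBlock;
    }

    // Direct access: the i-th index entry gives the disk block (or NULL).
    int blockFor(int i) {
        return indexBlock[i];
    }

    // Writing block i allocates a data block the first time only.
    int write(int i) {
        if (indexBlock[i] == NULL) {
            indexBlock[i] = nextFreeBlock++;
        }
        return indexBlock[i];
    }

    public static void main(String[] args) {
        IndexedFile f = new IndexedFile(8, 100);
        System.out.println(f.write(3) + " " + f.write(0) + " " + f.blockFor(3));
    }
}
```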

EXAMPLE OF INDEXED ALLOCATION

INDEXED ALLOCATION

¡ The pointer overhead of the index block is generally greater than the pointer
overhead of linked allocation.
¡ Consider a common case in which we have a file of only one or two blocks:
¡ With linked allocation, we lose the space of only one pointer per block
¡ With indexed allocation, an entire index block must be allocated, even if only one or two pointers
will be non-null.

INDEXED ALLOCATION

¡ This point raises the question of how large the index block should be.
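¡ Every file must have an index block, so we want the index block to be as small as
possible.
¡ If the index block is too small, however, it cannot hold enough pointers for a large file.
¡ Mechanisms for dealing with this issue include the following: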
¡ Linked scheme: Link together several index blocks
¡ Multilevel index: Like multi-level paging
¡ Combined scheme: Another alternative, used in UNIX-based file systems

INDEXED ALLOCATION – LINKED SCHEME (CONT.)

¡ An index block might contain a small header followed by a set of the first disk-block
addresses of the file.
¡ The last address is null (for a small file) or a pointer to another index block (for a large file).

INDEXED ALLOCATION – MAPPING (CONT.)

¡ With 4,096-byte blocks, we could store 1,024 four-byte pointers in an index block.
Two levels of indexes allow 1,048,576 data blocks and a file size of up to 4 GB.
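The figures above follow from simple arithmetic, which the sketch below reproduces for any block size, pointer size, and number of index levels. The class IndexCapacity and its method names are invented for this example.

```java
// Hypothetical helper: capacity of a multilevel index.
class IndexCapacity {
    // Number of pointers that fit in one index block.
    static long pointersPerBlock(long blockSize, long pointerSize) {
        return blockSize / pointerSize;
    }

    // Number of data blocks reachable through the given number of index levels.
    static long maxDataBlocks(long blockSize, long pointerSize, int levels) {
        long blocks = 1;
        for (int i = 0; i < levels; i++) {
            blocks *= pointersPerBlock(blockSize, pointerSize);
        }
        return blocks;
    }

    // Largest file size in bytes for the given number of index levels.
    static long maxFileSize(long blockSize, long pointerSize, int levels) {
        return maxDataBlocks(blockSize, pointerSize, levels) * blockSize;
    }

    public static void main(String[] args) {
        System.out.println(pointersPerBlock(4096, 4));  // pointers per index block
        System.out.println(maxDataBlocks(4096, 4, 2));  // data blocks with two levels
        System.out.println(maxFileSize(4096, 4, 2));    // bytes with two levels
    }
}
```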

COMBINED SCHEME: UNIX UFS

PERFORMANCE

¡ Best method depends on file access type


¡ Contiguous great for sequential and random

¡ Linked good for sequential, not random


¡ Declare access type at creation -> select either contiguous or linked
¡ Indexed more complex
¡ Single block access could require 2 index block reads then data block read
¡ Clustering can help improve throughput, reduce CPU overhead

DIRECTORY IMPLEMENTATION

¡ Linear list of file names with pointer to the data blocks


¡ Simple to program
¡ Time-consuming to execute
¡ Linear search time
¡ Could keep ordered alphabetically via linked list or use B+ tree

¡ Hash Table – linear list with hash data structure


¡ Decreases directory search time
¡ Collisions – situations where two file names hash to the same location
¡ Only good if entries are fixed size, or use chained-overflow method

FREE-SPACE MANAGEMENT

¡ File system maintains free-space list to track available blocks/clusters


¡ (Using term “block” for simplicity)
¡ Bit vector or bit map (n blocks, numbered 0, 1, 2, ..., n-1)
bit[i] = 1 ⇒ block[i] free
bit[i] = 0 ⇒ block[i] occupied
¡ For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26,
and 27 are free and the rest of the blocks are allocated. The free-space bit map would
be:
001111001111110001100000011100000 ...
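The example bit map can be rebuilt with java.util.BitSet, which also gives a cheap way to find the first free block. The class FreeSpaceBitmap is a hypothetical wrapper written for this sketch; the length of 33 blocks simply matches the number of bits printed above.

```java
import java.util.BitSet;

// Hypothetical free-space manager backed by a bit vector (1 = free, 0 = occupied).
class FreeSpaceBitmap {
    private final BitSet free;
    private final int blockCount;

    FreeSpaceBitmap(int blockCount, int... freeBlocks) {
        this.blockCount = blockCount;
        this.free = new BitSet(blockCount);
        for (int b : freeBlocks) {
            free.set(b);
        }
    }

    // Allocates the lowest-numbered free block, or returns -1 if none is left.
    int allocate() {
        int b = free.nextSetBit(0);
        if (b < 0 || b >= blockCount) {
            return -1;
        }
        free.clear(b);
        return b;
    }

    // Marks a block as free again.
    void release(int block) {
        free.set(block);
    }

    // Renders the bit map as a string of 0s and 1s, block 0 first.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(blockCount);
        for (int i = 0; i < blockCount; i++) {
            sb.append(free.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FreeSpaceBitmap map = new FreeSpaceBitmap(33,
                2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27);
        System.out.println(map);
    }
}
```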
FREE-SPACE MANAGEMENT (CONT.)

¡ Bit map requires extra space (in main memory)


¡ A 1.3-GB disk with 512-byte blocks would need a bit map of over 332 KB to track its free blocks,
although clustering the blocks in groups of four reduces this number to around 83 KB per disk.
¡ A 1-TB disk with 4-KB blocks requires 32 MB to store its bit map (2^40 / 2^12 = 2^28 blocks, so 2^28 bits = 2^25 bytes).

¡ Given that disk size constantly increases, the problem with bit vectors will continue to
escalate as well.

LINKED FREE SPACE LIST ON DISK

¡ Linked list (free list)


¡ Keep a pointer to the first free block in a special location on the disk and cache it in memory.
¡ Update the list pointers when a block is released

FREE-SPACE MANAGEMENT (CONT.)

¡ Grouping
¡ Modify linked list to store address of next n-1 free blocks in first free block, plus a pointer to next
block that contains free-block-pointers (like this one)
¡ Counting
¡ Because space is frequently contiguously used and freed, with contiguous allocation,
extents, or clustering
¡ Keep address of first free block and count of following free blocks
¡ Free space list then has entries containing addresses and counts

PERFORMANCE

¡ Some systems maintain a separate section of main memory for a buffer cache, where blocks are kept under the
assumption that they will be used again shortly.
¡ Other systems cache file data using a page cache, which uses virtual memory techniques to cache file data as pages
rather than as file-system-oriented blocks
RECOVERY

¡ Consistency checking – compares data in directory structure with data blocks on disk, and
tries to fix inconsistencies
¡ Can be slow and sometimes fails

¡ Use system programs to back up data from disk to another storage device (magnetic tape,
other magnetic disk, optical)
¡ Recover lost file or disk by restoring data from backup

RECOVERY (CONT)

¡ A typical backup schedule may then be as follows:


¡ Day 1. Copy to a backup medium all files from the disk. This is called a full backup.
¡ Day 2. Copy to another medium all files changed since day 1. This is an incremental backup.
¡ Day 3. Copy to another medium all files changed since day 2.
...
¡ Day N. Copy to another medium all files changed since day N−1. Then go back to day 1.

¡ Using this method, we can restore an entire disk by starting restores with the full
backup and continuing through each of the incremental backups
