LECTURES 9 & 10: FILE SYSTEMS & FILE SYSTEM IMPLEMENTATION
OVERVIEW
File Systems:
Access Methods
Disk and Directory Structure
File-System Mounting
File Sharing
Protection
Implementation:
File-System Structure
File-System Implementation
Directory Implementation
Allocation Methods
Free-Space Management
Efficiency and Performance
OBJECTIVES
To explain the function of file systems
To describe the interfaces to file systems
To discuss file-system design tradeoffs, including access
methods, file sharing, file locking, and directory structures
To explore file-system protection
To describe the details of implementing local file systems
and directory structures
To describe the implementation of remote file systems
To discuss block allocation and free-block algorithms and
trade-offs
FILE CONCEPT
Contiguous logical address space
Types:
Data
numeric
character
binary
Program
Contents defined by file’s creator (user)
Many types
Consider text file, source file, executable file
FILE ATTRIBUTES
Name – only information kept in human-readable form
Identifier – unique tag (number) that identifies the file within the file system
Type – needed for systems that support different types
Location – pointer to the file's location on the device
Size – current file size
Protection – controls who can do reading, writing, executing
Time, date, and user identification – data for protection, security, and usage monitoring
Information about files is kept in the directory structure, which is maintained on the disk
Many variations, including extended file attributes such as a file checksum
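As a minimal sketch, assuming a POSIX system and Python's standard os and stat modules, most of the attributes above can be read back for a real file. The helper name describe_file is hypothetical; the on-disk location (block addresses) is not exposed by os.stat, so it is left out.

```python
import os
import stat
import time

def describe_file(path):
    """Collect the attributes listed on the slide for one file (POSIX assumed)."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "name": os.path.basename(path),           # human-readable name
        "identifier": st.st_ino,                  # inode number, unique within the file system
        "type": kind,                             # derived from the mode bits
        "size": st.st_size,                       # current size in bytes
        "protection": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "owner_uid": st.st_uid,                   # user identification
        "modified": time.ctime(st.st_mtime),      # time and date of last modification
    }
```

Calling describe_file on any existing path returns a dictionary keyed by the attribute names used on this slide.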
FILE STRUCTURE
None - sequence of words, bytes
Simple record structure
Lines
Fixed length
Variable length
Complex Structures
Formatted document
Relocatable load file
Can simulate the last two with the first method by inserting appropriate control characters
Who decides the structure:
Operating system
Program
FILE OPERATIONS
File is an abstract data type
Create
Write – at write pointer location
Read – at read pointer location
Reposition within file - seek
Delete – remove a file
Truncate – erase the contents of a file (or cut it to a given length) while keeping its attributes
Open(Fi) – search the directory structure on disk for
entry Fi, and move the content of entry to memory
Close (Fi) – move the content of entry Fi in memory
to directory structure on disk
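A minimal sketch, assuming a POSIX system, of the operations above using Python's os module, whose functions are thin wrappers over the open, write, lseek, read, ftruncate, close and unlink system calls. The function name demo_file_operations is hypothetical.

```python
import os

def demo_file_operations(path):
    """Create, write, seek, read, truncate, close and delete one file."""
    # Create + open: O_CREAT adds a directory entry; the returned descriptor
    # refers to an entry in the process's open-file table.
    fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"operating systems")   # write at the current file pointer
        os.lseek(fd, 0, os.SEEK_SET)         # reposition (seek) to the start
        first = os.read(fd, 9)               # read at the file pointer
        os.ftruncate(fd, 9)                  # truncate: keep only the first 9 bytes
        size_after_truncate = os.fstat(fd).st_size
    finally:
        os.close(fd)                         # close: release the open-file entry
    os.unlink(path)                          # delete: remove the directory entry
    return first, size_after_truncate, os.path.exists(path)
```

For a fresh path this returns (b"operating", 9, False): the read sees the first nine bytes, truncation leaves nine bytes, and the file no longer exists after the delete.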
OPEN FILES
Several pieces of data are needed to manage open files:
Open-file table – tracks open files
File pointer: pointer to last read/write location, per process that has the file open
File-open count – counts the number of times a file is open, to allow removal of data from the open-file table when the last process closes it
Disk location of the file: cache of data access information
Access rights: per-process access mode information
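The bookkeeping above can be modelled with two tables: a system-wide table holding one entry per open file (disk location, open count) and a per-process table holding the file pointer and access mode. This is a simplified sketch; the class names and the dictionary standing in for the on-disk directory are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SystemEntry:
    """System-wide open-file table entry (one per open file)."""
    name: str
    disk_location: int          # cached location of the file on disk
    open_count: int = 0         # how many opens are currently outstanding

@dataclass
class ProcessEntry:
    """Per-process open-file table entry."""
    sys_entry: SystemEntry
    mode: str                   # per-process access rights, e.g. "r" or "rw"
    file_pointer: int = 0       # per-process last read/write position

class OpenFileTables:
    def __init__(self, directory):
        self.directory = directory   # name -> disk location (stand-in for the on-disk directory)
        self.system = {}             # name -> SystemEntry
        self.per_process = {}        # pid -> list of ProcessEntry (index = descriptor)

    def open(self, pid, name, mode="r"):
        entry = self.system.get(name)
        if entry is None:            # first open: search the directory, cache the entry
            entry = SystemEntry(name, self.directory[name])
            self.system[name] = entry
        entry.open_count += 1
        table = self.per_process.setdefault(pid, [])
        table.append(ProcessEntry(entry, mode))
        return len(table) - 1        # the "file descriptor"

    def close(self, pid, fd):
        entry = self.per_process[pid][fd].sys_entry
        self.per_process[pid][fd] = None
        entry.open_count -= 1
        if entry.open_count == 0:    # last close: drop the system-wide entry
            del self.system[entry.name]
```

Two processes opening the same file share one system-wide entry but keep independent file pointers; the shared entry disappears only when the open count reaches zero.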
OPEN FILE LOCKING
Provided by some operating systems and file systems
Similar to reader-writer locks
Shared lock like reader lock – several processes can acquire concurrently
Exclusive lock like writer lock – only one process can hold it at a time
Mediates access to a file
Mandatory or advisory:
Mandatory – access is denied depending on locks held and requested
Advisory – processes can find status of locks and decide what to do
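On Linux and other Unix-like systems, flock() provides advisory shared and exclusive locks. A minimal sketch using Python's fcntl module (Unix only); try_lock and unlock are hypothetical helper names. Because the locks are advisory, a process that never calls flock() can still read and write the file.

```python
import fcntl

def try_lock(fd, exclusive):
    """Try to take an advisory flock() lock without blocking.

    Returns True if the lock was acquired, False if a conflicting lock is held."""
    kind = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, kind | fcntl.LOCK_NB)   # LOCK_NB: fail instead of waiting
        return True
    except BlockingIOError:
        return False

def unlock(fd):
    fcntl.flock(fd, fcntl.LOCK_UN)
```

Several descriptors can hold a shared (reader) lock at once, while a request for an exclusive (writer) lock fails until every shared lock is released.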
FILE TYPES – NAME, EXTENSION
SEQUENTIAL-ACCESS FILE
ACCESS METHODS
SIMULATION OF SEQUENTIAL ACCESS ON DIRECT-ACCESS FILE
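Sequential access can be simulated on a direct-access file by keeping a current-position counter cp: reset sets cp = 0, "read next" reads block cp and then increments cp, and "write next" does the same for writing. A minimal sketch; the class names are hypothetical and the file is modelled as a list of fixed-size blocks.

```python
class DirectAccessFile:
    """A file viewed as numbered blocks that can be accessed in any order."""
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks

    def read(self, n):            # read block n
        return self.blocks[n]

    def write(self, n, data):     # write block n
        self.blocks[n] = data

class SequentialView:
    """Sequential access built on top of direct access using a counter cp."""
    def __init__(self, f):
        self.f = f
        self.cp = 0

    def reset(self):
        self.cp = 0                      # reset: cp = 0

    def read_next(self):
        data = self.f.read(self.cp)      # read next: read cp; cp = cp + 1
        self.cp += 1
        return data

    def write_next(self, data):
        self.f.write(self.cp, data)      # write next: write cp; cp = cp + 1
        self.cp += 1
```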
OTHER ACCESS METHODS
Can be built on top of base methods
Generally involve creation of an index for the file
Keep index in memory for fast determination of location of data to be operated on (consider a UPC code plus a record of data about that item)
If the index is too large, keep an index (in memory) of the index (on disk)
Example: IBM Indexed Sequential-Access Method (ISAM)
Small master index, points to disk blocks of secondary index
File kept sorted on a defined key
All done by the OS
VMS operating system provides index and relative files as another example (see next slide)
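The index idea can be sketched with an in-memory list holding the lowest key of each sorted block: a binary search over that list selects the one block that could contain the key, so a lookup costs a single block read. This is a simplified illustration under those assumptions, not IBM's actual ISAM format; build_index and lookup are hypothetical names.

```python
import bisect

def build_index(blocks):
    """In-memory index: the first (lowest) key stored in each sorted block."""
    return [block[0][0] for block in blocks]

def lookup(index, blocks, key):
    """Binary-search the index, then scan the single block it selects."""
    i = bisect.bisect_right(index, key) - 1   # last block whose first key <= key
    if i < 0:
        return None
    for k, record in blocks[i]:               # the one "disk read"
        if k == key:
            return record
    return None
```

Here each block is a list of (key, record) pairs kept sorted by key, matching the slide's requirement that the file be kept sorted on a defined key.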
EXAMPLE OF INDEX AND RELATIVE FILES
Logical record number = memory location
DIRECTORY STRUCTURE
A collection of nodes containing information about all files
(Figure: a directory whose entries point to files F1, F2, F3, F4, …, Fn)
Both the directory structure and the files reside on disk
DISK STRUCTURE
Disk can be subdivided into partitions
Disks or partitions can be RAID-protected against failure (any RAID level other than RAID 0, which has no redundancy)
Disk or partition can be used raw – without a file system, or formatted
with a file system
Partitions also known as minidisks, slices
Entity containing file system known as a volume
Each volume containing file system also tracks that file system’s info in
device directory or volume table of contents
As well as general-purpose file systems there are many special-
purpose file systems, frequently all within the same operating
system or computer
A TYPICAL FILE-SYSTEM ORGANIZATION
OPERATIONS PERFORMED ON DIRECTORY
ORGANIZATION OF THE DIRECTORY
Efficiency – locating a file quickly
Naming – convenient to users
Two users can have the same name for different files
The same file can have several different names
Grouping – logical grouping of files by properties (e.g., all Java programs, all games, …)
Single-level directory (one directory for all users)
Naming problem and grouping problem
Two-level directory (a separate directory for each user)
Different users can have the same file name
Efficient searching
No grouping capability
Tree-Structured Directories
Advantages
•Commonly used
•Efficient searching
•Grouping capability
Each user has a current directory (working directory)
cd /spell/mail/prog
type list   (list is found relative to the current directory)
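Path resolution in a tree-structured directory can be sketched as follows: an absolute path starts at the root, a relative path starts at the current directory, and ".." moves up one level. The tree is modelled with nested dictionaries (directories) and other values (files); resolve is a hypothetical name, and ".." is handled purely by name, without symbolic links.

```python
def resolve(root, cwd, path):
    """Resolve an absolute or relative path in a tree of nested dicts.

    Directories are dicts; files are any other value. cwd is a list of
    directory names from the root, e.g. ["spell", "mail", "prog"]."""
    parts = [] if path.startswith("/") else list(cwd)
    for name in path.split("/"):
        if name in ("", "."):
            continue
        if name == "..":
            if parts:
                parts.pop()          # move up one level (stays at root if already there)
        else:
            parts.append(name)
    node = root
    for name in parts:
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError("/" + "/".join(parts))
        node = node[name]
    return node
```

After cd /spell/mail/prog, the relative name list resolves to the same node as /spell/mail/prog/list.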
TREE-STRUCTURED DIRECTORIES
FILE SYSTEM MOUNTING
(a) A file system must be mounted before it can be accessed
(b) An unmounted file system is mounted at a mount point
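A simplified model of what mounting achieves: a mount table maps mount points to volumes, and a path is served by the volume whose mount point is the longest matching prefix. This is an assumption made for illustration; real kernels typically detect mount points while walking the path one component at a time rather than by string matching. find_mount and the volume names are hypothetical.

```python
def find_mount(mount_table, path):
    """Return (volume, path inside that volume) for the longest matching mount point.

    mount_table maps mount-point paths (e.g. "/users") to volume names."""
    best = None
    for point in mount_table:
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            if best is None or len(point) > len(best):
                best = point
    if best is None:
        raise FileNotFoundError(path)
    rest = path[len(best):]
    return mount_table[best], "/" + rest.lstrip("/")
```

With "/" and "/users" both mounted, /users/jane/notes is served by the volume mounted at /users, while /usersx/a still belongs to the root volume.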
FILE SHARING
Sharing of files on multi-user systems is desirable
Sharing may be done through a protection scheme
On distributed systems, files may be shared across a network
Network File System (NFS) is a common distributed file-sharing method
Remote file systems add new failure modes, such as network failure and server failure
Recovery from failure can involve state information about the status of each remote request
Stateless protocols such as NFS v3 include all information in each request, allowing easy recovery but less security
On a multi-user system:
User IDs identify users, allowing permissions and protections to be per-user
Group IDs allow users to be in groups, permitting group access rights
Owner of a file / directory
Group of a file / directory
FILE SHARING – REMOTE FILE SYSTEMS
Uses networking to allow file system access between systems
Manually via programs like FTP
Automatically, seamlessly using distributed file systems (DFS)
Semi-automatically via the World Wide Web (WWW)
Client-server model allows clients to mount remote file systems from
servers
Server can serve multiple clients
Client and user-on-client identification is insecure or complicated
NFS is standard UNIX client-server file sharing protocol
CIFS is standard Windows protocol
Standard operating system file calls are translated into remote calls
Distributed Information Systems (distributed naming services) such as
LDAP, DNS, NIS implement unified access to information needed for
remote computing.
PROTECTION
File owner/creator should be able to control:
What can be done
By whom
Types of access
Read
Write
Execute
Append
Delete
List
ACCESS LISTS AND GROUPS
Mode of access: read, write, execute
Three classes of users on Unix / Linux
                    RWX
a) owner access  7  111
b) group access  6  110
c) public access 1  001
Ask the manager (system administrator) to create a group (unique name), say G, and add some users to the group.
For a particular file (say game) or subdirectory, define an appropriate access:
chmod 761 game   (owner = 7, group = 6, public = 1)
Change the group ownership of a file or directory:
chgrp G game
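The octal digits map directly onto the RWX bits, and a Unix-style check uses exactly one class: the owner bits if the user owns the file, otherwise the group bits if the user is in the file's group, otherwise the public bits. A minimal sketch with hypothetical helper names; it ignores the superuser override and access-control-list extensions.

```python
def octal_to_rwx(mode):
    """Render a 3-digit permission mode such as 0o761 as 'rwxrw---x'."""
    out = []
    for shift in (6, 3, 0):                    # owner, group, public
        bits = (mode >> shift) & 0b111
        out.append(("r" if bits & 4 else "-") +
                   ("w" if bits & 2 else "-") +
                   ("x" if bits & 1 else "-"))
    return "".join(out)

def allowed(mode, file_uid, file_gid, uid, gids, want):
    """Pick ONE class (owner, else group, else public) and test its bits.

    want is a bit mask: 4 = read, 2 = write, 1 = execute."""
    if uid == file_uid:
        bits = (mode >> 6) & 0b111
    elif file_gid in gids:
        bits = (mode >> 3) & 0b111
    else:
        bits = mode & 0b111
    return (bits & want) == want
```

For chmod 761 game, the owner gets rwx, members of group G get rw, and everyone else may only execute.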
A SAMPLE UNIX DIRECTORY LISTING
DIRECTORY IMPLEMENTATION
Linear list of file names with pointers to the data blocks
Simple to program
Time-consuming to execute
Linear search time
Could keep ordered alphabetically via a linked list or use a B+ tree
Hash table – linear list with a hash data structure
Decreases directory search time
Collisions – situations where two file names hash to the same location
Only good if entries are fixed size, or use the chained-overflow method
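A hash-table directory with chained overflow can be sketched as an array of buckets, where each bucket is a list (chain) of (name, first block) entries. Names that collide simply share a chain, so entries need not be fixed size. The class name and the simple string hash are hypothetical choices for illustration.

```python
class HashDirectory:
    """Directory as a hash table; collisions handled by chaining."""
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, name):
        # Simple deterministic string hash (Python's built-in hash() of str
        # is randomized per process, so a fixed function is used here).
        h = 0
        for ch in name.encode():
            h = (h * 31 + ch) % len(self.buckets)
        return self.buckets[h]

    def add(self, name, first_block):
        bucket = self._bucket(name)
        for entry_name, _ in bucket:
            if entry_name == name:
                raise FileExistsError(name)
        bucket.append((name, first_block))    # chained overflow: extend the chain

    def lookup(self, name):
        for entry_name, first_block in self._bucket(name):
            if entry_name == name:
                return first_block
        return None

    def remove(self, name):
        bucket = self._bucket(name)
        bucket[:] = [e for e in bucket if e[0] != name]
```

A lookup scans only one chain instead of the whole directory, which is where the reduced search time comes from; with a good hash and enough buckets, chains stay short.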
ALLOCATION METHODS - CONTIGUOUS
An allocation method refers to how disk blocks are
allocated for files:
Contiguous allocation – each file occupies set of
contiguous blocks
Best performance in most cases
Simple – only starting location (block #) and length
(number of blocks) are required (based on frame
concept)
Problems include finding space for file, knowing file
size, external fragmentation, need for compaction
off-line (downtime) or on-line
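A minimal sketch of contiguous allocation, assuming a first-fit search over a list of free/occupied flags (first fit is one common strategy; best fit and worst fit are alternatives). Once allocated, logical block i of the file is simply at start + i. The function names are hypothetical.

```python
def allocate_contiguous(free, nblocks):
    """First fit: find nblocks consecutive free blocks.

    free is a list of booleans (True = free). Returns the start block, or
    None when no hole is large enough."""
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == nblocks:
                for j in range(run_start, run_start + nblocks):
                    free[j] = False           # mark the whole run as occupied
                return run_start
        else:
            run_len = 0                       # run broken by an occupied block
    return None

def physical_block(start, length, logical):
    """The directory entry holds (start, length); logical block i is at start + i."""
    if not 0 <= logical < length:
        raise IndexError(logical)
    return start + logical
```

The second test case shows external fragmentation: three blocks are free, yet a two-block file cannot be placed because no two free blocks are adjacent.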
CONTIGUOUS ALLOCATION OF DISK SPACE
ALLOCATION METHODS - LINKED
Linked allocation – each file a linked list of blocks
File ends at nil pointer
No external fragmentation
Each block contains pointer to next block
No compaction
Free space management system called when new block needed
Improve efficiency by clustering blocks into groups but increases internal
fragmentation
Reliability can be a problem (one damaged pointer cuts off the rest of the file)
Locating a block can take many I/Os and disk seeks
FAT (File Allocation Table) variation
Beginning of volume has table, indexed by block number
Much like a linked list, but faster on disk and cacheable
New block allocation simple
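With a FAT, the directory entry stores the file's first block, and FAT[b] gives the block that follows b, so following a file's chain reads only the table, not the data blocks. A minimal sketch; the EOF and FREE marker values and the reserved block 0 are assumptions of this sketch, not the real FAT on-disk encoding.

```python
EOF = -1    # hypothetical end-of-file marker
FREE = 0    # hypothetical "unused entry" marker (block 0 assumed reserved)

def file_blocks(fat, start):
    """Follow the FAT chain from a file's first block."""
    blocks = []
    b = start
    while b != EOF:
        blocks.append(b)
        b = fat[b]                # table lookup instead of reading the data block
    return blocks

def append_block(fat, start):
    """Allocate the first free FAT entry and link it to the end of the file."""
    new = next((i for i in range(1, len(fat)) if fat[i] == FREE), None)
    if new is None:
        raise OSError("no free blocks")
    last = file_blocks(fat, start)[-1]
    fat[last] = new
    fat[new] = EOF
    return new
```

A file starting at block 217 with FAT[217] = 618, FAT[618] = 339 and FAT[339] = EOF occupies blocks 217, 618 and 339; appending a block only rewrites two table entries, which is why new-block allocation is simple.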
LINKED ALLOCATION
Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk
FILE-ALLOCATION TABLE
ALLOCATION METHODS - INDEXED
Indexed allocation
Each file has its own index block(s) of pointers to its data blocks
(Figure: logical view of an index table)
EXAMPLE OF INDEXED ALLOCATION
COMBINED SCHEME: UNIX UFS (4K BYTES PER BLOCK, 32-BIT ADDRESSES)
Note: More index blocks than can be addressed with a 32-bit file pointer
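With 4 KB blocks and 32-bit (4-byte) block addresses, one index block holds 4096 / 4 = 1024 pointers. The sketch below assumes the classic UFS inode layout of 12 direct pointers plus one single-, one double- and one triple-indirect pointer, and maps a logical block number to the chain of index entries needed to reach it. The names are hypothetical.

```python
BLOCK_SIZE = 4096
PTR_SIZE = 4                        # 32-bit block addresses
PTRS = BLOCK_SIZE // PTR_SIZE       # 1024 pointers per index block
N_DIRECT = 12                       # assumed: classic UFS inode

def locate(logical):
    """Map a logical block number to the index entries that reach it.

    ("direct", i) | ("single", i) | ("double", i, j) | ("triple", i, j, k)"""
    if logical < N_DIRECT:
        return ("direct", logical)
    logical -= N_DIRECT
    if logical < PTRS:
        return ("single", logical)
    logical -= PTRS
    if logical < PTRS ** 2:
        return ("double", logical // PTRS, logical % PTRS)
    logical -= PTRS ** 2
    if logical < PTRS ** 3:
        return ("triple", logical // PTRS ** 2, (logical // PTRS) % PTRS, logical % PTRS)
    raise ValueError("beyond maximum file size")

# Largest file the inode can describe, in bytes.
MAX_BYTES = (N_DIRECT + PTRS + PTRS ** 2 + PTRS ** 3) * BLOCK_SIZE
```

The triple-indirect level alone covers $1024^3 \times 4096 = 2^{42}$ bytes (4 TB), far more than the $2^{32}$ bytes a 32-bit file pointer can address. Reaching a block costs 0, 1, 2 or 3 extra index-block reads depending on the level returned.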
PERFORMANCE
Best method depends on file access type
Contiguous great for sequential and random
Linked good for sequential, not random
Declare access type at creation -> select either contiguous or linked
Indexed more complex
Single block access could require 2 index block reads then data block read
Clustering can help improve throughput, reduce CPU overhead
Adding instructions to the execution path to save one disk I/O is reasonable
Intel Core i7 Extreme Edition 990X (2011) at 3.46 GHz = 159,000 MIPS
http://en.wikipedia.org/wiki/Instructions_per_second
Typical disk drive at 250 I/Os per second
159,000 MIPS / 250 = 636 million instructions during one disk I/O
Fast SSD drives provide 60,000 IOPS
159,000 MIPS / 60,000 = 2.65 million instructions during one disk I/O
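The arithmetic above as a tiny helper: a rate in MIPS is millions of instructions per second, so dividing by I/O operations per second gives the number of instructions the CPU could execute while one I/O completes. The function name is hypothetical.

```python
def instructions_per_io(mips, iops):
    """Instructions a CPU rated at `mips` can execute during one I/O at `iops`."""
    return mips * 1_000_000 / iops

hdd = instructions_per_io(159_000, 250)       # typical hard disk
ssd = instructions_per_io(159_000, 60_000)    # fast SSD
```

This gives 636 million instructions per hard-disk I/O and 2.65 million per SSD I/O, so spending extra CPU work to avoid even one disk I/O is usually worthwhile.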
FREE-SPACE MANAGEMENT
File system maintains free-space list to track available blocks/clusters
(Using term “block” for simplicity)
Bit vector or bit map (n blocks), bits numbered 0, 1, 2, …, n-1:
$$\mathrm{bit}[i] = \begin{cases} 1 & \text{block}[i] \text{ free} \\ 0 & \text{block}[i] \text{ occupied} \end{cases}$$
Block number calculation:
$$\text{block number} = (\text{number of bits per word}) \times (\text{number of 0-value words}) + (\text{offset of first 1 bit})$$
CPUs have instructions to return offset within word of first “1” bit
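A minimal sketch of the block-number calculation, assuming the bit map is stored as 32-bit words and that bit b of word w (counting from the least-significant bit) describes block w × 32 + b. Under that assumption, "offset of first 1 bit" is the count of trailing zeros, computed here with a standard bit trick in place of a dedicated CPU instruction.

```python
WORD_BITS = 32

def first_free_block(words):
    """Return the number of the first free block (bit = 1), or None if none is free."""
    zero_words = 0
    for w in words:
        if w == 0:                          # all 32 blocks occupied: skip the word at once
            zero_words += 1
            continue
        offset = (w & -w).bit_length() - 1  # position of the lowest set bit
        return WORD_BITS * zero_words + offset
    return None
```

For example, two all-zero words followed by a word whose lowest set bit is bit 3 give block 2 × 32 + 3 = 67.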
FREE-SPACE MANAGEMENT (CONT.)
Bit map requires extra space
Example:
block size = 4 KB = $2^{12}$ bytes
disk size = $2^{40}$ bytes (1 terabyte)
n = $2^{40} / 2^{12} = 2^{28}$ bits (or 32 MB)
if clusters of 4 blocks -> $2^{26}$ bits = 8 MB of memory
Easy to get contiguous files
Linked list (free list)
Cannot get contiguous space easily
No waste of space
No need to traverse the entire list (if # free blocks recorded)
FREE-SPACE MANAGEMENT (CONT.)
Grouping
Modify linked list to store address of next n-1 free blocks in first free
block, plus a pointer to next block that contains free-block-pointers
(like this one)
Counting
Exploits the fact that space is frequently allocated and freed contiguously (with contiguous allocation, extents, or clustering)
Keep the address of the first free block and the count of following free blocks
Free-space list then has entries containing addresses and counts
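A minimal sketch of the counting approach: free space is a sorted list of (first_block, count) pairs. Allocation takes blocks from the first extent that is large enough (first fit, an assumption of this sketch), and freeing merges the returned run with adjacent extents so the list stays short. The function names are hypothetical.

```python
def allocate_from_extents(extents, n):
    """Take n blocks from the first extent large enough; return the start block or None."""
    for i, (start, count) in enumerate(extents):
        if count >= n:
            if count == n:
                del extents[i]                  # extent used up completely
            else:
                extents[i] = (start + n, count - n)
            return start
    return None

def free_extent(extents, start, n):
    """Return n blocks starting at start, merging with neighbouring extents."""
    extents.append((start, n))
    extents.sort()
    merged = [extents[0]]
    for s, c in extents[1:]:
        ps, pc = merged[-1]
        if ps + pc == s:                        # adjacent runs: coalesce into one entry
            merged[-1] = (ps, pc + c)
        else:
            merged.append((s, c))
    extents[:] = merged
```

Each entry describes a whole run of free blocks, so a disk with large contiguous free areas needs only a few entries.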
LINKED FREE SPACE LIST ON DISK
EFFICIENCY AND PERFORMANCE
Efficiency dependent on:
Disk allocation and directory algorithms
Types of data kept in file’s directory entry
Pre-allocation or as-needed allocation of metadata
structures
Fixed-size or varying-size data structures
Performance
Keeping data and metadata close together
Buffer cache – separate section of main memory
for frequently used blocks
Synchronous writes sometimes requested by apps
or needed by OS
No buffering / caching – writes must hit disk before
acknowledgement
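A minimal sketch of a buffer cache with least-recently-used (LRU) replacement, an assumed policy since the slide does not name one. The "disk" is a dictionary from block number to data. Ordinary writes are delayed (the block is marked dirty and written back on eviction or flush), while write(..., sync=True) models a synchronous write that reaches the disk before returning. The class name is hypothetical.

```python
from collections import OrderedDict

class BufferCache:
    """LRU buffer cache over a 'disk' (a dict: block number -> bytes)."""
    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.cache = OrderedDict()       # block -> [data, dirty]
        self.disk_reads = 0

    def _evict_if_full(self):
        while len(self.cache) > self.capacity:
            block, (data, dirty) = self.cache.popitem(last=False)  # least recently used
            if dirty:
                self.disk[block] = data  # write back delayed data on eviction

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)        # hit: now most recently used
            return self.cache[block][0]
        self.disk_reads += 1                     # miss: go to disk
        data = self.disk[block]
        self.cache[block] = [data, False]
        self._evict_if_full()
        return data

    def write(self, block, data, sync=False):
        self.cache[block] = [data, not sync]
        self.cache.move_to_end(block)
        if sync:
            self.disk[block] = data              # must reach disk before returning
        self._evict_if_full()

    def flush(self):
        for block, entry in self.cache.items():
            if entry[1]:
                self.disk[block] = entry[0]
                entry[1] = False
```

Repeated reads of a cached block cost no disk I/O, and a delayed write becomes visible on disk only when its block is evicted or flushed.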
EFFICIENCY AND PERFORMANCE (CONT.)
Asynchronous writes more common, buffer-able, faster
Free-behind and read-ahead – techniques to optimize
sequential access
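Reads are frequently slower than writes: an asynchronous write completes once it is buffered, while a read that misses the cache must wait for the disk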
RECOVERY
Consistency checking – compares data in directory structure
with data blocks on disk, and tries to fix inconsistencies
Can be slow and sometimes fails
Use system programs to back up data from disk to another
storage device (magnetic tape, other magnetic disk, optical)
Recover lost file or disk by restoring data from backup
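A consistency checker compares what the directory structure says with the free-space structure. In this minimal sketch (hypothetical names), every block should be either in exactly one file or on the free list; the checker reports blocks that are in neither (lost), blocks claimed by two files, and blocks that are both free and in use.

```python
def check_consistency(n_blocks, files, free_list):
    """files maps file name -> list of block numbers; free_list is a set.

    Returns (missing, duplicated, both_free_and_used) as sets of block numbers."""
    use_count = [0] * n_blocks
    for blocks in files.values():
        for b in blocks:
            use_count[b] += 1
    missing = {b for b in range(n_blocks) if use_count[b] == 0 and b not in free_list}
    duplicated = {b for b in range(n_blocks) if use_count[b] > 1}
    both = {b for b in free_list if use_count[b] > 0}
    return missing, duplicated, both
```

A typical repair adds missing blocks back to the free list and removes in-use blocks from it; duplicated blocks are harder, which is one reason such checks can fail to fix everything.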
LOG STRUCTURED FILE SYSTEMS
Log structured (or journaling) file systems record each metadata update to the file system as a transaction
All transactions are written to a log (similar to a logbook)
A transaction is considered committed once it is written to the log (sequentially)
Sometimes to a separate device or section of disk
However, the file system may not yet be updated
The transactions in the log are asynchronously written to the file-system structures
When the file-system structures are modified, the transaction is removed from the log
If the file system crashes, all remaining transactions in the log must still be performed
Faster recovery from crash, removes chance of inconsistency of metadata
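The journaling cycle above (commit to the log, apply later, replay after a crash) can be modelled as below. Both the log and the metadata are assumed to be durable (on disk), and each transaction stores absolute new values, so replaying a transaction twice is harmless. The class name and the metadata keys are hypothetical.

```python
class JournaledFS:
    """Metadata updates go to a log first; the real structures are updated later."""
    def __init__(self):
        self.log = []          # committed transactions not yet applied
        self.metadata = {}     # the file-system structures

    def commit(self, txn_id, updates):
        # A transaction counts as committed once its record is in the log.
        self.log.append((txn_id, dict(updates)))

    def checkpoint(self, max_txns=None):
        """Apply logged transactions to the structures, then drop them from the log."""
        n = len(self.log) if max_txns is None else min(max_txns, len(self.log))
        for _, updates in self.log[:n]:
            self.metadata.update(updates)
        del self.log[:n]

    def recover(self):
        """After a crash: perform every transaction still in the log."""
        self.checkpoint()
```

If a crash happens after only some transactions were applied, recovery replays the rest from the log instead of scanning the entire file system.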
END OF CHAPTER 9 - 10