Chapter 4.
INTERNAL REPRESENTATION OF FILES
Database Lab.
Contents
Inodes and how the kernel manipulates them
Structure of a regular file and how to read and write it
Directories as a hierarchy of files
Conversion of user file names to inodes
Structure of the super block
Algorithms for assignment of disk inodes and disk blocks to files
Other file types
Lower Level File system Algorithms
namei
iget, iput, bmap | alloc, free | ialloc, ifree
buffer allocation algorithms
getblk, brelse, bread, breada, bwrite
Inodes
Exist in a static form on disk
The kernel reads them into an in-core inode to manipulate them
Disk Inodes (1/2)
Owner identifier
  Individual owner
  Group owner: the set of users who have access rights to a file
File type
  Regular, directory, character or block special
  FIFO (pipe)
Access permissions
  Protected for three classes (owner, group, other)
  Read, write, execute
Disk Inodes (2/2)
Access times
  Last modified, last accessed, and last time the inode was modified
Number of links to the file
Table of contents for the disk addresses of data in a file
  The kernel saves the data in discontiguous disk blocks
  The inode identifies the disk blocks
Size
  1 greater than the highest byte offset of data
  Highest byte offset 1000 -> the size of the file is 1001 bytes
Disk Inodes - Sample
owner       mjb
group       os
type        regular file
perms       rwxr-xr-x
accessed    Oct 23 1984 1:45 P.M.
modified    Oct 22 1984 10:30 A.M.
inode       Oct 23 1984 1:30 P.M.
size        6030 bytes
disk addresses
Writing the contents of an inode to disk is distinct from writing the contents of a file to disk: changing a file's contents also changes the inode, but changing the inode (owner, permissions, links) does not change the file's contents
in-core inode
Contents in addition to the fields of the disk inode
  Status of the in-core inode
    The inode is locked
    A process is waiting for the inode to become unlocked
    The in-core representation of the inode differs from the disk copy
    The in-core representation of the file differs from the disk copy
    The file is a mount point
  Logical device number of the file system
  Inode number: the inode's position in the linear array on disk
  Pointers to other in-core inodes
  Reference count: the number of instances of the file that are active
Accessing Inodes (1/4)
Kernel
Identifies particular inodes by their file system and inode number
Allocates an in-core copy of an inode
  Maps the device number and inode number into a hash queue
  Searches the queue for the inode
  If it does not find the inode, allocates one from the free list and locks it
  Reads the disk copy of the newly accessed inode into the in-core copy
Using algorithm iget
Accessing Inodes (2/4)
Computation of logical disk block
block num = ((inode number - 1) / number of inodes per block) + start block of inode list (integer division)
Example
  Block 2: beginning of the inode list
  8 inodes per block
  Inode number 8 is in disk block ((8 - 1) / 8) + 2 = 2
  Inode number 9 is in disk block ((9 - 1) / 8) + 2 = 3
Accessing Inodes (3/4)
Read the block using the algorithm bread
Computation of the byte offset of the inode in the block
  ((inode number - 1) modulo (number of inodes per block)) * size of disk inode
Example
  Each disk inode: 64 bytes
  1 block: 8 inodes
  Inode 8 starts at byte offset ((8 - 1) modulo 8) * 64 = 448 in the disk block
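The two computations (disk block of an inode, then byte offset inside that block) can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slides; the function name and parameter names are made up for illustration.

```python
def locate_inode(inode_number, inodes_per_block, inode_list_start, inode_size):
    """Return (disk block number, byte offset in that block) of a disk inode."""
    index = inode_number - 1                      # inode numbers start at 1
    block = index // inodes_per_block + inode_list_start
    offset = (index % inodes_per_block) * inode_size
    return block, offset


# Values from the slide examples: inode list starts at block 2,
# 8 inodes per block, 64 bytes per disk inode.
print(locate_inode(8, 8, 2, 64))   # (2, 448)
print(locate_inode(9, 8, 2, 64))   # (3, 0)
```

Inode 8 is the last inode of the first block of the inode list, and inode 9 is the first inode of the next block.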
Accessing Inodes (4/4)
The kernel holds the lock during execution of a system call, where needed for consistency
It releases the lock at the end of the system call
The lock is free between system calls
  To allow processes to share simultaneous access to a file
The reference count remains set between system calls
  To prevent the kernel from reallocating an active in-core inode
Algorithm iget
Input: file system inode number
Output: locked inode
{
    while (not done)
    {
        if (inode in inode cache)
        {
            if (inode locked)
            {
                sleep (event inode becomes unlocked);
                continue;    /* loop back to while */
            }
            /* special processing for mount points (Chapter 5) */
            if (inode on inode free list)
                remove from free list;
            increment inode reference count;
            return (inode);
        }
        /* inode not in inode cache */
        if (no inodes on free list)
            return (error);
        remove new inode from free list;
        reset inode number and file system;
        remove inode from old hash queue, place on new one;
        read inode from disk (algorithm bread);
        initialize inode (e.g. reference count to 1);
        return (inode);
    }
}
Release Inodes
Using algorithm iput
Decrements the in-core reference count
When the reference count is 0
  Writes the inode to disk if the in-core copy differs from the disk copy
  Adds the inode to the free list of inodes, for caching
Algorithm iput
/* release (put) access to in-core inode */
Input: pointer to in-core inode
Output: none
{
    lock inode if not already locked;
    decrement inode reference count;
    if (reference count == 0)
    {
        if (inode link count == 0)
        {
            free disk blocks for file (algorithm free, section 4.7);
            set file type to 0;
            free inode (algorithm ifree, section 4.6);
        }
        if (file accessed or inode changed or file changed)
            update disk inode;
        put inode on free list;
    }
    release inode lock;
}
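How iget and iput cooperate through the reference count and the free list may be easier to see in a small simulation. The sketch below is a simplified, single-threaded model, not kernel code: the class names, the dictionary standing in for the hash queues, and the dictionary standing in for the disk are all assumptions, sleeping on a locked inode is replaced by an exception, and the link-count handling of iput is left out.

```python
class Inode:
    """In-core inode: identity, reference count, lock and dirty flags."""
    def __init__(self):
        self.dev = self.number = self.data = None
        self.ref_count = 0
        self.locked = False
        self.dirty = False          # in-core copy differs from the disk copy


class InodeCache:
    def __init__(self, size, disk):
        self.disk = disk            # hypothetical disk: {(dev, ino): data}
        self.hash = {}              # stands in for the hash queues
        self.free_list = [Inode() for _ in range(size)]

    def iget(self, dev, ino):
        inode = self.hash.get((dev, ino))
        if inode is not None:                       # inode is in the cache
            if inode.locked:
                raise RuntimeError("a real kernel would sleep here")
            if inode in self.free_list:
                self.free_list.remove(inode)
            inode.ref_count += 1
        else:                                       # inode not in the cache
            if not self.free_list:
                return None                         # error: no free in-core inode
            inode = self.free_list.pop(0)
            self.hash.pop((inode.dev, inode.number), None)  # leave old hash queue
            inode.dev, inode.number = dev, ino
            self.hash[(dev, ino)] = inode
            inode.data = self.disk[(dev, ino)]      # stands in for bread
            inode.ref_count = 1
            inode.dirty = False
        inode.locked = True
        return inode

    def unlock(self, inode):
        """End of a system call: the lock is released, the count stays set."""
        inode.locked = False

    def iput(self, inode):
        inode.locked = True
        inode.ref_count -= 1
        if inode.ref_count == 0:
            if inode.dirty:                         # update the disk inode
                self.disk[(inode.dev, inode.number)] = inode.data
                inode.dirty = False
            self.free_list.append(inode)            # stays hashed, for caching
        inode.locked = False
```

An inode whose reference count drops to 0 goes on the free list but stays reachable through the hash table, so a later iget for the same inode number finds it without reading the disk; it is only reassigned when another inode needs a free slot.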
Structure of a Regular File
Table of contents in an inode
Location of a file's data on disk
A set of disk block numbers
  Each block on a disk is addressable by number
Why not a contiguous file allocation strategy?
  When a file expands or contracts, fragmentation occurs
Sample - Fragmentation
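Before:  ... | 40: File A | 50: File B | 60: File C | 70: ...
After:   ... | 40: File A | 50: Free   | 60: File C | 70: File B | 85: ...
File B was expanded and had to move, leaving a hole
Garbage collection to reclaim the hole costs too much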
Structure of a Regular File UNIX System V
13 entries in the inode table of contents
  10 direct, 1 indirect, 1 double indirect, 1 triple indirect block
Assume
  a logical block = 1K bytes
  a block number is addressable by a 32-bit (4-byte) integer
  a block can hold up to 256 block numbers
Byte Capacity of a File
  10 direct blocks with 1K bytes each = 10K bytes
  1 indirect block with 256 direct blocks = 1K * 256 = 256K bytes
  1 double indirect block with 256 indirect blocks = 256K * 256 = 64M bytes
  1 triple indirect block with 256 double indirect blocks = 64M * 256 = 16G bytes
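The capacity figures follow directly from the block size and the number of block numbers that fit in one block. A short Python check, using the assumptions stated on the slide (1K-byte blocks, 4-byte block numbers, 10 direct entries); the constant names are illustrative only.

```python
BLOCK_SIZE = 1024                       # bytes per logical block
PTRS_PER_BLOCK = BLOCK_SIZE // 4        # 4-byte block numbers -> 256 per block
NDIRECT = 10                            # direct entries in the inode

direct = NDIRECT * BLOCK_SIZE                    # 10K bytes
single = PTRS_PER_BLOCK * BLOCK_SIZE             # 256K bytes
double = PTRS_PER_BLOCK ** 2 * BLOCK_SIZE        # 64M bytes
triple = PTRS_PER_BLOCK ** 3 * BLOCK_SIZE        # 16G bytes

for name, size in [("direct", direct), ("single", single),
                   ("double", double), ("triple", triple)]:
    print(f"{name:7s}{size:>15,d} bytes")
print(f"{'total':7s}{direct + single + double + triple:>15,d} bytes")
```

Each extra level of indirection multiplies the reachable capacity by 256, which is why three levels are enough to address files of more than 16 gigabytes under these assumptions.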
Direct and Indirect Blocks in Inode UNIX System V
[Figure: the inode table of contents (direct0 through direct9, single indirect, double indirect, triple indirect) pointing to data blocks, either directly or through indirect blocks]
Block Layout of a Sample File and its Inode
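1 disk block = 1024 bytes
Inode table of contents: 4096, 228, 45423, 0, 0, 11111, 0, 101, 367, 0 (direct); 428 (single indirect); 9156 (double indirect); 824 (triple indirect)
Byte offset 9000
  Logical block 8 -> direct entry 8 -> disk block 367, 808th byte in the block
Byte offset 350,000
  Double indirect block 9156, entry 0 -> single indirect block 331, entry 75 -> data block 3333, 816th byte in the block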
Structure of a Regular File
Processes
  access data in a file by byte offset
  view a file as a stream of bytes
The kernel
  accesses the inode
  converts the logical file block into the appropriate disk block
Algorithm bmap
Input: (1) inode
       (2) byte offset
Output: (1) block number in file system
        (2) byte offset into block
        (3) bytes of I/O in block
        (4) read ahead block number
{
    calculate logical block number in file from byte offset;
    calculate start byte in block for I/O;
    calculate number of bytes to copy to user;
    check if read-ahead applicable, mark inode;
    determine level of indirection;
    while (not at necessary level of indirection)
    {
        calculate index into inode or indirect block from
            logical block number in file;
        get disk block number from inode or indirect block;
        release buffer from previous disk read, if any (algorithm brelse);
        if (no more levels of indirection)
            return (block number);
        read indirect disk block (algorithm bread);
        adjust logical block number in file according to level of indirection;
    }
}
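The core of bmap, turning a byte offset into a chain of indices through the inode and its indirect blocks, can be sketched in Python. This is a simplified model that ignores read-ahead, buffers and the I/O byte count; the function names and the dictionary standing in for disk reads are assumptions. The sample values are the ones from the block layout of the sample file (1024-byte blocks, 256 block numbers per block).

```python
BLOCK_SIZE = 1024
PTRS = 256          # block numbers per indirect block
NDIRECT = 10        # direct entries in the inode table of contents


def bmap_path(byte_offset):
    """Return (level of indirection, list of indices, byte offset in block).

    indices[0] is the slot in the inode table of contents; any further
    entries index into successive indirect blocks.
    """
    logical, offset = divmod(byte_offset, BLOCK_SIZE)
    if logical < NDIRECT:
        return 0, [logical], offset
    logical -= NDIRECT
    span = PTRS                          # blocks reachable at this level
    for level in (1, 2, 3):
        if logical < span:
            indices = [NDIRECT + level - 1]
            for _ in range(level):
                span //= PTRS
                indices.append(logical // span)
                logical %= span
            return level, indices, offset
        logical -= span
        span *= PTRS
    raise ValueError("offset beyond the maximum file size")


def bmap(toc, read_block, byte_offset):
    """Follow the index chain; read_block is a stand-in for bread."""
    _, indices, offset = bmap_path(byte_offset)
    block = toc[indices[0]]
    for index in indices[1:]:
        if block == 0:                   # no block assigned at this level
            break
        block = read_block(block)[index]
    return block, offset


# Table of contents and indirect block entries of the sample file.
toc = [4096, 228, 45423, 0, 0, 11111, 0, 101, 367, 0, 428, 9156, 824]
disk = {9156: {0: 331}, 331: {75: 3333}}
print(bmap(toc, disk.__getitem__, 9000))      # (367, 808)
print(bmap(toc, disk.__getitem__, 350000))    # (3333, 816)
```

A block number of 0 in the chain means the logical block has no data, so the walk stops there and returns 0.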
Directories (1/2)
A directory is a file
Its data is a sequence of entries
Contents of each entry
  an inode number and the name of a file
A path name is a null-terminated character string divided into components by slash (/)
UNIX System V
  Maximum length of a component name: 14 characters
  Inode number: 2 bytes
  Size of a directory entry: 16 bytes
Directories (2/2)
Directory layout for /etc
Byte Offset in Directory   Inode Number   File Name
0                          83             .
16                         2              ..
32                         1798           init
48                         1276           fsck
224                        0              crash
256                        188            inittab
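A 16-byte directory entry (2-byte inode number followed by a 14-character name) is easy to pack and parse with Python's struct module. The sketch below is illustrative only: the little-endian byte order and the helper names are assumptions, and the sample entries are packed back to back rather than at the byte offsets of the layout for /etc. In System V an inode number of 0 marks an empty entry, so lookups skip it.

```python
import struct

# 2-byte inode number + 14-byte name, padded with null bytes.
# Little-endian byte order is an assumption made for this sketch.
ENTRY = struct.Struct("<H14s")


def pack_entry(ino, name):
    return ENTRY.pack(ino, name.encode("ascii"))


def parse_directory(data):
    """Yield (byte offset, inode number, name) for every 16-byte entry."""
    for offset in range(0, len(data), ENTRY.size):
        ino, raw = ENTRY.unpack_from(data, offset)
        yield offset, ino, raw.rstrip(b"\0").decode("ascii")


def lookup(data, name):
    """Return the inode number for name, or None; skips empty entries."""
    for _, ino, entry_name in parse_directory(data):
        if ino != 0 and entry_name == name:
            return ino
    return None


etc = b"".join(pack_entry(ino, name) for ino, name in [
    (83, "."), (2, ".."), (1798, "init"),
    (1276, "fsck"), (0, "crash"), (188, "inittab"),
])
print(lookup(etc, "init"))     # 1798
print(lookup(etc, "crash"))    # None
```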
Conversion of a Path Name to an Inode
The initial access to a file is by its path name
  open, chdir, link system calls
The kernel works internally with inodes rather than with path names
  It converts path names to inodes
Using algorithm namei
  parse the path name one component at a time
  convert each component into an inode
  finally return the inode of the input path name
Algorithm namei
/* convert path-name to inode */
Input: path name
Output: locked inode
{
    if (path name starts from root)
        working inode = root inode (algorithm iget);
    else
        working inode = current directory inode (algorithm iget);
    while (there is more path name)
    {
        read next path name component from input;
        verify that working inode is of directory, access permissions OK;
        if (working inode is of root and component is "..")
            continue;    /* loop back to while */
        read directory (working inode) by repeated use of algorithms
            bmap, bread and brelse;
        if (component matches an entry in directory (working inode))
        {
            get inode number for matched component;
            release working inode (algorithm iput);
            working inode = inode of matched component (algorithm iget);
        }
        else    /* component not in directory */
            return (no inode);
    }
    return (working inode);
}
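The component-by-component walk of namei can be illustrated with an in-memory stand-in for the file system. In this sketch directories are plain dictionaries keyed by inode number, permission checks, locking and iget/iput are left out, and the inode number used for passwd is invented for the example; only the root (2) and /etc (83) numbers come from the directory layout shown earlier.

```python
ROOT_INO = 2


def namei(inodes, path, cwd=ROOT_INO):
    """Return the inode number that path resolves to, or None."""
    working = ROOT_INO if path.startswith("/") else cwd
    for component in path.split("/"):
        if not component:
            continue                    # empty piece from a leading or doubled slash
        node = inodes[working]
        if node["type"] != "dir":
            return None                 # working inode must be a directory
        if working == ROOT_INO and component == "..":
            continue                    # ".." in root stays in root
        ino = node["entries"].get(component)
        if ino is None:
            return None                 # component not in directory
        working = ino
    return working


# Hypothetical file system: inode 1000 for passwd is made up.
inodes = {
    2: {"type": "dir", "entries": {".": 2, "..": 2, "etc": 83}},
    83: {"type": "dir", "entries": {".": 83, "..": 2, "passwd": 1000}},
    1000: {"type": "file"},
}
print(namei(inodes, "/etc/passwd"))        # 1000
print(namei(inodes, "passwd", cwd=83))     # 1000
```

Only one working inode is held at a time: the walk replaces it with the inode of each matched component, which mirrors the iput/iget pair in the pseudocode.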
Sample-namei(/etc/passwd)
1. Encounters / and gets the system root inode
2. Current working inode = root
3. Permission check
4. Search root for a file etc
   1. Access data in the root directory block by block
   2. Search each block one entry at a time for etc
5. Finding it
   1. Release the inode for root (iput)
   2. Allocate the inode for etc (iget) by the inode number found
6. Permission check for etc
7. Search etc block by block for a directory structure entry for passwd
8. Finding it
   1. Release the inode for etc
   2. Allocate the inode for passwd
9. Return that inode
Super Block
Contents
the size of the file system
the number of free blocks
a list of free blocks available
the index of the next free block in the free block list
the size of the inode list
the number of free inodes
a list of free inodes
the index of the next free inode in the free inode list
lock fields for the free block and free inode lists
a flag indicating that the super block has been modified
Inode Assignment to A New File (1/4)
For a known inode
  Algorithm namei: determines the inode number
  Algorithm iget: allocates the in-core inode
Algorithm ialloc
  Assigns a disk inode to a newly created file
Super block contains an array
  To cache the numbers of free inodes
  To improve the performance of searching for a free inode
Algorithm ialloc /* allocate inode */
Input: file system
Output: locked inode
{
    while (not done)
    {
        if (super block locked)
        {
            sleep (event super block becomes free);
            continue;    /* while loop */
        }
        if (inode list in super block is empty)
        {
            lock super block;
            get remembered inode for free inode search;
            search disk for free inodes until super block full,
                or no more free inodes (algorithms bread and brelse);
            unlock super block;
            wake up (event super block becomes free);
            if (no free inodes found on disk)
                return (no inode);
            set remembered inode for next free inode search;
        }
        /* there are inodes in super block inode list */
        get inode number from super block inode list;
        get inode (algorithm iget);
        if (inode not free after all)    /* !!! */
        {
            write inode to disk;
            release inode (algorithm iput);
            continue;    /* while loop */
        }
        /* inode is free */
        initialize inode;
        write inode to disk;
        decrement file system free inode count;
        return (inode);
    }
}
Inode Assignment to a New File (2/4)
Assigning Free Inode from Middle of List
(a) Before: the super block free inode list holds ... 83 (slot 18), 48 (slot 19); slot 20 onward is empty; index points at slot 19
(b) After assigning inode 48: the list holds ... 83 (slot 18); slot 19 onward is empty; index points at slot 18
Inode Assignment to a New File (3/4)
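Assigning Free Inode, Super Block List Empty
(a) Before: the list holds only 470 in slot 0 (index 0), which is also the remembered inode; assigning it leaves the list empty
(b) After searching the disk: the list holds 535 (slot 0) ... 476 (slot 48), 475 (slot 49), 471 (slot 50); index points at slot 50; remembered inode is 535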
Algorithm ifree
/* inode free */
Input: file system inode number
Output: none
{
    increment file system free inode count;
    if (super block locked)
        return;
    if (inode list full)
    {
        if (inode number less than remembered inode for search)
            set remembered inode for search = input inode number;
    }
    else
        store inode number in inode list;
    return;
}
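The interplay between ialloc, ifree and the remembered inode can be tried out in a small single-threaded model. Everything here is a sketch under stated assumptions: a Python set stands in for the free inodes that a disk search would find, the class and attribute names are invented, and super block locking, sleeping and the in-core inode are omitted. The numbers in the usage lines are taken from the free inode list examples on the slides.

```python
class FreeInodeCache:
    def __init__(self, free_on_disk, capacity):
        self.free_on_disk = set(free_on_disk)   # stands in for scanning disk inodes
        self.capacity = capacity                # size of the super block array
        self.cache = []                         # super block list; next inode at the end
        self.remembered = 0                     # where the next disk search starts

    def _refill(self):
        """Search 'disk' from the remembered inode until the list is full."""
        found = sorted(n for n in self.free_on_disk if n >= self.remembered)
        found = found[:self.capacity]
        self.cache = found[::-1]                # highest in slot 0, lowest taken first
        if found:
            self.remembered = found[-1]         # highest inode number saved

    def ialloc(self):
        while True:
            if not self.cache:
                self._refill()
                if not self.cache:
                    return None                 # no free inodes on disk
            ino = self.cache.pop()
            if ino not in self.free_on_disk:    # inode not free after all
                continue
            self.free_on_disk.remove(ino)
            return ino

    def ifree(self, ino):
        self.free_on_disk.add(ino)
        if len(self.cache) >= self.capacity:    # list full: only adjust the search start
            if ino < self.remembered:
                self.remembered = ino
        else:
            self.cache.append(ino)


fs = FreeInodeCache({471, 475, 476, 535}, capacity=4)
fs.remembered = 470
print(fs.ialloc(), fs.remembered)   # 471 535
fs.ifree(471)                       # list is full again
fs.ifree(499)                       # full list: remembered inode drops to 499
fs.ifree(601)                       # full list, 601 > 499: nothing changes
print(fs.remembered)                # 499
```

Lowering the remembered inode is what guarantees that inode 499, which could not be stored in the full list, is found again by the next disk search.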
Inode Assignment to a New File (4/4)
(a) Original super block list of free inodes: 535 ... 476 475 471; remembered inode 535
(b) Free inode 499: the list becomes 499 ... 476 475 471; remembered inode 499
(c) Free inode 601: the list stays 499 ... 476 475 471; remembered inode 499
Race Condition
A Race Condition Scenario in Assigning Inodes
Three processes A, B, and C act in time sequence
1. The kernel, acting on behalf of process A, assigns inode I but goes to sleep before it copies the disk inode into the in-core copy.
2. While process A is asleep, process B attempts to assign a new inode but finds the super block free inode list empty. It searches the disk for free inodes, starting at an inode number lower than that of the inode that A is assigning, and so puts inode I back on the list.
3. Process C later requests an inode and happens to pick inode I from the super block free list.
Race Condition in Assigning Inodes (1/2)
Process A                   Process B                    Process C
Assigns inode I from
super block
Sleeps while reading
inode (a)
                            Tries to assign inode
                            from super block
                            Super block empty (b)
                            Searches for free inodes
                            on disk, puts inode I in
                            super block (c)
Inode I in core
Does usual activity
                            Completes search, assigns
                            another inode (d)
                                                         Assigns inode I from
                                                         super block
                                                         I is in use!
                                                         Assigns another inode (e)
(time runs downward)
Race Condition in Assigning Inodes (2/2)
Super block free inode list at each point in time
(a) free inodes: I ...
(b) empty
(c) free inodes: J I K ...
(d) free inodes: J I
(e) free inodes: ...
Allocation of Disk Blocks
When a process writes data to a file, the kernel must allocate disk blocks
An array in the file system super block
  To cache the numbers of free disk blocks in the file system
mkfs
  Organizes the data blocks of a file system in a linked list
  Each link is a disk block
  The block contains an array of free disk block numbers
  One array entry is the number of the next block of the linked list
Linked list of free disk block numbers
super block list:  109 106 103 100 ...
block 109:         211 208 205 202 ... 112
block 211:         310 307 304 301 ... 214
block 310:         409 406 403 400 ... 313
Algorithm alloc
/* file system block allocation */
Input: file system number
Output: buffer for new block
{
    while (super block locked)
        sleep (event super block not locked);
    remove block from super block free list;
    if (removed last block from free list)
    {
        lock super block;
        read block just taken from free list (algorithm bread);
        copy block numbers in block into super block;
        release block buffer (algorithm brelse);
        unlock super block;
        wake up processes (event super block not locked);
    }
    get buffer for block removed from super block list (algorithm getblk);
    zero buffer contents;
    decrement total count of free blocks;
    mark super block modified;
    return buffer;
}
Requesting and Freeing Disk Blocks (1/2)
Original configuration
  super block list: 109
  block 109: 211 208 205 202 ... 112
After freeing block number 949
  super block list: 109 949
  block 109: 211 208 205 202 ... 112
After assigning block number 949
  super block list: 109
  block 109: 211 208 205 202 ... 112
After assigning block number 109, replenishing the super block free list
  super block list: 211 208 205 202 ... 112
  block 211: 344 341 338 335 ... 243
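The sequence of requests and frees above can be replayed with a small model of the linked free block list. This is a sketch, not the kernel algorithm in full: the class name, the dictionary standing in for disk blocks and the capacity of 50 entries are assumptions, the block contents are shortened (the elided entries are left out), and locking, buffers and the free block count are omitted.

```python
class FreeBlockList:
    def __init__(self, super_list, disk, capacity):
        self.list = list(super_list)   # super block list; slot 0 links to the next block
        self.disk = disk               # block number -> free block numbers stored in it
        self.capacity = capacity       # hypothetical size of the super block array

    def alloc(self):
        if not self.list:
            return None                        # file system has no free blocks
        block = self.list.pop()                # take the entry at the end of the list
        if not self.list:
            # Removed the last block: it is a link block, so copy the
            # block numbers it holds into the super block list.
            self.list = list(self.disk.pop(block, []))
        return block

    def free(self, block):
        if len(self.list) >= self.capacity:
            # List full: the freed block becomes a link block holding the
            # old list, and is the only entry of the new list.
            self.disk[block] = self.list
            self.list = [block]
        else:
            self.list.append(block)


disk = {109: [211, 208, 205, 202, 112], 211: [344, 341, 338, 335, 243]}
fs = FreeBlockList([109], disk, capacity=50)
fs.free(949)
print(fs.list)       # [109, 949]
print(fs.alloc())    # 949
print(fs.alloc())    # 109
print(fs.list)       # [211, 208, 205, 202, 112]
```

Block 109 is handed out as a data block only after its list of free block numbers has been copied into the super block, so no free block numbers are lost.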
Other File Types
Pipe
  FIFO (first-in-first-out)
  Its data is transient
    Once data is read from a pipe, it cannot be read again
    The data is read in the order that it was written to the pipe, with no deviation from that order
  Uses only the direct blocks of the inode
Special File (including block device, character device)
  Specifies a device
  The inode does not reference any data
  The inode contains the major and minor device numbers
    major number: a device type such as terminal or disk
    minor number: the unit number of the device