L10 - File Management Introduction

Uploaded by hcwsherman
Copyright © All Rights Reserved
File System Management

File System Introduction

Lecture 10

Overview
• File System
  - Definition
  - vs. Memory Management
  - Motivation
• File
  - Metadata
  - Operations
• Directory
  - Directory Structure

File System: Motivation
• Physical memory is volatile
  - Use external storage to store persistent information
• Direct access to the storage media is not portable:
  - Dependent on hardware specification and organization (see next slide for an example)
• File System provides:
  - An abstraction on top of the physical media
  - A high-level resource management scheme
  - Protection between processes and users
  - Sharing between processes and users

Recap: Hard Disk Layout
[Figure: hard disk layout. A = Track, B = Geometric Sector, C = Track Sector, D = Cluster. Images taken from Wikipedia, MSDN (Microsoft).]

File System: General Criteria
• Self-contained:
  - Information stored on a medium is enough to describe the entire organization
  - Should be able to "plug-and-play" on another system
• Persistent:
  - Beyond the lifetime of the OS and processes
• Efficient:
  - Provides good management of free and used space
  - Minimum overhead for bookkeeping information

Memory Management vs File Management
                      Memory Management            File System Management
  Underlying storage  RAM                          Disk
  Access speed        Constant                     Variable disk I/O time
  Unit of addressing  Physical memory address      Disk sector
  Usage               Address space for process;   Non-volatile data;
                      implicit when process runs   explicit access
  Organization        Paging/Segmentation:         Many different FS: ext* (Linux),
                      determined by HW & OS        FAT* (Windows), HFS* (Mac OS), etc.

Key Topics
File System Abstraction
• Discuss the logical entities present in a file system
• E.g. Files / Directories
File System Implementation Approaches
• Common implementation schemes
• Discuss pros/cons
File System Case Studies
• Delve into a few commonly used file systems

You mean files and folders are not real?

FILE SYSTEM ABSTRACTIONS

File System Abstraction
• File System:
  - Consists of a collection of files and directory structures
    - File: An abstract storage of data
    - Directory (Folder): Organization of files
  - Provides an abstraction for accessing and using the above
• We look at the two abstractions closely next:
  - File
  - Directory (Folder)

File: Overview
• Basic Definition
• File Metadata
• File Data
  - File structure
  - Access Methods
• File Operations

File: Basic Description
• Represents a logical unit of information created by a process
• An abstraction
  - Essentially an Abstract Data Type: a set of common operations with various possible implementations
• Contains:
  - Data: Information structured in some way
  - Metadata: Additional information associated with the file
    - Also known as file attributes

File Metadata
  Name:            A human-readable reference to the file
  Identifier:      A unique ID for the file, used internally by the FS
  Type:            Indicates the different types of files, e.g. executable, text file, object file, directory, etc.
  Size:            Current size of the file (in bytes, words or blocks)
  Protection:      Access permissions, which can be classified as read, write and execute rights
  Time, date and
  owner info:      Creation and last modification time, owner ID, etc.
  Table of
  content:         Information for the FS to determine how to access the file

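On a Unix-like system, most of the metadata above can be read with the POSIX stat() call. The sketch below prints a few fields; show_metadata is a hypothetical helper name. Note that the file's name is not among the fields: Unix keeps names in directory entries, while the inode number plays the role of the identifier.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: print a few metadata fields of path.
   Returns 0 on success, -1 if stat() fails. */
int show_metadata(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }
    const char *type = S_ISDIR(st.st_mode) ? "directory"
                     : S_ISREG(st.st_mode) ? "regular file"
                     : "other";
    printf("identifier (inode): %lu\n", (unsigned long) st.st_ino);
    printf("type:               %s\n", type);
    printf("size (bytes):       %lld\n", (long long) st.st_size);
    printf("protection bits:    %03o\n", (unsigned) (st.st_mode & 0777));
    printf("owner id:           %lu\n", (unsigned long) st.st_uid);
    printf("last modified:      %s", ctime(&st.st_mtime)); /* ctime() text ends with '\n' */
    return 0;
}
```

For example, show_metadata("/") should report the root directory with type "directory".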
File Name
• Different file systems have different naming rules
  - These determine which file names are valid
• Common naming rules:
  - Length of file name
  - Case sensitivity
  - Allowed special symbols
  - File extension
    - Usual form: Name.Extension
    - On some file systems, the extension is used to indicate the file type

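Splitting Name.Extension is plain string processing; nothing in the file's data is involved. A minimal sketch, assuming the common rule that the extension is whatever follows the last dot of the final path component (file_extension is a hypothetical name; real file systems and tools differ in how they treat names such as ".profile").

```c
#include <string.h>

/* Hypothetical helper: return the extension of a file name (the text after
   the last '.'), or "" if there is none. Only the final path component is
   examined, and a leading dot (e.g. ".profile") does not count. */
const char *file_extension(const char *path) {
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;   /* skip directory part */
    const char *dot = strrchr(base, '.');
    if (dot == NULL || dot == base)
        return "";
    return dot + 1;
}
```

file_extension("report.docx") yields "docx", while file_extension("archive.tar.gz") yields only "gz", one reason why extension-based typing is a convention rather than a guarantee.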
File Type
• An OS commonly supports a number of file types
• Each file type has:
  - An associated set of operations
  - Possibly a specific program for processing
• Common file types:
  - Regular files: contain user information
  - Directories: system files for FS structure
  - Special files: character/block oriented

Two Major Types of Regular Files
• ASCII files:
  - Example: text files, programming source code, etc.
  - Can be displayed or printed as is
• Binary files:
  - Example: executables, Java class files, PDF files, mp3/mp4, png/jpeg/bmp, etc.
  - Have a predefined internal structure that can be processed by a specific program
    - JVM to execute Java class files
    - PDF reader for PDF files, etc.

Distinguishing File Type
1. Use the file extension as an indication:
  - Used by Windows OS
  - e.g. XXX.docx → Word document
  - A change of extension implies a change in file type!
2. Use embedded information in the file:
  - Used by Unix
  - Usually stored at the beginning of the file
  - Commonly known as the magic number

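Approach 2 can be sketched by reading the first few bytes and comparing them with known signatures. The three signatures used here are real (ELF executables begin with 0x7F 'E' 'L' 'F', PDF files with "%PDF", PNG images with an 8-byte signature starting 0x89 'P' 'N' 'G'), but guess_type is a hypothetical helper, far less thorough than the Unix file command.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: guess a file's type from its first bytes (its
   "magic number") instead of its name. Only three signatures are checked. */
const char *guess_type(const char *path) {
    unsigned char head[8] = {0};
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return "unreadable";
    ssize_t n = read(fd, head, sizeof head);
    close(fd);
    /* "\x7f" "ELF" is split so that 'E' is not parsed as another hex digit */
    if (n >= 4 && memcmp(head, "\x7f" "ELF", 4) == 0)
        return "ELF executable/object";
    if (n >= 4 && memcmp(head, "%PDF", 4) == 0)
        return "PDF document";
    if (n >= 8 && memcmp(head, "\x89PNG\r\n\x1a\n", 8) == 0)
        return "PNG image";
    return "unknown";
}
```

Because only the content is inspected, a PNG image renamed to picture.txt would still be reported as a PNG image.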
File Protection
• Controlled access to the information stored in a file
• Types of access:
  - Read: Retrieve information from the file
  - Write: Write/rewrite the file
  - Execute: Load the file into memory and execute it
  - Append: Add new information to the end of the file
  - Delete: Remove the file from the FS
  - List: Read the metadata of a file

File Protection: How?
• Most common approach:
  - Restrict access based on the user identity
• Most general scheme:
  - Access Control List (ACL)
    - A list of user identities and the allowed access types
    - Pros: Very customizable
    - Cons: Additional information associated with each file
• A common condensed file protection scheme is discussed next

File Protection: Permission Bits
• Classify the users into three classes:
  1. Owner: The user who created the file
  2. Group: A set of users who need similar access to a file
  3. Universe: All other users in the system
• Example (Unix):
  - Define permissions for three access types (Read/Write/Execute) for the 3 classes of users
  - Use "ls -l" to see the permission bits for a file

      rwxr--r--  somefile.eg
      (Owner: rwx, Group: r--, Universe: r--)

File Protection: Access Control List
• In Unix, an Access Control List (ACL) can be:
  - Minimal ACL (the same as the permission bits)
  - Extended ACL (adds named users/groups)
• "getfacl" is the command to get ACL information:

      $ getfacl exampleDir
      # file: exampleDir
      # owner: sooyj
      # group: compsc
      user::rwx
      user:aaron:rwx          <- permission for a specific user
      group::r-x
      group:cohort17:rwx      <- permission for a specific group
      mask::rwx               <- permission "upperbound"
      other::---

Operations on File Metadata
• Rename:
  - Change the filename
• Change attributes:
  - File access permissions
  - Dates
  - Ownership
  - etc.
• Read attribute:
  - e.g. get the file creation time

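In POSIX C, these operations map onto rename(), chmod() and stat() (changing ownership uses chown(), which normally requires privileges and is left out here). A sketch with a hypothetical helper name. One caveat: POSIX stat() has no creation time, since st_ctime is the time of the last status change, so "get file creation time" relies on platform-specific extensions.

```c
#include <stdio.h>     /* rename(), perror() */
#include <sys/stat.h>  /* chmod(), stat() */

/* Hypothetical helper: rename a file, restrict it to owner read/write,
   then read the attribute back. Returns the new permission bits or -1. */
int rename_and_restrict(const char *old_path, const char *new_path) {
    if (rename(old_path, new_path) == -1) {          /* rename */
        perror("rename");
        return -1;
    }
    if (chmod(new_path, S_IRUSR | S_IWUSR) == -1) {  /* change attribute: rw------- */
        perror("chmod");
        return -1;
    }
    struct stat st;
    if (stat(new_path, &st) == -1) {                 /* read attribute */
        perror("stat");
        return -1;
    }
    return (int) (st.st_mode & 0777);
}
```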
File Data: Structure
• Array of bytes:
  - The traditional Unix view
  - No interpretation of data: just raw bytes
  - Each byte has a unique offset (distance) from the file start
• Fixed-length records:
  - Array of records, can grow/shrink
  - Can jump to any record easily:
    - Offset of the Nth record = size of record * (N - 1)
• Variable-length records:
  - Flexible but harder to locate a record

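The offset formula can be sketched directly: with a hypothetical fixed-length struct record, record N (counting from 1) starts at sizeof(struct record) * (N - 1) bytes, so one lseek() and one read() fetch it without touching the records before it.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length record layout. */
struct record {
    int  id;
    char name[12];
};

/* Read the Nth record (counting from 1) using
   offset = sizeof(struct record) * (N - 1). Returns 0 on success, -1 otherwise. */
int read_record(int fd, int n, struct record *out) {
    if (n < 1)
        return -1;
    off_t offset = (off_t) sizeof(struct record) * (n - 1);
    if (lseek(fd, offset, SEEK_SET) == -1)
        return -1;
    ssize_t got = read(fd, out, sizeof *out);
    return (got == (ssize_t) sizeof *out) ? 0 : -1;   /* past the end: -1 */
}
```

Writing raw structs like this ties the file format to the compiler's struct layout; that is acceptable for a demonstration but not for a portable file format.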
File Data: Access Methods
• Sequential Access:
  - Data is read in order, starting from the beginning
  - Cannot skip, but can be rewound
• Random Access:
  - Data can be read in any order
  - Can be provided in two ways:
    1. Read(Offset): Every read operation explicitly states the position to be accessed
    2. Seek(Offset): A special operation is provided to move to a new location in the file
  - e.g. Unix and Windows use (2)

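POSIX in fact offers both styles: pread() takes the offset as an argument and leaves the file pointer untouched (style 1), while lseek() followed by read() is style (2). The two hypothetical wrappers below make the difference visible.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pread() on strict compilers */
#include <fcntl.h>
#include <unistd.h>

/* Style (1): the offset is part of the read request; the file pointer is not moved. */
ssize_t read_at(int fd, void *buf, size_t n, off_t offset) {
    return pread(fd, buf, n, offset);
}

/* Style (2): seek first, then read from the (new) current position. */
ssize_t seek_then_read(int fd, void *buf, size_t n, off_t offset) {
    if (lseek(fd, offset, SEEK_SET) == -1)
        return -1;
    return read(fd, buf, n);
}
```

After read_at() the current position is unchanged; after seek_then_read(fd, &c, 1, 3) the position is 4, because the read advanced it past the byte it returned.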
File Data: Access Methods (cont.)
• Direct Access:
  - Used for files that contain fixed-length records
  - Allows random access to any record directly
  - Very useful where there is a large number of records
    - e.g. in a database
  - The basic random access method can be viewed as a special case:
    - where each record == one byte

File Data: Generic Operations
  Create:        A new file is created with no data
                 - Performed before further operations
  Open:          Prepares the necessary information for later file operations
  Read:          Reads data from the file, usually starting from the current position
  Write:         Writes data to the file, usually starting from the current position
  Repositioning: Moves the current position to a new location (also known as seek)
                 - No actual read/write is performed
  Truncate:      Removes the data between the specified position and the end of the file

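The generic operations line up with POSIX calls roughly as follows: create/open with open() (O_CREAT, O_TRUNC), read/write with read()/write(), repositioning with lseek(), and truncation with ftruncate(fd, length), which keeps only the first length bytes. A sketch walking one file through all of them; generic_ops_demo is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* expose ftruncate() on strict compilers */
#include <fcntl.h>
#include <unistd.h>

/* Walk through the generic operations on one file.
   Returns the final file size, or -1 on any error. */
off_t generic_ops_demo(const char *path) {
    char c;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);  /* create + open */
    if (fd == -1)
        return -1;
    if (write(fd, "hello world", 11) != 11         /* write bytes 0..10 */
        || lseek(fd, 6, SEEK_SET) == -1            /* reposition: no data moved */
        || read(fd, &c, 1) != 1 || c != 'w'        /* read from the new position */
        || ftruncate(fd, 5) == -1) {               /* truncate: keep "hello" */
        close(fd);
        return -1;
    }
    off_t size = lseek(fd, 0, SEEK_END);           /* offset of the end = size */
    close(fd);
    return size;
}
```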
File Operations as System Calls
• The OS provides file operations as system calls:
  - Provides protection, concurrent and efficient access
  - Maintains information
• Information kept for an opened file:
  - File pointer: Current location in the file
  - Disk location: Actual file location on disk
  - Open count: How many processes have this file opened?
    - Useful to determine when to remove the entry from the table

File Operations as System Calls (cont.)
• Consider:
  - Several processes can open the same file
  - Several different files can be opened at any time
  - What is a good way to organize the open-file information?
• Common approach:
  - System-wide open-file table:
    - One entry per unique file
  - Per-process open-file table:
    - One entry per file used in the process
    - Each entry points to the system-wide table

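A toy model of the two-level scheme, purely illustrative: struct and function names are hypothetical, and the layout follows the textbook description on this slide (file pointer per process, open count per unique file). Real Unix kernels add a further shared open-file layer, as the Unix illustration in this lecture shows.

```c
#include <string.h>

#define MAX_SYS 16   /* size of the system-wide table (arbitrary) */
#define MAX_FD  8    /* size of each per-process table (arbitrary) */

/* System-wide open-file table entry: one per unique file. */
struct sys_entry {
    char name[32];
    long disk_location;   /* placeholder: where the file lives on disk */
    int  open_count;      /* per-process entries pointing here; 0 = slot free */
};

/* Per-process open-file table entry: one per file the process has open. */
struct proc_entry {
    long file_pointer;        /* current location, private to this process */
    struct sys_entry *sys;    /* NULL = descriptor slot unused */
};

static struct sys_entry sys_table[MAX_SYS];

/* Return the entry for name, reusing an existing one or claiming a free slot. */
static struct sys_entry *sys_lookup(const char *name) {
    struct sys_entry *free_slot = NULL;
    for (int i = 0; i < MAX_SYS; i++) {
        if (sys_table[i].open_count > 0 && strcmp(sys_table[i].name, name) == 0)
            return &sys_table[i];
        if (sys_table[i].open_count == 0 && free_slot == NULL)
            free_slot = &sys_table[i];
    }
    if (free_slot != NULL) {
        strncpy(free_slot->name, name, sizeof free_slot->name - 1);
        free_slot->name[sizeof free_slot->name - 1] = '\0';
        free_slot->disk_location = -1;   /* unknown in this toy model */
    }
    return free_slot;
}

/* Toy open(): lowest free descriptor, pointing at the shared entry. */
int toy_open(struct proc_entry fds[MAX_FD], const char *name) {
    for (int fd = 0; fd < MAX_FD; fd++) {
        if (fds[fd].sys == NULL) {
            struct sys_entry *s = sys_lookup(name);
            if (s == NULL)
                return -1;               /* system-wide table full */
            s->open_count++;
            fds[fd].sys = s;
            fds[fd].file_pointer = 0;
            return fd;
        }
    }
    return -1;                           /* per-process table full */
}

/* Toy close(): the system-wide entry becomes free when its count reaches 0. */
int toy_close(struct proc_entry fds[MAX_FD], int fd) {
    if (fd < 0 || fd >= MAX_FD || fds[fd].sys == NULL)
        return -1;
    fds[fd].sys->open_count--;
    fds[fd].sys = NULL;
    return 0;
}
```

Opening File1.abc from two processes yields one shared system-wide entry with open count 2, while each process keeps its own file pointer.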
File Operations: Unix Illustration
[Figure: Processes make file system calls, usually with a file descriptor fd.
  - Proc A PCB and Proc B PCB: each holds a File Descriptor Table (entries 0, 1, ...)
  - Open File Table: entries such as {Op.Type: Read, File offset: 1234} and {Op.Type: Write, File offset: 5678}, reached from descriptors x and y
  - "Actual File": File1.abc, File2.def, each with its "File Data"]

Process Sharing File in Unix: Case 1
• Two processes using different file descriptors
  - I/O can occur at independent offsets
• Example:
  - Two processes open the same file
  - The same process opens the file twice
[Figure: Proc A's fd1 points to an open-file entry with File offset 5000; Proc B's fd2 points to a separate entry with File offset 2000; both entries refer to the "File Data" of the shared file File.abc.]

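Case 1 can be observed from a single process: each open() call creates a separate open-file entry with its own offset. A sketch (independent_offsets is a hypothetical name) that reads 4 bytes through one descriptor and then asks lseek(fd, 0, SEEK_CUR) where the other one is.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the same file twice in one process (Case 1). Each open() creates
   its own open-file entry, so reading through fd1 does not move fd2.
   Returns fd2's offset after 4 bytes were read via fd1, or -1 on error. */
off_t independent_offsets(const char *path) {
    char buf[4];
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    off_t result = -1;
    if (fd1 != -1 && fd2 != -1 && read(fd1, buf, sizeof buf) == (ssize_t) sizeof buf)
        result = lseek(fd2, 0, SEEK_CUR);   /* still 0 */
    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    return result;
}
```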
Process Sharing File in Unix: Case 2
• Two processes using the same file descriptor
  - Only one offset → I/O by one process changes the offset for the other process
• Example:
  - fork() after the file is opened
[Figure: The parent's fd1 and the child's fd1 both point to a single open-file entry with File offset 3000, which refers to the "File Data" of the shared file File.abc.]

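Case 2 can be sketched with fork(): the child inherits a copy of the parent's descriptor table, so both copies of fd refer to the same open-file entry and share one offset (dup() produces the same kind of sharing within one process). shared_offset_after_fork is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Case 2: open first, then fork(). Parent and child share one open-file
   entry, so the child's read moves the parent's offset too.
   Returns the parent's offset after the child read 4 bytes, or -1 on error. */
off_t shared_offset_after_fork(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                         /* child: read through the shared fd */
        char buf[4];
        ssize_t n = read(fd, buf, sizeof buf);
        _exit(n == (ssize_t) sizeof buf ? 0 : 1);
    }
    int status;                             /* parent: wait for the child's read */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    off_t pos = lseek(fd, 0, SEEK_CUR);     /* 4, although the parent never read */
    close(fd);
    return pos;
}
```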
Just your regular folders

DIRECTORY

Directory: Basics
• A directory (folder) is used to:
  1. Provide a logical grouping of files
     - The user view of a directory
  2. Keep track of files
     - The actual system usage of a directory
• Several ways to structure a directory:
  - Single-Level
  - Tree-Structured
  - Directed Acyclic Graph (DAG)
  - General Graph

Directory Structure: Single-Level
[Figure: a single directory, usually known as the root directory, directly containing file 1, file 2, file 3 and file 4.]

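Directory Structure: Tree-Structured
[Figure: directories dir1, dir2 and dir3 arranged as a tree; the files (file 1, file 2, and two different files both named file 3) sit in different directories.]
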
Directory Structure: Tree-Structured
• General idea:
  - Directories can be recursively embedded in other directories
  - Naturally forms a tree structure
• Two ways to refer to a file:
  - Absolute pathname:
    - Directory names followed from the root of the tree + final file
    - i.e. the path from the root directory to the file
  - Relative pathname:
    - Directory names followed from the current working directory (CWD)
    - The CWD can be set explicitly or changed implicitly by moving into a new directory at the shell prompt

Directory Structure: DAG
[Figure: under /, both dir2 and dir3 lead to file3; the extra link is labeled "alias". If this link can be added, then Tree → DAG.]

Directory Structure: DAG
• If a file can be shared:
  - Only one copy of the actual content
  - "Appears" in multiple directories
    - With different path names
• Then tree structure → DAG
• Two implementations in Unix:
  - Hard Link
    - Limited to files only
  - Symbolic Link
    - Can be a file or a directory
    - This has an "interesting" effect…

DAG: Unix Hard Link
• Consider:
  - Directory A is the owner of file F
  - Directory B wants to share F
• Hard Link:
  - A and B have separate pointers to the actual file F on disk
• Pros:
  - Low overhead, only pointers are added in the directory
• Cons:
  - Deletion problems:
    - e.g. If B deletes F? If A deletes F?
• Unix command: "ln"

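Unix answers the deletion question with a link count stored in the file's inode: link() adds a directory entry and increments the count, unlink() removes one and decrements it, and the data is freed only when the count reaches zero and no process still has the file open. A sketch with a hypothetical helper name:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hard-link sketch: path_a and path_b become two directory entries for the
   same inode. Removing path_a only drops the link count; the data stays
   reachable through path_b. Returns the link count seen via path_b after
   path_a is removed (expected 1), or -1 on error. */
int hard_link_demo(const char *path_a, const char *path_b) {
    int fd = open(path_a, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    close(fd);
    unlink(path_b);                        /* ignore error: b may not exist yet */
    struct stat st;
    if (link(path_a, path_b) == -1         /* like: ln path_a path_b */
        || stat(path_b, &st) == -1 || st.st_nlink != 2
        || unlink(path_a) == -1            /* "A deletes F" */
        || stat(path_b, &st) == -1)
        return -1;
    return (int) st.st_nlink;
}
```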
DAG: Unix Symbolic Link
• Symbolic Link:
  - B creates a special link file, G
    - G contains the path name of F
  - When G is accessed:
    - Find out where F is, then access F
• Pros:
  - Simple deletion:
    - If B deletes: G is deleted, not F
    - If A deletes: F is gone, G remains (but no longer works)
• Cons:
  - Larger overhead:
    - The special link file takes up actual disk space
• Unix command: "ln -s"

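The "G remains (but no longer works)" case can be reproduced with symlink(), readlink() and the difference between lstat(), which examines the link itself, and stat(), which follows it. dangling_symlink_demo is a hypothetical helper; paths are supplied by the caller.

```c
#define _POSIX_C_SOURCE 200809L   /* expose symlink(), readlink(), lstat() */
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Make link_g a symbolic link to target_f, delete target_f, then check the
   link. Returns 1 if the link still exists but can no longer be followed
   (it "dangles"), 0 if not, -1 on setup error. */
int dangling_symlink_demo(const char *target_f, const char *link_g) {
    int fd = open(target_f, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    close(fd);
    unlink(link_g);                                  /* ignore error: may not exist */
    if (symlink(target_f, link_g) == -1)             /* like: ln -s target_f link_g */
        return -1;

    char stored[256];                                /* G's content: F's path name */
    ssize_t n = readlink(link_g, stored, sizeof stored - 1);
    if (n == -1)
        return -1;
    stored[n] = '\0';                                /* readlink() does not terminate */
    if (strcmp(stored, target_f) != 0)
        return -1;

    unlink(target_f);                                /* "A deletes F" */
    struct stat st;
    int link_exists = (lstat(link_g, &st) == 0 && S_ISLNK(st.st_mode));
    int target_ok   = (stat(link_g, &st) == 0);      /* stat() follows the link */
    return (link_exists && !target_ok) ? 1 : 0;
}
```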
Directory Structure: General Graph
[Figure: a link between dir2 and dir3 that points back up the hierarchy forms a cycle. If this link can be added, then Tree → General Graph.]

Directory Structure: General Graph
• A general graph directory structure is not desirable:
  - Hard to traverse
    - Need to prevent infinite looping
  - Hard to determine when to remove a file/directory
• In Unix:
  - Symbolic links are allowed to link to directories
  - A general graph can be created

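One common way to keep a traversal from looping forever is to remember each directory already visited, identified by its (device, inode) pair, and skip it when it appears again. A sketch under that assumption (count_reachable_dirs, walk and the fixed-size table are hypothetical simplifications); the "." and ".." entries are skipped because every Unix directory contains them as links to itself and its parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_SEEN 1024

/* Directories already visited, identified by (device, inode). */
static struct { dev_t dev; ino_t ino; } seen[MAX_SEEN];
static int nseen;

static int walk(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1 || !S_ISDIR(st.st_mode))  /* stat() follows symlinks */
        return 0;
    for (int i = 0; i < nseen; i++)
        if (seen[i].dev == st.st_dev && seen[i].ino == st.st_ino)
            return 0;                                  /* reached again via a link */
    if (nseen == MAX_SEEN)
        return 0;                                      /* table full: stop here */
    seen[nseen].dev = st.st_dev;
    seen[nseen].ino = st.st_ino;
    nseen++;

    int count = 1;
    DIR *d = opendir(path);
    if (d == NULL)
        return count;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;                                  /* built-in back links */
        char child[4096];
        int n = snprintf(child, sizeof child, "%s/%s", path, e->d_name);
        if (n > 0 && (size_t) n < sizeof child)
            count += walk(child);
    }
    closedir(d);
    return count;
}

/* Count distinct directories reachable from root, following symbolic links. */
int count_reachable_dirs(const char *root) {
    nseen = 0;
    return walk(root);
}
```

For a directory d containing a subdirectory sub, where sub/loop is a symbolic link to "..", count_reachable_dirs("d") returns 2 instead of recursing forever.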
Summary
• Covered the basics of file systems from a user's point of view
• Understand the basic requirements of a FS
• Understand the components of a FS:
  - File and Directory

For your reference only

UNIX FILE OPERATIONS

File Operations Example: Unix System Calls
• Header files:
  - #include <sys/types.h>
  - #include <sys/stat.h>
  - #include <fcntl.h>
  - #include <unistd.h> (declares read(), write(), lseek(), close())
• File-related Unix system calls:
  - open(), read(), write(), lseek(), close()
• General information:
  - An opened file has an identifier
    - File descriptor: an integer
    - Used for other operations
  - A file is accessed on a byte-by-byte basis
    - No interpretation of data

Opening Files: open()
• Function call:
      int open(const char *path, int flags, ... /* mode_t mode */)
• Return:
  - -1: Failed to open the file
  - >= 0: file descriptor, a unique index for the opened file
• Parameters:
  - path: File path
  - flags: Many options can be set using bitwise OR
    - Read, Write or Read+Write mode
    - Truncation, Append mode
    - Create the file if it does not exist
    - … many more
  - mode: Permission bits for a newly created file (needed when O_CREAT is given)

Opening Files: open() (cont.)
• Example:
      int fd; // file descriptor

      // Open an existing file for read only
      fd = open("data.txt", O_RDONLY);

      // Create the file if not found (permissions rw-r--r--), open for read + write
      fd = open("data.txt", O_RDWR | O_CREAT, 0644);
• By convention:
  - Default file descriptors:
    - STDIN (0), STDOUT (1), STDERR (2)

Read Operation: read()
• Function call:
      ssize_t read(int fd, void *buf, size_t n)
• Purpose:
  - Reads up to n bytes from the current offset into buffer buf
• Return:
  - -1: Error
  - 0: End of file reached (nothing left to read)
  - 1...n: Number of bytes read; for a regular file, fewer than n usually means the end of file was reached (pipes and terminals can also return fewer)
• Parameters:
  - fd: file descriptor (must be opened for reading)
  - buf: An array large enough to store n bytes
• read() is a sequential read:
  - Starts at the current offset and increments the offset by the number of bytes read

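Because read() may return fewer bytes than requested, the usual idiom is a loop that stops only when read() returns 0 (end of file) or -1 (error). A sketch (count_bytes is a hypothetical name):

```c
#include <fcntl.h>
#include <unistd.h>

/* Count the bytes in a file with a sequential read() loop.
   Returns the byte count, or -1 on error. */
long count_bytes(const char *path) {
    char buf[4096];
    long total = 0;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);  /* offset advances by n */
        if (n == 0)
            break;                              /* end of file */
        if (n == -1) {
            close(fd);
            return -1;
        }
        total += n;                             /* a short read is not an error */
    }
    close(fd);
    return total;
}
```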
Write Operation: write()
• Function call:
      ssize_t write(int fd, const void *buf, size_t n)
• Purpose:
  - Writes up to n bytes from buffer buf, starting at the current offset
• Return:
  - -1: Error
  - >= 0: Number of bytes written
• Parameters:
  - fd: file descriptor (must be opened for writing)
  - buf: An array of at least n bytes with the values to be written
• Possible errors:
  - Exceeding the file size limit, quota, disk space, etc.
• write() is a sequential write:
  - Starts at the current offset and increments the offset by the number of bytes written
  - Can increase the file size beyond the current EOF → appends new data

Repositioning: lseek()
• Function call:
      off_t lseek(int fd, off_t offset, int whence)
• Purpose:
  - Moves the current position in the file, interpreting offset relative to whence
• Return:
  - -1: Error
  - >= 0: The resulting offset in the file (counted from the file start)
• Parameters:
  - fd: file descriptor (must be opened)
  - offset: positive = move forward, negative = move backward
  - whence: Point of reference for interpreting the offset
    - SEEK_SET: absolute offset (counted from the file start)
    - SEEK_CUR: relative offset from the current position (+/-)
    - SEEK_END: relative offset from the end of the file (+/-)
• Can seek anywhere in the file, even beyond the end of existing data

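Seeking beyond the end of existing data and then writing extends the file; the bytes in between, never written, read back as zeros. A sketch with a hypothetical helper that also shows SEEK_SET and SEEK_END in use (lseek(fd, 0, SEEK_END) returns the file size):

```c
#include <fcntl.h>
#include <unistd.h>

/* Seek to offset in a fresh, empty file and write one byte there; the file
   grows to offset + 1 bytes. The byte found at position probe (inside the
   gap) is stored in *gap_byte. Returns the new file size, or -1 on error. */
off_t seek_past_end_demo(const char *path, off_t offset, off_t probe, char *gap_byte) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    off_t size = -1;
    if (lseek(fd, offset, SEEK_SET) == offset      /* SEEK_SET: absolute position */
        && write(fd, "A", 1) == 1
        && lseek(fd, probe, SEEK_SET) == probe
        && read(fd, gap_byte, 1) == 1)
        size = lseek(fd, 0, SEEK_END);             /* SEEK_END + 0 = file size */
    close(fd);
    return size;
}
```

seek_past_end_demo(path, 10, 5, &g) should return 11 and store '\0' in g.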
Closing Files: close()
• Function call:
      int close(int fd)
• Return:
  - -1: Error
  - 0: Successful
• Parameters:
  - fd: file descriptor (must be opened)
• With close():
  - fd is no longer usable
  - The kernel can remove the associated data structures
  - The identifier fd can be reused later
• By default:
  - Process termination automatically closes all open files

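The reuse of descriptor numbers follows from a POSIX rule: open() returns the lowest-numbered descriptor not currently open in the process. A sketch (fd_reuse_demo is a hypothetical name); in a multithreaded program another thread could grab the freed number first, so the check assumes a single thread.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open, close, and open again: the second open() should get the number the
   first one released. Returns 1 if the number was reused, 0 if not,
   -1 on error. */
int fd_reuse_demo(const char *path1, const char *path2) {
    int fd1 = open(path1, O_RDONLY);
    if (fd1 == -1)
        return -1;
    if (close(fd1) == -1)          /* fd1's number is now free */
        return -1;
    int fd2 = open(path2, O_RDONLY);
    if (fd2 == -1)
        return -1;
    int reused = (fd2 == fd1);
    close(fd2);
    return reused;
}
```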
