Distributed File System: Design
Comparisons
Pei Cao
Cisco Systems, Inc.
Background Reading Material
NFS:
RFC 1094 for v2 (3/1989)
RFC 1813 for v3 (6/1995)
RFC 3530 for v4 (4/2003)
AFS: Scale and Performance in a Distributed File
System, TOCS Feb 1988
http://www-2.cs.cmu.edu/afs/cs/project/codawww/ResearchWebPages/docdir/s11.pdf
Sprite: Caching in the Sprite Network File System,
TOCS Feb 1988
http://www.cs.berkeley.edu/projects/sprite/papers/caching.ps
More Reading Material
CIFS spec:
http://www.itl.ohiou.edu/CIFS-SPEC-0P9-REVIEW.pdf
CODA file system:
http://www-2.cs.cmu.edu/afs/cs/project/coda/Web/docdir/s13.pdf
RPC related RFCs:
XDR representation: RFC 1832
RPC: RFC 1831
RPC security: RFC 2203
Outline
Why Distributed File System
Basic mechanisms to build DFS
Using NFSv2 as an example
Design choices and their implications
Naming (this lecture)
Authentication and Access Control (this lecture)
Batched Operations (this lecture)
Caching (next lecture)
Concurrency Control (next lecture)
Locking implementation (next lecture)
Why Distributed File System
What a Distributed File System
Provides
Provides access to data stored at servers using file
system interfaces
What are the file system interfaces?
Open a file, check status on a file, close a file;
Read data from a file;
Write data to a file;
Lock a file or part of a file;
List files in a directory, delete a directory;
Delete a file, rename a file, add a symlink to a file;
etc.
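To an application these are simply the ordinary POSIX calls. A minimal sketch (error handling omitted; the /dfs paths are made up for illustration):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    /* open a file, check status on it, read, write, close */
    int fd = open("/dfs/report.txt", O_RDWR | O_CREAT, 0644);
    struct stat st;
    fstat(fd, &st);                       /* check status on a file */
    char buf[128];
    read(fd, buf, sizeof buf);            /* read data from a file */
    write(fd, "hello", 5);                /* write data to a file */
    close(fd);

    rename("/dfs/report.txt", "/dfs/report.old");  /* rename a file */
    symlink("/dfs/report.old", "/dfs/latest");     /* add a symlink */
    unlink("/dfs/latest");                         /* delete a file */
    return 0;
}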
Why is DFS Useful
Data sharing among multiple users
User mobility
Location transparency
Location independence
Backups and centralized management
Not all DFS are the same:
High-speed network DFS vs. low-speed network DFS
File System Interfaces vs. Block
Level Interfaces
Data are organized in files, which in turn are
organized in directories
Compare these with disk-level access or block
access interface: [Read/Write, LUN, block#]
Key differences:
Implementation of the directory/file structure and
semantics
Synchronization
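The contrast in one call each; read_block() below is a hypothetical stand-in for a block-device interface, not a real API:

#include <unistd.h>

/* Hypothetical block-device call standing in for a SCSI-style read. */
void read_block(int lun, long blockno, void *buf);

void compare(int fd, int lun, long blockno, char buf[4096]) {
    read(fd, buf, 4096);            /* file interface: the server implements
                                       files, directories, names, permissions */
    read_block(lun, blockno, buf);  /* block interface: the server sees only
                                       (LUN, block#); structure lives client-side */
}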
Digression: Buzz Word Discussion

                              NAS                      SAN
Access method                 File access              Disk block access
Access medium                 Ethernet                 Fibre Channel and Ethernet
Transport protocol            Layered over TCP/IP      SCSI/FC and SCSI/IP
Efficiency                    Less                     More
Sharing and access control    Good                     Poor
Integrity demands             Strong                   Very strong
Clients                       Workstations             Database servers
Basic DFS Implementation
Mechanisms
Components in a DFS
Implementation
Client side:
What has to happen to enable applications to access a
remote file in the same way as a local file?
Communication layer:
Just TCP/IP, or a protocol at a higher level of abstraction?
Server side:
How does it service requests from the client?
Client Side Example: Basic UNIX
Implementations
Accessing remote files in the same way as accessing
local files requires kernel support: the vnode interface
A read(fd, ...) call goes through the process file table to a
struct file (mode, vnode pointer, offset), which points to a
struct vnode; the vnode carries v_data (file-system-private
state) and fs_op, a vector of file system operations:
{int (*open)();
int (*close)();
int (*read)();
int (*write)();
int (*lookup)();
...}
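A minimal sketch of that dispatch path, using simplified struct definitions (real kernel interfaces carry many more fields and operations):

/* Simplified vnode layer; real kernel interfaces are richer. */
struct vnode;

struct vnodeops {
    int (*open)(struct vnode *);
    int (*close)(struct vnode *);
    int (*read)(struct vnode *, void *, int, long);
    int (*write)(struct vnode *, const void *, int, long);
    int (*lookup)(struct vnode *, const char *, struct vnode **);
};

struct vnode {
    struct vnodeops *fs_op;  /* UFS vnodes point at UFS ops, NFS vnodes at NFS ops */
    void            *v_data; /* file-system-private state */
};

struct file {                /* an entry in the process file table */
    int           mode;
    struct vnode *vnode;
    long          offset;
};

/* What read(fd, ...) does inside the kernel: dispatch through the vnode,
 * without knowing whether the file is local (UFS) or remote (NFS). */
int kern_read(struct file *f, void *buf, int len) {
    int n = f->vnode->fs_op->read(f->vnode, buf, len, f->offset);
    if (n > 0)
        f->offset += n;      /* advance the per-open-file offset */
    return n;
}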
Communication Layer Example:
Remote Procedure Calls (RPC)
RPC call message:
xid, call, service, version, procedure, auth-info, arguments
RPC reply message:
xid, reply, reply_stat, auth-info, results
Failure handling: timeout and re-issuance
RPC over UDP vs. RPC over TCP
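The message layouts above, sketched as C structs for orientation; the real ONC RPC headers (RFC 1831) are XDR-encoded discriminated unions rather than fixed structs:

#include <stdint.h>

struct rpc_call_hdr {
    uint32_t xid;        /* transaction id: pairs replies with calls,
                            lets the server detect retransmissions */
    uint32_t msg_type;   /* 0 = call */
    uint32_t rpc_vers;
    uint32_t service;    /* program number, e.g. 100003 for NFS */
    uint32_t version;    /* program version, e.g. 2 */
    uint32_t procedure;  /* which procedure, e.g. READ */
    /* auth-info and XDR-encoded arguments follow */
};

struct rpc_reply_hdr {
    uint32_t xid;        /* copied from the matching call */
    uint32_t msg_type;   /* 1 = reply */
    uint32_t reply_stat; /* accepted / denied */
    /* auth-info and XDR-encoded results follow */
};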
RPC: External Data
Representation (XDR)
Argument data and response data in RPC are
packaged in XDR format
Integers are encoded in big-endian
Strings: 4-byte length followed by ASCII bytes, NUL-padded
to four-byte boundaries
Arrays: 4-byte size followed by array entries
Opaque: 4-byte length followed by binary data
Marshalling and un-marshalling
Extra overhead in data conversion to/from XDR
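A minimal sketch of the marshalling rules, assuming the output buffer is large enough:

#include <stdint.h>
#include <string.h>

/* Append a 32-bit integer in big-endian (XDR) order; returns bytes written. */
static size_t xdr_put_u32(unsigned char *p, uint32_t v) {
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
    return 4;
}

/* Append a string: 4-byte length, then the bytes, NUL-padded to a
 * four-byte boundary. */
static size_t xdr_put_string(unsigned char *p, const char *s) {
    uint32_t len = (uint32_t)strlen(s);
    size_t n = xdr_put_u32(p, len);
    memcpy(p + n, s, len);
    n += len;
    while (n % 4 != 0)       /* pad to the four-byte boundary */
        p[n++] = 0;
    return n;
}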
NFS RPC Calls
NFS / RPC using XDR / TCP/IP
Proc.    Input args                      Results
lookup   dirfh, name                     status, fhandle, fattr
read     fhandle, offset, count          status, fattr, data
create   dirfh, name, fattr              status, fhandle, fattr
write    fhandle, offset, count, data    status, fattr
fhandle: 32-byte opaque data (64-byte in v3)
What's in the file handle?
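As a usage example, reading /a/b under a mount point costs one LOOKUP per path component plus a READ, each carrying an fhandle the client already holds. A sketch with hypothetical client stubs (status and fattr handling omitted):

/* Hypothetical client-side stubs for the procedures in the table above. */
typedef struct { unsigned char data[32]; } fhandle_t;  /* opaque to the client */

int nfs_lookup(fhandle_t dirfh, const char *name, fhandle_t *fh);
int nfs_read(fhandle_t fh, unsigned offset, unsigned count, void *buf);

/* Reading /a/b: one LOOKUP RPC per path component (each returning a new
 * fhandle), then READ on the final fhandle. */
void read_a_b(fhandle_t rootfh /* obtained from mountd at mount time */) {
    fhandle_t a, b;
    char buf[4096];
    nfs_lookup(rootfh, "a", &a);
    nfs_lookup(a, "b", &b);
    nfs_read(b, 0, sizeof buf, buf);
}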
NFS Operations
V2:
NULL, GETATTR, SETATTR
LOOKUP, READLINK, READ
CREATE, WRITE, REMOVE, RENAME
LINK, SYMLINK
READDIR, MKDIR, RMDIR
STATFS
V3: add
READDIRPLUS, COMMIT
FSSTAT, FSINFO, PATHCONF
Server Side Example: mountd and
nfsd
Mountd: provides the initial file handle for the
exported directory
Client issues nfs_mount request to mountd
Mountd checks if the pathname is a directory and if the
directory is exported to the client
nfsd: answers the RPC calls, gets replies from the local
file system, and sends the replies back via RPC
Usually listening at port 2049
Both mountd and nfsd use underlying RPC
implementation
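A sketch of mountd's check, with hypothetical helper functions standing in for the export-list and file handle machinery:

#include <errno.h>
#include <netinet/in.h>

/* Hypothetical helpers; a real mountd consults the export list. */
int is_directory(const char *path);
int is_exported_to(const char *path, struct in_addr client);
int fhandle_for(const char *path, unsigned char fh[32]);

/* What mountd does for an nfs_mount request. */
int mountd_handle(const char *path, struct in_addr client, unsigned char fh[32]) {
    if (!is_directory(path))            /* pathname must name a directory */
        return ENOTDIR;
    if (!is_exported_to(path, client))  /* and be exported to this client */
        return EACCES;
    return fhandle_for(path, fh);       /* hand back the initial file handle */
}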
NFS Client Server Interactions
Client machine:
Application -> nfs_vnops -> NFS client code ->
RPC client interface
Server machine:
RPC server interface -> NFS server code ->
ufs_vops -> UFS code -> disks
NFS File Server Failure Issues
Semantics of file write in V2
Bypass UFS file buffer cache
Semantics of file write in V3
Provide COMMIT procedure
Server-side retransmission cache
Idempotent vs. non-idempotent requests
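For example, READ and LOOKUP are idempotent (executing a retransmitted duplicate is harmless), while REMOVE and CREATE are not. A minimal sketch of a retransmission cache keyed by (client, xid), with hypothetical helpers:

#include <stdint.h>

struct call;   /* opaque: a decoded RPC call */
struct reply;  /* opaque: an already-encoded RPC reply */

struct reply *cache_lookup(uint32_t client, uint32_t xid);
void          cache_insert(uint32_t client, uint32_t xid, struct reply *r);
struct reply *execute(struct call *c);        /* run the NFS operation */
void          send_reply(uint32_t client, struct reply *r);

void serve(uint32_t client, uint32_t xid, struct call *c) {
    struct reply *r = cache_lookup(client, xid);
    if (r == NULL) {               /* first time we see this xid */
        r = execute(c);
        cache_insert(client, xid, r);
    }                              /* duplicate: replay, do NOT re-execute */
    send_reply(client, r);
}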
Design Choices in DFS
Topic 1: Name-Space
Construction and Organization
NFS: per-client linkage
Server: export /root/fs1/
Client: mount server:/root/fs1 /fs1 (mountd returns the initial fhandle)
AFS: global name space
Name space is organized into Volumes
Global directory /afs;
/afs/cs.wisc.edu/vol1/; /afs/cs.stanford.edu/vol1/
Each file is identified as <vol_id, vnode#, vnode_gen>
All AFS servers keep a copy of volume location database,
which is a table of vol_id -> server_ip mappings
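A sketch of the volume location lookup, with a hypothetical in-memory table:

#include <netinet/in.h>

/* Hypothetical in-memory copy of the AFS volume location database. */
struct vldb_entry {
    unsigned       vol_id;
    struct in_addr server_ip;
};

/* A file id <vol_id, vnode#, vnode_gen> is routed by its vol_id alone,
 * so migrating a volume only means updating this table on the servers. */
struct in_addr *vldb_lookup(struct vldb_entry *db, int n, unsigned vol_id) {
    for (int i = 0; i < n; i++)
        if (db[i].vol_id == vol_id)
            return &db[i].server_ip;
    return 0;  /* unknown volume */
}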
Implications on Location
Transparency
NFS: no transparency
If a directory is moved from one server to another,
client must remount
AFS: transparency
If a volume is moved from one server to another, only
the volume location database on the servers needs to be
updated
Implementation of volume migration
File lookup efficiency
Are there other ways to provide location
transparency?
Topic 2: User Authentication and
Access Control
User X logs onto workstation A, wants to access files on server B
How does A tell B who X is?
Should B believe A?
Choices made in NFS v2
All servers and all client workstations share the same <uid,
gid> name space; A sends X's <uid, gid> to B
Problem: root access on any client workstation can lead to creation of
users with arbitrary <uid, gid>
Server believes client workstation unconditionally
Problem: if any client workstation is broken into, the protection of data
on the server is lost
<uid, gid> is sent in clear text over the wire, so request packets can be
faked easily
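Concretely, the NFSv2 credential is AUTH_UNIX (called AUTH_SYS in RFC 1831); roughly the following fields travel in clear text with every request, and nothing stops a compromised client from fabricating them:

#include <stdint.h>

/* Approximate shape of the AUTH_UNIX credential carried in each RPC call;
 * on the wire these fields are XDR-encoded. */
struct auth_unix {
    uint32_t stamp;            /* arbitrary client-chosen id */
    char     machinename[255]; /* client hostname */
    uint32_t uid;              /* the server trusts these fields ... */
    uint32_t gid;
    uint32_t gids[16];         /* ... so root on any workstation can
                                  claim any <uid, gid> it likes */
};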
User Authentication (contd)
How do we fix the problems in NFS v2?
Hack 1: root remapping, which leads to strange behavior
Hack 2: UID remapping, which defeats user mobility
Real solution: use a centralized
Authentication/Authorization/Access-control
(AAA) system
Example AAA System: NTLM
Microsoft Windows Domain Controller
Centralized AAA server
NTLM v2: per-connection authentication
(Diagram: a numbered message exchange in which the client authenticates
to the file server, and the file server verifies the credentials with
the Domain Controller before granting access.)
A Better AAA System: Kerberos
Basic idea: shared secrets
The user proves to the KDC who he is; the KDC generates a
shared secret between the client and the file server
(Diagram: the client asks the KDC's ticket server for access to file
server fs; the KDC generates a session key S and returns Kclient[S]
to the client, together with a ticket Kfs[...] for the file server.)
S: specific to the {client, fs} pair;
a short-term session key with an expiration time (e.g., 8 hours)
Kerberos Interactions
1. client -> ticket server: need to access fs
   KDC generates S; reply: Kclient[S], ticket = Kfs[use S for client]
2. client -> file server: ticket = Kfs[use S for client], S[client, time]
   file server -> client: S{time}
Why time: guards against replay attacks
Mutual authentication: the S{time} reply proves the file server knows S
File server doesn't store S, which is specific to the {client, fs} pair
Client doesn't contact the ticket server every time it contacts fs
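A sketch of the file server's side of step 2; the struct layouts and crypto helpers below are invented for illustration:

/* Hypothetical types and helpers; real Kerberos messages are richer. */
struct key;  /* Kfs: the server's long-term key; S: a session key */
struct ticket_body { struct key *S; const char *client; };
struct auth_body   { const char *client; long time; };

struct ticket_body *decrypt_ticket(struct key *k, const unsigned char *c);
struct auth_body   *decrypt_auth(struct key *s, const unsigned char *c);
int  fresh(long time);                          /* within clock-skew window? */
int  same(const char *a, const char *b);
void send_encrypted(struct key *s, long time);  /* reply S{time} */

int fs_accept(struct key *Kfs, const unsigned char *ticket,
              const unsigned char *authenticator) {
    /* Recover S from the ticket: only the KDC and fs know Kfs. */
    struct ticket_body *t = decrypt_ticket(Kfs, ticket);
    if (t == NULL) return 0;

    /* Check the authenticator S[client, time] against the ticket. */
    struct auth_body *a = decrypt_auth(t->S, authenticator);
    if (a == NULL || !same(a->client, t->client) || !fresh(a->time))
        return 0;                       /* forgery or replay */

    send_encrypted(t->S, a->time);      /* mutual auth: prove we know S */
    return 1;
}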
Kerberos: User Log-on Process
How does the user prove to the KDC who the user is?
Long-term key: 1-way-hash-func(passwd)
The long-term key comparison happens once only, at which
point the KDC generates a shared secret between the user
and the KDC itself: the ticket-granting ticket, or logon
session key
The ticket-granting ticket is encrypted under the KDC's
long-term key
Operator Batching
Should each client/server interaction
accomplish one file system operation or
multiple operations?
Advantage of batched operations
How to define batched operations
Examples of Batched Operators
NFS v3:
Readdirplus
NFS v4:
Compound RPC calls
CIFS:
AND-X requests
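All three share one idea: a single round trip carries several operations. A hypothetical compound request in the spirit of NFSv4:

/* Hypothetical compound request: one RPC carries a list of operations
 * that the server evaluates in order, sharing a current file handle. */
enum op_kind { OP_PUTROOTFH, OP_LOOKUP, OP_READ };

struct op {
    enum op_kind kind;
    const char  *name;           /* OP_LOOKUP: component to look up */
    unsigned     offset, count;  /* OP_READ: byte range */
};

/* Walk to /a/b and read 4 KB of it in a single round trip, instead of
 * one LOOKUP RPC per component plus a separate READ RPC. */
struct op compound[] = {
    { .kind = OP_PUTROOTFH },            /* current fh := root fh */
    { .kind = OP_LOOKUP, .name = "a" },  /* current fh := lookup(fh, "a") */
    { .kind = OP_LOOKUP, .name = "b" },
    { .kind = OP_READ, .offset = 0, .count = 4096 },
};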
Summary
Functionalities of DFS
Implementation of DFS
Client side: Vnode
Communication: RPC or TCP/UDP
Server side: server daemons
DFS name space construction
Mount vs. Global name space
DFS access control
NTLM
Kerberos