CMU 15-213 CSAPP (Ch10)

菜=原罪

Last modified: 2022-11-28 01:25:21


License: CC 4.0 BY-SA


First published: 2022-03-06 21:31:21

Article link: https://blog.csdn.net/qq_34707209/article/details/123017007



Video link
Slides link
Course supplement
This course uses a 64-bit compiler!

Ch10. System-Level I/O

"If you really want to understand how software or system works, the best thing you can do is to look at good quality source code. And I really recommend for RIO chapter and its source code."

Randal E. Bryant

10.1 Unix I/O

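Compared with other operating systems, one of Unix's strengths is its simpler, more unified I/O abstraction: the file;
Unlike Windows or the early Macintosh systems, Unix does not care about the internal structure of a file; a file is just a sequence of bytes;
Unix uses the file abstraction to represent many things:
- All I/O devices:
  - /dev/tty2 (teletype; historically the term referred to the typewriter-like interface to a computer), an I/O device attached to a machine, i.e. a terminal;
  - /dev/sda2, a partition of the user's disk;
  - network connections, commonly called sockets: writing to a socket sends data, reading from it receives data;
- Even the kernel:
  - /boot/vmlinuz-3.13.0-55-generic (kernel image)
  - /proc (kernel data structures)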
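Most files have a "file position" attribute, because reading a file does not always start from the beginning. The file position is the byte offset from the start of the file (viewed as an array index, it points to the next byte to be read or written); read and write advance it, and lseek() can change it explicitly;
- Terminals (tty) and sockets have no file position, because you cannot read data that has not been typed or sent yet, nor go back to data that has already been consumed;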
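Every file has a type that indicates its role in the system:
- Regular file: contains arbitrary data, usually stored on disk;
- Directory: an index for a group of related files; its entries (links) map file names to other files;
- Socket: used to communicate with a process on another machine;
- Named pipe (FIFO): the output of process A becomes the input of process B;
- Symbolic link: gives a file an alias without creating a copy;
- Character and block devices: device files for character-oriented and block-oriented devices;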

10.1.1 Regular Files

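The kernel does not care about file contents, but some applications distinguish text files from binary files;
- A text file contains only ASCII or Unicode characters; its key feature is that line-oriented functions such as fgets interpret the newline character '\n' as the end of a line (EOL, end-of-line);
  - When text files are moved between Windows and macOS (or Linux), line endings are encoded differently: macOS (and Linux) use a single character '\x0a' (LF, line feed), while Windows and Internet protocols use the two characters '\x0d\x0a' (CR LF, carriage return + line feed);
- Everything else is a binary file, e.g. object files, JPEG images, audio and video files;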
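As a small illustration of the line-ending difference, the hypothetical helper crlf_to_lf below rewrites each CR LF pair as a single LF in an in-memory buffer. It is a sketch that assumes the whole text is already in memory, not a full dos2unix replacement.

```c
#include <stddef.h>

/* Convert Windows/Internet-style line endings ("\r\n", bytes 0x0d 0x0a)
 * to Unix-style ("\n", 0x0a) in place. A lone '\r' not followed by
 * '\n' is kept. Returns the new length of buf. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;               /* drop the CR, keep the following LF */
        buf[out++] = buf[i];
    }
    return out;
}
```

For example, the 8 bytes "a\r\nb\rc\r\n" become the 6 bytes "a\nb\rc\n".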
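int open(const char *pathname, int flags, mode_t mode)
- Asks the kernel for access to a file; flags says how the file is to be opened (O_RDONLY, O_WRONLY or O_RDWR, optionally combined with options such as O_CREAT, O_TRUNC, O_APPEND), and mode gives the access permissions of a newly created file (only used when the file is created, e.g. with O_CREAT);
- Returns a file descriptor, a small nonnegative integer that identifies the open file in all later calls. The kernel always hands out the lowest-numbered descriptor not currently open, so the values stay small (many systems also limit how many files a process may have open at once; a very large descriptor value usually points to a bug where files are opened and never closed);
- Returns -1 if the open fails, e.g. the file does not exist or the process lacks permission to open it in the requested mode;
- Every process started from a shell begins life with three open descriptors associated with the terminal:
  - 0 is standard input, stdin;
  - 1 is standard output, stdout;
  - 2 is standard error, stderr;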

[Unix]$ ulimit -a
...
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100001   
pipe size            (512 bytes, -p) 8        
...
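The "lowest available descriptor" rule can be observed directly. A minimal sketch, assuming a POSIX system with /dev/null; lowest_fd_is_reused is a hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

/* open() returns the lowest-numbered descriptor that is not currently
 * open. Open /dev/null twice, close the first descriptor, open again,
 * and return 1 if the freed number was handed out again. */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return 0;                          /* open failed: -1, errno says why */
    int b = open("/dev/null", O_RDONLY);   /* gets a number above a */
    close(a);
    int c = open("/dev/null", O_RDONLY);   /* should get a's old number */
    int reused = (b >= 0 && c == a);
    if (b >= 0)
        close(b);
    if (c >= 0)
        close(c);
    return reused;
}
```

In a fresh process started from a shell, a will normally be 3, the first number after stdin, stdout and stderr.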

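int close(int fd)
- In a multithreaded program, closing a descriptor that is already closed can be a disaster: another thread may have been handed the same number by open() in the meantime, so the second close() silently closes that thread's file;
- So always check the return value of close (and of the other system calls);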
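A sketch of checking close()'s return value; checked_close is a hypothetical wrapper, not a CSAPP function. In a single-threaded program a second close of the same number fails with EBADF, which is exactly what the check surfaces; in a multithreaded program the number might already belong to another thread's file, and then the second close would succeed on the wrong file.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Close fd and report a failure instead of silently ignoring it.
 * Returns close()'s result (0 on success, -1 with errno set). */
int checked_close(int fd)
{
    int rc = close(fd);
    if (rc < 0) {
        int saved = errno;
        perror("close");
        errno = saved;              /* let the caller inspect errno */
    }
    return rc;
}
```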
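ssize_t read(int fd, void *buf, size_t count)
- fd can be stdin, a socket, or any other file; read copies at most count bytes, starting at the current file position, from the file into memory, then advances the file position;
- Returns ssize_t (a signed long on x86-64 Linux): 0 means EOF (for a network connection, the peer has closed it), -1 means an error, and a positive value is the number of bytes actually copied; a value smaller than count is called a short count;
ssize_t write(int fd, const void *buf, size_t count)
- Like read, but copies from memory to the file, also advancing the file position;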
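Short counts, i.e. "less than expected", can occur when:
- read encounters EOF;
- reading text lines from a terminal;
- reading or writing network sockets (data travels in packets; the Ethernet maximum transmission unit, MTU, is about 1500 bytes);
Short counts never occur when:
- reading from a disk file (except at EOF);
- writing to a disk file;
With count > 0, the only reason read() returns 0 is EOF, so read() == 0 can serve as the end-of-input test;
Because low-level system calls such as open, read, write and close (low-level file I/O) are expensive, they are usually wrapped in higher-level libraries;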
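To see short counts and the EOF convention in action, the sketch below reads a 10-byte disk file 4 bytes at a time and records every read() return value. It assumes a POSIX system with a writable /tmp; record_read_counts and demo_counts are hypothetical helpers.

```c
#include <stdlib.h>
#include <unistd.h>

/* Read fd to EOF in chunks of `chunk` bytes (capped at 64), storing
 * each read() result in counts[] (at most max entries, including the
 * final 0 that signals EOF). Returns the number of entries, or -1. */
int record_read_counts(int fd, size_t chunk, ssize_t *counts, int max)
{
    char buf[64];
    if (chunk > sizeof(buf))
        chunk = sizeof(buf);
    int k = 0;
    while (k < max) {
        ssize_t n = read(fd, buf, chunk);
        if (n < 0)
            return -1;
        counts[k++] = n;
        if (n == 0)                 /* EOF: the only reason read returns 0 */
            break;
    }
    return k;
}

/* Hypothetical demo: write 10 bytes to a temp file, rewind, and record
 * the read() results for chunks of 4. Returns the entry count or -1. */
int demo_counts(ssize_t counts[8])
{
    char path[] = "/tmp/shortcntXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    int k = -1;
    if (write(fd, "0123456789", 10) == 10 && lseek(fd, 0, SEEK_SET) == 0)
        k = record_read_counts(fd, 4, counts, 8);
    close(fd);
    return k;
}
```

On a regular file the recorded sequence should be 4, 4, 2, 0: the last non-zero read is a short count caused only by reaching EOF, and 0 marks EOF itself.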

10.1.2 Directories

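The operating system's file system interprets directory files in its own specific way;
A directory file consists of an array of links, each of which maps a file name to a file;
Every directory contains at least two entries: "." (a link to the directory itself) and ".." (a link to its parent directory);
Directories form a hierarchy anchored at the root directory "/". The kernel keeps a current working directory (cwd) for each process, which is usually changed with the cd (change directory) command (the shell is a process too, so cd changes the shell's own cwd);
A file's place in this hierarchy is identified by its pathname: an absolute pathname starts with "/", while a relative pathname is resolved from the cwd (e.g. "file", "./file" or "../file"); a pathname starting with "~" is a shell shortcut that the shell expands to the user's home directory;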
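A directory's links can be listed with the POSIX opendir/readdir interface. This minimal sketch (count_entries is a hypothetical helper) counts the entries of a directory and checks that "." and ".." are among them.

```c
#include <dirent.h>
#include <string.h>

/* Walk the entries of a directory with opendir/readdir. Returns the
 * number of entries, or -1 if the directory cannot be opened;
 * *has_dots is set to 1 if both "." and ".." were seen. */
int count_entries(const char *path, int *has_dots)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    int n = 0, dot = 0, dotdot = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        n++;
        if (strcmp(e->d_name, ".") == 0)
            dot = 1;
        if (strcmp(e->d_name, "..") == 0)
            dotdot = 1;
    }
    closedir(d);
    *has_dots = dot && dotdot;
    return n;
}
```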

10.2 RIO Package

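Professor O'Hallaron wrote RIO (Robust I/O), a more efficient and more robust wrapper around the low-level I/O calls for applications, offering two different levels of file I/O interfaces;
- The lower level is unbuffered I/O, which only solves the short count problem:
  - rio_readn() returns only after it has transferred n bytes or hit EOF (so a short count happens only at EOF); reading a network socket with it can therefore block while waiting for more packets, so it is typically used when the number of bytes to read is known in advance;
  - rio_writen() never returns a short count: it returns only after all n bytes have been written (or an error occurs);
  - rio_readn() and rio_writen() are thread-safe;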

#include <errno.h>   /* csapp.h pulls in these headers, among others */
#include <unistd.h>
/*
 * rio_readn - Robustly read n bytes (unbuffered)
 */
ssize_t rio_readn(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    ssize_t nread;
    char *bufp = usrbuf;

    while (nleft > 0)
    {
        if ((nread = read(fd, bufp, nleft)) < 0)
        {
            /* read is a slow syscall: on some systems a signal interrupts
               it and it returns -1 with errno set to EINTR */
            if (errno == EINTR) /* Interrupted by sig handler return */
                nread = 0;      /* and call read() again */
            else
                return -1;      /* errno set by read() */
        }
        else if (nread == 0)
            break;              /* EOF */
        nleft -= nread;
        bufp += nread;
    }
    return (n - nleft);         /* Return >= 0 */
}
ssize_t rio_writen(int fd, void *usrbuf, size_t n);
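The rio_writen declaration above has no body. Here is a sketch written in the same style as rio_readn; it assumes, as rio_readn does, that a call interrupted by a signal (EINTR) should simply be retried. The course's csapp.c holds the authoritative version.

```c
#include <errno.h>
#include <unistd.h>

/* rio_writen - write exactly n bytes from usrbuf to fd (unbuffered).
 * write() may transfer fewer bytes than requested (e.g. on a socket),
 * so loop until all n bytes are out. Returns n, or -1 on error. */
ssize_t rio_writen(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, bufp, nleft);
        if (nwritten < 0) {
            if (errno == EINTR)     /* interrupted by a signal handler: */
                continue;           /* call write() again */
            return -1;              /* errno set by write() */
        }
        nleft -= (size_t)nwritten;
        bufp += nwritten;
    }
    return (ssize_t)n;
}
```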

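The higher-level wrapper is called buffered I/O, the kind most used in practice. It keeps a buffer in user space holding bytes that have already been read into memory but not yet consumed by the application (and, for output, bytes waiting to be written out to a file or network in one batch);
- Each open file is associated with one in-memory buffer. Buffered read functions such as rio_readnb cache a chunk of the file in that buffer (reading up to the buffer size at once); when the program asks for data, RIO first checks whether the data is already in the buffer and, if so, returns it directly; otherwise it refills the buffer from the file;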

#define RIO_BUFSIZE 8192
typedef struct
{
	int rio_fd;                 /* descriptor for file associated with this internal buf */
	int rio_cnt;                /* unread bytes in internal buf */
	char *rio_bufptr;           /* next unread byte in internal buf */
	char rio_buf[RIO_BUFSIZE];  /* internal buffer */
} rio_t;

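#include "csapp.h"
/* Copy standard input to standard output one line at a time with the
   buffered RIO functions (Rio_* are csapp.h's error-checking wrappers) */
int main(int argc, char **argv)
{
	ssize_t n;
	rio_t rio;
	char buf[MAXLINE];
	Rio_readinitb(&rio, STDIN_FILENO);
	while ((n = Rio_readlineb(&rio, buf, sizeof(buf))) != 0)
		Rio_writen(STDOUT_FILENO, buf, n);
	return 0;
}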
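The buffered functions can be sketched in the same spirit. Below is a self-contained, simplified take on rio_readinitb, rio_read, rio_readlineb and rio_readnb that follows the rio_t layout above (RIO_BUFSIZE assumed to be 8192). It omits the Rio_ error-checking wrappers from csapp.h, so the real source in the course supplement remains the reference version.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define RIO_BUFSIZE 8192

typedef struct {
    int rio_fd;                 /* descriptor for this internal buf */
    int rio_cnt;                /* unread bytes in internal buf */
    char *rio_bufptr;           /* next unread byte in internal buf */
    char rio_buf[RIO_BUFSIZE];  /* internal buffer */
} rio_t;

void rio_readinitb(rio_t *rp, int fd)
{
    rp->rio_fd = fd;
    rp->rio_cnt = 0;
    rp->rio_bufptr = rp->rio_buf;
}

/* Copy up to n bytes to usrbuf; call read() only when the internal
 * buffer is empty. Returns bytes copied, 0 on EOF, -1 on error. */
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n)
{
    while (rp->rio_cnt <= 0) {              /* refill if buffer is empty */
        rp->rio_cnt = (int)read(rp->rio_fd, rp->rio_buf, sizeof(rp->rio_buf));
        if (rp->rio_cnt < 0) {
            if (errno != EINTR)             /* EINTR: just try again */
                return -1;
        } else if (rp->rio_cnt == 0) {
            return 0;                       /* EOF */
        } else {
            rp->rio_bufptr = rp->rio_buf;   /* reset to start of buffer */
        }
    }
    size_t cnt = n;                         /* copy min(n, rio_cnt) bytes */
    if ((size_t)rp->rio_cnt < n)
        cnt = (size_t)rp->rio_cnt;
    memcpy(usrbuf, rp->rio_bufptr, cnt);
    rp->rio_bufptr += cnt;
    rp->rio_cnt -= (int)cnt;
    return (ssize_t)cnt;
}

/* Read one text line (including '\n'), at most maxlen-1 bytes, and
 * NUL-terminate it. Returns bytes read, 0 on EOF with no data, -1. */
ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen)
{
    char c, *bufp = usrbuf;
    size_t n;
    for (n = 1; n < maxlen; n++) {
        ssize_t rc = rio_read(rp, &c, 1);
        if (rc == 1) {
            *bufp++ = c;
            if (c == '\n') {
                n++;
                break;
            }
        } else if (rc == 0) {
            if (n == 1)
                return 0;                   /* EOF, no data read */
            break;                          /* EOF after some data */
        } else {
            return -1;
        }
    }
    *bufp = '\0';
    return (ssize_t)(n - 1);
}

/* Read up to n bytes through the buffer; short only at EOF. */
ssize_t rio_readnb(rio_t *rp, void *usrbuf, size_t n)
{
    size_t nleft = n;
    char *bufp = usrbuf;
    while (nleft > 0) {
        ssize_t nread = rio_read(rp, bufp, nleft);
        if (nread < 0)
            return -1;
        if (nread == 0)
            break;                          /* EOF */
        nleft -= (size_t)nread;
        bufp += nread;
    }
    return (ssize_t)(n - nleft);
}
```

Because rio_readlineb and rio_readnb both draw from the same internal buffer, they can be mixed on one rio_t; the key design point is that read() is called only when rio_cnt drops to 0.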

The full source is in the course supplement; reading it is strongly recommended;

10.3 File Metadata

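Metadata is data about a file; the kernel returns it in a struct stat through the stat (by pathname) and fstat (by descriptor) functions: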

[Unix]$ man 2 stat
...
struct stat 
{
     dev_t     st_dev;         /* ID of device containing file */
     ino_t     st_ino;         /* Inode number */
     mode_t    st_mode;        /* File type and mode */
     nlink_t   st_nlink;       /* Number of hard links */
     uid_t     st_uid;         /* User ID of owner */
     gid_t     st_gid;         /* Group ID of owner */
     dev_t     st_rdev;        /* Device ID (if special file) */
     off_t     st_size;        /* Total size, in bytes */
     blksize_t st_blksize;     /* Block size for filesystem I/O */
     blkcnt_t  st_blocks;      /* Number of 512B blocks allocated */

     /* Since Linux 2.6, the kernel supports nanosecond
        precision for the following timestamp fields.
        For the details before Linux 2.6, see NOTES. */

     struct timespec st_atim;  /* Time of last access */
     struct timespec st_mtim;  /* Time of last modification */
     struct timespec st_ctim;  /* Time of last status change */

 #define st_atime st_atim.tv_sec      /* Backward compatibility */
 #define st_mtime st_mtim.tv_sec
 #define st_ctime st_ctim.tv_sec
};
...

10.4 How the Kernel Represents Open Files

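The operating system represents open files with three kernel data structures (the professor advised reading the textbook carefully for this part);
- Each process (running program) has its own descriptor table, whose entries point to entries in the open file table;
- The open file table describes the state of each opened file: a pointer to its v-node table entry, the file position, and a reference count (how many descriptor table entries point to it). The open file table is shared by all processes and maintained by the operating system;
- The v-node table holds the static information about each open file, such as its type, size and access permissions; most of this is the information that stat returns;
Within a process, every call to open creates a new entry in the open file table, and several entries can point to the same file's v-node entry. For example, to read different parts of the same file independently, you can open(filename) twice and lseek() the two descriptors independently;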
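The sketch below contrasts two open() calls on one file (two open file table entries, two independent file positions) with dup() (two descriptors sharing one entry and therefore one position). It assumes a POSIX system with a writable /tmp; demo_positions is a hypothetical name.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo on a temp file holding "abcdef": fd1 and fd2 come
 * from separate open() calls, fd3 is dup(fd1). After reading 3 bytes
 * via fd1, out[0] gets the next byte via fd2 (own position) and out[1]
 * the next byte via fd3 (shared with fd1). Returns 0 or -1. */
int demo_positions(char out[2])
{
    char path[] = "/tmp/ofteXXXXXX";
    int w = mkstemp(path);
    if (w < 0)
        return -1;
    int ok = write(w, "abcdef", 6) == 6;
    close(w);
    int fd1 = open(path, O_RDONLY);     /* new entry, position 0 */
    int fd2 = open(path, O_RDONLY);     /* another new entry, position 0 */
    int fd3 = dup(fd1);                 /* same entry as fd1 */
    unlink(path);
    char buf[3];
    ok = ok && fd1 >= 0 && fd2 >= 0 && fd3 >= 0
            && read(fd1, buf, 3) == 3       /* fd1's position: 0 -> 3 */
            && read(fd2, &out[0], 1) == 1   /* own position: reads 'a' */
            && read(fd3, &out[1], 1) == 1;  /* shared position: reads 'd' */
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    if (fd3 >= 0) close(fd3);
    return ok ? 0 : -1;
}
```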
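A child created by fork inherits a copy of the parent's descriptor table, and the descriptors stay open even across execve(); only changing an entry by hand with int fcntl(int fd, int cmd, … /* arg */) (e.g. setting the close-on-exec flag FD_CLOEXEC with F_SETFD) alters this. It also means that parent and child share files at the open-file-table level (refcnt++), so an lseek or read by one of them moves the file position seen by the other;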
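Sharing at the open file table level can be demonstrated with fork(). A minimal sketch, assuming a POSIX system with a writable /tmp; demo_fork_shared_pos is a hypothetical name. The child uses _exit() so that it does not flush its copy of the parent's stdio buffers.

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the parent opens a temp file holding "abcdef" and
 * forks. The child's descriptor points to the SAME open file table
 * entry (refcnt 2); the child reads 3 bytes and exits, so the parent's
 * next read starts at offset 3. Returns that byte ('d'), or -1. */
int demo_fork_shared_pos(void)
{
    char path[] = "/tmp/forkposXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                     /* child: advance shared position */
        char buf[3];
        _exit(read(fd, buf, 3) == 3 ? 0 : 1);
    }
    int status;
    char c;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)
        && WEXITSTATUS(status) == 0 && read(fd, &c, 1) == 1)
        result = c;
    close(fd);
    return result;
}
```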

10.5 I/O Redirection

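The shell implements redirection with the int dup2(int oldfd, int newfd) system call, which copies one descriptor table entry over another; the most common use is redirecting standard input and output;
For example, to run a program with its stdout redirected to a file that was opened as descriptor 4, the shell calls dup2(4, 1) before starting the program, copying descriptor 4 (oldfd) onto descriptor 1 (newfd);

Note that if newfd is already open when dup2 is called, dup2 first "silently" closes it (decrementing the refcnt of the entry it pointed to) and then reuses it for the copy; closing and reusing happen atomically, whereas doing the same thing by hand with close and dup is subject to a race condition (race hazard);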

[Unix]$ man 2 dup2
If the file descriptor newfd was previously open, it is silently closed before being reused. 
The  steps of closing and reusing the file descriptor newfd are performed atomically.  
This is important, because trying to implement equivalent functionality using close(2) 
and dup() would be subject to race conditions, whereby newfd might be reused between 
the two steps. Such  reuse could  happen  because  the main program is interrupted 
by a signal handler that allocates a file descriptor, or because a parallel thread 
allocates a file descriptor.
 *  If oldfd is not a valid file descriptor, then the call fails, and newfd is not closed.
 *  If oldfd is a valid file descriptor, and newfd has the same value as oldfd, then dup2() 
 does nothing, and returns newfd.

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>

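int main(void)
{
        /* O_CREAT needs a third argument: permissions for a new file */
        int fd = open("./test.txt", O_RDWR | O_APPEND | O_CREAT, 0644);
        if(fd < 0)
        {
                fprintf(stderr,"open failed [%d][%s]\n",errno,strerror(errno));
                return 1;
        }

        printf("fd = [%d]\n",fd);

        if(dup2(fd,1) < 0)  // from here on, descriptor 1 (stdout) refers to test.txt
        {
                fprintf(stderr,"dup2 error [%d][%s]\n",errno,strerror(errno));
                return 1;
        }

        // count is 13, so only "This is TEST " (no '\n') is written
        int res = write(1,"This is TEST !\n", 13);
        if(res < 0)
        {
                fprintf(stderr,"write error [%d][%s]\n",errno,strerror(errno));
        }

        printf("This is TEST 2 !\n");   // the stdout stream now also ends up in test.txt
        fprintf(stderr,"This is final Test !\n");

        close(fd);

        return 0;
}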

[Unix]$ ls
main.c
[Unix]$ gcc main.c
[Unix]$ ./a.out
fd = [3]
This is final Test !
[Unix]$ cat test.txt
This is TEST This is TEST 2 !
[Unix]$
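Redirection is often needed only temporarily. A common pattern, sketched below under POSIX assumptions (write_redirected is a hypothetical helper), saves the original stdout with dup(1), redirects with dup2, and restores it with a second dup2. It uses write(), not printf(), so no stdio buffer is involved.

```c
#include <unistd.h>

/* Temporarily point descriptor 1 (stdout) at fd, write msg through
 * descriptor 1, then restore the original stdout. dup(1) keeps a
 * second reference to the original open file table entry, so it
 * survives dup2() closing descriptor 1. Returns 0 or -1. */
int write_redirected(int fd, const char *msg, size_t len)
{
    int saved = dup(1);
    if (saved < 0)
        return -1;
    int rc = -1;
    if (dup2(fd, 1) == 1 && write(1, msg, len) == (ssize_t)len)
        rc = 0;
    if (dup2(saved, 1) < 0)         /* restore the original stdout */
        rc = -1;
    close(saved);
    return rc;
}
```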

10.6 Standard I/O

The standard I/O functions, part of the C standard library (libc.so):
- fopen & fclose
- fread & fwrite
- fgets & fputs
- fscanf & fprintf

[Unix]$ man 3 fgets
Linux Programmer's Manual
SYNOPSIS
    #include <stdio.h>
    int fgetc(FILE *stream);
    char *fgets(char *s, int size, FILE *stream);
    int getc(FILE *stream);
    int getchar(void);
    int ungetc(int c, FILE *stream);
DESCRIPTION

    fgetc() reads the next character from stream and returns it as an
            unsigned char cast to an int, or EOF on end of file or error.

    getc()  is equivalent to fgetc() except that it may be implemented
            as a macro which evaluates stream more than once.

    getchar() is equivalent to getc(stdin).

    fgets() reads in at most one less than size characters from stream
            and stores them into the buffer pointed to by s. Reading
            stops after an EOF or a newline. If a newline is read, it
            is stored into the buffer. A terminating null byte is stored
            after the last character in the buffer.

    ungetc() pushes c back to stream, cast to unsigned char, where it
            is available for subsequent read operations. Pushed-back
            characters will be returned in reverse order; only one pushback
            is guaranteed.
...

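Standard I/O models an open file (a file descriptor plus an in-memory buffer) as a stream of type FILE * (Standard I/O models open files as streams); stdin, stdout and stderr are open from the moment the program starts running;
Like the RIO package of section 10.2, the standard I/O functions are buffered, so as to avoid low-level system calls as much as possible;
- An output function such as printf only calls write, flushing the whole buffer in one go, when the buffer is full, when it meets '\n' (stdout attached to a terminal is line-buffered; redirected to a file it is fully buffered and '\n' no longer triggers a flush), when fflush() or exit() is called, or when main returns;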

#include <stdio.h>
int main(void)
{
        printf("Hello");
        while(1);
        return 0;
}

[Unix]$ ./a.out
^C
[Unix]$

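Even when the program is killed with Ctrl-C, "Hello" never reaches stdout: it is still sitting in the buffer and is discarded along with the terminated process;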

#include <stdio.h>
#include <unistd.h>
int main(void)
{
        printf("Hello\n");
        printf("T11111\r"); // '\r' moves the cursor back to the start of the line
        printf("T2");
        fflush(stdout);     // writes "T11111\rT2": T2 overwrites the first two chars
        printf("\nT3");
        for(int i=0;i<3;i++)
        {
                sleep(1);
                fprintf(stderr,"sleep %d sec\n",i);
        }
        return 0;			// "T3" is still buffered; it is flushed when main returns
}

[Unix]$ ./a.out
Hello
T21111
sleep 0 sec
sleep 1 sec
sleep 2 sec
T3[Unix]$
[Unix]$ strace -e trace=write ./a.out
write(1, "Hello\n", 6)                  = 6
write(1, "T11111\rT2", 9)               = 9
write(1, "\n", 1)                       = 1
write(2, "sleep 0 sec\n", 12)           = 12
write(2, "sleep 1 sec\n", 12)           = 12
write(2, "sleep 2 sec\n", 12)           = 12
write(1, "T3", 2)                     	= 2
+++ exited with 0 +++
[Unix]$

Summary

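Since Unix already provides the standard I/O library, why would we need other libraries?
Because standard I/O is designed around terminals and disk files and is awkward on network sockets (on a stream used for both input and output, switching between reading and writing requires an fflush, fseek, fsetpos or rewind in between, and sockets cannot be lseek'ed), whereas RIO is well suited to network I/O;
Standard I/O functions are not async-signal-safe, so they must not be used in signal handlers;
Do not apply line-oriented I/O to files that have no notion of lines (e.g. JPEG images): reading stops whenever a 0x0a byte (newline) happens to appear;
- Line-oriented I/O functions include fgets, scanf and rio_readlineb;
- Use rio_readn or rio_readnb instead;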
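The line-oriented pitfall is easy to reproduce. A sketch using only standard C stdio (demo_binary_read is a hypothetical name): four bytes whose second byte is 0x0a are read back once with fgets and once with fread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: fgets() treats the 0x0a byte as end of line and
 * stops after 2 bytes; fread() copies all 4. (strlen is valid here
 * only because the data contains no 0x00 byte.) Returns 0 or -1. */
int demo_binary_read(size_t *by_fgets, size_t *by_fread)
{
    const unsigned char data[4] = {0x01, 0x0a, 0x02, 0x03};
    char buf[16];
    FILE *fp = tmpfile();               /* anonymous "w+b" temp file */
    if (fp == NULL)
        return -1;
    int ok = fwrite(data, 1, sizeof(data), fp) == sizeof(data);
    rewind(fp);
    ok = ok && fgets(buf, sizeof(buf), fp) != NULL;   /* line-oriented */
    if (ok)
        *by_fgets = strlen(buf);
    rewind(fp);
    if (ok)
        *by_fread = fread(buf, 1, sizeof(buf), fp);   /* byte-oriented */
    fclose(fp);
    return ok ? 0 : -1;
}
```

fgets should report 2 bytes and fread 4, which is why binary data calls for rio_readn, rio_readnb or fread.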