Linux VFS File System Analysis 1

ListQueue

Last modified: 2024-12-21 22:47:38

License: CC 4.0 BY-SA

Category: Linux Kernel Subsystems. Tags: linux, server operations

First published: 2024-12-21 22:36:22

Article link: https://blog.csdn.net/zouhaicheng/article/details/144637001

Linux VFS File System Analysis 1 (Based on Linux 6.6): VFS Concepts and Data Structures

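I. Overview

The VFS (Virtual File System) is the layer of the Linux kernel that abstracts over different file system types. It provides a uniform interface through which user-space programs and the rest of the kernel operate on any file system in the same way, hiding the implementation details of the underlying file system. Its goal is to let the operating system support many file systems without users or applications having to know which one is in use.

The VFS sits at the core of Linux file handling: it manages file system operations and defines a set of common interfaces that each concrete file system (ext4, XFS, Btrfs, NTFS, and so on) implements.

1. VFS architecture

The VFS design is built around a few core structures and concepts. The main ones are described below.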

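file_operations

file_operations is a table of function pointers that defines the behavior of file operations such as open, read, write, and release. Each concrete file system supplies its own functions and publishes them through a file_operations structure.

Every file system implements these operations in the way that suits its own design.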
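The dispatch idea behind file_operations can be modeled in user space. The sketch below is only an illustration: demo_file, demo_file_ops, plain_read, upper_read, and demo_vfs_read are hypothetical names, not kernel APIs, and the two "file systems" differ only in how they return the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct demo_file;

/* Hypothetical, simplified stand-in for the kernel's struct file_operations. */
struct demo_file_ops {
    ssize_t (*read)(struct demo_file *f, char *buf, size_t len);
};

/* Hypothetical stand-in for an open file: ops table, data, and position. */
struct demo_file {
    const struct demo_file_ops *ops;
    const char *data;
    size_t size;
    size_t pos;
};

/* "File system A": returns the stored bytes unchanged. */
static ssize_t plain_read(struct demo_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->pos;
    size_t n = len < left ? len : left;

    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (ssize_t)n;
}

/* "File system B": returns the stored bytes converted to upper case. */
static ssize_t upper_read(struct demo_file *f, char *buf, size_t len)
{
    ssize_t n = plain_read(f, buf, len);

    for (ssize_t i = 0; i < n; i++)
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] = (char)(buf[i] - 'a' + 'A');
    return n;
}

static const struct demo_file_ops plain_ops = { .read = plain_read };
static const struct demo_file_ops upper_ops = { .read = upper_read };

/* Generic layer: knows nothing about the implementation behind f->ops. */
static ssize_t demo_vfs_read(struct demo_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL);
    if (!f->ops->read)
        return -1;      /* operation not supported by this "file system" */
    return f->ops->read(f, buf, len);
}
```

The caller of demo_vfs_read never names a concrete implementation; swapping the ops pointer is enough to change behavior, which is the property the VFS relies on.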

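inode

An inode is the data structure that describes a file's metadata: size, permissions, owner, type, and so on. Within the VFS the inode is the uniform representation of a file, and the VFS operates on files through their inodes.

In the kernel every file is associated with an inode object, which holds the file's index information.
How an inode is stored is up to each file system. In ext4, for example, an inode records where the data blocks of a file or directory are located; other file systems may implement inodes differently.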

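dentry (directory entry)

A dentry (directory entry) represents one component of a path and maintains the mapping from a file name to an inode. The VFS caches dentries to speed up path resolution.

Dentries drive path lookup. A path component that is already cached can be resolved by finding its dentry, so lookups do not have to go to disk every time.
By keeping directory entries in this cache, the VFS reduces disk I/O.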
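The effect of caching name lookups can be sketched with a small user-space model. Everything here is hypothetical (demo_dentry, demo_lookup, the simulated demo_disk_lookup), and the cache is a direct-mapped table for brevity, whereas the kernel's dentry hash table chains entries per bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HASH_SIZE 64
#define DEMO_NAME_MAX  32

/* Hypothetical cached directory entry: (parent inode, name) -> inode number. */
struct demo_dentry {
    int used;
    unsigned long parent_ino;
    char name[DEMO_NAME_MAX];
    unsigned long ino;
};

static struct demo_dentry demo_dcache[DEMO_HASH_SIZE];
static unsigned long demo_disk_reads;   /* counts simulated slow lookups */

static unsigned int demo_hash(unsigned long parent_ino, const char *name)
{
    unsigned int h = (unsigned int)parent_ino * 31u;

    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % DEMO_HASH_SIZE;
}

/* Simulated on-disk directory scan: derives an inode number from the name. */
static unsigned long demo_disk_lookup(unsigned long parent_ino, const char *name)
{
    demo_disk_reads++;
    return 1000 + demo_hash(parent_ino, name);
}

/* Resolve one path component, consulting the cache before the "disk". */
static unsigned long demo_lookup(unsigned long parent_ino, const char *name)
{
    struct demo_dentry *d;

    assert(name != NULL && strlen(name) < DEMO_NAME_MAX);
    d = &demo_dcache[demo_hash(parent_ino, name)];
    if (d->used && d->parent_ino == parent_ino && strcmp(d->name, name) == 0)
        return d->ino;                  /* cache hit: no disk access */

    /* Cache miss: take the slow path and remember the result,
     * evicting whatever entry previously occupied this slot. */
    d->used = 1;
    d->parent_ino = parent_ino;
    strncpy(d->name, name, DEMO_NAME_MAX - 1);
    d->name[DEMO_NAME_MAX - 1] = '\0';
    d->ino = demo_disk_lookup(parent_ino, name);
    return d->ino;
}
```

Looking up the same name twice under the same parent increments demo_disk_reads only once; the second call is served from the cache.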

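super_block

super_block is the structure that describes a mounted file system as a whole. It holds file-system-wide information such as the block size, the file system type, the root dentry, and the list of mounts; details such as free-block counts live in file-system-private data (s_fs_info).

Every mounted file system has a super_block object. Through its s_op table (struct super_operations) it exposes file-system-level operations such as allocating and destroying inodes, syncing the file system (sync_fs), and reporting statistics (statfs).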

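file

The file structure represents an open file as seen by a process. It holds the file's operation table, the open flags, and the current read/write position. Each process has a file descriptor table whose entries point to the file objects it has open, so a file descriptor is an index into that table rather than the file object itself. The file structure therefore describes the state of one open instance, not only the file's contents.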
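The distinction between a file descriptor and the open file behind it can be observed from user space. In this sketch (second_fd_offset is a hypothetical helper, and the scratch path is supplied by the caller), a descriptor obtained with dup() shares the read/write position of the original, while a second open() of the same path gets an independent position.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes 8 bytes to `path`, obtains a second descriptor either with dup()
 * or with a second open(), reads 4 bytes through the first descriptor, and
 * returns the file offset seen through the second one (-1 on error).
 * The scratch file is removed before returning.
 */
static off_t second_fd_offset(const char *path, int use_dup)
{
    char buf[4];
    off_t off = -1;
    int fd1, fd2;

    assert(path != NULL);
    fd1 = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd1 < 0)
        return -1;
    if (write(fd1, "12345678", 8) != 8 || lseek(fd1, 0, SEEK_SET) != 0)
        goto out;

    /* dup(): same open file; open(): a new, independent open file. */
    fd2 = use_dup ? dup(fd1) : open(path, O_RDONLY);
    if (fd2 < 0)
        goto out;
    if (read(fd1, buf, sizeof(buf)) == (ssize_t)sizeof(buf))
        off = lseek(fd2, 0, SEEK_CUR);
    close(fd2);
out:
    close(fd1);
    unlink(path);
    return off;
}
```

With use_dup set, the second descriptor reports offset 4 because both descriptors refer to one open file; without it, the second descriptor still reports offset 0.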

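2. VFS operation flow

The VFS exposes a uniform interface. User-space programs perform file operations through it, and the VFS calls into the concrete file system behind the scenes. The flow is roughly as follows:

Path lookup: When a user requests access to a file, the VFS first resolves the path into a dentry object. If the path has already been resolved and is in the dentry cache, the VFS returns the corresponding inode directly; otherwise it reads directory information to find the file's inode.
File operation: With the inode in hand, the VFS finds the associated file_operations structure (inode->i_fop, copied to file->f_op when the file is opened) and calls the file system's implementation of the requested operation (open, read, write, and so on).
File system interface: Each file system implements the functions in file_operations, and these functions perform whatever disk I/O the file system needs.
Return result: When the operation completes, the VFS returns the result to the user-space program, or an error code on failure.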
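From user space this flow is invisible: the same system call works on any mounted file system, and only the returned data differs. As a small illustration, the sketch below uses statfs(2) to ask which file system backs a path; fs_magic_of and print_fs_magic are hypothetical helper names, and the header is Linux-specific.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/vfs.h>

/* Stores the magic number of the file system backing `path`.
 * Returns 0 on success and -1 on error (for example, a missing path). */
static int fs_magic_of(const char *path, long *magic)
{
    struct statfs st;

    assert(path != NULL && magic != NULL);
    if (statfs(path, &st) != 0)
        return -1;
    *magic = (long)st.f_type;
    return 0;
}

/* Prints the magic number for a path, or a note if the call failed. */
static void print_fs_magic(const char *path)
{
    long magic;

    if (fs_magic_of(path, &magic) == 0)
        printf("%s: f_type=0x%lx\n", path, (unsigned long)magic);
    else
        printf("%s: statfs failed\n", path);
}
```

Calling print_fs_magic on paths such as / and /proc typically prints different f_type values, even though the calling code is identical for both.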

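3. Mounting and unmounting file systems

In Linux a file system must be mounted on a directory before it can be used. The VFS provides the mount and unmount operations:

Mount: Mounting attaches a concrete file system to a mount point in the VFS namespace (for example /mnt or /home). The file system's super_block is set up and made known to the VFS, and its root dentry becomes visible at the mount point.

Mounting calls an interface supplied by the file system (such as its mount function) to initialize the file system and hook it into the VFS.
Unmount (umount): Unmounting removes the file system from the VFS, releases the associated resources, and makes sure all pending data is written back and synced to disk.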

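4. File system types

The VFS lets the kernel support many file system types. Common ones include:

ext4: one of the most widely used Linux file systems; supports journaling and efficient disk space management.
XFS: a high-performance journaling file system, often used for large-scale data storage.
Btrfs: a modern file system with advanced features such as snapshots, compression, and data redundancy.
FAT32, NTFS: file systems that provide compatibility with Windows.
NFS: the Network File System, used to share files over a network.
tmpfs: a temporary file system that keeps files in memory.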

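II. The relationship between file systems, the VFS, and system calls

The relationship between file systems, the VFS, and system calls is shown in the figure below.

[Figure: relationship between file systems, the VFS, and system calls]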

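III. VFS concepts and structures

The VFS mainly involves the superblock, the directory entry (dentry), the index node (inode), and the file system type.

3.1 Superblock

The superblock holds information about a file system as a whole: the file system type, the logical block size, the maximum file size the file system supports, the root dentry (and through it the root inode), and the superblock operations. Those operations cover inode management (allocation, release, read, write), querying file system status (statfs), syncing (sync_fs), and so on.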

include/linux/fs.h

struct super_block {
	struct list_head	s_list;		/* Keep this first */
	dev_t			s_dev;		/* search index; _not_ kdev_t */
	unsigned char		s_blocksize_bits;
	unsigned long		s_blocksize;
	loff_t			s_maxbytes;	/* Max file size */
	struct file_system_type	*s_type;
	const struct super_operations	*s_op;
	const struct dquot_operations	*dq_op;
	const struct quotactl_ops	*s_qcop;
	const struct export_operations *s_export_op;
	unsigned long		s_flags;
	unsigned long		s_iflags;	/* internal SB_I_* flags */
	unsigned long		s_magic;
	struct dentry		*s_root;
	struct rw_semaphore	s_umount;
	int			s_count;
	atomic_t		s_active;
#ifdef CONFIG_SECURITY
	void                    *s_security;
#endif
	const struct xattr_handler **s_xattr;
#ifdef CONFIG_FS_ENCRYPTION
	const struct fscrypt_operations	*s_cop;
	struct fscrypt_keyring	*s_master_keys; /* master crypto keys in use */
#endif
#ifdef CONFIG_FS_VERITY
	const struct fsverity_operations *s_vop;
#endif
#if IS_ENABLED(CONFIG_UNICODE)
	struct unicode_map *s_encoding;
	__u16 s_encoding_flags;
#endif
	struct hlist_bl_head	s_roots;	/* alternate root dentries for NFS */
	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
	struct block_device	*s_bdev;
	struct backing_dev_info *s_bdi;
	struct mtd_info		*s_mtd;
	struct hlist_node	s_instances;
	unsigned int		s_quota_types;	/* Bitmask of supported quota types */
	struct quota_info	s_dquot;	/* Diskquota specific options */

	struct sb_writers	s_writers;

	/*
	 * Keep s_fs_info, s_time_gran, s_fsnotify_mask, and
	 * s_fsnotify_marks together for cache efficiency. They are frequently
	 * accessed and rarely modified.
	 */
	void			*s_fs_info;	/* Filesystem private info */

	/* Granularity of c/m/atime in ns (cannot be worse than a second) */
	u32			s_time_gran;
	/* Time limits for c/m/atime in seconds */
	time64_t		   s_time_min;
	time64_t		   s_time_max;
#ifdef CONFIG_FSNOTIFY
	__u32			s_fsnotify_mask;
	struct fsnotify_mark_connector __rcu	*s_fsnotify_marks;
#endif

	char			s_id[32];	/* Informational name */
	uuid_t			s_uuid;		/* UUID */

	unsigned int		s_max_links;

	/*
	 * The next field is for VFS *only*. No filesystems have any business
	 * even looking at it. You had been warned.
	 */
	struct mutex s_vfs_rename_mutex;	/* Kludge */

	/*
	 * Filesystem subtype.  If non-empty the filesystem type field
	 * in /proc/mounts will be "type.subtype"
	 */
	const char *s_subtype;

	const struct dentry_operations *s_d_op; /* default d_op for dentries */

	struct shrinker s_shrink;	/* per-sb shrinker handle */

	/* Number of inodes with nlink == 0 but still referenced */
	atomic_long_t s_remove_count;

	/*
	 * Number of inode/mount/sb objects that are being watched, note that
	 * inodes objects are currently double-accounted.
	 */
	atomic_long_t s_fsnotify_connectors;

	/* Read-only state of the superblock is being changed */
	int s_readonly_remount;

	/* per-sb errseq_t for reporting writeback errors via syncfs */
	errseq_t s_wb_err;

	/* AIO completions deferred from interrupt context */
	struct workqueue_struct *s_dio_done_wq;
	struct hlist_head s_pins;

	/*
	 * Owning user namespace and default context in which to
	 * interpret filesystem uids, gids, quotas, device nodes,
	 * xattrs and security labels.
	 */
	struct user_namespace *s_user_ns;

	/*
	 * The list_lru structure is essentially just a pointer to a table
	 * of per-node lru lists, each of which has its own spinlock.
	 * There is no need to put them into separate cachelines.
	 */
	struct list_lru		s_dentry_lru;
	struct list_lru		s_inode_lru;
	struct rcu_head		rcu;
	struct work_struct	destroy_work;

	struct mutex		s_sync_lock;	/* sync serialisation lock */

	/*
	 * Indicates how deep in a filesystem stack this SB is
	 */
	int s_stack_depth;

	/* s_inode_list_lock protects s_inodes */
	spinlock_t		s_inode_list_lock ____cacheline_aligned_in_smp;
	struct list_head	s_inodes;	/* all inodes */

	spinlock_t		s_inode_wblist_lock;
	struct list_head	s_inodes_wb;	/* writeback inodes */
} __randomize_layout;

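All superblocks in the system are linked into a list whose head is super_blocks. The figure below shows how the superblocks are linked:

[Figure: superblock relationship diagram]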
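The linkage works by embedding a list node (s_list) inside each super_block and recovering the enclosing structure from the node. The user-space sketch below models that pattern; all demo_* names are hypothetical, and the real kernel uses struct list_head, container_of, and locking that is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modeled on the kernel's struct list_head. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced superblock. */
struct demo_super_block {
    struct demo_list_head s_list;   /* links into demo_super_blocks */
    unsigned long s_magic;
};

/* Global list head, playing the role of super_blocks (empty: points to itself). */
static struct demo_list_head demo_super_blocks = {
    &demo_super_blocks, &demo_super_blocks
};

/* Append a node just before the head, that is, at the tail of the list. */
static void demo_list_add_tail(struct demo_list_head *node,
                               struct demo_list_head *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the list and return the superblock with the given magic, or NULL. */
static struct demo_super_block *demo_find_sb(unsigned long magic)
{
    struct demo_list_head *p;

    for (p = demo_super_blocks.next; p != &demo_super_blocks; p = p->next) {
        struct demo_super_block *sb =
            demo_container_of(p, struct demo_super_block, s_list);

        if (sb->s_magic == magic)
            return sb;
    }
    return NULL;
}
```

Because the list node lives inside the object, adding a superblock to the list needs no extra allocation, and one node type serves every kind of object.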

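struct super_operations mainly contains the inode-related interfaces (allocation, write-back, eviction) and the superblock-level interfaces (sync, freeze, statfs, remount, unmount).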

include/linux/fs.h

struct super_operations {
	struct inode *(*alloc_inode)(struct super_block *sb);
	void (*destroy_inode)(struct inode *);
	void (*free_inode)(struct inode *);

	void (*dirty_inode) (struct inode *, int flags);
	int (*write_inode) (struct inode *, struct writeback_control *wbc);
	int (*drop_inode) (struct inode *);
	void (*evict_inode) (struct inode *);
	void (*put_super) (struct super_block *);
	int (*sync_fs)(struct super_block *sb, int wait);
	int (*freeze_super) (struct super_block *, enum freeze_holder who);
	int (*freeze_fs) (struct super_block *);
	int (*thaw_super) (struct super_block *, enum freeze_holder who);
	int (*unfreeze_fs) (struct super_block *);
	int (*statfs) (struct dentry *, struct kstatfs *);
	int (*remount_fs) (struct super_block *, int *, char *);
	void (*umount_begin) (struct super_block *);

	int (*show_options)(struct seq_file *, struct dentry *);
	int (*show_devname)(struct seq_file *, struct dentry *);
	int (*show_path)(struct seq_file *, struct dentry *);
	int (*show_stats)(struct seq_file *, struct dentry *);
#ifdef CONFIG_QUOTA
	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
	struct dquot __rcu **(*get_dquots)(struct inode *);
#endif
	long (*nr_cached_objects)(struct super_block *,
				  struct shrink_control *);
	long (*free_cached_objects)(struct super_block *,
				    struct shrink_control *);
	void (*shutdown)(struct super_block *sb);
};

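3.2 Directory entry (dentry)

A dentry holds the link between a name in a directory and the corresponding file. A directory is itself a file: when an inode represents a directory, its data blocks store the names of the files and subdirectories in that directory; when an inode represents a regular file, its data blocks store the file's contents. Every file or directory has exactly one inode, and a single inode can be associated with multiple dentries (for example, through hard links).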

include/linux/dcache.h

struct dentry {
	/* RCU lookup touched fields */
	unsigned int d_flags;		/* protected by d_lock */
	seqcount_spinlock_t d_seq;	/* per dentry seqlock */
	struct hlist_bl_node d_hash;	/* lookup hash list */
	struct dentry *d_parent;	/* parent directory */
	struct qstr d_name;
	struct inode *d_inode;		/* Where the name belongs to - NULL is
					 * negative */
	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */

	/* Ref lookup also touches following */
	struct lockref d_lockref;	/* per-dentry lock and refcount */
	const struct dentry_operations *d_op;
	struct super_block *d_sb;	/* The root of the dentry tree */
	unsigned long d_time;		/* used by d_revalidate */
	void *d_fsdata;			/* fs-specific data */

	union {
		struct list_head d_lru;		/* LRU list */
		wait_queue_head_t *d_wait;	/* in-lookup ones only */
	};
	struct list_head d_child;	/* child of parent list */
	struct list_head d_subdirs;	/* our children */
	/*
	 * d_alias and d_rcu can share memory
	 */
	union {
		struct hlist_node d_alias;	/* inode alias list */
		struct hlist_bl_node d_in_lookup_hash;	/* only for in-lookup ones */
		struct rcu_head d_rcu;
	} d_u;
} __randomize_layout;

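Through d_parent and d_subdirs, all the dentries of a file system form a tree. To speed up dentry access at run time, the kernel also keeps dentries in a hash table, dentry_hashtable; the similarly named dentry_cache is the slab cache from which dentry objects are allocated. (Caches of this kind appear in many kernel subsystems wherever fast lookup is needed, the routing cache being one example.)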
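The many-to-one relation between dentries and an inode (the d_alias and i_dentry fields) can be observed from user space with hard links. In this sketch, same_inode_after_link is a hypothetical helper and the scratch paths are supplied by the caller; both names must live on the same file system for link(2) to succeed.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates `path`, hard-links it as `alias`, and checks whether both names
 * resolve to the same inode. Returns 1 if they do, 0 if they do not, and
 * -1 on error. On success *nlink holds the inode's link count.
 * Both names are removed before returning.
 */
static int same_inode_after_link(const char *path, const char *alias,
                                 long *nlink)
{
    struct stat a, b;
    int fd, ret = -1;

    assert(path != NULL && alias != NULL && nlink != NULL);
    fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    if (link(path, alias) == 0) {
        if (stat(path, &a) == 0 && stat(alias, &b) == 0) {
            *nlink = (long)a.st_nlink;
            /* Same device and same inode number: one inode, two names. */
            ret = (a.st_dev == b.st_dev && a.st_ino == b.st_ino);
        }
        unlink(alias);
    }
    unlink(path);
    return ret;
}
```

After link() both names report the same st_ino and a link count of 2: two directory entries, one inode.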

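3.3 Index node (inode)

An inode describes a single file: its size, its last status-change time (ctime, which is not a creation time), its last modification time, its last access time, and the number of blocks it occupies.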

include/linux/fs.h

/*
 * Keep mostly read-only and often accessed (especially for
 * the RCU path lookup and 'stat' data) fields at the beginning
 * of the 'struct inode'
 */
struct inode {
	umode_t			i_mode;
	unsigned short		i_opflags;
	kuid_t			i_uid;
	kgid_t			i_gid;
	unsigned int		i_flags;

#ifdef CONFIG_FS_POSIX_ACL
	struct posix_acl	*i_acl;
	struct posix_acl	*i_default_acl;
#endif

	const struct inode_operations	*i_op;
	struct super_block	*i_sb;
	struct address_space	*i_mapping;

#ifdef CONFIG_SECURITY
	void			*i_security;
#endif

	/* Stat data, not accessed from path walking */
	unsigned long		i_ino;
	/*
	 * Filesystems may only read i_nlink directly.  They shall use the
	 * following functions for modification:
	 *
	 *    (set|clear|inc|drop)_nlink
	 *    inode_(inc|dec)_link_count
	 */
	union {
		const unsigned int i_nlink;
		unsigned int __i_nlink;
	};
	dev_t			i_rdev;
	loff_t			i_size;
	struct timespec64	i_atime;
	struct timespec64	i_mtime;
	struct timespec64	__i_ctime; /* use inode_*_ctime accessors! */
	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
	unsigned short          i_bytes;
	u8			i_blkbits;
	u8			i_write_hint;
	blkcnt_t		i_blocks;

#ifdef __NEED_I_SIZE_ORDERED
	seqcount_t		i_size_seqcount;
#endif

	/* Misc */
	unsigned long		i_state;
	struct rw_semaphore	i_rwsem;

	unsigned long		dirtied_when;	/* jiffies of first dirtying */
	unsigned long		dirtied_time_when;

	struct hlist_node	i_hash;
	struct list_head	i_io_list;	/* backing dev IO list */
#ifdef CONFIG_CGROUP_WRITEBACK
	struct bdi_writeback	*i_wb;		/* the associated cgroup wb */

	/* foreign inode detection, see wbc_detach_inode() */
	int			i_wb_frn_winner;
	u16			i_wb_frn_avg_time;
	u16			i_wb_frn_history;
#endif
	struct list_head	i_lru;		/* inode LRU list */
	struct list_head	i_sb_list;
	struct list_head	i_wb_list;	/* backing dev writeback list */
	union {
		struct hlist_head	i_dentry;
		struct rcu_head		i_rcu;
	};
	atomic64_t		i_version;
	atomic64_t		i_sequence; /* see futex */
	atomic_t		i_count;
	atomic_t		i_dio_count;
	atomic_t		i_writecount;
#if defined(CONFIG_IMA) || defined(CONFIG_FILE_LOCKING)
	atomic_t		i_readcount; /* struct files open RO */
#endif
	union {
		const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
		void (*free_inode)(struct inode *);
	};
	struct file_lock_context	*i_flctx;
	struct address_space	i_data;
	struct list_head	i_devices;
	union {
		struct pipe_inode_info	*i_pipe;
		struct cdev		*i_cdev;
		char			*i_link;
		unsigned		i_dir_seq;
	};

	__u32			i_generation;

#ifdef CONFIG_FSNOTIFY
	__u32			i_fsnotify_mask; /* all events this inode cares about */
	struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
#endif

#ifdef CONFIG_FS_ENCRYPTION
	struct fscrypt_info	*i_crypt_info;
#endif

#ifdef CONFIG_FS_VERITY
	struct fsverity_info	*i_verity_info;
#endif

	void			*i_private; /* fs or device private pointer */
} __randomize_layout;

3.4 File system type

This structure describes a file system type and provides the interfaces for creating the superblock, the root dentry, and the root inode.

include/linux/fs.h

struct file_system_type {
	const char *name;
	int fs_flags;
#define FS_REQUIRES_DEV		1 
#define FS_BINARY_MOUNTDATA	2
#define FS_HAS_SUBTYPE		4
#define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
#define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
#define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
#define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
	int (*init_fs_context)(struct fs_context *);
	const struct fs_parameter_spec *parameters;
	struct dentry *(*mount) (struct file_system_type *, int,
		       const char *, void *);
	void (*kill_sb) (struct super_block *);
	struct module *owner;
	struct file_system_type * next;
	struct hlist_head fs_supers;

	struct lock_class_key s_lock_key;
	struct lock_class_key s_umount_key;
	struct lock_class_key s_vfs_rename_key;
	struct lock_class_key s_writers_key[SB_FREEZE_LEVELS];

	struct lock_class_key i_lock_key;
	struct lock_class_key i_mutex_key;
	struct lock_class_key invalidate_lock_key;
	struct lock_class_key i_mutex_dir_key;
};

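All registered file systems are likewise linked into a list whose head is file_systems. The figure below shows how the registered file systems are linked:

[Figure: registered file system relationship diagram]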
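Registration on this list can be modeled in a few lines of user-space C. The sketch below is a simplified model with hypothetical demo_* names: it keeps only the name and next fields, rejects duplicate names with -EBUSY, and omits the locking that the kernel needs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced version of struct file_system_type. */
struct demo_fs_type {
    const char *name;
    struct demo_fs_type *next;
};

/* List head, playing the role of file_systems. */
static struct demo_fs_type *demo_file_systems;

/* Return the link that points at the entry called `name`,
 * or the terminating NULL link if no such entry exists. */
static struct demo_fs_type **demo_find_fs(const char *name)
{
    struct demo_fs_type **p;

    for (p = &demo_file_systems; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Append `fs` to the list; fails with -EBUSY if it is already linked
 * or if another type with the same name is registered. */
static int demo_register_fs(struct demo_fs_type *fs)
{
    struct demo_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    if (fs->next)
        return -EBUSY;
    p = demo_find_fs(fs->name);
    if (*p)
        return -EBUSY;
    *p = fs;
    return 0;
}

/* Unlink `fs` from the list; fails with -EINVAL if it is not registered. */
static int demo_unregister_fs(struct demo_fs_type *fs)
{
    struct demo_fs_type **p;

    for (p = &demo_file_systems; *p; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}
```

Walking the list through a pointer to the link, rather than a pointer to the node, lets insertion and removal treat the head and the middle of the list uniformly.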

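IV. Example

The code below shows how to register a file system type with register_filesystem.

Overview of the steps

Implement the file operations (file_operations): first define the operations your file system supports, such as open, read, and write.
Define the file system type: next define a file_system_type structure that carries the file system's name, its callbacks, and the other required fields.
Register the file system type: finally, call register_filesystem from the kernel module's init function.

A simple, complete example follows:

1. Define the file operations and the file system type

Assume the file system is very simple and implements only open and read. For testing we use an in-memory virtual file system.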

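#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>

/* Arbitrary magic number chosen for this example ("MYFS" in ASCII). */
#define MYFS_MAGIC 0x4d594653

/* Private per-superblock data, reachable through sb->s_fs_info. */
struct myfs_super_block {
    unsigned long magic;
};

/* Fixed contents returned by the single example file. */
static const char myfs_msg[] = "hello from myfs\n";

/*
 * simple_read_from_buffer() does not have the .read signature, so it is
 * wrapped here and given the source buffer and its length explicitly.
 */
static ssize_t myfs_read(struct file *file, char __user *buf,
                         size_t count, loff_t *ppos)
{
    return simple_read_from_buffer(buf, count, ppos,
                                   myfs_msg, sizeof(myfs_msg) - 1);
}

/* File operations for the example file. */
static const struct file_operations myfs_file_ops = {
    .owner = THIS_MODULE,
    .open = simple_open,
    .read = myfs_read,
    .llseek = default_llseek,
};

/* Superblock operations. */
static const struct super_operations myfs_super_ops = {
    .statfs = simple_statfs,
};

/* Fill a new superblock: a root directory containing one file, "hello". */
static int myfs_fill_super(struct super_block *sb, void *data, int silent)
{
    /* Slots 0 and 1 stay empty; an empty name terminates the table. */
    static const struct tree_descr files[] = {
        [2] = { "hello", &myfs_file_ops, 0444 },
        { "" },
    };
    struct myfs_super_block *myfs_sb;
    int err;

    myfs_sb = kzalloc(sizeof(*myfs_sb), GFP_KERNEL);
    if (!myfs_sb)
        return -ENOMEM;
    myfs_sb->magic = MYFS_MAGIC;
    sb->s_fs_info = myfs_sb;    /* freed in myfs_kill_superblock() */

    err = simple_fill_super(sb, MYFS_MAGIC, files);
    if (err)
        return err;

    /* Replace the equivalent default installed by simple_fill_super(). */
    sb->s_op = &myfs_super_ops;
    return 0;
}

/* Mount callback (legacy mount API): no backing block device is needed. */
static struct dentry *myfs_mount(struct file_system_type *fs_type, int flags,
                                 const char *dev_name, void *data)
{
    struct dentry *root = mount_nodev(fs_type, flags, data, myfs_fill_super);

    if (!IS_ERR(root))
        pr_info("myfs mounted successfully\n");
    return root;
}

/* Tear down the superblock, then free the private data saved beforehand. */
static void myfs_kill_superblock(struct super_block *sb)
{
    struct myfs_super_block *myfs_sb = sb->s_fs_info;

    kill_litter_super(sb);
    kfree(myfs_sb);
}

static struct file_system_type myfs_fs_type = {
    .owner = THIS_MODULE,
    .name = "myfs",
    .mount = myfs_mount,
    .kill_sb = myfs_kill_superblock,
};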

2. Module initialization and cleanup

The init and exit functions register and unregister the file system.

static int __init myfs_init(void) {
    int ret;

    // Register the file system type
    ret = register_filesystem(&myfs_fs_type);
    if (ret != 0) {
        printk(KERN_ERR "myfs registration failed\n");
        return ret;
    }

    printk(KERN_INFO "myfs file system registered successfully\n");
    return 0;
}

static void __exit myfs_exit(void) {
    // Unregister the file system type
    unregister_filesystem(&myfs_fs_type);
    printk(KERN_INFO "myfs file system unregistered\n");
}

module_init(myfs_init);
module_exit(myfs_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("A simple virtual file system example.");

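3. Implement the required operations

The example above relies on the kernel's simple_open and simple_read_from_buffer helpers for its basic file operations. Note that simple_read_from_buffer() takes the source buffer and its length as extra arguments, so it has to be called from a function that has the .read signature rather than being assigned to .read directly. You can replace these helpers with custom implementations as needed.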

4. Testing

With the code above in place, you can try mounting the file system and performing file operations on it.

Load the module: use insmod to load the module:

```
sudo insmod myfs.ko
```
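Mount the file system: use mount to mount it on a directory:
```
sudo mkdir -p /mnt/myfs
sudo mount -t myfs none /mnt/myfs
```
Note that the example does not implement a device driver. Since myfs is meant to be an in-memory file system, the device argument is only a placeholder (none here); adjust it to your setup if your variant of the file system needs a real device.
Unmount the file system: use umount to unmount it: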
```
sudo umount /mnt/myfs
```
Unload the module: use rmmod to unload the kernel module:
```
sudo rmmod myfs
```