Hadoop HDFS Operation Guide - CSDN Blog

Article link: https://blog.csdn.net/qq_37356854/article/details/104868694

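This article covers the background of HDFS, its advantages and disadvantages, its architecture, and its shell operations, including uploading, downloading, and deleting files and changing permissions. It examines how the NameNode and DataNode work and explains the HDFS data flow. It also touches on HDFS fault handling, multi-directory configuration, and new features in 2.x such as file archiving.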


1. HDFS Background and Definition

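Advantages of HDFS

High fault tolerance
Well suited to processing large data sets
Can be built on inexpensive commodity machines

Disadvantages of HDFS
Not suitable for low-latency data access
Cannot store large numbers of small files efficiently
No support for concurrent writes to a file or for random modification of files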

HDFS Architecture Components
NameNode
DataNode
Client
Secondary NameNode

HDFS Block Size
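
As a rough illustration of how a file maps onto blocks, the sketch below assumes the Hadoop 2.x default block size of 128 MB (the dfs.blocksize setting). The class and method names are hypothetical and not part of any Hadoop API. It shows that a file occupies ceil(length / blockSize) blocks and that the last block holds only the remaining bytes.

```java
class BlockMath {
    // Assumption: Hadoop 2.x default block size of 128 MB (dfs.blocksize).
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed for a file of the given length (ceiling division).
    static long blockCount(long fileLength, long blockSize) {
        if (fileLength == 0) {
            return 0;
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Bytes held by the last block; a partially filled block stores only the remainder.
    static long lastBlockLength(long fileLength, long blockSize) {
        if (fileLength == 0) {
            return 0;
        }
        long remainder = fileLength % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long length = 300L * 1024 * 1024; // a hypothetical 300 MB file
        System.out.println("blocks: " + blockCount(length, DEFAULT_BLOCK_SIZE));
        System.out.println("last block bytes: " + lastBlockLength(length, DEFAULT_BLOCK_SIZE));
    }
}
```

With these assumptions a 300 MB file needs three blocks, the last of which holds 44 MB.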


2. HDFS Shell Operations (Key for Development)

1. Basic syntax: bin/hadoop fs <command> OR bin/hdfs dfs <command>

dfs is an implementation class of fs.

(0) Start the Hadoop cluster (to make the following tests easier)

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

(1) -help: print the usage and options of a command

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -help rm

(2) -ls: list directory contents

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -ls /

(3) -mkdir: create a directory on HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir -p /sanguo/shuguo

(4) -moveFromLocal: move (cut and paste) a file from the local file system to HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ touch kongming.txt
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -moveFromLocal ./kongming.txt /sanguo/shuguo

(5) -appendToFile: append a file to the end of an existing file

[atguigu@hadoop102 hadoop-2.7.2]$ touch liubei.txt
[atguigu@hadoop102 hadoop-2.7.2]$ vi liubei.txt
Enter the following content:
san gu mao lu
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt

(6) -cat: display the contents of a file

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -cat /sanguo/shuguo/kongming.txt

(7) -chgrp, -chmod, -chown: same usage as in the Linux file system; change a file's group, permissions, and owner

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -chmod 666 /sanguo/shuguo/kongming.txt
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -chown atguigu:atguigu /sanguo/shuguo/kongming.txt
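
Because -chmod takes the same octal modes as Linux, it can help to see how a mode such as 666 maps to the permission string that -ls prints. The following is a small sketch in plain Java with no Hadoop dependency; the class and method names are hypothetical, and it assumes a three-digit octal mode.

```java
class ModeDecoder {
    // Converts a three-digit octal mode (owner, group, other) into rwx notation.
    static String toSymbolic(String octal) {
        if (octal.length() != 3) {
            throw new IllegalArgumentException("expected three octal digits: " + octal);
        }
        StringBuilder symbolic = new StringBuilder();
        for (char c : octal.toCharArray()) {
            int bits = c - '0';
            if (bits < 0 || bits > 7) {
                throw new IllegalArgumentException("not an octal digit: " + c);
            }
            symbolic.append((bits & 4) != 0 ? 'r' : '-'); // read bit
            symbolic.append((bits & 2) != 0 ? 'w' : '-'); // write bit
            symbolic.append((bits & 1) != 0 ? 'x' : '-'); // execute bit
        }
        return symbolic.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("666")); // prints rw-rw-rw-
    }
}
```

So the -chmod 666 command above grants read and write access, but not execute, to the owner, the group, and everyone else.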

(8) -copyFromLocal: copy a file from the local file system to an HDFS path

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -copyFromLocal README.txt /

(9) -copyToLocal: copy a file from HDFS to the local file system

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt ./.

(10) -cp: copy from one HDFS path to another HDFS path

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -cp /sanguo/shuguo/kongming.txt /zhuge.txt

(11) -mv: move a file between HDFS directories

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mv /zhuge.txt /sanguo/shuguo/

(12) -get: equivalent to -copyToLocal; downloads a file from HDFS to the local file system

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -get /sanguo/shuguo/kongming.txt ./

(13) -getmerge: merge several files and download them as one; for example, the HDFS directory /user/atguigu/test contains several files: log.1, log.2, log.3, …

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -getmerge /user/atguigu/test/* ./zaiyiqi.txt

(14) -put: equivalent to -copyFromLocal

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -put ./zaiyiqi.txt /user/atguigu/test/

(15) -tail: display the end of a file

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -tail /sanguo/shuguo/kongming.txt

(16) -rm: delete a file or directory (deleting a directory requires -r)

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -rm /user/atguigu/test/jinlian2.txt

(17) -rmdir: delete an empty directory

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir /test
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -rmdir /test

(18) -du: report the size of a directory

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -du -s -h /user/atguigu/test
2.7 K /user/atguigu/test
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -du -h /user/atguigu/test
1.3 K /user/atguigu/test/README.txt
15 /user/atguigu/test/jinlian.txt
1.4 K /user/atguigu/test/zaiyiqi.txt

(19) -setrep: set the number of replicas of a file in HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -setrep 10 /sanguo/shuguo/kongming.txt

3. HDFS Client Operations

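1. Copy the compiled Hadoop jar package that matches your operating system to a path containing no Chinese characters (for example: D:\Develop\hadoop-2.7.2), as shown in Figure 3-4.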
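2. Configure the HADOOP_HOME environment variable, as shown in Figure 3-5.
3. Configure the Path environment variable.
4. Create a Maven project named HdfsClientDemo.
5. Import the required dependency coordinates and add the logging configuration.
6. Create the package com.atguigu.hdfs.
7. Create the HdfsClient class.
8. Run the program.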

3.1 HDFS File Upload (Testing Parameter Priority)

1. Write the source code

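import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;
public class HdfsClient {
	@Test
	public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
		// 1 Get the file system, overriding the replication factor on the client side
		Configuration configuration = new Configuration();
		configuration.set("dfs.replication", "2");
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
		// 2 Upload the file
		fs.copyFromLocalFile(new Path("e:/banzhang.txt"), new Path("/banzhang.txt"));
		// 3 Release resources
		fs.close();
		System.out.println("over");
	}
}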

(Remaining steps omitted.)
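
The parameter priority named in this section's title is commonly summarized as follows: a value set in client code overrides a user-defined configuration file on the classpath (for example an hdfs-site.xml in the project's resources), which in turn overrides the built-in defaults from hdfs-default.xml. The sketch below models only that lookup order with plain maps. It does not use Hadoop's Configuration class, and the class name, the method name, and the classpath value "1" are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PriorityDemo {
    // Returns the value from the first source that defines the key.
    // Sources are passed in priority order, highest first.
    @SafeVarargs
    static String resolve(String key, Map<String, String>... sourcesByPriority) {
        for (Map<String, String> source : sourcesByPriority) {
            String value = source.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> clientCode = new HashMap<>();
        clientCode.put("dfs.replication", "2");    // set in code via configuration.set(...)

        Map<String, String> classpathFile = new HashMap<>();
        classpathFile.put("dfs.replication", "1"); // assumed value in a classpath hdfs-site.xml

        Map<String, String> defaults = new HashMap<>();
        defaults.put("dfs.replication", "3");      // default from hdfs-default.xml

        // The client-code value wins while it is present.
        System.out.println(resolve("dfs.replication", clientCode, classpathFile, defaults));

        // Without the client-code setting, the classpath file decides.
        clientCode.clear();
        System.out.println(resolve("dfs.replication", clientCode, classpathFile, defaults));
    }
}
```

Under these assumptions the first lookup yields 2 and the second yields 1, which is the behavior one would check on the cluster by uploading the file with and without the configuration.set call.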

3.2 HDFS File Download

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Perform the download
	// boolean delSrc: whether to delete the source file
	// Path src: path of the file to download
	// Path dst: local path to download the file to
	// boolean useRawLocalFileSystem: if true, write through the raw local file system, so no local .crc checksum file is created
	fs.copyToLocalFile(false, new Path("/banzhang.txt"), new Path("e:/banhua.txt"), true);
	// 3 Release resources
	fs.close();
}

3.3 HDFS Directory Deletion

@Test
public void testDelete() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Delete the directory; true means delete recursively
	fs.delete(new Path("/0508/"), true);
	// 3 Release resources
	fs.close();
}

3.4 HDFS File Rename

@Test
public void testRename() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Rename the file
	fs.rename(new Path("/banzhang.txt"), new Path("/banhua.txt"));
	// 3 Release resources
	fs.close();
}

3.5 Viewing HDFS File Details

View the file name, permissions, length, and block information

// Also needs BlockLocation, LocatedFileStatus and RemoteIterator from org.apache.hadoop.fs
@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Get the file details; true means list files recursively
	RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
	while (listFiles.hasNext()) {
		LocatedFileStatus status = listFiles.next();
		// Print the details
		// File name
		System.out.println(status.getPath().getName());
		// Length
		System.out.println(status.getLen());
		// Permissions
		System.out.println(status.getPermission());
		// Group
		System.out.println(status.getGroup());
		// Get the block information
		BlockLocation[] blockLocations = status.getBlockLocations();
		for (BlockLocation blockLocation : blockLocations) {
			// Get the hosts that store this block
			String[] hosts = blockLocation.getHosts();
			for (String host : hosts) {
				System.out.println(host);
			}
		}
		System.out.println("----------- divider ----------");
	}
	// 3 Release resources
	fs.close();
}

3.6 Distinguishing Files from Directories in HDFS

// Also needs org.apache.hadoop.fs.FileStatus
@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Check whether each entry is a file or a directory
	FileStatus[] listStatus = fs.listStatus(new Path("/"));
	for (FileStatus fileStatus : listStatus) {
		if (fileStatus.isFile()) {
			// It is a file
			System.out.println("f:" + fileStatus.getPath().getName());
		} else {
			// It is a directory
			System.out.println("d:" + fileStatus.getPath().getName());
		}
	}
	// 3 Release resources
	fs.close();
}

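3.7 HDFS I/O Stream Operations

The API operations on HDFS covered so far are all wrapped by the framework. What if we want to implement them ourselves? We can upload and download data using I/O streams.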

3.7.1 HDFS File Upload

Requirement: upload the file banhua.txt from the local E: drive to the HDFS root directory
Write the code

// Also needs java.io.File, java.io.FileInputStream, org.apache.hadoop.fs.FSDataOutputStream and org.apache.hadoop.io.IOUtils
@Test
public void putFileToHDFS() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Create the input stream for the local file
	FileInputStream fis = new FileInputStream(new File("e:/banhua.txt"));
	// 3 Get the output stream to HDFS
	FSDataOutputStream fos = fs.create(new Path("/banhua.txt"));
	// 4 Copy from the input stream to the output stream
	IOUtils.copyBytes(fis, fos, configuration);
	// 5 Release resources
	IOUtils.closeStream(fos);
	IOUtils.closeStream(fis);
	fs.close();
}
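
The stream copy in step 4 boils down to a buffered read/write loop. The sketch below shows such a loop with plain java.io streams so that it runs without a cluster. It is not the source of IOUtils.copyBytes; the class name, the method names, and the 4096-byte buffer are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamCopy {
    // Reads chunks from in and writes them to out until end of stream.
    // Returns the number of bytes copied; the caller closes both streams.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read); // write only the bytes actually read
            total += read;
        }
        out.flush();
        return total;
    }

    // Convenience wrapper: copies a byte array through in-memory streams.
    static byte[] roundTrip(byte[] data, int bufferSize) {
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out, bufferSize);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "san gu mao lu".getBytes(StandardCharsets.UTF_8);
        byte[] copied = roundTrip(data, 4096);
        System.out.println(new String(copied, StandardCharsets.UTF_8));
    }
}
```

In the HDFS case the input would be the local FileInputStream and the output the FSDataOutputStream returned by fs.create, with the same loop in between.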

3.7.2 HDFS File Download

Requirement: download the file banhua.txt from HDFS to the local E: drive
Write the code

// File download
@Test
public void getFileFromHDFS() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new
