Reference (official docs): https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
Lab environment:
172.25.0.1 server1: NameNode DFSZKFailoverController ResourceManager
172.25.0.5 server5: NameNode DFSZKFailoverController ResourceManager
172.25.0.2 server2: JournalNode QuorumPeerMain DataNode NodeManager
172.25.0.3 server3: JournalNode QuorumPeerMain DataNode NodeManager
172.25.0.4 server4: JournalNode QuorumPeerMain DataNode NodeManager
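For the multi-node sections that follow, the hosts must resolve one another by name. An /etc/hosts along these lines on every node would match the table above (a sketch; the original does not show this file):
172.25.0.1  server1
172.25.0.2  server2
172.25.0.3  server3
172.25.0.4  server4
172.25.0.5  server5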
1. Setting up a single-node cluster
Lab environment: server1
1.1 Installation and setup
Unpack the JDK and Hadoop tarballs:
[root@server1 ~]# su - red ## switch to the regular user
[red@server1 ~]$ ls
hadoop-3.2.1.tar.gz jdk-8u181-linux-x64.tar.gz
[red@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
[red@server1 ~]$ ls
hadoop-3.2.1.tar.gz jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
[red@server1 ~]$ ln -s jdk1.8.0_181/ jdk
[red@server1 ~]$ ls
hadoop-3.2.1.tar.gz jdk jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
[red@server1 ~]$ tar zxf hadoop-3.2.1.tar.gz
[red@server1 ~]$ ls
hadoop-3.2.1 hadoop-3.2.1.tar.gz jdk jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
[red@server1 ~]$ ln -s hadoop-3.2.1 hadoop
[red@server1 ~]$ ls
hadoop hadoop-3.2.1 hadoop-3.2.1.tar.gz jdk jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz
Set the environment variables:
vim hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/red/jdk
export HADOOP_HOME=/home/red/hadoop
[red@server1 ~]$ vim .bash_profile
[red@server1 ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/hadoop/bin:$HOME/jdk/bin
export PATH
[red@server1 ~]$ source .bash_profile
[red@server1 ~]$ hadoop
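Running hadoop with no arguments only prints the usage text. A more direct sanity check of the environment (standard CLI subcommands; the expected versions follow from the tarball names):
[red@server1 ~]$ hadoop version    ## should report Hadoop 3.2.1
[red@server1 ~]$ java -version     ## should report 1.8.0_181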
1.2 Standalone operation
By default, Hadoop is configured to run in non-distributed mode, as a single Java process.
The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the given output directory.
[red@server1 ~]$ mkdir input
[red@server1 ~]$ cp hadoop/etc/hadoop/*.xml input
[red@server1 ~]$ ls input/
capacity-scheduler.xml hadoop-policy.xml httpfs-site.xml kms-site.xml yarn-site.xml
core-site.xml hdfs-site.xml kms-acls.xml mapred-site.xml
[red@server1 ~]$ hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
[red@server1 ~]$ cd output/
[red@server1 output]$ cat *
1 dfsadmin
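The same examples jar ships other drivers, wordcount among them. A minimal standalone run might look like this (the wcinput/wcoutput paths and the sample text are illustrative assumptions):
[red@server1 ~]$ mkdir wcinput
[red@server1 ~]$ echo "hello hadoop hello hdfs" > wcinput/words.txt
[red@server1 ~]$ hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount wcinput wcoutput
[red@server1 ~]$ cat wcoutput/part-r-00000    ## one line per word with its count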
1.3 Pseudo-distributed operation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
vim hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
vim hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
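After editing the two files, the effective values can be read back with hdfs getconf (a quick sanity check; getconf is a standard hdfs subcommand):
[red@server1 ~]$ hdfs getconf -confKey fs.defaultFS      ## expect hdfs://localhost:9000
[red@server1 ~]$ hdfs getconf -confKey dfs.replication   ## expect 1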
Set up passwordless ssh:
[red@server1 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/red/.ssh/id_rsa):
Created directory '/home/red/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/red/.ssh/id_rsa.
Your public key has been saved in /home/red/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:IimMgV1OCo2rVOq6sS/eUyL7wwO0klylhAMEZ1b2U4Y red@server1
The key's randomart image is:
+---[RSA 2048]----+
|B+=.= .o |
|oO.O oEo |
|o.B + o |
|.B o . . |
|B.= o . S |
|+* o o . |
|+ = o |
|o+.= |
|+=oo+ |
+----[SHA256]-----+
[red@server1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[red@server1 ~]$ chmod 0600 ~/.ssh/authorized_keys
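Check that you can now ssh to localhost without a passphrase (the first connection will also record the host key):
[red@server1 ~]$ ssh localhost    ## should log in without prompting for a password
[red@server1 ~]$ exit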
Run a MapReduce job locally.
Format the filesystem:
[red@server1 ~]$ hdfs namenode -format
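If the format succeeds, the NameNode metadata is written under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}. A quick look (the exact path is an assumption based on that default):
[red@server1 ~]$ ls /tmp/hadoop-red/dfs/name/current/    ## fsimage files, seen_txid, VERSION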
Start the NameNode daemon and DataNode daemon:
[red@server1 ~]$ cd hadoop/sbin/
[red@server1 sbin]$ ./start-dfs.sh
The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:9870/
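To confirm the daemons actually came up, jps (shipped with the JDK) should list the three HDFS processes, and the web port can be probed with curl (the process names are the standard Hadoop 3.x ones):
[red@server1 ~]$ jps    ## expect NameNode, DataNode, SecondaryNameNode
[red@server1 ~]$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9870/    ## expect 200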
Make the HDFS directories required to run MapReduce jobs:
[red@server1 ~]$ hdfs dfs -ls
ls: `.': No such file or directory
[red@server1 ~]$ hdfs dfs -ls /
[red@server1 ~]$ hdfs dfs -mkdir /user
[red@server1 ~]$ hdfs dfs -mkdir /user/red
[red@server1 ~]$ hdfs dfs -ls
Copy the input files into the distributed filesystem:
[red@server1 ~]$ hdfs dfs -put input/
Run the example:
[red@server1 ~]$ rm -fr input output
[red@server1 ~]$ hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
Examine the output files: copy them from the distributed filesystem to the local filesystem and inspect them:
[red@server1 ~]$ hdfs dfs -ls
Found 2 items
drwxr-xr-x - red supergroup 0 2020-07-15 18:44 input
drwxr-xr-x - red supergroup 0 2020-07-15 18:48 output
[red@server1 ~]$ hdfs dfs -cat output/*
2020-07-15 18:51:17,368 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
1 dfsadmin
[red@server1 ~]$ hdfs dfs -get output
2020-07-15 18:51:40,542 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[red@server1 ~]$ ls
hadoop hadoop-3.2.1 hadoop-3.2.1.tar.gz jdk jdk1.8.0_181 jdk-8u181-linux-x64.tar.gz output
[red@server1 ~]$ cd output/
[red@server1 output]$ ls
part-r-00000 _SUCCESS
Delete the output files.
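The command itself is cut off here; clearing both copies so the job can be re-run, and stopping the daemons when finished, would presumably look like this:
[red@server1 ~]$ hdfs dfs -rm -r output    ## remove the HDFS output directory
[red@server1 ~]$ rm -fr output             ## remove the local copy fetched above
[red@server1 ~]$ hadoop/sbin/stop-dfs.sh   ## stop HDFS when done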