Hadoop Environment Setup
Versions used:
Hadoop: hadoop-2.7.2.tar.gz
Linux: CentOS 7
JDK: jdk-8u161-linux-x64.tar.gz
Host names: hadoop0 (master), hadoop1, hadoop2
1. Install the virtual machines: create a new VM.
Note: after the VM is created and before it is started, hardware virtualization must be enabled:
Reboot the host machine -> enter the BIOS settings -> enable the CPU virtualization option (typically labeled Intel VT-x or AMD-V).
The OS installation steps themselves are omitted here.
2. After the installation finishes, install VMware Tools (it enables copying and pasting files between the host and the VM).
3. Disable the firewall (otherwise connections from outside the VM may be blocked).
Check the firewall status: firewall-cmd --state
Stop the firewall: systemctl stop firewalld.service
Prevent firewalld from starting at boot: systemctl disable firewalld.service
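A quick check after running the two commands above (assuming firewalld is the only firewall service in use on CentOS 7):
firewall-cmd --state                     # should now print "not running"
systemctl is-enabled firewalld.service   # should now print "disabled"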
4. Install the JDK
Copy the downloaded Linux JDK archive into the VM and unpack it:
tar -zxvf jdk-8u161-linux-x64.tar.gz
Add the JDK to the system environment variables.
Option 1: edit /etc/profile (requires root) and add:
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Option 2: run vi ~/.bash_profile and add:
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export PATH=$JAVA_HOME/bin:$PATH
Apply the changes with source /etc/profile (or source ~/.bash_profile), or simply log in again.
Verify: run java -version; if the version information is printed, the JDK is installed correctly.
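If the JDK is set up correctly, the output should look roughly like the following (the exact build number may differ):
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)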
5. Install SSH and set up passwordless login
yum install openssh-server openssh-clients (can be skipped if SSH is already installed)
ssh-keygen -t rsa (just press Enter at every prompt)
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop0
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop2
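A quick sanity check from hadoop0 once the keys have been distributed (each command should run without a password prompt):
ssh hadoop1 hostname   # should print hadoop1
ssh hadoop2 hostname   # should print hadoop2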
6. Set the Linux hostname in /etc/sysconfig/network:
NETWORKING=yes
HOSTNAME=machine name (e.g. hadoop0)
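Note: CentOS 7 actually stores the hostname in /etc/hostname, so a more direct way (no reboot required) is hostnamectl; for example, on the master:
hostnamectl set-hostname hadoop0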
7. Map hostnames to IP addresses in /etc/hosts:
<IP address of hadoop0> hadoop0
<IP address of hadoop1> hadoop1
<IP address of hadoop2> hadoop2
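For example, assuming the three machines were assigned the (hypothetical) addresses below, /etc/hosts on every node would contain:
192.168.1.100 hadoop0
192.168.1.101 hadoop1
192.168.1.102 hadoop2
Use the actual addresses reported by ip addr on each machine.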
8. Install Hadoop
tar -zxvf hadoop-2.7.2.tar.gz
Set the environment variables:
As root, open /etc/profile and add the following:
export HADOOP_HOME=/home/hadoop/hadoop-2.7.2
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
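After reloading the profile, a simple smoke test confirms the variables are picked up:
source /etc/profile
hadoop version   # should print Hadoop 2.7.2 plus build information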
9. Edit the Hadoop configuration files, located under the install directory in etc/hadoop (here /home/hadoop/hadoop-2.7.2/etc/hadoop):
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.2/tmp</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
  </property>
</configuration>
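To be safe, the directory referenced by hadoop.tmp.dir above can be created up front on each node:
mkdir -p /home/hadoop/hadoop-2.7.2/tmp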
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>0</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.2/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.2/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop0:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
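Likewise, the NameNode and DataNode directories above can be created up front (the name directory matters on hadoop0, the data directory on each slave):
mkdir -p /home/hadoop/hadoop-2.7.2/dfs/name
mkdir -p /home/hadoop/hadoop-2.7.2/dfs/data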
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop0:19888</value>
  </property>
</configuration>
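Note: the 2.7.2 distribution ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template first and then add the properties above:
cp /home/hadoop/hadoop-2.7.2/etc/hadoop/mapred-site.xml.template /home/hadoop/hadoop-2.7.2/etc/hadoop/mapred-site.xml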
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop0:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop0:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop0:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop0:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop0:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8096</value>
  </property>
</configuration>
Add the following line to hadoop-env.sh (line 25), yarn-env.sh (line 23), mapred-env.sh (line 16), and httpfs-env.sh (at the end):
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
Configure the slaves file
Add the names of all slave nodes to $HADOOP_HOME/etc/hadoop/slaves, one per line with no leading whitespace:
hadoop1
hadoop2
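The JDK, the Hadoop directory, and the /etc/profile changes must also be present on hadoop1 and hadoop2. Assuming the same user and paths on every node, one way is to copy them from hadoop0 (a sketch; adjust paths as needed):
scp -r /home/hadoop/jdk1.8.0_161 hadoop1:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.2 hadoop1:/home/hadoop/
scp -r /home/hadoop/jdk1.8.0_161 hadoop2:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.2 hadoop2:/home/hadoop/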
10. Start the cluster
Format the NameNode (run this on hadoop0 only; it needs to be done just once):
hdfs namenode -format
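If formatting succeeds, the log output should end with a message similar to the following (typical 2.7.x wording; details may vary):
INFO common.Storage: Storage directory /home/hadoop/hadoop-2.7.2/dfs/name has been successfully formatted.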
Start:
start-all.sh
Verify:
On each slave, run jps; the DataNode and NodeManager processes should be listed.
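For reference, jps on hadoop0 should list the master daemons; with the configuration above the output would typically show (each name preceded by its process ID):
NameNode
SecondaryNameNode
ResourceManager
Jps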
Job/ResourceManager monitoring page: http://hadoop0:8088/ (from outside the VMs, use hadoop0's IP address instead of the hostname; same below)
NameNode monitoring page: http://hadoop0:50070/
Stop:
stop-all.sh