Building a Highly Available Hadoop Cluster with ZooKeeper


1. High-Availability Clusters

To eliminate the single point of failure, Hadoop has evolved to allow a single HDFS cluster to contain multiple NameNode nodes. One NameNode is in the Active state, and the others are in the Standby state. The Active NameNode manages HDFS metadata and interacts with clients; the Standby NameNodes only synchronize the metadata managed by the Active NameNode. When the Active NameNode fails, Hadoop uses ZooKeeper to promote one of the Standby NameNodes to Active, ensuring the HDFS cluster continues to run normally.
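Once such a cluster is running, the state of each NameNode can be checked from the command line. A minimal sketch, assuming the nn1/nn2 NameNode IDs configured later in this article:

# Query the state (active or standby) of one NameNode
hdfs haadmin -getServiceState nn1
# Recent Hadoop 3.x versions can also report every NameNode at once
hdfs haadmin -getAllServiceState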

1. HDFS High-Availability Cluster

1. JournalNode

When an HDFS HA cluster starts, it launches a JournalNode process on several servers. The number of servers must satisfy 2N+1, i.e. JournalNode processes run on at least 3 servers. Together, the JournalNode processes form a Quorum Journal Manager (QJM) that synchronizes metadata between the NameNodes: while the Active NameNode is working, it saves its metadata to the Quorum Journal Manager, and the Standby NameNodes synchronize that metadata from the Quorum Journal Manager.
The Quorum Journal Manager stores only the EditLog; the FsImage file is still kept on the NameNode's local disk. Every JournalNode holds an identical copy of the EditLog, and each time the NameNode writes the EditLog to its local disk, it also writes it to the Quorum Journal Manager in parallel.
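In practice, the JournalNode daemons are started on the chosen servers before the NameNodes come up. A short sketch, assuming the hadoop1, hadoop2, and hadoop3 hosts used in the configuration below:

# Run on hadoop1, hadoop2, and hadoop3
hdfs --daemon start journalnode
# Confirm the process is up
jps | grep JournalNode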

2. QuorumPeerMain

QuorumPeerMain is the ZooKeeper server process; it keeps the ZooKeeper ensemble running, which in turn supports HDFS HA: when HDFS HA starts, each NameNode creates a persistent znode in ZooKeeper that records the NameNode's state.
An HDFS HA cluster can have only one NameNode in the Active state at a time; this differs from HDFS Federation, which allows multiple Active NameNodes. In an HDFS HA cluster, the Active and Standby NameNodes all share a single NameSpace, whereas in HDFS Federation the Active NameNodes are independent of one another, each with its own NameSpace.
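The znodes created for HDFS HA can be inspected with the ZooKeeper command-line client. A sketch, assuming the nameservice ns1 configured below (the exact child znode names can vary between Hadoop versions):

zkCli.sh -server hadoop1:2181
# List the HA znodes for the nameservice
ls /hadoop-ha/ns1
# Typically prints: [ActiveBreadCrumb, ActiveStandbyElectorLock]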

3. ZKFailoverController

When an HDFS HA cluster starts, a ZKFailoverController (ZKFC) process runs on every server that hosts a NameNode. This process implements active/standby switchover for the NameNodes: when the Active NameNode goes down, a new NameNode is elected from the Standby NameNodes and switched to the Active state.
ZKFailoverController contains two components, HealthMonitor and ActiveStandbyElector. HealthMonitor monitors the health of the NameNode, while ActiveStandbyElector uses ZooKeeper to carry out the active/standby switchover.
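Automatic failover can be exercised by stopping the Active NameNode and watching a Standby take over. A sketch, assuming nn1 on hadoop1 is currently Active:

# On hadoop1: stop the Active NameNode
hdfs --daemon stop namenode
# On any node: within a few seconds nn2 should report active
hdfs haadmin -getServiceState nn2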


2. YARN High-Availability Cluster

To eliminate the single point of failure, Hadoop also allows a single YARN cluster to contain multiple ResourceManager nodes. One ResourceManager is in the Active state, and the others are in the Standby state. The Active ResourceManager monitors, schedules, and manages the cluster's resources; the Standby ResourceManagers do no work and simply wait for the message that promotes them to Active. When the Active ResourceManager fails, Hadoop uses ZooKeeper to elect a new ResourceManager from the Standby nodes and switches it to Active, ensuring the YARN cluster continues to run normally. This is the YARN high-availability cluster.
Each ResourceManager embeds a ZooKeeper-based ActiveStandbyElector component. When YARN starts, the ActiveStandbyElector components run an election to decide which ResourceManager becomes Active.
While the YARN cluster runs, the Active ResourceManager periodically saves its state to the RMStateStore. When the Active ResourceManager goes down, the newly elected ResourceManager reads that state back from the RMStateStore and restores, as far as possible, the work that was in progress before the crash.
RMStateStore has two persistence implementations: FileSystemRMStateStore, which persists state to a file system, and ZKRMStateStore, which persists state to ZooKeeper. By default, ZKRMStateStore allows only one Active ResourceManager to write state at a time, which acts as a fencing mechanism: it prevents network delays or similar problems from letting multiple ResourceManagers that each believe they are Active write state simultaneously and corrupt it. For this reason, ZKRMStateStore is the recommended state store for YARN HA clusters.
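These options live in yarn-site.xml (edited later in this article). As a reference, a minimal sketch of the YARN HA and ZKRMStateStore settings; the rm1/rm2 IDs, the yarncluster cluster id, and the host names here are assumptions modeled on the HDFS settings below, not the author's exact file:

<!-- Sketch only: enable ResourceManager HA; IDs and hosts are assumed -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarncluster</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop1</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop2</value>
</property>
<!-- Persist RM state to ZooKeeper so a new Active RM can recover it -->
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>

Once the cluster is up, yarn rmadmin -getServiceState rm1 reports whether rm1 is the Active ResourceManager.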

2. Building the High-Availability Cluster

Before building the HA cluster, start the ZooKeeper service first (see the ZooKeeper cluster setup guide); for example:
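# Run on each ZooKeeper node (assuming zkServer.sh is on the PATH)
zkServer.sh start
# One node should report "Mode: leader", the others "Mode: follower"
zkServer.sh status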

# Create a directory for the HA cluster
mkdir hadoop-HA
# Unpack the Hadoop tarball into it
tar -zxvf hadoop-3.3.0.tar.gz -C /export/servers/hadoop-HA/
# Add Hadoop to the environment variables
vi /etc/profile
export HADOOP_HOME=/export/servers/hadoop-HA/hadoop-3.3.0
# bin holds the hdfs/yarn commands; sbin holds the start/stop scripts
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
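To confirm that the environment variables took effect:

# Should print the version, e.g. "Hadoop 3.3.0"
hadoop version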

In the Hadoop installation directory, modify the hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers files under etc/hadoop as follows:


vi hadoop-env.sh
# Append at the end of the file
export JAVA_HOME=/export/servers/jdk1.8.0_241
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Default file system: the logical name of the HA nameservice -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
</property>
<!-- Base directory for Hadoop's temporary and working files -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/export/data/hadoop-HA/hadoop/</value>
</property>
<!-- ZooKeeper ensemble used for automatic NameNode failover -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>
<!-- Static user for the HDFS web UI -->
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
</property>
<!-- Allow the root user to impersonate users from any host and any group -->
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
</configuration>
vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Number of block replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Local directories for NameNode metadata and DataNode blocks -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/export/data/hadoop/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/export/data/hadoop/datanode</value>
</property>
<!-- Logical name of the nameservice; must match fs.defaultFS -->
<property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
</property>
<!-- IDs of the NameNodes in nameservice ns1 -->
<property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
</property>
<!-- RPC and HTTP addresses of each NameNode -->
<property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>hadoop1:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>hadoop1:9870</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>hadoop2:9000</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>hadoop2:9870</value>
</property>
<!-- JournalNodes (Quorum Journal Manager) that store the shared edit log -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/ns1</value>
</property>
<!-- Local directory where each JournalNode keeps its edits -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/export/data/journaldata</value>
</property>
<!-- Let ZKFC fail over automatically when the Active NameNode dies -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<!-- Proxy provider that clients use to locate the Active NameNode;
     keep the long <value> on a single line in the actual file -->
<property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Disable HDFS permission checking (note: the property name is dfs.permissions.enabled) -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<!-- Fencing: try sshfence first, then fall back to the always-succeeding shell(/bin/true) -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>
        sshfence
        shell(/bin/true)
    </value>
</property>
<!-- SSH key and timeout used by sshfence -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
</property>

</configuration>
vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- JobHistory server RPC and web UI addresses -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
</property>
<!-- Tell the MR ApplicationMaster and tasks where Hadoop is installed -->
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
vi yarn-site.xml
<?xml version