Setting up a hadoop-2.6.0 + hbase-0.98.8 + zookeeper-3.4.6 + hive-0.14.0 cluster, step by step

Article link: https://blog.csdn.net/lingco/article/details/42171443

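All the versions used are stable Apache releases. In testing they worked reasonably well and did not cause many problems.

The initial idea is to use hbase for inserting data, and then something like hive for querying and presenting it.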

I. Environment

I have four test machines, all running CentOS x86_64.

The basic layout is as follows:

s3      resourcemanager, namenode, hmaster, hregionserver
s4      secondarynamenode, hregionserver
s5      datanode01, hregionserver, nodemanager
s6      datanode02, hregionserver, nodemanager

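First set up the server aliases: add the IP address and alias of all four machines to /etc/hosts.

Then set up the paths: add JAVA_HOME, PATH and so on to /etc/profile.

After adding them, run . /etc/profile to make them take effect. Note the leading dot.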
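As a rough sketch of those two files, the snippet below prints the lines one might append to /etc/hosts and /etc/profile. The IP addresses and the JDK path are placeholders (assumptions, not values from this setup), and the function names hosts_block and profile_block are made up for illustration. The three install directories are the ones used in this post.

```shell
# Lines for /etc/hosts. The IP addresses are placeholders (assumptions);
# replace them with the real addresses of s3..s6.
hosts_block() {
  cat <<'EOF'
192.168.1.3 s3
192.168.1.4 s4
192.168.1.5 s5
192.168.1.6 s6
EOF
}

# Lines for /etc/profile. The JDK path is hypothetical; the other
# directories are the install locations used in this post.
profile_block() {
  cat <<'EOF'
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export ZOOKEEPER_HOME=/home/hadoop/zookeeper
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin
EOF
}

# Print both blocks for review before appending them anywhere.
hosts_block
profile_block
```

Once the output looks right, root could append it with hosts_block >> /etc/hosts and profile_block >> /etc/profile, then run . /etc/profile.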

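Then create a user named hadoop, to avoid working as root. The user is needed on all four machines.

II. Passwordless access between the four machines

As the hadoop user, generate the machine's public key and send it to the other three servers, which add it to their own list of authorized keys. That gives passwordless access. The concrete steps follow. Unless stated otherwise, everything is done on s3.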

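1. Edit the ssh configuration file /etc/ssh/sshd_config and uncomment the following three lines:
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

This enables public-key authentication, with the accepted public keys stored in .ssh/authorized_keys under each user's home directory. Then restart the sshd service with service sshd restart. This has to be run as root.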

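2. su hadoop, then create a .ssh folder in the hadoop user's home directory and chmod 700 it.

3. Create an authorized_keys file inside the .ssh folder and chmod 600 it. Both chmod steps are mandatory; without them key login will not work.

4. Generate the key pair with ssh-keygen -t rsa. This produces a file named id_rsa.pub, which is the public key. Append it to your own authorized_keys file with cat id_rsa.pub >> authorized_keys. After that, ssh s3 logs in to s3 without a password. This is only a sanity check, and admittedly a slightly silly one.

5. Copy your public key to the other three machines, for example scp id_rsa.pub s4:/home/hadoop/s3_rsa.pub.

6. On each of the other machines, generate its own key pair in the same way and append the other machines' public keys to its authorized_keys file. That way every machine can reach every other one.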
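Steps 2 to 6 can be sketched as two small helper functions. The names prepare_ssh_dir and add_public_key are made up for illustration, and the duplicate check in add_public_key is an addition that the steps above do not include.

```shell
# prepare_ssh_dir [HOME_DIR]: create .ssh (mode 700) and authorized_keys (mode 600).
prepare_ssh_dir() {
  home_dir="${1:-$HOME}"
  mkdir -p "$home_dir/.ssh"
  chmod 700 "$home_dir/.ssh"
  touch "$home_dir/.ssh/authorized_keys"
  chmod 600 "$home_dir/.ssh/authorized_keys"
}

# add_public_key PUB_FILE [HOME_DIR]: append a public key to authorized_keys
# unless exactly that line is already there.
add_public_key() {
  pub_file="$1"
  auth_file="${2:-$HOME}/.ssh/authorized_keys"
  if ! grep -qxFf "$pub_file" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
}

# Typical use on s3 as the hadoop user (shown as comments, not executed here):
#   prepare_ssh_dir
#   ssh-keygen -t rsa
#   add_public_key ~/.ssh/id_rsa.pub
#   scp ~/.ssh/id_rsa.pub s4:/home/hadoop/s3_rsa.pub
#   # then, logged in on s4: add_public_key /home/hadoop/s3_rsa.pub
```

Because add_public_key skips keys that are already present, it can be rerun on every machine without piling up duplicate lines.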

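III. Installation

zookeeper, hadoop and hbase need to be installed identically on all four machines. My install directories are listed below. The JDK has to be installed too, of course, but I won't cover that here.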

/home/hadoop/hadoop

/home/hadoop/hbase

/home/hadoop/zookeeper

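Installing just means unpacking the downloaded tar.gz packages and renaming the directories, which is simple.

The main work is configuration. hadoop's configuration files are in the etc/hadoop folder under its install directory, in my case /home/hadoop/hadoop/etc/hadoop/

The configuration for hbase and zookeeper is in their conf folders.

The configuration itself is covered at length elsewhere, so I won't go into it.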

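One thing to note: hbase-env.sh contains the setting export HBASE_MANAGES_ZK=false, which means an external zookeeper is used. If it is true, hbase manages its own built-in zookeeper.

A quick word on ports, since a lot of them are involved. If you are not sure which ports can be reached over http, you can do the following.

First use jps to list the processes (the original post showed a screenshot here).

Then use netstat -anp | grep <pid> to see which ports each process is using. The ones you can usually look at are these: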

http://localhost:50070/ (NameNode web UI for HDFS, not MapReduce)
http://localhost:8088/ (the "All Applications" page of the ResourceManager)
http://localhost:9000/ (usually the NameNode RPC port set in fs.defaultFS, so not really a web page)
http://localhost:60010/ (HBase master web UI)
http://localhost:50090/ (secondary namenode)
http://localhost:50075/ (DataNode web UI)
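The jps and netstat lookup described above can be sketched as two small filters. The function names pid_of and ports_of are made up for illustration, and the netstat column positions assume the usual Linux net-tools layout for TCP lines.

```shell
# pid_of NAME: read `jps` output on stdin and print the PID whose
# process name is exactly NAME.
pid_of() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# ports_of PID: read `netstat -anp` output on stdin and print the local
# address of every listening TCP socket owned by PID.
# Assumed columns: proto recv-q send-q local foreign state pid/program
ports_of() {
  awk -v owner="$1" '$6 == "LISTEN" && index($7, owner "/") == 1 { print $4 }'
}

# Demonstration on canned jps output:
printf '1234 NameNode\n5678 HMaster\n' | pid_of HMaster   # prints 5678

# On a real node (as root, so netstat can show process ids):
#   pid="$(jps | pid_of NameNode)"
#   netstat -anp | ports_of "$pid"
```

Matching on the exact process name and on the "pid/" prefix avoids the false hits a bare grep for the number would give, for instance when the pid also appears inside a port number.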

Also remember to add all the bin directories to the path, so the commands can be run directly by name.

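IV. Startup and verification

Remember to turn off the firewall first: service iptables stop.

The startup order is as follows. First zookeeper, with zkServer.sh start. This has to be done on every machine.

Then start hadoop with start-dfs.sh and start-yarn.sh, on the first machine (s3) only.

Then start hbase with start-hbase.sh, again on the first machine only.

Then run jps on each machine to check that everything has started.

Finally, the hbase hbck command can be used to check the state of hbase.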
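The startup order above can be sketched as one function meant to be run on s3 as the hadoop user. Looping over the machines with ssh is an assumption (the steps above log in to each machine by hand), the helper run and the DRY_RUN switch are made up for illustration, and the full paths are the install directories used in this post. A non-interactive ssh session may not load /etc/profile, so JAVA_HOME may need to be set in each tool's own env script for this to work.

```shell
# The four machines from the layout in this post; s3 is the first machine.
HOSTS="s3 s4 s5 s6"

# run CMD...: execute the command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_cluster() {
  # 1. zookeeper on every machine.
  for h in $HOSTS; do
    run ssh "$h" /home/hadoop/zookeeper/bin/zkServer.sh start
  done
  # 2. HDFS and YARN, from the first machine only.
  run /home/hadoop/hadoop/sbin/start-dfs.sh
  run /home/hadoop/hadoop/sbin/start-yarn.sh
  # 3. hbase, from the first machine only.
  run /home/hadoop/hbase/bin/start-hbase.sh
  # 4. List the Java processes on every machine.
  for h in $HOSTS; do
    run ssh "$h" jps
  done
}
```

Setting DRY_RUN=1 before calling start_cluster only prints the commands in order, which is a cheap way to check the sequence before touching the cluster.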

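V. Connecting to hbase from java

I wrote a simple test class. It can be packaged into a jar and run on the server.

The code is below. It uploads a file into an hbase table named sm_attachments. I may upload the jar itself later.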

package com.z4;

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Uploads one file into the hbase table sm_attachments.
 * Written 2014-12-23.
 *
 * @author 爱育黎拨力八达
 * @version v1.0.0
 */
public class TestClient {

	public static void main(String[] args) throws IOException {
		if (args.length < 3) {
			System.out.println("usage: java -jar t.jar [host] [filename] [indexState(0 or 1)]");
			return;
		}

		String host = args[0];
		String file = args[1];
		String indexState = args[2];

		// Point the client at zookeeper before any connection is opened.
		Configuration conf = HBaseConfiguration.create();
		conf.set("hbase.zookeeper.quorum", host);
		conf.set("hbase.zookeeper.property.clientPort", "2181");

		HTable table = null;
		try {
			table = new HTable(conf, "sm_attachments");
			Put put = new Put(generateRowKey("test_"));

			System.out.println("inserting data ......");

			// Column family "info": a fixed name and the index state flag.
			put.add(Bytes.toBytes("info"), Bytes.toBytes("name"),
					Bytes.toBytes("000"));
			put.add(Bytes.toBytes("info"), Bytes.toBytes("indexState"),
					Bytes.toBytes(indexState));

			// Column family "content": the raw bytes of the file.
			put.add(Bytes.toBytes("content"), Bytes.toBytes("attachments"),
					readFileByBytes(file));
			table.put(put);
			System.out.println("data load finished.");
		} catch (Exception e) {
			System.out.println("error");
			System.out.println(e.getMessage());
		} finally {
			if (table != null) {
				table.close();
			}
		}
	}

	/**
	 * Builds a row key from a prefix. The original post did not include this
	 * method, so this body is an assumption: it follows the commented-out line
	 * in the original, which used the prefix plus the current time in
	 * milliseconds.
	 */
	public static byte[] generateRowKey(String prefix) {
		return Bytes.toBytes(prefix + System.currentTimeMillis());
	}

	/** Reads the whole file into a byte array, 20 KB at a time. */
	public static byte[] readFileByBytes(String fileName) throws IOException {
		InputStream in = new FileInputStream(fileName);
		try {
			ByteArrayOutputStream byout = new ByteArrayOutputStream();
			byte[] buf = new byte[20 * 1024];
			int readLen;
			while ((readLen = in.read(buf, 0, buf.length)) > 0) {
				byout.write(buf, 0, readLen);
			}
			return byout.toByteArray();
		} finally {
			// Close the stream even if reading fails.
			in.close();
		}
	}

}

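VI. Integrating hive with hbase

The point of using hive is to query data on hadoop with SQL-like syntax. So our design is to have data flow continuously into hbase and then query it out through hive.

This version of hive can query hbase tables without depending on any extra packages.

The method is as follows. First create a table in hive and map it to hbase: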

CREATE TABLE book(id int,name string,publisher string, cost float)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES
("hbase.columns.mapping"=":key,info:name,info:publisher,info:cost") 
TBLPROPERTIES ("hbase.table.name"="book");

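This creates a table named book and maps it to hbase.

Running list in the hbase shell then shows the table, and put 'book','1','info:name', 'gone with wind' inserts a row.

The row can then be queried from hive, for example with select * from book; (the original post showed the result in a screenshot).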