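1.1 HBase Definition
HBase is a distributed, scalable (nodes can be brought online or taken offline dynamically) NoSQL (key-value) database that supports massive data volumes.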
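1.2 Data Model
Logically, the data model resembles that of a relational database: data lives in a table. Physically, the underlying storage is key-value pairs.
Differences from MySQL: 1. Columns are grouped into column families, and one row contains several column families (a vertical split of wide tables). 2. Rows are split into Regions (a horizontal split of tall tables).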
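Logical structure: a table of rows keyed by RowKey, with columns grouped into column families.
Physical storage: each cell is stored as a record of the form
row key, column family, column qualifier, timestamp, type, value
When several Puts exist for the same cell, a read returns the one with the largest timestamp. A delete is written as a record whose type is Delete; at read time its timestamp is compared with those of the Puts to decide whether the data counts as deleted.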
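The read rule above can be sketched in Python. This is a simplified model, not HBase's implementation: the `KeyValue` tuple and `read_latest` function are hypothetical names, and the sketch assumes a delete marker hides every version whose timestamp is less than or equal to its own.

```python
from typing import List, NamedTuple, Optional


class KeyValue(NamedTuple):
    """One physical record: row key, column family, qualifier, timestamp, type, value."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    type: str  # "Put" or "Delete" (simplified; assumption for this sketch)
    value: bytes


def read_latest(store: List[KeyValue], row: bytes, family: bytes,
                qualifier: bytes) -> Optional[bytes]:
    """Return the visible value of one column, or None if it is absent or deleted."""
    cells = [kv for kv in store
             if (kv.row, kv.family, kv.qualifier) == (row, family, qualifier)]
    # Assumption: a delete marker masks all versions at or below its timestamp.
    deleted_up_to = max((kv.timestamp for kv in cells if kv.type == "Delete"),
                        default=-1)
    live = [kv for kv in cells
            if kv.type == "Put" and kv.timestamp > deleted_up_to]
    if not live:
        return None
    # Among the surviving Puts, the largest timestamp wins.
    return max(live, key=lambda kv: kv.timestamp).value
```

Nothing is overwritten in place in this model: writes and deletes are both appended as new records, and the visible value is resolved only when reading.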
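1.2.2 Data Model
1) Namespace: similar to a database in a relational system.
2) Region: a slice of a table. A table itself is similar to a MySQL table, except that an HBase table definition only declares column families, not individual columns; columns are added dynamically.
3) Row: each row consists of one RowKey and multiple Columns. Rows are stored in lexicographic (byte) order of RowKey, and data can only be looked up by RowKey.
4) Column: a column is identified by a column family plus a column qualifier.
5) Timestamp: identifies the different versions of a piece of data.
6) Cell: uniquely identified, within a table, by RowKey, column family, column qualifier, and timestamp. Cell data has no type; it is stored as a byte array.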
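Lexicographic RowKey ordering has a practical consequence that a small Python sketch makes visible: numeric suffixes do not sort numerically unless they are padded. Python compares `bytes` values byte by byte, which is used here as a stand-in for HBase's RowKey ordering; the `zero_pad` helper and the `row` prefix are made up for the illustration.

```python
from typing import Iterable, List


def sorted_row_keys(keys: Iterable[bytes]) -> List[bytes]:
    """Sort row keys byte by byte, the way a lexicographic store orders them."""
    return sorted(keys)


def zero_pad(n: int, width: int = 4) -> bytes:
    """Hypothetical helper: build a row key whose byte order matches numeric order."""
    return f"row{n:0{width}d}".encode()


# Unpadded keys: b"row10" sorts before b"row2" because b"1" < b"2".
unpadded = sorted_row_keys([b"row2", b"row10", b"row1"])

# Padded keys keep numeric order: row0001, row0002, row0010.
padded = sorted_row_keys(zero_pad(n) for n in (10, 2, 1))
```

Because lookups go through the RowKey, choosing a key layout that sorts the way you intend to scan is part of table design.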
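1.3 HBase Basic Architecture
Regions are hosted on RegionServers. A cluster runs multiple RegionServers, so storage is distributed.
Master: handles table-level operations (create, delete, alter, and list tables), assigns Regions to each RegionServer, and monitors the state of every RegionServer (RS).
A backup Master provides high availability.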
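The relationship between RowKeys, Regions, and RegionServers can be illustrated with a toy Python model. It is only a sketch under stated assumptions: each Region is represented by its start key, assignment is plain round-robin (the real Master uses a load balancer), and the function names are hypothetical. The host names are the ones from the regionservers file in these notes.

```python
import bisect
from itertools import cycle
from typing import Dict, List


def assign_round_robin(region_start_keys: List[bytes],
                       servers: List[str]) -> Dict[bytes, str]:
    """Toy stand-in for the Master: hand Regions to RegionServers in turn."""
    return dict(zip(region_start_keys, cycle(servers)))


def locate_region(region_start_keys: List[bytes], row_key: bytes) -> bytes:
    """Return the start key of the Region whose key range contains row_key.

    region_start_keys must be sorted, and its first entry must be b"" so
    that the first Region covers everything from the beginning of the table.
    """
    idx = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_start_keys[idx]


starts = [b"", b"g", b"n"]  # three Regions: [b"", g), [g, n), [n, end)
assignment = assign_round_robin(starts, ["master", "slave1", "slave2"])
server_for_apple = assignment[locate_region(starts, b"apple")]
```

Since rows are sorted by RowKey, each Region covers one contiguous key range, which is why a binary search over the start keys is enough to find the owner of a row.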
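1.4 HBase Setup and Configuration
Download the HBase release that matches your Hadoop version.
Edit the configuration files.
hbase-site.xml
Note: the host and port in hbase.rootdir must match the fs.defaultFS setting of HDFS (normally found in Hadoop's core-site.xml).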
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on. </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/apache-zookeeper-3.5.9-bin/zkData</value>
    <description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored. </description>
  </property>
</configuration>
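The requirement that hbase.rootdir point at the same NameNode address as HDFS can be checked mechanically. The Python sketch below uses only the standard library; the function names are hypothetical, and the core-site.xml content is an assumed example, since the notes do not show that file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_property(xml_text: str, name: str) -> str:
    """Return the <value> of the Hadoop-style <property> with the given <name>."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value") or ""
    raise KeyError(name)


def same_namenode(hbase_site_xml: str, core_site_xml: str) -> bool:
    """True if hbase.rootdir and fs.defaultFS share scheme, host, and port."""
    rootdir = urlparse(read_property(hbase_site_xml, "hbase.rootdir"))
    default_fs = urlparse(read_property(core_site_xml, "fs.defaultFS"))
    return ((rootdir.scheme, rootdir.hostname, rootdir.port)
            == (default_fs.scheme, default_fs.hostname, default_fs.port))


# hbase.rootdir as configured in the hbase-site.xml of these notes.
HBASE_SITE = """<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://master:8020/hbase</value></property>
</configuration>"""

# Assumed example of the matching Hadoop setting (not taken from the notes).
CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:8020</value></property>
</configuration>"""

consistent = same_namenode(HBASE_SITE, CORE_SITE)
```

To check a real cluster, read the two files from disk and pass their contents to `same_namenode`.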
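hbase-env.sh (note: the listing below is in Windows batch syntax, i.e. the content of hbase-env.cmd; on Linux, hbase-env.sh sets the same variables with export)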
@rem/**
@rem * Licensed to the Apache Software Foundation (ASF) under one
@rem * or more contributor license agreements. See the NOTICE file
@rem * distributed with this work for additional information
@rem * regarding copyright ownership. The ASF licenses this file
@rem * to you under the Apache License, Version 2.0 (the
@rem * "License"); you may not use this file except in compliance
@rem * with the License. You may obtain a copy of the License at
@rem *
@rem * http://www.apache.org/licenses/LICENSE-2.0
@rem *
@rem * Unless required by applicable law or agreed to in writing, software
@rem * distributed under the License is distributed on an "AS IS" BASIS,
@rem * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem * See the License for the specific language governing permissions and
@rem * limitations under the License.
@rem */
@rem Set environment variables here.
@rem The java implementation to use. Java 1.7+ required.
@rem set JAVA_HOME=c:\apps\java
@rem Extra Java CLASSPATH elements. Optional.
@rem set HBASE_CLASSPATH=
@rem The maximum amount of heap to use. Default is left to JVM default.
@rem set HBASE_HEAPSIZE=1000
@rem Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
@rem offheap, set the value to "8G".
@rem set HBASE_OFFHEAPSIZE=1000
@rem For example, to allocate 8G of offheap, to 8G:
@rem set HBASE_OFFHEAPSIZE=8G
@rem Extra Java runtime options.
@rem Below are what we set by default. May only work with SUN JVM.
@rem For more on why as well as other possible settings,
@rem see http://wiki.apache.org/hadoop/PerformanceTuning
@rem JDK6 on Windows has a known bug for IPv6, use preferIPv4Stack unless JDK7.
@rem See TestIPv6NIOServerSocketChannel.
set HBASE_OPTS="-XX:+UseConcMarkSweepGC" "-Djava.net.preferIPv4Stack=true"
@rem Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
set HBASE_MASTER_OPTS=%HBASE_MASTER_OPTS% "-XX:PermSize=128m" "-XX:MaxPermSize=128m" "-XX:ReservedCodeCacheSize=256m"
set HBASE_REGIONSERVER_OPTS=%HBASE_REGIONSERVER_OPTS% "-XX:PermSize=128m" "-XX:MaxPermSize=128m" "-XX:ReservedCodeCacheSize=256m"
@rem Uncomment below to enable java garbage collection logging for the server-side processes
@rem this enables basic gc logging for the server processes to the .out file
@rem set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" %HBASE_GC_OPTS%
@rem this enables gc logging using automatic GC log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+. Either use this set of options or the one above
@rem set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" "-XX:+UseGCLogFileRotation" "-XX:NumberOfGCLogFiles=1" "-XX:GCLogFileSize=512M" %HBASE_GC_OPTS%
@rem Uncomment below to enable java garbage collection logging for the client processes in the .out file.
@rem set CLIENT_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" %HBASE_GC_OPTS%
@rem Uncomment below (along with above GC logging) to put GC information in its own logfile (will set HBASE_GC_OPTS)
@rem set HBASE_USE_GC_LOGFILE=true
@rem Uncomment and adjust to enable JMX exporting
@rem See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
@rem More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
@rem
@rem set HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false"
@rem set HBASE_MASTER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10101"
@rem set HBASE_REGIONSERVER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10102"
@rem set HBASE_THRIFT_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10103"
@rem set HBASE_ZOOKEEPER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10104"
@rem File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
@rem set HBASE_REGIONSERVERS=%HBASE_HOME%\conf\regionservers
@rem Where log files are stored. $HBASE_HOME/logs by default.
@rem set HBASE_LOG_DIR=%HBASE_HOME%\logs
@rem A string representing this instance of hbase. $USER by default.
@rem set HBASE_IDENT_STRING=%USERNAME%
@rem Seconds to sleep between slave commands. Unset by default. This
@rem can be useful in large clusters, where, e.g., slave rsyncs can
@rem otherwise arrive faster than the master can service them.
@rem set HBASE_SLAVE_SLEEP=0.1
@rem Tell HBase whether it should manage its own instance of Zookeeper or not.
@rem set HBASE_MANAGES_ZK=true
regionservers
jamjar@master:/opt/hbase-1.3.2/conf$ cat regionservers
master
slave1
slave2
Use ln -s to symlink Hadoop's core-site.xml and hdfs-site.xml into the HBase conf directory.
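The same linking step can be scripted in Python. This is a minimal sketch: `link_hadoop_conf` is a hypothetical helper, and the directory paths passed to it are whatever your installation uses, since the notes do not give the Hadoop conf path.

```python
import os
from typing import List, Sequence


def link_hadoop_conf(hadoop_conf_dir: str, hbase_conf_dir: str,
                     names: Sequence[str] = ("core-site.xml", "hdfs-site.xml")
                     ) -> List[str]:
    """Symlink Hadoop config files into the HBase conf directory.

    Existing files or links in hbase_conf_dir are left untouched.
    Returns the paths of the links that were created.
    """
    created = []
    for name in names:
        target = os.path.join(hadoop_conf_dir, name)
        link = os.path.join(hbase_conf_dir, name)
        if os.path.lexists(link):
            continue  # do not overwrite anything that is already there
        os.symlink(target, link)  # equivalent to: ln -s target link
        created.append(link)
    return created
```

Linking rather than copying means HBase keeps seeing the current HDFS settings whenever the Hadoop files are edited.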
1.5 HBase Startup
Common commands
Start the Master:
# sh hbase-1.4.13/bin/hbase-daemon.sh start master
Stop the Master:
# sh hbase-1.4.13/bin/hbase-daemon.sh stop master
Start a RegionServer:
# sh hbase-1.4.13/bin/hbase-daemon.sh start regionserver
Stop a RegionServer:
# sh hbase-1.4.13/bin/hbase-daemon.sh stop regionserver
Start the whole cluster:
# sh hbase-1.4.13/bin/start-hbase.sh
Stop the whole cluster:
# sh hbase-1.4.13/bin/stop-hbase.sh
Acknowledgment: original article: https://blog.csdn.net/lyq19870515/article/details/103398180