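While migrating HBase data recently, I found that HBase snapshots are one of the more commonly used approaches. My understanding was that a snapshot is metadata only and contains no data, similar to an HDFS snapshot.
Migrating data at the HDFS level requires distcp. So why does a snapshot-based HBase migration seem not to need a separate data copy, when a single command is enough to migrate the data through a snapshot?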
sudo -u hdfs hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot s1 -copy-from hdfs://ip:port/hbase -copy-to hdfs://ip:port/hbase -mappers 16 -chuser hbase -chgroup hbase
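The definition of an HBase snapshot, as I understood it:
Taking an HBase table snapshot is nearly instantaneous and has almost no performance impact on the cluster.
An HBase snapshot is a collection of metadata that lets you restore a table to the state it was in when the snapshot was taken.
A snapshot is not a copy of the table. It records the metadata (table info and regions) and references to the data (HFiles, memstore, WALs); no data is copied while the snapshot is being taken.
If the snapshot itself holds no data, how does the migration work? The answer lies in ExportSnapshot: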
You can export any snapshot from one cluster to another. Exporting the
snapshot copies the table’s hfiles, logs, and the snapshot metadata,
from the source cluster to the destination cluster. Specify the
-copy-from option to copy from a remote cluster to the local cluster or another remote cluster. If you do not specify the -copy-from
option, the hbase.rootdir in the HBase configuration is used, which
means that you are exporting from the current cluster. You must
specify the -copy-to option, to specify the destination cluster.
The ExportSnapshot tool executes a MapReduce Job similar to distcp to
copy files to the other cluster. It works at file-system level, so the
HBase cluster can be offline.
Run ExportSnapshot as the hbase user or the user that owns the files.
If the user, group, or permissions need to be different on the
destination cluster than the source cluster, use the -chuser,
-chgroup, or -chmod options as in the second example below, or be sure the destination directory has the correct permissions. In the
following examples, replace the HDFS server path and port with the
appropriate ones for your cluster.
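To make the flow described above concrete, here is a minimal sketch of a snapshot-based migration as a shell script. It is an illustration under assumptions, not a verified procedure: the table name `t1`, the host names `src-nn` and `dst-nn`, port 8020, and the helper functions `run`, `hbase_shell`, and `export_snapshot` are all hypothetical. By default the script only prints the commands (dry run), so it can be read and tried without a cluster.

```shell
#!/bin/sh
# Sketch: migrate a table via snapshot + ExportSnapshot.
# DRY_RUN=1 (default) only prints commands; DRY_RUN=0 executes them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print a command line; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Send one statement to the HBase shell in non-interactive mode.
# Which cluster it talks to depends on the local HBase configuration.
hbase_shell() {
  echo "+ hbase shell: $1"
  if [ "$DRY_RUN" = "0" ]; then
    echo "$1" | hbase shell -n
  fi
}

# Copy the snapshot metadata AND the HFiles it references.
# Args: snapshot name, source root dir, destination root dir, mapper count.
export_snapshot() {
  snapshot="$1"
  src="$2"
  dst="$3"
  mappers="${4:-16}"
  run hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$snapshot" \
    -copy-from "$src" \
    -copy-to "$dst" \
    -mappers "$mappers"
}

# Step 1 (source cluster): take the snapshot; this records metadata only.
hbase_shell "snapshot 't1', 's1'"

# Step 2: export; this is the distcp-like MapReduce job that moves the data.
# Host names and port are placeholders (assumptions).
export_snapshot s1 hdfs://src-nn:8020/hbase hdfs://dst-nn:8020/hbase 16

# Step 3 (destination cluster): materialize a table from the exported snapshot.
hbase_shell "clone_snapshot 's1', 't1'"
```

The split between step 1 and step 2 is the point of the note: creating the snapshot copies nothing, while the export is where the files actually move. If ownership must differ on the destination, the -chuser and -chgroup options from the quoted documentation can be appended to the ExportSnapshot call.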
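At the time I had confused Snapshot with ExportSnapshot: I assumed ExportSnapshot merely exported the snapshot itself, when in fact it also copies the table's HFiles and logs to the destination cluster.
Noting this down for future reference.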