HBase - KYlin build cube时出现问题的汇总

本文总结了在使用HBase和Kylin构建数据立方体时遇到的问题及其解决方案,包括HBase scan超时、Build Dimension Dictionary时的cardinality Too high错误、Dup key found问题、USER_ID全局dict构建失败以及Reduce阶段的OOM问题。针对这些问题,提供了调整配置、修改模型结构和优化内存分配等策略。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

1.在单条非空数据时出现null值而导致报错

Vertex failed, vertexName=Map 1, vertexId=vertex_1494251465823_0017_1_01, diagnostics=[Task failed, taskId=task_1494251465823_0017_1_01_000021, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {
  
  "weibo_id":"3959784874733088","content":null,"json_file":null,"geohash":null,"user_id":"3190257607","time_id":null,"city_id":null,"province_id":null,"country_id":null,"unix_time":null,"pic_url":null,"lat":null,"lon":null}
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {
  
  "weibo_id":"3959784874733088","content":null,"json_file":null,"geohash":null,"user_id":"3190257607","time_id":null,"city_id":null,"province_id":null,"country_id":null,"unix_time":null,"pic_url":null,"lat":null,"lon":null}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {
  
  "weibo_id":"3959784874733088","content":null,"json_file":null,"geohash":null,"user_id":"3190257607","time_id":null,"city_id":null,"province_id":null,"country_id":null,"unix_time":null,"pic_url":null,"lat":null,"lon":null}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 17 more

实际检索Hive时发现该条微博是非空的

select * from check_in_table where weibo_id = '3959784874733088';

result:
3959784874733088    #清明祭英烈#今天的和平安定是先烈们用生命换来的,我们要珍惜今天的和平生活,努力学习,早日实现中国梦。 http://t.cn/R2dLEhU {...}   wtn901f5q32n    3190257607  1459526400000   1901    19    00    2016-04-02 12:02:26 0   28.31096    121.64364

修改星型模型得以解决

2.在Kylin Build时失败,显示了一个Hbase scan超时的错误

报错如下:

Vertex re-running, vertexName=Map 2, vertexId=vertex_1494251465823_0016_1_00
Vertex failed, vertexName=Map 1, vertexId=vertex_1494251465823_0016_1_01, diagnostics=[Task failed, taskId=task_1494251465823_0016_1_01_000015, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hbase.client.ScannerTimeoutException: 425752ms passed since the last invocation, timeout is currently set to 60000
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值