CDH 6.3.2 is integrated with Flink 1.18, and the Hudi 1.0 data lake is added on top. Executing SQL fails with the error:
[ERROR] Could not execute SQL statement. Reason:
java.lang.ClassNotFoundException: org.apache.hudi.sink.StreamWriteOperatorCoordinator$Provider
1. Upload the Hudi jar
Upload the Hudi jar to Flink's lib directory. Since CDH 6.3.2 is integrated with Flink 1.18, the matching Hudi 1.0 bundle is required: hudi-flink1.18-bundle-1.0.0-beta1.jar
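The upload step above can be sketched as a short shell snippet. The FLINK_HOME default and the jar being in the current directory are assumptions; adjust them to your CDH layout:

```shell
# Assumed locations; adjust to your CDH layout.
FLINK_HOME="${FLINK_HOME:-/opt/cloudera/FLINK/lib/flink}"
HUDI_JAR="hudi-flink1.18-bundle-1.0.0-beta1.jar"

# Copy the bundle into Flink's lib directory so new sessions can load it.
if [ -f "$HUDI_JAR" ] && [ -d "$FLINK_HOME/lib" ]; then
    cp "$HUDI_JAR" "$FLINK_HOME/lib/"
    echo "copied $HUDI_JAR to $FLINK_HOME/lib"
else
    echo "skipped: jar or lib directory not found here"
fi
```

Note that jars in lib are only picked up when a session starts, so restart any running SQL client afterwards.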
2. Integrate Flink with Hudi
Go to the bin directory of the Flink installation under CDH, e.g. /opt/cloudera/FLINK/lib/flink/bin, run ./sql-client.sh, and execute the following script:
CREATE TABLE t1(
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://192.168.104.20:8020/hudi/t1',
  'table.type' = 'MERGE_ON_READ',
  'read.streaming.enabled' = 'true'
);
INSERT INTO t1
VALUES ('id1', 'Danny', 23, TIMESTAMP '1970-01-01 00:00:01', 'par1'),
('id2', 'Stephen', 33, TIMESTAMP '1970-01-01 00:00:02', 'par1'),
('id3', 'Julian', 53, TIMESTAMP '1970-01-01 00:00:03', 'par2'),
('id4', 'Fabian', 31, TIMESTAMP '1970-01-01 00:00:04', 'par2'),
('id5', 'Sophia', 18, TIMESTAMP '1970-01-01 00:00:05', 'par3'),
('id6', 'Emma', 20, TIMESTAMP '1970-01-01 00:00:06', 'par3'),
('id7', 'Bob', 44, TIMESTAMP '1970-01-01 00:00:07', 'par4'),
('id8', 'Han', 56, TIMESTAMP '1970-01-01 00:00:08', 'par4');
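Once the insert job finishes, the write can be verified from the same SQL client. Because the table sets 'read.streaming.enabled' = 'true', this SELECT starts a continuous streaming read that keeps tailing new commits rather than a one-shot batch scan:

```sql
-- Streaming read of t1; the job keeps running and emits new commits as they land.
SELECT * FROM t1;
```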
Running the script produces the following error:
[ERROR] Could not execute SQL statement. Reason:
java.lang.ClassNotFoundException: org.apache.hudi.sink.StreamWriteOperatorCoordinator$Provider
3. Solution
Start the SQL client with an extra -j argument that puts the Hudi bundle jar on the classpath, as shown below:
./sql-client.sh embedded -j ../lib/hudi-flink1.18-bundle-1.0.0-beta1.jar shell
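As an alternative to the -j startup flag, Flink 1.16 and later also allow registering a jar from inside a running SQL client session with ADD JAR. The relative path below mirrors the command above and may need adjusting to your installation:

```sql
-- Register the Hudi bundle for the current session only.
ADD JAR '../lib/hudi-flink1.18-bundle-1.0.0-beta1.jar';
-- Confirm the jar was picked up.
SHOW JARS;
```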
4. Related big-data learning demos:
https://github.com/carteryh/big-data