Author: 阳龙生
Flink 1.13 and earlier
Up to and including Flink 1.13, you could write to Kafka with this class:
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer
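This class extends TwoPhaseCommitSinkFunction, an abstract class that extends Flink's sink class (RichSinkFunction) and implements the checkpointing interfaces (CheckpointedFunction and CheckpointListener). This is the approach we commonly used before: two-phase commit, with the familiar methods for beginning a transaction, pre-committing, committing, and aborting.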
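public abstract class TwoPhaseCommitSinkFunction<IN, TXN, CONTEXT> extends RichSinkFunction<IN>
        implements CheckpointedFunction, CheckpointListener {
    // Opens a new transaction for the records that follow.
    protected abstract TXN beginTransaction() throws Exception;
    // Phase 1: called when a checkpoint is taken; flush the transaction but do not commit it yet.
    protected abstract void preCommit(TXN transaction) throws Exception;
    // Phase 2: called after the checkpoint has completed; makes the data visible.
    protected abstract void commit(TXN transaction);
    // Discards a transaction that must not be committed (e.g. after a failure).
    protected abstract void abort(TXN transaction);
    // ... (other members omitted)
}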
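To make this lifecycle concrete, here is a minimal, Flink-free simulation of how the four methods are driven. Records go into the open transaction, preCommit runs when a checkpoint is taken, commit runs once the checkpoint completes, and abort discards a transaction that never made it into a checkpoint. InMemoryTwoPhaseSink and Txn are hypothetical stand-ins. Its snapshotState, notifyCheckpointComplete and fail methods are only named after the corresponding Flink callbacks and use simplified signatures. This is a sketch of the pattern, not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Flink-free simulation of the two-phase commit lifecycle; all names are hypothetical. */
public class TwoPhaseCommitDemo {

    /** A transaction: buffered records that are not visible until committed. */
    static final class Txn {
        final long id;
        final List<String> buffer = new ArrayList<>();

        Txn(long id) { this.id = id; }
    }

    /** Mimics the beginTransaction / preCommit / commit / abort shape. */
    static final class InMemoryTwoPhaseSink {
        final List<String> committed = new ArrayList<>();   // what readers can see
        final List<Txn> pendingCommit = new ArrayList<>();  // pre-committed, awaiting checkpoint completion
        private long nextTxnId = 0;
        private Txn current = beginTransaction();

        Txn beginTransaction() { return new Txn(nextTxnId++); }

        void invoke(String value) { current.buffer.add(value); }

        void preCommit(Txn txn) { /* a real sink would flush here; data stays invisible */ }

        void commit(Txn txn) { committed.addAll(txn.buffer); }

        void abort(Txn txn) { txn.buffer.clear(); }

        /** Phase 1: on checkpoint, pre-commit the open transaction and start a new one. */
        void snapshotState() {
            preCommit(current);
            pendingCommit.add(current);
            current = beginTransaction();
        }

        /** Phase 2: once the checkpoint is complete, commit everything pre-committed. */
        void notifyCheckpointComplete() {
            for (Txn txn : pendingCommit) {
                commit(txn);
            }
            pendingCommit.clear();
        }

        /** On failure, the open (not yet pre-committed) transaction is aborted. */
        void fail() {
            abort(current);
            current = beginTransaction();
        }
    }

    public static void main(String[] args) {
        InMemoryTwoPhaseSink sink = new InMemoryTwoPhaseSink();
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState();                // "a", "b" pre-committed but still invisible
        System.out.println(sink.committed);  // []
        sink.invoke("c");
        sink.notifyCheckpointComplete();     // "a", "b" become visible
        sink.fail();                         // "c" was never checkpointed, so it is aborted
        System.out.println(sink.committed);  // [a, b]
    }
}
```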
Flink 1.14
In Flink 1.14, however, this class is deprecated:
/**
* @deprecated Please use {@link org.apache.flink.connector.kafka.sink.KafkaSink}.
*/
@Deprecated
@PublicEvolving
public class FlinkKafkaProducer<IN>
        extends TwoPhaseCommitSinkFunction<
                IN,
                FlinkKafkaProducer.KafkaTransactionState,
                FlinkKafkaProducer.KafkaTransactionContext> {
    // ...
}
The Javadoc points us to org.apache.flink.connector.kafka.sink.KafkaSink instead:
public class KafkaSink<IN> implements Sink<IN, KafkaCommittable, KafkaWriterState, Void> {
    // ...
}
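However, this class does not extend TwoPhaseCommitSinkFunction. It creates a KafkaWriter, but KafkaWriter does not extend TwoPhaseCommitSinkFunction either. So how does Flink 1.14 implement distributed transactions? Let's dig into the source code to find out. Understanding the official KafkaSink also helps when we write custom sinks for other databases in our own projects and need similar consistency semantics.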
API changes:
[Figure: old API]
[Figure: new API]

We can see that the new KafkaSink implements the interface Sink<InputT, CommT, WriterStateT, GlobalCommT>, and this interface is the key to achieving exactly-once semantics.

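Note the important state type KafkaWriterState, which fills the WriterStateT slot. The Sink interface also has two important methods: createCommitter and createGlobalCommitter. Because KafkaSink uses Void as its GlobalCommT, it has no global committable type, and the per-subtask Committer does the actual commit work.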
The Javadoc of these methods once again mentions the two-phase commit protocol (2-phase-commit):
/**
* Creates a {@link Committer} which is part of a 2-phase-commit protocol. The {@link
* SinkWriter} creates committables through {@link SinkWriter#prepareCommit(boolean)} in the
* first phase. The committables are then passed to this committer and persisted with {@link
* Committer#commit(List)}. If a committer is returned, the sink must also return a {@link
* #getCommittableSerializer()}.
*
* @return A committer for the 2-phase-commit protocol.
* @throws IOException for any failure during creation.
*/
Optional<Committer<CommT>> createCommitter() throws IOException;
/**
* Creates a {@link GlobalCommitter} which is part of a 2-phase-commit protocol. The {@link
* SinkWriter} creates committables through {@link SinkWriter#prepareCommit(boolean)} in the
* first phase. The committables are then passed to the Committer and persisted with {@link
* Committer#commit(List)}. The committables are also passed to this {@link GlobalCommitter} of
* which only a single instance exists. If a global committer is returned, the sink must also
* return a {@link #getCommittableSerializer()} and {@link #getGlobalCommittableSerializer()}.
*
* @return A global committer for the 2-phase-commit protocol.
* @throws IOException for any failure during creation.
*/
Optional<GlobalCommitter<CommT, GlobalCommT>> createGlobalCommitter() throws IOException;
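The division of labor described in this Javadoc can be sketched without any Flink dependencies. A writer stages records in a transaction and, when prepareCommit is called, hands back small committables. A separate committer object later makes those transactions visible using only the committables. MiniSinkWriter, MiniCommitter, TxnHandle and FakeStore are hypothetical names for illustration. The real SinkWriter and Committer interfaces have richer signatures (for example, prepareCommit takes a flush flag).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Flink-free sketch of the writer/committer split; all names are hypothetical. */
public class WriterCommitterDemo {

    /** Committable: just enough information to finish one transaction later. */
    static final class TxnHandle {
        final String transactionalId;

        TxnHandle(String transactionalId) { this.transactionalId = transactionalId; }
    }

    /** External system: staged data is invisible until committed. */
    static final class FakeStore {
        final Map<String, List<String>> staged = new HashMap<>();
        final List<String> visible = new ArrayList<>();
    }

    /** Phase 1: writes into the current transaction and hands out committables. */
    static final class MiniSinkWriter {
        private final FakeStore store;
        private final String prefix;
        private long checkpoint = 0;
        private String currentId;

        MiniSinkWriter(FakeStore store, String prefix) {
            this.store = store;
            this.prefix = prefix;
            this.currentId = openTransaction();
        }

        private String openTransaction() {
            String id = prefix + "-" + checkpoint;
            store.staged.put(id, new ArrayList<>());
            return id;
        }

        void write(String record) { store.staged.get(currentId).add(record); }

        /** Before a checkpoint: seal the current transaction and return it as a committable. */
        List<TxnHandle> prepareCommit() {
            List<TxnHandle> committables = List.of(new TxnHandle(currentId));
            checkpoint++;
            currentId = openTransaction();
            return committables;
        }
    }

    /** Phase 2: a separate object that needs nothing but the committables. */
    static final class MiniCommitter {
        private final FakeStore store;

        MiniCommitter(FakeStore store) { this.store = store; }

        void commit(List<TxnHandle> committables) {
            for (TxnHandle h : committables) {
                List<String> data = store.staged.remove(h.transactionalId);
                if (data != null) { // null: already committed (or unknown), so skip
                    store.visible.addAll(data);
                }
            }
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        MiniSinkWriter writer = new MiniSinkWriter(store, "demo");
        MiniCommitter committer = new MiniCommitter(store);

        writer.write("a");
        writer.write("b");
        List<TxnHandle> committables = writer.prepareCommit(); // phase 1
        writer.write("c");                                      // lands in the next transaction
        System.out.println(store.visible);                      // []
        committer.commit(committables);                         // phase 2
        System.out.println(store.visible);                      // [a, b]
    }
}
```

Because the committer needs nothing but the committables, it does not have to inherit from the writer or share its fields. That is presumably why KafkaWriter no longer needs to extend TwoPhaseCommitSinkFunction.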
KafkaSink implements createCommitter as follows:
@Override
public Optional<Committer<KafkaCommittable>> createCommitter() throws IOException {
    return Optional.of(new KafkaCommitter(kafkaProducerConfig));
}
This brings us to the KafkaCommitter class:
/**
* Committer implementation for {@link KafkaSink}
*
* <p>The committer is responsible to finalize the Kafka transactions by committing them.
*/
class KafkaCommitter implements Committer<KafkaCommittable>, Closeable {
    // ...
}
The committables it receives are instances of KafkaCommittable:
/**
* This class holds the necessary information to construct a new {@link FlinkKafkaInternalProducer}
* to commit transactions in {@link KafkaCommitter}.
*/
class KafkaCommittable {
    // ...
}
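So transactions are handled mainly by the two classes above, which take care of recovering from state, committing transactions, and related concerns.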
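The Javadoc on KafkaCommittable hints at why recovery works. The committable carries only the information needed to build a new producer and commit, so it can be written into checkpoint state and used by a fresh committer after a restart. The sketch below assumes, as a simplification, that this information is a transactional id, a producer id and an epoch, the identifiers Kafka uses for a transactional producer session. Handle, FakeBroker, the transactional id "my-prefix-0-1" and the comma-separated serialization are all hypothetical. They do not reproduce KafkaCommittable or its serializer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: a serializable committable lets a new committer finish a transaction; names are hypothetical. */
public class CommittableRecoveryDemo {

    /** Committable holding only identifiers, no live producer object. */
    static final class Handle {
        final String transactionalId;
        final long producerId;
        final short epoch;

        Handle(String transactionalId, long producerId, short epoch) {
            this.transactionalId = transactionalId;
            this.producerId = producerId;
            this.epoch = epoch;
        }

        String key() { return transactionalId + "/" + producerId + "/" + epoch; }

        /** Stand-in for a committable serializer writing into checkpoint state. */
        String serialize() { return transactionalId + "," + producerId + "," + epoch; }

        static Handle deserialize(String s) {
            String[] parts = s.split(",");
            return new Handle(parts[0], Long.parseLong(parts[1]), Short.parseShort(parts[2]));
        }
    }

    /** Broker stand-in: pending transactions outlive a crashed Flink task. */
    static final class FakeBroker {
        private final Map<String, List<String>> pending = new HashMap<>();
        final List<String> visible = new ArrayList<>();

        void begin(Handle h) { pending.put(h.key(), new ArrayList<>()); }

        void append(Handle h, String record) { pending.get(h.key()).add(record); }

        /** Commit by identifiers alone; an unknown key is treated as already committed. */
        void commit(Handle h) {
            List<String> data = pending.remove(h.key());
            if (data != null) {
                visible.addAll(data);
            }
        }
    }

    public static void main(String[] args) {
        FakeBroker broker = new FakeBroker();

        // Before the crash: the writer opens a transaction and writes into it.
        Handle original = new Handle("my-prefix-0-1", 42L, (short) 0);
        broker.begin(original);
        broker.append(original, "x");
        broker.append(original, "y");
        String checkpointState = original.serialize(); // committable stored in the checkpoint

        // After restart: only the serialized committable survives; a new committer uses it.
        Handle restored = Handle.deserialize(checkpointState);
        broker.commit(restored);
        broker.commit(restored); // replaying the same committable is harmless here
        System.out.println(broker.visible); // [x, y]
    }
}
```

In this sketch, treating an unknown key as already committed keeps the commit idempotent. Replaying the same committable after a second failure therefore does not duplicate data.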