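Background:
While implementing a new requirement on an existing canal project, I switched the log level to INFO during self-testing, as I usually do, and found the following error.
P.S. This error is logged at WARN level, while our default log level had always been ERROR, so it never showed up. I asked the colleagues around me; none of them had ever noticed it, and nobody knew what caused it.
So I decided to dig in myself, living up to my work motto: improving every day by hunting bugs.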
TL;DR:
Acknowledge (ack) the consumed batches correctly and in order, and the problem goes away.
Error log:
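To make the fix summarized above concrete, here is a minimal, self-contained sketch of an in-order consume loop. It does not use the real canal client library: `BatchSource`, `RecordingSource` and `OrderedConsumer` are hypothetical names, with `BatchSource` modeled loosely on canal's getWithoutAck / ack / rollback trio. The point is the loop shape: keep one batch in flight, ack exactly the batchId you just processed before fetching again, and roll back instead of acking when processing fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical minimal connector contract (NOT the real canal API).
interface BatchSource {
    long getWithoutAck();        // fetch a batch and return its batchId
    void ack(long batchId);      // confirm that batch as consumed
    void rollback(long batchId); // ask for un-acked data to be re-delivered
}

// Fake source: hands out increasing ids and records every call.
class RecordingSource implements BatchSource {
    final List<Long> fetched = new ArrayList<>();
    final List<Long> acked = new ArrayList<>();
    final List<Long> rolledBack = new ArrayList<>();
    private long nextId = 1;

    public long getWithoutAck() { fetched.add(nextId); return nextId++; }
    public void ack(long batchId) { acked.add(batchId); }
    public void rollback(long batchId) { rolledBack.add(batchId); }
}

class OrderedConsumer {
    // One batch in flight at a time: fetch, process, then ack that same
    // batchId before the next fetch. If processing throws, roll back instead.
    static void run(BatchSource source, int rounds, LongConsumer handler) {
        for (int i = 0; i < rounds; i++) {
            long batchId = source.getWithoutAck();
            try {
                handler.accept(batchId);  // business logic
                source.ack(batchId);      // ack exactly the batch just processed
            } catch (RuntimeException e) {
                source.rollback(batchId); // never skip ahead past an un-acked batch
            }
        }
    }
}
```

With this shape the acked ids always match the fetched ids in the same order, which is the property the server checks.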
2021-07-21 16:38:29.896 [Thread-12] WARN c.a.o.c.c.impl.ClusterCanalConnector -something goes wrong when getWithoutAck data from server:/127.0.0.1:22299
com.alibaba.otter.canal.protocol.exception.CanalClientException: something goes wrong with reason: something goes wrong with channel:[id: 0x7aa00163, /127.0.0.1:53075 => /127.0.0.1:22299], exception=com.alibaba.otter.canal.meta.exception.CanalMetaManagerException: batchId:3 is not the firstly:1
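At first glance it says our batchId is wrong. But why would it be? The client framework was set up long ago and has been in use for years without any trouble. Let's look at the stack trace (I've masked our project's package path, which doesn't affect the analysis):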
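The message "batchId:3 is not the firstly:1" reads like an ordering check: batch 3 is being confirmed while batch 1, fetched earlier, is still un-acked. As a hypothetical model only (`OutstandingBatches` is an illustrative name, not canal's actual meta-manager code), the server side can be pictured as keeping outstanding batchIds in fetch order and accepting an ack only for the oldest one:

```java
import java.util.TreeSet;

// Hypothetical model of server-side batch bookkeeping; not canal's real code.
class OutstandingBatches {
    private final TreeSet<Long> outstanding = new TreeSet<>();
    private long nextId = 1;

    // Like getWithoutAck: hand out a new batchId and remember it as un-acked.
    long fetch() {
        long id = nextId++;
        outstanding.add(id);
        return id;
    }

    // Like ack: only the smallest (oldest) outstanding batchId may be confirmed.
    void ack(long batchId) {
        if (outstanding.isEmpty()) {
            throw new IllegalStateException("no outstanding batch to ack: " + batchId);
        }
        long first = outstanding.first();
        if (batchId != first) {
            throw new IllegalStateException(
                String.format("batchId:%d is not the firstly:%d", batchId, first));
        }
        outstanding.pollFirst();
    }

    int size() { return outstanding.size(); }
}
```

In this model, fetching batches 1, 2 and 3 and then acking 3 first fails with exactly the text seen in the log, while acking 1, 2, 3 in order succeeds and leaves nothing outstanding.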
at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.receiveMessages(SimpleCanalConnector.java:298)
at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.getWithoutAck(SimpleCanalConnector.java:275)
at com.alibaba.otter.canal.client.impl.SimpleCanalConnector.getWithoutAck(SimpleCanalConnector.java:248)
at com.alibaba.otter.canal.client.impl.ClusterCanalConnector.getWithoutAck(ClusterCanalConnector.java:174)
at com.xxx.AbstractCanalClient.process(AbstractCanalClient.java:130)
at com.xxx.AbstractCanalClient.access$000(AbstractCanalClient.java:25)
at com.xxx.AbstractCanalClient$1.run(AbstractCanalClient.java:84)
at java.lang.Thread.run(Thread.java:748)
The problem happens while fetching data. The key code along the call chain:
My own client:
connector.getWithoutAck(batchSize)
ClusterCanalConnector.java
Message msg = currentConnector.getWithoutAck(batchSize);
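The WARN line in the error log is emitted under the logger `c.a.o.c.c.impl.ClusterCanalConnector`, and the stack trace shows `ClusterCanalConnector.getWithoutAck` delegating to `SimpleCanalConnector`. A plausible explanation for why the failure surfaces only as a WARN (an assumption, worth checking against the canal source of your version) is that the cluster connector catches the exception, logs it, reconnects and retries. The sketch below models that "catch, warn, reconnect, retry" pattern with `java.util.logging`; `RetryingFetcher`, `retryTimes` and `reconnect` are hypothetical names, not canal's API.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical model of a "catch, log at WARN, reconnect, retry" wrapper.
class RetryingFetcher<T> {
    private static final Logger LOG = Logger.getLogger(RetryingFetcher.class.getName());
    private final Supplier<T> delegate; // e.g. the single-server fetch
    private final Runnable reconnect;   // e.g. restart the underlying connection
    private final int retryTimes;

    RetryingFetcher(Supplier<T> delegate, Runnable reconnect, int retryTimes) {
        this.delegate = delegate;
        this.reconnect = reconnect;
        this.retryTimes = retryTimes;
    }

    T get() {
        for (int times = 0; times < retryTimes; times++) {
            try {
                return delegate.get();
            } catch (RuntimeException e) {
                // Logged at WARNING, not SEVERE: with the threshold set to the
                // error level (SEVERE in java.util.logging) this is never printed.
                LOG.log(Level.WARNING, "something goes wrong when fetching, retrying", e);
                reconnect.run();
            }
        }
        throw new IllegalStateException("fetch has tried " + retryTimes + " times, all failed");
    }
}
```

If a retry succeeds, the caller never sees the exception at all; the only trace is the WARN line, which is why an ERROR-level configuration hides it completely.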
SimpleCanalConnector.java
public Message getWithoutAck(int batchSize) throws CanalClientException {
    return getWithoutAck(batchSize, null, null);
}

public Message getWithoutAck(int batchSize, Long timeout, TimeUnit unit) throws CanalClientException {
    waitClientRunning();
    try {
        int size = (batchSize <= 0) ? 1000 : batchSize;
        long time = (timeout == null || timeout < 0) ? -1 : timeout; // -1 means no timeout control
        if (unit == null) {
            unit = TimeUnit.MILLISECONDS;
        }
        writeWithHeader(channel,
            Packet.newBuilder()
                .setType(PacketType.GET)
                .setBody(Get.newBuilder()
                    .setAutoAck(false)
                    .setDestination(clientIdentity.getDestination())
                    .setClientId(String.valueOf(clientIdentity.getClientId()))
                    .setFetchSize(size)
                    .setTimeout(time)
                    .setUnit(unit.ordinal())
                    .build()
                    .toByteString())
                .build()
                .toByteArray());
        return receiveMessages();
    } catch (IOException e) {
        throw new CanalClientException(e);
    }
}
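The first lines of `getWithoutAck` only normalize the arguments before the GET packet is written: a non-positive `batchSize` becomes 1000, a null or negative `timeout` becomes -1 (no timeout control), and a null `unit` defaults to milliseconds. The sketch below pulls the same rules into a standalone helper so they can be checked in isolation; `FetchParams` and `normalize` are hypothetical names, not part of canal.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper mirroring the argument defaults in SimpleCanalConnector.getWithoutAck.
final class FetchParams {
    final int fetchSize;
    final long timeout;   // -1 means "no timeout control"
    final TimeUnit unit;

    private FetchParams(int fetchSize, long timeout, TimeUnit unit) {
        this.fetchSize = fetchSize;
        this.timeout = timeout;
        this.unit = unit;
    }

    static FetchParams normalize(int batchSize, Long timeout, TimeUnit unit) {
        int size = (batchSize <= 0) ? 1000 : batchSize;
        // Short-circuit: a null timeout is never unboxed.
        long time = (timeout == null || timeout < 0) ? -1 : timeout;
        return new FetchParams(size, time, unit == null ? TimeUnit.MILLISECONDS : unit);
    }
}
```

So the one-argument `getWithoutAck(batchSize)` call used by the client ends up sending a GET with autoAck=false and no timeout. Nothing in this step touches batchIds; the batchId error only shows up once the response is read.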
The stack trace points at receiveMessages(). Let's walk through it with a debugger: