Kafka reverse hostname resolution breaks message sending: java.io.IOException: Can't resolve address: kafka-05:9092

This article describes a hostname resolution problem encountered while sending data to Kafka across clusters, analyzes the cause of the error in detail, and provides a concrete solution.


Problem description

Because of a project requirement, data had to be sent to Kafka across clusters. The cluster running the producer program had no hostname mappings for the target Kafka cluster, so only the target cluster's IP addresses could be used. Network connectivity between the two clusters had been verified. Nevertheless, the producer program kept failing, reporting that it could not resolve the Kafka hostnames.

 

Problem details

Convention: since data is sent to Kafka across clusters, the cluster running the program and the target Kafka cluster are two different clusters. For convenience, the cluster running the program is referred to as cluster A, and the target Kafka cluster as cluster B.

1 Main program code

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

final static String KAFKA_BROKER_LIST
            = "172.25.102.70:9092,172.25.102.71:9092,172.25.102.72:9092,172.25.102.75:9092,172.25.102.76:9092";

Properties props = buildKafkaProperties(KAFKA_BROKER_LIST);
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
String to_topic = "liu-text-20200507";
String record = "message record";
producer.send(new ProducerRecord<String, String>(to_topic, null, record));

producer.close();

// Build the Kafka producer configuration
public static Properties buildKafkaProperties(String kafka_broker_list) {
    Properties props = new Properties();
    props.put("bootstrap.servers", kafka_broker_list);
    props.put("acks", "all");   // what counts as a successful write
    props.put("retries", "3");  // number of retries on transient failures
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    return props;
}

As the code shows, the addresses I pass in are the target brokers' IPs, not their hostnames.
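In the snippet above, send() is fire-and-forget, so connection problems only surface as WARN messages from the producer's network thread. A minimal sketch of making such failures visible to the caller, reusing the producer, to_topic and record from the code above (the Callback argument and the blocking get() are standard Kafka client APIs, not part of the original program):

// Attach a callback so send failures (e.g. unresolvable broker addresses)
// are reported to the caller instead of only appearing in the network thread's log.
producer.send(new ProducerRecord<String, String>(to_topic, null, record),
        (metadata, exception) -> {
            if (exception != null) {
                System.err.println("send failed: " + exception);
            } else {
                System.out.println("sent to " + metadata.topic() + "-" + metadata.partition()
                        + " @ offset " + metadata.offset());
            }
        });
// Alternatively, block until the broker acknowledges (or the send fails):
// producer.send(new ProducerRecord<String, String>(to_topic, null, record)).get();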

 

2 Program error log

2020-05-07 11:41  WARN kafka-producer-network-thread | producer-1 clients.NetworkClient:873 - [Producer clientId=producer-1] Error connecting to node kafka-05:9092 (id: 115 rack: null)
java.io.IOException: Can't resolve address: kafka-05:9092
	at org.apache.kafka.common.network.Selector.doConnect(Selector.java:235)
	at org.apache.kafka.common.network.Selector.connect(Selector.java:214)
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:864)
	at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:265)
	at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:266)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:176)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:101)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
	at org.apache.kafka.common.network.Selector.doConnect(Selector.java:233)
	... 7 more

The error is an unresolved-address exception: the hostname kafka-05 cannot be resolved!

3 Initial hypothesis

Note: A and B are the cluster abbreviations defined above.

Presumably, when the KafkaProducer is constructed and communicates with cluster B, cluster B reverse-resolves the IP addresses into its own hostname list and returns it to cluster A, and the producer then uses those hostnames as its broker addresses. Since cluster A's hosts file has no entries for these hostnames, the lookup fails and the error above is thrown.

4 Debugging the program

Debugging the program showed the following:

    a) When the KafkaProducer is constructed, it does reverse-resolve hostnames, but only against cluster A's own hosts configuration; if the local hosts file has no mapping for those IPs, the producer keeps using the IP list as its addresses. So the problem does not occur at this stage.

    b) When send() is called, the KafkaProducer contacts cluster B's brokers, receives cluster B's Kafka hostnames, and updates its own connection addresses with them. When it then actually sends data, it tries to resolve those Kafka hostnames on cluster A, finds nothing locally, and falls back to querying the external DNS servers one by one (this step is very slow!), which also fails. Hostname resolution therefore fails and the exception is thrown. (The hostnames come from the metadata the brokers advertise; see the sketch after this list.)
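The hostnames returned to the producer are whatever cluster B's brokers advertise in their metadata (the same data visible in ZooKeeper below). A minimal sketch of the broker-side settings that control this, assuming the brokers use the standard listeners / advertised.listeners properties; cluster B's actual server.properties values are not known here:

# server.properties on a cluster B broker (illustrative values)
listeners=PLAINTEXT://0.0.0.0:9092
# Whatever address is advertised here is what clients are told to connect to;
# advertising the hostname kafka-01 means clients must be able to resolve it.
advertised.listeners=PLAINTEXT://kafka-01:9092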

Note: the broker metadata that the KafkaProducer obtains after reaching cluster B by IP is the metadata Kafka registered in its configured ZooKeeper. We can log in to cluster B's ZooKeeper and inspect this metadata.

After opening the ZooKeeper CLI, run the following commands:

[zk: localhost:2181(CONNECTED) 0] ls /brokers/ids
[201, 202, 203, 204, 205]
[zk: localhost:2181(CONNECTED) 1] get /brokers/ids/201
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://kafka-01:9092"],"jmx_port":9393,"host":"kafka-01","timestamp":"1585815634552","port":9092,"version":4}

As you can see, each Kafka broker's registered host (hostname) and port can be read here.
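Besides reading ZooKeeper directly, the advertised broker addresses can also be checked from cluster A with the Kafka AdminClient, which performs the same metadata lookup as the producer. A minimal sketch, assuming the same broker IP list as in the program above (AdminClient, describeCluster and Node are the standard Kafka admin API; the class name ShowAdvertisedBrokers is just for illustration):

import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class ShowAdvertisedBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "172.25.102.70:9092,172.25.102.71:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // The nodes returned here carry the host/port each broker advertises;
            // these are exactly the addresses the producer will later try to resolve.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.println("broker " + node.id() + " -> " + node.host() + ":" + node.port());
            }
        }
    }
}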

5 Solution

On every compute node of cluster A, add hostname-to-IP mappings for all of cluster B's Kafka brokers, so that the hostnames cluster B hands back can be resolved on cluster A instead of failing. After that, data can be sent to cluster B's Kafka without any trouble! A sketch of the mapping is shown below.
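A minimal sketch of the /etc/hosts entries to add on each cluster A node; which IP maps to which hostname must be taken from cluster B's actual broker metadata, so the pairings below are only illustrative:

# /etc/hosts on every cluster A compute node (illustrative pairings)
172.25.102.70  kafka-01
172.25.102.71  kafka-02
172.25.102.72  kafka-03
172.25.102.75  kafka-04
172.25.102.76  kafka-05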

 

 
