1 Fault description

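On September 22, one broker in our nationwide Kafka cluster went down after running out of disk space. The business was affected: messages could be neither produced nor consumed. The application logged: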

WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

2 Fault simulation

2.1 Topic partition with a single replica (replicas = 1)

# Produce messages

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

>aa

>bb

# Consume messages

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

aa

bb

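2.1.1 Shutting down the topic's leader node

# Use Kafka Tool to see which node hosts the leader of the topic's partition

/*

The Kafka command line shows this as well:

bin/kafka-topics.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe

The Leader field of each partition line in the output gives the broker id.

*/

Shut down the leader node. The producer and all consumer processes then kept printing the following message: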

[2021-09-23 17:09:53,495] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

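Messages could be neither sent nor consumed.

2.1.2 Shutting down a non-leader node

The consumer process sometimes reported: [2021-09-23 17:21:22,480] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] Connection to node 2147483645 (/192.168.144.253:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

While this error was being reported, messages could still be produced, but the messages produced in that window could not be consumed.

2.1.3 Summary

With a partition replica count of 1, stopping any node affects the business.

When the node hosting a partition's leader goes down, both producing and consuming are affected.

When a non-leader node goes down, consuming is affected.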

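2.2 Partition with multiple replicas

It is understandable that the business is affected when a partition has no other replica, so I configured multiple replicas for a topic. Surprisingly, the business was still affected:

# Create a topic with three replicas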

bin/kafka-topics.sh --create --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --replication-factor 3 --partitions 1 --topic song

# Inspect the replica assignment

[root@Centos7-Mode-V8 kafka]# bin/kafka-topics.sh  --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe --topic song

Topic:song PartitionCount:1 ReplicationFactor:3 Configs:

Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
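
The describe output above can also be checked programmatically. The following is a minimal sketch, not part of the original test: it parses partition lines in the format shown above and lists the partitions whose only replica lives on a given broker. The function names and the baidd line in the sample (including its broker id 1) are hypothetical, since the original does not show the describe output for baidd.

```python
import re

# Matches partition lines such as:
#   Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_describe(output):
    """Turn kafka-topics.sh --describe text into a list of partition dicts."""
    partitions = []
    for m in PARTITION_LINE.finditer(output):
        topic, partition, leader, replicas, isr = m.groups()
        partitions.append({
            "topic": topic,
            "partition": int(partition),
            "leader": int(leader),
            "replicas": [int(b) for b in replicas.split(",") if b],
            "isr": [int(b) for b in isr.split(",") if b],
        })
    return partitions


def single_points_of_failure(partitions, broker_id):
    """Partitions whose every replica sits on broker_id."""
    return [
        (p["topic"], p["partition"])
        for p in partitions
        if set(p["replicas"]) == {broker_id}
    ]


# The song lines are copied from the output above; the baidd line is a
# hypothetical single-replica example.
sample = """Topic:song PartitionCount:1 ReplicationFactor:3 Configs:
Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: baidd Partition: 0 Leader: 1 Replicas: 1 Isr: 1
"""

parts = parse_describe(sample)
print(single_points_of_failure(parts, 1))  # [('baidd', 0)]
print(single_points_of_failure(parts, 0))  # []
```

Running the same check against __consumer_offsets would have flagged the problem described in the rest of this section.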

# Produce messages

bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song

# Consumer process 1

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g1

# Consumer process 2

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g2

# Shut down the topic's leader node

Producing still worked, and the "1 partitions have leader brokers without a matching listener" warning no longer appeared. However, once the consumers could no longer reach the topic leader, they sometimes logged:

[2021-09-24 19:01:06,316] WARN [Consumer clientId=consumer-1, groupId=console-consumer-27609] Connection to node 2147483647 (/192.168.144.247:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

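Messages produced during that window sometimes did not arrive: messages produced while the node was down could not be consumed.

So why are messages still lost when a node goes down, even with multiple replicas?

Answer: __consumer_offsets had only one replica, so even a topic with multiple replicas was not highly available. Consumer groups coordinate and commit their offsets through the broker that leads the relevant __consumer_offsets partition; if that partition's only replica is on the failed broker, the group has no coordinator to fall back to.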
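
The large node ids in the "Connection to node ..." warnings support this explanation. As far as I know, the Java consumer opens a dedicated connection to its group coordinator and labels it Integer.MAX_VALUE minus the coordinator's broker id; treat that as an assumption about client internals rather than documented behavior. Under that assumption, a small sketch maps the ids from the two logs back to brokers (the function name is made up for illustration):

```python
# Assumption: coordinator connection id = Java Integer.MAX_VALUE - broker id.
INT_MAX = 2**31 - 1  # 2147483647


def coordinator_broker_id(node_id):
    """Map a coordinator connection id from the log back to a broker id."""
    return INT_MAX - node_id


# Node ids taken from the two "Connection to node ..." warnings in this article.
for node_id in (2147483645, 2147483647):
    print(node_id, "-> broker", coordinator_broker_id(node_id))
# 2147483645 -> broker 2
# 2147483647 -> broker 0
```

Both warnings therefore point at the consumer failing to reach its group coordinator, not at the data partition itself.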

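# Later, after adding replicas to Kafka's built-in __consumer_offsets topic, ordinary topics became highly available as well. Stopping a node still produces the "Broker may not be available" warning, but the business is no longer affected.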

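3 Root cause

default.replication.factor=3 was not set in the Kafka configuration file. The parameter defaults to 1, which means no additional replicas, so each topic was effectively a single point of failure.

4 Solution

4.1 Change the default.replication.factor parameter

Edit the configuration file on every Kafka node and raise the default replication factor for topics (the parameter defaults to 1):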

default.replication.factor=3

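default.replication.factor applies to automatically created topics. The offsets topic is governed by its own setting, offsets.topic.replication.factor. Kafka's built-in default for that setting is 3, but the sample server.properties shipped with Kafka sets it to 1.

Note: do not set default.replication.factor=3 while leaving offsets.topic.replication.factor=1 in the file. __consumer_offsets is created with the value of offsets.topic.replication.factor, whatever default.replication.factor says. Both settings only take effect when a topic is created, so existing topics still need their replicas expanded as described in 4.2 and 4.3.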
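
To check every node for this pitfall, a short script can scan the configuration file. This is a minimal sketch under stated assumptions: the function names are made up for illustration, the sample text is a hypothetical excerpt and not the author's actual file, and the threshold of 3 matches the three-broker cluster in this article.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def replication_warnings(props, wanted=3):
    """Report replication settings that are missing or below `wanted`."""
    warnings = []
    for key in ("default.replication.factor",
                "offsets.topic.replication.factor"):
        value = props.get(key)
        if value is None:
            warnings.append(f"{key} is not set; the broker default applies")
        elif int(value) < wanted:
            warnings.append(f"{key}={value} is below {wanted}")
    return warnings


# Hypothetical server.properties excerpt showing the pitfall described above.
sample = """
broker.id=0
default.replication.factor=3
offsets.topic.replication.factor=1
"""

for warning in replication_warnings(parse_properties(sample)):
    print(warning)
# offsets.topic.replication.factor=1 is below 3
```

In practice the text would come from the server.properties file on each broker.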

# Restart Kafka for the configuration to take effect

systemctl restart kafka

4.2 Add replicas to existing ordinary topics

See https://blog.csdn.net/yabingshi_tech/article/details/120443647

4.3 Add replicas to __consumer_offsets

The method is the same as above. The JSON file is as follows:

{"version": 1, "partitions": [{"topic": "__consumer_offsets", "partition": 0, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 1, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 2, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 3, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 4, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 5, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 6, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 7, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 8, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 9, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 10, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 11, "replicas": [0, 1, 2]},{"topic": "__consumer_offsets", "partition": 12, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 13, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 14, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 15, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 16, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 17, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 18, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 19, "replicas": [0, 1, 2                ]},{"topic": "__consumer_offsets", "partition": 20, "replicas": [0, 1, 2                ]},{"topic": "__consumer_offsets", "partition": 21, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 22, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 23, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 24, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 25, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 26, 
"replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 27, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 28, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 29, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 30, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 31, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 32, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 33, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 34, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 35, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 36, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 37, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 38, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 39, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 40, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 41, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 42, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 43, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 44, "replicas": [2, 0, 1 ]},{"topic": "__consumer_offsets", "partition": 45, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 46, "replicas": [0, 1, 2 ]},{"topic": "__consumer_offsets", "partition": 47, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 48, "replicas": [1, 2, 0 ]},{"topic": "__consumer_offsets", "partition": 49, "replicas": [2, 0, 1 ]}]
}
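
Writing 50 entries by hand is error-prone, so a file like this can be generated instead. The sketch below is one possible approach, not the method used to produce the file above: it rotates the broker list by one position per partition, which gives a different but equally balanced layout. The function name is hypothetical, and the broker ids [0, 1, 2] and the partition count of 50 are taken from the JSON above.

```python
import json


def build_reassignment(topic, partitions, brokers):
    """Build a reassignment plan that puts every partition on all brokers,
    rotating the preferred (first) replica round-robin."""
    entries = []
    for p in range(partitions):
        shift = p % len(brokers)
        replicas = brokers[shift:] + brokers[:shift]
        entries.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": entries}


plan = build_reassignment("__consumer_offsets", 50, [0, 1, 2])

# Show the first three entries; dump the whole plan to a file for real use.
for entry in plan["partitions"][:3]:
    print(json.dumps(entry))
```

The resulting JSON would then be passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute, presumably the same procedure as in the article referenced in 4.2.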

-- This article draws on: Kafka突然宕机了?稳住,莫慌! (Kafka suddenly went down? Stay calm, don't panic!)
