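I recently used ZooKeeper's master-worker task distribution in a project and ran into a problem where the ZooKeeper server actively dropped its connections to clients: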

2017-09-01 09:45:57,390 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 1668ms for sessionid 0x0, closing socket connection and attempting reconnect
2017-09-01 09:45:58,224 ERROR org.apache.hadoop.ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
2017-09-01 09:45:58,450 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
2017-09-01 09:45:58,450 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-09-01 09:45:58,451 WARN org.apache.hadoop.ha.ActiveStandbyElector: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
2017-09-01 09:46:03,453 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=test-ssps-s-02:2181,test-ssps-s-03:2181,test-ssps-s-04:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@7f0d36c
2017-09-01 09:46:03,455 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server test-ssps-s-04/10.117.210.216:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-01 09:46:03,463 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to test-ssps-s-04/10.117.210.216:2181, initiating session
2017-09-01 09:46:05,131 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 1668ms for sessionid 0x0, closing socket connection and attempting reconnect
2017-09-01 09:46:05,885 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server test-ssps-s-03/10.51.20.155:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-01 09:46:06,626 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to test-ssps-s-03/10.51.20.155:2181, initiating session
2017-09-01 09:46:08,293 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 1667ms for sessionid 0x0, closing socket connection and attempting reconnect
2017-09-01 09:46:08,454 ERROR org.apache.hadoop.ha.ActiveStandbyElector: Connection timed out: couldn't connect to ZooKeeper in 5000 milliseconds
2017-09-01 09:46:09,157 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
2017-09-01 09:46:09,157 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-09-01 09:46:09,158 WARN org.apache.hadoop.ha.ActiveStandbyElector: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss

The log shows that the client could not establish a connection to ZK, which caused the failover to abort.

The leader's log:

2017-09-01 09:45:49,754 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.51.20.155:40713
2017-09-01 09:45:49,754 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.51.20.155:40713
2017-09-01 09:45:51,090 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x15e2a2922a213b5 type:setData cxid:0x21 zxid:0x6d00050f33 txntype:-1 reqpath:n/a Error Path:/yarn-leader-election/yarnRM/ActiveBreadCrumb Error:KeeperErrorCode = BadVersion for /yarn-leader-election/yarnRM/ActiveBreadCrumb
2017-09-01 09:45:52,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x35e2a2922671668, timeout of 5000ms exceeded
2017-09-01 09:45:52,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x35e2a2922671668
2017-09-01 09:45:52,482 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.117.68.10:33589
2017-09-01 09:45:52,484 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to renew session 0x15e2a2922a21870 at /10.117.68.10:33589
2017-09-01 09:45:52,484 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x15e2a2922a21870 with negotiated timeout 5000 for client /10.117.68.10:33589
2017-09-01 09:45:52,884 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
2017-09-01 09:45:52,884 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.51.20.155:40712 which had sessionid 0x15de73379b60000
2017-09-01 09:45:53,244 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.51.20.155:40730
2017-09-01 09:46:36,520 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.51.20.155:40749
2017-09-01 09:46:29,773 WARN org.apache.zookeeper.server.persistence.FileTxnLog: fsync-ing the write ahead log in SyncThread:3 took 53812ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-01 09:46:17,239 INFO org.apache.zookeeper.server.quorum.Leader: Shutting down
2017-09-01 09:45:58,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x15e2a2922a21870, timeout of 5000ms exceeded
2017-09-01 09:45:53,439 ERROR org.apache.zookeeper.server.NIOServerCnxn: Unexpected Exception:
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
    at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
    at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
    at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:171)
    at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2017-09-01 09:46:36,648 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x15e2a2922a21871, timeout of 10000ms exceeded
2017-09-01 09:46:36,648 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x15e2a2922a213b5, timeout of 10000ms exceeded
2017-09-01 09:46:36,648 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x15de73379b60000, timeout of 6000ms exceeded
2017-09-01 09:46:36,648 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x25de7337bb00000, timeout of 6000ms exceeded
2017-09-01 09:46:36,648 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x35e2a2922671890, timeout of 10000ms exceeded
2017-09-01 09:46:36,648 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x15e2a2922a213b6, timeout of 10000ms exceeded
2017-09-01 09:46:36,647 INFO org.apache.zookeeper.server.quorum.Leader: Shutdown called

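The log shows that the ZooKeeper server took far too long to sync its log: 53812 ms (normally this should finish within 3 seconds, and the server-client heartbeat timeout here is set to 5 s). While the log is being synced, ZK cannot respond to external requests, so sessions expire, which in turn leads to the ZK server shutting down.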

Both the monitoring data and the analysis above point to the same cause: the ZK server taking far too long when fsync-ing the write ahead log.
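To check a server log for this symptom, one option is to scan for the FileTxnLog warning and compare the reported duration with the session timeout. The following is only a minimal sketch: the function name `slow_fsyncs` and the regular expression are illustrative assumptions, and the 5000 ms default mirrors the timeout quoted in this post.

```python
import re

# Matches the FileTxnLog warning quoted in the leader log in this post.
# \s* tolerates a missing space before the number, as in pasted logs.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in (\S+) took\s*(\d+)ms")


def slow_fsyncs(lines, session_timeout_ms=5000):
    """Return (thread, duration_ms) for each fsync warning at or above the timeout."""
    hits = []
    for line in lines:
        match = FSYNC_RE.search(line)
        if match and int(match.group(2)) >= session_timeout_ms:
            hits.append((match.group(1), int(match.group(2))))
    return hits


sample = [
    "2017-09-01 09:46:29,773 WARN org.apache.zookeeper.server.persistence.FileTxnLog: "
    "fsync-ing the write ahead log in SyncThread:3 took 53812ms which will adversely "
    "effect operation latency. See the ZooKeeper troubleshooting guide",
    "2017-09-01 09:45:52,000 INFO org.apache.zookeeper.server.ZooKeeperServer: "
    "Expiring session 0x35e2a2922671668, timeout of 5000ms exceeded",
]

print(slow_fsyncs(sample))
```

Any hit at or above the session timeout means clients could not have been answered in time while that fsync was running.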

On where to store the ZK log, the official documentation gives the following advice:

Having a dedicated log device has a large impact on throughput and stable latencies. It is highly recommended to dedicate a log device and set dataLogDir to point to a directory on that device, and then make sure to point dataDir to a directory not residing on that device.

So, to avoid this kind of problem, dataLogDir should be kept separate from dataDir, and a dedicated storage device can be used for the ZK log.
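As a rough way to verify such a layout, the sketch below parses the two settings out of zoo.cfg text and compares the device IDs of the two directories. The paths `/data/zookeeper` and `/zklog/zookeeper` are hypothetical, as are the helper names `parse_zoo_cfg` and `on_same_device`. Note that `st_dev` identifies a filesystem, so two partitions on one physical disk would still look "separate" here.

```python
import os


def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg text, skipping blanks and # comments."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def on_same_device(path_a, path_b):
    """True if both existing paths live on the same filesystem device (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Hypothetical layout: both paths are assumptions, not taken from the post.
SAMPLE_CFG = """
# snapshots
dataDir=/data/zookeeper
# transaction log on its own device
dataLogDir=/zklog/zookeeper
"""

cfg = parse_zoo_cfg(SAMPLE_CFG)
for key in ("dataDir", "dataLogDir"):
    print(key, "=", cfg[key])

# Run the device check only where the directories actually exist.
if all(os.path.isdir(cfg[k]) for k in ("dataDir", "dataLogDir")):
    print("same device:", on_same_device(cfg["dataDir"], cfg["dataLogDir"]))
```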

Add the following to zoo.cfg: forceSync=no

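forceSync is enabled by default: after receiving data, ZK immediately syncs the current state to the on-disk log file and only replies once the sync has completed. Turning it off avoids the sync-latency problem, so client connections get a fast response.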

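Turning forceSync off carries a potential risk. The log is still flushed (log.flush() is executed first), but the operating system writes to its cache first to make disk writes more efficient, so if the machine fails, some ZK state may never have reached the disk, leaving ZK's state inconsistent before and after the failure.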
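The difference between flushing and forcing a sync can be illustrated with a small Python analogy. This is not ZooKeeper's Java implementation; `append_record` and its `force_sync` flag are hypothetical names chosen to mirror the setting discussed here.

```python
import os
import tempfile


def append_record(path, payload, force_sync=True):
    """Append payload; with force_sync, wait until the OS reports it is on disk."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()  # user-space buffer -> OS page cache (always done)
        if force_sync:
            # page cache -> storage device; this is the step that can be slow
            os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "txn.log")
    append_record(log_path, b"txn-1\n", force_sync=True)   # durable before returning
    append_record(log_path, b"txn-2\n", force_sync=False)  # may still be in cache only
    with open(log_path, "rb") as f:
        data = f.read()

print(data)
```

Both records are readable afterwards because reads are served from the page cache. The difference only shows up if the machine loses power before the cache is written back, in which case the second record could be missing.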
