Scenario

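Our e-commerce service runs an embedded Elasticsearch service. After a faulty code commit, Elasticsearch began retrieving huge amounts of data that the JVM could not release. The service eventually hit long stop-the-world GC pauses and went down.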

Root cause analysis

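Searching online suggested that heavy GC activity in the Elasticsearch service might be the cause, so we started analyzing the logs with this command: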

grep "INFO elasticsearch\[estore\]\[scheduler\]\[T#1\]" xxx.log

This extracts the GC log entries of the Elasticsearch service. The output was:

16:06:47 1.6-2018-02-07 16:06:47,609 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][young][319][334] duration [832ms], collections [1]/[1.2s], total [832ms]/[7.1s], memory [530.6mb]->[653.8mb]/[7.9gb], all_pools {[young] [4.1mb]->[4mb]/[123.5mb]}{[survivor] [0b]->[47.4mb]/[47.5mb]}{[old] [526.5mb]->[602.2mb]/[7.7gb]}
16:07:50 1.6-2018-02-07 16:07:50,065 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][363][10] duration [6.2s], collections [1]/[6.9s], total [6.2s]/[23.9s], memory [5.4gb]->[5.3gb]/[7.9gb], all_pools {[young] [2.9mb]->[3.7mb]/[99mb]}{[survivor] [39.6mb]->[0b]/[80mb]}{[old] [5.3gb]->[5.3gb]/[7.7gb]}
16:08:02 1.6-2018-02-07 16:08:02,274 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][368][11] duration [7.8s], collections [1]/[8s], total [7.8s]/[31.8s], memory [6.3gb]->[6.3gb]/[7.9gb], all_pools {[young] [32.1mb]->[2.2mb]/[121.5mb]}{[survivor] [55.9mb]->[0b]/[67.5mb]}{[old] [6.2gb]->[6.3gb]/[7.7gb]}
16:08:15 1.6-2018-02-07 16:08:15,852 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][373][12] duration [8.2s], collections [1]/[9.1s], total [8.2s]/[40s], memory [7.2gb]->[7.4gb]/[7.9gb], all_pools {[young] [2.2mb]->[2.3mb]/[118mb]}{[survivor] [54.4mb]->[0b]/[67mb]}{[old] [7.2gb]->[7.4gb]/[7.7gb]}
16:08:50 1.6-2018-02-07 16:08:50,028 INFO elasticsearch[estore][scheduler][T#1] - [estore] [gc][old][377][15] duration [8.4s], collections [1]/[8.4s], total [8.4s]/[1.2m], memory [7.8gb]->[7.7gb]/[7.9gb], all_pools {[young] [57.9mb]->[49.6mb]/[124.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [7.7gb]->[7.7gb]/[7.7gb]}

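Look at the last entry, {[old] [7.7gb]->[7.7gb]/[7.7gb]}: the old generation was completely full both before and after the collection. In other words, the GC could no longer reclaim any memory, the heap stayed exhausted, and the service went down.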
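As a rough illustration of how to read these entries, the sketch below (Python, not part of our original tooling) pulls the old-generation figures out of a GC log line and flags collections that leave the pool nearly full. The function names and the 95% threshold are assumptions made for this example.

```python
import re

# Matches the old-generation section of a GC log line,
# e.g. {[old] [7.7gb]->[7.7gb]/[7.7gb]}
OLD_POOL = re.compile(
    r"\{\[old\] \[([\d.]+)(b|kb|mb|gb)\]->"
    r"\[([\d.]+)(b|kb|mb|gb)\]/\[([\d.]+)(b|kb|mb|gb)\]\}"
)

UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(value, unit):
    """Convert a value such as ('7.7', 'gb') to a number of bytes."""
    return float(value) * UNIT_BYTES[unit]


def old_gen_usage(line):
    """Return (before, after, maximum) in bytes for the old pool, or None."""
    m = OLD_POOL.search(line)
    if m is None:
        return None
    before = to_bytes(m.group(1), m.group(2))
    after = to_bytes(m.group(3), m.group(4))
    maximum = to_bytes(m.group(5), m.group(6))
    return before, after, maximum


def gc_is_ineffective(line, threshold=0.95):
    """True when the old pool is still nearly full after the collection.

    The 0.95 threshold is a hypothetical value chosen for illustration.
    """
    usage = old_gen_usage(line)
    if usage is None:
        return False
    _, after, maximum = usage
    return after / maximum >= threshold
```

Feeding each line of the grep output through gc_is_ineffective would single out the entries where the old generation is at or near its 7.7 GB limit even after a collection.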

That explains the crash, but what drove heap usage so high?

Looking further at the output, the first entry ends with {[old] [526.5mb]->[602.2mb]/[7.7gb]}, while the last one ends with {[old] [7.7gb]->[7.7gb]/[7.7gb]}.

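The old generation grew from about 0.5 GB to its 7.7 GB limit in roughly two minutes (16:06:47 to 16:08:50), which suggests that an inappropriate index query was responsible. So we kept digging through the log: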

grep "elasticsearch\[estore\]\[search\]" xxx.log

Output:

16:03:04 1.6-2018-02-07 16:03:04,331 TRACE elasticsearch[estore][search][T#3] - [estore] [blanksimple][2] took[575.1ms], took_millis[575], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,407 TRACE elasticsearch[estore][search][T#4] - [estore] [blanksimple][3] took[652.8ms], took_millis[652], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,424 TRACE elasticsearch[estore][search][T#2] - [estore] [blanksimple][1] took[669.4ms], took_millis[669], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,439 TRACE elasticsearch[estore][search][T#5] - [estore] [blanksimple][4] took[684.4ms], took_millis[684], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:03:04 1.6-2018-02-07 16:03:04,441 TRACE elasticsearch[estore][search][T#1] - [estore] [blanksimple][0] took[687.1ms], took_millis[687], types[goodsType], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{"must":[{"term":{"bill.GOODSSTATUS":1}},{"term":{"bill.GOODSTYPE":1}},{"term":{"bill.STOREID":6635387}},{"bool":{"should":[{"term":{"bill.STATUS":"0"}},{"term":{"bill.STATUS":"1"}}]}},{"bool":{"should":{"term":{"bill.STOREGOODSTYPEID":"6636222"}}}},{"range":{"bill.RELEASEFROM":{"from":null,"to":"2018-02-07T16:03:03","include_lower":true,"include_upper":false}}},{"range":{"bill.RELEASETO":{"from":"2018-02-07T16:03:03","to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":0.0,"to":null,"include_lower":false,"include_upper":true}}},{"range":{"bill.ES_MINPRICE":{"from":null,"to":1.7976931348623157E308,"include_lower":true,"include_upper":false}}}]}},"explain":false,"_source":{"includes":["id"],"excludes":[]},"sort":[{"_score":{"order":"desc"}},{"bill.MAINSCORE":{"order":"desc"}},{"bill.UPDATETIME":{"order":"desc"}},{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,327 DEBUG elasticsearch[estore][search][T#14] - [estore] [blanksimple][0] took[4.2s], took_millis[4288], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,331 DEBUG elasticsearch[estore][search][T#20] - [estore] [blanksimple][1] took[4.2s], took_millis[4292], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,332 DEBUG elasticsearch[estore][search][T#21] - [estore] [blanksimple][2] took[4.2s], took_millis[4294], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,343 DEBUG elasticsearch[estore][search][T#19] - [estore] [blanksimple][3] took[4.3s], took_millis[4304], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:06:48 1.6-2018-02-07 16:06:48,347 DEBUG elasticsearch[estore][search][T#22] - [estore] [blanksimple][4] took[4.3s], took_millis[4308], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
...

There were many entries, but what stood out was that some of them were logged at WARN level, so we refined the command:

grep WARN xxx.log | grep "elasticsearch\[estore\]\[search\]"

Output:

16:07:38 1.6-2018-02-07 16:07:38,981 WARN elasticsearch[estore][search][T#2] - [estore] [blanksimple][2] took[49.8s], took_millis[49814], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,090 WARN elasticsearch[estore][search][T#24] - [estore] [blanksimple][3] took[49.9s], took_millis[49924], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,101 WARN elasticsearch[estore][search][T#18] - [estore] [blanksimple][0] took[49.9s], took_millis[49934], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,201 WARN elasticsearch[estore][search][T#25] - [estore] [blanksimple][4] took[50s], took_millis[50035], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
16:07:39 1.6-2018-02-07 16:07:39,469 WARN elasticsearch[estore][search][T#23] - [estore] [xblanksimple][1] took[50.3s], took_millis[50303], types[errotype], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"from":0,"size":2147483647,"query":{"bool":{}},"explain":false,"_source":{"includes":["bill.OECODE","bill.NAME","bill.PRODUCT"],"excludes":[]},"sort":[{"_score":{"order":"desc"}}]}], extra_source[],
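Slow-log entries like these can also be screened programmatically. The sketch below (Python, written for this article rather than taken from our service) extracts took_millis and the JSON request body from a line, then flags requests that combine a very large size with an empty bool query. The helper names and the 10000 cutoff are assumptions for illustration.

```python
import json
import re

TOOK = re.compile(r"took_millis\[(\d+)\]")
# The request body sits between "source[" and "], extra_source[".
SOURCE = re.compile(r"source\[(\{.*\})\], extra_source\[")


def parse_slowlog(line):
    """Return (took_millis, request_body) for a slow-log line, or None."""
    took = TOOK.search(line)
    src = SOURCE.search(line)
    if took is None or src is None:
        return None
    return int(took.group(1)), json.loads(src.group(1))


def is_unbounded(body, max_size=10000):
    """Flag a request that asks for a huge page with no query conditions.

    max_size is a hypothetical cutoff, not a value from our configuration.
    """
    no_conditions = body.get("query", {}).get("bool") == {}
    return body.get("size", 0) > max_size and no_conditions
```

Run over the WARN lines above, parse_slowlog yields took values around 50000 ms, and is_unbounded is true for each of those request bodies.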

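The WARN entries point to a problematic query: it has no conditions at all (an empty bool query), yet its size parameter is set to the maximum Integer value (2147483647). The type it targets (errotype) holds a large amount of data, so each request tried to load all of it, which caused the heavy GC pressure. This is a bug in the business code.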
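One way to guard against this kind of request is to validate it before it reaches Elasticsearch. The following is a minimal sketch in Python, assuming a hypothetical helper build_search_body and a hypothetical cap MAX_PAGE_SIZE; it is not the fix we actually shipped, and rejecting condition-free searches is only one possible policy.

```python
MAX_PAGE_SIZE = 1000  # hypothetical cap; tune to what callers really need


def build_search_body(must_clauses, size, offset=0):
    """Build a bool query body, refusing condition-free or oversized requests."""
    if not must_clauses:
        # An empty bool query matches every document of the type.
        raise ValueError("refusing to search without any query conditions")
    if size < 1:
        raise ValueError("size must be positive")
    return {
        "from": offset,
        # Never pass values such as Integer.MAX_VALUE through unchanged.
        "size": min(size, MAX_PAGE_SIZE),
        "query": {"bool": {"must": list(must_clauses)}},
    }
```

Callers that really need every matching document would page through the results in bounded batches instead of requesting them all at once.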

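The second factor: the index data had grown to 12 GB, while the JVM of the embedded elasticsearch-web service had only 8 GB. For the relationship between index size and JVM heap, please look up sharding strategies and related topics yourself. Our takeaway is that the JVM heap should be larger than the total data size of all shards used by the service.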

Summary

With both of these problems present at the same time, the GC could not free memory, and that caused the outage.

Reference: http://blog.csdn.net/quicknet/article/details/45148447
