The major memory consumers in ES (Elasticsearch)

I. Segment Memory

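This is the prefix index of the term dictionary (the Term Index); see the relevant documentation for the underlying concepts.
Don't underestimate it just because it only indexes prefixes: with many indices and a large data volume, 10 GB per node is not unusual, while the heap on each node is only set to 32 GB. :joy: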

How to check:
_cat/segments
_cat/segments?v&h=index,shard,segment,size,size.memory
_cat/nodes?v&h=ip,port,sm,heap.*
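The endpoints above can be combined into a small per-node report. This is a sketch that assumes a cluster at `ES_URL` (defaulting to `http://localhost:9200`, an assumption) and uses the `_cat/nodes` columns `name`, `sm` (segments.memory) and `hm` (heap.max) with `bytes=b`, so sizes come back as plain byte counts. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: segment memory as a share of the max heap on each node.
# ES_URL is an assumption; override it for your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read "name segments_memory_bytes heap_max_bytes" lines on stdin and
# print "name percent" (integer percent of the heap held by segment memory).
# Lines with a zero heap value are skipped.
seg_heap_ratio() {
  awk '$3 > 0 { printf "%s %d\n", $1, int($2 * 100 / $3) }'
}

# No "v" flag, so there is no header line; bytes=b returns raw byte counts.
segment_report() {
  curl -s "$ES_URL/_cat/nodes?h=name,sm,hm&bytes=b" | seg_heap_ratio
}
```

For example, a node holding 3 GiB of segment memory on a 32 GiB heap is reported as 9 percent. Nodes with a high share are the first candidates for the remedies listed under "Solutions".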
Solutions:
Delete indices that are no longer used.

Close indices (the files remain on disk; only the memory is released). They can be reopened when needed.

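Periodically optimize indices that no longer receive updates (in releases after ES 2.0 this became the force merge API). Optimize essentially forces the segment files to be merged, which can save a large amount of segment memory.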
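The three remedies map onto one REST call each. The sketch below wraps them in hypothetical helper functions, assumes a cluster at `ES_URL` (default `http://localhost:9200`), and adds a `DRY_RUN=1` mode that only prints the curl command. On ES 1.x, `_optimize` would take the place of `_forcemerge`.

```shell
#!/bin/sh
# Sketch of the three remedies for segment memory.
# ES_URL is an assumption; on ES 1.x use _optimize instead of _forcemerge.
ES_URL="${ES_URL:-http://localhost:9200}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Delete an index that is no longer used (cannot be undone).
drop_index() { run curl -s -XDELETE "$ES_URL/$1"; }

# Close an index: the files stay on disk and its memory is released.
close_index() { run curl -s -XPOST "$ES_URL/$1/_close"; }

# Reopen a closed index when it is needed again.
open_index() { run curl -s -XPOST "$ES_URL/$1/_open"; }

# Merge an index that no longer receives writes down to one segment per shard.
merge_index() {
  run curl -s -XPOST "$ES_URL/$1/_forcemerge?max_num_segments=1"
}
```

For example, `DRY_RUN=1 merge_index clientlog-201606` prints the command without touching the cluster. Merging to a single segment is a choice made here for read-only indices; `max_num_segments` can be set higher.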

II. Filter Cache

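This is the query cache. As the number of queries grows, the memory this cache consumes also becomes substantial; it can be capped with settings such as indices.queries.cache.size.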

How to check:

_nodes/d28-84/stats/indices/query_cache

Solutions:

$ curl -XPOST 'http://localhost:9200/clientlog-201607,clientlog-201606/_cache/clear'

To clear only the shard request cache of specific indices:

$ curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_cache/clear?request_cache=true'
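To watch the query cache across the cluster and clear only that cache, rather than every cache at once, something like the following sketch could be used. It assumes a cluster at `ES_URL` (default `http://localhost:9200`), that `_cat/nodes` exposes the `qcm` (query_cache.memory_size) column as in ES 2.x and later, and that `query=true` on the clear-cache API restricts the clear to the query cache. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: watch and clear only the query cache.
# ES_URL is an assumption; qcm is the query_cache.memory_size column
# of _cat/nodes, and query=true limits the clear to the query cache.
ES_URL="${ES_URL:-http://localhost:9200}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Query cache size per node, in bytes, with a header line.
query_cache_usage() { run curl -s "$ES_URL/_cat/nodes?v&h=name,qcm&bytes=b"; }

# Clear the query cache of one or more comma-separated indices.
clear_query_cache() {
  run curl -s -XPOST "$ES_URL/$1/_cache/clear?query=true"
}
```

Clearing a single cache type keeps the field data and request caches warm, so queries against the same indices do not all slow down at once.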

III. Field Data Cache

It can be limited with settings such as indices.breaker.fielddata.limit and indices.fielddata.cache.size.
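Of the two settings, indices.breaker.fielddata.limit can be changed at runtime through the cluster settings API, while indices.fielddata.cache.size is a static node setting that belongs in elasticsearch.yml. The sketch below shows the runtime change; `ES_URL`, the function name and the example value of 40% are illustrative assumptions, not recommendations from the original text.

```shell
#!/bin/sh
# Sketch: change the fielddata circuit breaker limit at runtime.
# ES_URL is an assumption. indices.fielddata.cache.size cannot be set
# this way; put it in elasticsearch.yml on each node instead.
ES_URL="${ES_URL:-http://localhost:9200}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 is the new limit, for example "40%" of the heap.
set_fielddata_breaker() {
  run curl -s -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "{\"persistent\":{\"indices.breaker.fielddata.limit\":\"$1\"}}"
}
```

The breaker rejects a request that would push field data past the limit, instead of letting the heap fill up. The cache size setting controls how much loaded field data is kept before eviction.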

1. Solutions:
Enable doc values and disable _all.

2. How to check:

/_stats/fielddata/
# Indices Stat
curl -XGET 'http://localhost:9200/_stats/fielddata/?fields=field1,field2&pretty'
# You can use wildcards for field names
curl -XGET 'http://localhost:9200/_stats/fielddata/?fields=field*&pretty'

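Since ES 2.x, doc values are the default, so this problem has become increasingly easy to solve; otherwise the field data cache can also grow alarmingly large. Personally, I rarely use the field data cache.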

IV. Bulk Queue

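This is the memory taken up by the data held in the queue during bulk writes.
Don't underestimate this amount: with a queue of 1000 requests, 4092 lines per bulk request and 10 KB per line, the memory used is enormous.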
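The worst case in that example can be checked with shell arithmetic. This sketch assumes every queued request is full and ignores per-request overhead; the function name is hypothetical.

```shell
#!/bin/sh
# Worst-case heap held by a full bulk queue, in MiB:
#   queued requests x lines per request x KiB per line / 1024
# $1 = queue size, $2 = lines per bulk request, $3 = KiB per line.
bulk_queue_mib() {
  echo $(( $1 * $2 * $3 / 1024 ))
}
```

With the figures above, `bulk_queue_mib 1000 4092 10` prints 39960, roughly 39 GiB, which is more than a 32 GB heap can hold on its own.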

1. How to check
/_cat/thread_pool?v&h=host,bulk.*
Thread pool settings in elasticsearch.yml:
threadpool:
  bulk:
    type: fixed
    size: 60
    queue_size: 1000
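2. Solutions:
Use a smaller queue, and force merge the data once it has been written.
In this project, the queue size setting can be used to throttle writes and reduce the amount of queued data.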

V. Indexing Buffer

The indexing buffer holds newly indexed data; when it fills up, or when the refresh/flush interval is reached, the data is written to disk as a segment file.
indices.memory.index_buffer_size
Accepts either a percentage or a byte size value. It defaults to 10%, meaning that 10% of the total heap allocated to a node will be used as the indexing buffer size.
indices.memory.min_index_buffer_size
If the index_buffer_size is specified as a percentage, then this setting can be used to specify an absolute minimum. Defaults to 48mb.
indices.memory.max_index_buffer_size
If the index_buffer_size is specified as a percentage, then this setting can be used to specify an absolute maximum. Defaults to unbounded.
indices.memory.min_shard_index_buffer_size
Sets a hard lower limit for the memory allocated per shard for its own indexing buffer. Defaults to 4mb.
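The interaction between the percentage and the absolute bounds can be worked through with a small helper. It is a sketch that assumes the percentage form of indices.memory.index_buffer_size, works in whole MB, and treats a max of 0 as "unbounded"; the function name is hypothetical. The result is the buffer for the whole node, shared by the shards that are indexing on it.

```shell
#!/bin/sh
# Effective indexing buffer in MB:
#   clamp(heap_mb * pct / 100, min_mb, max_mb)
# $1 = heap in MB, $2 = index_buffer_size percentage,
# $3 = min_index_buffer_size in MB (default 48),
# $4 = max_index_buffer_size in MB (0 = unbounded, the default).
index_buffer_mb() {
  ib_buf=$(( $1 * $2 / 100 ))
  ib_min=${3:-48}
  ib_max=${4:-0}
  # Apply the absolute minimum.
  if [ "$ib_buf" -lt "$ib_min" ]; then
    ib_buf=$ib_min
  fi
  # Apply the absolute maximum only when one is set.
  if [ "$ib_max" -gt 0 ] && [ "$ib_buf" -gt "$ib_max" ]; then
    ib_buf=$ib_max
  fi
  echo "$ib_buf"
}
```

With the defaults, `index_buffer_mb 32768 10` prints 3276 for a 32 GB heap, while for a 256 MB heap the 48 MB floor applies and `index_buffer_mb 256 10` prints 48.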

VI. Fetching very large search and aggregation result sets

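1. Use frequently used or mandatory query fields as the routing key, and keep data evenly distributed across nodes;
2. Analyze queries that run too long: add a wrapper layer to block them, identifying the offending queries from the slow log;
3. Preprocess the data: serve high-concurrency, frequently used queries from a preprocessed index (the best approach);
4. Handle real-time ad-hoc requests with Spark and ES executing together (es-hadoop Spark).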
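Items 1 and 2 map onto standard request options. The sketch below searches with a `routing` value and sets a search slow log threshold. `ES_URL`, the function names and the 5s threshold are illustrative assumptions, and the routing value must be the same one used when the documents were indexed.

```shell
#!/bin/sh
# Sketch for routed searches and slow query logging.
# ES_URL is an assumption; override it for your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Search only the shard selected by routing value $2 (e.g. a user ID),
# with $3 as the JSON query body. Documents must have been indexed
# with the same ?routing= value.
search_routed() {
  run curl -s -XGET "$ES_URL/$1/_search?routing=$2" \
    -H 'Content-Type: application/json' -d "$3"
}

# Log queries slower than $2 (e.g. 5s) at WARN level in the search slow log.
set_slowlog_warn() {
  run curl -s -XPUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "{\"index.search.slowlog.threshold.query.warn\":\"$2\"}"
}
```

A routed search touches one shard instead of all of them, which keeps the per-request fetch and aggregation work small. The slow log entries give the wrapper in item 2 a concrete list of queries to block or rewrite.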