Log analysis with Filebeat + Elasticsearch

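Download Filebeat from the official site

The download steps and an introduction to Filebeat are skipped here. Note: keep the Filebeat and Elasticsearch (ES) versions the same, otherwise you may run into errors.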
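As a quick sanity check for the version note above, the two version strings can be compared before deploying. This is only a sketch: the function name is made up for illustration, and comparing on major.minor is an assumption about what "the same version" should mean. The ES version can be read from the `version.number` field returned by `GET /` on the ES host; the Filebeat version is printed by `./filebeat version`.

```python
def same_release_line(filebeat_version: str, es_version: str) -> bool:
    """Return True when both version strings share the same major.minor pair.

    Hypothetical helper: the major.minor rule is an assumption, not an
    official compatibility matrix.
    """
    fb_major_minor = filebeat_version.split(".")[:2]
    es_major_minor = es_version.split(".")[:2]
    return fb_major_minor == es_major_minor


# Example with the 6.3.x line used in this post.
print(same_release_line("6.3.2", "6.3.0"))  # True
print(same_release_line("6.3.2", "7.9.1"))  # False
```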

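Configure filebeat.yml

The main settings are:

Log file paths

Merging multi-line log entries into a single event

Connections to Kibana and ES

See the official documentation: https://www.elastic.co/guide/en/beats/filebeat/6.3/index.html

Here is my configuration: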

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log
  enabled: true
  paths:
    - /xxx/*/*/*.log
    - /xxx/xxx/*/*/*.log
  # exclude_files: ['/home/zile/prod-log/gateway/*/*/*/*']
  ignore_older: 12h
  tail_files: false

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering

  ### Multiline options: a new log entry starts with a date; other lines are merged into the previous entry
  multiline.pattern: ^\[?\d{4}\-\d{2}\-\d{2}
  multiline.negate: true
  multiline.match: after

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "https://xxxxx:5601"
  username: "xxx"
  password: "xxxx"

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["xxxx:9200"]

  # Optional protocol and basic auth credentials.
  protocol: "http"
  username: "xxx"
  password: "xxx"

  # ES ingest pipeline used to pre-process the data; this can be set up later
  pipeline: log-pipe

# ......
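To see what the multiline settings do, here is a small Python sketch that imitates `multiline.negate: true` with `multiline.match: after`: a line that matches the pattern starts a new event, and every line that does not match is appended to the event before it. It is an approximation for illustration, not Filebeat's implementation. The sample log lines (including `com.example.Demo`) are invented, and treating a non-matching first line as its own event is an assumption.

```python
import re

# Same regular expression as multiline.pattern in the configuration.
PATTERN = re.compile(r"^\[?\d{4}\-\d{2}\-\d{2}")


def merge_multiline(lines):
    """Group raw lines into events.

    A line matching PATTERN starts a new event; any other line is appended
    to the current event (negate: true, match: after).
    """
    events = []
    for line in lines:
        if PATTERN.search(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


# Invented sample: one error with a stack trace, then a normal line.
sample = [
    "[2020-11-27 10:00:00:123] [ai-course] ERROR - request failed",
    "java.lang.NullPointerException: null",
    "    at com.example.Demo.run(Demo.java:42)",
    "[2020-11-27 10:00:01:456] [ai-course] INFO - recovered",
]

for event in merge_multiline(sample):
    print(repr(event))
```

With this sample the stack trace lines end up inside the first event, so the four raw lines become two events.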

Start Filebeat

Start Filebeat in debug mode first to check that it runs correctly:

./filebeat -e -d "*"

The output makes it easy to track down problems.

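Configure an ES pipeline to pre-process the log data

Before a log event is indexed, an ES ingest pipeline can pre-process it and split out some fields, which makes later searches easier.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/ingest.html

Define the pipeline in the Kibana developer tools (Dev Tools).

The grok processor defines the log formats to match: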

# set pipeline for common
# <pattern>[%d{yyyy-MM-dd HH:mm:ss:SSS}] [ai-course] [%level] [%thread] [%F.%M:%L] - %msg%n</pattern>
# <pattern>[%d{yyyy-MM-dd HH:mm:ss:SSS}] [ai-course] %level - %msg%n</pattern>
PUT _ingest/pipeline/log-pipe
{
  "description": "for log",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          """\[%{TIMESTAMP_ISO8601:log_time}\] \[%{NOTSPACE:server}\] \[%{LOGLEVEL:log_level}\] \[%{NOTSPACE:thread}\] \[%{NOTSPACE:java_class}\] - %{GREEDYDATA:content}""",
          """\[%{TIMESTAMP_ISO8601:log_time}\] \[%{NOTSPACE:server}\] %{LOGLEVEL:log_level} - %{GREEDYDATA:content}"""
        ],
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "log_time",
        "formats": ["yyyy-MM-dd HH:mm:ss:SSS"],
        "timezone": "Asia/Shanghai",
        "target_field": "@timestamp",
        "ignore_failure": true
      }
    }
  ]
}
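To make the field extraction concrete, the sketch below approximates the second (short) grok pattern with an ordinary Python regular expression and parses the timestamp the way the date processor's format describes. It is a rough stand-in, not grok itself: the regex is hand-written, the sample line is invented, the `timestamp` key is a name chosen for this example, and the `Asia/Shanghai` timezone handling of the real pipeline is left out.

```python
import re
from datetime import datetime

# Hand-written approximation of:
# \[%{TIMESTAMP_ISO8601:log_time}\] \[%{NOTSPACE:server}\] %{LOGLEVEL:log_level} - %{GREEDYDATA:content}
SHORT_FORMAT = re.compile(
    r"\[(?P<log_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3})\] "
    r"\[(?P<server>\S+)\] "
    r"(?P<log_level>[A-Za-z]+) - "
    r"(?P<content>.*)"
)


def parse_line(message):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = SHORT_FORMAT.match(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Counterpart of the date format yyyy-MM-dd HH:mm:ss:SSS (no timezone applied here).
    fields["timestamp"] = datetime.strptime(fields["log_time"], "%Y-%m-%d %H:%M:%S:%f")
    return fields


# Invented sample line in the short log format.
fields = parse_line("[2020-11-27 10:00:00:123] [ai-course] INFO - service started")
print(fields["server"], fields["log_level"], fields["content"])
print(fields["timestamp"].isoformat())
```

A line that matches neither format is left unparsed; in the pipeline, `ignore_failure: true` plays a similar role by letting the document through without the extra fields.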

Keep it running

See my earlier post on running Elasticsearch under supervisor:
https://blog.csdn.net/Ayhan_huang/article/details/100096183

To install supervisor:

The default CentOS 7 repositories do not include supervisor, so the EPEL repository has to be installed first:

https://cloudwafer.com/blog/how-to-install-and-configure-supervisor-on-centos-7/

For supervisor usage, see: https://blog.csdn.net/Ayhan_huang/article/details/79023553

Make sure the user that runs Filebeat has permissions on the Filebeat directory.
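For reference, a supervisor program entry for Filebeat might look like the one generated below. This is a sketch under stated assumptions: the install path `/opt/filebeat`, the `filebeat` user, and the log file location are placeholders, not values from this setup. The option names (`command`, `directory`, `user`, `autostart`, `autorestart`, `redirect_stderr`, `stdout_logfile`) are standard supervisor `[program:x]` settings. Python's `configparser` is used here only to print the INI text.

```python
import configparser
import io

# Placeholder install directory; adjust to the real location.
FILEBEAT_HOME = "/opt/filebeat"

config = configparser.ConfigParser()
config["program:filebeat"] = {
    # -e logs to stderr, -c points at the configuration file.
    "command": f"{FILEBEAT_HOME}/filebeat -e -c {FILEBEAT_HOME}/filebeat.yml",
    "directory": FILEBEAT_HOME,
    # Placeholder user; it needs permissions on FILEBEAT_HOME.
    "user": "filebeat",
    "autostart": "true",
    "autorestart": "true",
    "redirect_stderr": "true",
    # Placeholder log path.
    "stdout_logfile": "/var/log/supervisor/filebeat.log",
}

buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

The printed section would be saved as a file in a directory that the `[include]` section of the supervisord configuration picks up.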

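Once supervisor is configured, run the following commands to apply the configuration and start the process:

systemctl daemon-reload
systemctl restart supervisord

Use supervisorctl status to check the state of the filebeat process.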

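Periodically delete expired log indices from ES

By default Filebeat creates one index per day. With a high log volume these indices take up a lot of ES storage, so consider deleting older logs on a schedule.

Here this is done with a Python script run as a Linux cron job:

Linux cron reference: https://blog.csdn.net/Ayhan_huang/article/details/72833436

My Python script is below. It relies on two ES APIs:

GET _cat/indices: list the indices
DELETE /index_name: delete an index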

"""
Check the ES log indices and automatically delete those older than 3 days.
If there are 3 indices or fewer, nothing is deleted.
"""
import re
import sys

import requests

HOST = "xxxx:9200"
USERNAME = "xx"
PASSWORD = "xxx"
LOG_KEEP_DAYS = 3

URL = f"http://{USERNAME}:{PASSWORD}@{HOST}"

# List the indices:
# https://www.elastic.co/guide/en/elasticsearch/reference/6.3/cat-indices.html
res = requests.get(f"{URL}/_cat/indices/filebeat*?v&s=index")
indices = re.findall(r"(filebeat.*?)\s", res.text)

# Nothing to delete when there are no more indices than days to keep
if len(indices) <= LOG_KEEP_DAYS:
    sys.exit(0)

# Sort the indices by date (a plain string sort on the index name)
indices.sort()
print("logs:")
print(indices)

"""
Example result after the natural (string) sort:
[
'filebeat-6.3.2-1020.11.26',
'filebeat-6.3.2-2019.12.28',
'filebeat-6.3.2-2020.01.31',
'filebeat-6.3.2-2020.11.27',
'filebeat-6.3.2-2020.12.01',
'filebeat-6.3.2-2021.01.01'
]
"""

# Keep the most recent 3 days of logs and delete everything older
dropped_indices = indices[0: len(indices) - LOG_KEEP_DAYS]
print("del logs:")
print(dropped_indices)

# Delete the indices:
# https://www.elastic.co/guide/en/elasticsearch/reference/6.3/indices-delete-index.html
dropped_indices_str = ','.join(dropped_indices)
res = requests.delete(f"{URL}/{dropped_indices_str}")
print(res.text)
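The script above orders indices with a plain string sort, which works as long as every index shares the same version prefix. If indices from different Filebeat versions coexist (for example after an upgrade), the prefix can dominate the ordering. One possible variation, sketched below, sorts on the date suffix instead. The helper name `select_expired` and the mixed-version index names in the example are invented for illustration; the `yyyy.MM.dd` suffix format is taken from the index names shown in the script.

```python
import re
from datetime import datetime

# Matches the trailing date of names such as filebeat-6.3.2-2020.11.27.
DATE_SUFFIX = re.compile(r"(\d{4}\.\d{2}\.\d{2})$")


def select_expired(indices, keep):
    """Return the index names to delete, keeping the `keep` newest by date suffix."""
    dated = []
    for name in indices:
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # leave names without a date suffix alone rather than guess
        day = datetime.strptime(match.group(1), "%Y.%m.%d")
        dated.append((day, name))
    dated.sort()
    if len(dated) <= keep:
        return []
    return [name for _, name in dated[: len(dated) - keep]]


# Invented example with two different version prefixes.
example = [
    "filebeat-6.10.0-2020.12.01",
    "filebeat-6.3.2-2020.11.27",
    "filebeat-6.3.2-2021.01.01",
    "filebeat-6.3.2-2019.12.28",
]
print(select_expired(example, keep=3))
```

In the script, the result of `select_expired(indices, LOG_KEEP_DAYS)` could stand in for `dropped_indices`; the delete request itself stays the same.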