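Elasticsearch Log Monitoring Solutions
Most companies today store application, middleware, and system logs in Elasticsearch, so detecting anomalies in those logs and alerting on them promptly matters a great deal. This article introduces two mainstream log monitoring solutions: ElastAlert, open-sourced by Yelp, and Watcher, a commercial feature from Elastic.
The log source in this walkthrough is an Nginx server. Filebeat is installed on that server to collect the Nginx logs and ship them to Elasticsearch. ElastAlert and Watcher are then each used to monitor the logs and send alerts.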
Deploy Nginx
Install dependencies
yum install -y gcc gcc-c++ autoconf pcre pcre-devel make automake wget httpd-tools vim tree zlib-devel
Download the source package
wget http://nginx.org/download/nginx-1.14.0.tar.gz
tar -xzvf nginx-1.14.0.tar.gz
Compile and install
cd nginx-1.14.0
./configure
make && make install
Configure Nginx
Edit the configuration file /usr/local/nginx/conf/nginx.conf to serve a static website from Nginx.
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    server {
        listen 80;
        location / {
            root html;
        }
    }
}
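Start Nginx (run from the /usr/local/nginx directory):
sbin/nginx
Access Nginx in a browser or with curl to confirm that the page is served.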
Deploy Filebeat
Download and install Filebeat.
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.14.0-x86_64.rpm
sudo rpm -vi filebeat-7.14.0-x86_64.rpm
Edit the /etc/filebeat/filebeat.yml configuration file so that Filebeat reads the Nginx log files and writes them to an Elasticsearch index named nginx- followed by the current date.
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/local/nginx/logs/*.log
output.elasticsearch:
  hosts: ["192.168.1.8:9200"]
  index: "nginx-%{+yyyy.MM.dd}"
  #username: "elastic"
  #password: "changeme"
setup.ilm.enabled: false
setup.template.name: "nginx"
setup.template.pattern: "nginx-*"
Start Filebeat:
systemctl start filebeat
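To make the index naming concrete, here is a minimal Python sketch of how the pattern nginx-%{+yyyy.MM.dd} expands into a daily index name, and why the nginx-* wildcard used by the monitoring rules matches it. The helper name daily_index is hypothetical and exists only for illustration; Filebeat performs this expansion itself.

```python
from datetime import date
from fnmatch import fnmatch


def daily_index(prefix: str, day: date) -> str:
    """Build a daily index name such as nginx-2021.08.16 (illustrative helper)."""
    return f"{prefix}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    name = daily_index("nginx", date(2021, 8, 16))
    print(name)
    # The wildcard used by the alerting rules covers every daily index.
    print(fnmatch(name, "nginx-*"))
```

One index per day keeps each index small and lets a single wildcard cover all of them.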
ElastAlert
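ElastAlert is an Elasticsearch alerting framework written in Python and open-sourced by Yelp. It queries Elasticsearch for data that matches a rule and sends alerts.
ElastAlert has the following features:
- Multiple rule types (frequency, threshold, data change, blacklist/whitelist, rate of change, and more).
- Multiple alert types (email, HTTP POST, custom scripts, and more).
- User-defined rule types and alert types.
- Aggregation of matches into a single alert, suppression of repeated alerts, and retry and expiry of failed alerts.
- High availability: state is stored in an Elasticsearch index.
- Debugging and auditing support.
Deploy ElastAlert
Install Python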
wget https://www.python.org/ftp/python/3.6.9/Python-3.6.9.tgz
tar -zxvf Python-3.6.9.tgz
cd Python-3.6.9
./configure
make && make install
Check the Python version:
python3 -V
Install dependencies
yum install gcc libffi-devel python3-devel openssl-devel -y
pip3 install -U pip
pip3 install "setuptools>=11.3"
Install ElastAlert
pip3 install elastalert
Configure ElastAlert
Clone the code locally:
git clone https://github.com/Yelp/elastalert.git
cd elastalert
The root of the ElastAlert source tree contains a file named config.yaml.example. Rename it to config.yaml:
mv config.yaml.example config.yaml
Create a directory to hold the rules. The remaining commands are run from the elastalert root directory.
mkdir rules
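Edit config.yaml and adjust the main settings:
# Directory that holds the rules
rules_folder: rules
# How often ElastAlert queries Elasticsearch
run_every:
  minutes: 1
# ElastAlert keeps a buffer of recent results in case some log sources are not real-time
buffer_time:
  minutes: 45
# Elasticsearch host
es_host: 192.168.1.8
# Elasticsearch port
es_port: 9200
# Elasticsearch username and password (optional)
#es_username: someusername
#es_password: somepassword
# Index in which ElastAlert stores its metadata
writeback_index: elastalert_status
# If an alert fails for some reason, ElastAlert retries sending it until this period has elapsed
alert_time_limit:
  days: 2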
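As a rough illustration of buffer_time, the sketch below computes the time range that a single query covers when the window simply reaches back buffer_time from the current time. This is a simplification under stated assumptions: the function query_window is hypothetical, and the real scheduler also keeps track of where the previous run ended. With the 45-minute value configured above, a run at 15:39 reaches back to 14:54, which is the range that appears in the elastalert output later in this article.

```python
from datetime import datetime, timedelta


def query_window(now: datetime, buffer_time: timedelta = timedelta(minutes=45)):
    """Return the (start, end) range covered by one query (simplified model)."""
    return now - buffer_time, now


if __name__ == "__main__":
    start, end = query_window(datetime(2021, 8, 16, 15, 39))
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
```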
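Create the file rules/nginx.yaml and define the rule.
The rule reads: if the message field of documents in the nginx-* indices matches error 5 times within 1 minute, trigger an alert by sending an HTTP POST request to the specified URL.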
# Alert when the rate of events exceeds a threshold
# (Required)
# Elasticsearch host
es_host: 192.168.1.8
# (Required)
# Elasticsearch port
es_port: 9200
# (Optional) Connect with SSL to elasticsearch
#use_ssl: True
# (Optional) basic-auth username and password for elasticsearch
#es_username: someusername
#es_password: somepassword
# (Required)
# Rule name, must be unique
name: nginx rule
# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur within timeframe time
type: frequency
# (Required)
# Index to search, wildcard supported
index: nginx-*
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 5
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  minutes: 1
# (Required)
# A list of elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
- term:
    message: "error"
# (Required)
# The alert is used when a match is found
alert:
- "post"
http_post_url: "https://webhook.site/2f64f4b3-8b43-488c-b2df-695136079e36"
https://webhook.site provides a webhook endpoint for testing. Each visitor gets a unique URL; copy yours into http_post_url.
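The frequency rule in rules/nginx.yaml can be pictured as a sliding window over event timestamps. The following Python sketch is a simplified model of that idea, not ElastAlert's actual implementation: the function frequency_matches and its reset-after-match behaviour are assumptions made for illustration, with num_events and timeframe mirroring the rule's settings.

```python
from collections import deque
from datetime import datetime, timedelta


def frequency_matches(timestamps, num_events=5, timeframe=timedelta(minutes=1)):
    """Return the timestamps at which num_events events fell within timeframe."""
    window = deque()
    matches = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events that are older than the timeframe relative to the newest one.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            matches.append(ts)
            window.clear()  # assumption: start counting afresh after a match
    return matches


if __name__ == "__main__":
    base = datetime(2021, 8, 16, 7, 34, 0)
    burst = [base + timedelta(seconds=i) for i in range(5)]
    print(frequency_matches(burst))
```

Five errors one second apart produce one match, while five errors spread 30 seconds apart never put five events inside the same one-minute window.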
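ElastAlert stores its execution records in an index, which makes auditing and debugging easier. Use the following command to create that index; by default it is named elastalert_status.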
root@ydt-net-es-node1:/software # elastalert-create-index
Enter Elasticsearch host: 192.168.1.8
Enter Elasticsearch port: 9200
Use SSL? t/f: f
# Enter a username and password if authentication is enabled
Enter optional basic-auth username (or leave blank):
Enter optional basic-auth password (or leave blank):
Enter optional Elasticsearch URL prefix (prepends a string to the URL of every request):
New index name? (Default elastalert_status)
New alias name? (Default elastalert_alerts)
Name of existing index to copy? (Default None)
Elastic Version: 7.9.3
Reading Elastic 6 index mappings:
Reading index mapping 'es_mappings/6/silence.json'
Reading index mapping 'es_mappings/6/elastalert_status.json'
Reading index mapping 'es_mappings/6/elastalert.json'
Reading index mapping 'es_mappings/6/past_elastalert.json'
Reading index mapping 'es_mappings/6/elastalert_error.json'
New index elastalert_status created
Done!
Send 2 requests: one valid and one that produces an error.
> curl http://192.168.1.134 -I
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Mon, 16 Aug 2021 07:28:42 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Wed, 16 Jun 2021 02:46:13 GMT
Connection: keep-alive
ETag: "60c965f5-264"
Accept-Ranges: bytes
> curl http://192.168.1.134/xxxxxx -I
HTTP/1.1 404 Not Found
Server: nginx/1.14.2
Date: Mon, 16 Aug 2021 07:28:43 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
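Kibana shows the Nginx logs. The failed request is written once to access.log and once to error.log, so 3 records appear here.
Run the elastalert-test-rule command to check that the rule file is valid and to see how many times the rule matches. elastalert-test-rule does not actually send alerts.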
> elastalert-test-rule rules/nginx.yaml
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
Didn't get any results.
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
1 rules loaded
INFO:apscheduler.scheduler:Adding job tentatively -- it will be properly scheduled when the scheduler starts
# One hit
INFO:elastalert:Queried rule nginx rule from 2021-08-16 15:28 CST to 2021-08-16 15:29 CST: 1 / 1 hits
Would have written the following documents to writeback index (default is elastalert_status):
elastalert_status - {'rule_name': 'nginx rule', 'endtime': datetime.datetime(2021, 8, 16, 7, 29, 30, 422431, tzinfo=tzutc()), 'starttime': datetime.datetime(2021, 8, 16, 7, 28, 29, 822431, tzinfo=tzutc()), 'matches': 0, 'hits': 1, '@timestamp': datetime.datetime(2021, 8, 16, 7, 29, 30, 527080, tzinfo=tzutc()), 'time_taken': 0.02203655242919922}
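Three records were indexed, yet the rule reports a single hit. A plausible reading is that only the error.log line contains the token error. The Python sketch below approximates this with a crude tokenizer; it is not Elasticsearch's real analyzer. ERROR_LINE is the error.log message that appears later in this article, while ACCESS_LINE is a hypothetical access.log entry in Nginx's default format.

```python
import re

# error.log entry, as shown in the alert output later in this article
ERROR_LINE = (
    '2021/08/16 15:34:22 [error] 4022#0: *40 open() "/usr/local/nginx/html/xxxxxx" '
    'failed (2: No such file or directory), client: 192.168.1.35, server: , '
    'request: "GET /xxxxxx HTTP/1.1", host: "192.168.1.134"'
)
# Hypothetical access.log entry for the same request (default combined format)
ACCESS_LINE = (
    '192.168.1.35 - - [16/Aug/2021:15:34:22 +0800] "GET /xxxxxx HTTP/1.1" '
    '404 169 "-" "curl/7.29.0"'
)


def tokens(text):
    """Lowercase alphanumeric tokens: a rough stand-in for a text analyzer."""
    return {t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)}


def matches_term(message, term):
    """True if the exact term is one of the message's tokens."""
    return term in tokens(message)


if __name__ == "__main__":
    print(matches_term(ERROR_LINE, "error"))
    print(matches_term(ACCESS_LINE, "error"))
```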
Send 5 failing requests in a row within 1 minute to reach the alert threshold:
for i in {1..5};do curl http://192.168.1.134/xxxxxx -I;done
The format of the alert that would be sent is now visible.
> elastalert-test-rule rules/nginx.yaml
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
Didn't get any results.
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
1 rules loaded
INFO:apscheduler.scheduler:Adding job tentatively -- it will be properly scheduled when the scheduler starts
INFO:elastalert:Queried rule nginx rule from 2021-08-16 15:33 CST to 2021-08-16 15:34 CST: 5 / 5 hits
INFO:elastalert:Alert for nginx rule at 2021-08-16T07:34:26.230Z:
INFO:elastalert:nginx rule
At least 5 events occurred between 2021-08-16 15:33 CST and 2021-08-16 15:34 CST
@timestamp: 2021-08-16T07:34:26.230Z
_id: 0CDiTXsBCANUjLffFM2O
_index: nginx-2021.08.16
_type: _doc
agent: {
    "ephemeral_id": "4ee4bd89-cb8e-43fb-9331-476c229a5480",
    "hostname": "nginx-plus1",
    "id": "629442a8-34ab-40db-80a8-16e4fda8dec7",
    "name": "nginx-plus1",
    "type": "filebeat",
    "version": "7.14.0"
}
ecs: {
    "version": "1.10.0"
}
host: {
    "name": "nginx-plus1"
}
input: {
    "type": "log"
}
log: {
    "file": {
        "path": "/usr/local/nginx/logs/error.log"
    },
    "offset": 16944
}
message: 2021/08/16 15:34:22 [error] 4022#0: *40 open() "/usr/local/nginx/html/xxxxxx" failed (2: No such file or directory), client: 192.168.1.35, server: , request: "GET /xxxxxx HTTP/1.1", host: "192.168.1.134"
num_hits: 5
num_matches: 1
Would have written the following documents to writeback index (default is elastalert_status):
silence - {'exponent': 0, 'rule_name': 'nginx rule', '@timestamp': datetime.datetime(2021, 8, 16, 7, 34, 42, 866184, tzinfo=tzutc()), 'until': datetime.datetime(2021, 8, 16, 7, 35, 42, 866174, tzinfo=tzutc())}
elastalert_status - {'rule_name': 'nginx rule', 'endtime': datetime.datetime(2021, 8, 16, 7, 34, 42, 810992, tzinfo=tzutc()), 'starttime': datetime.datetime(2021, 8, 16, 7, 33, 42, 210992, tzinfo=tzutc()), 'matches': 1, 'hits': 5, '@timestamp': datetime.datetime(2021, 8, 16, 7, 34, 42, 868045, tzinfo=tzutc()), 'time_taken': 0.015259981155395508}
Run elastalert with the following command; the output shows that the alert is triggered:
> elastalert --verbose --rule rules/nginx.yaml
1 rules loaded
INFO:elastalert:Starting up
INFO:elastalert:Disabled rules are: []
INFO:elastalert:Sleeping for 59.999839 seconds
INFO:elastalert:Queried rule nginx rule from 2021-08-16 14:54 CST to 2021-08-16 15:39 CST: 7 / 7 hits
INFO:elastalert:HTTP Post alert sent.
INFO:elastalert:Ran nginx rule from 2021-08-16 14:54 CST to 2021-08-16 15:39 CST: 7 query hits (0 already seen), 1 matches, 1 alerts sent
Open https://webhook.site to see the HTTP POST request sent by ElastAlert.
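When an external test service is not reachable, a small local receiver can play the same role. The following Python sketch uses only the standard library; the names AlertHandler, start_receiver, and post_text are hypothetical, and the listening address is an assumption. Pointing http_post_url at such a receiver would be one way to inspect the request body locally.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # request bodies captured by the handler


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length).decode("utf-8"))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def start_receiver(port=0):
    """Start the receiver in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AlertHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_text(url, text):
    """POST a text body, bypassing any proxy settings, and return the status."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, data=text.encode("utf-8"), method="POST")
    with opener.open(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    server = start_receiver()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    print(post_text(url, '{"rule_name": "nginx rule"}'), received)
    server.shutdown()
```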
Query the elastalert_status index to see ElastAlert's execution records.
GET elastalert_status/_search
# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "elastalert_status",
        "_type" : "_doc",
        "_id" : "1SDmTXsBCANUjLff0M1Q",
        "_score" : 1.0,
        "_source" : {
          "match_body" : {
            "input" : {
              "type" : "log"
            },
            "agent" : {
              "hostname" : "nginx-plus1",
              "name" : "nginx-plus1",
              "id" : "629442a8-34ab-40db-80a8-16e4fda8dec7",
              "ephemeral_id" : "4ee4bd89-cb8e-43fb-9331-476c229a5480",
              "type" : "filebeat",
              "version" : "7.14.0"
            },
            "@timestamp" : "2021-08-16T07:34:26.230Z",
            "ecs" : {
              "version" : "1.10.0"
            },
            "log" : {
              "file" : {
                "path" : "/usr/local/nginx/logs/error.log"
              },
              "offset" : 16740
            },
            "host" : {
              "name" : "nginx-plus1"
            },
            "message" : "2021/08/16 15:34:22 [error] 4022#0: *39 open() \"/usr/local/nginx/html/xxxxxx\" failed (2: No such file or directory), client: 192.168.1.35, server: , request: \"GET /xxxxxx HTTP/1.1\", host: \"192.168.1.134\"",
            "_id" : "zyDiTXsBCANUjLffFM2O",
            "_index" : "nginx-2021.08.16",
            "_type" : "_doc",
            "num_hits" : 7,
            "num_matches" : 1
          },
          "rule_name" : "nginx rule",
          "alert_info" : {
            "type" : "http_post",
            "http_post_webhook_url" : [
              "https://webhook.site/2f64f4b3-8b43-488c-b2df-695136079e36"
            ]
          },
          "alert_sent" : true,
          "alert_time" : "2021-08-16T07:39:35.185929Z",
          "match_time" : "2021-08-16T07:34:26.230Z",
          "@timestamp" : "2021-08-16T07:39:37.418536Z"
        }
      }
    ]
  }
}
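For auditing, the fields of interest in such a response are usually the rule name, whether the alert was sent, and when the match occurred. The Python sketch below pulls those fields out of a response that has already been parsed into a dictionary; the function summarize_alerts is a hypothetical helper, and fetching the response from Elasticsearch is left out.

```python
def summarize_alerts(response):
    """Return (rule_name, alert_sent, match_time) for each hit in a search response."""
    rows = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        rows.append(
            (source.get("rule_name"), source.get("alert_sent"), source.get("match_time"))
        )
    return rows


if __name__ == "__main__":
    # Trimmed-down copy of the response shown above
    sample = {
        "hits": {
            "hits": [
                {
                    "_source": {
                        "rule_name": "nginx rule",
                        "alert_sent": True,
                        "match_time": "2021-08-16T07:34:26.230Z",
                    }
                }
            ]
        }
    }
    print(summarize_alerts(sample))
```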
Watcher
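Watcher is an official Elastic feature for monitoring log data and alerting on it. Watcher is a paid feature; a 30-day trial can be enabled under License Management.
A watch consists of the following 5 parts:
- trigger: defines when, or how often, the watch is triggered.
- input: defines the data source, such as an index or the result of an HTTP request. If no input is set, the payload is empty.
- condition: defines the condition under which the actions run. If no condition is set, the actions always run.
- transform (optional): modifies the watch payload.
- actions: defines what to do, for example email, webhook, index, logging, slack, and so on.
Create a watch:
- trigger: run once every minute.
- input: search the indices matching the nginx-* wildcard for the keyword error in the message field, each time covering events from the past 5 minutes.
- condition: if the query returns at least 1 match, trigger the action.
- action: send an HTTP POST request to the specified URL.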
PUT _watcher/watch/nginx-watcher
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["nginx-*"],
        "body": {
          "query": {
            "bool": {
              "must": {
                "match": {
                  "message": "error"
                }
              },
              "filter": {
                "range": {
                  "@timestamp": {
                    "from": "{{ctx.trigger.scheduled_time}}||-5m",
                    "to": "{{ctx.trigger.triggered_time}}"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "my_webhook": {
      "throttle_period": "2m",
      "webhook": {
        "method": "POST",
        "url": "https://webhook.site/2f64f4b3-8b43-488c-b2df-695136079e36",
        "body": "Number of Nginx Error: {{ctx.payload.hits.total}}"
      }
    }
  }
}
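To see how the condition and the webhook body relate to the search result, the Python sketch below mimics two steps of the watch: the compare condition on ctx.payload.hits.total and the substitution of the {{...}} placeholder in the body. It is a toy model under stated assumptions, not Watcher's implementation; lookup, compare_gt, and render are hypothetical helpers, and only simple dotted paths are handled.

```python
import re


def lookup(ctx, path):
    """Resolve a dotted path such as ctx.payload.hits.total against a dict."""
    value = ctx
    for key in path.split(".")[1:]:  # skip the leading "ctx"
        value = value[key]
    return value


def compare_gt(ctx, path, threshold):
    """Model of a compare condition using the gt operator."""
    return lookup(ctx, path) > threshold


def render(template, ctx):
    """Replace {{ctx.some.path}} placeholders with values from the context."""
    return re.sub(
        r"\{\{(ctx[\w.]*)\}\}",
        lambda match: str(lookup(ctx, match.group(1))),
        template,
    )


if __name__ == "__main__":
    ctx = {"payload": {"hits": {"total": 5}}}
    if compare_gt(ctx, "ctx.payload.hits.total", 0):
        print(render("Number of Nginx Error: {{ctx.payload.hits.total}}", ctx))
```

With 5 hits in the payload the condition holds and the body is rendered with that count; with 0 hits the action would be skipped.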
View the watch that was just created.
Send 5 failing requests in a row within 1 minute.
for i in {1..5};do curl http://192.168.1.134/xxxxxx -I;done
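Check the watch status; it shows that the action was triggered.
Open https://webhook.site to see that the latest webhook event has fired, and that its Raw Content matches the body defined in the watch.
If the watch interval is long, testing would mean waiting for the next run. Elasticsearch therefore provides the _execute API, which runs the watch immediately: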
PUT _watcher/watch/nginx-watcher/_execute