Prometheus alerting via DingTalk

Install Alertmanager

cd /opt
# wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.20.0.linux-amd64.tar.gz
mv alertmanager-0.20.0.linux-amd64 /data/alertmanager
# Fix ownership; it is best to rerun this after adding any new config file as well
chown -R prometheus:prometheus /data/

Configure Alertmanager


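# Use ">" rather than ">>": appending to the alertmanager.yml shipped in the tarball would duplicate keys
cat > /data/alertmanager/alertmanager.yml <<'EOF'
# Global settings
global:
  resolve_timeout: 5m     # resolve timeout, default 5m
# Routing tree
route:
  group_by: [alertname]   # label(s) used to group alerts
  receiver: ops_notify    # default receiver
  group_wait: 30s         # how long to wait before sending the first notification for a new group
  group_interval: 60s     # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h     # how long to wait before re-sending an unchanged alert; default 4h
  routes:
  - receiver: ops_notify  # basic alerts
    group_wait: 10s
    match_re:
      alertname: 实例存活告警|磁盘使用率告警   # matched against the alert names in the rule files
  - receiver: info_notify # informational alerts
    group_wait: 10s
    match_re:
      alertname: 内存使用率告警|CPU使用率告警
# Receiver for basic alerts
receivers:
- name: ops_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/ops_dingding/send
    send_resolved: true   # also notify when the alert is resolved
# Receiver for informational alerts
- name: info_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/info_dingding/send
    send_resolved: true
# An inhibition rule mutes alerts matching one set of matchers (target_match)
# while an alert matching another set (source_match) is firing.
# Both alerts must have identical values for the labels listed in `equal`.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
EOF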
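To make the routing tree above concrete, here is a small shell sketch of how an alert's alertname picks a receiver. The helper name route_receiver is hypothetical and not part of Alertmanager; it models only three behaviours: child routes are tried in order, match_re values are fully anchored, and anything unmatched falls back to the root receiver ops_notify. Grouping, timing and the continue flag are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimic the route tree in alertmanager.yml above.
# Alertmanager anchors match_re values, hence the ^( ... )$ around them.
route_receiver() {
  local alertname=$1
  # Child routes are tried in order and the first match wins
  # (none of them sets continue: true).
  if printf '%s\n' "$alertname" | grep -Eq '^(实例存活告警|磁盘使用率告警)$'; then
    echo ops_notify
  elif printf '%s\n' "$alertname" | grep -Eq '^(内存使用率告警|CPU使用率告警)$'; then
    echo info_notify
  else
    # No child route matched: the root route's receiver handles it.
    echo ops_notify
  fi
}

# Usage:
#   route_receiver CPU使用率告警   # prints info_notify
```

Because of the anchoring, a name such as "X内存使用率告警" does not match the second route and ends up at ops_notify.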

Modify the Prometheus configuration as follows:

cat > /data/prometheus/conf/prometheus.yml <<'EOF'
global:
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.0.25:9093   # change this to your Alertmanager address
rule_files:
- "/data/prometheus/conf/rule*.yml"
scrape_configs:
# job 'prometheus' scrapes the Prometheus server itself
- job_name: 'prometheus'
  static_configs:
  - targets: ['localhost:9090']
# job 'node_exporter' scrapes a fixed IP:PORT
- job_name: 'node_exporter'
  static_configs:
  - targets: ['localhost:9100']
# job 'consul' discovers services from Consul: under consul_sd_configs, `server` is the
# Consul address and `services` is a list of service names (if omitted, all registered services are used)
- job_name: 'consul'
  consul_sd_configs:
  #- server: 'localhost:8500'
  - server: '192.168.0.25:8500'
    services: ['test']
  # relabel_configs: of the services registered in Consul, only keep those whose tags match the regex
  #relabel_configs:
  #  - source_labels: [__meta_consul_tags]
  #    regex: .*test.*
  #    action: keep
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx]  # Look for a HTTP 200 response.
  file_sd_configs:
  - refresh_interval: 1m
    files:
    - "/data/prometheus/conf/blackbox*.yml"
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 192.168.0.25:9115   # The blackbox exporter's real hostname:port.
EOF
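The blackbox job reads its targets from /data/prometheus/conf/blackbox*.yml, but no such file is shown here. The sketch below writes one, assuming the standard file_sd YAML layout (a list of groups, each with a targets list). The helper name write_blackbox_targets and the example URLs are hypothetical. The relabel rules above then do three things with each URL: copy it into the target URL parameter, copy it into the instance label, and point the actual scrape at the blackbox exporter on 192.168.0.25:9115.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a file_sd target file for the 'blackbox' job.
# $1 = output file, remaining args = URLs to probe with the http_2xx module.
write_blackbox_targets() {
  local out=$1 url
  shift
  {
    echo "- targets:"
    for url in "$@"; do
      echo "  - $url"
    done
  } > "$out"
}

# Usage (the file name must match blackbox*.yml to be picked up):
#   write_blackbox_targets /data/prometheus/conf/blackbox_web.yml \
#     https://www.example.com https://api.example.com
```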

Create the alerting rules (you can also define your own)

# Quote the heredoc delimiter ('EOF') so the shell leaves {{ $labels... }} and {{ $value }} untouched
# Instance-down alert
cat > /data/prometheus/conf/rule_node_down.yml <<'EOF'
groups:
- name: 实例存活告警规则
  rules:
  - alert: 实例存活告警
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
EOF
# Memory alert
cat > /data/prometheus/conf/rule_memory_over.yml <<'EOF'
groups:
- name: 内存报警规则
  rules:
  - alert: 内存使用率告警
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: memory usage above 80%! (current value: {{ $value }}%)"
EOF
# Disk alert
cat > /data/prometheus/conf/rule_disk_over.yml <<'EOF'
groups:
- name: 磁盘报警规则
  rules:
  - alert: 磁盘使用率告警
    expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: disk device: usage above 80%! (mount point: {{ $labels.mountpoint }}, current value: {{ $value }}%)"
EOF
# CPU alert
cat > /data/prometheus/conf/rule_cpu_over.yml <<'EOF'
groups:
- name: CPU报警规则
  rules:
  - alert: CPU使用率告警
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: CPU usage above 90%! (current value: {{ $value }}%)"
EOF
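Before restarting Prometheus, it can be worth validating the new rule files. promtool ships in the Prometheus tarball, and "promtool check rules FILE" validates a rule file. The loop below is a small sketch that runs this on every rule*.yml in a directory and stops at the first failure. The function name check_rule_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate every rule*.yml in a directory with promtool.
# Returns non-zero as soon as one file fails the check.
check_rule_files() {
  local dir=${1:-/data/prometheus/conf} f
  for f in "$dir"/rule*.yml; do
    [ -e "$f" ] || continue          # the glob matched nothing
    promtool check rules "$f" || return 1
  done
}

# Usage:
#   check_rule_files /data/prometheus/conf && systemctl restart prometheus
```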

Fix file ownership and restart Prometheus

# Fix ownership; it is best to rerun this after adding any new config file as well
chown -R prometheus:prometheus /data/
systemctl restart prometheus

Start Alertmanager

cat > /lib/systemd/system/alertmanager.service <<'EOF'
[Unit]
Description=Prometheus: the alerting system
Documentation=http://prometheus.io/docs/
After=prometheus.service

[Service]
ExecStart=/data/alertmanager/alertmanager --config.file=/data/alertmanager/alertmanager.yml
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl stop alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service
# Check the port
netstat -anpt | grep 9093

Connect DingTalk to the Prometheus Alertmanager webhook

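# Test the robot from the command line to check that messages go through. prometheus-webhook-dingtalk sometimes
# reports a 422 error; that is caused by DingTalk's security settings (here the policy requires every message to contain "prometheus").
# Set DINGTALK_TOKEN to your robot's access token first.
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' "https://oapi.dingtalk.com/robot/send?access_token=${DINGTALK_TOKEN}"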
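The curl test above can be wrapped in small helpers that build the JSON body safely and check the keyword requirement before anything is sent. This is a sketch under several assumptions. The function names ding_payload and ding_send are hypothetical. The default keyword "prometheus" matches this robot's security setting. The escaping covers only backslashes and double quotes, not newlines or other control characters.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for testing a DingTalk robot by hand.

# Build a DingTalk "text" message body; escapes \ and " for JSON.
ding_payload() {
  local text
  text=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$text"
}

# Send a message, refusing locally if it lacks the robot's security keyword.
ding_send() {
  local token=$1 text=$2 keyword=${3:-prometheus}
  case $text in
    *"$keyword"*) ;;
    *) echo "message must contain keyword '$keyword'" >&2; return 1 ;;
  esac
  curl -s -H 'Content-Type: application/json' \
    -d "$(ding_payload "$text")" \
    "https://oapi.dingtalk.com/robot/send?access_token=${token}"
}

# Usage:
#   ding_send "$DINGTALK_TOKEN" "prometheus alert test"
```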

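If you get an error at this point, it is probably a punctuation problem: check whether full-width (Chinese) symbols crept into the command.

If the robot rejects the request because of its security settings, add the server's IP to the robot's whitelist in DingTalk.

After that, the message is sent successfully.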

Deploy the prometheus-webhook-dingtalk plugin from a binary

Here I simply use a binary I compiled myself:

# Use the prometheus-webhook-dingtalk binary I compiled myself
mkdir -p /data/alertmanager/prometheus-webhook-dingtalk
cp /opt/prometheus-webhook-dingtalk /data/alertmanager/prometheus-webhook-dingtalk/
cat > /etc/systemd/system/prometheus-webhook-dingtalk.service <<'EOF'
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
# One --ding.profile per robot; the names must match the /dingtalk/<name>/send URLs in alertmanager.yml
ExecStart=/data/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk \
  --ding.profile=ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=<ops-robot-token> \
  --ding.profile=info_dingding=https://oapi.dingtalk.com/robot/send?access_token=<info-robot-token>

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl stop prometheus-webhook-dingtalk
systemctl restart prometheus-webhook-dingtalk
systemctl status prometheus-webhook-dingtalk
netstat -nltup | grep 8060
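With several robots, the --ding.profile flags get long and easy to mistype. A small sketch that turns name=token pairs into the --ding.profile=NAME=WEBHOOK_URL flags that prometheus-webhook-dingtalk v0.3.x takes. The helper name ding_profile_args and the OPS_TOKEN/INFO_TOKEN variables in the usage line are hypothetical. The profile names must match the /dingtalk/NAME/send paths in alertmanager.yml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one --ding.profile flag per name=token pair.
ding_profile_args() {
  local pair name token
  for pair in "$@"; do
    name=${pair%%=*}    # text before the first '='
    token=${pair#*=}    # text after the first '='
    printf '%s\n' "--ding.profile=${name}=https://oapi.dingtalk.com/robot/send?access_token=${token}"
  done
}

# Usage: copy the printed flags into ExecStart, one per continued line.
#   ding_profile_args "ops_dingding=$OPS_TOKEN" "info_dingding=$INFO_TOKEN"
```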

Shut down one machine to test the alert.


The notification after the machine recovers.

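One flaw remains: when you click Graph in the alert, the link still uses the Prometheus host name, so the source needs modifying to substitute the domain name. (Setting Prometheus's --web.external-url flag to its externally reachable URL may be enough, since Prometheus builds the links it generates from it.)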
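Shutting down a machine works. You can also exercise the whole chain (Alertmanager, the webhook plugin and DingTalk) by posting a synthetic alert to Alertmanager's v2 API, POST /api/v2/alerts, which exists in the 0.19/0.20 releases used here. The sketch below makes a few assumptions: Alertmanager listens on 192.168.0.25:9093 (as in prometheus.yml above), and the helper names test_alert_json and send_test_alert and the example instance value are hypothetical. The alertname reuses one of the rule names, so the routing tree applies. The severity and user labels copy the rule files. Because no endsAt is sent, Alertmanager should mark the alert resolved after resolve_timeout (5m here), which also exercises send_resolved.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: fire a fake alert at Alertmanager's v2 API.

# Print the JSON body for one alert (arguments must not contain '"').
test_alert_json() {
  local alertname=$1 instance=$2
  printf '[{"labels":{"alertname":"%s","severity":"warning","user":"prometheus","instance":"%s"},"annotations":{"description":"synthetic test alert for the prometheus DingTalk chain"}}]' \
    "$alertname" "$instance"
}

# POST it; without endsAt, Alertmanager resolves it after resolve_timeout.
send_test_alert() {
  local am=${AM_URL:-http://192.168.0.25:9093}
  curl -s -H 'Content-Type: application/json' \
    -d "$(test_alert_json "$1" "${2:-test-host:9100}")" "$am/api/v2/alerts"
}

# Usage:
#   send_test_alert 实例存活告警 192.168.0.99:9100
```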


----------

The following reference article is copied here in case the original link goes dead:

https://blog.51cto.com/51reboot/2449530

I. Installing and configuring Prometheus

1. Download and extract the package

cd /usr/local/src/
export VER="2.13.1"
wget https://github.com/prometheus/prometheus/releases/download/v${VER}/prometheus-${VER}.linux-amd64.tar.gz
mkdir -p /data0/prometheus
groupadd prometheus
useradd -g prometheus -d /data0/prometheus prometheus
tar -xvf prometheus-${VER}.linux-amd64.tar.gz
cd /usr/local/src/
mv prometheus-${VER}.linux-amd64 /data0/prometheus/prometheus_server
cd /data0/prometheus/prometheus_server/
mkdir -p {data,config,logs,bin}
mv prometheus promtool bin/
mv prometheus.yml config/
chown -R prometheus:prometheus /data0/prometheus

2. Set environment variables

vim /etc/profile
# add the following line:
PATH=/data0/prometheus/prometheus_server/bin:$PATH:$HOME/bin
source /etc/profile

3. Check the configuration file

promtool check config /data0/prometheus/prometheus_server/config/prometheus.yml
# Output:
# Checking /data0/prometheus/prometheus_server/config/prometheus.yml
# SUCCESS: 0 rule files found

4. Create the systemd unit file prometheus.service


4.1 As a regular systemd service

sudo tee /etc/systemd/system/prometheus.service <<-'EOF'
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/prometheus_server/bin/prometheus --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml --storage.tsdb.path=/data0/prometheus/prometheus_server/data --storage.tsdb.retention.time=60d
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable prometheus.service
systemctl stop prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service

4.2 Managing prometheus_server with supervisor (an alternative to 4.1)

# supervisor comes from EPEL, so enable EPEL first
yum install -y epel-release
yum install -y supervisor
sudo tee /etc/supervisord.d/prometheus.ini <<-"EOF"
[program:prometheus]
# Command used to start the program
command = /data0/prometheus/prometheus_server/bin/prometheus --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml --storage.tsdb.path=/data0/prometheus/prometheus_server/data --storage.tsdb.retention.time=60d
# Start automatically when supervisord starts
autostart = true
# Restart automatically after an abnormal exit
autorestart = true
# Consider the program started if it has not exited within 5 seconds
startsecs = 5
# Number of retries on startup failure, default 3
startretries = 3
# User to run the program as
user = prometheus
# Redirect stderr to stdout, default false (while this is on, stderr_logfile below is not used)
redirect_stderr = true
# Standard output log
stdout_logfile=/data0/prometheus/prometheus_server/logs/out-prometheus.log
# Error log
stderr_logfile=/data0/prometheus/prometheus_server/logs/err-prometheus.log
# Maximum size of the stdout log file, default 50MB
stdout_logfile_maxbytes = 20MB
# Number of stdout log backups to keep
stdout_logfile_backups = 20
EOF
systemctl daemon-reload
systemctl enable supervisord
systemctl stop supervisord
systemctl restart supervisord
supervisorctl restart prometheus
supervisorctl status

5. The prometheus.yml configuration file

# Create the alerting rule files
mkdir -p /data0/prometheus/prometheus_server/rules/
touch /data0/prometheus/prometheus_server/rules/node_down.yml
touch /data0/prometheus/prometheus_server/rules/memory_over.yml
touch /data0/prometheus/prometheus_server/rules/disk_over.yml
touch /data0/prometheus/prometheus_server/rules/cpu_over.yml
# Prometheus configuration file
cat > /data0/prometheus/prometheus_server/config/prometheus.yml << \EOF
# my global config
global:
  scrape_interval: 15s      # scrape (pull) interval, default 1m
  evaluation_interval: 15s  # rule evaluation interval, default 1m
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.56.11:9093   # change this to your Alertmanager address

# Load the rule files and evaluate them periodically at evaluation_interval
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/data0/prometheus/prometheus_server/rules/node_down.yml"    # instance-down rules
  - "/data0/prometheus/prometheus_server/rules/memory_over.yml"  # memory rules
  - "/data0/prometheus/prometheus_server/rules/disk_over.yml"    # disk rules
  - "/data0/prometheus/prometheus_server/rules/cpu_over.yml"     # CPU rules

# Scrape (pull) configuration, i.e. the monitoring targets.
# By default only the host itself is monitored.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # Overrides the global scrape interval (15s) with 10s for this job.
    scrape_interval: 10s
    static_configs:
    - targets: ['localhost:9090', 'localhost:9100']

  - job_name: 'DMC_HOST'
    file_sd_configs:
    # Monitored hosts: you could list every machine under static_configs, but here they are read from a file via file_sd_configs.
    # The file can be JSON or YAML (JSON here); put each machine's IP:port in targets. Labels are optional and up to you.
    - files: ['./hosts.json']
EOF
# Host list for file_sd_configs
cat > /data0/prometheus/prometheus_server/config/hosts.json << \EOF
[
  {
    "targets": ["192.168.56.11:9100", "192.168.56.12:9100", "192.168.56.13:9100"],
    "labels": {"service": "db_node"}
  },
  {
    "targets": ["192.168.56.14:9100", "192.168.56.15:9100", "192.168.56.16:9100"],
    "labels": {"service": "web_node"}
  }
]
EOF
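When the host list grows, hosts.json can be generated instead of edited by hand. A minimal sketch that prints one file_sd group for a given service label and a list of IP:port targets. The helper name sd_group is hypothetical. To build the full file, wrap several groups in [ ... ] and separate them with commas. Prometheus rereads file_sd files when they change, so editing the host list needs no restart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one file_sd group as JSON.
# $1 = value of the "service" label, remaining args = host:port targets.
sd_group() {
  local service=$1 t sep="" targets=""
  shift
  for t in "$@"; do
    targets="${targets}${sep}\"${t}\""
    sep=", "
  done
  printf '{"targets": [%s], "labels": {"service": "%s"}}' "$targets" "$service"
}

# Usage:
#   { echo "["
#     sd_group db_node 192.168.56.11:9100 192.168.56.12:9100; echo ","
#     sd_group web_node 192.168.56.14:9100
#     echo "]"; } > /data0/prometheus/prometheus_server/config/hosts.json
```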
# Instance-down alert
cat > /data0/prometheus/prometheus_server/rules/node_down.yml <<\EOF
groups:
- name: 实例存活告警规则
  rules:
  - alert: 实例存活告警
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
EOF
# Memory alert
cat > /data0/prometheus/prometheus_server/rules/memory_over.yml <<\EOF
groups:
- name: 内存报警规则
  rules:
  - alert: 内存使用率告警
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: memory usage above 80%! (current value: {{ $value }}%)"
EOF
# Disk alert
cat > /data0/prometheus/prometheus_server/rules/disk_over.yml <<\EOF
groups:
- name: 磁盘报警规则
  rules:
  - alert: 磁盘使用率告警
    expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: disk device: usage above 80%! (mount point: {{ $labels.mountpoint }}, current value: {{ $value }}%)"
EOF
# CPU alert
cat > /data0/prometheus/prometheus_server/rules/cpu_over.yml <<\EOF
groups:
- name: CPU报警规则
  rules:
  - alert: CPU使用率告警
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server: CPU usage above 90%! (current value: {{ $value }}%)"
EOF
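To see what the memory rule actually computes, here is the same arithmetic with made-up numbers. This is a sketch: mem_used_pct is a hypothetical helper and the byte values are illustrative, not measurements. Free memory, buffers and page cache all count as available, so only the remainder is treated as used. The rule fires only when the result is strictly above 80 for the whole "for: 1m" window.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the memory_over.yml expression:
# (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100
mem_used_pct() {
  awk -v total="$1" -v free="$2" -v buffers="$3" -v cached="$4" \
    'BEGIN { printf "%.1f\n", (total - (free + buffers + cached)) / total * 100 }'
}

# Example with illustrative values (bytes):
#   mem_used_pct 8000000000 1000000000 200000000 600000000   # prints 77.5
```

With these numbers the result (77.5) stays below the 80 threshold, so no alert would be raised.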

6. View the UI

Prometheus ships with a simple built-in UI:

http://192.168.56.11:9090/

http://192.168.56.11:9090/targets
http://192.168.56.11:9090/graph
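Besides the web UI, the same data is available from Prometheus's HTTP API at /api/v1/query. This sketch assumes Prometheus at 192.168.56.11:9090 and uses a hypothetical helper name, prom_query. curl's --data-urlencode takes care of spaces and operators in the PromQL expression. For example, it can check which targets the instance-down rule would match right now.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query via the HTTP API.
prom_query() {
  local expr=$1 base=${PROM_URL:-http://192.168.56.11:9090}
  # -G turns the --data-urlencode parameter into a URL query string.
  curl -sG --data-urlencode "query=${expr}" "${base}/api/v1/query"
}

# Usage: list targets that are currently down (same expr as node_down.yml)
#   prom_query 'up == 0'
```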

II. Installing and configuring node_exporter

1. Download and extract the package

cd /usr/local/src/
export VER="0.18.1"
wget https://github.com/prometheus/node_exporter/releases/download/v${VER}/node_exporter-${VER}.linux-amd64.tar.gz
mkdir -p /data0/prometheus
groupadd prometheus
useradd -g prometheus -d /data0/prometheus prometheus
tar -xvf node_exporter-${VER}.linux-amd64.tar.gz
cd /usr/local/src/
mv node_exporter-${VER}.linux-amd64 /data0/prometheus/node_exporter
chown -R prometheus:prometheus /data0/prometheus

2. Create the systemd unit file node_exporter.service

  • On CentOS:
cat > /usr/lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
  • On Ubuntu:
cat > /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

3. Start the service

systemctl daemon-reload
systemctl stop node_exporter.service
systemctl enable node_exporter.service
systemctl restart node_exporter.service

4. Check the running status

systemctl status node_exporter.service

5. Data reported by the monitored client

Visit http://192.168.56.11:9100/metrics to see exactly which data can be scraped from the exporter.
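The /metrics page is plain text in the Prometheus exposition format: one "name{labels} value" sample per line, with # HELP / # TYPE comment lines. A quick sketch for pulling a single metric out of it from the shell. metric_value is a hypothetical helper, and it only matches unlabelled series, by exact name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read exposition-format text on stdin and print
# the value of an unlabelled metric, e.g. node_memory_MemTotal_bytes.
# Comment lines start with '#', so their first field never equals the name.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage:
#   curl -s http://192.168.56.11:9100/metrics | metric_value node_memory_MemTotal_bytes
```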

III. Deploying Alertmanager with DingTalk alerts

1. Download and extract the package

cd /usr/local/src/
export VER="0.19.0"
wget https://github.com/prometheus/alertmanager/releases/download/v${VER}/alertmanager-${VER}.linux-amd64.tar.gz
mkdir -p /data0/prometheus
groupadd prometheus
useradd -g prometheus -d /data0/prometheus prometheus
tar -xvf alertmanager-${VER}.linux-amd64.tar.gz
cd /usr/local/src/
mv alertmanager-${VER}.linux-amd64 /data0/prometheus/alertmanager
chown -R prometheus:prometheus /data0/prometheus

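2. Configure Alertmanager

DingTalk alerting goes through Alertmanager's webhook. The DingTalk robot is strict about the message format, so the webhook payload has to be converted into that format before it can be delivered to your robot. Someone has kindly written a conversion plugin, so we will just use it:

(https://github.com/timonwong/prometheus-webhook-dingtalk.git)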

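cat > /data0/prometheus/alertmanager/alertmanager.yml <<-"EOF"
# Global settings
global:
  resolve_timeout: 5m     # resolve timeout, default 5m
# Routing tree
route:
  group_by: [alertname]   # label(s) used to group alerts
  receiver: ops_notify    # default receiver
  group_wait: 30s         # how long to wait before sending the first notification for a new group
  group_interval: 60s     # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h     # how long to wait before re-sending an unchanged alert; default 4h
  routes:
  - receiver: ops_notify  # basic alerts
    group_wait: 10s
    match_re:
      alertname: 实例存活告警|磁盘使用率告警   # matched against the alert names in the rule files
  - receiver: info_notify # informational alerts
    group_wait: 10s
    match_re:
      alertname: 内存使用率告警|CPU使用率告警
# Receiver for basic alerts
receivers:
- name: ops_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/ops_dingding/send
    send_resolved: true   # also notify when the alert is resolved
# Receiver for informational alerts
- name: info_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/info_dingding/send
    send_resolved: true
# An inhibition rule mutes alerts matching one set of matchers (target_match)
# while an alert matching another set (source_match) is firing.
# Both alerts must have identical values for the labels listed in `equal`.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
EOF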

3. Start Alertmanager

cat > /lib/systemd/system/alertmanager.service <<\EOF
[Unit]
Description=Prometheus: the alerting system
Documentation=http://prometheus.io/docs/
After=prometheus.service

[Service]
ExecStart=/data0/prometheus/alertmanager/alertmanager --config.file=/data0/prometheus/alertmanager/alertmanager.yml
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl stop alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service
# Check the port
netstat -anpt | grep 9093

4. Connect DingTalk to the Prometheus Alertmanager webhook

# Test the robots from the command line to check that messages go through. prometheus-webhook-dingtalk sometimes
# reports a 422 error; that is caused by DingTalk's security settings (here the policy requires every message to contain "prometheus").
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=<ops-robot-token>'
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=<info-robot-token>'

4.1 Deploy the plugin from the binary package

cd /usr/local/src/
export VER="0.3.0"
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v${VER}/prometheus-webhook-dingtalk-${VER}.linux-amd64.tar.gz
tar -zxvf prometheus-webhook-dingtalk-${VER}.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-${VER}.linux-amd64 /data0/prometheus/alertmanager/prometheus-webhook-dingtalk
# Usage: prometheus-webhook-dingtalk --ding.profile=<DingTalk group profile name>=<robot webhook URL>
cat > /etc/systemd/system/prometheus-webhook-dingtalk.service <<\EOF
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/data0/prometheus/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk \
  --ding.profile=ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=<ops-robot-token> \
  --ding.profile=info_dingding=https://oapi.dingtalk.com/robot/send?access_token=<info-robot-token>

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl stop prometheus-webhook-dingtalk
systemctl restart prometheus-webhook-dingtalk
systemctl status prometheus-webhook-dingtalk
netstat -nltup | grep 8060

4.2 Deploy the plugin with Docker

docker pull timonwong/prometheus-webhook-dingtalk:v0.3.0
# docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 --ding.profile="<web-hook-name>=<dingtalk-webhook>"
docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 \
  --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=<ops-robot-token>" \
  --ding.profile="info_dingding=https://oapi.dingtalk.com/robot/send?access_token=<info-robot-token>"

The two placeholders in the commented-out command mean:

<web-hook-name>: prometheus-webhook-dingtalk supports multiple DingTalk webhooks, and each one is mapped to its URL by name. To use several DingTalk webhooks, pass several --ding.profile flags, for example:
sudo docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=token1" --ding.profile="webhook2=https://oapi.dingtalk.com/robot/send?access_token=token2"
The name maps to the URL as follows: for ding.profile="webhook1=......", the API URL is http://localhost:8060/dingtalk/webhook1/send

<dingtalk-webhook>: the DingTalk webhook URL obtained earlier.
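The name-to-URL rule is simple enough to capture in a helper. This sketch assumes the plugin listens on localhost:8060 as above; ding_url is a hypothetical name. It prints the webhook URL Alertmanager should call for a given profile name, so that URL can be checked against the url: entries in alertmanager.yml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a --ding.profile name to the URL that
# prometheus-webhook-dingtalk (v0.3.x) serves for it.
# $1 = profile name, $2 = host:port of the plugin (default localhost:8060).
ding_url() {
  printf 'http://%s/dingtalk/%s/send\n' "${2:-localhost:8060}" "$1"
}

# Usage:
#   ding_url ops_dingding   # prints http://localhost:8060/dingtalk/ops_dingding/send
#   grep -c "$(ding_url info_dingding)" /data0/prometheus/alertmanager/alertmanager.yml
```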

4.3 Build the plugin from source

# Install the Go toolchain
cd /usr/local/src/
wget https://dl.google.com/go/go1.13.4.linux-amd64.tar.gz
tar -zxvf go1.13.4.linux-amd64.tar.gz
mv go/ /usr/local/
mkdir -p /opt/path   # used as GOPATH
# Add the following lines to /etc/profile (vim /etc/profile).
# $GOPATH/bin is added to $PATH; otherwise you would have to move its executables into $GOBIN.
export GOROOT=/usr/local/go
export GOPATH=/opt/path
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
source /etc/profile
# Download the plugin
cd /usr/local/src/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
go get github.com/timonwong/prometheus-webhook-dingtalk/cmd/prometheus-webhook-dingtalk
make   # on success this produces a prometheus-webhook-dingtalk binary
# Copy the DingTalk plugin into the alertmanager directory
mkdir -p /data0/prometheus/alertmanager/prometheus-webhook-dingtalk
cp prometheus-webhook-dingtalk /data0/prometheus/alertmanager/prometheus-webhook-dingtalk/
# Start the service (stdout and stderr both go to /tmp/dingding.log)
nohup /data0/prometheus/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=<ops-robot-token>" --ding.profile="info_dingding=https://oapi.dingtalk.com/robot/send?access_token=<info-robot-token>" > /tmp/dingding.log 2>&1 &
# Check the port
netstat -anpt | grep 8060

IV. Installing and configuring Grafana

1. Download and install

cd /usr/local/src/
export VER="6.4.3"
wget https://dl.grafana.com/oss/release/grafana-${VER}-1.x86_64.rpm
yum localinstall -y grafana-${VER}-1.x86_64.rpm

2. Start the service

systemctl daemon-reload
systemctl enable grafana-server.service
systemctl stop grafana-server.service
systemctl restart grafana-server.service

3. Open the web UI

Default username/password: admin/admin, at http://192.168.56.11:3000

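4. Add a data source in Grafana

On the home page, click "Configuration - Data Sources" to open the page for adding a data source, and configure it as follows:
Name: prometheus
Type: prometheus
URL: http://192.168.56.11:9090
Access: Server
Uncheck Default, leave everything else at its default, and click "Add".

The pie chart plugin also needs to be installed:
grafana-cli plugins install grafana-piechart-panel
systemctl restart grafana-server.service
Make sure pie chart panels can be added after installation.

Install the Consul data source plugin:
grafana-cli plugins install sbueringer-consul-datasource
systemctl restart grafana-server.service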
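Instead of clicking through the UI, Grafana (5.0 and later) can also create the data source at startup from a YAML file under /etc/grafana/provisioning/datasources/. This sketch assumes the RPM's default provisioning path and the same settings as above; "access: proxy" is the provisioning name for the UI's "Server" mode. The helper name write_prom_datasource and the file name prometheus.yaml are hypothetical. Restart grafana-server afterwards so the file is picked up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Grafana data source provisioning file.
# $1 = provisioning directory, $2 = Prometheus URL.
write_prom_datasource() {
  local dir=${1:-/etc/grafana/provisioning/datasources}
  local url=${2:-http://192.168.56.11:9090}
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: prometheus
    type: prometheus
    access: proxy      # "Server" in the UI
    url: ${url}
    isDefault: false   # same as unchecking "Default" in the UI
EOF
}

# Usage:
#   write_prom_datasource && systemctl restart grafana-server.service
```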

V. Replace the Grafana dashboards

https://grafana.com/grafana/dashboards/11074 Basic monitoring (new)

https://grafana.com/dashboards/8919 Basic monitoring

https://grafana.com/dashboards/7362 Database monitoring

References:

https://www.jianshu.com/p/e59cfd15612e Deploying Prometheus, Alertmanager and Grafana on CentOS 7 to monitor Linux hosts

https://juejin.im/entry/5c2c4a7f6fb9a049b82a90ee Monitoring Ceph with Prometheus

https://blog.csdn.net/xiegh2014/article/details/84936174 Prometheus 2.5 + Grafana 5.4 monitoring deployment on CentOS 7.5

https://www.cnblogs.com/smallSevens/p/7805842.html Building a comprehensive monitoring system with Grafana + Prometheus

https://www.cnblogs.com/sfnz/p/6566951.html Installing Prometheus + Grafana to monitor MySQL, Redis, Kubernetes, etc.

https://blog.csdn.net/hzs33/article/details/86553259 Monitoring MySQL and Canal servers with Prometheus + Grafana
