Deploying Node-exporter, Prometheus, and Grafana in a K8S cluster, and using Prometheus to monitor the whole cluster

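I. Deploying Node exporter in the K8S cluster

1. Pull the Prometheus-related images on the Master and Node nodes with the commands below. Note that the Grafana Deployment in section III references grafana/grafana:4.2.0, so pull that tag as well if the image should be cached on the nodes in advance.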

docker pull prom/node-exporter
docker pull prom/prometheus:v2.26.0
docker pull grafana/grafana

2. Deploy the node-exporter component as a DaemonSet, so that exactly one node-exporter instance runs on each node:

cat > node-exporter.yaml <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter
        name: node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    k8s-app: node-exporter
EOF
kubectl apply -f node-exporter.yaml
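
Once the DaemonSet is running, each node should serve metrics in the Prometheus text format on NodePort 31672 (path /metrics). The following is a minimal Python sketch of how such output can be read. The sample text and the helper name parse_metrics are illustrative assumptions, not output captured from a real node, and the parser deliberately ignores edge cases such as timestamps or spaces inside label values.

```python
def parse_metrics(text):
    """Parse simple Prometheus text-format lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes no timestamps and no spaces inside label values.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Hypothetical excerpt of what /metrics might return.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

In practice the text would come from fetching the /metrics endpoint of a node rather than from a hard-coded string.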

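II. Deploying Prometheus in the K8S cluster

1. The YAML manifests for the Prometheus components can be downloaded online; I have also put a copy on a network drive.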

mkdir prometheus/
cd prometheus/

Link: https://pan.baidu.com/s/1z6ovhmOYi4UMaQ0JnLtOlQ
Extraction code: l9mk
Apply the YAML manifests downloaded from the network drive in one batch:

for i in alertmanager-configmap.yaml alertmanager-deployment.yaml alertmanager-pvc.yaml configmap.yaml grafana-deploy.yaml grafana-service.yaml node-exporter.yaml prometheus.deploy.yml prometheus-rules.yaml prometheus.svc.yaml rbac-setup.yaml ;do kubectl apply -f $i ;sleep 3 ;done

Alternatively, follow the steps below one at a time.
2. Set up RBAC authorization for Prometheus. The contents of rbac-setup.yaml are as follows:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  # Ingress lives in networking.k8s.io on newer Kubernetes releases;
  # listing both groups keeps the rule working on old and new clusters.
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
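
To make the effect of the ClusterRole concrete, here is a small Python sketch of how a resource rule can be read: a request is allowed when some rule lists its API group, resource, and verb. This is a simplified model written for illustration (the name is_allowed is hypothetical, and wildcards, resourceNames, and nonResourceURLs are ignored); it is not how the API server is implemented. Only the core-group rule from the manifest is reproduced.

```python
# The core-group rule from the ClusterRole above.
RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "nodes/proxy", "services", "endpoints", "pods"],
        "verbs": ["get", "list", "watch"],
    },
]


def is_allowed(rules, api_group, resource, verb):
    """Return True if any rule covers the (group, resource, verb) triple.

    Simplified: no wildcards, resourceNames or nonResourceURLs.
    """
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


if __name__ == "__main__":
    print(is_allowed(RULES, "", "pods", "list"))    # read access is granted
    print(is_allowed(RULES, "", "pods", "delete"))  # write access is not
```

On a live cluster, the equivalent check can be made with kubectl auth can-i, impersonating the service account via --as=system:serviceaccount:kube-system:prometheus.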

3. Deploy the Prometheus server itself. The contents of prometheus.deploy.yml are as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - image: prom/prometheus:v2.26.0
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      serviceAccountName: prometheus
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-config
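
The resource requests and limits in this Deployment use Kubernetes quantity notation: "m" means thousandths of a CPU core and "Mi" means mebibytes. As a rough sketch, the helpers below (hypothetical names, handling only the suffixes that appear in this article plus Ki and Gi) convert those strings to plain numbers.

```python
def cpu_to_cores(quantity):
    """Convert a CPU quantity such as '100m' or '2' to cores (float)."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '2500Mi' to bytes (int).

    Only the binary suffixes Ki/Mi/Gi and plain byte counts are handled.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


if __name__ == "__main__":
    print(cpu_to_cores("500m"))       # CPU limit in cores
    print(memory_to_bytes("2500Mi"))  # memory limit in bytes
```

With these values, the container is guaranteed a tenth of a core and 100 MiB, and is capped at half a core and roughly 2.4 GiB.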

4. Create the Prometheus Service. The contents of prometheus.svc.yaml are as follows:

kind: Service
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30003
  selector:
    app: prometheus
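
With this Service in place, Prometheus should be reachable on port 30003 of any node, and its HTTP API can be queried at /api/v1/query. The Python sketch below builds such a query URL and, optionally, fetches the result. The function names are hypothetical, and 192.0.2.10 is a placeholder documentation address that must be replaced with a real node IP.

```python
import json
import urllib.parse
import urllib.request


def query_url(node_ip, expr, node_port=30003):
    """Build the instant-query URL for the NodePort Service above."""
    params = urllib.parse.urlencode({"query": expr})
    return f"http://{node_ip}:{node_port}/api/v1/query?{params}"


def instant_query(node_ip, expr):
    """Run an instant query and return the decoded JSON response.

    Requires network access to a running Prometheus instance.
    """
    with urllib.request.urlopen(query_url(node_ip, expr), timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder address; no request is sent here.
    print(query_url("192.0.2.10", "up"))
```

Querying the expression up is a quick way to see which scrape targets Prometheus currently considers reachable.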

5. Manage the Prometheus configuration file as a ConfigMap. The contents of configmap.yaml are as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s
      evaluation_interval: 15s
    scrape_configs:

    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
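
The least obvious part of this configuration is the relabel step that rewrites __address__ from the prometheus.io/port annotation. Prometheus joins the source label values with ";" and requires the regex to match the entire joined string; if it does not match, the target label is left unchanged. The Python sketch below imitates that one step with the regex from the config. The function name rewrite_address is hypothetical, and Python's re module stands in for the RE2 engine Prometheus uses, which behaves the same way for this pattern.

```python
import re

# Regex taken from the 'kubernetes-pods' and
# 'kubernetes-service-endpoints' jobs above.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address, port_annotation):
    """Mimic the __address__ relabel step (replacement: $1:$2).

    The source labels are joined with ';' and the regex must match the
    whole string. On no match the original address is returned.
    """
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RE.fullmatch(joined)
    if match is None:
        return address
    return match.expand(r"\1:\2")


if __name__ == "__main__":
    # Discovered port 8080 is replaced by the annotated port 9102.
    print(rewrite_address("10.244.1.5:8080", "9102"))
    # Without the annotation the address is left as discovered.
    print(rewrite_address("10.244.1.5:8080", ""))
```

This is why a pod annotated with prometheus.io/scrape and prometheus.io/port is scraped on the annotated port regardless of which container port was discovered.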

III. Deploying Grafana in the K8S cluster

1. Deploy the Grafana web UI:

cat > grafana-deploy.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: kube-system
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      containers:
      - image: grafana/grafana:4.2.0
        name: grafana-core
        imagePullPolicy: IfNotPresent
        # env:
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 500Mi
        env:
          # The following env variables set up basic auth with the default admin user and admin password.
          - name: GF_AUTH_BASIC_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "false"
          # - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          #   value: Admin
          # does not really work, because of template variables in exported dashboards:
          # - name: GF_DASHBOARDS_JSON_ENABLED
          #   value: "true"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
          # initialDelaySeconds: 30
          # timeoutSeconds: 1
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
EOF
kubectl apply -f grafana-deploy.yaml

2. Create the Grafana Service and expose it externally on a NodePort:

cat > grafana-service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
    component: core
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
    component: core
EOF
kubectl apply -f grafana-service.yaml
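
Unlike the Prometheus Service, this manifest does not set nodePort, so Kubernetes assigns one itself (by default from the range 30000-32767). The assigned port shows up in the PORT(S) column of kubectl get svc -n kube-system grafana in the form port:nodePort/protocol. As a small sketch, the hypothetical helper below extracts both numbers from such a value; the example value 3000:31234/TCP is made up for illustration.

```python
def parse_node_port(ports_column):
    """Extract (service_port, node_port) from a PORT(S) value.

    Expects the 'port:nodePort/protocol' form that kubectl prints for
    NodePort Services, for example '3000:31234/TCP'.
    """
    mapping, _, _protocol = ports_column.partition("/")
    port, _, node_port = mapping.partition(":")
    if not node_port:
        raise ValueError(f"no nodePort in {ports_column!r}")
    return int(port), int(node_port)


if __name__ == "__main__":
    # Made-up example value; the real node port differs per cluster.
    service_port, node_port = parse_node_port("3000:31234/TCP")
    print(f"Grafana listens on {service_port}, exposed on node port {node_port}")
```

Grafana is then reached in a browser on that node port of any node.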



In the step above, the IP is the cluster's IP.

The dashboard template imported here has ID 3119.

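IV. Alertmanager alert settings

Installation and configuration are omitted here; see the previous article. The Pod alerting rules are given below.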

vim /usr/local/prometheus/rules.yml
# Edit the rule file
groups:
- name: linux_pod.rules            # group name
  rules:
  - alert: Pod_all_cpu_usage       # comparable to an item in Zabbix; also used as the email subject
    # PromQL: per-container CPU usage compared against the threshold.
    # rate() yields a fraction of one core, so multiply by 100 to get a percentage.
    expr: (sum by(name)(rate(container_cpu_usage_seconds_total{image!=""}[5m]))*100) > 75
    for: 5m                        # the condition must hold for 5 minutes before the alert fires
    labels:
      severity: critical
      service: pods
    annotations:
      # Alert description: the container's current CPU usage
      description: "Container {{ $labels.name }} CPU usage is above 75% (current value is {{ $value }})"
      summary: Dev CPU load alert
  - alert: Pod_all_memory_usage
    # container_memory_usage_bytes is a gauge, so it is compared directly
    # (without irate() or *100) against 2 GiB expressed in bytes.
    expr: avg by(name)(container_memory_usage_bytes{name!=""}) > 1024^3*2
    for: 10m
    labels:
      severity: critical
    annotations:
      description: "Container {{ $labels.name }} memory usage is above 2G (current usage is {{ $value }})"
      summary: Dev Memory load alert
  - alert: Pod_all_network_receive_usage
    # The receive rate is in bytes per second, so the 50 MB threshold is written in bytes.
    expr: sum by (name)(irate(container_network_receive_bytes_total{container_name="POD"}[1m])) > 1024*1024*50
    for: 10m
    labels:
      severity: critical
    annotations:
      description: "Container {{ $labels.name }} network receive rate is above 50M (current value is {{ $value }})"
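
The memory and network rules above write their thresholds as arithmetic in bytes (1024^3*2 and 1024*1024*50). As a quick worked check, the Python sketch below computes the same values and formats them back into readable units; the helper name humanize is hypothetical and only covers units up to GiB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

MEMORY_THRESHOLD_BYTES = 2 * GIB     # 1024^3*2 in the memory rule
NETWORK_THRESHOLD_BYTES = 50 * MIB   # 1024*1024*50 in the network rule


def humanize(num_bytes):
    """Format a byte count with a binary unit (B, KiB, MiB or GiB)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if num_bytes < 1024 or unit == "GiB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024


if __name__ == "__main__":
    print(MEMORY_THRESHOLD_BYTES, humanize(MEMORY_THRESHOLD_BYTES))
    print(NETWORK_THRESHOLD_BYTES, humanize(NETWORK_THRESHOLD_BYTES))
```

Because {{ $value }} in the annotations is a raw byte count, a conversion like this is what a reader of the alert has to do mentally when interpreting the message.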

For DingTalk and SMS alerts, see the previous articles.

-----------------------end
