Deploying kube-prometheus-stack
1. Download the kube-prometheus-stack Helm chart
Use Helm to fetch the kube-prometheus-stack chart, which is then used for the deployment:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo kube-prometheus-stack
helm pull prometheus-community/kube-prometheus-stack --version 41.4.1
tar xf kube-prometheus-stack-41.4.1.tgz && cd kube-prometheus-stack
2. Edit values.yaml
Before deploying Prometheus, the following were already in place:
- A StorageClass named nfs-client
- An ingress controller deployed in the ingress-nginx namespace
## Edit values.yaml and adjust the following settings
alertmanager:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - alertmanager.local
    paths:
      - /
  alertmanagerSpec:
    retention: 720h
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
---
grafana:
  adminPassword: 1qaz2wsx
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - grafana.local
---
prometheus:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - prometheus.local
    paths:
      - /
  prometheusSpec:
    retention: 360d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 300Gi
## Change the image repository
prometheusOperator:
  admissionWebhooks:
    patch:
      image:
        repository: registry.aliyuncs.com/google_containers/kube-webhook-certgen
---
## charts/grafana/values.yaml
persistence:
  enabled: true
  storageClassName: nfs-client
  size: 100Gi
## charts/kube-state-metrics/values.yaml
## Change the image repository
image:
  repository: bitnami/kube-state-metrics
  tag: 2.6.0
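The subchart edits above can also be kept outside the chart directory. The following is a minimal sketch, assuming the usual Helm convention that values for a subchart are nested under the subchart's name in the parent chart's values. The function name `write_overrides` and the file name `monitoring-overrides.yaml` are hypothetical and do not come from the original steps.

```shell
# Sketch: write the subchart overrides from this section to a separate values
# file instead of editing the files under charts/ in place.
# The function name and the file name are hypothetical choices.
write_overrides() {
  # $1: path of the override file to create
  cat > "$1" <<'EOF'
grafana:
  # Values for the grafana subchart are nested under its chart name.
  persistence:
    enabled: true
    storageClassName: nfs-client
    size: 100Gi
kube-state-metrics:
  image:
    repository: bitnami/kube-state-metrics
    tag: 2.6.0
EOF
}

# Usage (requires helm and cluster access, so it is left commented out):
# write_overrides monitoring-overrides.yaml
# helm install prometheus . -n monitoring -f monitoring-overrides.yaml
```

Keeping overrides in their own file leaves the unpacked chart untouched, which may make a later chart upgrade easier to compare.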
3. Deploy
## Deploy into the monitoring namespace
kubectl create ns monitoring
helm install prometheus . -n monitoring
## Check that everything is running
kubectl get all -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 1 (2m37s ago) 102s
pod/prometheus-grafana-7c466d88c5-tq9zh 3/3 Running 0 17m
pod/prometheus-kube-prometheus-operator-67b84b5d9b-z7cws 1/1 Running 0 17m
pod/prometheus-kube-state-metrics-77d5757f57-chrnx 1/1 Running 0 17m
pod/prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 17m
pod/prometheus-prometheus-node-exporter-gj6rr 1/1 Running 0 17m
pod/prometheus-prometheus-node-exporter-rkl6q 1/1 Running 0 17m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 17m
service/prometheus-grafana ClusterIP 172.24.140.186 <none> 80/TCP 17m
service/prometheus-kube-prometheus-alertmanager ClusterIP 172.24.60.136 <none> 9093/TCP 17m
service/prometheus-kube-prometheus-operator ClusterIP 172.24.106.230 <none> 443/TCP 17m
service/prometheus-kube-prometheus-prometheus ClusterIP 172.24.114.84 <none> 9090/TCP 17m
service/prometheus-kube-state-metrics ClusterIP 172.24.250.206 <none> 8080/TCP 17m
service/prometheus-operated ClusterIP None <none> 9090/TCP 17m
service/prometheus-prometheus-node-exporter ClusterIP 172.24.74.178 <none> 9100/TCP 17m

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prometheus-prometheus-node-exporter 2 2 2 2 2 <none> 17m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-grafana 1/1 1 1 17m
deployment.apps/prometheus-kube-prometheus-operator 1/1 1 1 17m
deployment.apps/prometheus-kube-state-metrics 1/1 1 1 17m

NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-grafana-7c466d88c5 1 1 1 17m
replicaset.apps/prometheus-kube-prometheus-operator-67b84b5d9b 1 1 1 17m
replicaset.apps/prometheus-kube-state-metrics-77d5757f57 1 1 1 17m

NAME READY AGE
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager 1/1 17m
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus 1/1 17m
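Troubleshooting:
Error 1: 'The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes'
Fix: cd ./kube-prometheus-stack/crds/ && kubectl create -f crd-prometheuses.yaml (kubectl create does not add the kubectl.kubernetes.io/last-applied-configuration annotation that kubectl apply would, which is typically what pushes this CRD over the size limit.)
Error 2: 'failed calling webhook "prometheusrulemutate.monitoring.coreos.com"'
Fix: This is most likely caused by a different version of Prometheus that was installed earlier and whose webhook resources were not fully removed on uninstall. Use kubectl get mutatingwebhookconfigurations and kubectl get validatingwebhookconfigurations to find the offending objects and delete them, for example: kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io prometheus-kube-prometheus-admission. Then run the deployment again.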
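The lookup for leftover webhook objects described above can be narrowed with a small filter. This is a sketch only: the function name `stale_webhooks` is hypothetical, and the default pattern is simply the object name from the delete command above, which may differ in another cluster.

```shell
# Sketch: keep only the resource names that look like leftovers of an earlier
# install. The default pattern comes from the object name used above.
stale_webhooks() {
  # stdin: one resource name per line, e.g. from `kubectl get ... -o name`
  # $1: fixed substring to match (optional)
  grep -F -- "${1:-kube-prometheus-admission}" || true
}

# Usage (needs cluster access, so it is left commented out):
# kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name | stale_webhooks
```

Review the printed names before passing any of them to kubectl delete, since the substring match may also catch objects that belong to a working install.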
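4. Configuration adjustments
When you open prometheus.local and go to Status -> Targets, you will see that Prometheus cannot scrape metrics from some components. Most Kubernetes components expose their metrics on a /metrics endpoint over HTTP or HTTPS; for components that do not expose that endpoint outside the host by default, it can be enabled with the --bind-address flag.
To reach these pages, add local name resolution for prometheus.local, alertmanager.local and grafana.local to the hosts file.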
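The hosts-file entry mentioned above could be produced as follows. This is a sketch under assumptions: the function name `hosts_line` is hypothetical, and the address to use is whatever IP your ingress-nginx controller is reachable on, which the original text does not state.

```shell
# Sketch: print one hosts-file line mapping the ingress hostnames to an address.
hosts_line() {
  # $1: IP address of the ingress controller (assumption: reachable from the client)
  # remaining arguments: hostnames to map to that address
  ip="$1"; shift
  printf '%s' "$ip"
  for host in "$@"; do printf ' %s' "$host"; done
  printf '\n'
}

# Usage (left commented out; 192.0.2.10 is a placeholder documentation address):
# hosts_line 192.0.2.10 prometheus.local alertmanager.local grafana.local | sudo tee -a /etc/hosts
```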
4.1 kube-controller-manager
- Change the configuration
kube-controller-manager exposes metrics on port 10257. A test request fails with "curl: (7) Failed connect to 10.49.18.103:10257; Connection refused". Following the official kube-controller-manager documentation, change the component's bind-address setting:
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
containers:
- command:
  - kube-controller-manager
  - --allocate-node-cidrs=true
  - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
  - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
  ### Changed
  #- --bind-address=127.0.0.1
  - --bind-address=0.0.0.0
  ...omitted...
Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options
--bind-address string Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or :: ), all interfaces will be used.
- Verify
lsof -i:10257
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-cont 13515 root 7u IPv6 38954551 0t0 TCP *:10257 (LISTEN)
kube-cont 13515 root 34u IPv6 38968508 0t0 TCP master01.pl.hpc:10257->node1:61771 (ESTABLISHED)
### 10257 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10257/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
...omitted...
4.2 kube-proxy
- Change the configuration
kubectl edit cm -n kube-system kube-proxy
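...omitted...
kind: KubeProxyConfiguration
### Changed
# metricsBindAddress: ""
metricsBindAddress: 0.0.0.0:10249
mode: ipvs
...omitted...
Note: kube-proxy reads its configuration from a ConfigMap mounted into the container, so do not add the --metrics-bind-address flag directly to the kube-proxy DaemonSet; a flag added that way has no effect. The running kube-proxy Pods may need to be restarted before they pick up the edited ConfigMap.
- Verify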
lsof -i:10249
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-prox 36749 root 13u IPv6 39030529 0t0 TCP master01.pl.hpc:10249->node1:38685 (ESTABLISHED)
kube-prox 36749 root 14u IPv6 39100619 0t0 TCP *:10249 (LISTEN)
curl 10.49.18.103:10249/metrics
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 13
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
...omitted...
Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/
--metrics-bind-address ipport Default: 127.0.0.1:10249
The IP address with port for the metrics server to serve on (set to ‘0.0.0.0:10249’ for all IPv4 interfaces and ‘[::]:10249’ for all IPv6 interfaces). Set empty to disable. This parameter is ignored if a config file is specified by --config.
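To check a single metric instead of reading the whole response by eye, the output of the curl call above can be filtered. A minimal sketch, assuming the endpoint returns the Prometheus text exposition format seen in the sample output; the function name `metric_value` is hypothetical, and the filter only handles samples without labels.

```shell
# Sketch: print the value of an unlabelled sample from Prometheus text output.
metric_value() {
  # stdin: response body of a /metrics endpoint
  # $1: exact metric name; "# HELP" and "# TYPE" lines never match because
  #     their first field is "#"
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage (needs network access to the node, so it is left commented out):
# curl -s 10.49.18.103:10249/metrics | metric_value go_gc_cycles_automatic_gc_cycles_total
```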
4.3 kube-scheduler
- Change the configuration
vim /etc/kubernetes/manifests/kube-scheduler.yaml
...omitted...
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    ### Changed
    #- --bind-address=127.0.0.1
    - --bind-address=0.0.0.0
...omitted...
- Verify
lsof -i:10259
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-sche 24404 root 7u IPv6 38957456 0t0 TCP *:10259 (LISTEN)
kube-sche 24404 root 10u IPv6 39009914 0t0 TCP master01.pl.hpc:10259->node1:29900 (ESTABLISHED)
### 10259 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10259/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key --insecure
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7200"} 0
...omitted...
Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
--bind-address string Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces will be used.
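The three endpoints checked in sections 4.1 to 4.3 can be summarised in one helper. This is a sketch: the function name `metrics_url` is hypothetical, and the ports and schemes are the ones used in the verification steps above (10257 and 10259 over HTTPS, 10249 over HTTP).

```shell
# Sketch: map each component covered above to its metrics URL.
metrics_url() {
  # $1: component name, $2: node IP address
  case "$1" in
    kube-controller-manager) printf 'https://%s:10257/metrics\n' "$2" ;;
    kube-scheduler)          printf 'https://%s:10259/metrics\n' "$2" ;;
    kube-proxy)              printf 'http://%s:10249/metrics\n' "$2" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Usage (needs network access to the node, so it is left commented out):
# curl -s "$(metrics_url kube-proxy 10.49.18.103)" | head
```

The HTTPS endpoints additionally need the client certificate options shown in the curl commands above.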