1. Download the kube-prometheus-stack chart with Helm

Add the chart repository, then pull and unpack the kube-prometheus-stack chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo  kube-prometheus-stack
helm pull prometheus-community/kube-prometheus-stack --version 41.4.1
tar xf kube-prometheus-stack-41.4.1.tgz && cd kube-prometheus-stack

2. Modify values.yaml

The following were prepared before deploying Prometheus:

A StorageClass named nfs-client was created.

An ingress controller was deployed in the ingress-nginx namespace.

## Edit values.yaml and adjust the following settings
alertmanager:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - alertmanager.local
    paths:
      - /
  alertmanagerSpec:
    retention: 720h
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
---
grafana:
  adminPassword: 1qaz2wsx
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - grafana.local
---
prometheus:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - prometheus.local
    paths:
      - /
  prometheusSpec:
    retention: 360d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 300Gi
## Change the image address
prometheusOperator:
  admissionWebhooks:
    patch:
      image:
        repository: registry.aliyuncs.com/google_containers/kube-webhook-certgen
---
## charts/grafana/values.yaml
persistence:
  enabled: true
  storageClassName: nfs-client
  size: 100Gi
## charts/kube-state-metrics/values.yaml
## Change the image address
image:
  repository: bitnami/kube-state-metrics
  tag: 2.6.0
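
As an alternative to editing the bundled values.yaml files in place, the same settings can be kept in a separate override file and passed to Helm with -f. The following is a minimal sketch under stated assumptions: the file name custom-values.yaml is hypothetical, only a subset of the settings above is repeated, and the Grafana persistence settings are nested under the grafana key because Grafana is a sub-chart of kube-prometheus-stack.

```shell
# Write a subset of the overrides to a separate file.
# custom-values.yaml is an assumed name, not one used in the original setup.
cat > custom-values.yaml <<'EOF'
alertmanager:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - alertmanager.local
prometheus:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - prometheus.local
  prometheusSpec:
    retention: 360d
grafana:
  # Sub-chart values go under the sub-chart's key in the parent chart.
  persistence:
    enabled: true
    storageClassName: nfs-client
    size: 100Gi
EOF

# Then, from the unpacked chart directory:
#   helm install <release> . -n monitoring -f custom-values.yaml
echo "wrote custom-values.yaml"
```

Keeping overrides in their own file leaves the chart untouched, which makes a later chart upgrade easier to diff.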

3. Deploy

## Deploy the services into the monitoring namespace
kubectl create ns monitoring
helm install prometheus . -n monitoring
## Check that everything is running
kubectl get all -n monitoring
NAME                                                         READY   STATUS    RESTARTS        AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   1 (2m37s ago)   102s
pod/prometheus-grafana-7c466d88c5-tq9zh                      3/3     Running   0               17m
pod/prometheus-kube-prometheus-operator-67b84b5d9b-z7cws     1/1     Running   0               17m
pod/prometheus-kube-state-metrics-77d5757f57-chrnx           1/1     Running   0               17m
pod/prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0               17m
pod/prometheus-prometheus-node-exporter-gj6rr                1/1     Running   0               17m
pod/prometheus-prometheus-node-exporter-rkl6q                1/1     Running   0               17m

NAME                                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   17m
service/prometheus-grafana                        ClusterIP   172.24.140.186   <none>        80/TCP                       17m
service/prometheus-kube-prometheus-alertmanager   ClusterIP   172.24.60.136    <none>        9093/TCP                     17m
service/prometheus-kube-prometheus-operator       ClusterIP   172.24.106.230   <none>        443/TCP                      17m
service/prometheus-kube-prometheus-prometheus     ClusterIP   172.24.114.84    <none>        9090/TCP                     17m
service/prometheus-kube-state-metrics             ClusterIP   172.24.250.206   <none>        8080/TCP                     17m
service/prometheus-operated                       ClusterIP   None             <none>        9090/TCP                     17m
service/prometheus-prometheus-node-exporter       ClusterIP   172.24.74.178    <none>        9100/TCP                     17m

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   2         2         2       2            2           <none>          17m

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-grafana                    1/1     1            1           17m
deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           17m
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           17m

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-grafana-7c466d88c5                    1         1         1       17m
replicaset.apps/prometheus-kube-prometheus-operator-67b84b5d9b   1         1         1       17m
replicaset.apps/prometheus-kube-state-metrics-77d5757f57         1         1         1       17m

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   1/1     17m
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       1/1     17m

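Troubleshooting:

Error 1: 'The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes'
Fix: cd ./kube-prometheus-stack/crds/ && kubectl create -f crd-prometheuses.yaml
This error typically comes from client-side kubectl apply, which stores the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation; kubectl create does not write that annotation.
Error 2: 'failed calling webhook "prometheusrulemutate.monitoring.coreos.com"'
Fix: this is most likely because a different version of Prometheus was installed earlier and its webhook resources were not fully removed on uninstall. Use kubectl get mutatingwebhookconfigurations and kubectl get validatingwebhookconfigurations to find the offending object and delete it, for example: kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io prometheus-kube-prometheus-admission. Then rerun the deployment.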

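4. Configuration adjustments

After opening prometheus.local and going to Status -> Targets, you will find that Prometheus cannot scrape metrics from some components. Most Kubernetes components expose their metrics at the /metrics endpoint over HTTP/HTTPS. For components whose endpoint is not reachable by default, it can be opened up with the --bind-address flag (or, for kube-proxy, the metricsBindAddress setting).

Add local name resolution for prometheus.local, alertmanager.local, and grafana.local to the hosts file.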
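
The hosts entries can be generated rather than typed by hand. A minimal sketch, assuming all three hostnames resolve to the same ingress-nginx address; the function name hosts_entries and the address 192.0.2.10 are illustrative placeholders, not values from this setup.

```shell
# Print /etc/hosts lines that map the three ingress hostnames to one IP.
# The IP should be whatever address your ingress controller is reachable on;
# 192.0.2.10 below is a documentation placeholder.
hosts_entries() {
  ip="$1"
  for host in prometheus.local alertmanager.local grafana.local; do
    printf '%s\t%s\n' "$ip" "$host"
  done
}

hosts_entries 192.0.2.10
# To apply on a client machine:
#   hosts_entries <ingress-ip> | sudo tee -a /etc/hosts
```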

4.1 kube-controller-manager

Modify the configuration
kube-controller-manager serves metrics on port 10257. A test request to it fails with "curl: (7) Failed connect to 10.49.18.103:10257; Connection refused". Following the official kube-controller-manager documentation, change the component's bind-address setting in its static pod manifest:

vim /etc/kubernetes/manifests/kube-controller-manager.yaml
containers:
- command:
  - kube-controller-manager
  - --allocate-node-cidrs=true
  - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
  - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
  ### changed
  #- --bind-address=127.0.0.1
  - --bind-address=0.0.0.0
  ...omitted...

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options
--bind-address string     Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or :: ), all interfaces will be used.
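
The manual edit above can also be expressed as a text filter. This is a sketch only: the function name open_bind_address is hypothetical, and it is demonstrated on an inline fragment rather than the real manifest so that nothing on the node is changed.

```shell
# Rewrite --bind-address=127.0.0.1 to 0.0.0.0 in a static pod manifest.
# Reads the manifest on stdin and writes the result to stdout, so nothing
# is modified until the output is redirected somewhere.
open_bind_address() {
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/'
}

# Demo on an inline fragment instead of the real file.
printf '%s\n' \
  '    - kube-controller-manager' \
  '    - --bind-address=127.0.0.1' | open_bind_address
```

On a control-plane node, one way to use it is to write the result to a scratch file (open_bind_address < /etc/kubernetes/manifests/kube-controller-manager.yaml > /tmp/kcm.yaml), review it, and copy it back; the kubelet recreates the static pod when the manifest changes.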

Verify

lsof -i:10257
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-cont 13515 root    7u  IPv6 38954551      0t0  TCP *:10257 (LISTEN)
kube-cont 13515 root   34u  IPv6 38968508      0t0  TCP master01.pl.hpc:10257->node1:61771 (ESTABLISHED)
### 10257 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10257/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key --insecure
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
...omitted...

4.2 kube-proxy

Modify the configuration

kubectl edit cm -n kube-system kube-proxy
...omitted...
kind: KubeProxyConfiguration
### changed
# metricsBindAddress: ""
metricsBindAddress: 0.0.0.0:10249
mode: ipvs
...omitted...

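Note: the kube-proxy configuration is mounted into the container from a ConfigMap, so do not add a --metrics-bind-address argument directly to the kube-proxy DaemonSet; set that way it has no effect, because the flag is ignored when a config file is in use.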
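
The same ConfigMap change can be made non-interactively with a text filter. A minimal sketch: the function name set_metrics_bind is hypothetical, and the demo runs on an inline fragment rather than a live cluster.

```shell
# Set metricsBindAddress in a kube-proxy configuration read from stdin.
# Acts as a plain text filter: only the metricsBindAddress line is rewritten,
# and its leading indentation is preserved.
set_metrics_bind() {
  sed 's/^\([[:space:]]*\)metricsBindAddress:.*/\1metricsBindAddress: 0.0.0.0:10249/'
}

# Demo on an inline fragment.
printf '%s\n' \
  'kind: KubeProxyConfiguration' \
  'metricsBindAddress: ""' \
  'mode: ipvs' | set_metrics_bind
```

Against a cluster this could be piped as kubectl -n kube-system get cm kube-proxy -o yaml | set_metrics_bind | kubectl apply -f -. Running kube-proxy pods only read the configuration at startup, so follow it with kubectl -n kube-system rollout restart daemonset kube-proxy.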

Verify

lsof -i:10249
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-prox 36749 root   13u  IPv6 39030529      0t0  TCP master01.pl.hpc:10249->node1:38685 (ESTABLISHED)
kube-prox 36749 root   14u  IPv6 39100619      0t0  TCP *:10249 (LISTEN)
curl 10.49.18.103:10249/metrics
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 13
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
...omitted...

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/
--metrics-bind-address ipport     Default: 127.0.0.1:10249
The IP address with port for the metrics server to serve on (set to ‘0.0.0.0:10249’ for all IPv4 interfaces and ‘[::]:10249’ for all IPv6 interfaces). Set empty to disable. This parameter is ignored if a config file is specified by --config.

4.3 kube-scheduler

Modify the configuration

vim /etc/kubernetes/manifests/kube-scheduler.yaml
...omitted...
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    ### changed
    #- --bind-address=127.0.0.1
    - --bind-address=0.0.0.0
...omitted...

Verify

lsof -i:10259
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-sche 24404 root    7u  IPv6 38957456      0t0  TCP *:10259 (LISTEN)
kube-sche 24404 root   10u  IPv6 39009914      0t0  TCP master01.pl.hpc:10259->node1:29900 (ESTABLISHED)
### 10259 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10259/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key   --insecure
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7200"} 0
...omitted...

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
--bind-address string     Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces will be used.
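
For a quick pass/fail check of any of the three endpoints, the curl output can be reduced to a count of sample lines instead of being read by eye. A small sketch, assuming the standard Prometheus text format shown in the outputs above; the function name count_samples is hypothetical.

```shell
# Count sample lines (neither comments nor blank) in Prometheus text-format
# output read from stdin. A count of 0 means the endpoint returned no metrics.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Demo on an inline fragment: two samples, one comment, one blank line.
printf '%s\n' \
  '# TYPE apiserver_audit_event_total counter' \
  'apiserver_audit_event_total 0' \
  '' \
  'apiserver_audit_requests_rejected_total 0' | count_samples
# prints 2
```

In practice the input would be the output of one of the curl commands above, for example curl -s 10.49.18.103:10249/metrics | count_samples.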
