1. Download the kube-prometheus-stack Helm chart

Deploy the kube-prometheus-stack chart with Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo kube-prometheus-stack
helm pull prometheus-community/kube-prometheus-stack --version 41.4.1
tar xf kube-prometheus-stack-41.4.1.tgz && cd kube-prometheus-stack

2. Edit values.yaml

Before deploying Prometheus, the following were already in place:

  • A StorageClass named nfs-client has been created
  • An ingress controller has been deployed in the ingress-nginx namespace

## Edit values.yaml and adjust the following settings
alertmanager:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - alertmanager.local
    paths:
      - /
  alertmanagerSpec:
    retention: 720h
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
---
grafana:
  adminPassword: 1qaz2wsx
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - grafana.local
---
prometheus:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - prometheus.local
    paths:
      - /
  prometheusSpec:
    retention: 360d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 300Gi
## Change the image address
prometheusOperator:
  admissionWebhooks:
    patch:
      image:
        repository: registry.aliyuncs.com/google_containers/kube-webhook-certgen
---
## charts/grafana/values.yaml
persistence:
  enabled: true
  storageClassName: nfs-client
  size: 100Gi
## charts/kube-state-metrics/values.yaml
## Change the image address
image:
  repository: bitnami/kube-state-metrics
  tag: 2.6.0
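As an alternative to editing the subchart files in place, Helm also accepts subchart values under a key named after the subchart in a parent-level values file. The following is a minimal sketch of that approach; the file name custom-values.yaml is an assumption, and the values simply mirror the Grafana and kube-state-metrics settings listed above.

```shell
# Sketch: write the subchart overrides into one separate file instead of
# editing charts/grafana/values.yaml and charts/kube-state-metrics/values.yaml.
# The file name custom-values.yaml is hypothetical; any path works.
cat > custom-values.yaml <<'EOF'
grafana:
  persistence:
    enabled: true
    storageClassName: nfs-client
    size: 100Gi
kube-state-metrics:
  image:
    repository: bitnami/kube-state-metrics
    tag: 2.6.0
EOF

# The file would then be passed at install time, for example:
#   helm install prometheus . -n monitoring -f custom-values.yaml
```

Keeping overrides in a separate file leaves the unpacked chart unmodified, which can make later chart upgrades easier to compare.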

3. Deploy

## Deploy the stack into the monitoring namespace
kubectl create ns monitoring
helm install prometheus . -n monitoring
## Check that everything is running
kubectl get all -n monitoring
NAME                                                         READY   STATUS    RESTARTS        AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   1 (2m37s ago)   102s
pod/prometheus-grafana-7c466d88c5-tq9zh                      3/3     Running   0               17m
pod/prometheus-kube-prometheus-operator-67b84b5d9b-z7cws     1/1     Running   0               17m
pod/prometheus-kube-state-metrics-77d5757f57-chrnx           1/1     Running   0               17m
pod/prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0               17m
pod/prometheus-prometheus-node-exporter-gj6rr                1/1     Running   0               17m
pod/prometheus-prometheus-node-exporter-rkl6q                1/1     Running   0               17m

NAME                                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   17m
service/prometheus-grafana                        ClusterIP   172.24.140.186   <none>        80/TCP                       17m
service/prometheus-kube-prometheus-alertmanager   ClusterIP   172.24.60.136    <none>        9093/TCP                     17m
service/prometheus-kube-prometheus-operator       ClusterIP   172.24.106.230   <none>        443/TCP                      17m
service/prometheus-kube-prometheus-prometheus     ClusterIP   172.24.114.84    <none>        9090/TCP                     17m
service/prometheus-kube-state-metrics             ClusterIP   172.24.250.206   <none>        8080/TCP                     17m
service/prometheus-operated                       ClusterIP   None             <none>        9090/TCP                     17m
service/prometheus-prometheus-node-exporter       ClusterIP   172.24.74.178    <none>        9100/TCP                     17m

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   2         2         2       2            2           <none>          17m

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-grafana                    1/1     1            1           17m
deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           17m
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           17m

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-grafana-7c466d88c5                    1         1         1       17m
replicaset.apps/prometheus-kube-prometheus-operator-67b84b5d9b   1         1         1       17m
replicaset.apps/prometheus-kube-state-metrics-77d5757f57         1         1         1       17m

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   1/1     17m
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       1/1     17m
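
Instead of reading the listing by eye, the readiness check can be scripted. This is a rough sketch that assumes kubectl is already configured for the cluster; the function name and the 300s default timeout are illustrative choices, and the check would time out if the namespace also contains pods that never become Ready (for example completed Job pods).

```shell
# Sketch: block until every pod in the namespace reports Ready, or time out.
# wait_for_pods is a hypothetical helper name; the defaults are assumptions.
wait_for_pods() {
  local namespace=${1:-monitoring}
  local timeout=${2:-300s}
  kubectl wait --for=condition=Ready pods --all -n "$namespace" --timeout="$timeout"
}

# Example: wait_for_pods monitoring 600s
```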

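Troubleshooting:

Error 1: 'The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes'
Fix: cd ./kube-prometheus-stack/crds/ && kubectl create -f crd-prometheuses.yaml
This error typically appears when the CRD is applied client-side with kubectl apply, which copies the whole manifest into the last-applied-configuration annotation; kubectl create does not write that annotation.
Error 2: 'failed calling webhook "prometheusrulemutate.monitoring.coreos.com"'
Fix: most likely a different version of Prometheus was installed earlier and its webhook resources were not fully removed on uninstall. Use kubectl get mutatingwebhookconfigurations and kubectl get validatingwebhookconfigurations to find the offending objects and delete them, for example: kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io prometheus-kube-prometheus-admission. Then run the deployment again.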
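
The two fixes can be sketched as small helpers. This is only an illustration under stated assumptions: kubectl is configured for the target cluster, the function names are hypothetical, and the server-side apply variant is a swapped-in alternative to the kubectl create command used above (server-side apply does not write the last-applied-configuration annotation, so it avoids the same size limit).

```shell
# Sketch only: both helpers assume kubectl points at the target cluster.

# Alternative for error 1: apply the CRDs server-side.
# crds_dir is the chart's crds/ directory (the default path is an assumption).
apply_crds_server_side() {
  local crds_dir=${1:-./crds}
  kubectl apply --server-side -f "$crds_dir"
}

# For error 2: list webhook configurations whose name contains a pattern,
# so leftovers from an earlier install are easy to spot before deleting them.
list_webhooks() {
  local pattern=${1:-prometheus}
  kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name \
    | grep -- "$pattern"
}

# Example: list_webhooks prometheus
```

Review the listed names before deleting anything; a webhook that belongs to a live release should be left alone.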

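4. Configuration adjustments

After opening prometheus.local, the Status -> Targets page shows that Prometheus cannot scrape metrics from some components. Most Kubernetes components serve their metrics at the /metrics endpoint over HTTP or HTTPS. For components whose endpoint is not reachable from other hosts by default (for example because it is bound to 127.0.0.1), the listen address can be opened with the --bind-address flag or the equivalent configuration field.

To reach these pages, add local name resolution for prometheus.local, alertmanager.local, and grafana.local to the hosts file.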
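
A hosts entry for the three names can be generated as follows. This is a minimal sketch: hosts_line is a hypothetical helper, and the address passed to it is an assumption that must be replaced with the IP where the ingress-nginx controller is actually reachable (192.0.2.10 below is only a documentation placeholder).

```shell
# Sketch: print one hosts-file line mapping an address to the three hostnames.
# The argument is the ingress controller's reachable IP (an assumption here).
hosts_line() {
  local ingress_ip=$1
  printf '%s %s\n' "$ingress_ip" "prometheus.local alertmanager.local grafana.local"
}

# Example (needs root):
#   hosts_line 192.0.2.10 | sudo tee -a /etc/hosts
```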

4.1 kube-controller-manager

  • Modify the configuration
    kube-controller-manager exposes metrics on port 10257. A test request fails with "curl: (7) Failed connect to 10.49.18.103:10257; Connection refused". Following the official kube-controller-manager documentation, change the component's bind-address setting:
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
containers:
- command:
  - kube-controller-manager
  - --allocate-node-cidrs=true
  - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
  - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
  ### Changed
  #- --bind-address=127.0.0.1
  - --bind-address=0.0.0.0
...omitted...

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options
--bind-address string Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces will be used.

  • Verify
lsof -i:10257
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-cont 13515 root    7u  IPv6 38954551      0t0  TCP *:10257 (LISTEN)
kube-cont 13515 root   34u  IPv6 38968508      0t0  TCP master01.pl.hpc:10257->node1:61771 (ESTABLISHED)
### 10257 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10257/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key
# HELP apiserver_audit_event_total ... sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
...omitted...

4.2 kube-proxy

  • Modify the configuration
kubectl edit cm -n kube-system kube-proxy
...omitted...
kind: KubeProxyConfiguration
### Changed
# metricsBindAddress: ""
metricsBindAddress: 0.0.0.0:10249
mode: ipvs
...omitted...

Note: the kube-proxy configuration is mounted into the container from a ConfigMap, so do not add the metrics-bind-address parameter directly to the kube-proxy DaemonSet; set that way, it has no effect.
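
Running kube-proxy pods generally do not pick up a ConfigMap change on their own, so a restart is usually needed before the new metricsBindAddress takes effect. The sketch below assumes a kubeadm-style cluster where the DaemonSet is named kube-proxy in the kube-system namespace; the function name is hypothetical.

```shell
# Sketch: restart kube-proxy so the edited ConfigMap is re-read,
# then wait for the rollout to finish.
# Assumes the DaemonSet is kube-system/kube-proxy (kubeadm default).
restart_kube_proxy() {
  kubectl -n kube-system rollout restart daemonset kube-proxy
  kubectl -n kube-system rollout status daemonset kube-proxy
}

# Example: restart_kube_proxy
```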

  • Verify
lsof -i:10249
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-prox 36749 root   13u  IPv6 39030529      0t0  TCP master01.pl.hpc:10249->node1:38685 (ESTABLISHED)
kube-prox 36749 root   14u  IPv6 39100619      0t0  TCP *:10249 (LISTEN)
curl 10.49.18.103:10249/metrics
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 13
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
...omitted...

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/
--metrics-bind-address ipport Default: 127.0.0.1:10249
The IP address with port for the metrics server to serve on (set to '0.0.0.0:10249' for all IPv4 interfaces and '[::]:10249' for all IPv6 interfaces). Set empty to disable. This parameter is ignored if a config file is specified by --config.

4.3 kube-scheduler

  • Modify the configuration
vim /etc/kubernetes/manifests/kube-scheduler.yaml
...omitted...
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    ### Changed
    #- --bind-address=127.0.0.1
    - --bind-address=0.0.0.0
...omitted...

  • Verify
lsof -i:10259
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-sche 24404 root    7u  IPv6 38957456      0t0  TCP *:10259 (LISTEN)
kube-sche 24404 root   10u  IPv6 39009914      0t0  TCP master01.pl.hpc:10259->node1:29900 (ESTABLISHED)
### 10259 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10259/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key --insecure
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7200"} 0
...omitted...

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
--bind-address string Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces will be used.
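
The three checks in sections 4.1 to 4.3 can be gathered into one small script. This is a sketch, not a definitive tool: the function names are hypothetical, the ports and certificate paths are the ones used in the curl commands above, and "reachable" here only means that the connection (and TLS handshake, with verification skipped via -k) succeeded, not that the request was authorized.

```shell
# Sketch: map each component to the metrics URL used in this section.
metrics_url() {
  local component=$1 host=$2
  case "$component" in
    kube-controller-manager) printf 'https://%s:10257/metrics\n' "$host" ;;
    kube-scheduler)          printf 'https://%s:10259/metrics\n' "$host" ;;
    kube-proxy)              printf 'http://%s:10249/metrics\n' "$host" ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
}

# Probe all three endpoints on one host and report whether each connects.
# Certificate paths follow the curl examples above (kubeadm layout).
probe_metrics() {
  local host=$1 component url
  for component in kube-controller-manager kube-scheduler kube-proxy; do
    url=$(metrics_url "$component" "$host")
    if curl -sk -o /dev/null --max-time 5 \
        --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
        --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
        "$url"; then
      echo "$component: reachable"
    else
      echo "$component: NOT reachable"
    fi
  done
}

# Example (run on a control-plane node): probe_metrics 10.49.18.103
```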
