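Monitoring multiple Kubernetes clusters with Thanos
1. Preparation
(1) Kubernetes 1.23
(2) Helm 3.8
(3) MinIO, latest version (install it yourself; I run a single node with Docker)
(4) kube-prometheus-stack chart version 35.0.0 (installed with Helm)
(5) kube-thanos (the bitnami/thanos chart) version 10.3.6 (installed with Helm)
(6) Two Kubernetes clusters, using the domains *.lady.cn (the monitoring cluster) and *.kids.cn (the monitored cluster)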
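2. Goal
Components deployed on lady.cn:
- grafana
- prometheus
- alertmanager
- query-frontend
- query  # serves queries (through sidecar, storegateway, and storegateway-kids)
- compactor  # compacts and downsamples blocks in the bucket
- storegateway  # exposes the object store to query
- sidecar  # installed together with kube-prometheus-stack; uploads blocks and answers queries from query
- ruler  # evaluates alerting rules
- storegateway-kids  # store gateway for the monitored cluster's bucket (deployed manually from YAML)
Components deployed on kids.cn:
- grafana  # optional
- alertmanager  # optional
- prometheus
- query-frontend  # optional
- query  # queries the local sidecar and storegateway
- compactor  # compacts and downsamples blocks in the bucket
- storegateway
- sidecar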
3. MinIO: MinIO is already deployed on a separate server and serves as the S3-compatible object store.
172.16.0.39:9000 admin / Thanos@654321
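The later sections assume that one bucket per cluster already exists in MinIO (lady-bucket and kids-bucket are the names used in the objstore.yml files). A minimal sketch of creating them with the MinIO client `mc` could look like the following. The alias name `thanos-minio` and the `DRY_RUN` switch are assumptions made for illustration, and the default keys are the ones that appear in the objstore.yml of this article; adjust them to your own MinIO account.

```shell
#!/usr/bin/env bash
# Sketch: create one bucket per cluster in MinIO with the `mc` client.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
MINIO_URL="http://172.16.0.39:9000"            # endpoint from this article
MINIO_ALIAS="thanos-minio"                     # hypothetical alias name
MINIO_ACCESS_KEY="${MINIO_ACCESS_KEY:-Thanos}"
MINIO_SECRET_KEY="${MINIO_SECRET_KEY:-Thanos@654321}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_buckets() {
  run mc alias set "$MINIO_ALIAS" "$MINIO_URL" "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
  for bucket in lady-bucket kids-bucket; do
    # --ignore-existing makes the call safe to repeat
    run mc mb --ignore-existing "$MINIO_ALIAS/$bucket"
  done
}

create_buckets
```

Run it once with the default dry-run to review the commands, then again with `DRY_RUN=0` on a host where `mc` is installed.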
4. Deploy kube-prometheus-stack (in both clusters)
# Add the prometheus-community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Refresh the repositories
helm repo update
# Set variables
pro=kube-prometheus-stack
chart_version=35.0.0
mkdir -p /data/$pro
cd /data/$pro
# Download the chart
helm pull prometheus-community/$pro --version=$chart_version
# Extract values.yaml
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml
# Write the install script (the chart and values.yaml paths are relative, so it changes directory first)
cat > /data/$pro/start.sh << EOF
cd /data/$pro
helm upgrade --create-namespace --wait --install $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
EOF
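The heredoc that writes start.sh uses an unquoted `EOF` delimiter, so `$pro` and `$chart_version` are expanded at the moment the file is written, and the generated script contains the literal chart name and version. The YAML heredocs later in the article use a quoted `'EOF'`, which writes the text unchanged. The following self-contained sketch shows the difference; the temporary directory and file names are only for illustration.

```shell
#!/usr/bin/env bash
# Demonstrates how heredoc quoting changes what ends up in the generated file.
pro=kube-prometheus-stack
chart_version=35.0.0
workdir="$(mktemp -d)"

# Unquoted delimiter: $pro and $chart_version are expanded now.
cat > "$workdir/unquoted.sh" << EOF
helm upgrade --install $pro $pro-$chart_version.tgz
EOF

# Quoted delimiter: the text is written literally, variables stay unexpanded.
cat > "$workdir/quoted.sh" << 'EOF'
helm upgrade --install $pro $pro-$chart_version.tgz
EOF

cat "$workdir/unquoted.sh"
cat "$workdir/quoted.sh"
```

With the unquoted form, start.sh keeps working even in a shell where `pro` and `chart_version` are no longer set.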
- Edit values.yaml
kubeTargetVersionOverride: "1.23.4"  # Kubernetes version to target
---
alertmanager:
  # config:
  #   route:
  #     receiver: 'ding2wechat'
  #     routes:
  #       - match:
  #           alertname: Watchdog
  #         receiver: 'ding2wechat'
  #   receivers:
  #     - name: 'ding2wechat'
  #       webhook_configs:
  #         - url: 'http://dingtalk-webhook:8080/dingtalk/ding2wechat/send'
  ingress:
    enabled: true
    hosts:
      - alertmanager.lady.cn  # change to your domain
---
grafana:
  ingress:
    enabled: true
    hosts:
      - grafana.lady.cn  # change to your domain
  additionalDataSources:
    - name: Prometheus
      type: prometheus
      url: http://thanos-query-frontend:9090/  # integrates with query-frontend
      access: proxy
      isDefault: true
---
prometheus:
  thanosService:
    enabled: true
  thanosServiceExternal:
    enabled: true  # turn on
    type: NodePort  # change to LoadBalancer if a load balancer is available
  extraSecret:  # Thanos bucket-config, holding the object store (MinIO) settings
    name: bucket-config
    data:
      objstore.yml: |
        type: S3
        config:
          bucket: "lady-bucket"  # MinIO bucket name, change per cluster
          endpoint: "172.16.0.39:9000"  # MinIO address
          access_key: "Thanos"  # MinIO access key
          secret_key: "Thanos@654321"  # MinIO secret key
          insecure: true  # connect to MinIO over plain HTTP instead of HTTPS
  ingress:
    enabled: true
    hosts:
      - prometheus.lady.cn  # change to your domain
  prometheusSpec:
    disableCompaction: true  # local compaction is disabled because thanos-sidecar uploads the blocks
    externalLabels:
      cluster: lady.cn  # cluster label that distinguishes the clusters
    secrets:
      - etcd-client-cert  # etcd client certificate (etcd runs outside the cluster)
    thanos:
      objectStorageConfig:  # thanos-sidecar is configured from the secret defined above
        name: bucket-config
        key: objstore.yml
---
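kubeControllerManager:
  endpoints:
    - 192.168.11.100  # change to your control-plane address
  service:
    port: 10257  # this port must be set
---
kubeScheduler:
  endpoints:
    - 192.168.11.100  # change to your control-plane address
  service:
    port: 10259  # this port must be set
---
kubeEtcd:
  endpoints:
    - 192.168.11.100  # change to your etcd address
---
kubeProxy:
  endpoints:
    - 192.168.11.100  # change to your node address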
- Persistence for grafana, prometheus, and alertmanager (optional in a lab, required in production)
# alertmanager (under alertmanager.alertmanagerSpec)
storage:
  volumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
# prometheus (under prometheus.prometheusSpec)
storageSpec:
  volumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
Start
bash /data/kube-prometheus-stack/start.sh
(Figure: the blocks uploaded to MinIO by thanos-sidecar.)
5. Install kube-thanos
(1) Download the chart
# Add the bitnami Helm repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Refresh the repositories
helm repo update
# Set variables
pro=thanos
chart_version=10.3.6
mkdir -p /data/$pro
cd /data/$pro
# Download the chart
helm pull bitnami/$pro --version=$chart_version
# Extract values.yaml
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml
# Write the install script (the chart and values.yaml paths are relative, so it changes directory first)
cat > /data/$pro/start.sh << EOF
cd /data/$pro
helm upgrade --wait --create-namespace --install $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
EOF
(2) Configure values.yaml
# Must match prometheus.extraSecret.name in the kube-prometheus-stack values.yaml
existingObjstoreSecret: "bucket-config"
query:
  replicaLabel: [lady_replica]  # deduplication label, change to match your setup
  dnsDiscovery:
    sidecarsService: "kube-prometheus-stack-thanos-discovery"  # Thanos service name created by kube-prometheus-stack
    sidecarsNamespace: "monitoring"  # namespace where kube-prometheus-stack is deployed
  ingress:
    enabled: true
    hostname: thanos.lady.cn  # change to your domain
queryFrontend:  # used by Grafana as its query endpoint
  enabled: true
  extraFlags:
    - --query-frontend.compress-responses  # compress HTTP responses
    - --query-range.split-interval=12h  # split range queries by this interval
    - --query-range.max-retries-per-request=5
    - --query-frontend.log-queries-longer-than=10s  # log queries that run longer than this
    - --labels.split-interval=12h  # split label queries by this interval
    - --labels.max-retries-per-request=5
    - --query-range.align-range-with-step  # align start and end with the step for better cacheability
    - --query-range.max-query-length=0  # limit on the query time range; 0 disables it, 1h would allow only one hour
    - --query-range.response-cache-max-freshness=1m  # most recent window that is not cached, because recent results keep changing
    - |-
      --query-range.response-cache-config="config":
        max_size: "200MB"
        max_size_items: 0
        validity: 0s
      type: IN-MEMORY
    - |-
      --labels.response-cache-config="config":
        max_size: "200MB"
        max_size_items: 0
        validity: 0s
      type: IN-MEMORY
  ingress:
    enabled: true
    hostname: thanos-frontend.lady.cn
compactor:
  enabled: true
  persistence:
    enabled: true  # set to true in production for persistent storage
storegateway:
  enabled: true
  persistence:
    enabled: true  # set to true in production for persistent storage
ruler:
  enabled: true
  replicaLabel: lady_replica  # deduplication label, change to match your setup
  alertmanagers:
    - kube-prometheus-stack-alertmanager:9093  # Alertmanager service created by kube-prometheus-stack
  existingConfigmap: "prometheus-kube-prometheus-stack-prometheus-rulefiles-0"  # rule files generated by kube-prometheus-stack
  persistence:
    enabled: true  # set to true in production for persistent storage
  ingress:
    enabled: true
    hostname: thanos-ruler.lady.cn  # change to your domain
Note: the chart source needs a small change.
tar zxvf thanos-10.3.6.tgz
vi thanos/templates/ruler/statefulset.yaml
# change --rule-file=/conf/rules/*.yml to --rule-file=/conf/rules/*.yaml
helm package thanos  # repackage the chart
bash /data/thanos/start.sh
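The manual vi edit of the ruler template can also be done non-interactively. The sketch below applies the same `.yml` to `.yaml` substitution with `sed`; it runs against a stand-in file so that it is self-contained, and the function name `patch_rule_glob` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: patch the rule-file glob with sed instead of editing by hand.
patch_rule_glob() {
  # $1: path to the ruler statefulset template of the unpacked chart.
  # A .bak copy of the original file is kept next to it.
  sed -i.bak 's#--rule-file=/conf/rules/\*\.yml#--rule-file=/conf/rules/*.yaml#' "$1"
}

# Demonstration on a stand-in file containing the line to change.
demo="$(mktemp)"
printf '%s\n' '            - --rule-file=/conf/rules/*.yml' > "$demo"
patch_rule_glob "$demo"
cat "$demo"
```

For the real chart, the call would be `patch_rule_glob thanos/templates/ruler/statefulset.yaml` after unpacking, followed by `helm package thanos`. Remove the generated `.bak` file before packaging so it does not end up in the chart archive.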
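(3) Query figure: the store list includes sidecar, store, and rule
- In Grafana, configure the new data source as http://thanos-query-frontend:9090/
In the lady cluster, add thanos-storegateway-kids and thanos-query-kids to collect data from the kids cluster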
cat > /data/thanos/query-kids.yaml << 'EOF'
---
apiVersion: v1
kind: Endpoints
metadata:
  name: thanos-query-kids
  namespace: monitoring
subsets:
  - addresses:
      - ip: 192.168.11.101  # change this; it points at the kids.cn cluster
    ports:
      - name: grpc
        port: 30901
        protocol: TCP
      - name: http
        port: 30902
        protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: thanos-query-kids
  name: thanos-query-kids
  namespace: monitoring
spec:
  ports:
    - name: grpc
      port: 30901
      protocol: TCP
      targetPort: grpc
    - name: http
      port: 30902
      protocol: TCP
      targetPort: http
  type: ClusterIP
EOF
kubectl apply -f /data/thanos/query-kids.yaml
cat > /data/thanos/storegateway-kids.yaml << 'EOF'
apiVersion: v1
kind: Secret
metadata:
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/part-of: kube-prometheus-stack
  name: bucket-config-kids
  namespace: monitoring
data:
  objstore.yml: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogImtpZHMtYnVja2V0IiAgICAgICAgICAgICAgICAgICAgICAjbWluaW/nmoTmobblkI0KICBlbmRwb2ludDogIjE3Mi4xNi4wLjM5OjkwMDAiICAgICAgICAgICAgICAgI21pbmlv55qE5Zyw5Z2ACiAgYWNjZXNzX2tleTogIlRoYW5vcyIgICAgICAgICAgICAgICAgICAgICAgICNtaW5pb+eahOW4kOWPtwogIHNlY3JldF9rZXk6ICJUaGFub3NANjU0MzIxIiAgICAgICAgICAgICAgICAjbWluaW/nmoTlr4bnoIEKICBpbnNlY3VyZTogdHJ1ZSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgI+S4jemqjOivgXRsc+ivgeS5pgo=
type: Opaque
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  name: thanos-storegateway-kids
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: storegateway-kids
      app.kubernetes.io/instance: thanos
      app.kubernetes.io/name: thanos
  serviceName: thanos-storegateway-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/component: storegateway-kids
        app.kubernetes.io/instance: thanos
        app.kubernetes.io/name: thanos
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/component: storegateway-kids
                    app.kubernetes.io/instance: thanos
                    app.kubernetes.io/name: thanos
                namespaces:
                  - monitoring
                topologyKey: kubernetes.io/hostname
              weight: 1
      automountServiceAccountToken: true
      containers:
        - args:
            - store
            - --log.level=info
            - --log.format=logfmt
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --data-dir=/data
            - --objstore.config-file=/conf/objstore.yml
          image: docker.io/bitnami/thanos:0.25.2-scratch-r5
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 6
            httpGet:
              path: /-/healthy
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          name: storegateway
          ports:
            - containerPort: 10902
              name: http
              protocol: TCP
            - containerPort: 10901
              name: grpc
              protocol: TCP
          readinessProbe:
            failureThreshold: 6
            httpGet:
              path: /-/ready
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1001
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /conf
              name: objstore-config
            - mountPath: /data
              name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: thanos-storegateway
      serviceAccountName: thanos-storegateway
      terminationGracePeriodSeconds: 30
      volumes:
        - name: objstore-config
          secret:
            defaultMode: 420
            secretName: bucket-config-kids
        - emptyDir: {}
          name: data
  updateStrategy:
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  name: thanos-storegateway-kids
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: http
    - name: grpc
      port: 10901
      protocol: TCP
      targetPort: grpc
  selector:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  sessionAffinity: None
  type: ClusterIP
EOF
kubectl apply -f /data/thanos/storegateway-kids.yaml
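The `objstore.yml` value in the bucket-config-kids Secret is the base64 encoding of an object store configuration that points at the kids bucket. A minimal sketch of producing such a value is shown below; the helper name `render_objstore` is hypothetical, and the bucket, endpoint, and keys are the ones used in this article.

```shell
#!/usr/bin/env bash
# Sketch: render an objstore.yml for the kids bucket and base64-encode it
# for the data.objstore.yml field of the Secret.
render_objstore() {
  # $1 bucket, $2 endpoint, $3 access key, $4 secret key
  cat << EOF
type: S3
config:
  bucket: "$1"
  endpoint: "$2"
  access_key: "$3"
  secret_key: "$4"
  insecure: true
EOF
}

# Encode on a single line (tr strips the line wrapping some base64 tools add).
encoded="$(render_objstore kids-bucket 172.16.0.39:9000 Thanos 'Thanos@654321' | base64 | tr -d '\n')"
echo "$encoded"

# Round trip: decode again to check what Thanos will read.
echo "$encoded" | base64 -d
```

Kubernetes also accepts a `stringData` field in a Secret, which takes the plain text and avoids the manual encoding step.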
Edit thanos-query in the lady cluster
kubectl edit -n monitoring deployments.apps thanos-query
        - --store=dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-storegateway.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-ruler.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-query-kids.monitoring.svc.cluster.local  # add this entry, pointing at kids.cn
        - --store=dnssrv+_grpc._tcp.thanos-storegateway-kids.monitoring.svc.cluster.local  # add this entry, pointing at kids.cn
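A change made with `kubectl edit` lives outside the Helm release and is easy to lose track of on later upgrades. One possible alternative is to keep the two extra endpoints in a values overlay. The sketch below assumes that the bitnami/thanos chart exposes a `query.stores` list and renders each entry as a `--store` flag; verify this against the values.yaml of your chart version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: keep the kids endpoints in a values overlay instead of editing the Deployment.
# Assumption: the chart turns every `query.stores` entry into a --store flag.
overlay="$(mktemp)"
cat > "$overlay" << 'EOF'
query:
  stores:
    - dnssrv+_grpc._tcp.thanos-query-kids.monitoring.svc.cluster.local
    - dnssrv+_grpc._tcp.thanos-storegateway-kids.monitoring.svc.cluster.local
EOF
cat "$overlay"

# To apply (not executed here):
#   helm upgrade --install thanos thanos-10.3.6.tgz -f values.yaml -f "$overlay" -n monitoring
```

Passing the overlay as a second `-f` file leaves the main values.yaml unchanged.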
Verification
lady cluster
kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 2 (44h ago) 2d1h
kube-prometheus-stack-grafana-799446c5b9-8h2kh 3/3 Running 3 (44h ago) 2d1h
kube-prometheus-stack-kube-state-metrics-6c5d86887c-hr7l7 1/1 Running 1 (44h ago) 2d1h
kube-prometheus-stack-operator-5bbb5f4f64-dk5dr 1/1 Running 1 (44h ago) 2d1h
kube-prometheus-stack-prometheus-node-exporter-r6pcz 1/1 Running 1 (44h ago) 2d1h
prometheus-kube-prometheus-stack-prometheus-0 3/3 Running 3 (44h ago) 2d1h
thanos-compactor-66ccd948d-g72zt 1/1 Running 2 (44h ago) 2d
thanos-query-5df6c68bc5-vptrq 1/1 Running 0 53m
thanos-query-frontend-59df69d5c-gndz4 1/1 Running 1 (44h ago) 2d
thanos-ruler-0 1/1 Running 1 (44h ago) 2d
thanos-storegateway-0 1/1 Running 2 (44h ago) 2d
thanos-storegateway-kids-0 1/1 Running 0 155m
kids cluster
kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 44h
kube-prometheus-stack-grafana-799446c5b9-fdgng 3/3 Running 0 44h
kube-prometheus-stack-kube-state-metrics-6c5d86887c-m7tw5 1/1 Running 0 44h
kube-prometheus-stack-operator-5bbb5f4f64-rxxn6 1/1 Running 0 44h
kube-prometheus-stack-prometheus-node-exporter-fqtjl 1/1 Running 0 44h
prometheus-kube-prometheus-stack-prometheus-0 3/3 Running 0 44h
thanos-compactor-66ccd948d-7tfzd 1/1 Running 0 43h
thanos-query-f6ffddfb4-8qhdj 1/1 Running 0 23h
thanos-query-frontend-59df69d5c-pwbxs 1/1 Running 0 43h
thanos-storegateway-0 1/1 Running 0 43h
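Running pods do not prove that data from both clusters is reachable. A quick check is to ask Thanos Query, through its Prometheus-compatible HTTP API, which `cluster` label values it sees. The sketch below only prints the curl command. `THANOS_URL` is an assumption based on the query ingress hostname configured earlier, and a per-cluster result is expected only if the kids cluster sets its own `cluster` external label.

```shell
#!/usr/bin/env bash
# Sketch: build a curl command that asks Thanos Query for the cluster labels it can see.
THANOS_URL="${THANOS_URL:-http://thanos.lady.cn}"   # assumed ingress address of Thanos Query

cluster_check_cmd() {
  # Prints the command; copy it or pipe it to `sh` on a host that can reach Thanos Query.
  printf "curl -sG '%s/api/v1/query' --data-urlencode 'query=count by (cluster) (up)'\n" "$THANOS_URL"
}

cluster_check_cmd
```

If both clusters are wired up, the JSON response should contain one series per `cluster` value.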
thanos-query-frontend configuration: https://blog.csdn.net/qq_34556414/article/details/124997111
How to use Thanos for multi-cluster Prometheus monitoring: https://blog.csdn.net/xxxxaayy/article/details/104989792