Monitoring Multiple Kubernetes Clusters with Thanos

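I. Prerequisites
1. Kubernetes 1.23
2. Helm 3.8
3. MinIO, latest version (install it yourself; I run a single node in Docker)
4. kube-prometheus-stack, chart version 35.0.0 (installed with Helm)
5. kube-thanos (the Bitnami thanos chart), chart version 10.3.6 (installed with Helm)
6. Two Kubernetes clusters: *.lady.cn (the monitoring cluster) and *.kids.cn (the monitored cluster)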

II. Goal
Deploy the following components on lady.cn

grafana
prometheus
alertmanager
query-frontend
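query # queries through sidecar, storegateway, and storegateway-kids
compactor # compacts and downsamples blocks in the object store
storegateway # serves object-store data to query
sidecar # installed together with kube-prometheus-stack; uploads data and answers query requests
ruler # evaluates alerting rules
storegateway-kids # serves the monitored cluster's object store (deployed manually from YAML)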

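Deploy the following components on kids.cn

grafana # optional
alertmanager # optional
prometheus
query-frontend # optional
query # queries the local sidecar and storegateway
compactor # compacts and downsamples blocks in the object store
storegateway
sidecar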

III. MinIO
MinIO is already deployed on a separate server and serves as the S3 object store.

172.16.0.39:9000  admin /  Thanos@654321

IV. Deploy kube-prometheus-stack (on both clusters)

# Add the prometheus-community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Update the repositories
helm repo update
# Set variables
pro=kube-prometheus-stack
chart_version=35.0.0
mkdir -p /data/$pro
cd /data/$pro
# Download the chart
helm pull prometheus-community/$pro --version=$chart_version
# Extract values.yaml
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml
# Write the install script (the variables are expanded now, while the file is written)
cat > /data/$pro/start.sh << EOF
cd /data/$pro
helm upgrade --create-namespace --wait --install $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
EOF

Edit values.yaml

kubeTargetVersionOverride: "1.23.4"   # pin the Kubernetes version
---
alertmanager:
#  config:
#    route:
#      receiver: 'ding2wechat'
#      routes:
#      - match:
#          alertname: Watchdog
#        receiver: 'ding2wechat'
#    receivers:
#    - name: 'ding2wechat'
#      webhook_configs:
#      - url: 'http://dingtalk-webhook:8080/dingtalk/ding2wechat/send'
  ingress:
    enabled: true
    hosts:
    - alertmanager.lady.cn        # adjust for your environment
---
grafana:
  ingress:
    enabled: true
    hosts:
    - grafana.lady.cn             # adjust for your environment
  additionalDataSources:
  - name: Prometheus
    type: prometheus
    url: http://thanos-query-frontend:9090/        # integrates with query-frontend
    access: proxy
    isDefault: true
---
prometheus:
  thanosService:
    enabled: true
  thanosServiceExternal:
    enabled: true                       # turn on
    type: NodePort                      # adjust; use LoadBalancer if one is available
  extraSecret:                          # Thanos bucket-config, holding the objstore (MinIO) settings
    name: bucket-config
    data:
      objstore.yml: |
        type: S3
        config:
          bucket: "lady-bucket"                      # MinIO bucket name; adjust
          endpoint: "172.16.0.39:9000"               # MinIO address
          access_key: "Thanos"                       # MinIO account
          secret_key: "Thanos@654321"                # MinIO password
          insecure: true                             # use plain HTTP instead of HTTPS
  ingress:
    enabled: true
    hosts:
    - prometheus.lady.cn                # adjust for your environment
  prometheusSpec:
    disableCompaction: true             # disable local compaction because thanos-sidecar uploads the blocks
    externalLabels:
      cluster: lady.cn                  # cluster label that tells the clusters apart
    secrets:
    - etcd-client-cert                  # mounts the etcd client certificate (etcd runs outside the cluster)
    thanos:
      objectStorageConfig:              # thanos-sidecar is configured from the secret above
        name: bucket-config
        key: objstore.yml
---
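kubeControllerManager:
  endpoints:
  - 192.168.11.100      # adjust for your environment
  service:
    port: 10257         # this port must be set
---
kubeScheduler:
  endpoints:
  - 192.168.11.100      # adjust for your environment
  service:
    port: 10259         # this port must be set
---
kubeEtcd:
  endpoints:
  - 192.168.11.100      # adjust for your environment
---
kubeProxy:
  endpoints:
  - 192.168.11.100      # adjust for your environment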

Persistence for grafana, prometheus, and alertmanager (optional in a lab environment; configure it in production)

# alertmanager (under alertmanager.alertmanagerSpec)
storage:
  volumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
# prometheus (under prometheus.prometheusSpec)
storageSpec:
  volumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi

Start

bash /data/kube-prometheus-stack/start.sh

[Figure: data uploaded to MinIO by thanos-sidecar]

V. Install kube-thanos
1. Download the chart

# Add the Bitnami Helm repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Update the repositories
helm repo update
# Set variables
pro=thanos
chart_version=10.3.6
mkdir -p /data/$pro
cd /data/$pro
# Download the chart
helm pull bitnami/$pro --version=$chart_version
# Extract values.yaml
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml
# Write the install script (the variables are expanded now, while the file is written)
cat > /data/$pro/start.sh << EOF
cd /data/$pro
helm upgrade --wait --create-namespace --install $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
EOF

2. Configure values.yaml

# Must match prometheus.extraSecret.name in the kube-prometheus-stack values.yaml
existingObjstoreSecret: "bucket-config"
query:
  replicaLabel: [lady_replica]                                 # deduplication label; adjust for your environment
  dnsDiscovery:
    sidecarsService: "kube-prometheus-stack-thanos-discovery"  # Thanos service name created by kube-prometheus-stack
    sidecarsNamespace: "monitoring"                            # namespace where kube-prometheus-stack is deployed
  ingress:
    enabled: true
    hostname: thanos.lady.cn                                   # adjust for your environment
queryFrontend:                 # used by Grafana for queries
  enabled: true
  extraFlags:
  - --query-frontend.compress-responses               # compress HTTP responses
  - --query-range.split-interval=12h                  # split range queries by this interval
  - --query-range.max-retries-per-request=5
  - --query-frontend.log-queries-longer-than=10s      # log queries that take longer than this
  - --labels.split-interval=12h                       # split label queries by this interval
  - --labels.max-retries-per-request=5
  - --query-range.align-range-with-step               # align start and end with the step for better cacheability
  - --query-range.max-query-length=0                  # limit on the query time range; 0 disables it, 1h would allow only 1-hour ranges
  - --query-range.response-cache-max-freshness=1m     # newest result that may be cached for range queries, so still-changing results are not cached
  - |-
    --query-range.response-cache-config="config":
      max_size: "200MB"
      max_size_items: 0
      validity: 0s
    type: IN-MEMORY
  - |-
    --labels.response-cache-config="config":
      max_size: "200MB"
      max_size_items: 0
      validity: 0s
    type: IN-MEMORY
  ingress:
    enabled: true
    hostname: thanos-frontend.lady.cn
compactor:
  enabled: true
  persistence:
    enabled: true             # set to true in production for persistent storage
storegateway:
  enabled: true
  persistence:
    enabled: true             # set to true in production for persistent storage
ruler:
  enabled: true
  replicaLabel: lady_replica                      # deduplication label; adjust for your environment
  alertmanagers:
  - kube-prometheus-stack-alertmanager:9093       # service name of the kube-prometheus-stack Alertmanager
  existingConfigmap: "prometheus-kube-prometheus-stack-prometheus-rulefiles-0"   # rule files generated by kube-prometheus-stack
  persistence:
    enabled: true             # set to true in production for persistent storage
  ingress:
    enabled: true
    hostname: thanos-ruler.lady.cn                # adjust for your environment

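Note: the chart source needs one change, because the rule files in the kube-prometheus-stack ConfigMap end in .yaml rather than .yml.

tar zxvf thanos-10.3.6.tgz
vi thanos/templates/ruler/statefulset.yaml
# change  --rule-file=/conf/rules/*.yml  to  --rule-file=/conf/rules/*.yaml
helm package thanos      # repackage the chart

bash /data/thanos/start.sh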
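
The vi edit in the note above could also be scripted. The following is a minimal sketch, not the author's method: it patches a stand-in file created in a temporary directory so that it runs on its own. For the real chart, the assumption is that target would be thanos/templates/ruler/statefulset.yaml after unpacking the archive.

```shell
#!/bin/sh
# Sketch: patch the ruler's --rule-file glob without opening an editor.
# The sample file is a stand-in (assumption) for the chart's ruler statefulset template.
workdir=$(mktemp -d)
target="$workdir/statefulset.yaml"
printf '%s\n' '            - --rule-file=/conf/rules/*.yml' > "$target"

# Rewrite only the rule-file flag. Writing to a second file and moving it
# into place avoids the differences between GNU and BSD "sed -i".
sed 's#--rule-file=/conf/rules/\*\.yml$#--rule-file=/conf/rules/*.yaml#' "$target" > "$target.new"
mv "$target.new" "$target"

patched=$(cat "$target")
echo "$patched"
```

After patching the real template, helm package thanos still has to be run so that start.sh installs the modified chart.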

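3. The query page, which lists the sidecar, store, and rule endpoints [figure]

Configure a new Grafana data source with the URL http://thanos-query-frontend:9090/

In the lady cluster, add thanos-storegateway-kids and thanos-query-kids to collect data from the kids cluster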

cat > /data/thanos/query-kids.yaml << 'EOF'
---
apiVersion: v1
kind: Endpoints
metadata:
  name: thanos-query-kids
  namespace: monitoring
subsets:
- addresses:
  - ip: 192.168.11.101     # adjust; this points to the kids.cn cluster
  ports:
  - name: grpc
    port: 30901
    protocol: TCP
  - name: http
    port: 30902
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: thanos-query-kids
  name: thanos-query-kids
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 30901
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 30902
    protocol: TCP
    targetPort: http
  type: ClusterIP
EOF
kubectl apply -f /data/thanos/query-kids.yaml

cat > /data/thanos/storegateway-kids.yaml << 'EOF'
apiVersion: v1
kind: Secret
metadata:
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/part-of: kube-prometheus-stack
  name: bucket-config-kids
  namespace: monitoring
data:
  objstore.yml: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogImtpZHMtYnVja2V0IiAgICAgICAgICAgICAgICAgICAgICAjbWluaW/nmoTmobblkI0KICBlbmRwb2ludDogIjE3Mi4xNi4wLjM5OjkwMDAiICAgICAgICAgICAgICAgI21pbmlv55qE5Zyw5Z2ACiAgYWNjZXNzX2tleTogIlRoYW5vcyIgICAgICAgICAgICAgICAgICAgICAgICNtaW5pb+eahOW4kOWPtwogIHNlY3JldF9rZXk6ICJUaGFub3NANjU0MzIxIiAgICAgICAgICAgICAgICAjbWluaW/nmoTlr4bnoIEKICBpbnNlY3VyZTogdHJ1ZSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgI+S4jemqjOivgXRsc+ivgeS5pgo=
type: Opaque
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  name: thanos-storegateway-kids
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: storegateway-kids
      app.kubernetes.io/instance: thanos
      app.kubernetes.io/name: thanos
  serviceName: thanos-storegateway-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/component: storegateway-kids
        app.kubernetes.io/instance: thanos
        app.kubernetes.io/name: thanos
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: storegateway-kids
                  app.kubernetes.io/instance: thanos
                  app.kubernetes.io/name: thanos
              namespaces:
              - monitoring
              topologyKey: kubernetes.io/hostname
            weight: 1
      automountServiceAccountToken: true
      containers:
      - args:
        - store
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --data-dir=/data
        - --objstore.config-file=/conf/objstore.yml
        image: docker.io/bitnami/thanos:0.25.2-scratch-r5
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: storegateway
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        - containerPort: 10901
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 6
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: objstore-config
        - mountPath: /data
          name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: thanos-storegateway
      serviceAccountName: thanos-storegateway
      terminationGracePeriodSeconds: 30
      volumes:
      - name: objstore-config
        secret:
          defaultMode: 420
          secretName: bucket-config-kids
      - emptyDir: {}
        name: data
  updateStrategy:
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  name: thanos-storegateway-kids
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: http
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: grpc
  selector:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  sessionAffinity: None
  type: ClusterIP
EOF
kubectl apply -f /data/thanos/storegateway-kids.yaml
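
The objstore.yml value in the bucket-config-kids Secret is the base64 encoding of an objstore file for the kids bucket. As a rough sketch of how such a value can be produced, the snippet below encodes a file with the same settings (without the inline comments, so the string differs from the one above) and decodes it again as a check. It assumes a base64 command that accepts -d, as in GNU coreutils; the temporary file path is only for illustration.

```shell
#!/bin/sh
# Sketch: build the base64 string for the Secret's data.objstore.yml field.
objstore=$(mktemp)
cat > "$objstore" << 'EOF'
type: S3
config:
  bucket: "kids-bucket"
  endpoint: "172.16.0.39:9000"
  access_key: "Thanos"
  secret_key: "Thanos@654321"
  insecure: true
EOF

# Remove newlines so the value stays on one line whatever base64's wrapping default is.
encoded=$(base64 < "$objstore" | tr -d '\n')
echo "objstore.yml: $encoded"

# Round trip: decoding must give back the original file content.
decoded=$(printf '%s' "$encoded" | base64 -d)
```

kubectl can also do the encoding itself: kubectl create secret generic bucket-config-kids -n monitoring --from-file=objstore.yml=FILE builds an equivalent Secret from a plain file.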

Modify thanos-query in the lady cluster

kubectl edit -n monitoring deployments.apps thanos-query
- --store=dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.cluster.local
- --store=dnssrv+_grpc._tcp.thanos-storegateway.monitoring.svc.cluster.local
- --store=dnssrv+_grpc._tcp.thanos-ruler.monitoring.svc.cluster.local
- --store=dnssrv+_grpc._tcp.thanos-query-kids.monitoring.svc.cluster.local                 # added; points to kids.cn
- --store=dnssrv+_grpc._tcp.thanos-storegateway-kids.monitoring.svc.cluster.local          # added; points to kids.cn
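
Every --store entry above follows the same pattern: dnssrv+_grpc._tcp. followed by the Service name, the namespace, and svc.cluster.local. The sketch below only prints the flags for a list of Service names, which can help avoid typos when adding another cluster; it does not touch the deployment. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch: print one --store flag per Service, following the dnssrv pattern used above.
namespace=monitoring
services="kube-prometheus-stack-thanos-discovery thanos-storegateway thanos-ruler thanos-query-kids thanos-storegateway-kids"

store_flags=""
for svc in $services; do
  flag="--store=dnssrv+_grpc._tcp.${svc}.${namespace}.svc.cluster.local"
  # Append the flag followed by a newline.
  store_flags="${store_flags}${flag}
"
done
printf '%s' "$store_flags"
```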

Verification
lady cluster

kubectl get pod -n monitoring
NAME                                                        READY   STATUS    RESTARTS      AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   2 (44h ago)   2d1h
kube-prometheus-stack-grafana-799446c5b9-8h2kh              3/3     Running   3 (44h ago)   2d1h
kube-prometheus-stack-kube-state-metrics-6c5d86887c-hr7l7   1/1     Running   1 (44h ago)   2d1h
kube-prometheus-stack-operator-5bbb5f4f64-dk5dr             1/1     Running   1 (44h ago)   2d1h
kube-prometheus-stack-prometheus-node-exporter-r6pcz        1/1     Running   1 (44h ago)   2d1h
prometheus-kube-prometheus-stack-prometheus-0               3/3     Running   3 (44h ago)   2d1h
thanos-compactor-66ccd948d-g72zt                            1/1     Running   2 (44h ago)   2d
thanos-query-5df6c68bc5-vptrq                               1/1     Running   0             53m
thanos-query-frontend-59df69d5c-gndz4                       1/1     Running   1 (44h ago)   2d
thanos-ruler-0                                              1/1     Running   1 (44h ago)   2d
thanos-storegateway-0                                       1/1     Running   2 (44h ago)   2d
thanos-storegateway-kids-0                                  1/1     Running   0             155m

kids cluster

kubectl get pod -n monitoring
NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          44h
kube-prometheus-stack-grafana-799446c5b9-fdgng              3/3     Running   0          44h
kube-prometheus-stack-kube-state-metrics-6c5d86887c-m7tw5   1/1     Running   0          44h
kube-prometheus-stack-operator-5bbb5f4f64-rxxn6             1/1     Running   0          44h
kube-prometheus-stack-prometheus-node-exporter-fqtjl        1/1     Running   0          44h
prometheus-kube-prometheus-stack-prometheus-0               3/3     Running   0          44h
thanos-compactor-66ccd948d-7tfzd                            1/1     Running   0          43h
thanos-query-f6ffddfb4-8qhdj                                1/1     Running   0          23h
thanos-query-frontend-59df69d5c-pwbxs                       1/1     Running   0          43h
thanos-storegateway-0                                       1/1     Running   0          43h
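
Reading the listings by eye works for a dozen pods. As a hedged sketch, the same check can be automated by comparing the READY and STATUS columns. The sample rows below are copied from the kids listing so the snippet runs without a cluster; in practice the input would come from kubectl get pod -n monitoring --no-headers.

```shell
#!/bin/sh
# Sketch: report pods that are not Running or not fully ready.
# Sample rows stand in for: kubectl get pod -n monitoring --no-headers
pods='prometheus-kube-prometheus-stack-prometheus-0   3/3   Running   0   44h
thanos-query-f6ffddfb4-8qhdj                    1/1   Running   0   23h
thanos-storegateway-0                           1/1   Running   0   43h'

not_ready=$(printf '%s\n' "$pods" | awk '{
  split($2, r, "/")   # READY column, for example 3/3
  if ($3 != "Running" || r[1] != r[2]) print $1
}')

if [ -z "$not_ready" ]; then
  echo "all pods ready"
else
  echo "not ready: $not_ready"
fi
```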

thanos-query-frontend configuration: https://blog.csdn.net/qq_34556414/article/details/124997111

How to use Thanos for multi-cluster Prometheus monitoring: https://blog.csdn.net/xxxxaayy/article/details/104989792
