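I. Prerequisites
1. Kubernetes 1.23
2. Helm 3.8
3. MinIO, latest version (install it yourself; I deployed a single node with Docker)
4. kube-prometheus-stack chart version 35.0.0 (installed with Helm)
5. kube-thanos, i.e. the Bitnami thanos chart, version 10.3.6 (installed with Helm)
6. Two Kubernetes clusters: *.lady.cn (the monitoring cluster) and *.kids.cn (the monitored cluster)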

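II. Goals
Components deployed in lady.cn:

  • grafana
  • prometheus
  • alertmanager
  • query-frontend
  • query # queries through sidecar, storegateway, and storegateway-kids
  • compactor # compacts and downsamples the blocks in object storage
  • storegateway # serves the object store data to query
  • sidecar # installed together with kube-prometheus-stack; uploads blocks and answers queries from query
  • ruler # evaluates alerting rules
  • storegateway-kids # serves the monitored cluster's object store (deployed manually from YAML)

Components deployed in kids.cn:

  • grafana # optional
  • alertmanager # optional
  • prometheus
  • query-frontend # optional
  • query # queries the local sidecar and storegateway
  • compactor # compacts and downsamples the blocks in object storage
  • storegateway
  • sidecar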

III. MinIO

MinIO is already deployed on a standalone server and serves as the S3 object store.

Endpoint 172.16.0.39:9000, user admin, password Thanos@654321
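This article uses two buckets, lady-bucket and kids-bucket, but does not show how they were created. Below is a minimal sketch that creates them with the MinIO client `mc`. It assumes `mc` is installed, and the alias name `thanos-minio` is hypothetical.

The objstore configs later in this article authenticate with the access key `Thanos`, not `admin`. If that is a separate MinIO user, it needs read/write access to both buckets; creating that user is not shown here.

```bash
#!/usr/bin/env bash
# Sketch: create the buckets referenced by the objstore configs in this article.
# Assumptions: mc is installed; "thanos-minio" is a hypothetical alias name.
set -euo pipefail

MINIO_ENDPOINT="${MINIO_ENDPOINT:-http://172.16.0.39:9000}"
MINIO_USER="${MINIO_USER:-admin}"
MINIO_PASS="${MINIO_PASS:-Thanos@654321}"
MINIO_ALIAS="${MINIO_ALIAS:-thanos-minio}"   # hypothetical alias name

# Print each command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

create_thanos_buckets() {
  run mc alias set "$MINIO_ALIAS" "$MINIO_ENDPOINT" "$MINIO_USER" "$MINIO_PASS"
  local bucket
  for bucket in lady-bucket kids-bucket; do
    # --ignore-existing makes re-running the script harmless
    run mc mb --ignore-existing "$MINIO_ALIAS/$bucket"
  done
}
```

Run `create_thanos_buckets` once from any host that can reach MinIO. `DRY_RUN=1 create_thanos_buckets` only prints the commands.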

IV. Deploy kube-prometheus-stack (in both clusters)

# Add the prometheus-community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Refresh the repository index
helm repo update
# Variables
pro=kube-prometheus-stack
chart_version=35.0.0
mkdir -p /data/$pro
cd /data/$pro
# Download the chart
helm pull prometheus-community/$pro --version=$chart_version
# Extract values.yaml from the chart archive
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml
# Generate the install script (the unquoted EOF expands $pro and $chart_version now)
cat > /data/$pro/start.sh << EOF
cd /data/$pro
helm upgrade --create-namespace --wait --install $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
EOF
  • Edit values.yaml
kubeTargetVersionOverride: "1.23.4"   # set the Kubernetes version explicitly
---
alertmanager:
#  config:
#    route:
#      receiver: 'ding2wechat'
#      routes:
#      - match:
#          alertname: Watchdog
#        receiver: 'ding2wechat'
#    receivers:
#    - name: 'ding2wechat'
#      webhook_configs:
#      - url: 'http://dingtalk-webhook:8080/dingtalk/ding2wechat/send'
  ingress:
    enabled: true
    hosts:
    - alertmanager.lady.cn        # change this
---
grafana:
  ingress:
    enabled: true
    hosts:
    - grafana.lady.cn             # change this
  additionalDataSources:
  - name: Prometheus
    type: prometheus
    url: http://thanos-query-frontend:9090/        # query through thanos-query-frontend
    access: proxy
    isDefault: true
---
prometheus:
  thanosService:
    enabled: true
  thanosServiceExternal:
    enabled: true                       # turn this on
    type: NodePort                      # change this; use LoadBalancer if a load balancer is available
  extraSecret:                          # Thanos bucket config, containing the objstore (MinIO) settings
    name: bucket-config
    data:
      objstore.yml: |
        type: S3
        config:
          bucket: "lady-bucket"                      # MinIO bucket name, change this
          endpoint: "172.16.0.39:9000"               # MinIO address
          access_key: "Thanos"                       # MinIO access key
          secret_key: "Thanos@654321"                # MinIO secret key
          insecure: true                             # connect over plain HTTP instead of HTTPS
  ingress:
    enabled: true
    hosts:
    - prometheus.lady.cn                           # change this
  prometheusSpec:
    disableCompaction: true                        # disable local compaction so thanos-sidecar can upload blocks
    externalLabels:
      cluster: lady.cn                             # cluster label that tells the clusters apart
    secrets:
    - etcd-client-cert                             # etcd client certificates (etcd runs outside the cluster)
    thanos:
      objectStorageConfig:                         # thanos-sidecar reads its objstore config from the secret above
        name: bucket-config
        key: objstore.yml
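The `etcd-client-cert` secret listed under `prometheusSpec.secrets` has to exist before Prometheus starts. Its creation is not shown in this article.

Below is a minimal sketch that creates the secret from the etcd certificates. It makes two assumptions:

- The default paths are the kubeadm defaults. For an external etcd, point the variables at your etcd CA and client certificate/key instead.
- The key names `ca.crt`, `client.crt` and `client.key` inside the Secret are my own (hypothetical) choice.

```bash
#!/usr/bin/env bash
# Sketch: create the etcd-client-cert Secret referenced by prometheus.prometheusSpec.secrets.
# Assumption: kubeadm-style certificate paths; override ETCD_* for an external etcd.
set -euo pipefail

ETCD_CA="${ETCD_CA:-/etc/kubernetes/pki/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/kubernetes/pki/apiserver-etcd-client.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/kubernetes/pki/apiserver-etcd-client.key}"

# Print each command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

create_etcd_secret() {
  # Prometheus Operator mounts each listed secret under /etc/prometheus/secrets/<secret-name>/
  run kubectl -n monitoring create secret generic etcd-client-cert \
    --from-file=ca.crt="$ETCD_CA" \
    --from-file=client.crt="$ETCD_CERT" \
    --from-file=client.key="$ETCD_KEY"
}
```

The `monitoring` namespace must exist first, e.g. via `kubectl create namespace monitoring`. Prometheus then sees the files under `/etc/prometheus/secrets/etcd-client-cert/`. The etcd ServiceMonitor still has to be told to use HTTPS with these files; kube-prometheus-stack exposes `scheme`, `caFile`, `certFile` and `keyFile` under `kubeEtcd.serviceMonitor` (check the chart's values.yaml for your version).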
---
kubeControllerManager:
  endpoints:
  - 192.168.11.100      # change this
  service:
    port: 10257         # must be set: the controller-manager's HTTPS port
---
kubeScheduler:
  endpoints:
  - 192.168.11.100      # change this
  service:
    port: 10259         # must be set: the scheduler's HTTPS port
---
kubeEtcd:
  endpoints:
  - 192.168.11.100      # change this
---
kubeProxy:
  endpoints:
  - 192.168.11.100      # change this
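Scraping ports 10257 and 10259 at the node IP 192.168.11.100 only works if the components listen on that address. On a kubeadm control plane, the static pod manifests bind kube-controller-manager and kube-scheduler to `127.0.0.1`.

Below is a minimal sketch, assuming a kubeadm layout, that rewrites that flag to `0.0.0.0` without `sed -i`, so it behaves the same with GNU and BSD sed. Clusters installed by other tools may configure these components differently.

```bash
#!/usr/bin/env bash
# Sketch: make kube-controller-manager / kube-scheduler listen on all interfaces.
# Assumption: kubeadm static pod manifests containing --bind-address=127.0.0.1.
set -euo pipefail

# Rewrite --bind-address=127.0.0.1 to 0.0.0.0 in one manifest file.
open_bind_address() {
  local manifest="$1" tmp
  tmp="$(mktemp)"   # temp file lives outside /etc/kubernetes/manifests
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$manifest" > "$tmp"
  cat "$tmp" > "$manifest"   # overwrite in place, keeping owner and mode
  rm -f "$tmp"
}
```

On each control-plane node, run the function against both `/etc/kubernetes/manifests/kube-controller-manager.yaml` and `/etc/kubernetes/manifests/kube-scheduler.yaml`. The kubelet restarts the static pods when the manifests change. kube-proxy's metrics address is configured separately (in kubeadm, through the kube-proxy ConfigMap) and is not touched here.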
  • Persistence for grafana, prometheus and alertmanager (optional in a lab environment, required in production)
# alertmanager (under alertmanager.alertmanagerSpec)
storage:
  volumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
# prometheus (under prometheus.prometheusSpec)
storageSpec:
  volumeClaimTemplate:
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
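The values above are for lady.cn, but the chart is installed in both clusters. Below is a minimal sketch of a kids.cn override file; `values-kids.yaml` is a hypothetical file name. It assumes you keep the lady.cn values.yaml and layer this file on top with a second `-f`.

The bucket `kids-bucket` matches the `bucket-config-kids` Secret used later in this article. The label `cluster: kids.cn` is my assumption, following the lady.cn pattern.

```bash
#!/usr/bin/env bash
# Sketch: write a kids.cn override file for kube-prometheus-stack.
# Assumptions: file name values-kids.yaml and label cluster: kids.cn are hypothetical.
set -euo pipefail

write_kids_values() {
  local out="${1:-/data/kube-prometheus-stack/values-kids.yaml}"
  # Quoted 'EOF' keeps the YAML literal (no shell expansion).
  cat > "$out" << 'EOF'
alertmanager:
  ingress:
    hosts:
    - alertmanager.kids.cn
grafana:
  ingress:
    hosts:
    - grafana.kids.cn
prometheus:
  extraSecret:
    name: bucket-config
    data:
      objstore.yml: |
        type: S3
        config:
          bucket: "kids-bucket"
          endpoint: "172.16.0.39:9000"
          access_key: "Thanos"
          secret_key: "Thanos@654321"
          insecure: true
  ingress:
    hosts:
    - prometheus.kids.cn
  prometheusSpec:
    externalLabels:
      cluster: kids.cn
EOF
}
```

Install with `-f values.yaml -f values-kids.yaml`. Helm merges maps from later files over earlier ones and replaces lists wholesale, so the host lists above fully replace the lady.cn hosts. The `kubeControllerManager`, `kubeScheduler`, `kubeEtcd` and `kubeProxy` endpoints must also be changed to the kids.cn node IPs.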

Start:

bash /data/kube-prometheus-stack/start.sh

Figure: the data uploaded to MinIO by thanos-sidecar.

V. Install kube-thanos
1. Download the chart

# Add the Bitnami Helm repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Refresh the repository index
helm repo update
# Variables
pro=thanos
chart_version=10.3.6
mkdir -p /data/$pro
cd /data/$pro
# Download the chart
helm pull bitnami/$pro --version=$chart_version
# Extract values.yaml from the chart archive
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml
# Generate the install script (the unquoted EOF expands $pro and $chart_version now)
cat > /data/$pro/start.sh << EOF
cd /data/$pro
helm upgrade --wait --create-namespace --install $pro $pro-$chart_version.tgz \
-f values.yaml \
-n monitoring
EOF

2. Configure values.yaml

# must match prometheus.extraSecret.name in the kube-prometheus-stack values.yaml
existingObjstoreSecret: "bucket-config"
query:
  replicaLabel: [lady_replica]                                 # deduplication label, change this
  dnsDiscovery:
    sidecarsService: "kube-prometheus-stack-thanos-discovery"  # thanos-discovery Service created by kube-prometheus-stack
    sidecarsNamespace: "monitoring"                            # namespace where kube-prometheus-stack is deployed
  ingress:
    enabled: true
    hostname: thanos.lady.cn    # change this
queryFrontend:                 # serves Grafana's queries (see figure)
  enabled: true
  extraFlags:
  - --query-frontend.compress-responses             # compress HTTP responses
  - --query-range.split-interval=12h                # split range queries into 12h intervals
  - --query-range.max-retries-per-request=5
  - --query-frontend.log-queries-longer-than=10s    # log queries that take longer than 10s
  - --labels.split-interval=12h                     # split label queries into 12h intervals
  - --labels.max-retries-per-request=5
  - --query-range.align-range-with-step             # align start and end with the step for better caching
  - --query-range.max-query-length=0                # maximum query time range; 0 disables the limit, 1h allows only 1-hour ranges
  - --query-range.response-cache-max-freshness=1m   # results newer than this are not cached, since they may still change
  - |-
    --query-range.response-cache-config="config":
      max_size: "200MB"
      max_size_items: 0
      validity: 0s
    type: IN-MEMORY
  - |-
    --labels.response-cache-config="config":
      max_size: "200MB"
      max_size_items: 0
      validity: 0s
    type: IN-MEMORY
  ingress:
    enabled: true
    hostname: thanos-frontend.lady.cn
compactor:
  enabled: true
  persistence:
    enabled: true             # set to true in production
storegateway:
  enabled: true
  persistence:
    enabled: true             # set to true in production
ruler:
  enabled: true
  replicaLabel: lady_replica              # deduplication label, change this
  alertmanagers:
  - kube-prometheus-stack-alertmanager:9093       # Alertmanager Service of kube-prometheus-stack
  existingConfigmap: "prometheus-kube-prometheus-stack-prometheus-rulefiles-0"   # rule files generated by kube-prometheus-stack
  persistence:
    enabled: true             # set to true in production
  ingress:
    enabled: true
    hostname: thanos-ruler.lady.cn     # change this

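Note: the chart source has to be patched before installing.

cd /data/thanos
tar zxvf thanos-10.3.6.tgz
# in the ruler args, change --rule-file=/conf/rules/*.yml to --rule-file=/conf/rules/*.yaml
vi thanos/templates/ruler/statefulset.yaml
helm package thanos      # repackage the chart as thanos-10.3.6.tgz
bash /data/thanos/start.sh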
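The `vi` step can also be scripted. The patch is presumably needed because Prometheus Operator names the files in the `...-rulefiles-0` ConfigMap with a `.yaml` extension, so the chart's `*.yml` glob matches nothing.

Below is a minimal sketch that applies the same substitution with portable sed (no `-i`). It assumes the template contains the literal `--rule-file=/conf/rules/*.yml` described above. Running it twice is harmless.

```bash
#!/usr/bin/env bash
# Sketch: replace the ruler's *.yml rule-file glob with *.yaml in the chart template.
# Assumption: the template contains the literal string --rule-file=/conf/rules/*.yml
set -euo pipefail

patch_rule_glob() {
  local tpl="$1" tmp
  tmp="$(mktemp)"
  # \* and \. match a literal "*" and "." in the pattern; the replacement is literal text
  sed 's#--rule-file=/conf/rules/\*\.yml#--rule-file=/conf/rules/*.yaml#' "$tpl" > "$tmp"
  cat "$tmp" > "$tpl"
  rm -f "$tmp"
}
```

Use it between the extract and package steps: `patch_rule_glob thanos/templates/ruler/statefulset.yaml`, then `helm package thanos`.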

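3. Query screenshot: the store endpoints include the sidecar, storegateway, and ruler.

  • In Grafana, set the new data source to http://thanos-query-frontend:9090/

In the lady cluster, add thanos-storegateway-kids and thanos-query-kids to collect data from the kids cluster: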

cat > /data/thanos/query-kids.yaml << 'EOF'
---
apiVersion: v1
kind: Endpoints
metadata:
  name: thanos-query-kids
  namespace: monitoring
subsets:
- addresses:
  - ip: 192.168.11.101     # change this: points to the kids.cn cluster
  ports:
  - name: grpc
    port: 30901
    protocol: TCP
  - name: http
    port: 30902
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: thanos-query-kids
  name: thanos-query-kids
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 30901
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 30902
    protocol: TCP
    targetPort: http
  type: ClusterIP
EOF
kubectl apply -f /data/thanos/query-kids.yaml
cat > /data/thanos/storegateway-kids.yaml << 'EOF'
apiVersion: v1
kind: Secret
metadata:
  labels:
    app: kube-prometheus-stack-prometheus
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/part-of: kube-prometheus-stack
  name: bucket-config-kids
  namespace: monitoring
data:
  objstore.yml: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogImtpZHMtYnVja2V0IiAgICAgICAgICAgICAgICAgICAgICAjbWluaW/nmoTmobblkI0KICBlbmRwb2ludDogIjE3Mi4xNi4wLjM5OjkwMDAiICAgICAgICAgICAgICAgI21pbmlv55qE5Zyw5Z2ACiAgYWNjZXNzX2tleTogIlRoYW5vcyIgICAgICAgICAgICAgICAgICAgICAgICNtaW5pb+eahOW4kOWPtwogIHNlY3JldF9rZXk6ICJUaGFub3NANjU0MzIxIiAgICAgICAgICAgICAgICAjbWluaW/nmoTlr4bnoIEKICBpbnNlY3VyZTogdHJ1ZSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgI+S4jemqjOivgXRsc+ivgeS5pgo=
type: Opaque
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  name: thanos-storegateway-kids
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: storegateway-kids
      app.kubernetes.io/instance: thanos
      app.kubernetes.io/name: thanos
  serviceName: thanos-storegateway-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/component: storegateway-kids
        app.kubernetes.io/instance: thanos
        app.kubernetes.io/name: thanos
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: storegateway-kids
                  app.kubernetes.io/instance: thanos
                  app.kubernetes.io/name: thanos
              namespaces:
              - monitoring
              topologyKey: kubernetes.io/hostname
            weight: 1
      automountServiceAccountToken: true
      containers:
      - args:
        - store
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --data-dir=/data
        - --objstore.config-file=/conf/objstore.yml
        image: docker.io/bitnami/thanos:0.25.2-scratch-r5
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        name: storegateway
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        - containerPort: 10901
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 6
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /conf
          name: objstore-config
        - mountPath: /data
          name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: thanos-storegateway
      serviceAccountName: thanos-storegateway
      terminationGracePeriodSeconds: 30
      volumes:
      - name: objstore-config
        secret:
          defaultMode: 420
          secretName: bucket-config-kids
      - emptyDir: {}
        name: data
  updateStrategy:
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  name: thanos-storegateway-kids
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: http
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: grpc
  selector:
    app.kubernetes.io/component: storegateway-kids
    app.kubernetes.io/instance: thanos
    app.kubernetes.io/name: thanos
  sessionAffinity: None
  type: ClusterIP
EOF
kubectl apply -f /data/thanos/storegateway-kids.yaml
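The `objstore.yml` value in `bucket-config-kids` is a hand-made base64 string, which is hard to review or change. Below is a minimal sketch that writes the kids.cn objstore config as readable YAML and prints it as a single-line base64 value for the Secret's `data` field. `objstore-kids.yml` is a hypothetical file name, and the settings are the same MinIO values used elsewhere in this article.

```bash
#!/usr/bin/env bash
# Sketch: generate the base64 value for data.objstore.yml in bucket-config-kids.
# Assumption: objstore-kids.yml is a hypothetical file name.
set -euo pipefail

# Write the plain objstore config for the kids.cn bucket (same MinIO as lady.cn).
write_kids_objstore() {
  cat > "$1" << 'EOF'
type: S3
config:
  bucket: "kids-bucket"
  endpoint: "172.16.0.39:9000"
  access_key: "Thanos"
  secret_key: "Thanos@654321"
  insecure: true
EOF
}

# Print the file as one base64 line (tr strips the line wrapping GNU base64 adds).
encode_for_secret() {
  base64 < "$1" | tr -d '\n'
}
```

Paste the output of `encode_for_secret objstore-kids.yml` into `data.objstore.yml`. Alternatively, `kubectl -n monitoring create secret generic bucket-config-kids --from-file=objstore.yml=objstore-kids.yml --dry-run=client -o yaml` prints the whole Secret. Note that this generated Secret does not carry the labels shown above.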

Edit thanos-query in the lady cluster:

kubectl edit -n monitoring deployments.apps thanos-query
# in the container args, keep the existing --store flags and add the two kids entries:
        - --store=dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-storegateway.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-ruler.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-query-kids.monitoring.svc.cluster.local          # add this: points to kids.cn
        - --store=dnssrv+_grpc._tcp.thanos-storegateway-kids.monitoring.svc.cluster.local   # add this: points to kids.cn
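Each `--store` entry follows the same pattern: `dnssrv+_grpc._tcp.<service>.<namespace>.svc.cluster.local`. The SRV lookup works because each Service exposes a port named `grpc`.

Below is a minimal sketch that prints these entries from a list of Service names, so adding another cluster is a one-word change.

```bash
#!/usr/bin/env bash
# Sketch: print thanos-query --store flags for the given Services in one namespace.
# Each Service needs a port named "grpc" for the _grpc._tcp SRV record to exist.
set -euo pipefail

store_flags() {
  local ns="$1"; shift
  local svc
  for svc in "$@"; do
    printf -- '- --store=dnssrv+_grpc._tcp.%s.%s.svc.cluster.local\n' "$svc" "$ns"
  done
}
```

Example: `store_flags monitoring kube-prometheus-stack-thanos-discovery thanos-storegateway thanos-ruler thanos-query-kids thanos-storegateway-kids`. Changes made with `kubectl edit` are overwritten by the next `helm upgrade`. Putting the extra endpoints into the chart values is more durable; the Bitnami chart has a `query.stores` list for this, but check the values.yaml of your chart version.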

Verification
lady cluster

kubectl get pod -n monitoring
NAME                                                        READY   STATUS    RESTARTS      AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   2 (44h ago)   2d1h
kube-prometheus-stack-grafana-799446c5b9-8h2kh              3/3     Running   3 (44h ago)   2d1h
kube-prometheus-stack-kube-state-metrics-6c5d86887c-hr7l7   1/1     Running   1 (44h ago)   2d1h
kube-prometheus-stack-operator-5bbb5f4f64-dk5dr             1/1     Running   1 (44h ago)   2d1h
kube-prometheus-stack-prometheus-node-exporter-r6pcz        1/1     Running   1 (44h ago)   2d1h
prometheus-kube-prometheus-stack-prometheus-0               3/3     Running   3 (44h ago)   2d1h
thanos-compactor-66ccd948d-g72zt                            1/1     Running   2 (44h ago)   2d
thanos-query-5df6c68bc5-vptrq                               1/1     Running   0             53m
thanos-query-frontend-59df69d5c-gndz4                       1/1     Running   1 (44h ago)   2d
thanos-ruler-0                                              1/1     Running   1 (44h ago)   2d
thanos-storegateway-0                                       1/1     Running   2 (44h ago)   2d
thanos-storegateway-kids-0                                  1/1     Running   0             155m

kids cluster

kubectl get pod -n monitoring
NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          44h
kube-prometheus-stack-grafana-799446c5b9-fdgng              3/3     Running   0          44h
kube-prometheus-stack-kube-state-metrics-6c5d86887c-m7tw5   1/1     Running   0          44h
kube-prometheus-stack-operator-5bbb5f4f64-rxxn6             1/1     Running   0          44h
kube-prometheus-stack-prometheus-node-exporter-fqtjl        1/1     Running   0          44h
prometheus-kube-prometheus-stack-prometheus-0               3/3     Running   0          44h
thanos-compactor-66ccd948d-7tfzd                            1/1     Running   0          43h
thanos-query-f6ffddfb4-8qhdj                                1/1     Running   0          23h
thanos-query-frontend-59df69d5c-pwbxs                       1/1     Running   0          43h
thanos-storegateway-0                                       1/1     Running   0          43h
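Reading these listings by eye does not scale once several Thanos components run in each cluster. Below is a minimal sketch that reads `kubectl get pod` output on stdin and flags any pod that is not `Running` or whose READY column is not `n/n`. It is a quick check, not a replacement for proper alerting.

```bash
#!/usr/bin/env bash
# Sketch: flag pods that are not Running or not fully ready in `kubectl get pod` output.
# Columns used: $1 = NAME, $2 = READY (x/y), $3 = STATUS; the header line is skipped.
set -euo pipefail

check_pods() {
  awk 'NR > 1 {
         split($2, r, "/")
         if ($3 != "Running" || r[1] != r[2]) { print "NOT READY: " $1; bad = 1 }
       }
       END { exit bad }'
}
```

Usage: `kubectl get pod -n monitoring | check_pods && echo "all pods ready"`. The function exits non-zero if any pod is flagged. Completed Job pods would also be flagged, which is fine for this namespace.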

thanos-query-frontend configuration: https://blog.csdn.net/qq_34556414/article/details/124997111

How to use Thanos for multi-cluster Prometheus monitoring: https://blog.csdn.net/xxxxaayy/article/details/104989792
