Troubleshooting calico-node startup failures on worker nodes
Symptom
The calico-node pod on the worker node fails to come up and restarts repeatedly. The logs show:
kubectl logs -f calico-node-hv4sf -nkube-system
2020-12-02 13:20:13.067 [INFO][8] startup.go 259: Early log level set to info
2020-12-02 13:20:13.067 [INFO][8] startup.go 275: Using NODENAME environment for node name
2020-12-02 13:20:13.067 [INFO][8] startup.go 287: Determined node name: xxx-work-1
2020-12-02 13:20:13.068 [INFO][8] k8s.go 228: Using Calico IPAM
2020-12-02 13:20:13.069 [INFO][8] startup.go 319: Checking datastore connection
2020-12-02 13:20:16.075 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
2020-12-02 13:20:19.081 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
2020-12-02 13:20:23.087 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
2020-12-02 13:20:27.095 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: no route to host
Root cause analysis
The masters are dual-homed on 172.31.0.0/16 and 10.0.0.0/24; the workers sit only on 172.31.0.0/16:

| Node | Private network | Management network |
|---|---|---|
| master-1 | 172.31.0.26 | 10.0.0.77 |
| master-2 | 172.31.0.26 | 10.0.0.128 |
| master-3 | 172.31.0.26 | 10.0.0.154 |
| worker-1 | 172.31.0.23 | - |
| worker-2 | 172.31.0.8 | - |
From a worker node, curl 10.96.0.1 fails.
apiserver Service:
kubectl get svc -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 29h <none>
The endpoints are on the 10.0.0.0/24 network, because kube-apiserver by default advertises the address of the interface that holds the default route.
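The "no route to host" errors in the calico-node log follow directly from this. A minimal sketch of the reachability check, as a toy model that only considers directly connected networks (the worker has no route, connected or otherwise, into 10.0.0.0/24):

```python
import ipaddress

def has_route(dst, routes):
    """True if dst falls inside any of the node's directly connected networks."""
    ip = ipaddress.ip_address(dst)
    return any(ip in ipaddress.ip_network(net) for net in routes)

# Masters are attached to both networks; workers only to the private one.
master_routes = ["172.31.0.0/16", "10.0.0.0/24"]
worker_routes = ["172.31.0.0/16"]

endpoint = "10.0.0.50"  # one of the apiserver endpoints
print(has_route(endpoint, master_routes))  # True
print(has_route(endpoint, worker_routes))  # False -> "connect: no route to host"
```

This is why the failure only shows up on workers: the same ClusterIP works fine from any master.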
Cluster info:
kubectl cluster-info
Kubernetes master is running at https://k8s-cluster-ins-0029-master-vip.service.consul:6443
KubeDNS is running at https://k8s-cluster-ins-0029-master-vip.service.consul:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
kube-apiserver Service and Endpoints:
# kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 25h
# kubectl get ep kubernetes
NAME ENDPOINTS AGE
kubernetes 10.0.0.133:6443,10.0.0.32:6443,10.0.0.50:6443 25h
Inspect the NAT rules on the worker:
# iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
cali-PREROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* cali:6gwbT8clXdHdC1b1 */
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
cali-OUTPUT all -- 0.0.0.0/0 0.0.0.0/0 /* cali:tVnHkvAo15HuiPy0 */
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
cali-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* cali:O3lYWMrLQYEMJtB5 */
KUBE-POSTROUTING all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0

Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0

Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination

Chain KUBE-MARK-DROP (0 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x8000

Chain KUBE-MARK-MASQ (15 references)
target prot opt source destination
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK or 0x4000

Chain KUBE-NODEPORTS (1 references)
target prot opt source destination

Chain KUBE-POSTROUTING (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0 mark match ! 0x4000/0x4000
MARK all -- 0.0.0.0/0 0.0.0.0/0 MARK xor 0x4000
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */

Chain KUBE-PROXY-CANARY (0 references)
target prot opt source destination

Chain KUBE-SEP-6AVEXVWMTAUJHVS6 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.131 0.0.0.0/0
DNAT udp -- 0.0.0.0/0 0.0.0.0/0 udp to:10.244.149.131:53

Chain KUBE-SEP-AJAH3OWF36MHDVF7 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.131 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.131:9153

Chain KUBE-SEP-F4NUFHPP6MV3U2FB (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.130 0.0.0.0/0
DNAT udp -- 0.0.0.0/0 0.0.0.0/0 udp to:10.244.149.130:53

Chain KUBE-SEP-ILEHVTEL5AKI6EAE (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.130 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.130:9153

Chain KUBE-SEP-N4P2JU5RW7IWUD2Z (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.229.131 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.229.131:3443

Chain KUBE-SEP-ORF6FH7KUHVWJER7 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.130 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.130:53

Chain KUBE-SEP-UMPZ2SD2APVNR4IN (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.149.131 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.244.149.131:53

Chain KUBE-SEP-6JBY7EOKHF37VPAE (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.50 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.50:6443

Chain KUBE-SEP-VW6RD437TCEB4BL4 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.133 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.133:6443

Chain KUBE-SEP-ZCIWMPUBNREXOPRW (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.32 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.32:6443

Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-MARK-MASQ udp -- !10.244.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-SVC-JD5MR3NA4I4DYORP tcp -- 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.107.150.63 /* orch-operator-system/orch-operator-webhook-service: cluster IP */ tcp dpt:3443
KUBE-SVC-6HOYT5WSPFV75AOP tcp -- 0.0.0.0/0 10.107.150.63 /* orch-operator-system/orch-operator-webhook-service: cluster IP */ tcp dpt:3443
KUBE-MARK-MASQ tcp -- !10.244.0.0/16 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-NODEPORTS all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

Chain KUBE-SVC-6HOYT5WSPFV75AOP (1 references)
target prot opt source destination
KUBE-SEP-N4P2JU5RW7IWUD2Z all -- 0.0.0.0/0 0.0.0.0/0

Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
target prot opt source destination
KUBE-SEP-ORF6FH7KUHVWJER7 all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-UMPZ2SD2APVNR4IN all -- 0.0.0.0/0 0.0.0.0/0

Chain KUBE-SVC-JD5MR3NA4I4DYORP (1 references)
target prot opt source destination
KUBE-SEP-ILEHVTEL5AKI6EAE all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-AJAH3OWF36MHDVF7 all -- 0.0.0.0/0 0.0.0.0/0

Chain KUBE-SVC-NPX46M4PTMTKRN6Y (1 references)
target prot opt source destination
KUBE-SEP-VW6RD437TCEB4BL4 all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.33333333349
KUBE-SEP-ZCIWMPUBNREXOPRW all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-6JBY7EOKHF37VPAE all -- 0.0.0.0/0 0.0.0.0/0

Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
target prot opt source destination
KUBE-SEP-F4NUFHPP6MV3U2FB all -- 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.50000000000
KUBE-SEP-6AVEXVWMTAUJHVS6 all -- 0.0.0.0/0 0.0.0.0/0

Chain cali-OUTPUT (1 references)
target prot opt source destination
cali-fip-dnat all -- 0.0.0.0/0 0.0.0.0/0 /* cali:GBTAv2p5CwevEyJm */

Chain cali-POSTROUTING (1 references)
target prot opt source destination
cali-fip-snat all -- 0.0.0.0/0 0.0.0.0/0 /* cali:Z-c7XtVd2Bq7s_hA */
cali-nat-outgoing all -- 0.0.0.0/0 0.0.0.0/0 /* cali:nYKhEzDlr11Jccal */
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* cali:SXWvdsbh4Mw7wOln */ ADDRTYPE match src-type !LOCAL limit-out ADDRTYPE match src-type LOCAL

Chain cali-PREROUTING (1 references)
target prot opt source destination
cali-fip-dnat all -- 0.0.0.0/0 0.0.0.0/0 /* cali:r6XmIziWUJsdOK6Z */

Chain cali-fip-dnat (2 references)
target prot opt source destination

Chain cali-fip-snat (1 references)
target prot opt source destination

Chain cali-nat-outgoing (1 references)
target prot opt source destination
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 /* cali:flqWnvo8yq4ULQLa */ match-set cali40masq-ipam-pools src ! match-set cali40all-ipam-pools dst
From these rules, traffic to 10.96.0.1 is DNATed to 10.0.0.50:6443, 10.0.0.133:6443, or 10.0.0.32:6443.
The workers sit only on 172.31.0.0/16, so these 10.0.0.0/24 addresses are unreachable and 10.96.0.1 cannot be accessed. kubectl get ep kubernetes confirms that the apiserver Service forwards to endpoints on the 10.0.0.0/24 network:
Chain KUBE-SEP-6JBY7EOKHF37VPAE (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.50 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.50:6443

Chain KUBE-SEP-VW6RD437TCEB4BL4 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.133 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.133:6443

Chain KUBE-SEP-ZCIWMPUBNREXOPRW (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.0.0.32 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp to:10.0.0.32:6443
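The KUBE-SVC-NPX46M4PTMTKRN6Y chain spreads connections across the three endpoints with cascading statistic-mode probabilities: 1/3, then 1/2 of the remainder, then the unconditional last rule. A minimal Python sketch of that selection logic (endpoint order taken from the chain above; this is an illustration, not kube-proxy code):

```python
import random

# Endpoint order as it appears in the KUBE-SVC-NPX46M4PTMTKRN6Y chain.
ENDPOINTS = ["10.0.0.133:6443", "10.0.0.32:6443", "10.0.0.50:6443"]

def pick_endpoint(rng):
    """Mimic the iptables 'statistic mode random' rule cascade."""
    if rng.random() < 1 / 3:   # probability 0.33333333349
        return ENDPOINTS[0]
    if rng.random() < 1 / 2:   # probability 0.50000000000 of the remainder
        return ENDPOINTS[1]
    return ENDPOINTS[2]        # last rule matches everything left

rng = random.Random(0)
counts = {ep: 0 for ep in ENDPOINTS}
for _ in range(30000):
    counts[pick_endpoint(rng)] += 1
# Each endpoint ends up with roughly a third of the connections.
print(counts)
```

Whichever endpoint is picked, the DNAT target is a 10.0.0.0/24 address, so every attempt from a worker fails the same way.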
Solution
Pin the apiserver to the correct interface when creating the cluster.
master-1 configuration:
cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.17.14
imageRepository: harbor.xxx.com/library/k8s.gcr.io
apiServer:
  timeoutForControlPlane: 4m0s
  certSANs:
  - k8s-cluster-ins-0029-master-1.service.consul
  - k8s-cluster-ins-0029-master-2.service.consul
  - k8s-cluster-ins-0029-master-3.service.consul
controlPlaneEndpoint: k8s-cluster-ins-0029-master-vip.service.consul:6443  # SLB VIP and port
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16  # pod IP range
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /data/etcd
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.31.0.26  # the IP this master should bind and advertise
Commands to join master-2 and master-3:
join_cmd=`kubeadm token create --print-join-command`
$join_cmd --control-plane --apiserver-advertise-address=${node_bind_ip}
# kubeadm token create --print-join-command
kubeadm join k8s-cluster-ins-0029-master-vip.service.consul:6443 --token g9stix.vvinbvdt83ndeyoc --discovery-token-ca-cert-hash sha256:e966b388406a6a04b78c04d1d2a62b4a6a50799c37c708e5fadf6fabb7481231
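The two-step pattern above — capture the base join command, then append the control-plane flags with each node's own bind IP — can be sketched as follows (the master-2/master-3 addresses here are hypothetical placeholders, as are the token and hash):

```python
# Sketch: compose the per-master join command, mirroring the shell snippet above.
BASE = ("kubeadm join k8s-cluster-ins-0029-master-vip.service.consul:6443 "
        "--token <token> --discovery-token-ca-cert-hash sha256:<hash>")

def control_plane_join(base_cmd, bind_ip):
    """Append the flags that make this node join as an extra control-plane
    member, advertising its private-network (172.31.0.0/16) address."""
    return f"{base_cmd} --control-plane --apiserver-advertise-address={bind_ip}"

for ip in ["172.31.0.27", "172.31.0.28"]:  # hypothetical master-2/3 IPs
    print(control_plane_join(BASE, ip))
```

The key point is that every master advertises a 172.31.0.0/16 address, so the kubernetes Endpoints end up on the network the workers can actually reach.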
Reference
Generate a reference configuration:
kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-cluster-ins-0029-master-1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.17.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Retrieve the cluster's actual kubeadm configuration:
kubectl get cm kubeadm-config -n kube-system -oyaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      certSANs:
      - k8s-cluster-ins-0029-master-1.service.consul
      - k8s-cluster-ins-0029-master-2.service.consul
      - k8s-cluster-ins-0029-master-3.service.consul
      extraArgs:
        advertise-address: 172.31.0.26
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: k8s-cluster-ins-0029-master-vip.service.consul:6443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /data/etcd
    imageRepository: harbor.xxx.com/library/k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.17.14
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      k8s-cluster-ins-0029-master-1:
        advertiseAddress: 10.0.0.77
        bindPort: 6443
      k8s-cluster-ins-0029-master-2:
        advertiseAddress: 10.0.0.128
        bindPort: 6443
      k8s-cluster-ins-0029-master-3:
        advertiseAddress: 10.0.0.154
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2020-12-03T09:29:09Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "665"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 78961de1-682a-49d0-8c8f-dc6d4e47ca04
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/
https://github.com/kubernetes/kubernetes/issues/33618
https://github.com/kubernetes/kubeadm/blob/master/docs/design/design_v1.9.md#optional-self-hosting
https://idig8.com/2019/08/08/zoujink8skubeadmdajian-kubernetes1-15-1jiqunhuanjing14/
https://feisky.gitbooks.io/kubernetes/content/troubleshooting/network.html
https://github.com/projectcalico/calico/issues/3092
https://github.com/projectcalico/calico/issues/2720