Troubleshooting the error "MountVolume.SetUp failed for volume pvc"
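Table of contents
- Fault description
- Troubleshooting
- 1. Try restarting the Pod
- 2. Check the Pod events
- 3. Check the kubelet logs
- 4. Check the PVC and PV resource objects
- 5. Check the disk mount
- Solution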
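Fault description
An internal environment raised a Pod abnormal-state alert:
[Alerting] Pod status alert
Pods in the cluster have been in an abnormal state for more than 1 minute
1. ti-inf/etcd-1 (Pending): 1.000
Details link: http://xx.xx.xx.xx/grafana/d/default/alert-dashboard?tab=alert&viewPanel=19&orgId=1
Checking the abnormal Pods in the k8s cluster showed that they were data-component Pods.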
Troubleshooting
1. Try restarting the Pod
~]# kubectl delete pod etcd-1 -nti-inf
After the Pod was recreated, it was still in an abnormal state.
2. Check the Pod events
~]# kubectl describe pod redis-server-2 -nti-inf
Events:
  Type     Reason       Age                     From     Message
  ----     ------       ----                    ----     -------
  Normal   Scheduled    28m                     volcano  Successfully assigned ti-inf/redis-server-2 to x.x.x.x
  Warning  FailedMount  3m17s (x3599 over 28m)  kubelet  MountVolume.SetUp failed for volume "pvc-9d1c0e76-6d56-439d-8070-741d8846d569" : rpc error: code = Internal desc = stat /csi-data-dir/ti-database/pv: input/output error
The events show that kubelet failed at the MountVolume.SetUp step for the PVC, and the reported error is "input/output error".
3. Check the kubelet logs
[root@VM-2-29-centos prometheus-db]# grep -i error /var/log/messages| tail -n 5
Jun 28 20:14:13 VM-2-29-centos kubelet: E0628 20:14:13.819828 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada podName: nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.319804053 +0800 CST m=+11760883.388055363 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"pvc-668750fa-cc0a-4105-96f3-7fa184db4ada\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada\") pod \"etcd-1\" (UID: \"1c99773c-3845-4141-ac30-1c3d26f1f30a\") : rpc error: code = Internal desc = stat /csi-data-dir/ti-database/pv: input/output error"
Jun 28 20:14:13 VM-2-29-centos kubelet: E0628 20:14:13.901519 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada podName:4c5d9bdf-498a-4456-9c6c-e6f7b456e693 nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.401482582 +0800 CST m=+11760883.469733942 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"data\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-668750fa-cc0a-4105-96f3-7fa184db4ada\") pod \"4c5d9bdf-498a-4456-9c6c-e6f7b456e693\" (UID: \"4c5d9bdf-498a-4456-9c6c-e6f7b456e693\") : kubernetes.io/csi: mounter.TearDownAt failed: rpc error: code = Internal desc = stat /var/lib/kubelet/pods/4c5d9bdf-498a-4456-9c6c-e6f7b456e693/volumes/kubernetes.io~csi/pvc-668750fa-cc0a-4105-96f3-7fa184db4ada/mount: input/output error"
Jun 28 20:14:14 VM-2-29-centos kubelet: E0628 20:14:14.018249 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569 podName: nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.518217097 +0800 CST m=+11760883.586468437 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"pvc-9d1c0e76-6d56-439d-8070-741d8846d569\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569\") pod \"redis-server-2\" (UID: \"5550e257-2245-4401-bd9a-cf275ff94675\") : rpc error: code = Internal desc = stat /csi-data-dir/ti-database/pv: input/output error"
Jun 28 20:14:14 VM-2-29-centos kubelet: E0628 20:14:14.102735 793997 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569 podName:daea4ba4-b97c-46c6-866b-aa7cc29af0a8 nodeName:}" failed. No retries permitted until 2022-06-28 20:14:14.602692068 +0800 CST m=+11760883.670943428 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"data\" (UniqueName: \"kubernetes.io/csi/loopdevice.csi.infra.ti.io^pvc-9d1c0e76-6d56-439d-8070-741d8846d569\") pod \"daea4ba4-b97c-46c6-866b-aa7cc29af0a8\" (UID: \"daea4ba4-b97c-46c6-866b-aa7cc29af0a8\") : kubernetes.io/csi: mounter.TearDownAt failed: rpc error: code = Internal desc = stat /var/lib/kubelet/pods/daea4ba4-b97c-46c6-866b-aa7cc29af0a8/volumes/kubernetes.io~csi/pvc-9d1c0e76-6d56-439d-8070-741d8846d569/mount: input/output error"
The logs show that I/O on the disk was partially blocked, which produced the large number of errors above. Both mount (MountVolume.SetUp) and unmount (UnmountVolume.TearDown) operations fail with the same input/output error.
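When the log contains thousands of such lines, it can help to condense them into a list of affected volumes and Pods. The following is a minimal sketch, assuming the kubelet log format shown above (with escaped quotes around the volume and Pod names); the function name summarize_mount_failures is hypothetical and not part of any tool.

```shell
# Summarize kubelet "MountVolume.SetUp failed" errors read from stdin.
# Prints one "count volume pod" line per distinct volume/Pod pair.
# Assumes the log format from this article, where names appear as \"name\".
summarize_mount_failures() {
  grep -F 'MountVolume.SetUp failed for volume' |
    sed -n 's/.*MountVolume\.SetUp failed for volume \\"\([^\\"]*\)\\".* pod \\"\([^\\"]*\)\\".*/\1 \2/p' |
    sort | uniq -c | awk '{print $1, $2, $3}'
}

# Usage (log path as used in this article; it may differ on other systems):
#   summarize_mount_failures < /var/log/messages
```

If every failing volume is served by the same CSI driver on the same node, as in the log excerpt above, the cause is more likely to be in the shared backing storage than in any single Pod.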
4. Check the PVC and PV resource objects
[root@VM-2-29-centos ~]# kubectl get pvc -nti-inf |grep redis
data-redis-server-0 Bound pvc-59fde781-e03e-4b26-b07c-7de93f608395 10Gi RWO csi-localpv-tidb 136d
data-redis-server-1 Bound pvc-6bf28ec2-40e1-4b52-8d54-b4ab0aa9f67a 10Gi RWO csi-localpv-tidb 136d
data-redis-server-2 Bound pvc-9d1c0e76-6d56-439d-8070-741d8846d569 10Gi RWO csi-localpv-tidb 136d
[root@VM-2-29-centos ~]#
[root@VM-2-29-centos ~]# kubectl get pv |grep redis
pvc-59fde781-e03e-4b26-b07c-7de93f608395 10Gi RWO Delete Bound ti-inf/data-redis-server-0 csi-localpv-tidb 136d
pvc-6bf28ec2-40e1-4b52-8d54-b4ab0aa9f67a 10Gi RWO Delete Bound ti-inf/data-redis-server-1 csi-localpv-tidb 136d
pvc-9d1c0e76-6d56-439d-8070-741d8846d569 10Gi RWO Delete Bound ti-inf/data-redis-server-2 csi-localpv-tidb 136d
Both the PVC and PV resources are in the Bound state and look normal.
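With many volumes, scanning the status column by eye is error-prone. The sketch below filters the kubectl output for entries that are not Bound; the helper name not_bound is hypothetical, and the column positions are taken from the output shown above. Note that a Bound status only describes the Kubernetes objects: as this incident shows, it says nothing about the health of the underlying disk.

```shell
# Print the names of PVCs or PVs whose status column is not "Bound".
# $1 is the 1-based index of the STATUS column in each data row:
#   2 for `kubectl get pvc`, 5 for `kubectl get pv` (as in the output above).
not_bound() {
  awk -v col="$1" '$1 != "NAME" && $col != "Bound" { print $1 }'
}

# Usage:
#   kubectl get pvc -n ti-inf | not_bound 2
#   kubectl get pv | not_bound 5
```

Empty output means every listed object is Bound, which matches what was observed here.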
5. Check the disk mount
Use dmesg (short for "display message" or "driver message") to view the kernel messages:
[Tue Jun 28 20:22:47 2022] buffer_io_error: 6 callbacks suppressed
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971392, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971393, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971394, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971395, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971396, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971397, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971398, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop0, logical block 20971399, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop4, logical block 20971392, async page read
[Tue Jun 28 20:22:47 2022] Buffer I/O error on dev loop4, logical block 20971393, async page read
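To see at a glance which devices are affected, the kernel messages can be counted per device. This is a minimal sketch, assuming the "Buffer I/O error on dev NAME," message format shown above; the function name io_errors_per_device is hypothetical.

```shell
# Count "Buffer I/O error" kernel messages per block device.
# Reads dmesg output on stdin and prints "device count" lines.
io_errors_per_device() {
  sed -n 's/.*Buffer I\/O error on dev \([^,]*\),.*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   dmesg -T | io_errors_per_device
#   losetup -l    # lists each loop device together with its backing file
```

Because the failing devices here are loop devices, the losetup listing is what ties them back to files on the underlying disk.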
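The disk backing the PVCs is /dev/vdc, on which an LVM logical volume was created, so the I/O errors point to a failure of that logical volume.
Accessing the directory from a system terminal confirms that it can no longer be read:
~]# ls /data/ti-database
ls: cannot access /data/ti-database: Input/output error
Explanation: this is a buffer I/O error; for example, the asynchronous page read of logical block 20971393 failed.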
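The same check can be scripted so that it can be repeated on several paths or nodes. The sketch below is an assumption-laden helper (the name check_path is hypothetical): it only reports whether stat succeeds, so it also reports FAILED for a path that simply does not exist.

```shell
# Report whether a path can be stat'ed.
# An I/O error on the backing device makes stat fail, as does a missing path.
check_path() {
  if stat "$1" >/dev/null 2>&1; then
    echo "OK $1"
  else
    echo "FAILED $1"
    return 1
  fi
}

# Usage:
#   check_path /data/ti-database
```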
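Solution
The platform's data components (etcd/redis/es) each run 3 replicas and can tolerate a single point of failure. In addition, this logical volume was planned from the start to be used only by the data components, so other services are not affected. Recreating the LVM logical volume is sufficient.
Detailed procedure:
1. Back up the mysql/etcd/es data
2. Unmount the logical volume
3. Remove the logical volume (LV) with lvremove
4. Remove the volume group (VG) with vgremove
5. Remove the physical volume with pvremove
After these steps, lvdisplay, vgdisplay, and pvdisplay should no longer show any information for the removed LVM objects
6. Delete the PVs and PVCs on this node
7. Recreate the LVM logical volume and mount it
8. Create the PV and PVC resource objects and bind them to the Pods
9. Verify the Pod status
10. Check the health of the redis and etcd clusters and verify data consistency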
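The LVM part of the procedure (unmount, remove, recreate, mount) can be sketched as a dry-run plan that prints the commands for review instead of executing them. Everything here is an assumption except /dev/vdc and /data/ti-database, which come from the article: the volume group name vg_data, the logical volume name lv_data, the XFS filesystem, and the function name lvm_rebuild_plan are all hypothetical placeholders. The commands are destructive, so they should only be run after the backup step has completed.

```shell
# Print (do NOT execute) the LVM teardown and rebuild commands for review.
# Arguments: volume group, logical volume, physical device, mount point.
# The VG/LV names and the filesystem type are hypothetical; adjust to the real setup.
lvm_rebuild_plan() {
  vg=$1
  lv=$2
  dev=$3
  mnt=$4
  cat <<EOF
umount $mnt
lvremove -y /dev/$vg/$lv
vgremove $vg
pvremove $dev
pvcreate $dev
vgcreate $vg $dev
lvcreate -n $lv -l 100%FREE $vg
mkfs.xfs /dev/$vg/$lv
mount /dev/$vg/$lv $mnt
EOF
}

# Example with placeholder VG/LV names:
lvm_rebuild_plan vg_data lv_data /dev/vdc /data/ti-database
```

Printing the plan first makes it possible to check the device and volume names against lvdisplay, vgdisplay, and pvdisplay before anything is removed.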