
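As the title says: on an Alibaba Cloud GPU server, I created a container from an image that had previously run fine, but the GPU driver inside the container was broken. Running nvidia-smi failed with: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.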

Reinstalling the GPU driver with the .run file from the official website failed with: ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.


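I gave up on the original image and created a new, empty container, but the empty container reported the same error (NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver), and, surprisingly, the driver could not be installed in the empty container either. That made me suspect the problem lay with the container itself.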
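To narrow down a suspicion like this, it may help to run the same quick check both inside the container and on the host and compare the results. The sketch below is only an illustration, not the method used in this post. The function name diagnose_nvidia and its parameters are hypothetical. It assumes the usual Linux layout: /proc/driver/nvidia/version exists while the nvidia kernel module is loaded, /dev/nvidiactl is the driver's control device node, and /proc/modules lists loaded modules (including nouveau, which the installer error mentions).

```python
import os


def diagnose_nvidia(proc_root="/proc", dev_root="/dev"):
    """Report basic NVIDIA driver state as a dict of booleans.

    proc_root and dev_root are parameters only so the check can be
    pointed at a fake directory tree; real use keeps the defaults.
    """
    # Present only while the nvidia kernel module is loaded.
    version_file = os.path.join(proc_root, "driver", "nvidia", "version")
    # Control device node that nvidia-smi needs to talk to the driver.
    ctl_node = os.path.join(dev_root, "nvidiactl")

    # The first column of each /proc/modules line is the module name.
    loaded = set()
    try:
        with open(os.path.join(proc_root, "modules")) as f:
            loaded = {line.split()[0] for line in f if line.strip()}
    except OSError:
        pass  # Unreadable or missing: treat as "no information".

    return {
        "module_loaded": os.path.isfile(version_file),
        "device_node": os.path.exists(ctl_node),
        "nouveau_loaded": "nouveau" in loaded,
    }


if __name__ == "__main__":
    print(diagnose_nvidia())
```

Because a container shares the host's kernel, module_loaded being False on both sides would point toward the host driver rather than the image. If the module is loaded on the host but device_node is False only in the container, the way the container was started (GPU passthrough) would be the more likely place to look.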





