

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node AllReduceGrads/NcclAllReduce (defined at /home/zw/anaconda3/envs/tf_models/lib/python3.7/site-packages/tensorpack/graph_builder/utils.py:160) with these attrs: [reduction="sum", shared_name="c0", T=DT_FLOAT, num_devices=2]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'

  [[AllReduceGrads/NcclAllReduce]]

Errors may have originated from an input operation.
Input Source operations connected to node AllReduceGrads/NcclAllReduce:
  tower0/gradients/AddN_373 (defined at /home/zw/anaconda3/envs/tf_models/lib/python3.7/site-packages/tensorpack/train/tower.py:276)
terminate called without an active exception
terminate called recursively
terminate called recursively
*** Received signal 6 ***
Aborted (core dumped)



  • OS: Ubuntu 16.04
  • CUDA version: 10.1
  • cuDNN version: 8.0.2
  • tensorflow-gpu: 1.14.0



2020-08-14 13:58:07.324004: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-08-14 13:58:07.324109: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-08-14 13:58:07.324205: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-08-14 13:58:07.324311: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-08-14 13:58:07.324415: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-08-14 13:58:07.324508: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-08-14 13:58:07.324599: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-08-14 13:58:07.324614: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-08-14 13:58:07.324666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
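
The log above shows TensorFlow trying to load the CUDA 10.0 libraries (`libcudart.so.10.0` and the rest) and `libcudnn.so.7`, while `LD_LIBRARY_PATH` only contains `/usr/local/cuda-10.1/lib64`. A minimal sketch for checking which of those libraries the dynamic loader can find in the current environment follows. `check_loadable` is a hypothetical helper name, and the library list is copied from the log lines, not taken from any TensorFlow API.

```python
import ctypes

# Library names copied verbatim from the "Could not dlopen library" log lines.
REQUIRED_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def check_loadable(names):
    """Return {library name: True/False} for whether the loader can open it."""
    result = {}
    for name in names:
        try:
            # CDLL uses the same search path (LD_LIBRARY_PATH, ld cache) as dlopen.
            ctypes.CDLL(name)
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in check_loadable(REQUIRED_LIBS).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it in the same shell and conda environment used for training. Any library reported as missing here would be consistent with the "Skipping registering GPU devices" warning, and with the first error listing only CPU, XLA_CPU and XLA_GPU as registered devices, so that the GPU-only `NcclAllReduce` kernel has nowhere to run.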




