W0415 16:40:54.818552 17125 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7
OSError: (External) Cudnn error, CUDNN_STATUS_INTERNAL_ERROR (at /paddle/paddle/fluid/platform/dev
W0418 11:28:11.296656 1805 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 10.2
W0418 11:28:11.302722 1805 device_context.cc:372] device: 0, cuDNN Version: 8.1.
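The two W0418 warnings above record the GPU compute capability, the CUDA driver and runtime API versions, and the cuDNN version Paddle loaded. The sketch below pulls those fields out with regular expressions so they can be compared against the installed toolkit. It assumes the line format shown in this log; other Paddle releases may word these lines differently. `parse_paddle_gpu_log` is a hypothetical helper, not part of Paddle.

```python
import re

# Patterns copied from the device_context.cc warnings in the log above
# (assumption: other Paddle versions may word these lines differently).
VERSION = r"\d+(?:\.\d+)*"
NOTE_RE = re.compile(
    rf"device: (?P<device>\d+), GPU Compute Capability: (?P<cc>{VERSION}), "
    rf"Driver API Version: (?P<driver>{VERSION}), Runtime API Version: (?P<runtime>{VERSION})"
)
CUDNN_RE = re.compile(rf"device: \d+, cuDNN Version: (?P<cudnn>{VERSION})")


def parse_paddle_gpu_log(lines):
    """Collect device/version fields from Paddle's GPU start-up warnings (hypothetical helper)."""
    info = {}
    for line in lines:
        note = NOTE_RE.search(line)
        if note:
            info.update(note.groupdict())
        cudnn = CUDNN_RE.search(line)
        if cudnn:
            info["cudnn"] = cudnn.group("cudnn")
    return info


log = [
    "W0418 11:28:11.296656 1805 device_context.cc:362] Please NOTE: device: 0, "
    "GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 10.2",
    "W0418 11:28:11.302722 1805 device_context.cc:372] device: 0, cuDNN Version: 8.1.",
]
info = parse_paddle_gpu_log(log)
print(info)
# Flag the case where the driver and runtime API versions differ.
if info.get("driver") != info.get("runtime"):
    print(f"driver API {info['driver']} != runtime API {info['runtime']}")
```

On this log the sketch reports a Runtime API Version of 10.2 alongside a Driver API Version of 11.2.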
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
/tmp/ipykernel_1805/2380467330.py in <module>
7 audio_len = paddle.to_tensor(audio_len)
8 audio_feature = paddle.to_tensor(audio_feature_i, dtype='float32')
----> 9 audio_feature = paddle.unsqueeze(audio_feature, axis=0)
10 # print(f"shape: {audio_feature.shape}")
11
/opt/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/tensor/manipulation.py in unsqueeze(x, axis, name)
769 """
770
--> 771 return layers.unsqueeze(x, axis, name)
772
773
/opt/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layers/nn.py in unsqueeze(input, axes, name)
6317 for item in axes
6318 ]
-> 6319 out, _ = core.ops.unsqueeze2(input, 'axes', axes)
6320 return out
6321
OSError: (External) Cudnn error, CUDNN_STATUS_INTERNAL_ERROR (at /paddle/paddle/fluid/platform/device_context.h:255)
[operator < unsqueeze2 > error]
Solution:
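The cause: someone else's program was already running on the first GPU. Why does the check only succeed on an idle card? Because run_check runs a simple network to verify that the installation is correct. Single-GPU verification runs on GPU 0 by default, so at least some GPU memory must be left free for that network to execute.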
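Since the failure comes from GPU 0 being occupied, one possible workaround is to pick a card with enough free memory and expose only that card to Paddle through the standard `CUDA_VISIBLE_DEVICES` environment variable before Paddle touches CUDA. This is a sketch under assumptions: the helper names and the 1024 MiB threshold are hypothetical. It relies on the standard `nvidia-smi --query-gpu`/`--format` options and on `paddle.utils.run_check()`, Paddle's installation check.

```python
import os
import subprocess


def parse_free_memory(csv_text):
    """Parse 'index, memory.free' rows (MiB) as printed by
    nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, free_mib = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(free_mib)))
    return gpus


def pick_gpu(csv_text, min_free_mib=1024):
    """Index of the GPU with the most free memory, or None if no GPU has min_free_mib free.
    The 1024 MiB default is an arbitrary, hypothetical threshold."""
    candidates = [g for g in parse_free_memory(csv_text) if g[1] >= min_free_mib]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g[1])[0]


def select_gpu_and_check(min_free_mib=1024):
    """Expose only the least-loaded GPU to Paddle, then run Paddle's installation check.
    Call this at the very start of a script, before paddle is imported anywhere else."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpu = pick_gpu(out, min_free_mib)
    if gpu is None:
        raise RuntimeError(f"no GPU has {min_free_mib} MiB free")
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)  # the chosen card becomes device 0
    import paddle  # imported only after CUDA_VISIBLE_DEVICES is set
    paddle.utils.run_check()
```

With `CUDA_VISIBLE_DEVICES` set, the chosen card is renumbered as device 0 inside the process. This means run_check's default of GPU 0 then lands on the idle card instead of the busy one.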
Reference environment:
Python 3.7.5
paddlepaddle-gpu 2.0.0
CUDA 11.2
cuDNN 8.1.1
torch 1.10.2
Requirements:
CUDA Toolkit 11.2 with cuDNN v8.1.1 (multi-GPU support additionally requires NCCL 2.7 or later)
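The requirement above can be expressed as a small check. This is a hedged sketch with hypothetical helper names. It compares only major.minor for cuDNN because Paddle's own log reports cuDNN as `8.1`.

```python
def parse_version(text):
    """Turn a version string such as '11.2' or '8.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def check_requirements(cuda, cudnn, nccl=None, multi_gpu=False):
    """List problems relative to CUDA 11.2 + cuDNN 8.1.x (+ NCCL >= 2.7 for multi-GPU)."""
    problems = []
    if parse_version(cuda)[:2] != (11, 2):
        problems.append(f"CUDA {cuda}: 11.2 expected")
    if parse_version(cudnn)[:2] != (8, 1):
        problems.append(f"cuDNN {cudnn}: 8.1.x expected")
    if multi_gpu:
        # NCCL is only needed for multi-GPU training.
        if nccl is None:
            problems.append("NCCL not found: 2.7 or later needed for multi-GPU")
        elif parse_version(nccl) < (2, 7):
            problems.append(f"NCCL {nccl}: 2.7 or later needed for multi-GPU")
    return problems
```

For example, passing the Runtime API Version from the log (`check_requirements("10.2", "8.1")`) flags a CUDA mismatch, while `check_requirements("11.2", "8.1.1")` returns an empty list.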
Check the CUDA version:
nvcc --version
The version is 11.2.
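Instead of reading the `nvcc --version` output by eye, you can extract the release number programmatically. This sketch assumes the usual `Cuda compilation tools, release X.Y, ...` line that nvcc prints. The function names are hypothetical.

```python
import re
import subprocess


def cuda_release_from_nvcc(output):
    """Extract 'X.Y' from the 'release X.Y' token in `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def nvcc_cuda_version(nvcc="nvcc"):
    """Run nvcc and return its CUDA release, or None if nvcc is missing or fails."""
    try:
        out = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return cuda_release_from_nvcc(out)
```

The resulting string can be passed straight to a version check such as the CUDA 11.2 requirement above.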
Check the cuDNN version:
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
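The same check for cuDNN can read the `#define CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` lines that the `grep` above prints. This is only a sketch: the default path is the one used above, and on other installs the header may live elsewhere (for example under `/usr/include`). The helper names are hypothetical.

```python
import re
from pathlib import Path


def cudnn_version_from_header(text):
    """Read the CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL #defines from cudnn_version.h text."""
    fields = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if match is None:
            return None  # header format not recognised
        fields.append(int(match.group(1)))
    return tuple(fields)


def read_cudnn_version(header="/usr/local/cuda/include/cudnn_version.h"):
    """Return (major, minor, patch) from the header, or None if the file is absent."""
    path = Path(header)
    if not path.exists():
        return None
    return cudnn_version_from_header(path.read_text(encoding="utf-8"))
```

For the environment listed above, the expected tuple would be `(8, 1, 1)`.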
Check the CUDA and cuDNN versions with PyTorch:
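import torch
print(torch.__version__)               # PyTorch version
print(torch.version.cuda)              # CUDA version PyTorch was built with
print(torch.backends.cudnn.version())  # cuDNN version as one integer, e.g. 8101 for 8.1.1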
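`torch.backends.cudnn.version()` returns the cuDNN version as a single integer. For cuDNN 7.x/8.x that integer is `major*1000 + minor*100 + patch`, so a small hypothetical helper can turn it back into a tuple that is comparable with the header values. Assumption: cuDNN 9 uses a different encoding, so this helper targets 7.x/8.x only. Also note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which need not match the toolkit that `nvcc` reports.

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7.x/8.x version integer (major*1000 + minor*100 + patch) into a tuple.
    Assumption: cuDNN 9 encodes versions differently and is not handled here."""
    if code is None:  # torch returns None when cuDNN is unavailable
        return None
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return (major, minor, patch)


def torch_versions():
    """Collect the three values printed above in one dict (hypothetical helper)."""
    import torch
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # build-time CUDA version; None on CPU-only builds
        "cudnn": decode_cudnn_version(torch.backends.cudnn.version()),
    }
```

Putting these values side by side with the `nvcc` and header results is one way to start tracking down the version inconsistency.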
The PyTorch versions are inconsistent; I will follow up on this later.