How to Check That CUDA and cuDNN Are Installed and Working

DvLee1024

Article link: https://blog.csdn.net/killfunst/article/details/109344966

👉 Original post: https://blog.hidavid.cn/cuda-cudnn-install-successful/

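I recently picked YOLOv3 back up for some practice, this time running detection on medical B-mode ultrasound images.

Rebuilding the environment was painful because my network speed kept fluctuating, but I got it done in the end.

Here is how to check that CUDA is actually working.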

Main Text

1. Checking the installation

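First, check that CUDA itself is installed.
The usual install path is /usr/local/cuda.
Running nvcc -V (capital V, or the long form nvcc --version) prints the CUDA toolkit version. Lowercase -v is nvcc's verbose flag, not the version flag.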
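
The version check can also be scripted. The following is a minimal sketch, assuming Python 3.7+ and that nvcc is on PATH (otherwise pass the full path, for example /usr/local/cuda/bin/nvcc). The function names parse_nvcc_release and cuda_release are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the release string (e.g. '10.0') found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def cuda_release(nvcc="nvcc"):
    """Run `nvcc -V` and return the release string, or None if nvcc is unavailable."""
    path = shutil.which(nvcc)
    if path is None:
        return None
    result = subprocess.run([path, "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA release:", cuda_release() or "nvcc not found")
```

The parsing is kept separate from the subprocess call so it can be tried on saved output without a GPU machine.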

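Next, check cuDNN. Installing this library is simple: copy the files from cuDNN's include and lib64 directories into the matching CUDA directories. To verify the install, go to CUDA's include and lib64 directories and run ls | grep cudnn in each one to see whether any cuDNN files are present.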
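
The same file check can be done from Python. This is a rough sketch under the assumption that CUDA lives in /usr/local/cuda and that cuDNN was copied into its include and lib64 subdirectories as described above; find_cudnn_files is a hypothetical helper name.

```python
from pathlib import Path


def find_cudnn_files(cuda_home="/usr/local/cuda"):
    """Map 'include' and 'lib64' to the sorted cuDNN-related file names found there."""
    found = {}
    for sub in ("include", "lib64"):
        directory = Path(cuda_home) / sub
        names = []
        if directory.is_dir():
            # Same idea as `ls | grep cudnn`: keep names containing "cudnn".
            names = sorted(p.name for p in directory.iterdir() if "cudnn" in p.name)
        found[sub] = names
    return found


if __name__ == "__main__":
    for sub, names in find_cudnn_files().items():
        print(sub, "->", names or "no cuDNN files found")
```

An empty list for either directory suggests the copy step was skipped or went to a different CUDA directory.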

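2. Checking that CUDA is actually being used

There are many ways to check; I will use TensorFlow as the example.

When TensorFlow starts, it prints a log like the one below, which shows that the CUDA and cuDNN libraries were all loaded successfully.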

2020-10-28 13:23:32.729663: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-10-28 13:23:32.732189: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-10-28 13:23:32.734612: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-10-28 13:23:32.735064: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-10-28 13:23:32.737413: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-10-28 13:23:32.739131: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-10-28 13:23:32.744086: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-10-28 13:23:32.754026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1
2020-10-28 13:23:32.754078: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-10-28 13:23:32.759785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-28 13:23:32.759804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1
2020-10-28 13:23:32.759842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y
2020-10-28 13:23:32.759868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N
2020-10-28 13:23:32.767512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30265 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8b:00.0, compute capability: 7.0)
2020-10-28 13:23:32.770834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30265 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8d:00.0, compute capability: 7.0)
2020-10-28 13:24:08.051764: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
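
Scanning such a log by eye gets tedious, so here is a small sketch that pulls out the "Successfully opened dynamic library" lines and reports which expected libraries never show up. It assumes the log format shown above; the helper names and the default list of required libraries are my own choices, not anything TensorFlow defines.

```python
import re

# Matches the library name at the end of a dso_loader success line.
LOADED = re.compile(r"Successfully opened dynamic library (\S+)")


def loaded_libraries(log_text):
    """Return the set of shared libraries the log reports as opened."""
    return set(LOADED.findall(log_text))


def missing_libraries(log_text, required=("libcudart", "libcublas", "libcudnn")):
    """Return the required library prefixes that never appear as loaded."""
    loaded = loaded_libraries(log_text)
    return [name for name in required
            if not any(lib.startswith(name) for lib in loaded)]
```

To use it, save the startup log to a file (TensorFlow writes these messages to stderr), read the file into a string, and pass it to missing_libraries. An empty list means every required library was opened.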

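During training, you can also judge by watching CPU and GPU usage.

For example, run top to watch CPU and memory usage in real time. CPU usage here is still fairly high; it exceeds 100% because top reports usage relative to a single core, so a multi-core process can go above 100%. If I remember correctly, CPU usage was close to 800% when I trained without the GPU.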

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  5374 root      20   0 65.426g 6.152g 1.499g S 226.6  1.2  27:42.84 python
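
To make the per-core percentage easier to read, the sketch below converts a top %CPU value into the number of busy cores and into a share of the whole machine. It assumes top is in its default mode, where 100% means one fully used core; both function names are hypothetical.

```python
import os


def cores_busy(top_cpu_percent):
    """Convert a top %CPU value to the equivalent number of fully busy cores."""
    return top_cpu_percent / 100.0


def machine_share(top_cpu_percent, n_cores=None):
    """Return the fraction of the whole machine's CPU capacity in use."""
    n = n_cores or os.cpu_count() or 1
    return top_cpu_percent / (100.0 * n)


if __name__ == "__main__":
    print("cores busy:", cores_busy(226.6))
    print("share of this machine:", machine_share(226.6))
```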

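Then run nvidia-smi to check GPU usage. In the output below, GPU 0 shows a GPU-Util of 76% with 31361MiB of 32480MiB of memory in use, which means the GPU is being used. Note that TensorFlow reserves most of a GPU's memory by default, so a nearly full Memory-Usage column does not by itself signal a problem. When a job needs more GPU memory than is available, TensorFlow crashes with an out-of-memory error; the usual fix is to reduce the batch size.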

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:8B:00.0 Off |                    0 |
| N/A   58C    P0   160W / 300W |  31361MiB / 32480MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:8D:00.0 Off |                    0 |
| N/A   45C    P0    59W / 300W |    625MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|