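[Memo] libcudart.so cannot be found after installing Paddle
Problem
- Environment
I use Anaconda to keep virtual environments separate and created a new one for debugging Paddle. The packages installed in that environment are: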
# packages in environment at /mnt/dm0/anaconda3/envs/paddle_py310_cu117:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
astor 0.8.1 pyh9f0ad1d_0 conda-forge
brotlipy 0.7.0 py310h5764c6d_1005 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2022.12.7 ha878542_0 conda-forge
certifi 2022.12.7 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py310h255011f_3 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
cryptography 39.0.0 py310h65dfdc0_0 conda-forge
cudatoolkit 11.7.0 hd8887f6_11 conda-forge
cudnn 8.4.1.50 hed8a83a_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
freetype 2.12.1 hca18f0e_1 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
lcms2 2.15 haa2dc70_1 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libdeflate 1.17 h0b41bf4_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
libjpeg-turbo 2.1.5.1 h0b41bf4_0 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libprotobuf 3.20.0 h6239696_0 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libtiff 4.5.0 hddfeb54_5 conda-forge
libuuid 1.41.5 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libwebp-base 1.3.0 h0b41bf4_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
ncurses 6.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy 1.24.2 py310h8deb116_0 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 1.1.1t h0b41bf4_0 conda-forge
opt_einsum 3.3.0 pyhd8ed1ab_1 conda-forge
paddlepaddle-gpu 2.4.2.post117 pypi_0 pypi
pillow 9.4.0 py310h065c6d2_2 conda-forge
pip 23.0.1 pyhd8ed1ab_0 conda-forge
protobuf 3.20.0 py310hd8f1fbe_5 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyopenssl 23.0.0 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.10 h7a1cb2a_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python_abi 3.10 2_cp310 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.28.2 pyhd8ed1ab_0 conda-forge
setuptools 67.6.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sqlite 3.41.1 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tk 8.6.12 h27826a3_0 conda-forge
tzdata 2022g h191b570_0 conda-forge
urllib3 1.26.15 pyhd8ed1ab_0 conda-forge
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.10 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h3eb15da_6 conda-forge
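- Problem
I installed Paddle with the command given on the official PaddlePaddle website:
conda install paddlepaddle-gpu==2.4.2 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
Following the official instructions to verify the installation, importing paddle prints warnings that some package APIs are deprecated. They do not affect usage, so I moved on.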
(paddle_py310_cu117) Super-Server:~$ python
Python 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
/mnt/dm0/anaconda3/envs/paddle_py310_cu117/lib/python3.10/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/mnt/dm0/anaconda3/envs/paddle_py310_cu117/lib/python3.10/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
declare_namespace(pkg)
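The error appears when running Paddle's self-check:
>>> paddle.utils.run_check()
## The error log was not saved, but I clearly remember it said that libcudart.so could not be found.
In fact, these library files were already installed together with cudatoolkit and cudnn:
(paddle_py310_cu117) ubuntu-Super-Server:/mnt/dm0/anaconda3/envs/paddle_py310_cu117$ find ./ -name "libcudart.so"
./lib/libcudart.so
## The file exists; its absolute path is /mnt/dm0/anaconda3/envs/paddle_py310_cu117/lib/libcudart.so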
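A file existing on disk does not mean the dynamic loader will search its directory. As a quick diagnostic, the sketch below checks whether a directory is listed in LD_LIBRARY_PATH. The function name `dir_in_ld_library_path` is made up for this example, and the check covers only LD_LIBRARY_PATH, not the other places the loader looks (such as the ld.so cache).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given directory is an entry of LD_LIBRARY_PATH.
dir_in_ld_library_path() {
    dir="$1"
    # Wrap both sides in colons so only whole entries match.
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# CONDA_PREFIX is set by conda while an environment is active;
# the fallback is the environment path used in this memo.
lib_dir="${CONDA_PREFIX:-/mnt/dm0/anaconda3/envs/paddle_py310_cu117}/lib"
if dir_in_ld_library_path "$lib_dir"; then
    echo "$lib_dir is listed in LD_LIBRARY_PATH"
else
    echo "$lib_dir is NOT listed in LD_LIBRARY_PATH"
fi
```

If the second message is printed, the situation matches the one described above: the library is installed but its directory is not on the search path.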
Solution
- The usual fix
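The usual fix for a library that cannot be found is simple: add its directory to LD_LIBRARY_PATH.
However, this machine has several CUDA versions installed, and I want a given LD_LIBRARY_PATH entry to apply only inside a given Conda environment, so that environments using different CUDA versions do not interfere with each other. That takes a little more work. If you have only one CUDA version, you can simply edit .bashrc in your home directory:
# Step 1: edit your ~/.bashrc (the file belongs to your user, so sudo is not needed)
nano ~/.bashrc
# Step 2: add the environment variable
export LD_LIBRARY_PATH=<Your-Lib-Path>:$LD_LIBRARY_PATH
# Step 3: reload .bashrc so the change takes effect
source ~/.bashrc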
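One caveat with the plain export line: every `source ~/.bashrc` prepends the same directory again, and when LD_LIBRARY_PATH starts out empty the result ends with a stray colon, which the loader treats as the current directory. The sketch below avoids both. The function name `prepend_ld_library_path` is hypothetical, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper for ~/.bashrc: prepend a directory to LD_LIBRARY_PATH
# only if it is not already listed, without leaving an empty entry.
prepend_ld_library_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*)
            ;;  # already present, nothing to do
        *)
            # ${VAR:+...} adds the separator only when the old value is non-empty.
            export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
            ;;
    esac
}

# Usage (replace the argument with your own library directory):
# prepend_ld_library_path "/path/to/cuda/lib"
```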
If that covers your case, you can skip the rest of this memo.
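Loading different libraries per virtual environment
Every conda environment can run custom scripts when it is activated, provided the scripts are placed in specific directories. Those directories are not necessarily created along with the environment, so you may have to create them by hand.
- Create the directory inside the virtual environment
cd <Your-Conda-env-path> # in my case, /mnt/dm0/anaconda3/envs/paddle_py310_cu117/
mkdir -p etc/conda/activate.d # creates three nested directories (etc, conda, activate.d) under the environment directory
- Write a custom script
# The following assumes you are still in <Your-Conda-env-path>.
# Step 1: create an activation script; any name works as long as it ends in .sh
nano ./etc/conda/activate.d/<your-sh-name>.sh # pick whatever script name you like
# Step 2: put the usual environment-variable line in the script
export LD_LIBRARY_PATH=<Your-Lib-Path>:$LD_LIBRARY_PATH
# <Your-Lib-Path> depends on where your cudatoolkit and cudnn are installed. I installed them inside the virtual environment, so they are managed together with it.
# So in this example: <Your-Lib-Path> = <Your-Conda-env-path>/lib
- Make the script executable
# Step 3: make the script executable (sudo is only needed if the environment directory is not owned by your user)
chmod +x ./etc/conda/activate.d/<your-sh-name>.sh
# Step 4: reactivate the environment and check
conda deactivate
conda activate <your-conda-env-name>
echo $LD_LIBRARY_PATH # check that the path has been added
After step 4, LD_LIBRARY_PATH is extended automatically every time this virtual environment is activated.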
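The steps above only cover activation. As far as I know, conda does not undo variables exported by an activation script, so the extra path stays in the shell after `conda deactivate`. Conda also sources scripts from etc/conda/deactivate.d, which can restore the old value. The sketch below writes such a pair of hooks. The function name `write_env_hooks`, the file name cuda_lib_path.sh, and the variable `_OLD_LD_LIBRARY_PATH` are all made up for this example, and it assumes the libraries live in the environment's own lib directory, as in this memo.

```shell
#!/usr/bin/env bash
# Sketch: write paired activate/deactivate hooks into one conda environment.
# Argument: the environment directory (<Your-Conda-env-path>).
write_env_hooks() {
    env_path="$1"
    mkdir -p "${env_path}/etc/conda/activate.d" "${env_path}/etc/conda/deactivate.d"

    # Quoting 'EOF' keeps the variables unexpanded until the hook is sourced.
    cat > "${env_path}/etc/conda/activate.d/cuda_lib_path.sh" <<'EOF'
# Remember the previous value so the deactivate hook can restore it.
export _OLD_LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}"
# CONDA_PREFIX points at the root of the environment being activated.
export LD_LIBRARY_PATH="${CONDA_PREFIX}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
EOF

    cat > "${env_path}/etc/conda/deactivate.d/cuda_lib_path.sh" <<'EOF'
# Restore the value saved at activation, or unset the variable if it was empty.
if [ -n "${_OLD_LD_LIBRARY_PATH:-}" ]; then
    export LD_LIBRARY_PATH="${_OLD_LD_LIBRARY_PATH}"
else
    unset LD_LIBRARY_PATH
fi
unset _OLD_LD_LIBRARY_PATH
EOF
}
```

For the environment in this memo the call would be `write_env_hooks /mnt/dm0/anaconda3/envs/paddle_py310_cu117`, followed by deactivating and reactivating the environment.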
Verification
# Start Python (type python)
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0323 11:40:15.062598 167212 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.7
W0323 11:40:15.069036 167212 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
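On a machine with several CUDA versions, it can also help to confirm which directory on LD_LIBRARY_PATH supplies libcudart.so first. The sketch below walks the entries in order and prints the first one that contains the file. The function name `first_dir_with_lib` is hypothetical, and the walk only approximates the LD_LIBRARY_PATH part of the loader's search; it ignores the ld.so cache and any paths embedded in the binaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first LD_LIBRARY_PATH entry containing the given file.
first_dir_with_lib() {
    lib="$1"
    # Append a colon so every entry, including the last, ends with a separator.
    rest="${LD_LIBRARY_PATH:-}:"
    while [ -n "$rest" ]; do
        dir="${rest%%:*}"   # text before the first colon
        rest="${rest#*:}"   # text after the first colon
        if [ -n "$dir" ] && [ -e "${dir}/${lib}" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Usage inside the activated environment:
# first_dir_with_lib libcudart.so
```

In the setup described here, the expected answer is the lib directory of the active environment; a different directory would suggest another CUDA installation is taking precedence.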