llama.cpp无法使用gpu的问题

米有哥

于 2025-05-10 11:36:34 发布

阅读量629

点赞数 10

CC 4.0 BY-SA版权

文章标签： llama

本文链接：https://blog.csdn.net/MichaelPengCN/article/details/147849888

使用cuda编译llama.cpp后，仍然无法使用gpu。

./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf -ngl 40

报错如下

ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
warning: no usable GPU found, --gpu-layers option will be ignored
warning: one possible reason is that llama.cpp was compiled without GPU support
warning: consult docs/build.md for compilation instructions

使用nvidia-smi

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.144

重启即可解决问题

./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf -ngl 40
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti, compute capability 7.5, VMM: yes
...

load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors: CUDA0 model buffer size = 4076.43 MiB
load_tensors: CPU_Mapped model buffer size = 394.12 MiB