Name and Version
ggml_cuda_init: found 3 CUDA devices (Total VRAM: 72354 MiB):
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes, VRAM: 24114 MiB
Device 1: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes, VRAM: 24114 MiB
Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
load_backend: loaded CUDA backend from /app/libggml-cuda.so
load_backend: loaded CPU backend from /app/libggml-cpu-zen4.so
version: 8768 (1e9d771)
built with GNU 14.2.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
CPU: AMD Ryzen 9 9950X3D 16-Core Processor
GPU0: RTX 3090 Ti
GPU1: RTX 3090 Ti
GPU2: RTX 3090
Models
Qwen3-Coder-Next-UD-Q6_K_XL.gguf
https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF
Problem description & steps to reproduce
During normal use by hermes-agent 0.8.0, llama-server crashes at an unpredictable point with a CUDA "invalid argument" error in MUL_MAT_ID (full log below). This happens on every version of llama.cpp I have tried. Environment: Docker 29.3.1, CUDA 13.1, and the NVIDIA Container Toolkit.
First Bad Commit
N/A
Relevant log output
Logs
[34893] /app/ggml/src/ggml-cuda/ggml-cuda.cu:97: CUDA error
[34893] ggml_cuda_compute_forward: MUL_MAT_ID failed
[34893] CUDA error: invalid argument
[34893] current device: 0, in function ggml_cuda_compute_forward at /app/ggml/src/ggml-cuda/ggml-cuda.cu:2884
[34893] err
[34893] libggml-base.so.0(+0x19c36)[0x7e5ab8cb3c36]
[34893] libggml-base.so.0(ggml_print_backtrace+0x21a)[0x7e5ab8cb409a]
[34893] libggml-base.so.0(ggml_abort+0x15b)[0x7e5ab8cb427b]
[34893] /app/libggml-cuda.so(_Z15ggml_cuda_errorPKcS0_S0_iS0_+0xb5)[0x7e5ab639a175]
[34893] /app/libggml-cuda.so(+0x1e4e88)[0x7e5ab63abe88]
[34893] /app/libggml-cuda.so(+0x1e79d1)[0x7e5ab63ae9d1]
[34893] /app/libggml-cuda.so(+0x1ea112)[0x7e5ab63b1112]
[34893] libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x82f)[0x7e5ab8cd1caf]
[34893] libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa1)[0x7e5ab8e295b1]
[34893] libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0x112)[0x7e5ab8e2bf82]
[34893] libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x36f)[0x7e5ab8e31edf]
[34893] libllama.so.0(llama_decode+0xf)[0x7e5ab8e33aef]
[34893] /app/llama-server(+0x182627)[0x63acdfe07627]
[34893] /app/llama-server(+0x20f456)[0x63acdfe94456]
[34893] /app/llama-server(+0xd3b43)[0x63acdfd58b43]
[34893] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7e5ab871d1ca]
[34893] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7e5ab871d28b]
[34893] /app/llama-server(+0xdb835)[0x63acdfd60835]
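For anyone triaging: the mangled C++ frames in the backtrace can be decoded with standard binutils tools. A sketch (the library path comes from the log above; `addr2line` only resolves the anonymous `+0x…` offsets to source lines if the build retained debug info):

```shell
# Demangle a symbol from the backtrace (c++filt ships with binutils)
echo '_ZN13llama_context13graph_computeEP11ggml_cgraphb' | c++filt
# -> llama_context::graph_compute(ggml_cgraph*, bool)

# Resolve the anonymous offsets inside libggml-cuda.so to file:line,
# using the same binary that produced the crash:
# addr2line -Cfe /app/libggml-cuda.so 0x1e4e88 0x1e79d1 0x1ea112
```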