Eval bug: Error while using Qwen3-Coder-Next for agent work #21832

Description

@ceinstaller

Name and Version

ggml_cuda_init: found 3 CUDA devices (Total VRAM: 72354 MiB):
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes, VRAM: 24114 MiB
Device 1: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes, VRAM: 24114 MiB
Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24126 MiB
load_backend: loaded CUDA backend from /app/libggml-cuda.so
load_backend: loaded CPU backend from /app/libggml-cpu-zen4.so
version: 8768 (1e9d771)
built with GNU 14.2.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

CPU: AMD Ryzen 9 9950X3D 16-Core Processor
GPU0: 3090ti
GPU1: 3090ti
GPU2: 3090

Models

Qwen3-Coder-Next-UD-Q6_K_XL.gguf

https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF

Problem description & steps to reproduce

During normal use with hermes-agent 0.8.0, the model crashes at random points during generation. This happens on every version of llama.cpp I've tried. The server runs under Docker 29.3.1 with CUDA 13.1 and the NVIDIA Container Toolkit.
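
I haven't found a deterministic trigger. For anyone trying to reproduce outside the agent, a bare decode loop against the same GGUF should exercise the same llama_decode path shown in the backtrace below. This is only a sketch against the current llama.h C API; the prompt, context size, and offload settings are placeholders standing in for my real server flags, not the exact failing invocation:

```cpp
// Minimal decode-loop sketch, assuming the current llama.h C API.
// Run with the same GPU split/offload settings as the failing server
// to hit the same CUDA MUL_MAT_ID path.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 999; // offload everything, as in the failing setup
    llama_model * model = llama_model_load_from_file(
        "Qwen3-Coder-Next-UD-Q6_K_XL.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 8192;
    llama_context * ctx = llama_init_from_model(model, cparams);

    const llama_vocab * vocab = llama_model_get_vocab(model);
    std::string prompt = "Write a quicksort in C."; // placeholder prompt
    std::vector<llama_token> tokens(prompt.size() + 8);
    int n = llama_tokenize(vocab, prompt.c_str(), (int) prompt.size(),
                           tokens.data(), (int) tokens.size(),
                           /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n);

    // Decode the prompt, then greedily decode one token at a time;
    // the reported crash fires inside llama_decode (MUL_MAT_ID on CUDA).
    llama_batch batch = llama_batch_get_one(tokens.data(), (int) tokens.size());
    for (int i = 0; i < 256; i++) {
        if (llama_decode(ctx, batch) != 0) {
            fprintf(stderr, "llama_decode failed at step %d\n", i);
            return 1;
        }
        // argmax of the last logits as the next token (no sampler, keep it minimal)
        float * logits = llama_get_logits_ith(ctx, -1);
        const int n_vocab = llama_vocab_n_tokens(vocab);
        llama_token next = 0;
        for (int t = 1; t < n_vocab; t++) {
            if (logits[t] > logits[next]) next = t;
        }
        if (llama_vocab_is_eog(vocab, next)) break;
        batch = llama_batch_get_one(&next, 1);
    }

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```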

First Bad Commit

N/A

Relevant log output

Logs
[34893] /app/ggml/src/ggml-cuda/ggml-cuda.cu:97: CUDA error
[34893] ggml_cuda_compute_forward: MUL_MAT_ID failed
[34893] CUDA error: invalid argument
[34893]   current device: 0, in function ggml_cuda_compute_forward at /app/ggml/src/ggml-cuda/ggml-cuda.cu:2884
[34893]   err
[34893] libggml-base.so.0(+0x19c36)[0x7e5ab8cb3c36]
[34893] libggml-base.so.0(ggml_print_backtrace+0x21a)[0x7e5ab8cb409a]
[34893] libggml-base.so.0(ggml_abort+0x15b)[0x7e5ab8cb427b]
[34893] /app/libggml-cuda.so(_Z15ggml_cuda_errorPKcS0_S0_iS0_+0xb5)[0x7e5ab639a175]
[34893] /app/libggml-cuda.so(+0x1e4e88)[0x7e5ab63abe88]
[34893] /app/libggml-cuda.so(+0x1e79d1)[0x7e5ab63ae9d1]
[34893] /app/libggml-cuda.so(+0x1ea112)[0x7e5ab63b1112]
[34893] libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x82f)[0x7e5ab8cd1caf]
[34893] libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa1)[0x7e5ab8e295b1]
[34893] libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0x112)[0x7e5ab8e2bf82]
[34893] libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x36f)[0x7e5ab8e31edf]
[34893] libllama.so.0(llama_decode+0xf)[0x7e5ab8e33aef]
[34893] /app/llama-server(+0x182627)[0x63acdfe07627]
[34893] /app/llama-server(+0x20f456)[0x63acdfe94456]
[34893] /app/llama-server(+0xd3b43)[0x63acdfd58b43]
[34893] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7e5ab871d1ca]
[34893] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7e5ab871d28b]
[34893] /app/llama-server(+0xdb835)[0x63acdfd60835]
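
For context on where the abort comes from: ggml_cuda_error in the backtrace is the failure path of ggml's CUDA error-check macro, so the "invalid argument" is the raw status of a CUDA runtime call made while computing the MUL_MAT_ID op (the MoE expert matmul). A simplified sketch of that pattern, with names and details mine rather than the exact ggml source:

```cpp
// Hedged sketch of the CUDA error-check pattern behind the abort above.
// ggml wraps CUDA runtime calls in a check macro; on any non-success status
// it reports file/line and aborts, producing the ggml_cuda_error ->
// ggml_abort -> backtrace sequence seen in the log.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

[[noreturn]] static void cuda_error(const char * stmt, const char * func,
                                    const char * file, int line, const char * msg) {
    fprintf(stderr, "CUDA error: %s\n", msg);
    fprintf(stderr, "  in function %s at %s:%d\n", func, file, line);
    fprintf(stderr, "  %s\n", stmt);
    abort(); // ggml calls ggml_abort() here, which also prints the backtrace
}

#define CUDA_CHECK(stmt)                                                   \
    do {                                                                   \
        cudaError_t err_ = (stmt);                                         \
        if (err_ != cudaSuccess) {                                         \
            cuda_error(#stmt, __func__, __FILE__, __LINE__,                \
                       cudaGetErrorString(err_));                          \
        }                                                                  \
    } while (0)

int main() {
    // cudaMemcpy with null pointers returns cudaErrorInvalidValue, whose
    // error string is "invalid argument" -- the same message as in the log.
    CUDA_CHECK(cudaMemcpy(nullptr, nullptr, 1, cudaMemcpyDeviceToDevice));
}
```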
