Name and Version
llama-server
version: 8792 (f4b5bf2)
built with GNU 14.2.0 for Linux x86_64
NVIDIA-SMI version: 595.58.03
NVML version: 595.58
DRIVER version: 595.58.03
CUDA Version: 13.2
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA RTX PRO 6000 Blackwell Workstation Edition
AMD EPYC 7543P 32-Core Processor
Models
Qwen3.5-397B-A17B-UD-Q5_K_XL (https://huggingface.co/unsloth/Qwen3.5-397B-A17B-GGUF)
Gemma-4-31B-it-BF16 (https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF)
MiniMax-M2.7-Q8_0 (https://huggingface.co/bartowski/MiniMaxAI_MiniMax-M2.7-GGUF)
Problem description & steps to reproduce
llama-server crashes or hangs when loading or serving some recent models.
The Linux kernel logs AMD IOMMU I/O page faults (IO_PAGE_FAULT) followed by a GPU MMU fault (Xid 31).
Git bisect
Bisected between commit 85d482e (good) and commit f4b5bf2 (bad).
d6f3030 is the first bad commit.
At each git bisect step llama-cpp was re-built with:
git clean -xdfq
cmake -B build -DGGML_ZENDNN=ON -DGGML_CUDA=ON
cmake --build build --config Release -j 32
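The bisect loop above can be sketched as a `git bisect run` script. This is a hypothetical automation of the manual steps described; `run_repro.sh` is an assumed helper (not part of this report) that starts llama-server, sends one request, and returns non-zero on a crash or hang:

```shell
# Automate the bisect between the good/bad commits named above
# (assumes a llama.cpp checkout on a commit range containing both).
git bisect start f4b5bf2 85d482e
git bisect run bash -c '
  git clean -xdfq
  cmake -B build -DGGML_ZENDNN=ON -DGGML_CUDA=ON  || exit 125
  cmake --build build --config Release -j 32      || exit 125
  # run_repro.sh (hypothetical): launch llama-server, issue a single
  # request, exit non-zero if the server aborts or exceeds a timeout.
  # Exit code 125 tells git bisect to skip commits that fail to build.
  ./run_repro.sh
'
```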
Linux kernel (dmesg) logs are the same (see below) for all these models, while llama-server behaves differently depending on the model:
Qwen3.5-397B
Crashes on first request.
Gemma-4-31B
Crashes on server start before first request.
MiniMax-M2.7
llama-server hangs on the first request without logging any errors; dmesg nevertheless shows the same IO_PAGE_FAULT entries.
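For reference, a minimal first request that triggers the failure in each case can be sent with curl against the server's OpenAI-compatible endpoint. The host and port below are taken from the server commands in the logs; the payload is an assumed minimal example, not the exact request used:

```shell
# Hypothetical minimal repro request; any first completion request
# is enough to trigger the crash/hang described above.
curl http://192.168.88.249:14128/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hi"}],"max_tokens":16}'
```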
First Bad Commit
d6f3030 is the first bad commit
Relevant log output
Qwen3.5-397B
llama-server --host 192.168.88.249 --port 14128 --threads 8 --threads-http 2 --ctx-size 131072 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.00 --batch-size 2048 --ubatch-size 2048 -fa off --no-mmap --model /srv/d/.cache/hf/models--unsloth--Qwen3.5-397B-A17B-GGUF/snapshots/da33c16fa4440f831149fcf53b98a22bc07785e5/UD-Q5_K_XL/Qwen3.5-397B-A17B-UD-Q5_K_XL-00001-of-00008.gguf
(...)
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-native
slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
srv get_availabl: updating prompt cache
srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000
srv update: - cache state: 0 prompts, 0.000 MiB (limits: 8192.000 MiB, 131072 tokens, 8589934592 est)
srv get_availabl: prompt cache update took 0.01 ms
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 1524
slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 1520, batch.n_tokens = 1520, progress = 0.997375
/srv/d/p/llm/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:97: CUDA error
CUDA error: an illegal memory access was encountered
current device: 0, in function ggml_backend_cuda_synchronize at /srv/d/p/llm/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:3005
cudaStreamSynchronize(cuda_ctx->stream())
/srv/d/p/llm/llama.cpp/build/bin/libggml-base.so.0(+0x172a5) [0x7fe132c8b2a5]
/srv/d/p/llm/llama.cpp/build/bin/libggml-base.so.0(ggml_print_backtrace+0x1df) [0x7fe132c8b66f]
/srv/d/p/llm/llama.cpp/build/bin/libggml-base.so.0(ggml_abort+0x11e) [0x7fe132c8b7fe]
/srv/d/p/llm/llama.cpp/build/bin/libggml-cuda.so.0(+0x1b34f3) [0x7fe12f5b34f3]
/srv/d/p/llm/llama.cpp/build/bin/libggml-cuda.so.0(+0x1b46d0) [0x7fe12f5b46d0]
/srv/d/p/llm/llama.cpp/build/bin/libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x815) [0x7fe132ca7bb5]
/srv/d/p/llm/llama.cpp/build/bin/libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa1) [0x7fe1322afad1]
/srv/d/p/llm/llama.cpp/build/bin/libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xe4) [0x7fe1322b2084]
/srv/d/p/llm/llama.cpp/build/bin/libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x34f) [0x7fe1322b7c8f]
/srv/d/p/llm/llama.cpp/build/bin/libllama.so.0(llama_decode+0xb) [0x7fe1322b96bb]
llama-server(+0x160af3) [0x558c4c542af3]
llama-server(+0x1e5a2f) [0x558c4c5c7a2f]
llama-server(+0xba4aa) [0x558c4c49c4aa]
/lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7fe131c35ca8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7fe131c35d65]
llama-server(+0xc14f1) [0x558c4c4a34f1]
LIBXSMM_VERSION: feature_print_bw-1.17-3780 (25693892)
LIBXSMM_TARGET: hsw [AMD EPYC 7543P 32-Core Processor]
Registry and code: 13 MB
Command: llama-server --host 192.168.88.249 --port 14128 --threads 8 --threads-http 2 --ctx-size 131072 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.00 --batch-size 2048 --ubatch-size 2048 -fa off --no-mmap --model /srv/d/.cache/hf/models--unsloth--Qwen3.5-397B-A17B-GGUF/snapshots/da33c16fa4440f831149fcf53b98a22bc07785e5/UD-Q5_K_XL/Qwen3.5-397B-A17B-UD-Q5_K_XL-00001-of-00008.gguf
Uptime: 288.167928 s
Aborted
Gemma-4-31B
llama-server -v --device CUDA0 --host 192.168.88.249 --port 14128 --threads 8 --threads-http 2 --ctx-size 131072 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.00 --batch-size 2048 --ubatch-size 2048 -fa off --no-mmap --model /srv/d/.cache/llama.cpp/models--bartowski--google_gemma-4-31B-it-GGUF/snapshots/a83d9d3efca681556dbd277fc50fde765041a3b0/google_gemma-4-31B-it-bf16/google_gemma-4-31B-it-bf16-00001-of-00002.gguf
double free or corruption (!prev)
double free or corruption (!prev)
LIBXSMM_VERSION: feature_print_bw-1.17-3780 (25693892)double free or corruption (!prev)
Aborted
dmesg
[ 7460.483909] amd_iommu_report_page_fault: 6 callbacks suppressed
[ 7460.483919] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xd0826240 flags=0x0020]
[ 7460.485503] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0314000 flags=0x0020]
[ 7460.486330] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0310000 flags=0x0020]
[ 7460.487121] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0314c00 flags=0x0020]
[ 7460.487883] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0311000 flags=0x0020]
[ 7460.488624] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0315000 flags=0x0020]
[ 7460.489330] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0316000 flags=0x0020]
[ 7460.490004] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0312000 flags=0x0020]
[ 7460.490676] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0315e00 flags=0x0020]
[ 7460.491352] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0312100 flags=0x0020]
[ 7474.898124] amd_iommu_report_page_fault: 77 callbacks suppressed
[ 7474.898133] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0314000 flags=0x0020]
[ 7474.899573] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0xd0826240 flags=0x0020]
[ 7474.900302] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0310000 flags=0x0020]
[ 7474.901005] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0311100 flags=0x0020]
[ 7474.901705] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0315100 flags=0x0020]
[ 7474.902399] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0314400 flags=0x0020]
[ 7474.903082] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0310200 flags=0x0020]
[ 7474.903873] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0311c00 flags=0x0020]
[ 7474.904546] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0315000 flags=0x0020]
[ 7474.905221] nvidia 0000:81:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001c address=0x2bee0316100 flags=0x0020]
[ 7474.908632] AMD-Vi: IOMMU Event log restarting
[ 7475.278724] amd_iommu_report_page_fault: 2564 callbacks suppressed
[ 7475.278730] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xf4826240 flags=0x0020]
[ 7475.279811] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0014000 flags=0x0020]
[ 7475.280652] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0010000 flags=0x0020]
[ 7475.281418] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0015000 flags=0x0020]
[ 7475.282116] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0011100 flags=0x0020]
[ 7475.282814] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0010d00 flags=0x0020]
[ 7475.283502] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0016100 flags=0x0020]
[ 7475.284175] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0015600 flags=0x0020]
[ 7475.284841] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0012100 flags=0x0020]
[ 7475.285503] nvidia 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0x7fbe0011600 flags=0x0020]
[ 7475.287354] AMD-Vi: IOMMU Event log restarting
[ 7475.292157] AMD-Vi: IOMMU Event log restarting
[ 7475.293352] AMD-Vi: IOMMU Event log restarting
[ 7475.299240] AMD-Vi: IOMMU Event log restarting
[ 7475.305167] AMD-Vi: IOMMU Event log restarting
[ 7475.353664] NVRM: Xid (PCI:0000:01:00): 31, pid=86045, name=llama-server, channel 0x00000003, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC11 GPCCLIENT_T1_11 faulted @ 0x5fd0_24000000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ