Name and Version
Last working version b8541 (ded446b)
Operating systems
Linux
GGML backends
CUDA
Hardware
2x L40S, also tested on 2x A6000 Ada
Models
unsloth--Llama-4-Scout-17B-16E-Instruct-GGUF Q4_K_M
Problem description & steps to reproduce
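Llama 4 vision is broken for any image larger than 336 px: the run aborts with "failed to encode image slice" (see the log output below), while smaller images still work.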
# model: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF (Q4_K_M + mmproj-BF16)
# works - image under 336px
llama-mtmd-cli \
-m Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf \
--mmproj mmproj-BF16.gguf \
-c 32768 \
--image cat-200px.jpg \
-p "What animal is this?" -n 16
# fails - image over 336px
llama-mtmd-cli \
-m Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf \
--mmproj mmproj-BF16.gguf \
-c 32768 \
--image cat-1200px.jpg \
-p "What animal is this?" -n 16
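The two commands above differ only in the image path, so the comparison can be wrapped in a small helper for re-testing across builds. This is a minimal sketch, not part of llama.cpp: `run_case`, `MTMD_CLI`, `MODEL`, and `MMPROJ` are hypothetical names introduced here, and it assumes a failing run either exits non-zero or prints the "failed to encode image slice" line shown in the log.

```shell
#!/usr/bin/env bash
# Sketch of a repro helper. MTMD_CLI, MODEL, MMPROJ and run_case are
# hypothetical names for this example, not llama.cpp options.

MODEL="${MODEL:-Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf}"
MMPROJ="${MMPROJ:-mmproj-BF16.gguf}"

# Run one image through the CLI with the same flags as the report and
# print "PASS <image>" or "FAIL <image>".
# Assumption: a failed run exits non-zero or logs the encode error.
run_case() {
  local image="$1"
  local cli="${MTMD_CLI:-llama-mtmd-cli}"
  local log status=0
  log="$(mktemp)"
  "$cli" -m "$MODEL" --mmproj "$MMPROJ" -c 32768 \
    --image "$image" -p "What animal is this?" -n 16 >"$log" 2>&1 || status=$?
  if [ "$status" -ne 0 ] || grep -q "failed to encode image slice" "$log"; then
    echo "FAIL $image"
    rm -f "$log"
    return 1
  fi
  echo "PASS $image"
  rm -f "$log"
}
```

After sourcing the file, `run_case cat-200px.jpg` and `run_case cat-1200px.jpg` repeat the two cases above; setting `MTMD_CLI` to the binary from a different build makes it easy to compare b8541 against b8542.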
First Bad Commit
b8542 (a73bbd5)
Relevant log output
warmup: *****************************************************************
init_vision: llama 4 vision is known to have degraded quality:
https://github.com/ggml-org/llama.cpp/pull/13282
main: loading model: Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf
WARN: This is an experimental CLI for testing multimodal capability.
For normal use cases, please use the standard llama-cli
encoding image slice...
failed to encode image slice
failed to eval chunk 1
Unable to eval prompt