Eval bug: Llama 4 Scout vision broken for image > 336px #21871

@delta9000

Name and Version

Last working version b8541 (ded446b)

Operating systems

Linux

GGML backends

CUDA

Hardware

2 x L40s, also tested on 2x A6000 ADA

Models

unsloth--Llama-4-Scout-17B-16E-Instruct-GGUF Q4_K_M

Problem description & steps to reproduce

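Llama 4 vision fails for any image larger than 336px: the CLI reports "failed to encode image slice" and the prompt cannot be evaluated. Smaller images work. The two commands below differ only in the image passed to --image.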

# model: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF (Q4_K_M + mmproj-BF16)

# works - image under 336px
llama-mtmd-cli \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf \
  --mmproj mmproj-BF16.gguf \
  -c 32768 \
  --image cat-200px.jpg \
  -p "What animal is this?" -n 16

# fails - image over 336px
llama-mtmd-cli \
  -m Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf \
  --mmproj mmproj-BF16.gguf \
  -c 32768 \
  --image cat-1200px.jpg \
  -p "What animal is this?" -n 16
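
To narrow down where the failure starts, the two commands above can be turned into a size sweep. This is only a rough sketch under a few assumptions that are not part of the report: ImageMagick's `convert` is installed, the source image is larger than every size tested, "336px" refers to the longer side, and `llama-mtmd-cli` exits non-zero when it prints "Unable to eval prompt". The function names `try_image` and `sweep_sizes` and the `sweep-*.jpg` file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: sweep image sizes around the reported 336px threshold.
# Assumptions (not from the report): ImageMagick `convert` is available and
# llama-mtmd-cli returns a non-zero exit code when evaluation fails.

LLAMA_CLI="${LLAMA_CLI:-llama-mtmd-cli}"
MODEL="${MODEL:-Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf}"
MMPROJ="${MMPROJ:-mmproj-BF16.gguf}"

# Run the CLI once with the flags from the report; print "<image> ok" or "<image> FAIL".
try_image() {
  local image=$1
  if "$LLAMA_CLI" -m "$MODEL" --mmproj "$MMPROJ" -c 32768 \
       --image "$image" -p "What animal is this?" -n 16 >/dev/null 2>&1; then
    echo "$image ok"
  else
    echo "$image FAIL"
  fi
}

# Resize the source to fit inside a size x size box (aspect ratio kept),
# then run the CLI on each resized copy.
sweep_sizes() {
  local src=$1; shift
  local size out
  for size in "$@"; do
    out="sweep-${size}px.jpg"
    convert "$src" -resize "${size}x${size}" "$out" || return 1
    try_image "$out"
  done
}

# Example: sweep_sizes cat-1200px.jpg 200 336 337 448 672 1200
```

Running the sweep on both b8541 and b8542 should show whether the boundary really sits at 336px or somewhere nearby.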

First Bad Commit

b8542 (a73bbd5)

Relevant log output

warmup: *****************************************************************
init_vision: llama 4 vision is known to have degraded quality:
    https://github.com/ggml-org/llama.cpp/pull/13282
main: loading model: Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00002.gguf
WARN: This is an experimental CLI for testing multimodal capability.
      For normal use cases, please use the standard llama-cli
encoding image slice...
failed to encode image slice
failed to eval chunk 1
Unable to eval prompt
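
When comparing many runs, it can help to pull the first error line out of each log instead of reading it by eye. A minimal sketch, matching only the three error strings that appear in the log above; `first_failure` is a hypothetical helper name, and the `2>&1` in the usage line is there because the report does not say which stream the CLI writes these messages to.

```shell
# Print the first error line from this report found in a log on stdin,
# or "no known failure" if none of them appear.
first_failure() {
  grep -m1 -E 'failed to encode image slice|failed to eval chunk|Unable to eval prompt' \
    || echo "no known failure"
}

# Usage (same flags as in the report):
#   llama-mtmd-cli -m ... --mmproj ... --image cat-1200px.jpg -p "..." -n 16 2>&1 | first_failure
```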
