Tags: ggml-org/llama.cpp

b5200

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
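For context, a sketch of how the new flag is typically invoked. The model path and regex patterns below are placeholders; llama.cpp's `--override-tensor` (`-ot`) takes `<tensor-name regex>=<buffer type>` pairs, pinning matching tensors to the named buffer while the rest follows the usual offload settings:

```shell
# Hypothetical invocation: keep per-layer FFN tensors in host memory
# while offloading everything else; pattern syntax is <regex>=<buffer type>.
./llama-bench -m model.gguf -ngl 99 -ot "blk\..*\.ffn_.*=CPU"

# Long form added by this change:
./llama-bench -m model.gguf -ngl 99 --override-tensor "blk\..*\.ffn_.*=CPU"
```

Per the follow-up bullets, the parsing mirrors `--tensor-split`, so the flag can also be varied across benchmark runs as part of the test matrix.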

b5199

llama-chat : fix wrong template in GLM4-0414 (#13140)

* fix wrong template in GLM4-0414

* fix spaces

* no BOS token since it is already in the template

* moved the chatglm4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check to the correct place with the correct check

b5198

musa: fix build warning (#13129)

Signed-off-by: Xiaodong Ye <[email protected]>

b5197

Fixes Qwen2.5VL segfault during inference with #12402 as has_qwen2vl_merger migration was incomplete (#13133)

b5196

clip : Add Qwen2.5VL support (#12402)

* implement vision model architecture, GGUF converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 cuda operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b5195

common : add common_remote_get_content (#13123)

* common : add common_remote_get_content

* support max size and timeout

* add tests

b5194

clip : improve projector naming (#13118)

* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused

b5193

ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107)

* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion

* move fp converter to ggml-cpu

* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32

b5192

grammar : handle maxItems == 0 in JSON schema (#13117)

Co-authored-by: Richard Lyons <[email protected]>
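The corner case fixed here is a schema whose array is capped at zero items; under standard JSON Schema semantics the only value such a property can take is the empty array `[]`, and the generated grammar must accept exactly that. A minimal illustrative schema (field names are placeholders):

```json
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "maxItems": 0
    }
  },
  "required": ["tags"]
}
```

With `maxItems: 0`, only `{"tags": []}` validates, so the grammar rule for the array degenerates to the literal `[]`.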

b5191

llama : fix K-shift with quantized K and BLAS backend (#13113)