Tags: ggml-org/llama.cpp

b5200

This commit was created on GitHub.com and signed with GitHub’s verified signature.
llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
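For context, a sketch of how the new flag is typically invoked. The model path and regex patterns below are placeholders; llama.cpp's `--override-tensor` (`-ot`) takes `<tensor-name regex>=<buffer type>` pairs, pinning matching tensors to the named buffer while the rest follows the usual offload settings:

```shell
# Hypothetical invocation: keep per-layer FFN tensors in host memory
# while offloading everything else; pattern syntax is <regex>=<buffer type>.
./llama-bench -m model.gguf -ngl 99 -ot "blk\..*\.ffn_.*=CPU"

# Long form added by this change:
./llama-bench -m model.gguf -ngl 99 --override-tensor "blk\..*\.ffn_.*=CPU"
```

Per the follow-up bullets, the parsing mirrors `--tensor-split`, so the flag can also be varied across benchmark runs as part of the test matrix.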

b5199

llama-chat : fix wrong template in GLM4-0414 (#13140)

* fix wrong template in GLM4-0414

* fix spaces

* no BOS token since it is already in the template

* moved the chatglm4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check to the correct place with the correct check

b5198

musa: fix build warning (#13129)

Signed-off-by: Xiaodong Ye <[email protected]>

b5197

Fixes Qwen2.5VL segfault during inference with #12402 as has_qwen2vl_merger migration was incomplete (#13133)

b5196

clip : Add Qwen2.5VL support (#12402)

* implement vision model architecture, GGUF converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 cuda operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b5195

common : add common_remote_get_content (#13123)

* common : add common_remote_get_content

* support max size and timeout

* add tests

b5194

clip : improve projector naming (#13118)

* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused

b5193

ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107)

* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion

* move fp converter to ggml-cpu

* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32

b5192

grammar : handle maxItems == 0 in JSON schema (#13117)

Co-authored-by: Richard Lyons <[email protected]>
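The corner case fixed here is a schema whose array is capped at zero items; under standard JSON Schema semantics the only value such a property can take is the empty array `[]`, and the generated grammar must accept exactly that. A minimal illustrative schema (field names are placeholders):

```json
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": { "type": "string" },
      "maxItems": 0
    }
  },
  "required": ["tags"]
}
```

With `maxItems: 0`, only `{"tags": []}` validates, so the grammar rule for the array degenerates to the literal `[]`.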

b5191

llama : fix K-shift with quantized K and BLAS backend (#13113)