Misc. bug: nix flake builds omit CPU optimization flags, resulting in slower inference #21899

@jhandwe

Name and Version

version: 8790 (be76dd0)

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server, llama-bench

Command line

llama-bench --model $MODEL -p 0 -n 128 -ngl 0

Problem description & steps to reproduce

The nix flake builds a binary without certain CPU optimizations, resulting in significantly lower performance. In my benchmark, token generation dropped from roughly 6 t/s with the optimizations enabled to 1.6 t/s with the flake build, for a 13B model running entirely on the CPU.

To reproduce:

  • Clone the repository
  • Build with nix build . (a CPU-only build)
  • Test inference speed with ./result/bin/llama-bench --model $MODEL -p 0 -n 128 -ngl 0

To demonstrate the potential speed up:

  • Enter the nix development shell with nix develop .
  • Query the nix build flags and add CPU optimization:
export FLAGS=$(echo $(nix log . | grep "cmake flags" | cut -c 14-) \
-DGGML_SSE42=ON \
-DGGML_AVX=ON \
-DGGML_AVX2=ON \
-DGGML_FMA=ON \
-DGGML_F16C=ON \
-DGGML_BMI2=ON)
  • Build with mkdir build; cmake -S . -B build $FLAGS; cmake --build build --config Release -- -j 16 (adjust -j to the number of threads your CPU has)
  • Rerun test with LD_LIBRARY_PATH="$PWD/build/bin" ./build/bin/llama-bench --model $MODEL -p 0 -n 128 -ngl 0 and observe 3-4x speedup.

I tested this on a system with a Ryzen 7 5800X. I found these specific extra compile flags by comparing two CMakeCache.txt files: one from a manual build following the build page (which is fast) and one generated by the build procedure within the nix develop shell.
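The CMakeCache.txt comparison described above could be reproduced with something like the following. This is only a sketch: the function name `ggml_cache_diff` and the two example paths are placeholders, not anything from the repository, and it assumes the relevant options show up as `GGML_*` entries in both cache files.

```shell
# Print the GGML_* cache entries that differ between two CMakeCache.txt files.
# $1: cache from the fast (manual) build, $2: cache from the slow build.
ggml_cache_diff() {
    fast_sorted=$(mktemp)
    slow_sorted=$(mktemp)
    # Keep only GGML_* entries and sort them so diff compares like with like.
    grep '^GGML_' "$1" | sort > "$fast_sorted"
    grep '^GGML_' "$2" | sort > "$slow_sorted"
    diff "$fast_sorted" "$slow_sorted"
    status=$?
    rm -f "$fast_sorted" "$slow_sorted"
    # diff exits with 1 when the files differ, 0 when they match.
    return "$status"
}

# Example call (hypothetical directory names):
# ggml_cache_diff build-manual/CMakeCache.txt build/CMakeCache.txt
```

Lines prefixed with `<` come from the first file and lines prefixed with `>` from the second, so an option that is `ON` only in the fast build appears as a `<`/`>` pair.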

I don't think simply adding these cmake flags to the flake build in .devops/nix/package.nix is a good fix. As I understand it, this could cause the build or the binary to fail on CPUs that lack these instruction sets, but I don't have enough experience with multi-platform development to evaluate this properly. At the same time, the current state means significantly lower performance on modern systems for no good reason, and users won't notice unless they switch between the nix flake build and a normal build as I did. Perhaps CPU optimizations could be offered as optional variants of the existing packages, with the agnostic version remaining the default so that the normal build always works.
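To illustrate the portability concern, here is a rough sketch of how one could check whether a given Linux machine advertises the instruction sets behind the flags listed above. The function name `check_cpu_features` is made up for this example, and the feature names are the ones used in the `flags` line of /proc/cpuinfo on x86 Linux; other platforms report features differently.

```shell
# Report which of the instruction sets behind the extra GGML_* options
# appear in a space-separated CPU feature list (x86 /proc/cpuinfo naming).
check_cpu_features() {
    # Pad with spaces so each feature can be matched as a whole word.
    cpu_flags=" $1 "
    for feature in sse4_2 avx avx2 fma f16c bmi2; do
        case "$cpu_flags" in
            *" $feature "*) echo "$feature: yes" ;;
            *)              echo "$feature: no" ;;
        esac
    done
}

# On x86 Linux, the first "flags" line of /proc/cpuinfo lists the features.
if [ -r /proc/cpuinfo ]; then
    check_cpu_features "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

A binary built with one of these options enabled may crash with an illegal instruction on a CPU that reports "no" for the matching feature, which is why an optional package variant seems safer than changing the default.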

Disclaimer: I used generative AI to help narrow down the specific issue and find the relevant difference in my builds. This report was written and tested by me.

First Bad Commit

I am reasonably certain that the change in #11317 is responsible for CPU optimizations being disabled, but I did not test this further.

Relevant log output

Benchmark with CPU optimization
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 13B IQ4_NL - 4.5 bpw     |   6.60 GiB |    12.25 B | BLAS       |       8 |           tg128 |          5.98 ± 0.01 |
Benchmark without CPU optimization
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 13B IQ4_NL - 4.5 bpw     |   6.60 GiB |    12.25 B | BLAS       |       8 |           tg128 |          1.60 ± 0.00 |
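For reference, the ratio between the two tg128 results can be computed directly; it comes out at about 3.7x, consistent with the 3-4x figure mentioned in the steps. The helper name `speedup` is just for illustration.

```shell
# Ratio of two llama-bench t/s figures, printed with two decimals.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2fx\n", fast / slow }'
}

# tg128 values from the two tables above (with and without CPU optimization).
speedup 5.98 1.60
```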
