### Name and Version

version: 8790 (be76dd0)
### Operating systems

Linux
### Which llama.cpp modules do you know to be affected?

llama-server, llama-bench
### Command line

```shell
llama-bench --model $MODEL -p 0 -n 128 -ngl 0
```
### Problem description & steps to reproduce
The nix flake builds a binary without certain CPU optimizations, resulting in significantly lower performance. In my benchmark, a 13B model running fully on the CPU dropped from roughly 6 t/s to 1.6 t/s.
To reproduce:

- Clone the repository
- Build with `nix build .` (a CPU-only build)
- Test inference speed with `./result/bin/llama-bench --model $MODEL -p 0 -n 128 -ngl 0`
To demonstrate the potential speedup:

- Enter the nix development shell with `nix develop .`
- Query the nix build flags and append the CPU optimizations:

  ```shell
  export FLAGS=$(echo $(nix log . | grep "cmake flags" | cut -c 14-) \
    -DGGML_SSE42=ON \
    -DGGML_AVX=ON \
    -DGGML_AVX2=ON \
    -DGGML_FMA=ON \
    -DGGML_F16C=ON \
    -DGGML_BMI2=ON)
  ```

- Build with `mkdir build; cmake -S . -B build $FLAGS; cmake --build build --config Release -- -j 16` (the thread count is CPU dependent)
- Rerun the test with `LD_LIBRARY_PATH="$PWD/build/bin" ./build/bin/llama-bench --model $MODEL -p 0 -n 128 -ngl 0` and observe a 3-4x speedup.
I tested this on a system with a Ryzen 7 5800X. I identified these specific extra compile flags by comparing the `CMakeCache.txt` of a manual build following the build page (which is fast) against the one generated by the build procedure inside the nix develop shell.
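For reference, that comparison boils down to grepping the `GGML_*` options out of each build's `CMakeCache.txt` and diffing the results. The sketch below is self-contained, using fabricated cache excerpts so it runs anywhere; in practice, point the greps at the `CMakeCache.txt` files of the two real build trees (the paths and flag values here are illustrative only):

```shell
# Fabricated stand-ins for the two build trees' CMake caches.
mkdir -p build-manual build-nix
cat > build-manual/CMakeCache.txt <<'EOF'
GGML_AVX:BOOL=ON
GGML_AVX2:BOOL=ON
GGML_FMA:BOOL=ON
EOF
cat > build-nix/CMakeCache.txt <<'EOF'
GGML_AVX:BOOL=OFF
GGML_AVX2:BOOL=OFF
GGML_FMA:BOOL=OFF
EOF

# Pull out the GGML instruction-set switches from each cache and diff them;
# lines that differ show optimizations one build enabled and the other did not.
grep -E '^GGML_' build-manual/CMakeCache.txt | sort > manual-flags.txt
grep -E '^GGML_' build-nix/CMakeCache.txt | sort > nix-flags.txt
diff manual-flags.txt nix-flags.txt || true
```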
I don't think simply adding these cmake flags to the flake build in `.devops/nix/package.nix` is a good way to fix this: as I understand it, that could cause the build or the resulting binary to fail on CPUs that lack these extensions, though I don't have enough experience with multi-platform development for a confident evaluation. At the same time, the current state means significantly lower performance on modern systems for really no good reason, and users don't even realize it unless they compare the nix flake build against a normal build, as I did. Perhaps CPU optimizations could be offered as optional variants of the existing packages, with the architecture-agnostic version remaining the default to ensure the normal build always works.
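One possible shape for such an opt-in variant, sketched against `.devops/nix/package.nix` (the `enableCpuOptimizations` parameter and the surrounding structure are hypothetical, not existing options in the flake; only the `GGML_*` cmake switches come from the comparison above):

```nix
# Hypothetical sketch, not the actual package.nix structure:
# a parameter that stays false by default, so the architecture-agnostic
# build remains the one produced by plain `nix build .`.
{ lib, enableCpuOptimizations ? false, ... }:
{
  cmakeFlags = lib.optionals enableCpuOptimizations [
    (lib.cmakeBool "GGML_SSE42" true)
    (lib.cmakeBool "GGML_AVX" true)
    (lib.cmakeBool "GGML_AVX2" true)
    (lib.cmakeBool "GGML_FMA" true)
    (lib.cmakeBool "GGML_F16C" true)
    (lib.cmakeBool "GGML_BMI2" true)
  ];
}
```

An optimized package could then be exposed as an override of the default one, leaving the portable binary as the default output.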
Disclaimer: I used generative AI to help narrow down the specific issue and find the relevant difference in my builds. This report was written and tested by me.
### First Bad Commit
I am reasonably certain that the change in #11317 is responsible for CPU optimizations being disabled, but did not test this further.
### Relevant log output
Benchmark with CPU optimization
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 13B IQ4_NL - 4.5 bpw | 6.60 GiB | 12.25 B | BLAS | 8 | tg128 | 5.98 ± 0.01 |
Benchmark without CPU optimization
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 13B IQ4_NL - 4.5 bpw | 6.60 GiB | 12.25 B | BLAS | 8 | tg128 | 1.60 ± 0.00 |