Description
I was recently looking for ways to demonstrate some of the functionality of the llama.cpp examples, and some of the commands can become very cumbersome. For example, here is what I use for the llama.vim FIM server:
llama-server \
-m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
--log-file ./service-vim.log \
--host 0.0.0.0 --port 8012 \
--ctx-size 0 \
--cache-reuse 256 \
-ub 1024 -b 1024 -ngl 99 -fa -dt 0.1
It would be much cleaner if I could just run, for example:
llama-server --cfg-fim-7b
Or if I could turn this embedding server command into something simpler:
# llama-server \
# --hf-repo ggml-org/bert-base-uncased \
# --hf-file bert-base-uncased-Q8_0.gguf \
# --port 8033 -c 512 --embeddings --pooling mean
llama-server --cfg-embd-bert --port 8033
Implementation
There is already an initial example of how we can create such configuration presets:
llama-tts --tts-oute-default -p "This is a TTS preset"
# equivalent to
#
# llama-tts \
# --hf-repo OuteAI/OuteTTS-0.2-500M-GGUF \
# --hf-file OuteTTS-0.2-500M-Q8_0.gguf \
# --hf-repo-v ggml-org/WavTokenizer \
# --hf-file-v WavTokenizer-Large-75-F16.gguf -p "This is a TTS preset"
Details
https://github.com/ggerganov/llama.cpp/blob/5cd85b5e008de2ec398d6596e240187d627561e3/common/arg.cpp#L2208-L2220
This preset configures the model URLs so that the models are downloaded automatically from Hugging Face (HF) when the example runs, which simplifies the command significantly. A preset can also set default values for other parameters, such as context size, batch size and pooling type.
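To illustrate what a preset bundles, here is a minimal Bash sketch of a stopgap wrapper that expands a short name into the full argument list of the two llama-server commands from the Description. It is not how the built-in presets work (the linked ones live inside the argument parser in common/arg.cpp). The function names preset_args and llama_server_preset and the preset names fim-7b and embd-bert are hypothetical. Extra arguments are appended after the preset's, assuming llama.cpp applies options in the order given so that a later flag overrides an earlier one.

```bash
#!/usr/bin/env bash
# Hypothetical stopgap, not a llama.cpp feature: map a short preset name to
# the llama-server flags copied from the commands in the Description.

# Print the arguments for a preset, one per line; fail on an unknown name.
preset_args() {
    case "$1" in
        fim-7b)
            printf '%s\n' \
                -m ./models/qwen2.5-7b-coder/ggml-model-q8_0.gguf \
                --log-file ./service-vim.log \
                --host 0.0.0.0 --port 8012 \
                --ctx-size 0 \
                --cache-reuse 256 \
                -ub 1024 -b 1024 -ngl 99 -fa -dt 0.1
            ;;
        embd-bert)
            # The port is left out on purpose; pass it as an extra argument.
            printf '%s\n' \
                --hf-repo ggml-org/bert-base-uncased \
                --hf-file bert-base-uncased-Q8_0.gguf \
                -c 512 --embeddings --pooling mean
            ;;
        *)
            echo "unknown preset: $1" >&2
            return 1
            ;;
    esac
}

# Run llama-server with the preset's arguments followed by any extra ones.
llama_server_preset() {
    local name="$1"
    shift
    local out
    # Declared separately so a failing preset_args is not masked by `local`.
    out=$(preset_args "$name") || return 1
    local -a args=()
    local line
    # Read one argument per line into an array (works in Bash 3 and later).
    while IFS= read -r line; do
        args+=("$line")
    done <<< "$out"
    llama-server "${args[@]}" "$@"
}
```

With this sourced, llama_server_preset embd-bert --port 8033 would run the embedding server command shown earlier. A built-in preset avoids the wrapper entirely and can also document itself in --help.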
Goal
The goal of this issue is to create such presets for various common tasks:
The list of configuration presets would require curation and proper documentation.
I think this is a great task for new contributors who want to help and get involved in the project.