Add oneliner for batch quantization #17

jooray · 2023-03-11T17:45:06Z

No description provided.

prusnak · 2023-03-11T18:05:01Z

README.md

+which can be scripted like this if you are lazy (for 65B model):
+
+```bash
+for i in models/65B/ggml-model-f16.bin*;do quantized=`echo "$i" | sed -e 's/f16/q4_0/'`; ./quantize "$i" "$quantized" 2 ;done


sed is not necessary; bash, zsh, and other modern shells can perform pattern replacement on a variable directly:

Suggested change

for i in models/65B/ggml-model-f16.bin*;do quantized=`echo "$i" | sed -e 's/f16/q4_0/'`; ./quantize "$i" "$quantized" 2 ;done

for i in models/65B/ggml-model-f16.bin* ; do ./quantize "$i" "${i/f16/q4_0}" 2 ;done

s-and-witch · 2023-03-12

This will generate paths such as 'models/65B/ggml-model-q4_0/.bin.2' and fail with errors. The right command (in bash) should be: for i in models/65B/ggml-model-f16.bin* ; do ./quantize "$i" "${i/f16/q4_0}" 2 ;done

prusnak · 2023-03-12

@Player-205 Right, I updated the suggestion above. Thanks.
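The expansion can be checked on its own, without running `quantize`. The sketch below uses a hypothetical sample path. It also shows one form that yields the faulty path quoted above, a stray trailing slash inside the braces. That this was the original mistake is an assumption, since the earlier version of the suggestion is not shown in the thread.

```bash
#!/usr/bin/env bash
# Hypothetical sample path, used only to illustrate the expansion.
i="models/65B/ggml-model-f16.bin.2"

# ${var/pattern/replacement} replaces the first match of pattern in $var.
correct="${i/f16/q4_0}"

# With a stray trailing slash, the slash becomes part of the replacement text.
broken="${i/f16/q4_0/}"

echo "$correct"   # models/65B/ggml-model-q4_0.bin.2
echo "$broken"    # models/65B/ggml-model-q4_0/.bin.2
```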

ggerganov · 2023-03-12T20:17:57Z

Let's put this in a quantize.sh script that accepts an argument like 7B, 13B, etc., and update the instructions to just run the script:

source quantize.sh 7B

That should be much easier to follow.

leszekhanusz · 2023-03-12T22:29:56Z

Note that if disk space is limited, it is still useful to quantize each file separately so that each intermediate file can be deleted in between.
In my case I added an rm command because I did not have enough disk space otherwise:

for i in models/65B/ggml-model-f16.bin* ; do ./quantize "$i" "${i/f16/q4_0}" 2 ; rm "$i"; done
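One caveat with this loop is that `rm "$i"` runs even when `./quantize` fails, which would delete the f16 file without leaving a usable quantized one. A minimal sketch of a more defensive variant chains the deletion with `&&`. It assumes `./quantize` exits with a non-zero status on failure, which is not confirmed in this thread. The `QUANTIZE` variable and the `quantize_and_remove` function are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Path to the quantize binary; overridable so the loop can be dry-run
# with a stand-in command. (QUANTIZE is a hypothetical name.)
QUANTIZE="${QUANTIZE:-./quantize}"

# Quantize one f16 part and delete it only if quantization succeeded.
quantize_and_remove() {
    local src="$1"
    local dst="${src/f16/q4_0}"
    # Assumption: the binary returns a non-zero exit status on failure,
    # so the rm after && is skipped when quantization fails.
    "$QUANTIZE" "$src" "$dst" 2 && rm -- "$src"
}

# Usage, mirroring the loop above:
# for i in models/65B/ggml-model-f16.bin*; do quantize_and_remove "$i"; done
```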

ggerganov · 2023-03-12T22:34:34Z

Good point. The script should have a second parameter for "keep f16", which is on by default.
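A minimal sketch of how such a script might look, taking the model size as the first argument and a keep-f16 flag as the second, on by default. This is not the script that was merged in #92. The `quantize_model` function, the `QUANTIZE` and `MODELS_DIR` variables, and the `1`/`0` flag convention are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical, overridable settings (not taken from the repository).
QUANTIZE="${QUANTIZE:-./quantize}"
MODELS_DIR="${MODELS_DIR:-models}"

quantize_model() {
    local size="$1"            # model size directory, e.g. 7B, 13B, 65B
    local keep_f16="${2:-1}"   # 1 = keep the f16 files (default), 0 = delete them
    local i

    if [ -z "$size" ]; then
        echo "usage: quantize_model <7B|13B|...> [keep_f16: 1|0]" >&2
        return 1
    fi

    for i in "$MODELS_DIR/$size"/ggml-model-f16.bin*; do
        # Skip the unexpanded glob when no files match.
        [ -e "$i" ] || continue
        # Stop at the first failure so no f16 file is deleted afterwards.
        "$QUANTIZE" "$i" "${i/f16/q4_0}" 2 || return 1
        if [ "$keep_f16" = "0" ]; then
            rm -- "$i"
        fi
    done
}

# Example: quantize the 65B model and delete each f16 part once it is done.
# quantize_model 65B 0
```

Wrapping the logic in a function and using `return` rather than `exit` matters if the file is sourced, as in the `source quantize.sh 7B` suggestion, because `exit` would close the calling shell.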

prusnak · 2023-03-13T12:50:43Z

Superseded by #92

improve docs and example

Add oneliner for batch quantization

faad7f1

prusnak requested changes Mar 11, 2023

prusnak mentioned this pull request Mar 13, 2023

Add quantize script for batch quantization #92

Merged

ggerganov closed this Mar 13, 2023

SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this pull request Jun 11, 2023

Merge pull request ggml-org#17 from SlyEcho/server_refactor

310bf61

improve docs and example

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
