65B quantized for CPU #251

Closed
satvikpendem opened this issue Mar 18, 2023 · 3 comments

Comments

satvikpendem commented Mar 18, 2023

Is there any way to run the 65B model on the CPU, quantized to 4-bit? I saw that it uses about 40 GB of RAM when quantized.

How much RAM is required to quantize the 65B model? I'm not sure I have enough to do it myself. Does anyone have the quantized 65B model files for CPU? I've only found the quantized GPU files so far.

jarcen commented Mar 18, 2023

Big models are split into chunks of 12 to 15 GB. Both the pth->ggml converter and the quantizer work chunk by chunk, so at most one chunk is loaded in memory at a time; the outputs at each stage are also chunked. If you can convert the 13B model, I see nothing stopping you from doing the same with the 65B.
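
For reference, a rough sketch of that chunked workflow for the 65B model, assuming the convert-pth-to-ggml.py script and quantize binary described in the README (file names, part count, and the trailing quantization-type argument are illustrative and may differ in your checkout):

```sh
# Convert the 65B .pth checkpoints to ggml FP16.
# The converter emits one ggml file per checkpoint part, processing a single
# ~12-15 GB chunk at a time:
python3 convert-pth-to-ggml.py models/65B/ 1

# Quantize each part to 4-bit (q4_0); again only one chunk is resident in RAM:
./quantize ./models/65B/ggml-model-f16.bin   ./models/65B/ggml-model-q4_0.bin   2
./quantize ./models/65B/ggml-model-f16.bin.1 ./models/65B/ggml-model-q4_0.bin.1 2
# ...repeat for the remaining parts of the multi-part 65B model

# Run the quantized model on CPU:
./main -m ./models/65B/ggml-model-q4_0.bin -n 128 -p "Hello"
```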

G2G2G2G commented Mar 18, 2023

Yeah, it works the same as every other model, and the same steps are in the README...?

satvikpendem (Author) commented

Alright, just tried it; it takes about 40 GB of RAM to run. Thanks!
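
(That roughly matches a back-of-envelope estimate: q4_0 stores 4-bit weights plus a small per-block scale, so effectively about 5 bits per weight, and ~65B weights × 5/8 bytes ≈ 40 GB before any context overhead.)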
