Is there any way to run the 65B model on the CPU quantized to 4 bits? I saw that it needs about 40 GB of RAM when quantized.
How much RAM is required to quantize the 65B model? I'm not sure I have enough RAM to do the quantization myself. Does anyone have the quantized 65B model files for CPU? So far I've only found the quantized GPU files.
Big models are split into chunks of 12 to 15 GB. Both the pth->ggml converter and the quantizer work chunk by chunk, so at most one chunk is loaded in memory at a time, and the outputs at each stage are chunked as well. If you can convert the 13B model, I see nothing stopping you from doing the same with the 65B one.
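To illustrate why RAM stays bounded, here is a minimal Python sketch of the chunk-by-chunk idea, assuming the shard layout of the original LLaMA release (`consolidated.00.pth`, `consolidated.01.pth`, ...). This is not the repo's actual converter or quantizer; `q4_symmetric` is a toy stand-in for the real 4-bit packing, and the output format here is just a flat byte stream.

```python
# Illustrative sketch only -- not the actual convert/quantize code from this repo.
# Shows why per-chunk processing keeps RAM bounded: only one consolidated.*.pth
# shard is resident at a time. Assumes PyTorch and NumPy are installed.
import gc
import glob

import numpy as np
import torch


def q4_symmetric(t: torch.Tensor) -> np.ndarray:
    """Toy symmetric 4-bit quantization of one tensor (values clipped to -7..7)."""
    x = t.to(torch.float32).numpy()
    scale = max(float(np.abs(x).max()) / 7.0, 1e-8)
    return np.clip(np.round(x / scale), -7, 7).astype(np.int8)


def quantize_in_chunks(model_dir: str, out_path: str) -> None:
    with open(out_path, "wb") as out:
        for shard_path in sorted(glob.glob(f"{model_dir}/consolidated.*.pth")):
            shard = torch.load(shard_path, map_location="cpu")  # load exactly one ~12-15 GB chunk
            for name, tensor in shard.items():
                q4_symmetric(tensor).tofile(out)                # quantize and stream to disk
            del shard
            gc.collect()                                        # free the chunk before loading the next one
```

So peak memory is set by the largest single shard plus some working space, not by the full 65B model; that is why quantizing 65B is feasible on a machine that can already handle 13B.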