65B quantized for CPU #251

Closed
satvikpendem opened this issue Mar 18, 2023 · 3 comments

Comments

satvikpendem commented Mar 18, 2023

Is there any way to run the 65B model on the CPU, quantized to 4-bit? I saw that it uses about 40 GB of RAM when quantized.

How much RAM is required to quantize the 65B model? I'm not sure I have enough to do it myself. Does anyone have the quantized 65B model files for CPU? I've only found the quantized GPU files so far.

jarcen commented Mar 18, 2023

Big models are split into chunks of 12 to 15 GB. Both the pth->ggml converter and the quantizer work chunk by chunk, so at most one chunk is loaded in memory at a time; the outputs at each stage are also chunked. If you can convert the 13B model, I see nothing stopping you from doing the same with the 65B.
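
For reference, a rough sketch of that chunked workflow for the 65B model, assuming the convert-pth-to-ggml.py script and quantize binary described in the README (file names, part count, and the trailing quantization-type argument are illustrative and may differ in your checkout):

```sh
# Convert the 65B .pth checkpoints to ggml FP16.
# The converter emits one ggml file per checkpoint part, processing a single
# ~12-15 GB chunk at a time:
python3 convert-pth-to-ggml.py models/65B/ 1

# Quantize each part to 4-bit (q4_0); again only one chunk is resident in RAM:
./quantize ./models/65B/ggml-model-f16.bin   ./models/65B/ggml-model-q4_0.bin   2
./quantize ./models/65B/ggml-model-f16.bin.1 ./models/65B/ggml-model-q4_0.bin.1 2
# ...repeat for the remaining parts of the multi-part 65B model

# Run the quantized model on CPU:
./main -m ./models/65B/ggml-model-q4_0.bin -n 128 -p "Hello"
```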

G2G2G2G commented Mar 18, 2023

Yeah, it works the same as every other model, and the same steps are in the README...?

satvikpendem (Author) commented

Alright, just tried it; it takes about 40 GB of RAM to run. Thanks!
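
(That roughly matches a back-of-envelope estimate: q4_0 stores 4-bit weights plus a small per-block scale, so effectively about 5 bits per weight, and ~65B weights × 5/8 bytes ≈ 40 GB before any context overhead.)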
