Converting GGML back to Torch checkpoint for HuggingFace/Pytorch consumption/training/finetuning #403
Conversation
Wow, fantastic. Thank you for this contribution.
I converted a 30B 4-bit GGML model (https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main) back to PyTorch (HF), but the resulting file was 65 GB instead of about 20 GB. Is it possible for a 4-bit GGML model to be converted directly to a 4-bit PyTorch model? I'm attempting to quantize the 65 GB model back to 4-bit, but I'm concerned that quantizing it a second time will further degrade it.
I don't think PyTorch or HF have the ability to run using 4-bit (float). So this file is mainly for you to get the float16 weights back so that you can use them with other PyTorch libraries, or for training/finetuning with PyTorch Lightning or HF's transformers.
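For context, here is a minimal sketch of what that looks like once the conversion has produced a standard HF LLaMA checkpoint directory; the path `./llama-7b-hf` is a hypothetical example, not something this PR creates by that name:

```python
# Sketch only: load the recovered float16 weights with HF's transformers,
# assuming the converted checkpoint follows the standard HF LLaMA layout.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "./llama-7b-hf"  # hypothetical output directory of the conversion

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)

# From here the model can be finetuned with plain PyTorch, PyTorch Lightning,
# or the transformers Trainer, like any other float16 checkpoint.
```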
Ah, you're right. There are 4-bit models in HF format, I think (https://huggingface.co/decapoda-research/llama-13b-hf-int4), and you can run them using text-generation-webui (https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model), but it does require additional setup to work. I assumed that the conversion from 4-bit GGML to 4-bit HF (GPTQ) format would be similarly straightforward, but I don't know much about this and could easily be wrong.
Well, I suppose they quantize the weights to 4-bit and then save them as 4-bit, which you could do with a bit of modification to my code. However, at inference they need "special support code" to dequantize back to 16 or 32 bits. Basically, it saves a bit of disk space but it won't save memory. (And I could be wrong on this.)
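To illustrate the point being made here, a naive per-tensor 4-bit quantize/dequantize looks roughly like the sketch below. This is not the PR's code and not how GPTQ/ggml actually quantize (those are block-wise and more sophisticated); it only shows why 4-bit storage needs code that expands values back to float before the matmuls:

```python
# Illustrative sketch: naive symmetric 4-bit quantization of one tensor.
import torch

def quantize_4bit(w: torch.Tensor):
    # Map values to signed integers in [-7, 7] with a single per-tensor scale.
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -7, 7).to(torch.int8)
    return q, scale  # q could be packed two-per-byte on disk to save space

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor):
    # At inference the integers are expanded back to float before use.
    return q.to(torch.float16) * scale

w = torch.randn(4096, 4096, dtype=torch.float16)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())
```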
The .pth files and ggml-f16 both contain the full information, so you can convert pth <-> ggml-f16 losslessly; they hold the same data.
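A tiny sketch of why that round trip is lossless: the same 16-bit values are just laid out in a different container, so writing them out and reading them back reproduces the tensor bit-for-bit. (Illustration only, not the actual ggml file format.)

```python
# Sketch: raw f16 bytes round-trip exactly.
import numpy as np
import torch

w = torch.randn(1024, 1024).half()
raw = w.numpy().tobytes()  # serialize as raw f16 bytes
w_back = torch.from_numpy(
    np.frombuffer(raw, dtype=np.float16).reshape(w.shape).copy()
)
assert torch.equal(w, w_back)  # bit-exact round trip
```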
@anzz1 Thank you for your comment. However, what if you want to study the effect of finetuning on quantized models? Or simply want to look at the distribution of weights of a particular layer before/after quantization? I agree that from the point of view of a normal user it's not useful, but for researchers or people who want to understand the effect of different quantization methods, I believe this can be very helpful.
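As an example of that kind of inspection, once the weights are back in a Torch state dict, comparing a single layer before and after quantization is only a few lines. The checkpoint path and layer name below are hypothetical examples, and the quantizer is the same naive sketch from above:

```python
# Sketch: compare one layer's weight statistics before/after 4-bit round-trip.
import torch

state = torch.load("consolidated.00.pth", map_location="cpu")  # hypothetical path
w = state["layers.0.attention.wq.weight"].float()              # hypothetical layer name

scale = w.abs().max() / 7.0
w_q = torch.clamp(torch.round(w / scale), -7, 7) * scale

for name, t in [("original", w), ("quantized", w_q)]:
    print(f"{name:>9}: mean={t.mean().item():+.5f} std={t.std().item():.5f} "
          f"min={t.min().item():+.4f} max={t.max().item():+.4f}")
print("mean abs diff:", (w - w_q).abs().mean().item())
```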
Sure thing, I definitely agree. The comment was just for information. And the ability to losslessly go between f32/f16 <-> PyTorch bin is definitely a good idea, so you don't have to store both. These models do take up quite a bit of space when you start to collect more of them 😄
@anzz1 @ggerganov Any idea how I can get this PR reviewed/accepted? I am willing to put in more work to make it run correctly and smoothly.
@ductai199x
@ggerganov any reason why this was removed from main?
I think it's because some time ago there were lots and lots of breaking changes to the implementation that the old code couldn't keep up with. Maybe once everything stabilizes a bit we should add this capability back?
I see. This feature is extremely useful nowadays, but right now I don't have enough room to contribute a fix.
Here's a PR to convert a model written in the GGML format back to a Torch checkpoint for HuggingFace/PyTorch consumption/training/finetuning. Mentioned in issue #359.
Also included is the ability to use HF's transformers to load the Torch model and open up a chat (`-c`). The model's generation will be stopped on a newline character, so beware of what you are asking 😄.
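For anyone curious what that chat mode amounts to conceptually, here is a rough sketch, not the PR's actual implementation: load the converted checkpoint with transformers and stop generation at the first newline token. The model directory is a hypothetical example, and the newline-token lookup is a simplification.

```python
# Sketch: a minimal chat loop over the converted checkpoint, stopping on newline.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "./llama-7b-hf"  # hypothetical converted checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tok = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Simplified: take the last token produced when encoding a newline.
newline_id = tok.encode("\n", add_special_tokens=False)[-1]

while True:
    prompt = input("you: ")
    ids = tok(prompt, return_tensors="pt").input_ids.to(device)
    out = model.generate(
        ids,
        max_new_tokens=128,
        eos_token_id=newline_id,  # stop at the first newline token
        do_sample=True,
        temperature=0.8,
    )
    print("model:", tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
```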