
Converting GGML back to Torch checkpoint for HuggingFace/Pytorch consumption/training/finetuning #403


Merged (1 commit) Mar 28, 2023

Conversation

@ductai199x (Contributor) commented Mar 22, 2023

Here's a PR to convert a model written in the GGML format back to a Torch checkpoint for HuggingFace/PyTorch consumption/training/finetuning, as mentioned in issue #359.

Also included is the ability to use HF's transformers to load the Torch model and open up a chat (-c). The model's generation is stopped on a newline character, so beware of what you ask 😄.
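
For illustration, loading the converted checkpoint with HF's transformers could look roughly like the sketch below; the output directory name, model classes, and generation settings are placeholders, not the exact interface of the script in this PR:

```python
# Rough sketch: load a converted (float16) checkpoint with HF transformers.
# "converted-llama-hf" is an assumed output directory of the conversion script;
# the prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "converted-llama-hf"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)

prompt = "User: What is a GGML file?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```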

@PriNova commented Mar 22, 2023

Wow, fantastic. Thank you for this contribution.

@gjmulder added the enhancement (New feature or request) label Mar 22, 2023
@onusai commented Mar 23, 2023

I converted a 30B 4-bit GGML model (https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main) back to PyTorch (HF), but the resulting file was 65 GB instead of about 20 GB.

Is it possible for a 4-bit GGML model to be directly converted to a 4-bit PyTorch model? I'm attempting to quantize the 65 GB model back to 4-bit, but I'm concerned that quantizing it a second time will further degrade it.

@ductai199x (Contributor, Author)

> I converted a 30B 4-bit GGML model (https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main) back to PyTorch (HF), but the resulting file was 65 GB instead of about 20 GB.
>
> Is it possible for a 4-bit GGML model to be directly converted to a 4-bit PyTorch model? I'm attempting to quantize the 65 GB model back to 4-bit, but I'm concerned that quantizing it a second time will further degrade it.

I don't think PyTorch or HF have the ability to run using 4-bit (float). So this file is mainly for getting float16 weights back so that you can use them with other PyTorch libraries, or for training/finetuning with PyTorch Lightning or HF's transformers.
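
A rough sketch of consuming the recovered weights for finetuning with plain PyTorch; the checkpoint directory and the one-batch training step are assumptions for illustration, not part of this PR:

```python
# Rough sketch: the recovered weights are ordinary float tensors, so standard
# gradient-based training can consume them. "converted-llama-hf" is an assumed
# output directory; the single dummy batch only shows the mechanics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "converted-llama-hf"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

batch = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])  # causal LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```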

@onusai commented Mar 23, 2023

> I don't think PyTorch or HF have the ability to run using 4-bit (float).

Ah, you're right.

There are 4-bit models in HF format, I think (https://huggingface.co/decapoda-research/llama-13b-hf-int4), and you can run them using textgen-webui (https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model), but it does require additional setup to work. I assumed that the conversion from 4-bit GGML to 4-bit HF (GPTQ) format would be similarly straightforward, but I don't know much about this and could easily be wrong.

@ductai199x (Contributor, Author)

Well, I suppose they quantize the weights to 4-bit and then save them as 4-bit, which you can do easily with a bit of modification to my code. However, at inference they need "special support code" to dequantize back to 16 or 32 bits. Basically, it saves a bit of disk space but it won't save memory. (And I could be wrong about this.)
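
To illustrate the point, here is a toy block-wise 4-bit quantize/dequantize round trip loosely in the spirit of ggml's Q4_0. This is not the exact ggml or GPTQ code path, just a sketch of why disk usage shrinks while inference still runs on dequantized floats:

```python
# Toy block-wise 4-bit quantization, loosely modeled on ggml's Q4_0 layout
# (one float scale per block of 32 values, signed 4-bit integers). The real
# format packs two nibbles per byte; this sketch skips the packing.
import numpy as np

BLOCK = 32

def quantize_q4_like(w: np.ndarray):
    w = w.reshape(-1, BLOCK).astype(np.float32)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0   # map values into [-7, 7]
    scale[scale == 0] = 1.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

weights = np.random.randn(4096 * BLOCK).astype(np.float32)
q, scale = quantize_q4_like(weights)
restored = dequantize(q, scale)

# Storage drops to ~4 bits per weight (plus scales), but the matmuls at
# inference still happen on the dequantized float values, so memory at
# compute time is not reduced by the same factor.
print("mean abs reconstruction error:", float(np.abs(weights - restored).mean()))
```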

@anzz1 (Contributor) commented Mar 26, 2023

The .pth files and ggml-f16 both contain the full information.
If you have a quantized .pt or ggml-q4_0 / q4_1, the full information is already lost, so you can't transform it back to the unquantized version. I mean you can, but it's like saving a full-size BMP from a compressed JPG file and makes no sense.

So you can convert pth <-> ggml-f16; they contain the same information.
Same with 4-bit quantized .pt <-> ggml-q4; they are in essence the same thing, except the quantizing algorithm used can make a difference, of course.
And you can quantize pth -> pt or ggml-f16 -> ggml-q4,
but you cannot "un-quantize", so going from pt -> pth or ggml-q4 -> ggml-f16 is not something you should do even if you technically could.

@ductai199x (Contributor, Author)

@anzz1 Thank you for your comment. However, what if you want to study the effect of finetuning on quantized models? Or simply want to look at the distribution of weights of a particular layer before/after quantization? I agree that from the point of view of a normal user it's not useful, but for researchers or people who want to understand the effect of different quantization methods, I believe this can be very helpful.
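
As a hypothetical example of that kind of analysis, one could load an original float16 checkpoint and a quantize-then-dequantize round-trip of it (both converted back to PyTorch format) and compare a single layer; the file paths and the layer key below are placeholders:

```python
# Hypothetical before/after comparison of one layer's weight distribution.
# Both paths and the layer key are placeholders, assuming two checkpoints
# that were converted back to PyTorch .bin files.
import torch

orig = torch.load("llama-7b-f16/pytorch_model.bin", map_location="cpu")
quant = torch.load("llama-7b-q4-roundtrip/pytorch_model.bin", map_location="cpu")

key = "model.layers.0.self_attn.q_proj.weight"  # assumed layer name
a = orig[key].float().flatten()
b = quant[key].float().flatten()

print("mean/std before:", a.mean().item(), a.std().item())
print("mean/std after: ", b.mean().item(), b.std().item())
print("max abs diff:   ", (a - b).abs().max().item())
print("histogram before:", torch.histc(a, bins=10).tolist())
print("histogram after: ", torch.histc(b, bins=10).tolist())
```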

@anzz1 (Contributor) commented Mar 27, 2023

> @anzz1 Thank you for your comment. However, what if you want to study the effect of finetuning on quantized models? Or simply want to look at the distribution of weights of a particular layer before/after quantization? I agree that from the point of view of a normal user it's not useful, but for researchers or people who want to understand the effect of different quantization methods, I believe this can be very helpful.

Sure thing, I definitely agree. The comment was just for information. And the ability to losslessly go between f32/f16 GGML <-> PyTorch bin is definitely a good idea so you don't have to store both. These models do take up quite a bit of space when you start to collect more of them 😄

@ductai199x (Contributor, Author)

@anzz1 @ggerganov Any idea how I can get this PR reviewed/accepted? I am willing to put in more work to make it run correctly and smoothly.

@ggerganov (Member)

@ductai199x
I usually look at all PRs, but sometimes it can take a while.
I'll merge this; I've also invited you as a collaborator.

@ggerganov merged commit d0330fd into ggml-org:master Mar 28, 2023
@SlyEcho mentioned this pull request Apr 21, 2023
@webpolis
@ggerganov any reason why this was removed from main?

@ductai199x (Contributor, Author)

> @ggerganov any reason why this was removed from main?

I think it's because some time ago there were lots and lots of breaking changes to the implementation that the old code couldn't keep up with. Maybe once everything stabilizes a bit we should add this capability back?

@webpolis commented Aug 23, 2023 via email
