Converting GGML back to Torch checkpoint for HuggingFace/Pytorch consumption/training/finetuning #403
Conversation
Wow, fantastic. Thank you for this contribution.
I converted a 30B 4-bit GGML model (https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main) back to PyTorch (HF), but the resulting file was 65 GB instead of about 20 GB. Is it possible for a 4-bit GGML model to be converted directly to a 4-bit PyTorch model? I'm attempting to quantize the 65 GB model back to 4-bit, but I'm concerned that quantizing it a second time will further degrade it.
I don't think PyTorch or HF have the ability to run using 4-bit (float). So this file is mainly for you to get the float16 weights back so that you can use them with other PyTorch libraries, or for training/finetuning with PyTorch Lightning or HF's transformers.
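For context, here is a minimal sketch of what that looks like once the conversion has produced a standard HF LLaMA checkpoint directory; the path `./llama-7b-hf` is a hypothetical example, not something this PR creates by that name:

```python
# Sketch only: load the recovered float16 weights with HF's transformers,
# assuming the converted checkpoint follows the standard HF LLaMA layout.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "./llama-7b-hf"  # hypothetical output directory of the conversion

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)

# From here the model can be finetuned with plain PyTorch, PyTorch Lightning,
# or the transformers Trainer, like any other float16 checkpoint.
```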
Ah, you're right. There are 4-bit models in HF format, I think (https://huggingface.co/decapoda-research/llama-13b-hf-int4), and you can run them using text-generation-webui (https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model), but it does require additional setup to work. I assumed that the conversion from 4-bit GGML to 4-bit HF (GPTQ) format would be similarly straightforward, but I don't know much about this and could easily be wrong.
Well, I suppose they quantize the weights to 4-bit and then save them as 4-bit, which you could do with a bit of modification to my code. However, at inference they need "special support code" to dequantize back to 16 or 32 bits. Basically, it saves a bit of disk space but it won't save memory. (And I could be wrong on this.)
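To illustrate the point being made here, a naive per-tensor 4-bit quantize/dequantize looks roughly like the sketch below. This is not the PR's code and not how GPTQ/ggml actually quantize (those are block-wise and more sophisticated); it only shows why 4-bit storage needs code that expands values back to float before the matmuls:

```python
# Illustrative sketch: naive symmetric 4-bit quantization of one tensor.
import torch

def quantize_4bit(w: torch.Tensor):
    # Map values to signed integers in [-7, 7] with a single per-tensor scale.
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -7, 7).to(torch.int8)
    return q, scale  # q could be packed two-per-byte on disk to save space

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor):
    # At inference the integers are expanded back to float before use.
    return q.to(torch.float16) * scale

w = torch.randn(4096, 4096, dtype=torch.float16)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())
```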
The .pth files and ggml-f16 both contain the full information, so you can convert pth <-> ggml-f16 losslessly; they hold the same data.
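A tiny sketch of why that round trip is lossless: the same 16-bit values are just laid out in a different container, so writing them out and reading them back reproduces the tensor bit-for-bit. (Illustration only, not the actual ggml file format.)

```python
# Sketch: raw f16 bytes round-trip exactly.
import numpy as np
import torch

w = torch.randn(1024, 1024).half()
raw = w.numpy().tobytes()  # serialize as raw f16 bytes
w_back = torch.from_numpy(
    np.frombuffer(raw, dtype=np.float16).reshape(w.shape).copy()
)
assert torch.equal(w, w_back)  # bit-exact round trip
```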
@anzz1 Thank you for your comment. However, what if you want to study the effect of finetuning on quantized models? Or simply want to look at the distribution of weights of a particular layer before/after quantization? I agree that from the point of view of a normal user it's not useful, but for researchers or people who want to understand the effect of different quantization methods, I believe this can be very helpful.
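As an example of that kind of inspection, once the weights are back in a Torch state dict, comparing a single layer before and after quantization is only a few lines. The checkpoint path and layer name below are hypothetical examples, and the quantizer is the same naive sketch from above:

```python
# Sketch: compare one layer's weight statistics before/after 4-bit round-trip.
import torch

state = torch.load("consolidated.00.pth", map_location="cpu")  # hypothetical path
w = state["layers.0.attention.wq.weight"].float()              # hypothetical layer name

scale = w.abs().max() / 7.0
w_q = torch.clamp(torch.round(w / scale), -7, 7) * scale

for name, t in [("original", w), ("quantized", w_q)]:
    print(f"{name:>9}: mean={t.mean().item():+.5f} std={t.std().item():.5f} "
          f"min={t.min().item():+.4f} max={t.max().item():+.4f}")
print("mean abs diff:", (w - w_q).abs().mean().item())
```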
Sure thing, I definitely agree. The comment was just for information. And the ability to losslessly go between f32/f16 <-> PyTorch bin is definitely a good idea, so you don't have to store both. These models do take up quite a bit of space when you start to collect more of them 😄
@anzz1 @ggerganov Any idea how I can get this PR reviewed/accepted? I am willing to put in more work to make it run correctly and smoothly.
@ductai199x
@ggerganov any reason why this was removed from main?
I think it's because some time ago there were lots and lots of breaking changes to the implementation that the old code couldn't keep up with. Maybe once everything stabilizes a bit we should add this capability back?
I see. This feature is extremely useful nowadays, but right now I don't have enough room to contribute a fix.
Here's a PR to convert a model written in the GGML format back to a Torch checkpoint for HuggingFace/PyTorch consumption/training/finetuning. Mentioned in issue #359.
Also included is the ability to use HF's transformers to load the Torch model and open up a chat (`-c`). The model's generation will be stopped on a newline character, so beware of what you are asking 😄.
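For anyone curious what that chat mode amounts to conceptually, here is a rough sketch, not the PR's actual implementation: load the converted checkpoint with transformers and stop generation at the first newline token. The model directory is a hypothetical example, and the newline-token lookup is a simplification.

```python
# Sketch: a minimal chat loop over the converted checkpoint, stopping on newline.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_dir = "./llama-7b-hf"  # hypothetical converted checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tok = LlamaTokenizer.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Simplified: take the last token produced when encoding a newline.
newline_id = tok.encode("\n", add_special_tokens=False)[-1]

while True:
    prompt = input("you: ")
    ids = tok(prompt, return_tensors="pt").input_ids.to(device)
    out = model.generate(
        ids,
        max_new_tokens=128,
        eos_token_id=newline_id,  # stop at the first newline token
        do_sample=True,
        temperature=0.8,
    )
    print("model:", tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
```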