
Added new memory efficient conversion script for hf to ggml format, tested on bloomz 176B + Added token conversion script to convert from tokenizer.json format to tokenizer.model #867


Closed
wants to merge 4 commits

Conversation

akumaburn

Converts tokenizer.json to tokenizer.model format, tested with a bigscience model (e.g. https://huggingface.co/bigscience/bloomz). Usage is like:

python3 tokenconvert.py ./ad033898-d849-41a1-9ecd-ad24e016bc4f/bloomz
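
For context, tokenizer.json keeps the BPE data under its "model" section; a minimal illustrative sketch (not the script's exact code) of pulling out the pieces a conversion needs:

# Minimal sketch (not the exact code in tokenconvert.py): read the BPE vocab
# and merge rules out of a HuggingFace tokenizer.json so they can be
# re-serialized into another tokenizer format.
import json

def load_bpe_parts(tokenizer_json_path):
    with open(tokenizer_json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    model_data = data["model"]      # the "model" section holds the BPE data
    vocab = model_data["vocab"]     # token -> id mapping
    merges = model_data["merges"]   # ordered list of merge rules
    return vocab, merges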

@akumaburn
Author

akumaburn commented Apr 10, 2023

I've added a helper script as well in 74b92ff

This script is more memory efficient than the existing convert-hf-to-ggml.py; I was able to use it to convert bloomz-176b to float16 GGML format with under 64 GB of RAM utilization.
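
The gist of the memory saving (a rough sketch under my own assumptions, not the actual script in this PR) is to load one checkpoint shard at a time, write its tensors out as float16, and free the shard before touching the next one, so peak RAM stays close to the size of a single shard:

# Rough sketch of the shard-by-shard approach (illustrative, not the PR's code):
# peak memory stays near the size of one pytorch_model-*.bin shard instead of
# the whole model, because each shard is written out and released in turn.
import glob
import struct

import torch

def convert_shards_to_f16(model_dir, fout):
    for shard_path in sorted(glob.glob(f"{model_dir}/pytorch_model-*.bin")):
        shard = torch.load(shard_path, map_location="cpu")
        for name, tensor in shard.items():
            data = tensor.to(torch.float16).numpy()
            encoded_name = name.encode("utf-8")
            # per-tensor header used here for illustration: n_dims, name length, dtype flag (1 = f16)
            fout.write(struct.pack("iii", data.ndim, len(encoded_name), 1))
            for dim in reversed(data.shape):
                fout.write(struct.pack("i", dim))
            fout.write(encoded_name)
            data.tofile(fout)
        del shard  # release the shard's tensors before loading the next one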

@akumaburn akumaburn changed the title Added token conversion script to convert from tokenizer.json format to tokenizer.model Added new memory efficient conversion script for hf to ggml format, tested on bloomz 176B + Added token conversion script to convert from tokenizer.json format to tokenizer.model Apr 10, 2023
@bil-ash

bil-ash commented Apr 11, 2023

What is the runtime memory usage of the converted bloomz-176b model?

@akumaburn
Author

What is the runtime memory usage of the converted bloomz-176b model?

I believe it's around 340 GB of RAM. You'd need to run it with bloomz.cpp (this particular fork: NouamaneTazi/bloomz.cpp#21), which doesn't have mmap at the moment.

I only have around ~96 GB of RAM, so I've not been able to run the model yet. I've also been working on a quantizer to quantize it to int4 / q4_0, which is going well, but I suspect that may still not be enough.
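
For context, q4_0-style quantization packs each block of 32 weights into one float scale plus 32 4-bit values; a toy sketch of that block idea (illustrative only, not my actual quantizer code):

# Toy sketch of q4_0-style block quantization (illustrative only): each block of
# 32 float weights becomes one float32 scale plus 32 signed 4-bit quants.
import numpy as np

def quantize_q4_0_like(weights: np.ndarray):
    assert weights.size % 32 == 0
    blocks = weights.reshape(-1, 32).astype(np.float32)
    scales = np.abs(blocks).max(axis=1) / 7.0   # largest magnitude maps to +/-7
    scales[scales == 0.0] = 1.0                 # avoid division by zero for all-zero blocks
    quants = np.clip(np.round(blocks / scales[:, None]), -8, 7).astype(np.int8)
    return scales, quants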

@TheBloke
Contributor

TheBloke commented Apr 12, 2023

I've added a helper script as well in 74b92ff

This script is more memory efficient than the existing convert-hf-to-ggml.py; I was able to use it to convert bloomz-176b to float16 GGML format with under 64 GB of RAM utilization.

Are you aware that @comex has already written a new conversion script for converting HF to GGML? It has been approved for merge but hasn't been merged yet. It can be seen here: #545

So you might want to compare your new conversion script to that one, rather than to the original script currently provided in llama.cpp.

Converts tokenizer.json to tokenizer.model format, tested with a bigscience model (e.g. https://huggingface.co/bigscience/bloomz). Usage is like:

Thanks very much for the tokenizer.json conversion script! I recently hoped to convert my GPTQ 4-bit version of GeorgiaTechResearch/Galpaca 30B (an OPT model) to GGML. My model repo is huggingface/galpaca-30B-GPTQ-4bit-128g. I couldn't use comex's convert.py due to the lack of a tokenizer.model.

I tried your script and it seemed to work to produce a tokenizer.model:

tomj@Eddie ~/src $ python tokenconvert.py huggingface/galpaca-30B-GPTQ-4bit-128g
/Users/tomj/src/tokenconvert.py:38: DeprecationWarning: Deprecated in 0.9.0: BPE.__init__ will not create from files anymore, try `BPE.from_file` instead
  tokenizer = Tokenizer(models.BPE(vocab_file.name, merges_file.name))
Saving.. tokenizer.model to huggingface/galpaca-30B-GPTQ-4bit-128g/tokenizer.model
Saved tokenizer.model to huggingface/galpaca-30B-GPTQ-4bit-128g/tokenizer.model

I have no idea if it's even possible to try and convert an OPT model to GGML, but I thought I'd give it a try anyway!

Unfortunately I still can't convert the model. comex's convert.py fails on the new tokenizer.model file:

tomj@Eddie ~/src $ python ./convert.py huggingface/galpaca-30B-GPTQ-4bit-128g/galpaca-30B-4bit-128g.no-act-order.pt --outfile huggingface/galpaca-30B-GPTQ-4bit-128g/galpaca-30B-4bit-128g.GGML.bin
Loading model file huggingface/galpaca-30B-GPTQ-4bit-128g/galpaca-30B-4bit-128g.no-act-order.pt
Loading vocab file huggingface/galpaca-30B-GPTQ-4bit-128g/tokenizer.model
Traceback (most recent call last):
  File "/Users/tomj/src/./convert.py", line 1053, in <module>
    main()
  File "/Users/tomj/src/./convert.py", line 1042, in main
    vocab = load_vocab(vocab_dir)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tomj/src/./convert.py", line 990, in load_vocab
    return SentencePieceVocab(path, added_tokens_path if added_tokens_path.exists() else None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tomj/src/./convert.py", line 125, in __init__
    self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 447, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)

I tried your new HF-to-GGML conversion script as well, but I can't get it working on any model.

Trying it with a locally downloaded HF model:

tomj@Eddie ~/src $ ll huggingface/koala-7B-HF
total 26323264
drwxr-xr-x  13 tomj  staff   416B  7 Apr 13:27 .
drwxr-xr-x  14 tomj  staff   448B 12 Apr 09:52 ..
drwxr-xr-x  13 tomj  staff   416B  7 Apr 16:07 .git
-rw-r--r--   1 tomj  staff   1.4K  7 Apr 13:20 .gitattributes
-rw-r--r--   1 tomj  staff   2.1K  7 Apr 13:20 README.md
-rw-r--r--   1 tomj  staff   507B  7 Apr 13:20 config.json
-rw-r--r--   1 tomj  staff   137B  7 Apr 13:20 generation_config.json
-rw-r--r--   1 tomj  staff   9.3G  7 Apr 13:27 pytorch_model-00001-of-00002.bin
-rw-r--r--   1 tomj  staff   3.3G  7 Apr 13:24 pytorch_model-00002-of-00002.bin
-rw-r--r--   1 tomj  staff    26K  7 Apr 13:20 pytorch_model.bin.index.json
-rw-r--r--   1 tomj  staff     2B  7 Apr 13:20 special_tokens_map.json
-rw-r--r--   1 tomj  staff   488K  7 Apr 13:21 tokenizer.model
-rw-r--r--   1 tomj  staff   141B  7 Apr 13:20 tokenizer_config.json

tomj@Eddie ~/src $ ~/anaconda3/envs/torch21/bin/python ./convert-hf-to-ggml-v2.py huggingface/koala-7B-HF ./koala-7B-test
Loading model:  huggingface/koala-7B-HF
Traceback (most recent call last):
  File "/Users/tomj/src/./convert-hf-to-ggml-v2.py", line 61, in <module>
    fout.write(struct.pack("i", hparams["n_head"]))
KeyError: 'n_head'

tomj@Eddie ~/src $ ~/anaconda3/envs/torch21/bin/python -V
Python 3.10.10

I get the same error if I try it with a remote model:

tomj@Eddie ~/src $ ~/anaconda3/envs/torch21/bin/python  convert-hf-to-ggml-v2.py --debug "sallywww/Llama-7B" ./koala-13B-test
Downloading (…)okenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 264/264 [00:00<00:00, 48.2kB/s]
Downloading tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 1.75MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 3.00/3.00 [00:00<00:00, 2.70kB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 578/578 [00:00<00:00, 478kB/s]
Loading model:  sallywww/Llama-7B
Traceback (most recent call last):
  File "/Users/tomj/src/convert-hf-to-ggml-v2.py", line 61, in <module>
    fout.write(struct.pack("i", hparams["n_head"]))
KeyError: 'n_head'

@akumaburn
Author

akumaburn commented Apr 13, 2023

@TheBloke Thanks for linking me to @comex's script. My script uses the HuggingFace tokenizers library (https://github.com/huggingface/tokenizers) to handle the tokenization, and assumes Byte-Pair Encoding by default; see https://huggingface.co/docs/transformers/tokenizer_summary

Comex's script seems to be using SentencePiece (https://arxiv.org/pdf/1808.06226.pdf), which is a different tokenizer.

I created it with the hope of using BLOOM models (which llama.cpp currently doesn't support, see #452), and it works for that purpose.

In reality, the conversion script would probably have to support all the common tokenizers in order to work for every model.
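
For anyone wondering which path a given model needs, tokenizer.json records its model type directly; a tiny hypothetical helper (not part of this PR) to check it:

# Hypothetical helper, not part of this PR: report which tokenizer model a
# checkpoint's tokenizer.json actually describes before picking a converter.
import json

def detect_tokenizer_type(tokenizer_json_path):
    with open(tokenizer_json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # the "model" section's "type" is e.g. "BPE", "WordPiece" or "Unigram"
    return data["model"]["type"]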

@akumaburn
Author

akumaburn commented Apr 13, 2023

I just updated the token conversion script to add support for "SentencePiece" and "WordPiece" tokenizers. Usage has been updated to:

python3 tokenconvert.py TokenizerType InDIR [OutDir]

where TokenizerType can be one of ["BPE", "WordPiece", "SentencePiece"]

eg:

python3 tokenconvert.py BPE ./ad033898-d849-41a1-9ecd-ad24e016bc4f/bloomz

@TheBloke I'm not sure if this will help with your quest to convert your OPT model into a GGML model, but I thought I'd tag you anyway.

EDIT: Actually, it looks like SentencePiece specifically isn't supported by HuggingFace's tokenizers library; I'm taking a look to see...
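
For reference, the dispatch roughly maps the requested type onto the tokenizers library's model classes; a sketch under my assumptions (the library exposes BPE, WordPiece and Unigram, but no literal SentencePiece class, which is the gap mentioned in the EDIT above):

# Sketch of the type dispatch (illustrative, not the script's exact code).
from tokenizers import models

def make_tokenizer_model(tokenizer_type, vocab, merges):
    if tokenizer_type == "BPE":
        return models.BPE(vocab=vocab, merges=merges)
    if tokenizer_type == "WordPiece":
        return models.WordPiece(vocab=vocab, unk_token="[UNK]")
    if tokenizer_type == "SentencePiece":
        # the tokenizers library has no SentencePiece model class; Unigram is
        # the nearest stand-in, hence the caveat in the EDIT above
        return models.Unigram()
    raise ValueError(f"Unsupported tokenizer type: {tokenizer_type}")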

@aidaho

aidaho commented Apr 18, 2023

I've tried this on Bloomz mt0-xl:

(v:llama.cpp) aidaho@lin:~/bin/llama.cpp$ python3 tokenconvert.py BPE /media/aidaho/blue/llm-files/mt0-xl/
Traceback (most recent call last):
  File "/home/aidaho/bin/llama.cpp/tokenconvert.py", line 87, in <module>
    tokenizer = load_tokenizer_from_json(input_json_path, special_tokens_map_path, tokenizer_config_path, tokenizer_type)
  File "/home/aidaho/bin/llama.cpp/tokenconvert.py", line 34, in load_tokenizer_from_json
    merges = model_data["merges"]
KeyError: 'merges'
Code: 1

Am I doing anything wrong?

@akumaburn
Author

@aidaho It looks like that model uses T5Tokenizer (https://huggingface.co/bigscience/mt0-xl/blob/main/tokenizer_config.json), which is not supported by this script; it only supports BPE and WordPiece at the moment.
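
The KeyError above comes from that tokenizer.json having no "merges" section, since Unigram/T5-style tokenizers don't use merge rules. A hypothetical guard (purely illustrative, not in the PR) that would at least fail with a clearer message:

# Hypothetical guard, not in the PR: give a clearer error when tokenizer.json
# doesn't describe a BPE model, instead of a bare KeyError on "merges".
def get_bpe_merges(model_data):
    if "merges" not in model_data:
        raise SystemExit(
            f"tokenizer.json model type is {model_data.get('type')!r}; "
            "it has no BPE merges, so this script cannot convert it"
        )
    return model_data["merges"]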
