
Add support for running bloom models #452


Closed

bil-ash opened this issue Mar 24, 2023 · 7 comments
Labels: enhancement (New feature or request), model (Model specific), stale

Comments


bil-ash commented Mar 24, 2023

Bloom models have a more permissive license than llama models and are also multilingual. While there is a project based on llama.cpp that can perform inference with bloom models, its development seems slow and might even stagnate after a few days. So I am requesting support for running bloom models in llama.cpp (most probably via a command-line switch).


v3ss0n commented Mar 24, 2023

Maybe move this to a discussion?

@gjmulder gjmulder added enhancement New feature or request model Model specific labels Mar 26, 2023

reedxiao commented Apr 4, 2023

Agreed. Bloom also has much better support and performance for other languages, while llama is mostly English-focused.


acul3 commented Apr 9, 2023

+1 to this

I'm already fine-tuning bloom for instruction tasks and the results are quite good.

bloomz.cpp doesn't seem to have the capability to run inference for instruction tasks (defining a prompt).


akumaburn commented Apr 9, 2023

FYI, I've created a conversion script that successfully turns bloom's tokenizer.json files into tokenizer.model files here: #867
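
In outline, the conversion can be done with the ModelProto protobuf that ships in the sentencepiece pip package. The sketch below is not the script from #867; it assumes a Hugging Face fast-tokenizer tokenizer.json whose "model" section holds a BPE vocab, and the piece scores are synthesized, since BPE stores ranks rather than scores:

```python
# Minimal sketch only -- not the script in #867. Assumes tokenizer.json is a
# Hugging Face fast-tokenizer file whose "model" section holds a BPE vocab.
import json
from sentencepiece import sentencepiece_model_pb2 as sp_model

def tokenizer_json_to_model(json_path, model_path):
    with open(json_path, "r", encoding="utf-8") as f:
        tok = json.load(f)

    proto = sp_model.ModelProto()
    proto.trainer_spec.model_type = sp_model.TrainerSpec.BPE

    # BPE vocabs store ranks, not scores, so a score is synthesized here
    # (lower rank -> higher score) just to satisfy the proto layout.
    for token, rank in sorted(tok["model"]["vocab"].items(), key=lambda kv: kv[1]):
        piece = proto.pieces.add()
        piece.piece = token
        piece.score = -float(rank)

    with open(model_path, "wb") as f:
        f.write(proto.SerializeToString())
```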

I'm also working on a chunked PyTorch-to-GGML conversion script for bloomz-176b, so I think it's a good idea to add bloom support as well. It should allow very large models to be converted with much less memory.

@akumaburn

Chunked conversion script is here: 74b92ff

This loads the model layer by layer instead of all at once and does the conversion layer by layer as well. It has significantly lower memory requirements.
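
Roughly, the idea looks like the sketch below; this is a paraphrase, not the code in 74b92ff, and write_tensor is a toy stand-in for the real GGML serialization (a real converter also writes the magic, hyperparameters, and vocab first):

```python
# Paraphrased sketch of chunked conversion, not the script in 74b92ff:
# stream one checkpoint shard at a time instead of loading the full
# state dict, so peak memory is bounded by the largest shard.
import glob
import struct

import torch

def write_tensor(out, name, tensor):
    # Toy serialization (the real GGML layout also stores an ftype, etc.):
    # name length, n_dims, name bytes, dims, then raw fp16 data.
    data = tensor.to(torch.float16).numpy()
    out.write(struct.pack("<ii", len(name.encode("utf-8")), data.ndim))
    out.write(name.encode("utf-8"))
    for dim in reversed(data.shape):
        out.write(struct.pack("<i", dim))
    out.write(data.tobytes())

def convert_chunked(ckpt_dir, out_path):
    with open(out_path, "wb") as out:
        for shard in sorted(glob.glob(f"{ckpt_dir}/pytorch_model-*.bin")):
            part = torch.load(shard, map_location="cpu")  # one shard only
            for name, tensor in part.items():
                write_tensor(out, name, tensor)
            del part  # free this shard before loading the next one
```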

See also:
#867

cc @reedxiao @rabidcopy


phamkhactu commented Aug 9, 2023

Hi @akumaburn, @bil-ash

I've followed your tutorial to convert bloom to ggml, but I want to use the bloom model with llama.cpp. When I try to quantize it to q4, I get:

```
llama_model_quantize: failed to quantize: unexpectedly reached end of file
main: failed to quantize model from /7b/ggml-model--f16.bin
```

I think this comes from the different model architecture. How can I use bloom with llama.cpp? Thank you
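
The error is consistent with an architecture mismatch: llama.cpp's quantizer parses the file header assuming the llama hyperparameter layout, so a bloom-architecture GGML file makes it mis-compute tensor sizes and run off the end of the file. A throwaway dump along these lines can make the mismatch visible; this is a sketch assuming the legacy unversioned GGML layout (a uint32 magic, 0x67676d6c for "ggml", followed by int32 hyperparameters), and the field count shown is a guess:

```python
# Throwaway header dump (a sketch, not part of llama.cpp). Assumes the
# legacy unversioned GGML layout: uint32 magic, then int32 hyperparameters.
# The number and meaning of the hyperparameter fields differ between the
# llama and bloom converters, which is what trips up the quantizer.
import struct
import sys

with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
    print(f"magic = {magic:#x}")  # 0x67676d6c is 'ggml'
    for i in range(7):  # field count here is a guess; adjust as needed
        (val,) = struct.unpack("<i", f.read(4))
        print(f"hparam[{i}] = {val}")
```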

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023: "Update macOS Metal GPU step 4" (…-gpu-step-4)
@github-actions github-actions bot added the stale label Mar 25, 2024
github-actions bot commented:

This issue was closed because it has been inactive for 14 days since being marked as stale.
