
Add support for running bloom models #452


Closed

bil-ash opened this issue Mar 24, 2023 · 7 comments
Labels: enhancement (New feature or request), model (Model specific), stale

Comments


bil-ash commented Mar 24, 2023

Bloom models have a more permissive license than llama models and are also multilingual. While there is a project based on llama.cpp that can perform inference with bloom models, its development seems slow and might even stagnate after a few days. So I am requesting support for running bloom models in llama.cpp (most probably via a command-line switch).


v3ss0n commented Mar 24, 2023

Maybe move this to a discussion?

@gjmulder gjmulder added enhancement New feature or request model Model specific labels Mar 26, 2023

reedxiao commented Apr 4, 2023

Agreed. Bloom also has much better support and performance for other languages, while llama is mostly English-focused.


acul3 commented Apr 9, 2023

+1 to this

I'm already fine-tuning bloom for instruction tasks and the results are quite good.

bloomz.cpp doesn't seem to have the capability to run inference for instruction tasks (defining a prompt).


akumaburn commented Apr 9, 2023

FYI, I've created a conversion script that successfully turns bloom's tokenizer.json files into tokenizer.model files here: #867
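
In outline, the conversion can be done with the ModelProto protobuf that ships in the sentencepiece pip package. The sketch below is not the script from #867; it assumes a Hugging Face fast-tokenizer tokenizer.json whose "model" section holds a BPE vocab, and the piece scores are synthesized, since BPE stores ranks rather than scores:

```python
# Minimal sketch only -- not the script in #867. Assumes tokenizer.json is a
# Hugging Face fast-tokenizer file whose "model" section holds a BPE vocab.
import json
from sentencepiece import sentencepiece_model_pb2 as sp_model

def tokenizer_json_to_model(json_path, model_path):
    with open(json_path, "r", encoding="utf-8") as f:
        tok = json.load(f)

    proto = sp_model.ModelProto()
    proto.trainer_spec.model_type = sp_model.TrainerSpec.BPE

    # BPE vocabs store ranks, not scores, so a score is synthesized here
    # (lower rank -> higher score) just to satisfy the proto layout.
    for token, rank in sorted(tok["model"]["vocab"].items(), key=lambda kv: kv[1]):
        piece = proto.pieces.add()
        piece.piece = token
        piece.score = -float(rank)

    with open(model_path, "wb") as f:
        f.write(proto.SerializeToString())
```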

I'm also working on a chunked PyTorch-to-GGML conversion script for bloomz-176b, so I think it's a good idea to add bloom support as well. It should allow very large models to be converted with much less memory.

@akumaburn

Chunked conversion script is here: 74b92ff

This loads the model layer by layer instead of all at once and does the conversion layer by layer as well. It has significantly lower memory requirements.
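
Roughly, the idea looks like the sketch below; this is a paraphrase, not the code in 74b92ff, and write_tensor is a toy stand-in for the real GGML serialization (a real converter also writes the magic, hyperparameters, and vocab first):

```python
# Paraphrased sketch of chunked conversion, not the script in 74b92ff:
# stream one checkpoint shard at a time instead of loading the full
# state dict, so peak memory is bounded by the largest shard.
import glob
import struct

import torch

def write_tensor(out, name, tensor):
    # Toy serialization (the real GGML layout also stores an ftype, etc.):
    # name length, n_dims, name bytes, dims, then raw fp16 data.
    data = tensor.to(torch.float16).numpy()
    out.write(struct.pack("<ii", len(name.encode("utf-8")), data.ndim))
    out.write(name.encode("utf-8"))
    for dim in reversed(data.shape):
        out.write(struct.pack("<i", dim))
    out.write(data.tobytes())

def convert_chunked(ckpt_dir, out_path):
    with open(out_path, "wb") as out:
        for shard in sorted(glob.glob(f"{ckpt_dir}/pytorch_model-*.bin")):
            part = torch.load(shard, map_location="cpu")  # one shard only
            for name, tensor in part.items():
                write_tensor(out, name, tensor)
            del part  # free this shard before loading the next one
```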

See also:
#867

cc @reedxiao @rabidcopy


phamkhactu commented Aug 9, 2023

Hi @akumaburn, @bil-ash

I've followed your tutorial to convert bloom to ggml, but I want to use the bloom model with llama.cpp. When I try to quantize it to q4, I get:

```
llama_model_quantize: failed to quantize: unexpectedly reached end of file
main: failed to quantize model from /7b/ggml-model--f16.bin
```

I think this comes from the different model architecture. How can I use bloom with llama.cpp? Thank you
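
The error is consistent with an architecture mismatch: llama.cpp's quantizer parses the file header assuming the llama hyperparameter layout, so a bloom-architecture GGML file makes it mis-compute tensor sizes and run off the end of the file. A throwaway dump along these lines can make the mismatch visible; this is a sketch assuming the legacy unversioned GGML layout (a uint32 magic, 0x67676d6c for "ggml", followed by int32 hyperparameters), and the field count shown is a guess:

```python
# Throwaway header dump (a sketch, not part of llama.cpp). Assumes the
# legacy unversioned GGML layout: uint32 magic, then int32 hyperparameters.
# The number and meaning of the hyperparameter fields differ between the
# llama and bloom converters, which is what trips up the quantizer.
import struct
import sys

with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
    print(f"magic = {magic:#x}")  # 0x67676d6c is 'ggml'
    for i in range(7):  # field count here is a guess; adjust as needed
        (val,) = struct.unpack("<i", f.read(4))
        print(f"hparam[{i}] = {val}")
```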

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023: "Update macOS Metal GPU step 4" (…-gpu-step-4)
@github-actions github-actions bot added the stale label Mar 25, 2024
github-actions bot commented:

This issue was closed because it has been inactive for 14 days since being marked as stale.
