
[fixed]The last code build with memory fix running result is not good in my pc. #462


Closed
FNsi opened this issue Mar 24, 2023 · 10 comments
Labels: bug (Something isn't working), performance (Speed related topics)

Comments

@FNsi
Contributor

FNsi commented Mar 24, 2023

It is noticeably slower with the Q_1 30B model, and the memory usage becomes garbage...
(Linux 5.19 x64, Ubuntu base)

@FNsi
Contributor Author

FNsi commented Mar 24, 2023

Seems the -n flag makes a difference.
n=100000 no longer works 🤔.
But after changing n to 4096, it still doesn't work...

@FNsi
Contributor Author

FNsi commented Mar 24, 2023

Seems the -n flag makes a difference.
n=100000 no longer works 🤔.
After changing n to 4096, it still doesn't work.

Rolling back to my backup...
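For context, the flag in question is presumably `-n` (number of tokens to predict) of llama.cpp's main example. A hedged sketch of the two invocations being compared; the binary name and model path here are assumptions, not taken from the thread:

```shell
# Hypothetical paths and prompt; -n sets how many tokens to predict.
./main -m ./models/30B/ggml-model-q4_1.bin -p "Hello" -n 100000   # reported to no longer work
./main -m ./models/30B/ggml-model-q4_1.bin -p "Hello" -n 4096     # reported still failing
```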

@Green-Sky
Collaborator

Some of the last commits changed/touched how memory is handled.
Also, there is -c, which you can set up to 2048.
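A hedged sketch of this suggestion: `-c` sets the context size in tokens. The binary name and model path below are assumptions for illustration:

```shell
# Hypothetical invocation; -c is the context size in tokens.
./main -m ./models/30B/ggml-model.bin -c 2048 -p "Hello"
```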

@x02Sylvie

Kinda similar case here, although I'm unsure which specific commit caused the performance loss.

After swapping out the old exe for the new one, I went from 207 ms per token to 269 on 13B Alpaca.
I suspect the impact might be more noticeable on 30B and 65B models in this case.

@FNsi
Contributor Author

FNsi commented Mar 24, 2023

Some of the last commits changed/touched how memory is handled.

Also, there is -c, which you can set up to 2048.

-c always works fine, even with 5000+ on my PC 😅, so I guess the problem is somewhere else.

@FNsi
Contributor Author

FNsi commented Mar 24, 2023

Kinda similar case here, although I'm unsure which specific commit caused the performance loss.

After swapping out the old exe for the new one, I went from 207 ms per token to 269 on 13B Alpaca.

I suspect the impact might be more noticeable on 30B and 65B models in this case.

Yes. With 30B I was able to chat for more than half an hour; now, after fewer than 20 tokens, ggml says there is not enough memory.

@gjmulder added the bug (Something isn't working) and performance (Speed related topics) labels on Mar 24, 2023
@Green-Sky
Collaborator

@FNsi please try again with latest master.

@FNsi
Contributor Author

FNsi commented Mar 25, 2023

@FNsi please try again with latest master.

Disabling BLAS makes it work, though there is still a little performance loss anyway.

A guess from me: is it because BLAS tries to convert the 4-bit prompt back to 16-bit at run time...? 😅😂
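"Disabling BLAS" here presumably means rebuilding without the OpenBLAS option. A sketch, assuming the llama.cpp Makefile flag of the time was LLAMA_OPENBLAS=1:

```shell
# Assumed build flags; a plain `make` builds without BLAS.
make clean && make                      # no-BLAS build (the one reported to work)
# make clean && make LLAMA_OPENBLAS=1   # OpenBLAS build (the one reported slower)
```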

@ggerganov
Member

ggerganov commented Mar 25, 2023

It's not the 4-bits - it does not work with F16 either.
I am almost sure that this F16 BLAS call is somehow wrong (as well as the rest of them):

https://github.com/ggerganov/llama.cpp/blob/8520fc310eab87f2c4612f2a00d4adbd44a20d0d/ggml.c#L6244-L6250

Which is super strange, since this has been used in whisper.cpp forever and it seems to work...

@FNsi
Contributor Author

FNsi commented Mar 25, 2023

It's not the 4-bits - it does not work with F16 either.

I am almost sure that this F16 BLAS call is somehow wrong:

https://github.com/ggerganov/llama.cpp/blob/8520fc310eab87f2c4612f2a00d4adbd44a20d0d/ggml.c#L6244-L6250

Which is super strange, since this has been used in whisper.cpp forever and it seems to work...

So that seems to mean a big performance improvement is coming once this is figured out.

@FNsi FNsi changed the title The last code build with memory fix running result is not good in my pc. [fixed]The last code build with memory fix running result is not good in my pc. Mar 26, 2023
@FNsi FNsi closed this as completed Mar 27, 2023
AAbushady pushed a commit to AAbushady/llama.cpp that referenced this issue Jan 27, 2024:
Popen() needs to be used with 'with', or have .wait() called, or be destroyed; otherwise there is a zombie child that sticks around until the object is GC'd.
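The commit message describes standard Python subprocess behavior; a minimal sketch of the recommended pattern, where the context manager waits on the child so no zombie process lingers until garbage collection:

```python
import subprocess

# Context-manager form: __exit__ waits on the child process,
# so it is reaped immediately instead of lingering as a zombie
# until the Popen object is garbage-collected.
with subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE) as proc:
    out, _ = proc.communicate()

print(out.decode().strip())
```

Calling `proc.wait()` explicitly achieves the same reaping when a `with` block is inconvenient.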
5 participants