[fixed] The latest code build with the memory fix does not run well on my PC #462
Seems the n flag makes a difference.
Rolling back to my backup...
Some of the recent commits changed/touched how memory is handled.
Kinda similar case here, although I'm unsure which specific commit caused the performance loss. After swapping the old exe for the new one, I went from 207 ms per token to 269 ms on 13B Alpaca.
Always. -c works fine even with 5000+ on my PC 😅, so I guess the problem is somewhere else.
Yes. With the 30B model I was able to chat for more than half an hour; now, after fewer than 20 tokens, ggml reports that there is not enough memory.
@FNsi please try again with the latest master.
Disabling BLAS makes it work, though there is still a small performance loss. A guess from me: is it because BLAS tries to turn the 4-bit weights back into 16-bit at run time...? 😅😂
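To make that guess concrete (the next comment points out it is not actually the 4-bit format at fault): if a BLAS path only accepts float inputs, 4-bit weights would have to be expanded back to floating point before every matrix multiply. The sketch below is a rough, hypothetical illustration in Python/NumPy; the 32-weights-per-block layout loosely mirrors ggml's Q4_0 format but is an assumption here, not the project's actual code.

```python
import numpy as np

def dequantize_q4_block(scale: float, nibbles: np.ndarray) -> np.ndarray:
    # Each 4-bit value (0..15) is recentered to -8..7 and scaled back
    # to float32 -- the extra run-time work a float-only BLAS routine
    # would force for every block of weights.
    return (nibbles.astype(np.float32) - 8.0) * scale

# Hypothetical block: 32 quantized weights sharing one scale.
scale = 0.05
nibbles = np.random.randint(0, 16, size=32, dtype=np.uint8)
weights_f32 = dequantize_q4_block(scale, nibbles)

# Only the expanded floats can feed the matrix multiply (np.matmul as a
# stand-in for an sgemm call), so the dequantization is pure overhead
# compared with kernels that operate on the 4-bit data directly.
activations = np.random.rand(32).astype(np.float32)
out = weights_f32 @ activations
```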
It's not the 4 bits - it does not work with F16 either, which is super strange since this has been used in
So it seems to mean that a big performance improvement is coming once this is figured out.
Popen() needs to be used with 'with' or have .wait() called or be destroyed, otherwise there is a zombie child that sticks around until the object is GC'd.
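For anyone driving the binary from Python, here is a minimal sketch of the two safe patterns described above; the `./main` command line is illustrative only, not taken from this issue.

```python
import subprocess

# Pattern 1: use Popen as a context manager; the child is waited on
# (reaped) automatically when the block exits, so no zombie remains.
with subprocess.Popen(["./main", "-p", "Hello"], stdout=subprocess.PIPE) as proc:
    output, _ = proc.communicate()

# Pattern 2: call wait() (or communicate(), which waits internally)
# explicitly; without this the exited child lingers as a zombie until
# the Popen object is garbage-collected.
proc = subprocess.Popen(["./main", "-p", "Hello"], stdout=subprocess.PIPE)
output, _ = proc.communicate()
```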
It's obviously slower with the Q_1 30B model, and the memory usage becomes garbage...
(Linux 5.19 x64 Ubuntu base)