[fixed] The latest code build with the memory fix does not run well on my PC #462
Seems the n flag makes a difference.
Rolling back to my backup...
Some of the recent commits changed/touched how memory is handled.
Kinda similar case here, although I'm unsure which specific commit caused the performance loss. After swapping the old exe for the new one, I went from 207 ms per token to 269 ms on 13B Alpaca.
Always. -c works fine even with 5000+ on my PC 😅, so I guess the problem is somewhere else.
Yes. With the 30B model I was able to chat for more than half an hour; now, after fewer than 20 tokens, ggml reports that there is not enough memory.
@FNsi please try again with the latest master.
Disabling BLAS makes it work, though there is still a small performance loss. A guess from me: is it because BLAS tries to turn the 4-bit weights back into 16-bit at run time...? 😅😂
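To make that guess concrete (the next comment points out it is not actually the 4-bit format at fault): if a BLAS path only accepts float inputs, 4-bit weights would have to be expanded back to floating point before every matrix multiply. The sketch below is a rough, hypothetical illustration in Python/NumPy; the 32-weights-per-block layout loosely mirrors ggml's Q4_0 format but is an assumption here, not the project's actual code.

```python
import numpy as np

def dequantize_q4_block(scale: float, nibbles: np.ndarray) -> np.ndarray:
    # Each 4-bit value (0..15) is recentered to -8..7 and scaled back
    # to float32 -- the extra run-time work a float-only BLAS routine
    # would force for every block of weights.
    return (nibbles.astype(np.float32) - 8.0) * scale

# Hypothetical block: 32 quantized weights sharing one scale.
scale = 0.05
nibbles = np.random.randint(0, 16, size=32, dtype=np.uint8)
weights_f32 = dequantize_q4_block(scale, nibbles)

# Only the expanded floats can feed the matrix multiply (np.matmul as a
# stand-in for an sgemm call), so the dequantization is pure overhead
# compared with kernels that operate on the 4-bit data directly.
activations = np.random.rand(32).astype(np.float32)
out = weights_f32 @ activations
```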
It's not the 4 bits - it does not work with F16 either, which is super strange since this has been used in
So it seems to mean that a big performance improvement is coming once this is figured out.
Popen() needs to be used with 'with' or have .wait() called or be destroyed, otherwise there is a zombie child that sticks around until the object is GC'd.
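For anyone driving the binary from Python, here is a minimal sketch of the two safe patterns described above; the `./main` command line is illustrative only, not taken from this issue.

```python
import subprocess

# Pattern 1: use Popen as a context manager; the child is waited on
# (reaped) automatically when the block exits, so no zombie remains.
with subprocess.Popen(["./main", "-p", "Hello"], stdout=subprocess.PIPE) as proc:
    output, _ = proc.communicate()

# Pattern 2: call wait() (or communicate(), which waits internally)
# explicitly; without this the exited child lingers as a zombie until
# the Popen object is garbage-collected.
proc = subprocess.Popen(["./main", "-p", "Hello"], stdout=subprocess.PIPE)
output, _ = proc.communicate()
```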
It's obviously slower with the Q_1 30B model, and the memory usage becomes garbage...
(Linux 5.19 x64 Ubuntu base)