I built with cuBLAS, quantized my 7B model to q4_0, and offloaded all of the model's layers to the GPU with ./main. Compute clearly happens on the GPU and about 4 GB of VRAM is in use, yet about 4 GB of CPU memory also stays allocated and is never released.

Is this the intended behavior? Are the weights offloaded directly to the GPU, or loaded into CPU RAM first and then copied to VRAM? And if the latter, why isn't the CPU memory released, or at least released promptly?

I also tried the server/chat.sh program built with cuBLAS, and there the CPU memory is released shortly after the server is up and running.

Please help me understand.
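To make the question concrete, here is the two-step staging pattern I suspect is happening. This is a hypothetical sketch with made-up names, not llama.cpp's actual loader code:

```cpp
// Hypothetical sketch of the "stage in host RAM, then copy to VRAM" pattern;
// names are made up for illustration, this is not llama.cpp's loader code.
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t n = 4ull << 30; // ~4 GB of q4_0 weights (illustrative size)

    // 1. Weights land in host RAM first (via read() or mmap() of the model file).
    void *host_weights = malloc(n);
    if (!host_weights) return 1;

    // 2. They are copied to device memory for GPU compute.
    void *dev_weights = nullptr;
    cudaMalloc(&dev_weights, n);
    cudaMemcpy(dev_weights, host_weights, n, cudaMemcpyHostToDevice);

    // 3. Unless the host copy is freed here (or the mapping is file-backed
    //    page cache the kernel can reclaim), the process keeps holding ~4 GB
    //    of RAM even though compute now runs entirely on the GPU.
    free(host_weights);

    cudaFree(dev_weights);
    return 0;
}
```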
How should I know? Which OS, git revision, and CLI arguments are you using, and what method are you even using to determine whether or not the memory has been released?
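The measurement matters because plain RSS can be misleading. A quick sketch, assuming Linux (not part of this repo): /proc/<pid>/status splits resident memory into RssAnon (heap allocations) and RssFile (file-backed pages). An mmap'd model file counts toward RssFile, which is page cache the kernel reclaims lazily, so the memory can look "not released" even when nothing is wrong:

```cpp
// Sketch: print the Rss* breakdown for a process (default: this process).
// Usage: ./rss_breakdown [pid]
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char **argv) {
    std::string pid = argc > 1 ? argv[1] : "self";
    std::ifstream f("/proc/" + pid + "/status");
    for (std::string line; std::getline(f, line); )
        if (line.find("Rss") != std::string::npos) // VmRSS, RssAnon, RssFile, RssShmem
            std::cout << line << '\n';
    return 0;
}
```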