Open
Description
I recently discovered this through the llama.cpp changelog, and I love it: minimal space requirements and pretty direct access to llama.cpp. I'm a fan! One thing I would absolutely love is the ability to automatically unload the model when it is idle. I imagine the following:
- A new setting that lets users define an auto-unload threshold (maybe 5 minutes or so) after which the model gets unloaded again (essentially the same function as clicking the model in the dropdown menu)
- In a perfect world, maybe even a listener in the web UI that reloads the model when the user sends a new message.
The first one would be pretty nifty because it frees up memory even if you forget to unload the model. The second feature is probably a bit more difficult (since it will require stopping the llama-server, if I'm not mistaken?) and not as mission-critical, since restarting the server with a simple click is relatively straightforward. It's really mostly about unloading the model to free up memory.
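To make the idea a bit more concrete, here is a rough Python sketch of the logic I have in mind. It is not based on the project's actual code: `IdleModelManager`, `on_request`, `tick`, and the `load`/`unload` callbacks are all hypothetical names, and the callbacks just stand in for however the app really starts and stops llama-server. The 300-second default mirrors the "maybe 5 minutes" suggestion above.

```python
import time
from typing import Callable


class IdleModelManager:
    """Hypothetical helper: unloads a model after a period of inactivity
    and reloads it on the next request."""

    def __init__(
        self,
        load: Callable[[], None],      # assumed hook: start the server / load the model
        unload: Callable[[], None],    # assumed hook: stop the server / free the memory
        idle_timeout_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._unload = unload
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._loaded = False
        self._last_used = clock()

    @property
    def loaded(self) -> bool:
        return self._loaded

    def on_request(self) -> None:
        """Call when the user sends a message: reload if needed, reset the idle timer."""
        if not self._loaded:
            self._load()
            self._loaded = True
        self._last_used = self._clock()

    def tick(self) -> bool:
        """Call periodically (e.g. from a timer). Returns True if the model was unloaded."""
        idle_for = self._clock() - self._last_used
        if self._loaded and idle_for >= self._idle_timeout_s:
            self._unload()
            self._loaded = False
            return True
        return False
```

The app would call `tick()` from whatever timer it already has and `on_request()` before forwarding a message. Even without the reload part, `tick()` alone would cover the first (and more important) feature.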
Thanks for your consideration!