Auto-unload models on idle

I recently discovered this project through the llama.cpp changelog, and I love it: minimal space requirements and pretty direct access to llama.cpp. I'm a fan! One thing I would absolutely love is the ability to automatically unload the model when it is idle. I would imagine the following:

* A new setting that lets users define an auto-unload threshold (maybe 5 minutes or so) of idle time, after which the model gets unloaded again (essentially the same function as clicking the model in the dropdown menu).
* In a perfect world, maybe even a listener in the web UI that reloads the model automatically when the user sends a new message.

The first one would be pretty nifty, because it frees up memory even if you forget to unload the model. The second feature is probably a bit more difficult (since it will require stopping the llama-server, if I'm not mistaken?) and not as mission-critical, as restarting the server with a simple click is relatively straightforward. It's really mostly about unloading the model to free up memory.
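To make the request more concrete, here is a minimal sketch of the idle logic I have in mind. It is not based on this project's code: the `IdleUnloader` class, its `load` and `unload` callbacks, and the `poll` method are all hypothetical names, and I'm assuming the app already has some way to start and stop the model that could be passed in as those callbacks.

```python
import time
from typing import Callable


class IdleUnloader:
    """Hypothetical helper that unloads a model after a period of inactivity.

    `load` and `unload` are placeholders for whatever the app already does
    when the user picks or clicks a model in the dropdown menu.
    """

    def __init__(
        self,
        load: Callable[[], None],
        unload: Callable[[], None],
        idle_seconds: float = 300.0,  # the "5 minutes or so" threshold
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._unload = unload
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._loaded = False
        self._last_used = clock()

    @property
    def loaded(self) -> bool:
        return self._loaded

    def on_message(self) -> None:
        """Call when the user sends a message: reload if needed, reset the timer."""
        if not self._loaded:
            self._load()
            self._loaded = True
        self._last_used = self._clock()

    def poll(self) -> bool:
        """Call periodically; unloads once the idle threshold has passed.

        Returns True only when this call actually unloaded the model.
        """
        idle_for = self._clock() - self._last_used
        if self._loaded and idle_for >= self._idle_seconds:
            self._unload()
            self._loaded = False
            return True
        return False
```

The app would call `on_message()` whenever a message is sent and `poll()` from some periodic timer. The sketch is not thread-safe and does not treat an in-progress generation as activity, so a real version would need a lock and would probably also reset the timer while a response is still streaming.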

Thanks for your consideration!

Auto-unload models on idle #13