
Add support for request caching #3637

@ekzhu

Description

What feature would you like to be added?

Implement a caching mechanism for LLM API calls to reduce unnecessary API calls, similar to the one in 0.2.

When enabled, this feature should allow us to retrieve cached responses for identical LLM requests instead of making new API calls. Ideally, it should include a configuration flag to enable/disable caching, as well as a way to manage the cache. A sketch of how this could look is included below.

We don't need to follow the same API as in the 0.2 version. We can have the cache managed by the model client instead.
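A rough sketch of what "cache managed by the model client" could mean, assuming an async model client exposing a `create` method. The `ModelClient` protocol, `CachedModelClient` wrapper, `enabled` flag, and `clear_cache` method are hypothetical names for illustration only, not a proposed or existing API:

```python
import hashlib
import json
from typing import Any, Protocol


class ModelClient(Protocol):
    """Minimal interface assumed for illustration; the real client API may differ."""

    async def create(self, messages: list[dict[str, Any]], **kwargs: Any) -> Any: ...


class CachedModelClient:
    """Wraps a model client and returns cached responses for identical requests."""

    def __init__(self, client: ModelClient, enabled: bool = True) -> None:
        self._client = client
        self._enabled = enabled  # configuration flag to enable/disable caching
        self._cache: dict[str, Any] = {}

    def _key(self, messages: list[dict[str, Any]], kwargs: dict[str, Any]) -> str:
        # Hash the full request payload so only truly identical requests hit the cache.
        payload = json.dumps(
            {"messages": messages, "kwargs": kwargs}, sort_keys=True, default=str
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    async def create(self, messages: list[dict[str, Any]], **kwargs: Any) -> Any:
        if not self._enabled:
            return await self._client.create(messages, **kwargs)
        key = self._key(messages, kwargs)
        if key in self._cache:
            return self._cache[key]  # cache hit: no API call made
        response = await self._client.create(messages, **kwargs)
        self._cache[key] = response
        return response

    def clear_cache(self) -> None:
        """Drop all cached responses (basic cache management hook)."""
        self._cache.clear()
```

The in-memory dict here is just a placeholder; a persistent store (e.g. on disk) could back the same interface so cached responses survive across runs.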

Why is this needed?

Save cost on identical inference requests.
