LLM caches
Reduce latency and save LLM costs by caching LLM prompts and responses.
Why Caches?
You may find caches useful when you want to:
- Reduce latency: Serve stored responses instantly, eliminating the need for repeated API calls.
- Save costs: Minimize expenses by reusing cached responses instead of making redundant requests.
- Improve performance: Deliver consistently high-quality outputs by serving pre-vetted, cached responses.
How to use Caches?
Turn on caches by setting cache_enabled to true. We currently cache the whole conversation, including the system message, the user message, and the response.
In the example below, we cache the user message “Hi, how are you?” and its response.
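For illustration, here is a minimal sketch of such a request. It assumes an OpenAI-compatible chat completions endpoint exposed by the proxy and that cache_enabled is accepted in the request body; the URL, model name, and API key below are placeholders, so adjust them to your actual deployment.

```python
import requests

# Hypothetical proxy endpoint and API key -- replace with your deployment's values.
PROXY_URL = "https://your-proxy.example.com/api/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, how are you?"},
    ],
    # Enable caching: the whole conversation (system message, user message)
    # and the response are stored, and an identical request is served from cache.
    "cache_enabled": True,
}

response = requests.post(
    PROXY_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
print(response.json())
```

The first call goes to the LLM provider as usual; a subsequent identical request is served from the cache, so it returns faster and does not incur provider costs.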
Cache parameters
cache_enabled (boolean)
Enable or disable caches.
cache_ttl (number, optional)
The time-to-live (TTL) for the cache, in seconds. The default is 30 days.
cache_options (object, optional)
Specifies cache options. Currently the cache_by_customer option is supported; set it to true or false. When cache_by_customer is true, cache entries are stored per customer identifier.
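As a sketch of how these parameters can be combined, assuming the same hypothetical endpoint as above; the customer_identifier field name is also an assumption, so use whatever field your proxy expects for identifying customers.

```python
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hi, how are you?"}],
    "cache_enabled": True,
    # Keep the cached entry for one hour instead of the 30-day default.
    "cache_ttl": 3600,
    # Scope the cache to this customer, so other customers' identical
    # prompts do not hit the same cached response.
    "cache_options": {"cache_by_customer": True},
    "customer_identifier": "customer_123",  # hypothetical field name
}
```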