This is a beta feature. Please let us know if you encounter any issues; we are continuously improving it.

Why Caches?

You may find caches useful when you want to:

  • Reduce latency: Serve stored responses instantly, eliminating the need for repeated API calls.
  • Save costs: Minimize expenses by reusing cached responses instead of making redundant requests.
  • Improve performance: Deliver consistently high-quality outputs by serving pre-vetted, cached responses.

How to use Caches?

Turn on caches by setting cache_enabled to true. Currently, the whole conversation is cached: the system message, the user message, and the response.

In the example below, the user message “Hi, how are you?” and its response are cached.

    {
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "system",
                "content": "Hello, how can I help you today?" // message to be cached, its response will be cached as well
            },
            {
                "role": "user",
                "content": "Hi, how are you?" // message to be cached, its response will be cached as well
            }
        ],
        "cache_enabled": true, // enable cache
        "cache_ttl": 600, // cache for 10 minutes, optional
        "cache_options": { // optional
            "cache_by_customer": true // or false
        },
        "customer_params": {
            "customer_identifier": "customer_123",
            "name": "Hendrix Liu", // optional
            "email": "" // optional
        }
    }

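As a rough sketch, a request body like the one in the example could be assembled in Python as follows. The helper name build_cached_request and its arguments are hypothetical and used only for illustration; only the field names come from the example. Sending the body to the API (endpoint, authentication) is outside the scope of this sketch.

```python
import json


def build_cached_request(messages, ttl_seconds=None, cache_by_customer=None,
                         customer_identifier=None, model="gpt-3.5-turbo"):
    """Build a request body with caching turned on.

    Hypothetical helper: only the field names are taken from the example.
    """
    body = {
        "model": model,
        "messages": messages,
        "cache_enabled": True,  # turn caching on
    }
    if ttl_seconds is not None:
        body["cache_ttl"] = ttl_seconds  # optional, in seconds
    if cache_by_customer is not None:
        body["cache_options"] = {"cache_by_customer": cache_by_customer}
    if customer_identifier is not None:
        body["customer_params"] = {"customer_identifier": customer_identifier}
    return body


body = build_cached_request(
    messages=[
        {"role": "system", "content": "Hello, how can I help you today?"},
        {"role": "user", "content": "Hi, how are you?"},
    ],
    ttl_seconds=600,  # 10 minutes
    cache_by_customer=True,
    customer_identifier="customer_123",
)
print(json.dumps(body, indent=4))
```

Optional fields are left out of the body entirely when they are not supplied, so the service defaults apply.
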
Caches parameters


cache_enabled enables or disables caches.

    "cache_enabled": true

cache_ttl specifies the time-to-live (TTL) of the cache, in seconds.

It is optional; the default value is currently 30 days.

    "cache_ttl": 3600 // in seconds (3600 = 1 hour)

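Because cache_ttl is expressed in seconds, a small conversion helper can make longer durations easier to read. This is a minimal sketch; the function name ttl_seconds is hypothetical and not part of the API.

```python
from datetime import timedelta


def ttl_seconds(days=0, hours=0, minutes=0):
    """Convert a duration to whole seconds for use as a cache_ttl value."""
    return int(timedelta(days=days, hours=hours, minutes=minutes).total_seconds())


print(ttl_seconds(minutes=10))  # 600
print(ttl_seconds(hours=1))     # 3600
print(ttl_seconds(days=30))     # 2592000
```

The last value corresponds to the 30-day default mentioned above.
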
cache_options specifies the cache options. Currently, the cache_by_customer option is supported; set it to true or false. When cache_by_customer is true, cached responses are stored per customer identifier.

This parameter is optional.

    "cache_options": { // optional
        "cache_by_customer": true // or false
    }