What are caches?
Caches are storage systems that store and reuse responses to identical LLM requests. You can enable caches to reduce LLM costs and improve response times.

Why Caches?
You may find caches useful when you want to:
- Reduce latency: Serve stored responses instantly, eliminating the need for repeated API calls.
- Save costs: Minimize expenses by reusing cached responses instead of making redundant requests.
How to use Caches?
Turn on caches by setting `cache_enabled` to `true`. We currently cache the whole conversation, including the system message, the user message, and the response.

In the example below, the user message “Hi, how are you?” and its response are cached.
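Here is a minimal sketch of such a request, assuming an OpenAI-compatible chat completions endpoint; the endpoint URL, model name, and API key placeholder are illustrative, so check the API reference for the exact details.

```python
import requests

# Illustrative endpoint and payload shape; `cache_enabled` is the documented
# parameter here, everything else is a standard chat completion request.
response = requests.post(
    "https://api.keywordsai.co/api/chat/completions",  # assumed endpoint URL
    headers={"Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY"},
    json={
        "model": "gpt-4o-mini",  # illustrative model name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hi, how are you?"},
        ],
        # Enable caching: an identical conversation is served from cache.
        "cache_enabled": True,
    },
)
print(response.json())
```

A second, identical request should then be served from the cache instead of triggering a new LLM call.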
Cache parameters
cache_enabled
Enable or disable caches.
This parameter specifies the time-to-live (TTL) for the cache, in seconds. It is optional; the default is currently 30 days.
cache_options
This parameter specifies the cache options. Currently we support the `cache_by_customer` option, which can be set to `true` or `false`. If `cache_by_customer` is set to `true`, the cache is stored per customer identifier. This parameter is optional.
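As a sketch of how these parameters fit together in one request body (the `cache_ttl` and `customer_identifier` field names below are assumptions for illustration, not confirmed names; verify them against the API reference):

```python
import requests

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hi, how are you?"}],
    "cache_enabled": True,  # turn caching on
    "cache_ttl": 600,  # assumed field name for the TTL parameter (seconds)
    # Keep cache entries separate per customer, so two customers sending
    # the same prompt do not share a cached response.
    "cache_options": {"cache_by_customer": True},
    "customer_identifier": "customer_123",  # assumed field carrying the customer id
}
response = requests.post(
    "https://api.keywordsai.co/api/chat/completions",  # assumed endpoint URL
    headers={"Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY"},
    json=payload,
)
```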
How to view caches
You can view caches on the Logs page. The model tag will be `keywordsai/cache`. You can also filter the logs by the Cache hit field.

Omit logs on cache hits
You can omit logs on cache hits by setting the `omit_logs` parameter to `true`, or from Caches in Settings. With this enabled, a cache hit will not generate a new LLM log.
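For instance, adding the parameter to the request payload from the earlier sketches (field names assumed as before):

```python
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hi, how are you?"}],
    "cache_enabled": True,
    # On a cache hit, skip creating a new LLM log entry.
    "omit_logs": True,
}
```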