Prompt caching is only available when you use the LLM proxy with Anthropic models.
## How to use prompt caching
You can either enable prompt caching through the LLM proxy, or log cached LLM requests with the Logging API for better observability.
**Anthropic Python SDK**

```python
import anthropic

# Route Anthropic SDK calls through the Keywords AI proxy
client = anthropic.Anthropic(
    base_url="https://api.keywordsai.co/api/anthropic/",
    api_key="YOUR_KEYWORDS_AI_API_KEY",
)

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,  # required by the Messages API
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.\n",
        },
        {
            "type": "text",
            "text": "<the entire contents of 'Pride and Prejudice'>",
            # Everything up to and including this block becomes a cacheable prefix
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Analyze the major themes in 'Pride and Prejudice'."}],
)
print(message.content)
```
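Assuming the proxy passes Anthropic's `usage` fields through unchanged, you can verify caching from the response itself, since cache writes and cache reads are reported separately:

```python
# > 0 on the first request (the prefix is written to the cache);
# on repeats within the cache lifetime, cache_read_input_tokens is > 0 instead.
print(message.usage.cache_creation_input_tokens)
print(message.usage.cache_read_input_tokens)
```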
**Anthropic TypeScript SDK**

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Route Anthropic SDK calls through the Keywords AI proxy
const anthropic = new Anthropic({
  baseURL: "https://api.keywordsai.co/api/anthropic/",
  apiKey: "YOUR_KEYWORDS_AI_API_KEY",
});

const msg = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20240620",
  max_tokens: 1024, // required by the Messages API
  system: [
    {
      type: "text",
      text: "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.\n",
    },
    {
      type: "text",
      text: "<the entire contents of 'Pride and Prejudice'>",
      // Everything up to and including this block becomes a cacheable prefix
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Why is the ocean salty?",
        },
      ],
    },
  ],
});
console.log(msg);
```
**Proxy API**

```python
import requests

def demo_call(
    user_input,
    model="claude-3-5-sonnet-20240620",
    token="YOUR_KEYWORDS_AI_API_KEY",
):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    data = {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "You are an AI assistant tasked with analyzing legal documents.",
                    },
                    {
                        "type": "text",
                        # Repeated so the prompt clears the minimum cacheable length
                        "text": "Here is the full text of a complex legal agreement. " * 400,
                        "cache_control": {"type": "ephemeral"},  # Add this to enable cache control
                    },
                ],
            },
            {
                "role": "user",
                "content": user_input,
            },
        ],
    }
    return requests.post(
        "https://api.keywordsai.co/api/chat/completions", headers=headers, json=data
    )

message = "What are the key terms and conditions in this agreement?"
print(demo_call(message).json())
```
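To confirm cache activity through the proxy, you can inspect the response's `usage` object. This is a sketch, assuming the proxy reports cache usage in the OpenAI-style `usage` object with the same field names the Logging API accepts (see below); check your actual response shape before relying on it:

```python
# Hypothetical check: field names mirror the Logging API payload below
usage = demo_call(message).json().get("usage", {})
cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
written = usage.get("cache_creation_prompt_tokens", 0)
print(f"cache reads: {cached} tokens, cache writes: {written} tokens")
```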
**Logging API**

If you want to log LLM requests that were cached, you can use the Logging API.

```python
import os
import requests

API_KEY = os.getenv("KEYWORDSAI_API_KEY")
BASE_URL = os.getenv("KEYWORDSAI_BASE_URL", "https://api.keywordsai.co/api")

url = f"{BASE_URL}/request-logs/create"
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
payload = {
    "model": "claude-3-5-sonnet-20240620",
    "prompt_messages": [
        {"role": "user", "content": "Hi"},
        # Optional: an assistant message carrying a tool call
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {
                    "id": "xxxx",
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",  # Function name
                        "arguments": '{\n"location": "Boston, MA"\n}',  # Function arguments
                    },
                }
            ],
        },
    ],
    "completion_message": {
        "role": "assistant",
        "content": "Hi, how can I assist you today?",
    },
    "tool_choice": {"type": "function", "function": {"name": "get_current_weather"}},
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    "usage": {
        "prompt_tokens_details": {
            "cached_tokens": 10,  # tokens served from the cache (cache reads)
        },
        "cache_creation_prompt_tokens": 20,  # tokens written to the cache
    },
}

response = requests.post(url, headers=headers, json=payload)
assert response.status_code == 201, response.text
```
## How does prompt caching work?
All information is from Anthropic’s documentation.
When you send a request with prompt caching enabled:
- The system checks if a prompt prefix, up to a specified cache breakpoint, is already cached from a recent query.
- If found, it uses the cached version, reducing processing time and costs.
- Otherwise, it processes the full prompt and caches the prefix once the response begins.
This is especially useful for:
- Prompts with many examples
- Large amounts of context or background information
- Repetitive tasks with consistent instructions
- Long multi-turn conversations
The cache has a 5-minute lifetime, refreshed each time the cached content is used.
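To see the write-then-read behavior end to end, here is a minimal sketch (assuming the same Keywords AI proxy setup as above, with placeholder reference text) that sends the same cached prefix twice within the 5-minute window:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.keywordsai.co/api/anthropic/",
    api_key="YOUR_KEYWORDS_AI_API_KEY",
)

# A long, stable prefix marked with a cache breakpoint (placeholder text)
system = [
    {
        "type": "text",
        "text": "<several thousand tokens of reference material>",
        "cache_control": {"type": "ephemeral"},
    }
]

for question in ["First question", "Follow-up question"]:
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=256,
        system=system,  # identical prefix on every call
        messages=[{"role": "user", "content": question}],
    )
    # 1st call: cache_creation_input_tokens > 0 (prefix written)
    # 2nd call: cache_read_input_tokens > 0 (prefix reused)
    print(message.usage)
```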
## Prompt caching pricing for Anthropic models

| Model | Base Input Tokens | Cache Writes | Cache Hits | Output Tokens |
|---|---|---|---|---|
| Claude 3.5 Sonnet | $3 / MTok | $3.75 / MTok | $0.30 / MTok | $15 / MTok |
| Claude 3.5 Haiku | $1 / MTok | $1.25 / MTok | $0.10 / MTok | $5 / MTok |
| Claude 3 Haiku | $0.25 / MTok | $0.30 / MTok | $0.03 / MTok | $1.25 / MTok |
| Claude 3 Opus | $15 / MTok | $18.75 / MTok | $1.50 / MTok | $75 / MTok |
Note:
- Cache write tokens are 25% more expensive than base input tokens
- Cache read tokens are 90% cheaper than base input tokens
- Regular input and output tokens are priced at standard rates
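As a worked example using the Claude 3.5 Sonnet prices above: a cache write costs 25% more than sending the prefix normally, but every reuse within the cache lifetime costs 90% less, so caching breaks even on the very first reuse.

```python
# Claude 3.5 Sonnet prices from the table above, in $ per million tokens
BASE, WRITE, HIT = 3.00, 3.75, 0.30

def prompt_cost(prefix_tokens: int, reuses: int) -> tuple[float, float]:
    """Cost of sending the same prefix (1 + reuses) times, without and with caching."""
    without_cache = BASE * prefix_tokens / 1e6 * (1 + reuses)
    with_cache = (WRITE + HIT * reuses) * prefix_tokens / 1e6
    return without_cache, with_cache

# A 100k-token prefix reused 9 times within the cache lifetime:
# $3.00 without caching vs about $0.65 with caching
print(prompt_cost(100_000, 9))
```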
## Supported models
Prompt caching is currently supported on:
- Claude 3.5 Sonnet
- Claude 3.5 Haiku
- Claude 3 Haiku
- Claude 3 Opus
## Cache limitations
The minimum cacheable prompt length is:
- 1024 tokens for Claude 3.5 Sonnet and Claude 3 Opus
- 2048 tokens for Claude 3.5 Haiku and Claude 3 Haiku
Shorter prompts cannot be cached, even if they are marked with `cache_control`; requests below the minimum are simply processed without caching.
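If you are unsure whether a prompt clears the minimum, you can count its tokens before adding a breakpoint. This sketch uses the Anthropic SDK's `client.messages.count_tokens` endpoint and assumes it is reachable through the proxy:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.keywordsai.co/api/anthropic/",
    api_key="YOUR_KEYWORDS_AI_API_KEY",
)

MIN_CACHEABLE = 1024  # 2048 for the Haiku models

def is_cacheable(system_text: str, model: str = "claude-3-5-sonnet-20240620") -> bool:
    # Count prompt tokens without running a generation
    count = client.messages.count_tokens(
        model=model,
        system=[{"type": "text", "text": system_text}],
        messages=[{"role": "user", "content": "placeholder"}],
    )
    return count.input_tokens >= MIN_CACHEABLE
```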