Keywords AI integrates with Continuous Eval, an open-source package for granular and holistic evaluation of GenAI application pipelines, to provide a comprehensive evaluation of the LLM-generated results from the Keywords AI API.

You can find the evaluation settings at Keywords AI (top of left nav bar) > Evaluation.

Sampling

Running an evaluation on every request is expensive, so you can specify a percentage of API calls to be randomly sampled for evaluation.

The default value is 10%.
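
To make the sampling behavior concrete, here is a minimal sketch of percentage-based random selection at the default 10% rate. The `should_evaluate` helper is hypothetical and only illustrates the idea; it is not part of the Keywords AI API.

```python
import random

def should_evaluate(sample_rate: float = 0.10) -> bool:
    """Return True if this API call is selected for evaluation.

    sample_rate is the fraction of calls to evaluate (default 10%).
    """
    return random.random() < sample_rate

# Example: out of 1,000 simulated calls, roughly 100 are selected.
selected = sum(should_evaluate() for _ in range(1_000))
print(f"{selected} of 1000 calls selected for evaluation")
```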

Evaluation metrics

Keywords AI provides the following categories of evaluation metrics:

1. Retrieval metrics:

  • Context precision: measures the information density of the retrieved contexts (see more).

2. Text generation metrics:

  • Faithfulness: measures how grounded the answer is in the retrieved contexts (see more).
  • Flesch–Kincaid readability: measures the readability of the generated text (see more); a formula sketch follows this list.
  • Relevance: measures the relevance of the generated text to the query (see more).

3. Other metrics:

  • Sentiment: measures the sentiment of the generated text (see more).
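
To illustrate the readability metric referenced above, the sketch below computes the standard Flesch–Kincaid grade level: 0.39 × (words/sentences) + 11.8 × (syllables/word) − 15.59. It is a self-contained example with a rough heuristic syllable counter, not the exact implementation used by Continuous Eval or Keywords AI.

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    word_count = max(1, len(words))

    def count_syllables(word: str) -> int:
        # Rough heuristic: count vowel groups, treating a trailing 'e' as silent.
        groups = re.findall(r"[aeiouy]+", word.lower())
        if word.lower().endswith("e") and len(groups) > 1:
            return len(groups) - 1
        return max(1, len(groups))

    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (word_count / sentences) + 11.8 * (syllables / word_count) - 15.59

# Simple text scores near or below grade 0; denser prose scores higher.
print(flesch_kincaid_grade("The cat sat on the mat. It was a sunny day."))
```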

Run evaluation manually

You can run the evaluation manually by clicking the Run button in the side panel of the Requests page.