Evaluations
Quickstart
A guide to benchmarking LLM performance with evals.
What are evaluations?
Evaluations help you assess the performance of your prompts. You can create custom evals to measure different dimensions of output quality.
Why use evaluations?
To measure the quality of a prompt or a model.
To find the best prompt or model for a specific task.
To optimize your prompts and models.
Quickstart
LLM-as-judge evaluator
Create an LLM-as-judge evaluator that uses a grading model to automatically score the quality of your prompt outputs.
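As a rough conceptual sketch of the LLM-as-judge pattern (this is not the platform's own API; the judge_response helper, the gpt-4o-mini model name, and the 1-5 rubric below are illustrative assumptions), a grading model can be asked to rate an output against a rubric:

# Minimal LLM-as-judge sketch. Assumes the `openai` package is installed
# and OPENAI_API_KEY is set; the rubric and 1-5 scale are illustrative.
from openai import OpenAI

client = OpenAI()

def judge_response(prompt: str, response: str) -> int:
    """Ask a grading model to rate a response from 1 (poor) to 5 (excellent)."""
    grading_prompt = (
        "You are an evaluator. Rate the assistant response for accuracy and "
        "helpfulness on a scale of 1-5. Reply with the number only.\n\n"
        f"User prompt:\n{prompt}\n\nAssistant response:\n{response}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": grading_prompt}],
    )
    # Assumes the grading model follows the instruction and returns a bare number.
    return int(result.choices[0].message.content.strip())

score = judge_response("Explain what an eval is.", "An eval measures output quality.")
print(score)  # e.g. 4

In practice, the rubric would encode whichever quality dimension you want to measure (accuracy, tone, format compliance, and so on).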
Create a human evaluator
Create a human evaluator to collect manual reviews of the quality of your prompt outputs.
Run experiments with testsets
Run experiments against testsets to compare the performance of your prompts across test cases.
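As a sketch of what an experiment run looks like conceptually (the testset structure and the generate and score helpers below are assumptions for illustration, not the platform's schema), each test case is passed through the prompt under test and scored by an evaluator, then the scores are aggregated:

# Conceptual experiment loop over a testset. The testset layout and the
# generate/score helpers are illustrative assumptions, not a platform API.
from statistics import mean

testset = [
    {"input": "Summarize: The cat sat on the mat.", "expected": "A cat sat on a mat."},
    {"input": "Translate to French: Hello.", "expected": "Bonjour."},
]

def generate(prompt_input: str) -> str:
    # Placeholder for the prompt/model combination under test.
    return "model output for: " + prompt_input

def score(output: str, expected: str) -> float:
    # Placeholder evaluator; in practice this could be an LLM-as-judge
    # score or a human rating collected after the run.
    return 1.0 if expected.lower() in output.lower() else 0.0

results = [score(generate(case["input"]), case["expected"]) for case in testset]
print(f"Average score: {mean(results):.2f} over {len(results)} test cases")

Comparing the aggregated scores across prompt or model variants is what lets you pick the best one for a task.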