Evaluations
Quickstart
A guide to benchmarking LLM performance with evals.
What are evaluations?
Evaluations help you assess the performance of your prompts. You can create custom evals to measure different dimensions of output quality.
Why use evaluations?
To measure the quality of a prompt or a model.
To find the best prompt or model for a specific task.
To optimize your prompts and models.
Quickstart
LLM-as-judge evaluator
Create an LLM-as-judge evaluator that uses a grading model to automatically score the quality of your prompt outputs.
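As a rough conceptual sketch of the LLM-as-judge pattern (this is not the platform's own API; the judge_response helper, the gpt-4o-mini model name, and the 1-5 rubric below are illustrative assumptions), a grading model can be asked to rate an output against a rubric:

# Minimal LLM-as-judge sketch. Assumes the `openai` package is installed
# and OPENAI_API_KEY is set; the rubric and 1-5 scale are illustrative.
from openai import OpenAI

client = OpenAI()

def judge_response(prompt: str, response: str) -> int:
    """Ask a grading model to rate a response from 1 (poor) to 5 (excellent)."""
    grading_prompt = (
        "You are an evaluator. Rate the assistant response for accuracy and "
        "helpfulness on a scale of 1-5. Reply with the number only.\n\n"
        f"User prompt:\n{prompt}\n\nAssistant response:\n{response}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": grading_prompt}],
    )
    # Assumes the grading model follows the instruction and returns a bare number.
    return int(result.choices[0].message.content.strip())

score = judge_response("Explain what an eval is.", "An eval measures output quality.")
print(score)  # e.g. 4

In practice, the rubric would encode whichever quality dimension you want to measure (accuracy, tone, format compliance, and so on).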
Create a human evaluator
Create a human evaluator to collect manual reviews of the quality of your prompt outputs.
Run experiments with testsets
Run experiments against testsets to compare the performance of your prompts across test cases.
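As a sketch of what an experiment run looks like conceptually (the testset structure and the generate and score helpers below are assumptions for illustration, not the platform's schema), each test case is passed through the prompt under test and scored by an evaluator, then the scores are aggregated:

# Conceptual experiment loop over a testset. The testset layout and the
# generate/score helpers are illustrative assumptions, not a platform API.
from statistics import mean

testset = [
    {"input": "Summarize: The cat sat on the mat.", "expected": "A cat sat on a mat."},
    {"input": "Translate to French: Hello.", "expected": "Bonjour."},
]

def generate(prompt_input: str) -> str:
    # Placeholder for the prompt/model combination under test.
    return "model output for: " + prompt_input

def score(output: str, expected: str) -> float:
    # Placeholder evaluator; in practice this could be an LLM-as-judge
    # score or a human rating collected after the run.
    return 1.0 if expected.lower() in output.lower() else 0.0

results = [score(generate(case["input"]), case["expected"]) for case in testset]
print(f"Average score: {mean(results):.2f} over {len(results)} test cases")

Comparing the aggregated scores across prompt or model variants is what lets you pick the best one for a task.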