Set up an LLM evaluator
In this guide, we will show you how to set up and run an LLM evaluator in the UI.
LLM evaluators allow you to evaluate your prompts with the help of an LLM. You can evaluate your prompts based on various metrics and see the results in Logs.
Prerequisites
You have already created prompts in the platform. Learn how to create a prompt here.
Steps
Create a new evaluator
You can set up an evaluator in Evaluators. Click the + New evaluator button, and select LLM.
Configure an LLM evaluator
Here’s a sample evaluator configuration. We’ll describe each section in detail below.
You need to define a Slug for each evaluator. This slug is used to apply the evaluator in your LLM calls and to identify the evaluator in Logs.
The Slug is a unique identifier for the evaluator. We suggest you don't change it once you have created the evaluator.
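For reference, here is a rough sketch of how an evaluator slug might be attached to an LLM call through an API. The endpoint URL, headers, and the eval_params field below are illustrative assumptions, not the platform's documented request format; check the API reference for the exact fields.

```python
import requests

# Hypothetical example: the endpoint, header names, and the "eval_params"
# field are placeholders for illustration only.
API_URL = "https://api.example.com/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize the attached article."}],
    # Reference the evaluator by the slug you defined in the UI
    # (the "eval_params" field name is an assumption).
    "eval_params": {"evaluators": [{"evaluator_slug": "summary-quality"}]},
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
print(response.json())
```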
Then, you need to choose a model for the evaluator. The evaluator will use this model to evaluate the LLM outputs. Currently, we only support gpt-4o and gpt-4o-mini from OpenAI and Azure OpenAI.
After that, you need to write a description for the evaluator. This description is for the LLM to understand the task and the expected output. You can use the following variables in the description (a sample description follows the list):
- {{llm_output}}: The output text from the LLM.
- {{ideal_output}}: The ideal output you expect from the LLM. This is optional; add it if you want to give the LLM a reference output.
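For illustration, here is one possible description that uses both variables. The wording and the summarization task are examples only, not a required format.

```text
Evaluate whether the following summary is accurate and concise.

Summary to evaluate:
{{llm_output}}

Reference summary (optional, may be empty):
{{ideal_output}}

Judge the summary on factual accuracy and brevity, then assign a score
according to the scoring rubric.
```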
Finally, you need to define the Scoring rubric for the evaluator. This is for the LLM to understand the scoring criteria.
The Passing score is the minimum score that the LLM output needs to achieve to be considered a passing response.
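As an example only, a rubric and passing score for the summary task above might look like the following; adapt the criteria and threshold to your own use case.

```text
5 - Fully accurate and concise; no factual errors.
4 - Accurate with minor wording or brevity issues.
3 - Mostly accurate but misses one key point.
2 - Several inaccuracies or significant omissions.
1 - Largely inaccurate or off-topic.

Passing score: 4
```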
You’re good to go now! Click on the Save button to create the evaluator. Let’s move on to the next step to see how to run the evaluator in the UI.