LLM evaluators let you evaluate your prompts with the help of an LLM. You can score your prompts against various metrics and see the results in Logs.

This is a beta feature. Please let us know if you encounter any issues; we'll continue to improve it.

Prerequisites

You have already created prompts in the platform. Learn how to create a prompt here.

Steps

1

Create a new evaluator

You can set up an evaluator in Evaluators. Click the + New evaluator button, and select LLM.

2

Configure an LLM evaluator

Here’s a sample evaluator configuration. We’ll describe each section in detail below.

You need to define a Slug for each evaluator. The slug is used to apply the evaluator to your LLM calls and to identify the evaluator in Logs.

The Slug is a unique identifier for the evaluator. We suggest you don’t change it once you have created the evaluator.
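
How the slug is referenced depends on how you make your LLM calls. As a purely illustrative sketch, assuming an HTTP API where evaluators are attached through a hypothetical `evaluators` field in the request body (the endpoint, header, and field names below are assumptions, not the platform's actual API), a call might look like:

```python
import requests

# Hypothetical endpoint and payload shape, for illustration only --
# check the platform's API reference for the real request format.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Summarize the report."}],
        # Apply the evaluator by its slug (assumed field name).
        "evaluators": ["summary-quality"],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```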

Then, you need to choose a model for the evaluator. The evaluator will use this model to evaluate the LLM outputs. Currently, we only support gpt-4o and gpt-4o-mini from OpenAI and Azure OpenAI.

After that, you need to write a description for the evaluator. The description tells the LLM what the task is and what output is expected. You can use the following variables in the description (an example follows the list):

  • {{llm_output}}: The output text from the LLM.
  • {{ideal_output}}: The ideal output text. This is optional; add it if you want to give the evaluator a reference output.
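
For instance, an evaluator description for a summarization prompt might look like the following (a purely illustrative example; adapt the wording to your own task):

```
You are grading the quality of a summary produced by an LLM.

Summary to evaluate:
{{llm_output}}

Reference summary (if provided):
{{ideal_output}}

Judge whether the summary is accurate, concise, and covers the key points
of the reference. Return a score according to the scoring rubric.
```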

Lastly, you need to define the Scoring rubric for the evaluator. The rubric tells the LLM how to score the output.

The Passing score is the minimum score the LLM output must achieve to be considered a passing response.
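
For example, a simple 1–5 rubric might look like this (purely illustrative; the criteria and point values are assumptions you should adapt to your task):

```
5 - Accurate, concise, and covers all key points of the reference.
4 - Accurate and covers most key points, with minor omissions.
3 - Mostly accurate but misses some important points or adds filler.
2 - Contains inaccuracies or omits several key points.
1 - Largely inaccurate or off-topic.
```

With a Passing score of 4, only outputs scored 4 or 5 would be marked as passing.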

You’re good to go now! Click on the Save button to create the evaluator. Let’s move on to the next step to see how to run the evaluator in the UI.