Set up LLM-as-a-judge
Evaluate your prompts with the help of an LLM
LLM-as-a-judge is a feature that lets you evaluate your prompts with the help of an LLM. You can score your prompts against various metrics and see the results on the Logs page.
Create an evaluator
To evaluate your prompts, you should first create a new evaluator on the Evaluators page.
After you click the Create Evaluator button, you will be taken to the evaluator setup page, where you need to define the evaluator slug that you will use to apply the evaluator in your LLM calls.
Configure the evaluator
Configure the evaluator with the prompt and the criteria you want to evaluate. You can also test the evaluator with a sample prompt.
Pass required params in code
After you have configured the evaluator, you should pass the required parameters in the code to evaluate the prompt.
For example, suppose you are asking a question about the capital of France, and you have retrieved some context about France and Paris. You have defined an evaluator with the slug evaluator-slug1, and the required fields in the evaluator are retrieved_contexts, expected_response, input_text (default), and output_text (default).
By default, input_text is extracted from the messages array and output_text is extracted from the LLM's response, so you don't need to specify them in extra_params.
Example code:
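The snippet below is a minimal sketch, assuming an OpenAI-compatible gateway reached through the OpenAI Python SDK. The base_url, api_key, model name, and the "evaluators" key inside extra_body are placeholders; evaluator-slug1, extra_params, retrieved_contexts, and expected_response come from the evaluator defined above. Check your platform's API reference for the exact field names.

```python
# A minimal sketch, assuming an OpenAI-compatible gateway that accepts
# evaluator settings in the request body. The base_url, api_key, model,
# and the "evaluators" key are placeholders; consult your platform's
# API reference for the exact names.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-llm-gateway.example.com/api/",  # placeholder gateway URL
    api_key="YOUR_API_KEY",                                # placeholder API key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any model your gateway supports
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    # Fields outside the standard OpenAI schema are passed through extra_body.
    # input_text and output_text are filled in automatically, so only the
    # non-default fields go into extra_params.
    extra_body={
        "evaluators": ["evaluator-slug1"],  # assumed key for attaching the evaluator
        "extra_params": {
            "retrieved_contexts": [
                "France is a country in Western Europe.",
                "Paris is the capital and largest city of France.",
            ],
            "expected_response": "The capital of France is Paris.",
        },
    },
)

print(response.choices[0].message.content)
```

If your platform ships its own SDK, the same evaluator slug and extra_params fields typically go into that SDK's request options instead of extra_body.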
See the result
The evaluator will run automatically on the LLM call, and you will see the results in the side panel of the corresponding log.