LLM-as-a-judge
Evaluate your prompts with the help of an LLM
LLM-as-a-judge is a feature that lets you evaluate your prompts with the help of an LLM. You can score your prompts against various metrics and see the results on the Logs page.
Create an evaluator
To evaluate your prompts, you should first create a new evaluator on the Evaluators page.
Choose a metric
Choose a metric to evaluate your prompts. We have integrated evaluation frameworks from Relari (coming soon) and Ragas.
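For reference, here is a minimal standalone sketch of what the Ragas Answer Relevancy metric computes, using the open-source ragas library directly (a ragas 0.1.x-style evaluate API is assumed; you don't need to run this yourself, since the integration runs the metric for you):

```python
# Standalone illustration of Ragas Answer Relevancy (not part of this integration).
# Assumes a ragas 0.1.x-style API and an OPENAI_API_KEY in the environment.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy

dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "contexts": [["Paris is the capital and most populous city of France."]],
})

result = evaluate(dataset, metrics=[answer_relevancy])
print(result)  # e.g. {'answer_relevancy': 0.98}
```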
Configure the evaluator
Configure the evaluator with the prompt and the customer you want to evaluate. You can also test the evaluator with a sample prompt.
Pass required params in code
After you have configured the evaluator, pass the required parameters in your code to evaluate the prompt.
Example code:
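A minimal sketch, assuming an OpenAI-compatible chat completion request that carries the evaluator inputs in an extra_params object (the endpoint URL, model name, and field names below are illustrative placeholders, not the documented API):

```python
import os
import requests

# Placeholder endpoint and credentials -- substitute your provider's actual
# chat/completions URL and API key.
API_URL = "https://api.example.com/chat/completions"
API_KEY = os.environ["API_KEY"]

payload = {
    "model": "gpt-4o-mini",  # example model name
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ],
    # extra_params carries the inputs required by the metric you chose.
    # For Ragas Answer Relevancy, Question and Answer are taken from the
    # conversation log, so only Contexts needs to be supplied here.
    "extra_params": {
        "contexts": [
            "Paris is the capital and most populous city of France.",
        ],
    },
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
print(response.json())
```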
Explanation of extra_params
You must pass extra_params based on the metric you choose. For example, if you choose the Ragas Answer Relevancy metric, the required parameters are Question, Answer, and Contexts.
The Question and Answer fields are automatically extracted from the conversation log if required by the evaluator, so you don't need to provide them explicitly in extra_params.
See the result
After you have passed the required parameters in your code, you can see the evaluation results on the Logs page.