LLM-based Answer Relevance outputs a score between 0.0 and 1.0 assessing how relevant the generated answer is to the question and how completely it answers it.

Scoring rubric used in the LLM prompt:

  • 0.0 means that the answer is completely irrelevant to the question.
  • 0.5 means that the answer is partially relevant to the question or it only partially answers the question.
  • 1.0 means that the answer is relevant to the question and completely answers the question.
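To make the rubric concrete, here is a minimal sketch of how an LLM judge could apply it. The prompt wording and the `call_llm` helper are hypothetical illustrations, not Keywords AI's actual implementation; `call_llm` stands in for whatever client you use to query your chosen model.

```python
# Hypothetical sketch of an LLM-judge scorer for answer relevance.
# The prompt text mirrors the rubric above; it is not the exact
# prompt Keywords AI uses internally.

RUBRIC_PROMPT = """You are grading answer relevance on a 0.0-1.0 scale:
- 0.0: the answer is completely irrelevant to the question.
- 0.5: the answer is partially relevant or only partially answers the question.
- 1.0: the answer is relevant and completely answers the question.

Question: {question}
Answer: {answer}

Respond with only the numeric score."""


def answer_relevance(question: str, answer: str, call_llm) -> float:
    """Score an answer's relevance to a question with an LLM judge.

    `call_llm` is any callable that sends a prompt string to your
    chosen model and returns its text completion.
    """
    reply = call_llm(RUBRIC_PROMPT.format(question=question, answer=answer))
    score = float(reply.strip())
    return min(max(score, 0.0), 1.0)  # clamp to the rubric's range
```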

Settings and parameters

  1. Go to Keywords AI (at the top of the left nav bar) > Evaluation > Text generation > Answer Relevance
  2. Click on the Answer Relevance card to configure the setting:
  • Click the enable switch to turn on the evaluation
  • Pick which LLM model you want to use for the evaluation
  3. Make an API call, and the evaluation will run based on the Random sampling setting (see the example request after these steps).
  4. Check the results in the requests log.
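For step 3, any request routed through Keywords AI can trigger the evaluation. The sketch below assumes the OpenAI-compatible chat completions endpoint; the URL, model name, and placeholder key are illustrative, so check the Keywords AI API reference for the exact values for your account.

```python
# Illustrative request for step 3. The endpoint URL and model name
# are assumptions; replace YOUR_KEYWORDSAI_API_KEY with a real key.
import requests

response = requests.post(
    "https://api.keywordsai.co/api/chat/completions",  # assumed endpoint
    headers={"Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY"},
    json={
        "model": "gpt-4o-mini",  # any model you route through Keywords AI
        "messages": [
            {"role": "user", "content": "What does Answer Relevance measure?"},
        ],
    },
)
print(response.json())
```

If the request is picked up by the sampling setting, the Answer Relevance score appears alongside that request in the requests log (step 4).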