A spreadsheet-style editor for running prompts and models across multiple test cases. Import testsets to easily test, evaluate, and optimize your LLM outputs.

Prerequisites

  • You have already created at least one prompt in Keywords AI. Learn how to create a prompt here.
  • You have added variables to your prompt. Learn how to add variables to your prompt here.

Setup

1. Create a prompt with variables

Use `{{variable}}` syntax to define variables in your prompt, configure the prompt in the side panel, and commit the prompt version.
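To make the templating concrete, here is a minimal sketch of how `{{variable}}` placeholders get filled in with testcase values at run time. This is only an illustration of the pattern; `render_prompt` is a hypothetical helper, not part of Keywords AI.

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the matching value.

    Unknown placeholders are left untouched, so a missing variable
    is easy to spot while testing.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

template = "Summarize the following text in a {{tone}} tone:\n{{text}}"
print(render_prompt(template, {"tone": "formal", "text": "LLMs are neat."}))
```

Each testcase you later add to an experiment is just one such `variables` dictionary.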

2. Create a testset

There are two ways to create a testset.

3. Create an experiment

Create an experiment by clicking the + NEW experiment button on the Experiments page, then select the prompt and the versions you want to test.
Next, add testcases to your experiment: you can either add them manually or import a testset from Testsets.
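Conceptually, an experiment is a grid: every testcase (a set of variable values) is rendered against every prompt version and sent to the model. A rough sketch of that cross-product, with hypothetical versions and testcases and naive string replacement standing in for the real rendering:

```python
def fill(template: str, variables: dict) -> str:
    # Naive placeholder substitution, for illustration only.
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

versions = {  # hypothetical prompt versions under test
    "v1": "Summarize in a {{tone}} tone: {{text}}",
    "v2": "Rewrite the text below in a {{tone}} register:\n{{text}}",
}
testcases = [  # hypothetical testcases, e.g. imported from a testset
    {"tone": "formal", "text": "LLMs are neat."},
    {"tone": "casual", "text": "Vectors store embeddings."},
]

# One rendered prompt per (version, testcase) cell of the experiment grid.
grid = {
    (name, i): fill(template, tc)
    for name, template in versions.items()
    for i, tc in enumerate(testcases)
}
print(len(grid))  # 2 versions x 2 testcases = 4 cells
```

Running the experiment fills in the remaining dimension: the model's output for each cell, which is what you compare in the next step.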

4. Compare the results

When the experiment finishes, scroll to the bottom of the page to compare the results side by side and see which prompt version performs best.

5. Use LLM to evaluate the results

You can also use an LLM to evaluate the results. Check out LLM-as-judge to learn how to create LLM evaluators. Once you have created them, run them against the experiment outputs to score the results.
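At its core, an LLM evaluator wraps each experiment output in a judge prompt with a scoring rubric. A hedged sketch of that idea; `JUDGE_TEMPLATE` and `build_judge_prompt` are illustrative names, not part of the Keywords AI API:

```python
JUDGE_TEMPLATE = (
    "You are a strict evaluator. Score the response from 1 (poor) to 5 "
    "(excellent) for relevance and accuracy.\n\n"
    "Prompt:\n{prompt}\n\nResponse:\n{response}\n\n"
    "Reply with only the integer score."
)

def build_judge_prompt(prompt: str, response: str) -> str:
    """Wrap one experiment output in the scoring rubric for the judge model."""
    return JUDGE_TEMPLATE.format(prompt=prompt, response=response)

print(build_judge_prompt("Summarize X.", "X is a placeholder topic."))
```

The judge model's numeric replies can then be aggregated per prompt version to rank them, which is what an LLM evaluator automates for you.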