
What is Experiments?

Experiments lets you run repeatable evaluations over a dataset and inspect results as traces/logs (outputs, evaluator scores, costs, latency). The main decision is which workflow type you want:
  • Prompt workflow: render a saved prompt template with dataset variables, then run LLM calls automatically
  • Completion workflow: run direct LLM completions on dataset messages automatically (no prompt templates)
  • Custom workflow: you fetch inputs, run your own code/model, and submit outputs back for evaluator execution
This page is a tutorial. For full endpoint specs, jump to the linked API reference pages inside each step.

Steps to use

Step 1: Prepare a dataset with variables

Dataset entries must include the variables your prompt template expects (for example: name, issue, and order_id). Reference: Datasets: create / add logs
Your dataset log input should be an object with keys that match your prompt version variables schema:
{
  "input": {
    "name": "John Doe",
    "issue": "Damaged product",
    "order_id": "ORD-12345"
  }
}
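A minimal sketch of adding such an entry over HTTP is shown below. It assumes a dataset-logs endpoint of the form /api/v2/datasets/{dataset_id}/logs/, which is an illustrative guess; confirm the exact path and payload shape in the Datasets API reference.
import requests

API_KEY = "YOUR_API_KEY"
DATASET_ID = "YOUR_DATASET_ID"
# NOTE: this endpoint path is an assumption for illustration only;
# check the Datasets: create / add logs reference for the real one.
url = f"https://api.keywordsai.co/api/v2/datasets/{DATASET_ID}/logs/"

# Keys under "input" must match the prompt version's variables schema
payload = {
  "input": {
    "name": "John Doe",
    "issue": "Damaged product",
    "order_id": "ORD-12345"
  }
}

res = requests.post(url, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
res.raise_for_status()
print(res.json())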
Step 2: Create a prompt + deploy a version

Create a prompt template, create a version (with variables), and deploy it so it can be used in experiments. Reference: Prompts API
Prompt workflows need a readonly, deployed version so the experiment run is reproducible. A common pattern, sketched below, is:
  • Create version 1
  • Create version 2 (locks version 1 as readonly)
  • Deploy version 1
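The flow above can be scripted roughly as follows; the prompt endpoint paths, payload fields, response keys, and template-variable syntax are illustrative assumptions, so verify them against the Prompts API reference.
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.keywordsai.co/api/v2"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# NOTE: the paths, fields, and {{variable}} syntax below are illustrative
# assumptions; confirm them in the Prompts API reference.

# Create the prompt template
prompt = requests.post(f"{BASE}/prompts/", headers=HEADERS, json={"name": "Support Reply"})
prompt.raise_for_status()
prompt_id = prompt.json()["id"]  # assumed response key

version_payload = {
  "messages": [
    {"role": "user",
     "content": "Customer {{name}} reports: {{issue}} (order {{order_id}})"}
  ],
  "variables": ["name", "issue", "order_id"]
}

# Create version 1, then version 2 (creating version 2 locks version 1 as readonly)
for _ in range(2):
    requests.post(f"{BASE}/prompts/{prompt_id}/versions/",
                  headers=HEADERS, json=version_payload).raise_for_status()

# Deploy version 1 so the experiment references a reproducible, readonly version
requests.post(f"{BASE}/prompts/{prompt_id}/versions/1/deploy/",
              headers=HEADERS).raise_for_status()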
Step 3: Create the experiment (type = prompt)

Create an experiment that references your dataset_id and prompt_id. Reference: Create experiment
{
  "type": "prompt",
  "config": {
    "prompt_id": "YOUR_PROMPT_ID"
  }
}
import requests

API_KEY = "YOUR_API_KEY"
url = "https://api.keywordsai.co/api/v2/experiments/"

# One prompt workflow referencing the deployed prompt; the listed evaluators
# run against each generated output.
payload = {
  "name": "Prompt Workflow Experiment",
  "dataset_id": "YOUR_DATASET_ID",
  "workflows": [{"type": "prompt", "config": {"prompt_id": "YOUR_PROMPT_ID"}}],
  "evaluator_slugs": ["your_evaluator_slug"]
}

res = requests.post(url, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
res.raise_for_status()
print(res.json())
Step 4: List results and inspect the span tree

After the job runs, list logs for the experiment and fetch the detail view for full span trees (workflow execution + evaluator spans); a retrieval sketch follows the list below. Reference: List experiment logs
Each experiment trace typically contains:
  • A root workflow span for the experiment trace
  • Child spans for prompt loading/rendering and the LLM call
  • Evaluator spans named like evaluator.{slug} with score outputs
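A minimal retrieval sketch, assuming a list endpoint under the experiment and a per-log detail endpoint; both paths and the response keys are illustrative, so confirm them in the List experiment logs reference.
import requests

API_KEY = "YOUR_API_KEY"
EXPERIMENT_ID = "YOUR_EXPERIMENT_ID"
BASE = "https://api.keywordsai.co/api/v2"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# NOTE: the list/detail paths and response keys below are assumptions;
# confirm them in the List experiment logs reference.
logs = requests.get(f"{BASE}/experiments/{EXPERIMENT_ID}/logs/", headers=HEADERS)
logs.raise_for_status()

for log in logs.json().get("results", []):  # assumed pagination key
    # The detail view returns the full, untruncated span tree
    detail = requests.get(f"{BASE}/logs/{log['id']}/", headers=HEADERS)
    detail.raise_for_status()
    print(detail.json())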

Troubleshooting

  • The experiment may still be processing in the background (wait 5–10 seconds and retry)
  • Your dataset might be empty
  • Confirm your evaluator slug exists and is accessible to your org
  • Evaluators can run asynchronously; poll the log detail endpoint after submission (see the polling sketch below)
Use the “get log” endpoint (detail) to retrieve the full span tree and untruncated fields.
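If evaluator scores have not shown up yet, a simple polling loop against the log detail endpoint is usually enough. The detail path and the spans response key are assumptions for illustration; the evaluator.{slug} naming comes from the step above. Adjust to the actual log detail reference.
import time
import requests

API_KEY = "YOUR_API_KEY"
LOG_ID = "YOUR_LOG_ID"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# NOTE: the detail path and "spans" key are assumptions; confirm them
# against the get-log (detail) reference.
url = f"https://api.keywordsai.co/api/v2/logs/{LOG_ID}/"

for attempt in range(10):
    res = requests.get(url, headers=HEADERS)
    res.raise_for_status()
    spans = res.json().get("spans", [])
    if any(s.get("name", "").startswith("evaluator.") for s in spans):
        print("Evaluator spans present:", [s.get("name") for s in spans])
        break
    time.sleep(5)  # the page suggests waiting 5-10 seconds between retries
else:
    print("No evaluator spans yet; the experiment may still be processing.")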