What is a dataset?

A dataset is a curated collection of logs (inputs/outputs plus metadata) that you can evaluate, annotate, and use to power Experiments. For the raw endpoint specs, see the Datasets API reference. This page covers the end-to-end workflow.

When to use “datasets via API”

  • Automated evaluation pipelines: programmatically create a dataset per release or per prompt version
  • Curated test cases: write your own input/output JSON and store it as dataset logs
  • Sampling production logs: build datasets from existing request logs using filters and sampling

Steps to use

Prerequisites

  • API key: send it on every request as the header Authorization: Bearer YOUR_API_KEY
  • Base URL: https://api.keywordsai.co
If you're starting from scratch, the easiest path is to create an empty dataset, then POST dataset logs to it in the unified format (input, output, metadata, metrics).
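
Every call on this page sends the same bearer header to the same base URL. If you'd rather not repeat that in each script, a small helper can build the requests for you. This is a minimal sketch that uses the standard library's `urllib.request` instead of `requests`; the helper names (`build_request`, `send`) and the `KEYWORDSAI_API_KEY` environment variable are hypothetical conventions, not part of the API.

```python
import json
import os
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.keywordsai.co"


def build_request(method: str, path: str, api_key: str,
                  payload: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the API."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # JSON-encode the body, like the json= argument in requests.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode JSON; returns None for empty bodies such as a 204."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, similar to raise_for_status().
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body) if body else None


if __name__ == "__main__":
    # Hypothetical env var name; use whatever your secrets setup provides.
    api_key = os.environ.get("KEYWORDSAI_API_KEY", "YOUR_API_KEY")
    req = build_request("POST", "/api/datasets/", api_key,
                        {"name": "Demo Dataset (via API)", "is_empty": True})
    print(req.get_method(), req.full_url)  # nothing is sent until you call send(req)
```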

Step 1: Create an empty dataset

Create an empty dataset so you can add logs manually.
Reference: Create dataset
import requests

API_KEY = "YOUR_API_KEY"

url = "https://api.keywordsai.co/api/datasets/"
payload = {
  "name": "Demo Dataset (via API)",
  "description": "Created from docs tutorial",
  "is_empty": True
}

res = requests.post(url, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
res.raise_for_status()
dataset = res.json()
print("dataset_id:", dataset["id"])
You’ll use the returned id as dataset_id in the next steps.
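
In an automated evaluation pipeline you typically create one dataset per release or prompt version. The sketch below builds the same create-dataset payload with a name derived from a release tag and prompt version, then pulls `id` out of the JSON response so later steps can use it. The naming scheme and the helper names are hypothetical; only the `name`, `description`, `is_empty`, and `id` fields come from the example above.

```python
import json
from typing import Any, Dict


def release_dataset_payload(release: str, prompt_version: str) -> Dict[str, Any]:
    """Create-dataset body for one release/prompt-version pair (naming is illustrative)."""
    return {
        "name": f"eval-{release}-{prompt_version}",
        "description": f"Regression set for release {release}, prompt {prompt_version}",
        "is_empty": True,  # start empty, then add logs in Step 2
    }


def extract_dataset_id(response_body: bytes) -> str:
    """Return the dataset id from a create-dataset response, failing loudly if it is missing."""
    data = json.loads(response_body)
    dataset_id = data.get("id")
    if not dataset_id:
        raise ValueError(f"No 'id' in create-dataset response: {data!r}")
    return str(dataset_id)


if __name__ == "__main__":
    print(release_dataset_payload("v1.4.0", "p3")["name"])  # eval-v1.4.0-p3
    print(extract_dataset_id(b'{"id": "abc123", "name": "demo"}'))  # abc123
```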

Step 2: Add a dataset log (your own input/output JSON)

Dataset logs store end-to-end workflow I/O. Both input and output can be any JSON.
Reference: Create dataset log
import requests

API_KEY = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"

url = f"https://api.keywordsai.co/api/datasets/{dataset_id}/logs/"
payload = {
  "input": {
    "question": "What is 2+2?",
    "context": {"source": "docs_tutorial"}
  },
  "output": {
    "answer": "4",
    "explanation": "2 + 2 = 4."
  },
  "metadata": {
    "custom_identifier": "dataset-tutorial-log",
    "model": "gpt-4o-mini"
  },
  "metrics": {"cost": 0.0, "latency": 0.0}
}

res = requests.post(url, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
res.raise_for_status()
print(res.json())
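
Curated test cases usually live in a file rather than inline. This sketch parses JSON Lines (one `{"input": ..., "output": ...}` object per line) into payloads with the same input/output/metadata/metrics shape as above, ready to POST one at a time to the `/logs/` endpoint. The JSONL layout, the `case-N` identifiers, and the zeroed metrics are assumptions for illustration, and the sketch does not assume any bulk-create endpoint exists.

```python
import json
from typing import Any, Dict, List


def payloads_from_jsonl(text: str, model: str) -> List[Dict[str, Any]]:
    """Turn JSONL test cases into dataset-log payloads (input/output/metadata/metrics)."""
    payloads = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        case = json.loads(line)
        if "input" not in case or "output" not in case:
            raise ValueError(f"line {line_no}: each case needs 'input' and 'output'")
        payloads.append({
            "input": case["input"],
            "output": case["output"],
            # Identifier derived from the line number (hypothetical convention).
            "metadata": {"custom_identifier": f"case-{line_no}", "model": model},
            "metrics": {"cost": 0.0, "latency": 0.0},
        })
    return payloads


if __name__ == "__main__":
    cases = (
        '{"input": {"question": "What is 2+2?"}, "output": {"answer": "4"}}\n'
        '{"input": {"question": "Capital of France?"}, "output": {"answer": "Paris"}}\n'
    )
    for p in payloads_from_jsonl(cases, model="gpt-4o-mini"):
        # POST each payload to f"https://api.keywordsai.co/api/datasets/{dataset_id}/logs/"
        print(p["metadata"]["custom_identifier"], p["output"])
```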

Step 3: List dataset logs to verify

Fetch the logs inside the dataset. Results are paginated; use the page and page_size query parameters to step through them.
Reference: List logs (GET)
import requests

API_KEY = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"

url = f"https://api.keywordsai.co/api/datasets/{dataset_id}/logs/list/?page=1&page_size=10"
res = requests.get(url, headers={"Authorization": f"Bearer {API_KEY}"})
res.raise_for_status()
print(res.json())
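
To collect every log rather than just the first page, keep requesting pages until one comes back short. This is a sketch: the `page` and `page_size` query parameters come from the example above, but the response shape isn't described on this page, so the page fetcher is passed in as a function for you to adapt (for example, if the list sits under a key such as `results`, which is an assumption). `list_logs_url` and `iter_all_logs` are hypothetical helper names.

```python
from typing import Any, Callable, Dict, Iterator, List
from urllib.parse import urlencode

BASE_URL = "https://api.keywordsai.co"


def list_logs_url(dataset_id: str, page: int, page_size: int) -> str:
    """URL for one page of dataset logs, using the query params from Step 3."""
    query = urlencode({"page": page, "page_size": page_size})
    return f"{BASE_URL}/api/datasets/{dataset_id}/logs/list/?{query}"


def iter_all_logs(fetch_page: Callable[[int, int], List[Dict[str, Any]]],
                  page_size: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield logs page by page until a page returns fewer than page_size items."""
    page = 1
    while True:
        logs = fetch_page(page, page_size)
        yield from logs
        if len(logs) < page_size:
            break  # a short (or empty) page means there is nothing left
        page += 1


if __name__ == "__main__":
    fake_store = [{"id": i} for i in range(7)]  # stand-in for the API

    def fake_fetch(page: int, page_size: int) -> List[Dict[str, Any]]:
        start = (page - 1) * page_size
        return fake_store[start:start + page_size]

    print(list_logs_url("YOUR_DATASET_ID", 1, 10))
    print(len(list(iter_all_logs(fake_fetch, page_size=3))))  # 7
```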

Step 4: Run an eval on the dataset

Run one or more evaluators over all logs in the dataset.
Reference: Run eval on dataset
import requests

API_KEY = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"

url = f"https://api.keywordsai.co/api/datasets/{dataset_id}/eval-reports/create"
payload = {"evaluator_slugs": ["char_count_eval"]}

res = requests.post(url, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
res.raise_for_status()
print(res.json())
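
Because `evaluator_slugs` is a list, one request can run several evaluators. When the slugs come from config, a small guard keeps the payload clean: it drops blanks and duplicates (keeping order) and refuses an empty list. Only `char_count_eval` appears on this page. Any other slug, such as the `my_relevance_eval` placeholder below, is hypothetical and must match an evaluator that actually exists in your account.

```python
from typing import Dict, Iterable, List


def eval_payload(slugs: Iterable[str]) -> Dict[str, List[str]]:
    """Build the run-eval body, de-duplicating slugs while preserving order."""
    cleaned = [s.strip() for s in slugs if s and s.strip()]
    unique = list(dict.fromkeys(cleaned))  # dict keys keep first-seen order
    if not unique:
        raise ValueError("Provide at least one evaluator slug")
    return {"evaluator_slugs": unique}


if __name__ == "__main__":
    print(eval_payload(["char_count_eval", "my_relevance_eval", "char_count_eval"]))
    # {'evaluator_slugs': ['char_count_eval', 'my_relevance_eval']}
```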

Step 5: List eval runs for the dataset

Use this to check run status and see report IDs.
Reference: List eval runs
import requests

API_KEY = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"

url = f"https://api.keywordsai.co/api/datasets/{dataset_id}/eval-reports/list/"
res = requests.get(url, headers={"Authorization": f"Bearer {API_KEY}"})
res.raise_for_status()
print(res.json())
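
An eval run may not finish immediately, so a pipeline usually polls the list endpoint until its run settles. The sketch below is a generic poll-with-timeout loop. This page doesn't specify how a run's status appears in the list response, so `get_status` and the `"completed"`/`"failed"` values in the usage are assumptions to replace with the real fields.

```python
import time
from typing import Callable, Collection


def wait_for_status(get_status: Callable[[], str],
                    terminal: Collection[str],
                    interval_s: float = 5.0,
                    timeout_s: float = 600.0) -> str:
    """Call get_status() every interval_s seconds until it returns a terminal value."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Eval run still {status!r} after {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Fake status sequence standing in for repeated GETs of the eval-reports list.
    statuses = iter(["pending", "running", "completed"])
    final = wait_for_status(lambda: next(statuses),
                            terminal={"completed", "failed"}, interval_s=0.01)
    print(final)  # completed
```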

Maintenance & cleanup (optional)

Update dataset metadata

Use PATCH to rename the dataset or update its description.
Reference: Update dataset (PATCH)
import requests

API_KEY = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"

url = f"https://api.keywordsai.co/api/datasets/{dataset_id}/"
payload = {"name": "Renamed dataset", "description": "Updated via API"}

res = requests.patch(url, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
res.raise_for_status()
print(res.json())
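
Since PATCH is a partial update, you can send only the fields you're changing. Here is a minimal sketch, assuming that omitted fields are left unchanged (standard PATCH semantics, not confirmed on this page). `patch_payload` is a hypothetical helper name.

```python
from typing import Dict, Optional


def patch_payload(name: Optional[str] = None,
                  description: Optional[str] = None) -> Dict[str, str]:
    """Include only the fields that were actually provided."""
    fields = {"name": name, "description": description}
    payload = {k: v for k, v in fields.items() if v is not None}
    if not payload:
        raise ValueError("Nothing to update")
    return payload


if __name__ == "__main__":
    print(patch_payload(description="Updated via API"))  # {'description': 'Updated via API'}
```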

Remove logs from a dataset (by filter or delete all)

Reference: Delete logs (filters / delete-all)
import requests

API_KEY = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"

url = f"https://api.keywordsai.co/api/datasets/{dataset_id}/logs/delete/"
payload = {"filters": {"metadata.custom_identifier": "dataset-tutorial-log"}}

res = requests.delete(url, headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
res.raise_for_status()
print(res.json())
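
The filter above targets a metadata field with a dotted key (`metadata.custom_identifier`). If you clean up by several metadata fields, a helper can build those keys consistently. This sketch assumes every metadata field is addressed with the same `metadata.<field>` form and that multiple keys in `filters` are combined; neither is confirmed here, so check the Delete logs reference first. Because deletion is irreversible and this page doesn't say what an empty filter does, the hypothetical `metadata_filter_payload` helper refuses to build one.

```python
from typing import Any, Dict


def metadata_filter_payload(**metadata: Any) -> Dict[str, Dict[str, Any]]:
    """Map metadata fields to dotted filter keys, e.g. custom_identifier -> metadata.custom_identifier."""
    if not metadata:
        # Behavior of an empty filter isn't documented here; fail instead of guessing.
        raise ValueError("Refusing to build an empty delete filter")
    return {"filters": {f"metadata.{key}": value for key, value in metadata.items()}}


if __name__ == "__main__":
    print(metadata_filter_payload(custom_identifier="dataset-tutorial-log"))
    # {'filters': {'metadata.custom_identifier': 'dataset-tutorial-log'}}
```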

Delete the dataset

Reference: Delete dataset
import requests

API_KEY = "YOUR_API_KEY"
dataset_id = "YOUR_DATASET_ID"

url = f"https://api.keywordsai.co/api/datasets/{dataset_id}/"
res = requests.delete(url, headers={"Authorization": f"Bearer {API_KEY}"})
res.raise_for_status()  # surface 4xx/5xx errors instead of silently printing them
print(res.status_code)  # typically 204 (No Content)
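
In CI you usually want the dataset removed even when an evaluation step fails. One way to guarantee that is a context manager that creates the dataset on entry and deletes it on exit. This sketch takes the create and delete calls as functions (for example, wrappers around the Step 1 POST and the DELETE above), so `temporary_dataset` and those callables are hypothetical names, not part of any SDK.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_dataset(create: Callable[[], str],
                      delete: Callable[[str], None]) -> Iterator[str]:
    """Yield a fresh dataset id and always delete the dataset afterwards."""
    dataset_id = create()
    try:
        yield dataset_id
    finally:
        delete(dataset_id)  # runs even if the body raises


if __name__ == "__main__":
    calls = []
    with temporary_dataset(lambda: "ds_123", lambda d: calls.append(("delete", d))) as ds:
        calls.append(("use", ds))
    print(calls)  # [('use', 'ds_123'), ('delete', 'ds_123')]
```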