Run Evaluator

POST /api/evaluators/{evaluator_id}/run/

Executes an evaluator against the provided input/output data, so you can test your evaluator configuration before using it in production.

Authentication

Requires API key authentication. Include your API key in the request headers:
Authorization: Api-Key YOUR_API_KEY
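
For example, with Python's requests library (the client used throughout the examples below), the header is passed on every call:

import requests

# The Authorization header carries the API key on every request
headers = {
    "Authorization": "Api-Key YOUR_API_KEY",
    "Content-Type": "application/json"
}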

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| evaluator_id | string | The unique ID of the evaluator to run |

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| llm_input | string | Yes | The input that was provided to the LLM |
| llm_output | string | Yes | The output generated by the LLM |
| extra_params | object | No | Additional parameters for evaluation context |
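
A minimal request body needs only the two required fields; extra_params can be omitted (values here are illustrative):

data = {
    "llm_input": "What is the capital of France?",
    "llm_output": "The capital of France is Paris."
}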

Examples

Test LLM Evaluator

import requests

evaluator_id = "0f4325f9-55ef-4c20-8abe-376694419947"
url = f"https://api.keywordsai.co/api/evaluators/{evaluator_id}/run/"
headers = {
    "Authorization": "Api-Key YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "llm_input": "What is the capital of France?",
    "llm_output": "The capital of France is Paris. Paris is located in the north-central part of France and is the country's largest city and political center.",
    "extra_params": {
        "context": "Geography question about European capitals",
        "user_id": "user123"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Test Code Evaluator

data = {
    "llm_input": "Write a short summary of machine learning.",
    "llm_output": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed.",
    "extra_params": {
        "topic": "AI/ML",
        "expected_length": "short"
    }
}

response = requests.post(url, headers=headers, json=data)

Test Human Evaluator (Simulation)

# For human evaluators, this endpoint simulates the evaluation process
data = {
    "llm_input": "Explain quantum computing in simple terms.",
    "llm_output": "Quantum computing uses quantum mechanics principles to process information in ways that classical computers cannot, potentially solving certain problems much faster.",
    "extra_params": {
        "complexity_level": "beginner",
        "target_audience": "general public"
    }
}

response = requests.post(url, headers=headers, json=data)

Response

Status: 200 OK

LLM Evaluator Response

{
  "id": "eval-result-abc123",
  "score": 4.5,
  "evaluation_result": "Excellent response with accurate and comprehensive information. The answer correctly identifies Paris as the capital and provides relevant additional context about its location and significance.",
  "evaluator_id": "0f4325f9-55ef-4c20-8abe-376694419947",
  "created_at": "2025-09-11T09:45:00.000000Z",
  "execution_time_ms": 1250,
  "tokens_used": {
    "input_tokens": 45,
    "output_tokens": 28
  }
}

Code Evaluator Response

{
  "id": "eval-result-def456",
  "score": 3,
  "evaluation_result": "Response length is appropriate for the request",
  "evaluator_id": "code-eval-456",
  "created_at": "2025-09-11T09:45:00.000000Z",
  "execution_time_ms": 45
}

Human Evaluator Response (Simulation)

{
  "id": "eval-result-ghi789",
  "score": null,
  "evaluation_result": "This evaluator requires human input. In production, a human evaluator would review this content and provide a score.",
  "evaluator_id": "human-eval-789",
  "created_at": "2025-09-11T09:45:00.000000Z",
  "execution_time_ms": 10,
  "requires_human_input": true
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for this evaluation result |
| score | number \| null | The evaluation score (null for human evaluators awaiting input) |
| evaluation_result | string | Detailed evaluation feedback or reasoning |
| evaluator_id | string | ID of the evaluator that was run |
| created_at | string | ISO timestamp of when the evaluation was performed |
| execution_time_ms | number | Time taken to execute the evaluation, in milliseconds |
| tokens_used | object | Token usage for LLM evaluators (input/output tokens) |
| requires_human_input | boolean | Whether this evaluation requires human input (human evaluators only) |
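
A sketch of how these fields might be consumed in Python, assuming the response shapes shown above:

result = response.json()

if result.get("requires_human_input"):
    # Human evaluators return a null score until a reviewer provides one
    print(f"Evaluation {result['id']} is awaiting human review")
else:
    print(f"Score: {result['score']} ({result['execution_time_ms']} ms)")
    print(f"Feedback: {result['evaluation_result']}")

# tokens_used is only present for LLM evaluators
tokens = result.get("tokens_used")
if tokens:
    print(f"Tokens: {tokens['input_tokens']} in / {tokens['output_tokens']} out")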

Use Cases

Testing Evaluator Configuration

# Test different inputs to validate evaluator behavior
test_cases = [
    {
        "llm_input": "What is 2+2?",
        "llm_output": "2+2 equals 4."
    },
    {
        "llm_input": "What is 2+2?",
        "llm_output": "I don't know."
    },
    {
        "llm_input": "What is 2+2?",
        "llm_output": "The answer to 2+2 is 4. This is a basic arithmetic operation where we add two numbers together."
    }
]

for i, test_case in enumerate(test_cases):
    response = requests.post(url, headers=headers, json=test_case)
    result = response.json()
    print(f"Test {i+1}: Score = {result['score']}, Feedback = {result['evaluation_result']}")

Debugging Evaluator Issues

# Test edge cases to identify potential issues
edge_cases = [
    {
        "llm_input": "",  # Empty input
        "llm_output": "I need more information to help you."
    },
    {
        "llm_input": "Very long question..." * 100,  # Very long input
        "llm_output": "Short answer."
    }
]

for i, edge_case in enumerate(edge_cases):
    response = requests.post(url, headers=headers, json=edge_case)
    print(f"Edge case {i+1}: {response.status_code} - {response.json()}")

Error Responses

400 Bad Request

{
  "detail": "llm_input and llm_output are required fields"
}

401 Unauthorized

{
  "detail": "Your API key is invalid or expired, please check your API key at https://platform.keywordsai.co/platform/api/api-keys"
}

404 Not Found

{
  "detail": "Not found."
}

500 Internal Server Error

{
  "detail": "Evaluator execution failed: [specific error message]"
}
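
When calling the endpoint from code, it is worth branching on the status code before reading the body. A minimal sketch, matching the error shapes above:

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    print(f"Score: {result['score']}")
elif response.status_code in (400, 401, 404):
    # Client-side problems: missing fields, an invalid API key, or an unknown evaluator_id
    print(f"Request failed ({response.status_code}): {response.json()['detail']}")
else:
    # 5xx: the evaluator itself failed to execute
    print(f"Server error ({response.status_code}): {response.text}")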