Create Evaluator

POST /api/evaluators/

Creates a new evaluator for your organization. Evaluators can be of three types: LLM-based, human-based, or code-based.

Authentication

Requires API key authentication. Include your API key in the request headers:
Authorization: Api-Key YOUR_API_KEY

Request Body

Required Fields

| Field | Type | Description |
|---|---|---|
| name | string | Display name for the evaluator |
| evaluator_slug | string | Unique identifier (lowercase, underscores allowed) |
| type | string | Evaluator type: "llm", "human", or "code" |
| score_value_type | string | Score type: "numerical", "boolean", "categorical", or "comment" |
| description | string | Description of what the evaluator does |
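
For reference, a minimal payload containing only the required fields might look like this (values are illustrative; depending on the type, a configurations object may also be needed, as described under Configuration by Type below):

{
  "name": "My Evaluator",
  "evaluator_slug": "my_evaluator",
  "type": "llm",
  "score_value_type": "numerical",
  "description": "What this evaluator checks"
}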

Optional Fields

| Field | Type | Description |
|---|---|---|
| eval_class | string | Pre-built template class |
| configurations | object | Type-specific configuration settings |
| categorical_choices | array | Choices for categorical evaluators |
| custom_required_fields | array | Additional required fields |
| tags | array | Tags for organization |

Configuration by Type

LLM Evaluators (type: "llm")

{
  "configurations": {
    "evaluator_definition": "Evaluation prompt with {{llm_input}} and {{llm_output}} variables",
    "scoring_rubric": "Description of scoring criteria",
    "llm_engine": "gpt-4o-mini",
    "model_options": {
      "temperature": 0.1,
      "max_tokens": 200
    },
    "min_score": 1.0,
    "max_score": 5.0,
    "passing_score": 3.0
  }
}

Code Evaluators (type: "code")

{
  "configurations": {
    "eval_code_snippet": "def evaluate(llm_input, llm_output, **kwargs):\n    # Your evaluation logic\n    return score"
  }
}
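
The snippet runs server-side and its exact runtime is not documented here, but you can sanity-check the function logic locally before creating the evaluator. A minimal sketch, assuming the platform invokes it as evaluate(llm_input, llm_output, **kwargs):

# Assumption: the platform calls evaluate(llm_input, llm_output, **kwargs);
# this local harness only verifies the function runs and returns a score.
snippet = (
    "def evaluate(llm_input, llm_output, **kwargs):\n"
    "    # Your evaluation logic\n"
    "    return len(llm_output.split())"
)
namespace = {}
exec(snippet, namespace)
print(namespace["evaluate"]("prompt", "a short test response"))  # 4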

Human Evaluators (type: "human")

For categorical human evaluators:
{
  "categorical_choices": [
    { "name": "Excellent", "value": 5 },
    { "name": "Good", "value": 4 },
    { "name": "Average", "value": 3 },
    { "name": "Poor", "value": 2 },
    { "name": "Very Poor", "value": 1 }
  ]
}

Examples

LLM Evaluator

import requests

url = "https://api.keywordsai.co/api/evaluators/"
headers = {
    "Authorization": "Api-Key YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "name": "Response Quality Evaluator",
    "evaluator_slug": "response_quality_v1",
    "type": "llm",
    "score_value_type": "numerical",
    "description": "Evaluates response quality on a 1-5 scale",
    "configurations": {
        "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
        "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
        "llm_engine": "gpt-4o-mini",
        "min_score": 1.0,
        "max_score": 5.0,
        "passing_score": 3.0
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
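
On success the API returns 201 with the full evaluator object, including its server-assigned id (see Response below). A minimal sketch of capturing it for later use:

if response.status_code == 201:
    evaluator = response.json()
    evaluator_id = evaluator["id"]  # reference the evaluator by this id later
else:
    print(response.status_code, response.json())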

Human Categorical Evaluator

data = {
    "name": "Content Quality Assessment",
    "evaluator_slug": "content_quality_categorical",
    "type": "human",
    "score_value_type": "categorical",
    "description": "Human assessment of content quality",
    "categorical_choices": [
        { "name": "Excellent", "value": 5 },
        { "name": "Good", "value": 4 },
        { "name": "Average", "value": 3 },
        { "name": "Poor", "value": 2 },
        { "name": "Very Poor", "value": 1 }
    ]
}

Code Evaluator

data = {
    "name": "Custom Length Check",
    "evaluator_slug": "custom_length_check",
    "type": "code",
    "score_value_type": "numerical",
    "description": "Evaluates response length appropriateness",
    "configurations": {
        "eval_code_snippet": "def evaluate(llm_input, llm_output, **kwargs):\n    length = len(llm_output.split())\n    if length < 10:\n        return 1\n    elif length < 50:\n        return 3\n    else:\n        return 5"
    }
}
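
Both the human and code evaluator payloads above are sent to the same endpoint as the LLM example, reusing the url and headers defined there:

response = requests.post(url, headers=headers, json=data)
print(response.json())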

Response

Status: 201 Created
{
  "id": "0f4325f9-55ef-4c20-8abe-376694419947",
  "name": "Response Quality Evaluator",
  "evaluator_slug": "response_quality_v1",
  "type": "llm",
  "score_value_type": "numerical",
  "eval_class": "",
  "description": "Evaluates response quality on a 1-5 scale",
  "configurations": {
    "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
    "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
    "llm_engine": "gpt-4o-mini",
    "model_options": {
      "temperature": 0.1,
      "max_tokens": 200
    },
    "min_score": 1.0,
    "max_score": 5.0,
    "passing_score": 3.0
  },
  "created_by": {
    "first_name": "Keywords AI",
    "last_name": "Team",
    "email": "admin@keywordsai.co"
  },
  "updated_by": {
    "first_name": "Keywords AI",
    "last_name": "Team",
    "email": "admin@keywordsai.co"
  },
  "created_at": "2025-09-11T09:43:55.858321Z",
  "updated_at": "2025-09-11T09:43:55.858331Z",
  "custom_required_fields": [],
  "categorical_choices": null,
  "starred": false,
  "tags": []
}

Error Responses

400 Bad Request

{
  "configurations": [
    "Configuration validation failed: 1 validation error for KeywordsAICustomLLMEvaluatorType\nscoring_rubric\n  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]"
  ]
}

401 Unauthorized

{
  "detail": "Your API key is invalid or expired, please check your API key at https://platform.keywordsai.co/platform/api/api-keys"
}
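
A minimal client-side pattern for handling these errors (shapes inferred from the examples above: 400 responses key error lists by the offending field, 401 responses carry a detail message):

response = requests.post(url, headers=headers, json=data)
if response.status_code == 400:
    # Validation errors are keyed by field name
    for field, errors in response.json().items():
        print(f"{field}: {errors}")
elif response.status_code == 401:
    raise RuntimeError(response.json()["detail"])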