POST /api/logs/{log_id}/scores/
The Log Scores Management API lets you create, retrieve, update, and delete evaluation scores for specific logs. The API enforces that only one score exists per evaluator per log, keeping score management for your logged requests clean.

Overview

The scoring system ensures uniqueness by enforcing that each evaluator (identified by either evaluator_id or evaluator_slug) can only have one score per log. This prevents duplicate scores and maintains data integrity. Note: The response format follows the standard evaluation results schema, providing comprehensive metadata including type, environment, is_passed, and other evaluation context fields.

Key Concepts

Log ID

The unique identifier of the request log you want to score

Score ID

The unique identifier of a specific score (returned as id in responses)

Evaluator ID

The UUID of an evaluator created in Keywords AI platform (managed evaluators)

Evaluator Slug

A custom string identifier that you define for your own custom evaluators

Authentication

All endpoints require authentication via either:
  • JWT token: Authorization: Bearer <token>
  • API key: Authorization: Bearer <key>

Endpoints

1. Create a Score

Creates a new evaluation score for a specific log.

Request Body

You must provide either evaluator_id (for evaluators created in Keywords AI platform) or evaluator_slug (a custom string for your own evaluators), plus at least one score value.
  • evaluator_id (string): UUID of an evaluator created in Keywords AI. Either this or evaluator_slug must be provided.
  • evaluator_slug (string): Custom string identifier for your evaluator. Either this or evaluator_id must be provided.
  • numerical_value (number): Optional numerical score value.
  • string_value (string): Optional string score value.
  • boolean_value (boolean): Optional boolean score value.
{
  "evaluator_slug": "my_custom_evaluator",
  "numerical_value": 4.5,
  "string_value": "Good response quality",
  "boolean_value": true
}
{
  "id": "eval_result_unique_id",
  "created_at": "2024-01-15T10:30:00Z",
  "type": "llm",
  "environment": "test",
  "numerical_value": 4.5,
  "string_value": "Good response quality",
  "boolean_value": true,
  "is_passed": false,
  "cost": 0.0,
  "evaluator_id": null,
  "evaluator_slug": "my_custom_evaluator",
  "log_id": null,
  "dataset_id": null
}
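As a sketch, creating a score from Python might look like the following. The collection path and payload fields come from this section; the base URL placeholder, the KEYWORDSAI_API_KEY environment variable, and the helper names are assumptions, not part of the documented API.

```python
import json
import os
import urllib.request

# Assumptions: the base URL and env var name are placeholders, not from
# this document; substitute your deployment's values.
BASE_URL = "https://your-keywordsai-host"
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def scores_url(log_id: str) -> str:
    # Collection path from this section: POST /api/logs/{log_id}/scores/
    return f"{BASE_URL}/api/logs/{log_id}/scores/"

def create_score(log_id: str, payload: dict) -> dict:
    # POST a new score; the payload needs evaluator_id or evaluator_slug
    # plus at least one of numerical_value / string_value / boolean_value.
    req = urllib.request.Request(
        scores_url(log_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

You would then call, for example, create_score("my-log-id", {"evaluator_slug": "my_custom_evaluator", "numerical_value": 4.5}) and keep the returned id for later updates or deletion. A 409 response here means this evaluator has already scored this log.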

2. List Scores for a Log

Retrieves all scores associated with a specific log.
{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "eval_result_unique_id_1",
      "created_at": "2024-01-15T10:30:00Z",
      "type": "llm",
      "environment": "test",
      "numerical_value": 4.5,
      "string_value": "Good quality",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0,
      "evaluator_id": null,
      "evaluator_slug": "quality_evaluator",
      "log_id": "log_unique_id",
      "dataset_id": null
    },
    {
      "id": "eval_result_unique_id_2",
      "created_at": "2024-01-15T10:32:00Z",
      "type": "llm",
      "environment": "test",
      "numerical_value": 3.8,
      "string_value": "Mostly relevant",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0,
      "evaluator_id": null,
      "evaluator_slug": "relevance_evaluator",
      "log_id": "log_unique_id",
      "dataset_id": null
    }
  ]
}
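A minimal listing helper might look like this; it assumes a GET on the same collection path used for creation (the document states only the POST path), with the paginated response shape shown above. Indexing results by evaluator_slug is a convenience, not part of the API.

```python
import json
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def list_scores(log_id: str) -> list:
    # Assumption: listing is a GET on the collection path used for creation.
    url = f"{BASE_URL}/api/logs/{log_id}/scores/"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})
    with urllib.request.urlopen(req) as resp:
        page = json.load(resp)
    # The response is paginated: count, next, previous, results.
    return page["results"]

def scores_by_evaluator(results: list) -> dict:
    # Index the results by evaluator_slug for quick lookup.
    return {score["evaluator_slug"]: score for score in results}
```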

3. Retrieve a Specific Score

Gets detailed information about a specific score.
{
  "id": "eval_result_unique_id",
  "created_at": "2024-01-15T10:30:00Z",
  "type": "llm",
  "environment": "test",
  "numerical_value": 4.5,
  "string_value": "Good quality",
  "boolean_value": true,
  "is_passed": false,
  "cost": 0.0,
  "evaluator_id": null,
  "evaluator_slug": "quality_evaluator",
  "log_id": "log_unique_id",
  "dataset_id": null
}
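A retrieval sketch, assuming the score id (the id field from create/list responses) nests under the collection path; the document does not spell out this detail path, so treat it as an assumption to verify against your deployment.

```python
import json
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def score_detail_url(log_id: str, score_id: str) -> str:
    # Assumption: the score id nests under the collection path.
    return f"{BASE_URL}/api/logs/{log_id}/scores/{score_id}/"

def get_score(log_id: str, score_id: str) -> dict:
    req = urllib.request.Request(
        score_detail_url(log_id, score_id),
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```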

4. Update a Score

Updates an existing score. You can update any combination of score values.
  • numerical_value (number): Optional updated numerical score value.
  • string_value (string): Optional updated string score value.
  • boolean_value (boolean): Optional updated boolean score value.
{
  "numerical_value": 4.8,
  "string_value": "Excellent quality"
}
{
  "id": "eval_result_unique_id",
  "created_at": "2024-01-15T10:30:00Z",
  "type": "llm",
  "environment": "test",
  "numerical_value": 4.8,
  "string_value": "Excellent quality",
  "boolean_value": true,
  "is_passed": false,
  "cost": 0.0,
  "evaluator_id": null,
  "evaluator_slug": "quality_evaluator",
  "log_id": "log_unique_id",
  "dataset_id": null
}
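An update sketch, assuming a PATCH on the score detail path (the document names the method semantics via the status-code table but not the path; both the path and the payload-builder helper are assumptions). The sentinel keeps an explicit boolean_value=False from being dropped.

```python
import json
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

_UNSET = object()  # sentinel so an explicit False/0 is still sent

def build_update_payload(numerical_value=_UNSET, string_value=_UNSET,
                         boolean_value=_UNSET) -> dict:
    # Include only the fields you actually want to change; a PATCH
    # leaves omitted fields untouched.
    fields = {
        "numerical_value": numerical_value,
        "string_value": string_value,
        "boolean_value": boolean_value,
    }
    return {name: value for name, value in fields.items() if value is not _UNSET}

def update_score(log_id: str, score_id: str, changes: dict) -> dict:
    # Assumption: PATCH on the score detail path.
    url = f"{BASE_URL}/api/logs/{log_id}/scores/{score_id}/"
    req = urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```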

5. Delete a Score

Removes a score from a log.
No response body is returned for successful deletions (204 No Content).
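A deletion sketch, again assuming a DELETE on the score detail path (an assumption; the document states only the create path):

```python
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def score_detail_url(log_id: str, score_id: str) -> str:
    # Assumption: the score id nests under the collection path.
    return f"{BASE_URL}/api/logs/{log_id}/scores/{score_id}/"

def delete_score(log_id: str, score_id: str) -> bool:
    # Success returns 204 No Content with an empty body.
    req = urllib.request.Request(
        score_detail_url(log_id, score_id),
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="DELETE",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 204
```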

Usage Examples

Log Enrichment

When you retrieve log details, scores are automatically included in the response under a scores field:
{
  "id": "my-log-id",
  "model": "gpt-4",
  "prompt_tokens": 150,
  "completion_tokens": 75,
  "scores": {
    "response_quality": {
      "evaluator_slug": "response_quality",
      "numerical_value": 4.2,
      "string_value": "Good response",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0
    },
    "relevance_check": {
      "evaluator_slug": "relevance_check",
      "numerical_value": 3.8,
      "string_value": "Mostly relevant",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0
    }
  }
}
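Because the scores field is keyed by evaluator slug, reading scores out of an enriched log payload is a plain dictionary walk; a small helper (the function name is ours, the payload shape is from the example above) might look like:

```python
def numeric_scores(log: dict) -> dict:
    # Pull each evaluator's numerical_value from an enriched log payload;
    # evaluators without a numerical score are skipped.
    return {
        slug: score["numerical_value"]
        for slug, score in log.get("scores", {}).items()
        if score.get("numerical_value") is not None
    }
```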

Best Practices

Use descriptive evaluator slugs

Make them meaningful and consistent across your application

Handle uniqueness errors

Always check for 409 Conflict responses when creating scores

Validate score types

Ensure you’re using the appropriate score type (numerical, string, boolean) for your use case

Batch operations carefully

Since these are manual operations (max ~50 calls/sec), avoid overwhelming the API

Store score IDs

Keep track of score IDs if you need to update or delete them later

Error Handling

The API returns standard HTTP status codes:
Status Code        Description
200 OK             Successful GET/PATCH requests
201 Created        Successful POST requests
204 No Content     Successful DELETE requests
400 Bad Request    Invalid request data
401 Unauthorized   Invalid or missing authentication
404 Not Found      Log or score not found
409 Conflict       Score already exists for this evaluator in this log
Always check the response status code and handle errors appropriately in your application.
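One way to centralize that handling is a small lookup over the documented codes; this dispatch helper is a sketch of ours, not part of the API, and the message wording simply restates the table above.

```python
# The documented status codes, mapped to short explanations your client
# can log or raise on. Codes outside the table fall through to a default.
STATUS_MESSAGES = {
    200: "OK: successful GET/PATCH request",
    201: "Created: score created",
    204: "No Content: score deleted",
    400: "Bad Request: invalid request data",
    401: "Unauthorized: invalid or missing authentication",
    404: "Not Found: log or score not found",
    409: "Conflict: a score from this evaluator already exists on this log",
}

def describe_status(code: int) -> str:
    return STATUS_MESSAGES.get(code, f"Unexpected status {code}")
```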