POST /api/logs/{log_id}/scores/
The Log Scores Management API lets you create, retrieve, update, and delete evaluation scores for specific logs. The API enforces that only one score exists per evaluator per log, keeping score management for your logged requests clean.

Overview

The scoring system ensures uniqueness by enforcing that each evaluator (identified by either evaluator_id or evaluator_slug) can only have one score per log. This prevents duplicate scores and maintains data integrity. Note: The response format follows the standard evaluation results schema, providing comprehensive metadata including type, environment, is_passed, and other evaluation context fields.

Key Concepts

Log ID

The unique identifier of the request log you want to score

Score ID

The unique identifier of a specific score (returned as id in responses)

Evaluator ID

The UUID of an evaluator created in Keywords AI platform (managed evaluators)

Evaluator Slug

A custom string identifier that you define for your own custom evaluators

Authentication

All endpoints require authentication via either:
  • JWT token: Authorization: Bearer <token>
  • API key: Authorization: Bearer <key>

Endpoints

1. Create a Score

Creates a new evaluation score for a specific log.

Request Body

You must provide either evaluator_id (for evaluators created in Keywords AI platform) or evaluator_slug (a custom string for your own evaluators), plus at least one score value.
  • evaluator_id (string): UUID of an evaluator created in Keywords AI. Either this or evaluator_slug must be provided.
  • evaluator_slug (string): Custom string identifier for your evaluator. Either this or evaluator_id must be provided.
  • numerical_value (number): Optional numerical score value.
  • string_value (string): Optional string score value.
  • boolean_value (boolean): Optional boolean score value.
{
  "evaluator_slug": "my_custom_evaluator",
  "numerical_value": 4.5,
  "string_value": "Good response quality",
  "boolean_value": true
}
{
  "id": "eval_result_unique_id",
  "created_at": "2024-01-15T10:30:00Z",
  "type": "llm",
  "environment": "test",
  "numerical_value": 4.5,
  "string_value": "Good response quality",
  "boolean_value": true,
  "is_passed": false,
  "cost": 0.0,
  "evaluator_id": null,
  "evaluator_slug": "my_custom_evaluator",
  "log_id": null,
  "dataset_id": null
}
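As a sketch, creating a score from Python might look like the following. The collection path and payload fields come from this section; the base URL placeholder, the KEYWORDSAI_API_KEY environment variable, and the helper names are assumptions, not part of the documented API.

```python
import json
import os
import urllib.request

# Assumptions: the base URL and env var name are placeholders, not from
# this document; substitute your deployment's values.
BASE_URL = "https://your-keywordsai-host"
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def scores_url(log_id: str) -> str:
    # Collection path from this section: POST /api/logs/{log_id}/scores/
    return f"{BASE_URL}/api/logs/{log_id}/scores/"

def create_score(log_id: str, payload: dict) -> dict:
    # POST a new score; the payload needs evaluator_id or evaluator_slug
    # plus at least one of numerical_value / string_value / boolean_value.
    req = urllib.request.Request(
        scores_url(log_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

You would then call, for example, create_score("my-log-id", {"evaluator_slug": "my_custom_evaluator", "numerical_value": 4.5}) and keep the returned id for later updates or deletion. A 409 response here means this evaluator has already scored this log.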

2. List Scores for a Log

Retrieves all scores associated with a specific log.
{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": "eval_result_unique_id_1",
      "created_at": "2024-01-15T10:30:00Z",
      "type": "llm",
      "environment": "test",
      "numerical_value": 4.5,
      "string_value": "Good quality",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0,
      "evaluator_id": null,
      "evaluator_slug": "quality_evaluator",
      "log_id": "log_unique_id",
      "dataset_id": null
    },
    {
      "id": "eval_result_unique_id_2",
      "created_at": "2024-01-15T10:32:00Z",
      "type": "llm",
      "environment": "test",
      "numerical_value": 3.8,
      "string_value": "Mostly relevant",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0,
      "evaluator_id": null,
      "evaluator_slug": "relevance_evaluator",
      "log_id": "log_unique_id",
      "dataset_id": null
    }
  ]
}
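A minimal listing helper might look like this; it assumes a GET on the same collection path used for creation (the document states only the POST path), with the paginated response shape shown above. Indexing results by evaluator_slug is a convenience, not part of the API.

```python
import json
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def list_scores(log_id: str) -> list:
    # Assumption: listing is a GET on the collection path used for creation.
    url = f"{BASE_URL}/api/logs/{log_id}/scores/"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {API_KEY}"})
    with urllib.request.urlopen(req) as resp:
        page = json.load(resp)
    # The response is paginated: count, next, previous, results.
    return page["results"]

def scores_by_evaluator(results: list) -> dict:
    # Index the results by evaluator_slug for quick lookup.
    return {score["evaluator_slug"]: score for score in results}
```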

3. Retrieve a Specific Score

Gets detailed information about a specific score.
{
  "id": "eval_result_unique_id",
  "created_at": "2024-01-15T10:30:00Z",
  "type": "llm",
  "environment": "test",
  "numerical_value": 4.5,
  "string_value": "Good quality",
  "boolean_value": true,
  "is_passed": false,
  "cost": 0.0,
  "evaluator_id": null,
  "evaluator_slug": "quality_evaluator",
  "log_id": "log_unique_id",
  "dataset_id": null
}
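A retrieval sketch, assuming the score id (the id field from create/list responses) nests under the collection path; the document does not spell out this detail path, so treat it as an assumption to verify against your deployment.

```python
import json
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def score_detail_url(log_id: str, score_id: str) -> str:
    # Assumption: the score id nests under the collection path.
    return f"{BASE_URL}/api/logs/{log_id}/scores/{score_id}/"

def get_score(log_id: str, score_id: str) -> dict:
    req = urllib.request.Request(
        score_detail_url(log_id, score_id),
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```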

4. Update a Score

Updates an existing score. You can update any combination of score values.
  • numerical_value (number): Optional updated numerical score value.
  • string_value (string): Optional updated string score value.
  • boolean_value (boolean): Optional updated boolean score value.
{
  "numerical_value": 4.8,
  "string_value": "Excellent quality"
}
{
  "id": "eval_result_unique_id",
  "created_at": "2024-01-15T10:30:00Z",
  "type": "llm",
  "environment": "test",
  "numerical_value": 4.8,
  "string_value": "Excellent quality",
  "boolean_value": true,
  "is_passed": false,
  "cost": 0.0,
  "evaluator_id": null,
  "evaluator_slug": "quality_evaluator",
  "log_id": "log_unique_id",
  "dataset_id": null
}
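An update sketch, assuming a PATCH on the score detail path (the document names the method semantics via the status-code table but not the path; both the path and the payload-builder helper are assumptions). The sentinel keeps an explicit boolean_value=False from being dropped.

```python
import json
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

_UNSET = object()  # sentinel so an explicit False/0 is still sent

def build_update_payload(numerical_value=_UNSET, string_value=_UNSET,
                         boolean_value=_UNSET) -> dict:
    # Include only the fields you actually want to change; a PATCH
    # leaves omitted fields untouched.
    fields = {
        "numerical_value": numerical_value,
        "string_value": string_value,
        "boolean_value": boolean_value,
    }
    return {name: value for name, value in fields.items() if value is not _UNSET}

def update_score(log_id: str, score_id: str, changes: dict) -> dict:
    # Assumption: PATCH on the score detail path.
    url = f"{BASE_URL}/api/logs/{log_id}/scores/{score_id}/"
    req = urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```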

5. Delete a Score

Removes a score from a log.
No response body is returned for successful deletions (204 No Content).
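A deletion sketch, again assuming a DELETE on the score detail path (an assumption; the document states only the create path):

```python
import os
import urllib.request

BASE_URL = "https://your-keywordsai-host"  # assumed placeholder
API_KEY = os.environ.get("KEYWORDSAI_API_KEY", "")

def score_detail_url(log_id: str, score_id: str) -> str:
    # Assumption: the score id nests under the collection path.
    return f"{BASE_URL}/api/logs/{log_id}/scores/{score_id}/"

def delete_score(log_id: str, score_id: str) -> bool:
    # Success returns 204 No Content with an empty body.
    req = urllib.request.Request(
        score_detail_url(log_id, score_id),
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="DELETE",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 204
```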

Usage Examples

Log Enrichment

When you retrieve log details, scores are automatically included in the response under a scores field:
{
  "id": "my-log-id",
  "model": "gpt-4",
  "prompt_tokens": 150,
  "completion_tokens": 75,
  "scores": {
    "response_quality": {
      "evaluator_slug": "response_quality",
      "numerical_value": 4.2,
      "string_value": "Good response",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0
    },
    "relevance_check": {
      "evaluator_slug": "relevance_check",
      "numerical_value": 3.8,
      "string_value": "Mostly relevant",
      "boolean_value": true,
      "is_passed": false,
      "cost": 0.0
    }
  }
}
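Because the scores field is keyed by evaluator slug, reading scores out of an enriched log payload is a plain dictionary walk; a small helper (the function name is ours, the payload shape is from the example above) might look like:

```python
def numeric_scores(log: dict) -> dict:
    # Pull each evaluator's numerical_value from an enriched log payload;
    # evaluators without a numerical score are skipped.
    return {
        slug: score["numerical_value"]
        for slug, score in log.get("scores", {}).items()
        if score.get("numerical_value") is not None
    }
```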

Best Practices

Use descriptive evaluator slugs

Make them meaningful and consistent across your application

Handle uniqueness errors

Always check for 409 Conflict responses when creating scores

Validate score types

Ensure you’re using the appropriate score type (numerical, string, boolean) for your use case

Batch operations carefully

Since these are manual operations (max ~50 calls/sec), avoid overwhelming the API

Store score IDs

Keep track of score IDs if you need to update or delete them later

Error Handling

The API returns standard HTTP status codes:
Status Code        Description
200 OK             Successful GET/PATCH requests
201 Created        Successful POST requests
204 No Content     Successful DELETE requests
400 Bad Request    Invalid request data
401 Unauthorized   Invalid or missing authentication
404 Not Found      Log or score not found
409 Conflict       Score already exists for this evaluator in this log
Always check the response status code and handle errors appropriately in your application.
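One way to centralize that handling is a small lookup over the documented codes; this dispatch helper is a sketch of ours, not part of the API, and the message wording simply restates the table above.

```python
# The documented status codes, mapped to short explanations your client
# can log or raise on. Codes outside the table fall through to a default.
STATUS_MESSAGES = {
    200: "OK: successful GET/PATCH request",
    201: "Created: score created",
    204: "No Content: score deleted",
    400: "Bad Request: invalid request data",
    401: "Unauthorized: invalid or missing authentication",
    404: "Not Found: log or score not found",
    409: "Conflict: a score from this evaluator already exists on this log",
}

def describe_status(code: int) -> str:
    return STATUS_MESSAGES.get(code, f"Unexpected status {code}")
```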