Score Value Types
Type Mapping
Critical: Use the correct value field based on the evaluator's `score_value_type`.

| Evaluator's `score_value_type` | Use This Field | Data Type | Example |
|---|---|---|---|
| numerical | `numerical_value` | number | `4.5` |
| boolean | `boolean_value` | boolean | `true` |
| categorical | `categorical_value` | array of strings | `["excellent", "coherent"]` |
| comment | `string_value` | string | `"Good response quality"` |
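The mapping above can be illustrated with one payload per type. A minimal sketch; the single-field dict shape is an assumption for illustration, while the field names and values come from the table:

```python
# Example score payloads, one per score_value_type.
# The bare-dict wrapper is illustrative; only the field names are from the docs.
numerical_score = {"numerical_value": 4.5}                            # numerical
boolean_score = {"boolean_value": True}                               # boolean
categorical_score = {"categorical_value": ["excellent", "coherent"]}  # categorical
comment_score = {"string_value": "Good response quality"}             # comment
```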
Detailed Descriptions
Numerical Scores
- Use case: Ratings, confidence scores, quality metrics
- Range: Defined by the evaluator's `min_score` and `max_score`
- Example: Rating response quality from 1-5
Boolean Scores
- Use case: Pass/fail evaluations, binary classifications
- Values: `true` or `false`
- Example: Content safety check
Categorical Scores
- Use case: Multi-choice classifications
- Values: Array of predefined choices from the evaluator's `categorical_choices`
- Example: `["relevant", "accurate", "helpful"]`
Comment Scores
- Use case: Qualitative feedback, explanations
- Values: Free-form text
- Example: Detailed evaluation reasoning
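The constraints above (numeric range, boolean type, predefined categorical choices, free-form text) can be checked client-side before submitting a score. A minimal sketch, assuming the evaluator's configuration is available as a plain dict with the fields described in this section:

```python
def validate_score(evaluator: dict, score: dict) -> bool:
    """Check a score payload against an evaluator's declared constraints.

    Assumes `evaluator` carries score_value_type plus, where relevant,
    min_score/max_score and categorical_choices, as described above.
    """
    svt = evaluator["score_value_type"]
    if svt == "numerical":
        v = score.get("numerical_value")
        return (isinstance(v, (int, float))
                and evaluator["min_score"] <= v <= evaluator["max_score"])
    if svt == "boolean":
        return isinstance(score.get("boolean_value"), bool)
    if svt == "categorical":
        choices = set(evaluator["categorical_choices"])
        values = score.get("categorical_value")
        return isinstance(values, list) and all(v in choices for v in values)
    if svt == "comment":
        return isinstance(score.get("string_value"), str)
    return False
```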
Evaluator Types
LLM Evaluators (`type: "llm"`)
- AI-powered evaluation using language models
- Requires an `evaluator_definition` prompt
- Supports all score value types
Human Evaluators (`type: "human"`)
- Manual evaluation by human reviewers
- Often used with categorical or comment scores
- Requires predefined choices for categorical
Code Evaluators (`type: "code"`)
- Programmatic evaluation using custom code
- Requires `eval_code_snippet`
- Most flexible for complex logic
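To make the code-evaluator idea concrete, here is a purely hypothetical example of what an `eval_code_snippet` might contain. The entry-point name, the available variables, and the return shape are all assumptions; consult the evaluator API for the actual contract:

```python
# Hypothetical eval_code_snippet: pass/fail check on response length.
# The evaluate(input, output) signature and dict return shape are
# assumptions for illustration, not a documented contract.
def evaluate(input: str, output: str) -> dict:
    # Pass if the response is non-empty and under 2000 characters.
    passed = 0 < len(output) < 2000
    return {"boolean_value": passed}
```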
Legacy fields (`llm_input`, `llm_output`) are normalized to `input`/`output` when reading the `inputs` field.

Metrics and Metadata Fields

When present, `inputs.metrics` and `inputs.metadata` include the following:
Metrics Fields
- start_time: Request start time (RFC3339)
- timestamp: Span end time (RFC3339)
- prompt_tokens: Tokens in the prompt/input
- completion_tokens: Tokens in the model output
- prompt_cache_hit_tokens: Tokens served from cache
- prompt_cache_creation_tokens: Tokens added to cache
- total_request_tokens: Sum of prompt and completion tokens
- latency: Total request latency in seconds
- time_to_first_token: Time from request start to first output token
- tokens_per_second: Output token throughput (TPS)
- routing_time: Deprecated; time spent deciding the model/route
- cost: Total request cost (USD)
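Two of the fields above are derivable from the others. A sketch of the presumed relationships; the exact formulas (in particular, whether `tokens_per_second` is measured over total latency or only the generation phase after `time_to_first_token`) are assumptions:

```python
def derived_metrics(m: dict) -> dict:
    """Illustrate presumed relationships between metrics fields.

    Assumes total_request_tokens = prompt_tokens + completion_tokens and
    tokens_per_second = completion_tokens / latency; the real pipeline may
    compute these differently (e.g. excluding time_to_first_token).
    """
    total = m["prompt_tokens"] + m["completion_tokens"]
    tps = m["completion_tokens"] / m["latency"] if m["latency"] else 0.0
    return {"total_request_tokens": total, "tokens_per_second": tps}
```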
Metadata Fields
- unique_id: Request unique identifier
- unique_organization_id: Organization unique identifier
- organization_key_id: API key identifier
- environment: Runtime environment (e.g., test, prod)
- customer_identifier: User/customer-level identifier
- evaluation_identifier: Evaluator run identifier
- prompt_id: Prompt identifier
- prompt_version_number: Prompt version
- custom_identifier: Custom identifier provided by client
- thread_identifier: Logical thread id
- thread_unique_id: Unique id of thread
- span_unique_id: Span id
- span_name: Span name
- span_parent_id: Parent span id
- span_workflow_name: Workflow name
- trace_group_identifier: Trace group id
- deployment_name: Deployment name
- provider_id: Provider identifier
- model: Model name
- status_code: HTTP-like status code
- status: Status string
- tool_calls: Tool calls recorded
- LLM configuration fields: stream, stream_options, temperature, max_tokens, logit_bias, logprobs, top_logprobs, frequency_penalty, presence_penalty, stop, n, response_format, verbosity, tools
- Some fields may be omitted depending on provider/model and request path
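The span fields above (`span_unique_id`, `span_parent_id`) imply a parent/child structure that can be reassembled client-side. A minimal sketch, assuming each record's metadata is a dict with those keys and that root spans have no parent id:

```python
def build_span_tree(spans: list[dict]) -> dict:
    """Group span ids by their parent using the metadata fields listed above.

    Assumes span_parent_id is None or absent for root spans; the record
    shape is illustrative, only the field names come from the docs.
    """
    children: dict = {}
    for span in spans:
        children.setdefault(span.get("span_parent_id"), []).append(span["span_unique_id"])
    return children
```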
`routing_time` is deprecated and retained only for historical compatibility.