`type` and `score_value_type`. The `eval_class` field is optional and only used for pre-built templates.
`type` values:

- `llm`: LLM-based evaluators that use another LLM to evaluate responses
- `human`: Human annotation-based evaluators for manual scoring
- `code`: Python code-based evaluators that run custom evaluation logic

`score_value_type` values:

- `numerical`: Numeric scores (e.g., 1-5, 0.0-1.0)
- `boolean`: True/false or pass/fail evaluations
- `categorical`: Multiple choice selections with predefined options
- `comment`: Text-based feedback and comments

`eval_class` values:
- `keywordsai_custom_llm`: LLM-based evaluator with standard configuration
- `custom_code`: Code-based evaluator template

At evaluation time, your data is passed to the evaluator as an `inputs` object. This applies to all evaluator types (`llm`, `human`, `code`).
Structure:
- `input` (any JSON): The request/input to be evaluated.
- `output` (any JSON): The response/output being evaluated.
- `metrics` (object, optional): System-captured metrics (e.g., tokens, latency, cost).
- `metadata` (object, optional): Context and custom properties you pass; also logged.
- `llm_input` and `llm_output` (string, optional): Legacy convenience aliases.

Evaluator fields:

- `name` (string): Display name for the evaluator
- `type` (string): Evaluator type: `"llm"`, `"human"`, or `"code"`
- `score_value_type` (string): Score format: `"numerical"`, `"boolean"`, `"categorical"`, or `"comment"`
- `evaluator_slug` (string): Unique identifier (auto-generated if not provided)
- `description` (string): Description of the evaluator
- `eval_class` (string): Pre-built template to use (optional)
- `configurations` (object): Custom configuration based on evaluator type
- `categorical_choices` (array): Required when `score_value_type` is `"categorical"`

`configurations` for `type: "llm"` evaluators:
- `evaluator_definition` (string): The evaluation prompt/instruction. Must include `{{input}}` and `{{output}}` template variables. Legacy `{{llm_input}}` and `{{llm_output}}` are also supported for backward compatibility.
- `scoring_rubric` (string): Description of the scoring criteria
- `llm_engine` (string): LLM model to use (e.g., "gpt-4o-mini", "gpt-4o")
- `model_options` (object, optional): LLM parameters like temperature, max_tokens
- `min_score` (number, optional): Minimum possible score
- `max_score` (number, optional): Maximum possible score
- `passing_score` (number, optional): Score threshold for passing

`configurations` for `type: "code"` evaluators:
- `eval_code_snippet` (string): Python code with an `evaluate()` function that returns the score

`type: "human"` evaluators:
Human evaluators use the top-level `categorical_choices` field when `score_value_type` is `"categorical"`.

`score_value_type: "categorical"`:
- `categorical_choices` (array): List of choice objects with `name` and `value` properties
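To tie the field reference together, here is a sketch of one payload per evaluator type, built as plain Python dicts. Only the field names come from this reference; the evaluator names, slugs, rubric text, choice values, and the `evaluate(inputs)` signature are illustrative assumptions, not documented behavior.

```python
# Hypothetical example payloads; how they are sent to the API is out of scope here.

# LLM evaluator: numerical 1-5 score, prompt must contain {{input}} and {{output}}.
llm_evaluator = {
    "name": "Response Relevance",            # display name (assumed example)
    "type": "llm",
    "score_value_type": "numerical",
    "evaluator_slug": "response-relevance",  # optional; auto-generated if omitted
    "configurations": {
        "evaluator_definition": (
            "Rate how relevant the response is to the request.\n"
            "Request: {{input}}\nResponse: {{output}}"
        ),
        "scoring_rubric": "1 = irrelevant, 5 = fully relevant",
        "llm_engine": "gpt-4o-mini",
        "model_options": {"temperature": 0},
        "min_score": 1,
        "max_score": 5,
        "passing_score": 3,
    },
}

# Code evaluator: eval_code_snippet is Python source defining evaluate().
# The signature evaluate(inputs) is an assumption based on the inputs object above.
code_evaluator = {
    "name": "Non-empty Output Check",
    "type": "code",
    "score_value_type": "boolean",
    "configurations": {
        "eval_code_snippet": (
            "def evaluate(inputs):\n"
            "    # Pass if the output being evaluated is non-empty.\n"
            "    return bool(inputs.get('output'))\n"
        ),
    },
}

# Human evaluator: categorical choices are given at the top level.
human_evaluator = {
    "name": "Tone Review",
    "type": "human",
    "score_value_type": "categorical",
    "categorical_choices": [
        {"name": "Friendly", "value": 1},
        {"name": "Neutral", "value": 0},
        {"name": "Hostile", "value": -1},
    ],
}
```

Note that `configurations` varies by `type`, while `name`, `type`, and `score_value_type` appear in every payload.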