Overview
Execute evaluation metrics on experiment results to assess model performance.
Method Signature
# Synchronous
client.experiments.run_experiment_evals(
    experiment_id: str,
    evaluator_ids: List[str],
    run_id: Optional[str] = None
) -> Dict[str, Any]

# Asynchronous
await client.experiments.run_experiment_evals(
    experiment_id: str,
    evaluator_ids: List[str],
    run_id: Optional[str] = None
) -> Dict[str, Any]
Parameters
experiment_id (str, required)
The unique identifier of the experiment.
evaluator_ids (List[str], required)
List of evaluator IDs to run on the experiment results.
run_id (Optional[str], default None)
Specific run ID to evaluate. If not provided, the latest run is evaluated.
Returns
Returns a dictionary containing the evaluation results and metrics.
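The exact response schema is not shown on this page. As a rough sketch based only on the keys read in the example below (status and metrics), the result might look like the following; the values are placeholders.

# Illustrative only -- keys other than "status" and "metrics" are not documented here,
# and the values shown are placeholders
result = {
    "status": "completed",
    "metrics": {
        "eval_accuracy": 0.91,
        "eval_relevance": 0.87
    }
}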
Example
from keywordsai import KeywordsAI
client = KeywordsAI(api_key="your-api-key")
# Run evaluations on latest experiment run
evaluator_ids = ["eval_accuracy", "eval_relevance"]
result = client.experiments.run_experiment_evals(
    experiment_id="exp_123",
    evaluator_ids=evaluator_ids
)
print(f"Evaluation status: {result['status']}")
print(f"Metrics: {result['metrics']}")
# Run evaluations on specific run
result = client.experiments.run_experiment_evals(
    experiment_id="exp_123",
    evaluator_ids=evaluator_ids,
    run_id="run_456"
)
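The asynchronous variant from the signature above can be awaited inside an event loop. A minimal sketch, assuming the SDK exposes an async client; the AsyncKeywordsAI class name here is an assumption, not confirmed by this page.

import asyncio
from keywordsai import AsyncKeywordsAI  # class name assumed; adjust to the SDK's actual async client

async def main():
    client = AsyncKeywordsAI(api_key="your-api-key")
    # Same arguments as the synchronous call, but awaited
    result = await client.experiments.run_experiment_evals(
        experiment_id="exp_123",
        evaluator_ids=["eval_accuracy", "eval_relevance"]
    )
    print(f"Evaluation status: {result['status']}")

asyncio.run(main())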
Error Handling
try:
    result = client.experiments.run_experiment_evals(
        experiment_id="exp_123",
        evaluator_ids=["eval_accuracy"]
    )
except Exception as e:
    print(f"Error running evaluations: {e}")