- Manually score individual log responses
- Integrate custom evaluators with your workflow
- Track quality metrics over time
- Compare performance across different models or prompts
Understanding Scores
Scores represent evaluation results for specific logs. Each score is associated with:
- A specific log (identified by log_id)
- An evaluator (either a Keywords AI evaluator or your custom evaluator)
- One or more value types (numerical, string, boolean)
Each log can have multiple scores, but only one score per evaluator. This ensures clean, consistent evaluation data.
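The one-score-per-evaluator rule can be pictured as a mapping keyed by the pair (log ID, evaluator). The sketch below is purely illustrative: `ScoreBook`, `set_score`, and `scores_for` are hypothetical names, not part of Keywords AI. It also assumes that a second write for the same pair replaces the earlier score; the platform may instead reject the duplicate.

```python
from typing import Dict, Tuple, Union

ScoreValue = Union[float, str, bool]


class ScoreBook:
    """Illustrative in-memory model: at most one score per (log, evaluator)."""

    def __init__(self) -> None:
        self._scores: Dict[Tuple[str, str], ScoreValue] = {}

    def set_score(self, log_id: str, evaluator_slug: str, value: ScoreValue) -> None:
        # Assumption: writing again for the same pair overwrites the old score.
        self._scores[(log_id, evaluator_slug)] = value

    def scores_for(self, log_id: str) -> Dict[str, ScoreValue]:
        # A log may carry many scores, one per evaluator.
        return {slug: v for (lid, slug), v in self._scores.items() if lid == log_id}


book = ScoreBook()
book.set_score("log_123", "helpfulness", 4.0)
book.set_score("log_123", "is_safe", True)
book.set_score("log_123", "helpfulness", 4.5)  # same evaluator: still one entry
```

Keying on the pair keeps per-evaluator data unambiguous: a lookup for one log never returns two competing values from the same evaluator.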
Score Types
You can use three types of values for your scores:
Numerical
A floating-point value (e.g., 4.5 out of 5.0)
String
A text description or category (e.g., “Good quality”)
Boolean
A pass/fail indicator (true/false)
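To make the three value types concrete, here is a minimal sketch of a container that holds them and checks their types. The class name `ScoreValues` and the field names `numerical_value`, `string_value`, and `boolean_value` are assumptions chosen for illustration; check the API Reference for the actual field names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreValues:
    """Hypothetical container for the three value types a score can carry."""

    numerical_value: Optional[float] = None
    string_value: Optional[str] = None
    boolean_value: Optional[bool] = None

    def __post_init__(self) -> None:
        # A score carries one or more value types, so require at least one.
        if (self.numerical_value is None and self.string_value is None
                and self.boolean_value is None):
            raise ValueError("a score needs at least one value")
        # bool is a subclass of int in Python, so reject it explicitly here.
        if self.numerical_value is not None and (
            isinstance(self.numerical_value, bool)
            or not isinstance(self.numerical_value, (int, float))
        ):
            raise TypeError("numerical_value must be a number, not a bool")
        if self.string_value is not None and not isinstance(self.string_value, str):
            raise TypeError("string_value must be text")
        if self.boolean_value is not None and not isinstance(self.boolean_value, bool):
            raise TypeError("boolean_value must be True or False")

    def to_dict(self) -> dict:
        # Keep only the values that were actually set.
        return {k: v for k, v in vars(self).items() if v is not None}


rating = ScoreValues(numerical_value=4.5, string_value="Good quality")
passed = ScoreValues(boolean_value=True)
```

Separate fields let one score combine, say, a numeric rating with a short label, while a pass/fail check stays a plain boolean.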
Using the Scores API
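Creating a score programmatically amounts to sending a log ID, an evaluator slug, and one or more values. The snippet below is a rough sketch that assembles such a request without sending it. Everything specific in it is an assumption made for illustration: the placeholder host, the path layout, the bearer-token header, the payload field names, and the helper `build_create_score_request` itself.

```python
import json
import urllib.request

# Assumptions, not confirmed by this page: host, path layout,
# authorization scheme, and payload field names.
BASE_URL = "https://api.example.com"  # placeholder host


def build_create_score_request(api_key, log_id, evaluator_slug, **values):
    """Build (but do not send) a POST request that would create one score."""
    if not values:
        raise ValueError("provide at least one value")
    payload = {"evaluator_slug": evaluator_slug, **values}
    return urllib.request.Request(
        url=f"{BASE_URL}/logs/{log_id}/scores/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_score_request(
    "YOUR_API_KEY", "log_123", "helpfulness", numerical_value=4.5
)
# Sending is left out on purpose; with a real endpoint it would be:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches a live endpoint.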
The Scores API allows you to programmatically manage scores. See the API Reference for detailed documentation.
Viewing Scores in the Dashboard
Scores are visible in the log details view in the Keywords AI dashboard. You can:
- Navigate to the Logs section
- Click on a specific log to view its details
- See all scores associated with that log in the Scores tab
Best Practices
- Use consistent evaluator slugs across your application
- Define clear scoring criteria for each evaluator
- Combine with automated evaluations for comprehensive quality assessment
- Track scores over time to measure improvements
Next Steps
- Learn about Experiments for systematic evaluation
- Explore LLM Evaluators for automated scoring
- Set up Human Evaluations for subjective assessment