# Data model

Understanding the core concepts behind Keywords AI’s observability features
Keywords AI provides two main types of observability: **Traces** for agent workflows and **Logs** for LLM calls, plus a comprehensive **Evaluation** framework.
## Traces vs Logs

- **Logs** (LLM call logging): individual requests and responses to language models
- **Traces** (agent tracking): complete workflows with multiple steps
### Logs

Logs record individual LLM API calls and responses:

- Single request/response pairs
- Token usage and costs
- Model performance metrics
- Error tracking
- Response latency
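The fields above suggest a record shape like the following minimal sketch. The class and field names here are illustrative only, not Keywords AI’s actual log schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape of a single LLM call log record, covering the
# fields listed above: request/response pair, token usage and cost,
# latency, and error tracking.
@dataclass
class LLMCallLog:
    model: str
    prompt: str
    response: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float              # token usage priced per model
    latency_ms: float            # time until the full response arrived
    error: Optional[str] = None  # populated only when the call fails

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

log = LLMCallLog(
    model="gpt-4o-mini",
    prompt="Summarize: ...",
    response="A short summary.",
    prompt_tokens=42,
    completion_tokens=12,
    cost_usd=0.00011,
    latency_ms=380.0,
)
print(log.total_tokens)  # 54
```

One record per request/response pair keeps cost and latency queries simple: aggregate metrics are just sums and averages over these rows.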
### Traces

Traces capture the full execution flow of agent workflows:

- Multi-step agent processes
- Tool calls and function executions
- Decision-making steps
- Hierarchical workflow visualization
- Agent reasoning and planning
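The hierarchical structure a trace captures can be sketched as a tree of spans, where each span is one step in the workflow (an LLM call, a tool call, a planning step). `Span` and its fields are hypothetical, not the platform’s API:

```python
from dataclasses import dataclass, field

# Hypothetical minimal trace: a tree of spans, one per workflow step.
@dataclass
class Span:
    name: str
    kind: str                              # e.g. "llm", "tool", "agent"
    children: list = field(default_factory=list)

    def child(self, name: str, kind: str) -> "Span":
        span = Span(name, kind)
        self.children.append(span)
        return span

    def render(self, depth: int = 0) -> list:
        # Indent each level to show the hierarchy, as a trace viewer would.
        lines = ["  " * depth + f"{self.name} [{self.kind}]"]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines

trace = Span("answer_question", "agent")
trace.child("plan_steps", "llm")              # agent reasoning/planning
search = trace.child("web_search", "tool")    # a tool call...
search.child("parse_results", "tool")         # ...with its own sub-step
trace.child("compose_answer", "llm")

print("\n".join(trace.render()))
```

The root span is the whole workflow; nesting is what distinguishes a trace from a flat list of logs.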
## Evaluation Framework

The evaluation system helps you assess and improve LLM performance through systematic testing.

### Test Sets

Curated collections of examples for evaluation:

- Input/output pairs
- Expected responses
- Evaluation criteria
- Test case metadata
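As a rough sketch, a test case combining these elements could be a plain dict; the field names are illustrative, not the platform’s schema:

```python
# Hypothetical test case: an input, the expected response, an
# evaluation criterion, and free-form metadata.
test_case = {
    "input": "Translate 'bonjour' to English.",
    "expected_output": "hello",
    "criterion": "exact_match",  # tells the evaluator how to judge
    "metadata": {"tag": "translation", "difficulty": "easy"},
}

# A test set is simply a curated list of such cases.
test_set = [test_case]
print(len(test_set))  # 1
```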
### Evaluators

Tools that assess LLM output quality. Types of evaluators:

- **LLM evaluators**: AI-powered assessment
- **Human evaluators**: manual review
- **Rule-based**: automated validation
- **Custom metrics**: domain-specific scoring
### Experiments

Comparative testing of different configurations:

- A/B testing of prompts
- Model comparisons
- Performance benchmarking
- Cost analysis
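A minimal A/B-testing harness over prompt variants might look like the sketch below. The `call_llm` and `score` functions are toy stand-ins so the example runs without a real model; they are not part of any Keywords AI API:

```python
import statistics

# Sketch of an experiment: run each prompt variant over the same
# inputs and compare the mean evaluator score per variant.
def run_experiment(variants, test_inputs, call_llm, score):
    results = {}
    for name, template in variants.items():
        scores = [score(call_llm(template.format(q=q))) for q in test_inputs]
        results[name] = statistics.mean(scores)
    return results

# Toy stand-ins: an "LLM" that echoes its prompt, and a length-based score.
fake_llm = lambda prompt: prompt.upper()
length_score = lambda output: min(len(output) / 40, 1.0)

print(run_experiment(
    {"terse": "Answer: {q}", "verbose": "Please answer in detail: {q}"},
    ["what is 2+2?"],
    fake_llm,
    length_score,
))
```

Holding the test set fixed while varying only the prompt (or model) is what makes the score difference attributable to the configuration change.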
### Scores

Quantitative and qualitative assessment results:

- Numeric ratings (1-5, 1-10)
- Boolean pass/fail
- Categorical classifications
- Comments and feedback
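One way to model a score record that can hold any of these result shapes is a tagged value plus an optional comment; this is illustrative, not the platform’s schema:

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical score record covering the result shapes listed above:
# numeric rating, boolean pass/fail, or categorical label, plus
# optional free-form feedback.
@dataclass
class Score:
    evaluator: str
    value: Union[float, bool, str]  # numeric, pass/fail, or category
    comment: Optional[str] = None

scores = [
    Score("helpfulness", 4.0, "clear but slightly verbose"),  # 1-5 rating
    Score("safety_check", True),                              # pass/fail
    Score("topic", "billing"),                                # categorical
]
print([type(s.value).__name__ for s in scores])  # ['float', 'bool', 'str']
```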
## Data Flow