
Overview

Completion workflows execute direct LLM completions with custom parameters on your dataset. The system automatically:
  • Processes each dataset entry through your configured model
  • Tracks costs, tokens, and latency
  • Runs evaluators on outputs
  • Creates detailed trace hierarchies

How It Works

Dataset Entry → Completion Workflow → LLM Call → Output → Evaluators → Scores
                 (your config)        (auto)     (auto)    (auto)       (auto)
Execution Flow:
  1. System reads dataset entries (input messages)
  2. Calls LLM with your configuration
  3. Stores output and metrics
  4. Runs configured evaluators
  5. Creates trace with full span hierarchy

Key Benefits

  • Zero Code: No custom processing needed
  • Automatic Execution: Runs in the background when you create the experiment
  • Cost Tracking: Real-time cost calculation
  • Chaining Support: Can chain multiple completion workflows
  • Full Observability: Detailed execution traces

Configuration

Workflow Config

Type: "completion" Config Fields:
FieldTypeRequiredDescriptionDefault
modelstringYesModel identifier (e.g., “gpt-4o-mini”)-
temperaturenumberNoSampling temperature (0-2)1.0
max_tokensintegerNoMaximum completion tokens150
top_pnumberNoNucleus sampling (0-1)1.0
frequency_penaltynumberNoFrequency penalty (-2 to 2)0
presence_penaltynumberNoPresence penalty (-2 to 2)0
stopstring or arrayNoStop sequencesnull
response_formatobjectNoResponse format (e.g., {"type": "json_object"})null
toolsarrayNoFunction calling toolsnull
tool_choicestring or objectNoTool choice strategynull
reasoning_effortstringNoReasoning effort for o1 modelsnull
Example:
{
  "type": "completion",
  "config": {
    "model": "gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens": 100
  }
}

API Endpoints

1. Create Completion Experiment

POST /api/v2/experiments/

Creates an experiment and starts execution in the background. Check status in the platform UI.

Request:
{
  "name": "Completion Workflow Test",
  "description": "Testing GPT-4o-mini completions",
  "dataset_id": "172fc5e8-bac4-43d4-a066-f2f7a167c148",
  "workflows": [
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 100
      }
    }
  ],
  "evaluator_slugs": ["test_llm_quality_v1"]
}
Response (201):
{
  "id": "187f57ea7a8141a19b535108c5b92432",
  "name": "Completion Workflow Test",
  "description": "Testing GPT-4o-mini completions",
  "status": "pending",
  "workflows": [
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 100,
        "top_p": null,
        "frequency_penalty": null,
        "presence_penalty": null
      }
    }
  ],
  "evaluator_slugs": ["test_llm_quality_v1"],
  "workflow_count": 1,
  "progress": 0.0,
  "created_at": "2025-12-03T01:25:00Z"
}
Notes:
  • Execution starts immediately in the background
  • status: "pending" means queued for processing
  • Null config fields are optional (not sent to LLM)
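
For reference, a minimal Python equivalent of this request, assuming the requests library and the same Bearer authentication shown in the curl examples below:

import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "name": "Completion Workflow Test",
    "dataset_id": "172fc5e8-bac4-43d4-a066-f2f7a167c148",
    "workflows": [
        {
            "type": "completion",
            "config": {"model": "gpt-4o-mini", "temperature": 0.7, "max_tokens": 100},
        }
    ],
    "evaluator_slugs": ["test_llm_quality_v1"],
}

# POST /api/v2/experiments/ returns 201 with the experiment id and status "pending"
resp = requests.post(
    "https://api.keywordsai.co/api/v2/experiments/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["id"], resp.json()["status"])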

2. List Execution Results

GET /api/v2/experiments/{experiment_id}/logs/list/

Retrieves completed traces with outputs.

Query Parameters:
  • page: Page number (default: 1)
  • page_size: Results per page (default: 100)
Response (200):
{
  "results": [
    {
      "id": "trace-123",
      "name": "experiment_trace",
      "input": "[{\"role\": \"system\", \"content\": \"You are helpful\"}, {\"role\": \"user\", \"content\": \"What is AI?\"}]",
      "output": "{\"role\": \"assistant\", \"content\": \"AI stands for Artificial Intelligence...\"}",
      "status": "success",
      "span_count": 4,
      "duration": 2.5,
      "total_cost": 0.000123,
      "total_tokens": 85,
      "total_prompt_tokens": 35,
      "total_completion_tokens": 50,
      "start_time": "2025-12-03T01:25:05Z",
      "end_time": "2025-12-03T01:25:07.5Z"
    }
  ],
  "count": 3
}
Status Values:
  • pending: Queued for execution
  • running: Currently executing
  • success: Completed successfully
  • failed: Workflow execution failed
  • error: Unexpected error
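
For experiments that produce more traces than one page holds, you can walk the documented page and page_size parameters; a minimal sketch, assuming the requests library:

import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.keywordsai.co/api/v2"

def iter_traces(experiment_id, page_size=100):
    """Yield every trace for an experiment, one page at a time."""
    page = 1
    while True:
        resp = requests.get(
            f"{BASE}/experiments/{experiment_id}/logs/list/",
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={"page": page, "page_size": page_size},
        )
        resp.raise_for_status()
        data = resp.json()
        yield from data["results"]
        # `count` is the total number of traces across all pages
        if page * page_size >= data["count"]:
            break
        page += 1

for trace in iter_traces("exp-123"):
    print(trace["id"], trace["status"], trace["total_cost"])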

3. Get Detailed Span Tree

GET /api/v2/experiments/{experiment_id}/logs/{trace_id}/?detail=1

Returns the full trace with its span hierarchy.

Response (200):
{
  "id": "trace-123",
  "status": "success",
  "span_tree": [
    {
      "id": "root-span-id",
      "span_name": "experiment_trace",
      "span_type": "ROOT",
      "log_type": "workflow",
      "status": "success",
      "input": "[{\"role\": \"user\", \"content\": \"...\"}]",
      "output": "{\"role\": \"assistant\", \"content\": \"...\"}",
      "start_time": "2025-12-03T01:25:05Z",
      "end_time": "2025-12-03T01:25:07.5Z",
      "latency": 2.5,
      "children": [
        {
          "span_name": "workflow_execution",
          "log_type": "workflow",
          "children": [
            {
              "span_name": "Experiment Workflow.completion",
              "log_type": "chat",
              "status": "success",
              "model": "gpt-4o-mini",
              "prompt_tokens": 35,
              "completion_tokens": 50,
              "total_tokens": 85,
              "cost": 0.000123,
              "latency": 2.1
            }
          ]
        },
        {
          "span_name": "evaluator.test_llm_quality_v1",
          "log_type": "score",
          "status": "completed",
          "output": {
            "primary_score": 4.5,
            "string_value": "High quality response"
          }
        }
      ]
    }
  ]
}
Span Hierarchy:
experiment_trace (ROOT)
├── workflow_execution
│   └── Experiment Workflow.completion (LLM call)
└── evaluator.{slug} (score)
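
Because spans nest arbitrarily via children, a recursive walk is the simplest way to pull per-span metrics out of the tree. A sketch over the response shape shown above:

import requests

API_KEY = "YOUR_API_KEY"

def walk_spans(spans, depth=0):
    """Recursively print each span's name, log type, and LLM metrics if present."""
    for span in spans:
        line = "  " * depth + f"{span.get('span_name', '?')} [{span.get('log_type', '?')}]"
        if "cost" in span:  # chat spans carry model/token/cost fields
            line += f" model={span.get('model')} tokens={span.get('total_tokens')} cost={span.get('cost')}"
        print(line)
        walk_spans(span.get("children", []), depth + 1)

detail = requests.get(
    "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/trace-123/?detail=1",
    headers={"Authorization": f"Bearer {API_KEY}"},
).json()
walk_spans(detail["span_tree"])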

Chaining Workflows

Multiple completion workflows execute in sequence (head-to-tail):

Example: Summarize → Expand

Request:
{
  "workflows": [
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.3,
        "max_tokens": 100
      }
    },
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 200
      }
    }
  ]
}
Execution:
Dataset Input

Workflow 1: Brief summary (low temp, 100 tokens)
    ↓ (output becomes input)
Workflow 2: Detailed expansion (high temp, 200 tokens)

Final Output → Evaluators
Use Cases:
  • Focused → Creative: Low temp → High temp
  • Summarize → Expand: Short → Detailed
  • Filter → Process: Data cleaning pipelines
  • Draft → Refine: Multiple revision passes
See Complete Chaining Example.

Input Format

From Dataset

Dataset entries must contain messages in OpenAI format.

Dataset Entry:
{
  "input": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"}
  ],
  "output": {"role": "assistant", "content": "..."} 
}
Note: The input field must be a messages array for completion workflows.
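
A quick client-side check before uploading can catch malformed entries early. The helper below is a hypothetical sketch, not part of the API:

def is_valid_completion_input(entry):
    """Check that a dataset entry's `input` is an OpenAI-style messages array."""
    messages = entry.get("input")
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(m, dict) and "role" in m and "content" in m
        for m in messages
    )

entry = {
    "input": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is AI?"},
    ]
}
assert is_valid_completion_input(entry)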

From Previous Workflow

When chaining, output of Workflow N becomes input of Workflow N+1:
// Workflow 1 output
{
  "role": "assistant",
  "content": "Brief: AI is artificial intelligence"
}

// Becomes Workflow 2 input (appended to messages)
[
  {"role": "system", "content": "..."},
  {"role": "user", "content": "..."},
  {"role": "assistant", "content": "Brief: AI is artificial intelligence"}
]
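
Conceptually the chaining step is just an append; a local illustration of how Workflow 2's input is built (a sketch, not platform code):

# Messages from the dataset entry (Workflow 1's input)
messages = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
]

# Output of Workflow 1
step1_output = {"role": "assistant", "content": "Brief: AI is artificial intelligence"}

# Workflow 2 sees the original messages plus the previous output appended
messages_for_step2 = messages + [step1_output]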

Cost Tracking

Automatic Calculation

Costs are calculated from the pricing database based on:
  • Model name
  • Prompt tokens
  • Completion tokens
  • Your organization’s pricing tier
Span Fields:
{
  "model": "gpt-4o-mini",
  "prompt_tokens": 35,
  "completion_tokens": 50,
  "total_tokens": 85,
  "cost": 0.000123,
  "latency": 2.1
}

Aggregation

Trace-level costs aggregate all workflow steps:
{
  "total_cost": 0.000246,
  "total_tokens": 170,
  "total_prompt_tokens": 70,
  "total_completion_tokens": 100
}
Formula:
total_cost = sum(workflow_costs) + sum(evaluator_costs)
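
Equivalently, the trace total can be recomputed client-side by summing the cost field over every span in the tree (evaluator spans included, when they report a cost); a sketch:

def total_cost(spans):
    """Sum `cost` over a span tree: workflow (chat) spans plus evaluator spans."""
    return sum(
        span.get("cost", 0.0) + total_cost(span.get("children", []))
        for span in spans
    )

# Given `detail` from the span-tree endpoint above, total_cost(detail["span_tree"])
# should match the trace-level `total_cost`.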

Error Handling

Workflow Execution Errors

Failed Workflow:
{
  "span_name": "Experiment Workflow.completion",
  "status": "failed",
  "output": {
    "error": "argument of type 'NoneType' is not iterable",
    "error_type": "workflow_execution_error",
    "workflow_type": "completion",
    "success": false
  }
}
Common Errors:
  • Invalid model name
  • Missing required config (e.g., model)
  • Invalid messages format
  • Rate limiting
  • API key issues

Validation Errors

Missing Model:
{
  "detail": "Validation error",
  "errors": [
    {
      "type": "missing",
      "loc": ["workflows", 0, "config", "model"],
      "msg": "Field required"
    }
  ]
}
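
Validation errors come back as a structured errors list, so they are easy to surface programmatically; a sketch assuming the requests library:

import requests

API_KEY = "YOUR_API_KEY"
payload = {
    "name": "Bad Request",
    "dataset_id": "dataset-123",
    "workflows": [{"type": "completion", "config": {}}],  # missing "model"
}

resp = requests.post(
    "https://api.keywordsai.co/api/v2/experiments/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
if not resp.ok:
    for err in resp.json().get("errors", []):
        # e.g. "workflows.0.config.model: Field required"
        path = ".".join(str(part) for part in err["loc"])
        print(f"{path}: {err['msg']}")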

Complete Example

1. Create Experiment

curl -X POST "https://api.keywordsai.co/api/v2/experiments/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT-4o-mini Test",
    "dataset_id": "dataset-123",
    "workflows": [{
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 150
      }
    }],
    "evaluator_slugs": ["quality_v1"]
  }'

2. Wait for Execution

import time
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.keywordsai.co/api/v2"

def list_traces(experiment_id):
    # GET /api/v2/experiments/{experiment_id}/logs/list/
    resp = requests.get(
        f"{BASE}/experiments/{experiment_id}/logs/list/",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    return resp.json()

experiment_id = "exp-123"

# Poll every 2 seconds, for up to 30 seconds of processing time
for i in range(15):
    time.sleep(2)
    results = list_traces(experiment_id)["results"]
    if results:
        print(f"Found {len(results)} traces")
        break

3. View Results

curl "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/list/" \
  -H "Authorization: Bearer YOUR_API_KEY"

4. Analyze Trace

curl "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/trace-456/?detail=1" \
  -H "Authorization: Bearer YOUR_API_KEY"

Best Practices

1. Choose Appropriate Parameters

For consistent outputs:
{
  "temperature": 0.1,
  "max_tokens": 100
}
For creative outputs:
{
  "temperature": 0.9,
  "max_tokens": 500
}

2. Monitor Costs

# Get cost summary
curl "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/summary/" \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
{
  "total_count": 100,
  "total_cost": 0.123,
  "total_tokens": 8500,
  "avg_latency": 2.3
}
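
One simple use of the summary endpoint is a budget guard in a script or CI job; a sketch (the threshold is an arbitrary assumption):

import requests

API_KEY = "YOUR_API_KEY"
BUDGET_USD = 1.00  # assumption: your own per-experiment budget

summary = requests.get(
    "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/summary/",
    headers={"Authorization": f"Bearer {API_KEY}"},
).json()

if summary["total_cost"] > BUDGET_USD:
    print(f"Over budget: ${summary['total_cost']:.4f} across {summary['total_count']} traces")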

3. Handle Rate Limits

Use smaller batch sizes and lower concurrency:
{
  "batch_size": 50,
  "concurrency": 10
}
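
On the client side, a simple exponential backoff around HTTP 429 responses also helps; a generic sketch, not a documented platform feature:

import time
import requests

def post_with_backoff(url, headers, json, max_retries=5):
    """Retry on HTTP 429 with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=json)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)
    return resp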

4. Chain Strategically

Good:
  • Draft → Refine
  • Summarize → Expand
  • Classify → Process
Avoid:
  • Too many steps (3+ workflows)
  • Identical configurations
  • Redundant processing

Troubleshooting

No Logs Created

Symptom: Empty results after waiting

Causes:
  1. Experiment still processing in the background
  2. Dataset is empty
  3. Invalid model name
Solution:
  • Check experiment status in the platform UI
  • Verify dataset has entries
  • Wait longer (60+ seconds for large datasets)
  • Verify model name is valid

Cost Shows Zero

Symptom: total_cost: 0.0 in traces

Cause: Known bug, fixed in Phase 15 (Dec 1, 2025)

Solution: Upgrade to the latest version

Spans Missing Children

Symptom: Flat span tree (no nested spans)

Cause: Span hierarchy bug (fixed)

Solution: Re-check on the latest version

See Also

For more information, visit Keywords AI Platform.