
Overview

Completion workflows execute direct LLM completions with custom parameters on your dataset. The system automatically:
  • Processes each dataset entry through your configured model
  • Tracks costs, tokens, and latency
  • Runs evaluators on outputs
  • Creates detailed trace hierarchies

How It Works

Dataset Entry → Completion Workflow → LLM Call → Output → Evaluators → Scores
                 (your config)        (auto)     (auto)    (auto)       (auto)
Execution Flow:
  1. System reads dataset entries (input messages)
  2. Calls LLM with your configuration
  3. Stores output and metrics
  4. Runs configured evaluators
  5. Creates trace with full span hierarchy

Key Benefits

  • Zero Code: No custom processing needed
  • Automatic Execution: Runs in the background when you create the experiment
  • Cost Tracking: Real-time cost calculation
  • Chaining Support: Can chain multiple completion workflows
  • Full Observability: Detailed execution traces

Configuration

Workflow Config

Type: "completion" Config Fields:
FieldTypeRequiredDescriptionDefault
modelstringYesModel identifier (e.g., “gpt-4o-mini”)-
temperaturenumberNoSampling temperature (0-2)1.0
max_tokensintegerNoMaximum completion tokens150
top_pnumberNoNucleus sampling (0-1)1.0
frequency_penaltynumberNoFrequency penalty (-2 to 2)0
presence_penaltynumberNoPresence penalty (-2 to 2)0
stopstring or arrayNoStop sequencesnull
response_formatobjectNoResponse format (e.g., {"type": "json_object"})null
toolsarrayNoFunction calling toolsnull
tool_choicestring or objectNoTool choice strategynull
reasoning_effortstringNoReasoning effort for o1 modelsnull
Example:
{
  "type": "completion",
  "config": {
    "model": "gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens": 100
  }
}

API Endpoints

1. Create Completion Experiment

POST /api/v2/experiments/

Creates an experiment and starts execution in the background. Check status in the platform UI.

Request:
{
  "name": "Completion Workflow Test",
  "description": "Testing GPT-4o-mini completions",
  "dataset_id": "172fc5e8-bac4-43d4-a066-f2f7a167c148",
  "workflows": [
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 100
      }
    }
  ],
  "evaluator_slugs": ["test_llm_quality_v1"]
}
Response (201):
{
  "id": "187f57ea7a8141a19b535108c5b92432",
  "name": "Completion Workflow Test",
  "description": "Testing GPT-4o-mini completions",
  "status": "pending",
  "workflows": [
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 100,
        "top_p": null,
        "frequency_penalty": null,
        "presence_penalty": null
      }
    }
  ],
  "evaluator_slugs": ["test_llm_quality_v1"],
  "workflow_count": 1,
  "progress": 0.0,
  "created_at": "2025-12-03T01:25:00Z"
}
Notes:
  • Execution starts immediately in the background
  • status: "pending" means queued for processing
  • Null config fields are optional (not sent to LLM)
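
For reference, a minimal Python equivalent of this request, assuming the requests library and the same Bearer authentication shown in the curl examples below:

import requests

API_KEY = "YOUR_API_KEY"

payload = {
    "name": "Completion Workflow Test",
    "dataset_id": "172fc5e8-bac4-43d4-a066-f2f7a167c148",
    "workflows": [
        {
            "type": "completion",
            "config": {"model": "gpt-4o-mini", "temperature": 0.7, "max_tokens": 100},
        }
    ],
    "evaluator_slugs": ["test_llm_quality_v1"],
}

# POST /api/v2/experiments/ returns 201 with the experiment id and status "pending"
resp = requests.post(
    "https://api.keywordsai.co/api/v2/experiments/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["id"], resp.json()["status"])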

2. List Execution Results

GET /api/v2/experiments/{experiment_id}/logs/list/

Retrieves completed traces with outputs.

Query Parameters:
  • page: Page number (default: 1)
  • page_size: Results per page (default: 100)
Response (200):
{
  "results": [
    {
      "id": "trace-123",
      "name": "experiment_trace",
      "input": "[{\"role\": \"system\", \"content\": \"You are helpful\"}, {\"role\": \"user\", \"content\": \"What is AI?\"}]",
      "output": "{\"role\": \"assistant\", \"content\": \"AI stands for Artificial Intelligence...\"}",
      "status": "success",
      "span_count": 4,
      "duration": 2.5,
      "total_cost": 0.000123,
      "total_tokens": 85,
      "total_prompt_tokens": 35,
      "total_completion_tokens": 50,
      "start_time": "2025-12-03T01:25:05Z",
      "end_time": "2025-12-03T01:25:07.5Z"
    }
  ],
  "count": 3
}
Status Values:
  • pending: Queued for execution
  • running: Currently executing
  • success: Completed successfully
  • failed: Workflow execution failed
  • error: Unexpected error
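
For experiments that produce more traces than one page holds, you can walk the documented page and page_size parameters; a minimal sketch, assuming the requests library:

import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.keywordsai.co/api/v2"

def iter_traces(experiment_id, page_size=100):
    """Yield every trace for an experiment, one page at a time."""
    page = 1
    while True:
        resp = requests.get(
            f"{BASE}/experiments/{experiment_id}/logs/list/",
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={"page": page, "page_size": page_size},
        )
        resp.raise_for_status()
        data = resp.json()
        yield from data["results"]
        # `count` is the total number of traces across all pages
        if page * page_size >= data["count"]:
            break
        page += 1

for trace in iter_traces("exp-123"):
    print(trace["id"], trace["status"], trace["total_cost"])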

3. Get Detailed Span Tree

GET /api/v2/experiments/{experiment_id}/logs/{trace_id}/?detail=1

Returns the full trace with its span hierarchy.

Response (200):
{
  "id": "trace-123",
  "status": "success",
  "span_tree": [
    {
      "id": "root-span-id",
      "span_name": "experiment_trace",
      "span_type": "ROOT",
      "log_type": "workflow",
      "status": "success",
      "input": "[{\"role\": \"user\", \"content\": \"...\"}]",
      "output": "{\"role\": \"assistant\", \"content\": \"...\"}",
      "start_time": "2025-12-03T01:25:05Z",
      "end_time": "2025-12-03T01:25:07.5Z",
      "latency": 2.5,
      "children": [
        {
          "span_name": "workflow_execution",
          "log_type": "workflow",
          "children": [
            {
              "span_name": "Experiment Workflow.completion",
              "log_type": "chat",
              "status": "success",
              "model": "gpt-4o-mini",
              "prompt_tokens": 35,
              "completion_tokens": 50,
              "total_tokens": 85,
              "cost": 0.000123,
              "latency": 2.1
            }
          ]
        },
        {
          "span_name": "evaluator.test_llm_quality_v1",
          "log_type": "score",
          "status": "completed",
          "output": {
            "primary_score": 4.5,
            "string_value": "High quality response"
          }
        }
      ]
    }
  ]
}
Span Hierarchy:
experiment_trace (ROOT)
├── workflow_execution
│   └── Experiment Workflow.completion (LLM call)
└── evaluator.{slug} (score)
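
Because spans nest arbitrarily via children, a recursive walk is the simplest way to pull per-span metrics out of the tree. A sketch over the response shape shown above:

import requests

API_KEY = "YOUR_API_KEY"

def walk_spans(spans, depth=0):
    """Recursively print each span's name, log type, and LLM metrics if present."""
    for span in spans:
        line = "  " * depth + f"{span.get('span_name', '?')} [{span.get('log_type', '?')}]"
        if "cost" in span:  # chat spans carry model/token/cost fields
            line += f" model={span.get('model')} tokens={span.get('total_tokens')} cost={span.get('cost')}"
        print(line)
        walk_spans(span.get("children", []), depth + 1)

detail = requests.get(
    "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/trace-123/?detail=1",
    headers={"Authorization": f"Bearer {API_KEY}"},
).json()
walk_spans(detail["span_tree"])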

Chaining Workflows

Multiple completion workflows execute in sequence (head-to-tail):

Example: Summarize → Expand

Request:
{
  "workflows": [
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.3,
        "max_tokens": 100
      }
    },
    {
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 200
      }
    }
  ]
}
Execution:
Dataset Input

Workflow 1: Brief summary (low temp, 100 tokens)
    ↓ (output becomes input)
Workflow 2: Detailed expansion (high temp, 200 tokens)

Final Output → Evaluators
Use Cases:
  • Focused → Creative: Low temp → High temp
  • Summarize → Expand: Short → Detailed
  • Filter → Process: Data cleaning pipelines
  • Draft → Refine: Multiple revision passes
See Complete Chaining Example.

Input Format

From Dataset

Dataset entries must contain messages in OpenAI format.

Dataset Entry:
{
  "input": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is AI?"}
  ],
  "output": {"role": "assistant", "content": "..."} 
}
Note: The input field must be a messages array for completion workflows.
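
A quick client-side check before uploading can catch malformed entries early. The helper below is a hypothetical sketch, not part of the API:

def is_valid_completion_input(entry):
    """Check that a dataset entry's `input` is an OpenAI-style messages array."""
    messages = entry.get("input")
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(m, dict) and "role" in m and "content" in m
        for m in messages
    )

entry = {
    "input": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is AI?"},
    ]
}
assert is_valid_completion_input(entry)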

From Previous Workflow

When chaining, output of Workflow N becomes input of Workflow N+1:
// Workflow 1 output
{
  "role": "assistant",
  "content": "Brief: AI is artificial intelligence"
}

// Becomes Workflow 2 input (appended to messages)
[
  {"role": "system", "content": "..."},
  {"role": "user", "content": "..."},
  {"role": "assistant", "content": "Brief: AI is artificial intelligence"}
]
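
Conceptually the chaining step is just an append; a local illustration of how Workflow 2's input is built (a sketch, not platform code):

# Messages from the dataset entry (Workflow 1's input)
messages = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
]

# Output of Workflow 1
step1_output = {"role": "assistant", "content": "Brief: AI is artificial intelligence"}

# Workflow 2 sees the original messages plus the previous output appended
messages_for_step2 = messages + [step1_output]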

Cost Tracking

Automatic Calculation

Costs are calculated from the pricing database based on:
  • Model name
  • Prompt tokens
  • Completion tokens
  • Your organization’s pricing tier
Span Fields:
{
  "model": "gpt-4o-mini",
  "prompt_tokens": 35,
  "completion_tokens": 50,
  "total_tokens": 85,
  "cost": 0.000123,
  "latency": 2.1
}

Aggregation

Trace-level costs aggregate all workflow steps:
{
  "total_cost": 0.000246,
  "total_tokens": 170,
  "total_prompt_tokens": 70,
  "total_completion_tokens": 100
}
Formula:
total_cost = sum(workflow_costs) + sum(evaluator_costs)
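
Equivalently, the trace total can be recomputed client-side by summing the cost field over every span in the tree (evaluator spans included, when they report a cost); a sketch:

def total_cost(spans):
    """Sum `cost` over a span tree: workflow (chat) spans plus evaluator spans."""
    return sum(
        span.get("cost", 0.0) + total_cost(span.get("children", []))
        for span in spans
    )

# Given `detail` from the span-tree endpoint above, total_cost(detail["span_tree"])
# should match the trace-level `total_cost`.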

Error Handling

Workflow Execution Errors

Failed Workflow:
{
  "span_name": "Experiment Workflow.completion",
  "status": "failed",
  "output": {
    "error": "argument of type 'NoneType' is not iterable",
    "error_type": "workflow_execution_error",
    "workflow_type": "completion",
    "success": false
  }
}
Common Errors:
  • Invalid model name
  • Missing required config (e.g., model)
  • Invalid messages format
  • Rate limiting
  • API key issues

Validation Errors

Missing Model:
{
  "detail": "Validation error",
  "errors": [
    {
      "type": "missing",
      "loc": ["workflows", 0, "config", "model"],
      "msg": "Field required"
    }
  ]
}
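
Validation errors come back as a structured errors list, so they are easy to surface programmatically; a sketch assuming the requests library:

import requests

API_KEY = "YOUR_API_KEY"
payload = {
    "name": "Bad Request",
    "dataset_id": "dataset-123",
    "workflows": [{"type": "completion", "config": {}}],  # missing "model"
}

resp = requests.post(
    "https://api.keywordsai.co/api/v2/experiments/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
if not resp.ok:
    for err in resp.json().get("errors", []):
        # e.g. "workflows.0.config.model: Field required"
        path = ".".join(str(part) for part in err["loc"])
        print(f"{path}: {err['msg']}")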

Complete Example

1. Create Experiment

curl -X POST "https://api.keywordsai.co/api/v2/experiments/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT-4o-mini Test",
    "dataset_id": "dataset-123",
    "workflows": [{
      "type": "completion",
      "config": {
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 150
      }
    }],
    "evaluator_slugs": ["quality_v1"]
  }'

2. Wait for Execution

import time
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.keywordsai.co/api/v2"

def list_traces(experiment_id):
    # GET /api/v2/experiments/{experiment_id}/logs/list/
    resp = requests.get(
        f"{BASE}/experiments/{experiment_id}/logs/list/",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    return resp.json()

experiment_id = "exp-123"

# Poll every 2 seconds, for up to 30 seconds of processing time
for i in range(15):
    time.sleep(2)
    results = list_traces(experiment_id)["results"]
    if results:
        print(f"Found {len(results)} traces")
        break

3. View Results

curl "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/list/" \
  -H "Authorization: Bearer YOUR_API_KEY"

4. Analyze Trace

curl "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/trace-456/?detail=1" \
  -H "Authorization: Bearer YOUR_API_KEY"

Best Practices

1. Choose Appropriate Parameters

For consistent outputs:
{
  "temperature": 0.1,
  "max_tokens": 100
}
For creative outputs:
{
  "temperature": 0.9,
  "max_tokens": 500
}

2. Monitor Costs

# Get cost summary
curl "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/summary/" \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
{
  "total_count": 100,
  "total_cost": 0.123,
  "total_tokens": 8500,
  "avg_latency": 2.3
}
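
One simple use of the summary endpoint is a budget guard in a script or CI job; a sketch (the threshold is an arbitrary assumption):

import requests

API_KEY = "YOUR_API_KEY"
BUDGET_USD = 1.00  # assumption: your own per-experiment budget

summary = requests.get(
    "https://api.keywordsai.co/api/v2/experiments/exp-123/logs/summary/",
    headers={"Authorization": f"Bearer {API_KEY}"},
).json()

if summary["total_cost"] > BUDGET_USD:
    print(f"Over budget: ${summary['total_cost']:.4f} across {summary['total_count']} traces")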

3. Handle Rate Limits

Use smaller batch sizes and lower concurrency:
{
  "batch_size": 50,
  "concurrency": 10
}
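
On the client side, a simple exponential backoff around HTTP 429 responses also helps; a generic sketch, not a documented platform feature:

import time
import requests

def post_with_backoff(url, headers, json, max_retries=5):
    """Retry on HTTP 429 with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=json)
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)
    return resp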

4. Chain Strategically

Good:
  • Draft → Refine
  • Summarize → Expand
  • Classify → Process
Avoid:
  • Too many steps (3+ workflows)
  • Identical configurations
  • Redundant processing

Troubleshooting

No Logs Created

Symptom: Empty results after waiting

Causes:
  1. Experiment still processing in the background
  2. Dataset is empty
  3. Invalid model name
Solution:
  • Check experiment status in the platform UI
  • Verify dataset has entries
  • Wait longer (60+ seconds for large datasets)
  • Verify model name is valid

Cost Shows Zero

Symptom: total_cost: 0.0 in traces

Cause: Known bug, fixed in Phase 15 (Dec 1, 2025)

Solution: Upgrade to the latest version

Spans Missing Children

Symptom: Flat span tree (no nested spans)

Cause: Span hierarchy bug (fixed)

Solution: Re-check on the latest version

See Also

For more information, visit Keywords AI Platform.