POST /api/chat/completions
import requests
def demo_call(input, 
              model="gpt-4o-mini",
              token="YOUR_KEYWORDS_AI_API_KEY"
              ):
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {token}',
    }

    data = {
        'model': model,
        'messages': [{'role': 'user', 'content': input}],
    }

    response = requests.post('https://api.keywordsai.co/api/chat/completions', headers=headers, json=data)
    return response

messages = "Say 'Hello World'"
print(demo_call(messages).json())

OpenAI-compatible parameters

To use Keywords AI parameters, you can pass them in the extra_body parameter if you’re using the OpenAI SDK.
Environment Switching: Keywords AI doesn’t support an env parameter in API calls. To switch between environments (test/production), use different API keys - one for your test environment and another for production. You can manage these keys in your API Keys settings.
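For example, a minimal sketch with the OpenAI Python SDK (v1+). The base_url is inferred from the endpoint above, and the customer_identifier value is an illustrative placeholder:

from openai import OpenAI

# Point the OpenAI SDK at the Keywords AI proxy.
# base_url is inferred from the endpoint above; substitute your own API key.
client = OpenAI(
    base_url="https://api.keywordsai.co/api",
    api_key="YOUR_KEYWORDS_AI_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say 'Hello World'"}],
    # Keywords AI-specific parameters go in extra_body;
    # "customer_identifier" here is an illustrative placeholder.
    extra_body={"customer_identifier": "user_123"},
)
print(response.choices[0].message.content)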
messages
array
required
List of messages to send to the endpoint in the OpenAI style, each of them following this format:
messages=[
  {"role": "system", // Available choices are user, system or assistant
   "content": "You are a helpful assistant."
  },
  {"role": "user", "content": "Hello!"}
]
Image processing: To use the image processing feature, upload the image in the following format.
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://as1.ftcdn.net/v2/jpg/01/34/53/74/1000_F_134537443_VendrqyXIWyHrZgxdIsfyKUost734JDP.jpg"
      }
    }
  ]
}
model
string
required
Specify which model to use. See the list of models here
This parameter will be overridden by the loadbalance_models parameter.
stream
boolean
default:false
Whether to stream back partial progress token by token
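As a sketch of consuming a streamed response with requests: streamed chunks follow the OpenAI server-sent-events convention (an assumption based on the OpenAI-compatible format), with each event line prefixed by "data: " and a final "data: [DONE]" sentinel:

import json
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_KEYWORDS_AI_API_KEY",
}
data = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say 'Hello World'"}],
    "stream": True,
}

with requests.post(
    "https://api.keywordsai.co/api/chat/completions",
    headers=headers, json=data, stream=True,
) as response:
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        # Each chunk carries an incremental delta in the OpenAI style.
        if chunk.get("choices"):
            delta = chunk["choices"][0].get("delta", {})
            print(delta.get("content") or "", end="", flush=True)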
tools
array[dict]
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide an array of functions the model may generate JSON inputs for.
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"]
        }
      },
      "required": ["location"]
    }
  }
}
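Putting the schema above into a request, a minimal sketch with requests (the weather function is the illustrative tool from above; the tool_calls field follows the OpenAI response convention):

import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]

response = requests.post(
    "https://api.keywordsai.co/api/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEYWORDS_AI_API_KEY"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "What's the weather in Boston?"}],
        "tools": tools,
    },
)
# OpenAI-style responses surface tool invocations under message.tool_calls.
message = response.json()["choices"][0]["message"]
print(message.get("tool_calls"))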
tool_choice
dict
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. none is the default when no tools are present; auto is the default if tools are present. Specifying a particular tool via the code below forces the model to call that tool.
{
  "type": "function",
  "function": {"name": "name_of_the_function"}
}
frequency_penalty
number
Specify how much to penalize new tokens based on their existing frequency in the text so far. Decreases the model’s likelihood of repeating the same line verbatim
max_tokens
number
Maximum number of tokens to generate in the response
temperature
number
default:1
Controls randomness in the output in the range of 0-2; a higher temperature produces a more random response.
n
number
default:1
How many chat completion choices are generated for each input message. Caveat: while this can help improve generation quality by letting you pick the optimal choice, it also leads to more token usage.
logprobs
boolean
default:false
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
echo
boolean
Echo back the prompt in addition to the completion
stop
array[string]
Sequences at which the model will stop generating further tokens
presence_penalty
number
Specify how much to penalize new tokens based on whether they appear in the text so far. Increases the model’s likelihood of talking about new topics
logit_bias
dict
Used to modify the probability of tokens appearing in the response
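Following the OpenAI convention (an assumption for this proxy), keys are token IDs as strings and values are biases from -100 to 100:

# Illustrative sketch: "1734" stands in for a real token ID from the
# model's tokenizer; -100 effectively bans the token, 100 effectively forces it.
data = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say 'Hello World'"}],
    "logit_bias": {"1734": -100},
}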
response_format
object
An object specifying the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106. Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. If you want to specify your own output structure, use { "type": "json_schema", "json_schema": {...your schema} }. For more reference, please check OpenAI's guide on structured output. You must have "json" as a keyword in the prompt to use this feature.
type
string
required
The type of response format. Options: json_object, json_schema, or text
If you are using Vertex AI and want to use JSON mode, you should specify a response_schema in the response_format parameter. Check the details of response schema here.
Example.py
response_schema = {
    "type": "array",  # or "string", "number", "object", "boolean", ...
    "items": {  # "items" applies only to the array type
        "type": "object",
        "properties": {  # "properties" applies only to the object type
            "number": {"type": "number"},
            "street_name": {"type": "string"},
            "street_type": {"enum": ["Street", "Avenue", "Boulevard"]}
        }
    },
}

response_format = {
    "type": "json_object",
    "response_schema": response_schema,
}
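For the json_schema variant, a hedged sketch following OpenAI's structured-output wrapper (the schema name and fields are illustrative):

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "address",  # illustrative schema name
        "strict": True,     # per OpenAI's structured-output guide
        "schema": {
            "type": "object",
            "properties": {
                "number": {"type": "number"},
                "street_name": {"type": "string"}
            },
            "required": ["number", "street_name"]
        }
    }
}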
parallel_tool_calls
boolean
Whether to enable parallel function calling during tool use.

Keywords AI parameters

See how to make a standard Keywords AI API call in the Quick Start guide.

Generation parameters

load_balance_group
object
Balance the load of your requests between different models. See the details of load balancing here.
The proxy will pick one model from the group and override the model parameter
{
  // You don't need to specify the model parameter; otherwise, the model parameter will overwrite the load balance group
  "messages": [
    {
      "role": "user",
      "content": "Hi, how are you?"
    }
  ],
  "load_balance_group": {
    "group_id": "THE_GROUP_ID" // from the Load balancing page
  }
}
The models field will overwrite the load_balance_group you specified in the UI.
{
  "load_balance_group": {
      "group_id":"THE_GROUP_ID", // from Load balancing page
      "models": [
        {
          "model": "azure/gpt-35-turbo",
          "weight": 1
        },
        {
          "model": "azure/gpt-4",
          "credentials": { // add your own credentials if you want to use your own Azure credentials or custom model name
              "api_base": "Your own Azure api_base",
              "api_version": "Your own Azure api_version",
              "api_key": "Your own Azure api_key"
          },
          "weight": 1
        } 
      ]
  }
}
fallback_models
array
Specify the list of backup models (ranked by priority) to respond in case of a failure in the primary model. See the details of fallback models here.
{
// ...other parameters...
  "fallback_models": [
    "gemini/gemini-pro",
    "mistral/mistral-small",
    "gpt-4o"
  ]
}
customer_credentials
object
You can pass in your customer’s credentials for supported providers and use their credits when our proxy is calling models from those providers.
See details here
"customer_credentials": {

  "openai": {
    "api_key": "YOUR_OPENAI_API_KEY",
  }
}
credential_override
object
One-off credential overrides. Instead of using what is uploaded for each provider, this targets credentials for individual models. Go to the provider page to see how to add your own credentials and override them for a specific model.
"credential_override": {
    "azure/gpt-4o":{ // override for a specific model.
      "api_key": "your-api-key",
      "api_base": "your-api-base-url",
      "api_version": "your-api-version",
    }
  }
cache_enabled
boolean
Enable or disable caches. Check the details of caches here.
{
    "cache_enabled": true
}
cache_ttl
number
This parameter specifies the time-to-live (TTL) for the cache in seconds.
This parameter is optional; the default TTL is 30 days.
{
    "cache_ttl": 3600 // in seconds
}
cache_options
object
This parameter specifies the cache options. Currently we support the cache_by_customer option, which can be set to true or false. If cache_by_customer is set to true, the cache will be stored per customer identifier.
It's an optional parameter.
{
    "cache_options": { // optional
        "cache_by_customer": true // or false
    }
}
prompt
object
The prompt template to use for the completion. You can build and deploy prompts on the Prompts page.
prompt_id
string
required
The ID of the prompt to use. You can find this on the Prompts page.
variables
object
The variables to replace in the prompt template.
echo
boolean
default:true
With echo on, the response body will include an extra field, "prompt_message" (an array of messages). This is an optional parameter.
override
boolean
default:true
Turn on override to use params in override_params instead of the params in the prompt.
{
  "override": true,
}

override_params
object
You can put any OpenAI chat/completions parameters here to override the prompt’s parameters. This will only work if override is set to true.
{
  "override_params": {
    "temperature": 0.5,
    "max_tokens": 100
  }
}
override_config
object
This parameter allows you to control how you can override the parameters in the prompt.
messages_override_mode
string
append: append the new messages to the existing messages
override: override the existing messages
request_body = {
    "prompt": {
        "prompt_id": "xxxxxx",
        "override_config": {"messages_override_mode": "append"},  # append or override
        "override_params": {"messages": [{"role": "user", "content": "5"}]},
    }
}
{
  "prompt": {
    "prompt_id": "prompt_id", // paste this from the prompt management page
    "variables": {
      "variable_name": "variable_value"
    }
    // "echo": true // optional parameter
  }
}
retry_params
object
Enable or disable retries and set the number of retries and the time to wait before retrying. Check the details of retries here.
retry_enabled
boolean
required
Enable or disable retries.
num_retries
number
The number of retries to attempt.
retry_after
number
The time to wait before retrying in seconds.
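Assembled into a request body, a minimal sketch (the numeric values are illustrative):

data = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say 'Hello World'"}],
    "retry_params": {
        "retry_enabled": True,  # required
        "num_retries": 3,       # illustrative
        "retry_after": 0.2      # seconds to wait before retrying
    },
}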
disable_log
boolean
When set to true, only the request and performance metrics will be recorded; input and output messages will be omitted from the log.
model_name_map
object
This parameter is for Azure deployments only!
We understand that you may have a custom name for your Azure deployment. Keywords AI uses the model's original name, which may not match your deployment name. You can use this parameter to map the default name to your custom name.
{
  "model": "azure/gpt-4o",
  "model_name_map": {
    "original_model_name": "azure/your_custom_model_name"
    // e.g., "azure/gpt-4o": "azure/{your gpt-4o's deployment name}"
  }
}
models
array
Specify the list of models for the Keywords AI LLM router to choose between. If not specified, all models will be used. See the list of models here. If only one model is specified, it will be treated as if the model parameter were used and the router will not trigger. When the model parameter is used, the router will not trigger, and this parameter behaves as fallback_models.
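As a sketch, restricting the router to a candidate set (model names are illustrative):

data = {
    # No "model" field: the router chooses among the candidates below.
    "messages": [{"role": "user", "content": "Say 'Hello World'"}],
    "models": ["gpt-4o-mini", "claude-3-5-sonnet-20240620"]
}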
exclude_providers
array
default:[]
The list of providers to exclude from the LLM router's selection. All models under the provider will be excluded. See the list of providers here. This only excludes providers in the LLM router; the model parameter takes precedence over this parameter, and fallback_models and the safety net will still use the excluded models to catch failures.
exclude_models
array
default:[]
The list of models to exclude from the LLM router's selection. See the list of models here. This only excludes models in the LLM router; the model parameter takes precedence over this parameter, and fallback_models and the safety net will still use the excluded models to catch failures.
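For instance, a hedged sketch excluding a provider and a model from the router's selection (the names are illustrative):

data = {
    "messages": [{"role": "user", "content": "Say 'Hello World'"}],
    "exclude_providers": ["anthropic"],   # illustrative provider
    "exclude_models": ["gpt-3.5-turbo"]   # illustrative model
}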

Observability parameters

metadata
dict
You can add any key-value pair to this metadata field for your reference. Check the details of metadata here. Contact team@keywordsai.co if you need extra parameter support for your use case.
{
  "metadata": {
    "my_key": "my_value"
    // Add any key-value pair here
  }
}
custom_identifier
string
You can use this parameter to send an extra custom tag with your request. It helps you identify LLM logs faster than the metadata parameter because it's indexed. You can see it in Logs as the Custom ID field.
{
  "custom_identifier": "my_value"
}
customer_identifier
string
Use this as a tag to identify the user associated with the API call. See the details of customer identifier here.
{
    //...other_params,
    "customer_identifier": "user_123"
}
customer_params
object
Pass the customer’s parameters in the API call to monitor the user’s data in the Keywords AI platform. See how to get insights into your users’ data here
customer_identifier
string
required
The unique identifier for the customer. It can be any string.
group_identifier
string
Group identifier. Use group identifier to group logs together.
name
string
The name of the customer. It can be any string.
email
string
The email of the customer. It should be a valid email.
period_start
string
The start date of the period. It should be in the format YYYY-MM-DD.
period_end
string
The end date of the period. It should be in the format YYYY-MM-DD.
budget_duration
string
Choices are yearly, monthly, weekly, and daily
period_budget
float
The budget for the period. It should be a float.
markup_percentage
float
The markup percentage for the period. Usage reported for your customers through this key will be increased by this percentage.
total_budget
float
The total budget for a user.
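Assembled into a request body, a minimal sketch (all values are illustrative):

data = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say 'Hello World'"}],
    "customer_params": {
        "customer_identifier": "user_123",  # required
        "name": "Ada Lovelace",
        "email": "ada@example.com",
        "period_start": "2024-01-01",
        "period_end": "2024-01-31",
        "budget_duration": "monthly",
        "period_budget": 10.0,
        "markup_percentage": 20.0,
        "total_budget": 100.0
    },
}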
request_breakdown
boolean
default:false
Adding this returns the summarization of the response in the response body. If streaming is on, the metrics will be streamed as the last chunk.
{
  "id": "chatcmpl-7476cf3f-fcc9-4902-a548-a12489856d8a",
  // ... main part of the response body ...
  "request_breakdown": {
    "prompt_tokens": 6,
    "completion_tokens": 9,
    "cost": 4.8e-5,
    "prompt_messages": [
      {
        "role": "user",
        "content": "How are you doing today?"
      }
    ],
    "completion_message": {
      "content": " I'm doing well, thanks for asking!",
      "role": "assistant"
    },
    "model": "claude-2",
    "cached": false,
    "timestamp": "2024-02-20T01:23:39.329729Z",
    "status_code": 200,
    "stream": false,
    "latency": 1.8415491580963135,
    "scores": {},
    "category": "Questions",
    "metadata": {},
    "routing_time": 0.18612787732854486,
    "full_request": {
      "messages": [
        {
          "role": "user",
          "content": "How are you doing today?"
        }
      ],
      "model": "claude-2",
      "logprobs": true
    },
    "sentiment_score": 0
  }
}

Evals parameters

positive_feedback
boolean
Whether the user liked the output. True means the user liked the output.
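For example, a minimal sketch attaching feedback to a request:

data = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say 'Hello World'"}],
    "positive_feedback": True  # the user liked the output
}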

Deprecated parameters

customer_api_keys
object
You can pass in a dictionary of your customer’s API keys for specific models. If the router selects a model that is in the dictionary, it will attempt to use the customer’s API key for calling the model before using your integration API key or Keywords AI’s default API key.
{
  "gpt-3.5-turbo": "your_customer_api_key",
  "gpt-4": "your_customer_api_key"
}
loadbalance_models
array
Balance the load of your requests between different models. See the details of load balancing here.
This parameter will override the model parameter.
{
  // ...other parameters...
  "loadbalance_models": [
      {
          "model": "claude-3-5-sonnet-20240620",
          "weight": 34,
          "credentials": { // Your own Anthropic API key, optional for team plan and above
              "api_key": "Your own Anthropic API key"
          }
      },
      {
          "model": "azure/gpt-35-turbo",
          "weight": 34,
          "credentials": { // Your own Azure credentials, optional for team plan and above
              "api_base": "Your own Azure api_base",
              "api_version": "Your own Azure api_version",
              "api_key": "Your own Azure api_key"
          }
      }
  ]
}