POST /api/generate
Make sure you add the Bearer prefix to your token.

You can paste the command below into your terminal to run your first API request. Make sure to replace YOUR_KEYWORDSAI_API_KEY with your actual Keywords AI API key.

Example Call
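A minimal sketch of a first call in Python, assuming the endpoint is served at https://api.keywordsai.co/api/generate/ (check your dashboard for the exact base URL); YOUR_KEYWORDSAI_API_KEY and the model name are placeholders:

```python
import json

API_KEY = "YOUR_KEYWORDSAI_API_KEY"  # placeholder: replace with your real key

# Note the Bearer prefix on the token, as required above.
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

payload = {
    "model": "gpt-3.5-turbo",  # assumed model name; see the model list
    "messages": [{"role": "user", "content": "Hello?"}],
    "max_tokens": 100,
}

# To actually send the request (uncomment; URL is an assumption):
# import requests
# resp = requests.post("https://api.keywordsai.co/api/generate/",
#                      headers=headers, data=json.dumps(payload))

print(json.dumps(payload))
```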

OpenAI Parameters

model
string

Specify which model to use. See the list of models here

messages
array
required

List of messages to send to the endpoint in the OpenAI style, each of them following this format:

{
  "role": "user", // Available choices are user, system or assistant
  "content": "Hello?"
}

Image Processing: If you want to use the image processing feature, use the following format to upload the image:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://as1.ftcdn.net/v2/jpg/01/34/53/74/1000_F_134537443_VendrqyXIWyHrZgxdIsfyKUost734JDP.jpg"
      }
    }
  ]
}
max_tokens
number

Maximum number of tokens to generate in the response

temperature
number
default:
1

Controls randomness of the output in the range 0-2; a higher temperature yields a more random response.

n
number
default:
1

Specify how many completion choices to generate for each prompt.

Caveat! While generating multiple choices can improve quality by letting you pick the best one, it also increases token usage.

stream
boolean
default:
false

Whether to stream back partial progress token by token
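When stream is true, OpenAI-compatible endpoints typically deliver the response as server-sent events, one `data:` line per chunk, terminated by `data: [DONE]`. A sketch of assembling the streamed deltas — the exact chunk shape here is an assumption based on the OpenAI streaming format:

```python
import json

# Simulated SSE lines, as a streaming response might deliver them.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = ""
for line in sse_lines:
    body = line[len("data: "):]
    if body == "[DONE]":  # sentinel marking the end of the stream
        break
    chunk = json.loads(body)
    # Each chunk carries a partial "delta"; concatenate them in order.
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)
```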

logprobs
boolean
default:
false

Include the log probabilities of each selected token.

echo
boolean

Echo back the prompt in addition to the completion

stop
array[string]

Sequences at which the model will stop generating further tokens.

presence_penalty
number

Specify how much to penalize new tokens based on whether they appear in the text so far. Positive values increase the model's likelihood of talking about new topics.

frequency_penalty
number

Specify how much to penalize new tokens based on their existing frequency in the text so far. Positive values decrease the model's likelihood of repeating the same line verbatim.

logit_bias
dict

Used to modify the probability of tokens appearing in the response
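In the OpenAI convention, logit_bias maps token IDs (as strings) to a bias value from -100 to 100: -100 effectively bans a token, 100 effectively forces it. A hedged sketch — the token IDs below are illustrative, not real vocabulary entries:

```python
# Map token-ID strings to a bias value in [-100, 100].
logit_bias = {
    "1234": -100,  # illustrative ID: effectively ban this token
    "5678": 5,     # illustrative ID: nudge this token up slightly
}

payload = {
    "model": "gpt-3.5-turbo",  # assumed model name
    "messages": [{"role": "user", "content": "Hello?"}],
    "logit_bias": logit_bias,
}
```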

tools
array[dict]

A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide an array of functions the model may generate JSON inputs for.

{
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  }
tool_choice
dict

Manually pick which tool the model should use. This forces the model to make a function call every time a function is passed in.

{
  "type": "function",
  "function": {"name": "name_of_the_function"}
}
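Putting tools and tool_choice together, a request payload might look like the sketch below; the function name and schema follow the weather example above, and the model name is an assumption:

```python
import json

get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string",
                             "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}

payload = {
    "model": "gpt-3.5-turbo",  # assumed model name
    "messages": [{"role": "user", "content": "What's the weather in Boston?"}],
    "tools": [get_weather_tool],
    # Force the model to call this specific function:
    "tool_choice": {"type": "function",
                    "function": {"name": "get_current_weather"}},
}

print(json.dumps(payload)[:60])
```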

Keywords AI Parameters

request_breakdown
boolean
default:
false

Adding this returns a summary of the request's metrics in the response body. If streaming is on, the metrics are streamed as the last chunk.

{
  "id": "chatcmpl-7476cf3f-fcc9-4902-a548-a12489856d8a",
  //... main part of the response body ...
  "request_breakdown": {
    "prompt_tokens": 6,
    "completion_tokens": 9,
    "cost": 4.8e-5,
    "prompt_messages": [
      {
        "role": "user",
        "content": "How are you doing today?"
      }
    ],
    "completion_message": {
      "content": " I'm doing well, thanks for asking!",
      "role": "assistant"
    },
    "model": "claude-2",
    "cached": false,
    "timestamp": "2024-02-20T01:23:39.329729Z",
    "status_code": 200,
    "stream": false,
    "latency": 1.8415491580963135,
    "scores": {},
    "category": "Questions",
    "metadata": {},
    "routing_time": 0.18612787732854486,
    "full_request": {
      "messages": [
        {
          "role": "user",
          "content": "How are you doing today?"
        }
      ],
      "model": "claude-2",
      "logprobs": true
    },
    "sentiment_score": 0
  }
}
//... other chunks ...
// The following is the last chunk
{
  "id": "request_breakdown",
  "choices": [
    {
      "delta": { "content": null, "role": "assistant" },
      "finish_reason": "stop",
      "request_breakdown": {
        "prompt_tokens": 6,
        "completion_tokens": 9,
        "cost": 4.8e-5, //  In usd
        "prompt_messages": [
          {
            "role": "user",
            "content": "How are you doing today?"
          }
        ],
        "completion_message": {
          "content": " I'm doing well, thanks for asking!",
          "role": "assistant"
        },
        "model": "claude-2",
        "cached": false,
        "timestamp": "2024-02-20T01:23:39.329729Z",
        "status_code": 200,
        "stream": false,
        "latency": 1.8415491580963135, // in seconds
        "scores": {},
        "category": "Questions",
        "metadata": {},
        "routing_time": 0.18612787732854486, // in seconds
        "full_request": {
          "messages": [
            {
              "role": "user",
              "content": "How are you doing today?"
            }
          ],
          "model": "claude-2",
          "logprobs": true
        },
        "sentiment_score": 0
      },
      "index": 0,
      "message": { "content": null, "role": "assistant" }
    }
  ],
  "created": 1706100589,
  "model": "extra_parameter",
  "object": "chat.completion.chunk",
  "system_fingerprint": null,
  "usage": {}
}
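With streaming on, the breakdown arrives as the final chunk, identifiable by its id. A sketch of picking it out of the chunk stream — the chunk shapes follow the streaming example above:

```python
# Simulated chunks: an ordinary delta chunk followed by the breakdown chunk.
chunks = [
    {"id": "chatcmpl-abc", "choices": [{"delta": {"content": "Hi"}}]},
    {
        "id": "request_breakdown",
        "choices": [{"delta": {"content": None, "role": "assistant"},
                     "request_breakdown": {"prompt_tokens": 6,
                                           "completion_tokens": 9,
                                           "cost": 4.8e-5}}],
    },
]

breakdown = None
for chunk in chunks:
    # The breakdown chunk is flagged by its id, not its position.
    if chunk.get("id") == "request_breakdown":
        breakdown = chunk["choices"][0]["request_breakdown"]

print(breakdown["cost"])
```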
models
array

Specify the list of models for the router to choose between. If not specified, all models will be used. See the list of models here

If only one model is specified, it will be treated as if model parameter is used and the router will not trigger.

fallback_models
array

Specify the list of backup models (ranked by priority) to respond in case of a failure in the primary model. See the list of models here
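A sketch of a routed request combining models and fallback_models; the model names are illustrative, not an endorsement of any particular list:

```python
payload = {
    "messages": [{"role": "user", "content": "Hello?"}],
    # Let the router choose among these (illustrative names):
    "models": ["gpt-3.5-turbo", "claude-2"],
    # Tried in priority order if the chosen model fails:
    "fallback_models": ["gpt-4"],
}
```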

customer_credentials
object

You can pass in a dictionary of your customer’s credentials and deployment variables for supported providers and use their credits when the router is calling models from those providers.

{
  "openai": {
    "api_key": "sk-DEacrWTDndFYhdcLYKF6T3BlbkdfghuyTj4sYL2v1EDhg3iz5",
    "some_vars": "some_values"
  },
  "provider_id": {
    "some_provider_var_names": "some_provider_var_values"
  }
}
customer_identifier
string

Use this as a tag to identify the user associated with the API call.

metadata
dict

You can add any key-value pair to this metadata field for your reference; contact team@keywordsai.co if you need extra parameter support for your use case.

Example:

{
  "my_value_key": "my_value"
}
disable_log
boolean

When set to true, only the request and the performance metrics will be recorded; input and output messages will be omitted from the log.

exclude_models
array
default:
[]

The list of models to exclude from the router’s selection. See the list of models here

This only excludes models from the router's selection; the model parameter takes precedence over this parameter, and fallback_models and the safety net will still use the excluded models to catch failures.

exclude_providers
array
default:
[]

The list of providers to exclude from the router’s selection. All models under the provider will be excluded. See the list of providers here

This only excludes models from the router's selection; the model parameter takes precedence over this parameter, and fallback_models and the safety net will still use the excluded models to catch failures.

Deprecated Parameters

customer_api_keys
object

You can pass in a dictionary of your customer’s API keys for specific models. If the router selects a model that is in the dictionary, it will attempt to use the customer’s API key for calling the model before using your integration API key or Keywords AI’s default API key.

{
  "gpt-3.5-turbo": "your_customer_api_key",
  "gpt-4": "your_customer_api_key"
}