LLM Proxy
OpenAI-compatible parameters
To use Keywords AI parameters, pass them in the `extra_body` parameter if you're using the OpenAI SDK.
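A minimal sketch with the OpenAI Python SDK; the base URL and the `customer_identifier` value shown here are illustrative, so use the endpoint and API key from your Keywords AI dashboard:

```python
from openai import OpenAI

# Point the OpenAI SDK at the Keywords AI proxy.
client = OpenAI(
    base_url="https://api.keywordsai.co/api/",  # assumed proxy endpoint
    api_key="YOUR_KEYWORDSAI_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    # Keywords AI-specific parameters go in extra_body:
    extra_body={"customer_identifier": "user_123"},
)
print(response.choices[0].message.content)
```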
List of messages to send to the endpoint in the OpenAI style, each of them following this format:
Properties
The type of response format. Options: `json_object` or `text`.
Image processing: If you want to use the image processing feature, you need to use the following format to upload the image.
Example
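A minimal sketch of the OpenAI-style image message format, which the proxy accepts for vision-capable models (the URL is a placeholder):

```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.png"},
            },
        ],
    }
]
```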
Specify which model to use. See the list of models here. Specifying a model takes precedence over the `loadbalance_models` parameter.
Whether to stream back partial progress token by token.
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide an array of functions the model may generate JSON inputs for.
Example
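A minimal sketch of the OpenAI-style `tools` array with a single function (the function name and schema are illustrative):

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]
```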
Controls which (if any) tool is called by the model. `none` means the model will not call any tool and instead generates a message. `auto` means the model can pick between generating a message or calling one or more tools. `required` means the model must call one or more tools. `none` is the default when no tools are present; `auto` is the default if tools are present. Specifying a particular tool via the code below forces the model to call that tool.
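Following the OpenAI convention, forcing a specific tool looks like this (the function name is illustrative):

```python
tool_choice = {
    "type": "function",
    "function": {"name": "get_current_weather"},
}
```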
Specify how much to penalize new tokens based on their existing frequency in the text so far. Decreases the model’s likelihood of repeating the same line verbatim
Maximum number of tokens to generate in the response
Controls randomness in the output in the range of 0-2; a higher temperature produces a more random response.
How many chat completion choices are generated for each input message.
Caveat! While this can help improve generation quality by picking the optimal choice, this could also lead to more token usage.
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`.
Echo back the prompt in addition to the completion
Stop sequence
Specify how much to penalize new tokens based on whether they appear in the text so far. Increases the model’s likelihood of talking about new topics
Used to modify the probability of tokens appearing in the response
An object specifying the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106. Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Your prompt must contain the word "json" to use this feature.
Properties
The type of response format. Options: `json_object` or `text`.
Vertex AI example
If you are using Vertex AI and want to use JSON mode, specify a `response_schema` in the `response_format` parameter. Check the details of response schema here.
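A hedged sketch, assuming `response_schema` takes a JSON-schema-style object; check the linked response-schema details for the exact supported fields:

```python
response_format = {
    "type": "json_object",
    # The schema below is illustrative, not a definitive shape.
    "response_schema": {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
}
```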
Whether to enable parallel function calling during tool use.
Keywords AI parameters
See how to make a standard Keywords AI API call in the Quick Start guide.
Generation parameters
Balance the load of your requests between different models. See the details of load balancing here.
This parameter should not be used together with the `model` parameter.
Example
Example code with adding credentials
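A hedged sketch of a `load_balance_group`, assuming weight-based model entries and an optional per-model `credentials` object; the group ID, weights, and credential field names are illustrative:

```python
extra_body = {
    "load_balance_group": {
        "group_id": "my_balance_group",  # illustrative group ID
        "models": [
            {"model": "gpt-4o", "weight": 1},
            {
                "model": "azure/gpt-4o",
                "weight": 1,
                # Optional per-model credentials (field names assumed):
                "credentials": {
                    "api_key": "YOUR_AZURE_API_KEY",
                    "api_base": "YOUR_AZURE_ENDPOINT",
                },
            },
        ],
    }
}
```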
The `models` field will overwrite the `load_balance_group` you specified in the UI.
Specify the list of backup models (ranked by priority) to respond in case of a failure in the primary model. See the details of fallback models here.
Example
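A minimal sketch; the model names are illustrative, and the list is ranked by priority:

```python
extra_body = {
    "fallback_models": ["gpt-4o", "claude-3-5-sonnet-20240620"]
}
```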
You can pass in your customer's credentials for supported providers and use their credits when our proxy is calling models from those providers. See details here.
Example
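A hedged sketch, assuming credentials are keyed by provider name; the provider key and field names are assumptions:

```python
extra_body = {
    "customer_credentials": {
        "openai": {"api_key": "YOUR_CUSTOMER_OPENAI_KEY"}
    }
}
```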
One-off credential overrides. Instead of using what is uploaded for each provider, this targets credentials for individual models.
Go to the provider page to see how to add your own credentials and override them for a specific model.
Example
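A hedged sketch, assuming the override is keyed by model name; the model key and credential fields are assumptions:

```python
extra_body = {
    "credential_override": {
        "azure/gpt-4o": {  # model name as the key (assumed)
            "api_key": "YOUR_AZURE_API_KEY",
            "api_base": "YOUR_AZURE_ENDPOINT",
        }
    }
}
```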
Enable or disable caches. Check the details of caches here. A combined example covering all three cache parameters follows the cache options below.
This parameter specifies the time-to-live (TTL) for the cache in seconds.
This parameter specifies the cache options. Currently we support the `cache_by_customer` option, which can be set to `true` or `false`. If `cache_by_customer` is set to `true`, the cache is stored per customer identifier.
Example
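A minimal sketch covering all three cache parameters; the values are illustrative:

```python
extra_body = {
    "cache_enabled": True,  # turn caching on
    "cache_ttl": 3600,      # cache time-to-live in seconds
    "cache_options": {"cache_by_customer": True},
}
```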
The prompt template to use for the completion. You can build and deploy prompts on the Prompts page.
Properties
The ID of the prompt to use. You can find this on the Prompts page.
The variables to replace in the prompt template.
With echo on, the response body will have an extra field. This is an optional parameter.
Turn on override to use the params in `override_params` instead of the params in the prompt.
You can put any OpenAI chat/completions parameters here to override the prompt's parameters. This will only work if `override` is set to `true`.
Example
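A hedged sketch of the `prompt` object, assuming the ID field is named `prompt_id`; the variable names and values are illustrative:

```python
extra_body = {
    "prompt": {
        "prompt_id": "YOUR_PROMPT_ID",       # from the Prompts page
        "variables": {"task": "summarize"},  # fills template variables
        "echo": True,                        # return the rendered prompt
        "override": True,
        "override_params": {"temperature": 0.2},
    }
}
```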
Enable or disable retries and set the number of retries and the time to wait before retrying. Check the details of retries here.
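A hedged sketch; the `retry_params` name and its fields are assumptions, so check the linked retry details for the exact schema:

```python
extra_body = {
    "retry_params": {          # parameter name assumed
        "retry_enabled": True,
        "num_retries": 3,      # field names assumed
        "retry_after": 0.2,    # seconds to wait before retrying
    }
}
```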
When set to true, only the request and performance metrics will be recorded; input and output messages will be omitted from the log.
Example
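A minimal sketch (the `disable_log` name is an assumption):

```python
extra_body = {"disable_log": True}
```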
Specify the list of models for the Keywords AI LLM router to choose between. If not specified, all models will be used. See the list of models here.
If only one model is specified, it will be treated as if the `model` parameter was used and the router will not trigger.
When the `model` parameter is used, the router will not trigger, and this parameter behaves as `fallback_models`.
The list of providers to exclude from the LLM router's selection. All models under the provider will be excluded. See the list of providers here.
This only excludes providers in the LLM router. The `model` parameter takes precedence over this parameter, and `fallback_models` and the safety net will still use the excluded models to catch failures.
The list of models to exclude from the LLM router's selection. See the list of models here.
This only excludes models in the LLM router. The `model` parameter takes precedence over this parameter, and `fallback_models` and the safety net will still use the excluded models to catch failures.
Observability parameters
You can add any key-value pair to this metadata field for your reference. Check the details of metadata here.
Contact team@keywordsai.co if you need extra parameter support for your use case.
Example
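A minimal sketch; the keys and values are arbitrary and purely for your own reference:

```python
extra_body = {
    "metadata": {
        "session_id": "session_123",
        "environment": "production",
    }
}
```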
You can use this parameter to send an extra custom tag with your request. This helps you identify LLM logs faster than the `metadata` parameter, because it is indexed. You can see it in Logs in the Custom ID field.
Example
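A minimal sketch; the `custom_identifier` name is inferred from the Custom ID field described above and may differ:

```python
extra_body = {"custom_identifier": "trace_001"}
```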
Use this as a tag to identify the user associated with the API call. See the details of customer identifier here.
Example
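A minimal sketch; the value is any string that identifies your user:

```python
extra_body = {"customer_identifier": "user_123"}
```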
Pass the customer’s parameters in the API call to monitor the user’s data in the Keywords AI platform. See how to get insights into your users’ data here
Properties
The unique identifier for the customer. It can be any string.
Group identifier. Use group identifier to group logs together.
The name of the customer. It can be any string.
The email of the customer. It should be a valid email.
The start date of the period. It should be in the format YYYY-MM-DD.
The end date of the period. It should be in the format YYYY-MM-DD.
Choices are `yearly`, `monthly`, `weekly`, and `daily`.
The budget for the period. It should be a float.
The markup percentage for the period. Usage reported for your customers through this key will be increased by this percentage.
The total budget for a user.
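Example
A hedged sketch of `customer_params`; the field names below are assumptions inferred from the properties above, so verify them against the linked user-data guide:

```python
extra_body = {
    "customer_params": {
        "customer_identifier": "user_123",
        "name": "Alice Example",
        "email": "alice@example.com",
        # Budget-related field names below are assumed:
        "period_start": "2024-01-01",
        "period_end": "2024-12-31",
        "budget_duration": "monthly",
        "period_budget": 10.0,      # float budget for the period
        "markup_percentage": 20.0,
        "total_budget": 100.0,
    }
}
```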
Adding this returns the summarization of the response in the response body. If streaming is on, the metrics will be streamed as the last chunk.
Properties
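A minimal sketch (the `request_breakdown` name is an assumption):

```python
extra_body = {"request_breakdown": True}
```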
Evals parameters
Whether the user liked the output. `true` means the user liked the output.
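A minimal sketch (the `positive_feedback` name is an assumption):

```python
extra_body = {"positive_feedback": True}
```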
Deprecated parameters
You can pass in a dictionary of your customer’s API keys for specific models. If the router selects a model that is in the dictionary, it will attempt to use the customer’s API key for calling the model before using your integration API key or Keywords AI’s default API key.
Balance the load of your requests between different models. See the details of load balancing here.
This parameter should not be used together with the `model` parameter.