Async logging (recommended)
The Async logging endpoint allows you to directly log an LLM inference to Keywords AI, instead of using Keywords AI as a proxy with the chat completion endpoint.
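For orientation, here is a minimal sketch of an async log call in Python. The endpoint path, header names, and exact payload keys are assumptions for illustration only; check the API reference for the authoritative schema.

```python
# Minimal sketch of an async log request. The endpoint path and field
# names are assumptions for illustration, not the authoritative schema.
import requests

url = "https://api.keywordsai.co/api/request-logs/create/"  # assumed endpoint path
headers = {
    "Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY",  # placeholder key
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-4o-mini",
    "prompt_messages": [{"role": "user", "content": "What is async logging?"}],
    "completion_message": {
        "role": "assistant",
        "content": "Async logging records an LLM call after it has completed.",
    },
    "prompt_tokens": 12,
    "completion_tokens": 18,
    "cost": 0.000045,        # US dollars
    "generation_time": 1.2,  # seconds
    "ttft": 0.3,             # seconds
}

response = requests.post(url, headers=headers, json=payload)
print(response.status_code)
```

The parameters below describe the fields such a payload can carry.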
Model used for the LLM inference. Default is an empty string. See the list of models here.
An array of prompt messages. Default is an empty list.
Completion message in JSON format. Default is an empty dictionary.
Cost of the inference in US dollars.
Number of tokens in the completion.
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
Parameters related to the customer. Default is an empty dictionary.
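As a rough sketch, a customer params object might look like the following; the key names are assumptions for illustration, not the authoritative schema.

```python
# Hypothetical customer params object; key names are assumptions.
customer_params = {
    "customer_identifier": "user_123",  # your own ID for the end user
    "name": "Jane Doe",
    "email": "jane@example.com",
}
```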
Error message if the LLM inference failed. Default is an empty string.
The full request object. Default is an empty dictionary. This is optional and is helpful for logging configurations such as temperature, presence_penalty, etc. completion_messages and tool_calls will be automatically extracted from full_request (see the sketch below).
Specify how much to penalize new tokens based on their existing frequency in the text so far. Decreases the model's likelihood of repeating the same line verbatim.
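A sketch of the full_request field described above, with field names assumed for illustration:

```python
# Hypothetical full_request object: the raw request that produced the
# completion, so settings like temperature and the penalty parameters are
# preserved. completion_messages and tool_calls would be extracted from it.
full_request = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is async logging?"}],
    "temperature": 0.7,
    "presence_penalty": 0.2,
    "frequency_penalty": 0.1,
}
payload = {"full_request": full_request}  # merged with the other log fields
```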
Total generation time. Generation time = TTFT (Time To First Token) + TPOT (Time Per Output Token) * #tokens. Do not confuse this with ttft.
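A worked example of the formula, with illustrative timings:

```python
# Generation time = TTFT + TPOT * number of output tokens (illustrative values).
ttft = 0.25              # seconds until the first token
tpot = 0.02              # seconds per output token
completion_tokens = 50
generation_time = ttft + tpot * completion_tokens  # 0.25 + 1.0 = 1.25 seconds
```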
Use this parameter to control the behavior of the Keywords AI API. Default is an empty dictionary.
You can add any key-value pair to this metadata field for your reference.
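For example, a metadata object could carry arbitrary keys for later filtering; the keys below are purely illustrative.

```python
# Arbitrary key-value metadata attached to the log; keys are illustrative.
metadata = {
    "session_id": "abc-123",
    "environment": "staging",
    "feature": "summarization",
}
```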
Specify how much to penalize new tokens based on whether they appear in the text so far. Increases the model’s likelihood of talking about new topics.
Number of tokens in the prompt.
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
The format of the response.
Whether the LLM inference was streamed. Default is false.
The status code of the LLM inference. Default is 200 (OK). See supported status codes here.
Stop sequence.
Controls randomness in the output in the range of 0-2; a higher temperature produces a more random response.
A list of tools the model may call. Currently, only functions are supported as a tool.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
Time to first token. The time it takes for the model to generate the first token after receiving a request.
Any warnings that occurred during the LLM inference. You could pass a warning message here. Default is an empty string.