Async logging (recommended)
The Async logging endpoint allows you to directly log an LLM inference to Keywords AI, instead of using Keywords AI as a proxy with the chat completion endpoint.
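A minimal sketch of what such a logging call can look like, using only the Python standard library. The endpoint URL, header names, and field values below are assumptions for illustration — check your Keywords AI dashboard and the official docs for the exact values; this is not the official client.

```python
import json
import os
import urllib.request

# Assumed endpoint URL and auth scheme -- verify against the official docs.
KEYWORDSAI_LOG_URL = "https://api.keywordsai.co/api/request-logs/create"

def build_log_payload(model, prompt_messages, completion_message, **extra):
    """Assemble an async-logging payload from the fields described below."""
    payload = {
        "model": model,
        "prompt_messages": prompt_messages,
        "completion_message": completion_message,
    }
    payload.update(extra)  # e.g. cost, ttft, generation_time, metadata
    return payload

def send_log(payload, api_key):
    """POST the finished log entry to Keywords AI."""
    req = urllib.request.Request(
        KEYWORDSAI_LOG_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)

payload = build_log_payload(
    model="gpt-4o-mini",  # illustrative model name
    prompt_messages=[{"role": "user", "content": "Hi"}],
    completion_message={"role": "assistant", "content": "Hi, how can I assist you today?"},
    cost=0.00042,  # illustrative cost in USD
)
# send_log(payload, os.environ["KEYWORDSAI_API_KEY"])  # network call, left commented out
```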
Model used for the LLM inference. Default is an empty string. See the list of models here.
An array of prompt messages. Default is an empty list.
"prompt_messages": [
{
"role": "user",
"content": "Hi"
},
# optional tool message
{
"role": "tool",
"tool_call_id": "your tool call id",
"content": "...." # tool result content
}
],
Completion message in JSON format. Default is an empty dictionary.
"completion_message": {
"role": "assistant",
"content": "Hi, how can I assist you today?"
},
Cost of the inference in US dollars.
Number of tokens in the completion.
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
An identifier for the customer that invoked this LLM inference; this helps with visualizing user activities. Default is an empty string. See the details of the customer identifier here.
Error message if the LLM inference failed. Default is an empty string.
The full request object. Default is an empty dictionary. This is optional, and it is helpful for logging configurations such as temperature, presence_penalty, etc. completion_message and tool_calls will be automatically extracted from full_request.
"full_request": {
"temperature": 0.5,
"top_p": 0.5,
#... other parameters
},
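Nesting the raw request under full_request keeps sampling configuration attached to the log entry. A small sketch (field names follow the parameter descriptions above; values are illustrative):

```python
# Sketch: attaching the raw request so config such as temperature is logged
# alongside the messages. Values are illustrative.
log_entry = {
    "model": "gpt-4o-mini",
    "prompt_messages": [{"role": "user", "content": "Hi"}],
    "completion_message": {"role": "assistant", "content": "Hello!"},
    "full_request": {
        "temperature": 0.5,
        "top_p": 0.5,
        "presence_penalty": 0.0,
    },
}
```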
Total generation time. Generation time = TTFT (Time To First Token) + TPOT (Time Per Output Token) * number of output tokens. Do not confuse this with ttft.
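The formula above can be checked with a quick computation; the numbers here are illustrative, not measured values.

```python
def generation_time(ttft: float, tpot: float, n_output_tokens: int) -> float:
    """generation_time = TTFT + TPOT * number of output tokens (seconds)."""
    return ttft + tpot * n_output_tokens

# e.g. 0.5 s to first token, 0.02 s per subsequent token, 100 output tokens:
total = generation_time(ttft=0.5, tpot=0.02, n_output_tokens=100)
# total == 0.5 + 0.02 * 100 == 2.5 seconds
```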
You can add any key-value pair to this metadata field for your reference.
Number of tokens in the prompt.
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
The format of the response.
Whether the LLM inference was streamed. Default is false.
The status code of the LLM inference. Default is 200 (ok). See supported status codes here.
A list of tools the model may call. Currently, only functions are supported as a tool.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message.
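tool_choice typically accepts either a string or an object naming a specific function. The object form below assumes the common OpenAI-style convention, and the function name is hypothetical:

```json
# let the model generate a message without calling tools:
"tool_choice": "none"

# or force a specific (hypothetical) function:
"tool_choice": {
    "type": "function",
    "function": {"name": "get_weather"}
}
```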
Time to first token. The time it takes for the model to generate the first token after receiving a request.
Any warnings that occurred during the LLM inference. You could pass a warning message here. Default is an empty string.
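For a failed inference, the error-related fields above combine into a log entry like this sketch. Field names follow the descriptions above; the values (including the status code) are illustrative.

```python
# Sketch of logging a failed inference; values are illustrative.
failed_log = {
    "model": "gpt-4o-mini",
    "prompt_messages": [{"role": "user", "content": "Hi"}],
    "completion_message": {},  # empty: no completion was produced
    "error_message": "Rate limit exceeded",
    "status_code": 429,
    "warnings": "Retried twice before giving up",
}
```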