POST /api/request-logs/create

The async logging endpoint lets you log an LLM inference directly to Keywords AI, instead of using Keywords AI as a proxy through the chat completion endpoint.

model
string
required

Model used for the LLM inference. Default is an empty string. See the list of models here.

prompt_messages
array
required

An array of prompt messages. Default is an empty list.

"prompt_messages": [
  {
    "role": "user",
    "content": "Hi"
  },
  # optional function call
  {
    "role": "tool",
    "tool_call_id": "your tool call id",
    "content": "...." # tool call content
  }
],
completion_message
dict
required

Completion message in JSON format. Default is an empty dictionary.

"completion_message": {
    "role": "assistant",
    "content": "Hi, how can I assist you today?"
},
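
The three required fields above can be combined into a single request body. The following is a minimal sketch, not an official client: the base URL `https://api.keywordsai.co`, the Bearer-token `Authorization` header, the model name `gpt-4o`, and the helper name `build_log_request` are all assumptions that do not come from this page. Only the path `/api/request-logs/create` and the field names are taken from it.

```python
import json
import urllib.request

# Assumed base URL; only the path /api/request-logs/create comes from this page.
BASE_URL = "https://api.keywordsai.co"


def build_log_request(api_key, model, prompt_messages, completion_message):
    """Build (but do not send) a POST request carrying the three required fields."""
    payload = {
        "model": model,
        "prompt_messages": prompt_messages,
        "completion_message": completion_message,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/request-logs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: Bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_log_request(
    api_key="YOUR_API_KEY",
    model="gpt-4o",  # placeholder model name
    prompt_messages=[{"role": "user", "content": "Hi"}],
    completion_message={
        "role": "assistant",
        "content": "Hi, how can I assist you today?",
    },
)
# To actually send the log: urllib.request.urlopen(request)
```

The request is only constructed here, so the sketch can be inspected without network access or a valid key.
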
cost
float
default:
0

Cost of the inference in US dollars.

completion_tokens
integer

Number of tokens in the completion.

completion_unit_price
number

Pass this parameter if you want to log a self-hosted or fine-tuned model.

customer_params
string

Parameters related to the customer. Default is an empty dictionary.

error_message
text

Error message if the LLM inference failed. Default is an empty string.

full_request
object

The full request object. Default is an empty dictionary. This field is optional and is helpful for logging configurations such as temperature, presence_penalty, etc.

completion_messages and tool_calls are automatically extracted from full_request.

{
  "full_request": {
    "temperature": 0.5,
    "top_p": 0.5,
    // ... other parameters
  }
}

frequency_penalty
number

Specifies how much to penalize new tokens based on their existing frequency in the text so far. Decreases the model’s likelihood of repeating the same line verbatim.

generation_time
float
default:
0

Total generation time. Generation time = TTFT (Time To First Token) + TPOT (Time Per Output Token) * number of output tokens. Do not confuse this with ttft.
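
As a worked example of the generation time formula, the sketch below computes the value from its three inputs. The function name `generation_time` and the use of seconds as the unit are assumptions; this page does not state the unit.

```python
def generation_time(ttft, tpot, num_tokens):
    """Generation time = TTFT + TPOT * number of output tokens."""
    return ttft + tpot * num_tokens


# 0.5 s to the first token, then 0.02 s per token for 100 tokens.
total = generation_time(ttft=0.5, tpot=0.02, num_tokens=100)
print(round(total, 3))
```

With these inputs the total comes to roughly 2.5, of which only 0.5 is ttft. That gap is why the two fields should not be confused.
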

keywordsai_api_controls
object

Use this parameter to control the behavior of the Keywords AI API. Default is an empty dictionary.

metadata
dict

You can add any key-value pair to this metadata field for your reference.

presence_penalty
number

Specifies how much to penalize new tokens based on whether they appear in the text so far. Increases the model’s likelihood of talking about new topics.

prompt_tokens
integer

Number of tokens in the prompt.

prompt_unit_price
number

Pass this parameter if you want to log a self-hosted or fine-tuned model.

response_format
object

The format of the response.

stream
boolean

Whether the LLM inference was streamed. Default is false.

status_code
integer
default:
200

The status code of the LLM inference. Default is 200 (ok). See supported status codes here.
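
A failed inference can be logged by combining status_code with error_message. The sketch below only builds the request body. The helper name `build_failed_log`, the example code 429, and the choice to send an empty completion_message (its documented default) when nothing was generated are assumptions; check the supported status codes before relying on a specific value.

```python
import json


def build_failed_log(model, prompt_messages, status_code, error_message):
    """Build a log body for an inference that failed before producing output."""
    return {
        "model": model,
        "prompt_messages": prompt_messages,
        # Assumption: the documented default (empty dict) is sent when
        # the model produced no completion.
        "completion_message": {},
        "status_code": status_code,
        "error_message": error_message,
    }


failed_log = build_failed_log(
    model="gpt-4o",  # placeholder model name
    prompt_messages=[{"role": "user", "content": "Hi"}],
    status_code=429,  # example value only
    error_message="Rate limit exceeded",
)
print(json.dumps(failed_log, indent=2))
```
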

stop
array[string]

Stop sequences at which the model stops generating further tokens.

temperature
number
default:
1

Controls randomness in the output, in the range 0 to 2. A higher temperature produces a more random response.

tools
array

A list of tools the model may call. Currently, only functions are supported as a tool.

tool_choice
object

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message.

top_p
number
default:
1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
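
To illustrate what "top_p probability mass" means, here is a toy sketch of nucleus filtering over a hand-written distribution. It is not how any particular provider implements sampling; the function name `nucleus` and the example probabilities are made up for illustration.

```python
def nucleus(probs, top_p):
    """Return the highest-probability tokens whose cumulative mass reaches top_p."""
    kept = []
    cumulative = 0.0
    # Walk tokens from most to least likely, stopping once top_p is covered.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


probs = {"the": 0.5, "a": 0.25, "cat": 0.125, "dog": 0.125}
print(nucleus(probs, 0.1))   # only the single most likely token survives
print(nucleus(probs, 0.75))  # the two most likely tokens cover 75% of the mass
print(nucleus(probs, 1.0))   # every token is kept
```

A real sampler would then renormalize the kept probabilities and draw one token from them. A low top_p narrows that draw much as a low temperature does, which is why adjusting both at once is discouraged.
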

ttft
float
default:
0

Time to first token. The time it takes for the model to generate the first token after receiving a request.
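
If you consume a streamed response yourself, ttft and generation_time can be measured around the stream. The sketch below uses a hypothetical `fake_stream` generator as a stand-in for a real streaming client, and reports times in seconds, which is an assumption about the expected unit.

```python
import time


def fake_stream():
    """Hypothetical stand-in for a streaming LLM response."""
    for chunk in ["Hi", ",", " how", " can", " I", " help", "?"]:
        time.sleep(0.01)  # simulate latency between chunks
        yield chunk


def timed_consume(stream):
    """Consume a stream and record time to first chunk and total time."""
    start = time.perf_counter()
    ttft = 0.0
    chunks = []
    for chunk in stream:
        if not chunks:
            # First chunk has just arrived: this is the time to first token.
            ttft = time.perf_counter() - start
        chunks.append(chunk)
    total = time.perf_counter() - start
    return {"ttft": ttft, "generation_time": total, "content": "".join(chunks)}


timing = timed_consume(fake_stream())
print(timing["content"])
```

The returned `ttft` and `generation_time` values could then be passed as the fields of the same names in the log body, with stream set to true.
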

usage
object

Usage details for the LLM inference. Currently, only prompt caching is supported.

warnings
string

Any warnings that occurred during the LLM inference. You could pass a warning message here. Default is an empty string.