POST /api/request-logs/create/

The async logging endpoint lets you log an LLM inference directly to Keywords AI, instead of using Keywords AI as a proxy via the chat completion endpoint.
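
Putting the required fields together, a minimal request body might look like the following (the model name, token counts, and cost are illustrative; see the field reference below for details):

{
  "model": "gpt-4o",
  "prompt_messages": [
    {
      "role": "user",
      "content": "Hi"
    }
  ],
  "completion_message": {
    "role": "assistant",
    "content": "Hi, how can I assist you today?"
  },
  "prompt_tokens": 5,
  "completion_tokens": 9,
  "cost": 0.00012
}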

model
string
required

Model used for the LLM inference. Default is an empty string. See the list of supported models here.

prompt_messages
array
required

An array of prompt messages. Default is an empty list.

"prompt_messages": [
  {
    "role": "user",
    "content": "Hi"
  },
  # optional tool response message
  {
    "role": "tool",
    "tool_call_id": "your tool call id",
    "content": "...." # tool call content
  }
],
completion_message
dict
required

Completion message in JSON format. Default is an empty dictionary.

"completion_message": {
    "role": "assistant",
    "content": "Hi, how can I assist you today?"
},
cost
float
default: 0

Cost of the inference in US dollars.

completion_tokens
integer

Number of tokens in the completion.

completion_unit_price
number

Pass this parameter in if you want to log your self-hosted / fine-tuned model.

customer_identifier
string

An identifier for the customer that invoked this LLM inference; it helps with visualizing user activities. Default is an empty string. See the details of customer identifier here.
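
For example, an application might pass its own end-user ID (the value here is hypothetical):

"customer_identifier": "user_123",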

error_message
text

Error message if the LLM inference failed. Default is an empty string.

full_request
object

The full request object. Default is an empty dictionary. This is optional and is helpful for logging configuration such as temperature, presence_penalty, etc.

completion_messages and tool_calls will be automatically extracted from full_request.

"full_request": {
"temperature": 0.5,
"top_p": 0.5,
#... other parameters
},

generation_time
float
default: 0

Total generation time. Generation time = TTFT (Time To First Token) + TPOT (Time Per Output Token) × number of output tokens. Do not confuse this with ttft.
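
As a worked example of the formula above (illustrative numbers, assuming times are reported in seconds):

# ttft = 0.4, TPOT = 0.02 per output token, 120 output tokens
# generation_time = 0.4 + 0.02 * 120 = 2.8
"ttft": 0.4,
"generation_time": 2.8,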

metadata
dict

You can add any key-value pair to this metadata field for your reference.
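
For example, you might attach your own tracking fields (these keys are purely illustrative):

"metadata": {
    "session_id": "sess_42",
    "environment": "staging"
},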

prompt_tokens
integer

Number of tokens in the prompt.

prompt_unit_price
number

Pass this parameter in if you want to log your self-hosted / fine-tuned model.
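
A sketch of how the two unit-price fields could be logged together for a self-hosted model, assuming the prices are expressed in USD per token (confirm the expected unit before relying on this):

"prompt_unit_price": 0.000001,      # assumed unit: USD per prompt token
"completion_unit_price": 0.000002,  # assumed unit: USD per completion token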

response_format
object

The format of the response.
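
If this field mirrors the OpenAI-style response_format object (an assumption; the exact schema is not specified here), logging a JSON-mode request would look like:

"response_format": { "type": "json_object" },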

stream
boolean

Whether the LLM inference was streamed. Default is false.

status_code
integer
default: 200

The status code of the LLM inference. Default is 200 (OK). See supported status codes here.

tools
array

A list of tools the model may call. Currently, only functions are supported as a tool.
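
Assuming the same function-tool schema used in OpenAI-style chat completion requests (the exact schema is not shown here, and get_weather is a hypothetical function), a single tool could be logged as:

"tools": [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": { "type": "string" }
                },
                "required": ["city"]
            }
        }
    }
],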

tool_choice
object

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message.
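
For example, to record that the model was directed to call a specific tool (again assuming the OpenAI-style shape; get_weather is the hypothetical function from the tools example above):

"tool_choice": {
    "type": "function",
    "function": { "name": "get_weather" }
},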

ttft
float
default: 0

Time to first token. The time it takes for the model to generate the first token after receiving a request.

warnings
string

Any warnings that occurred during the LLM inference. You can pass a warning message here. Default is an empty string.