POST /api/request-logs/create/

The Async logging endpoint allows you to directly log an LLM inference to Keywords AI, instead of using Keywords AI as a proxy with the chat completion endpoint.
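
For reference, a minimal request body using only the required fields might look like the following (the model name and message contents are illustrative):

{
  "model": "gpt-4o",
  "prompt_messages": [
    {
      "role": "user",
      "content": "Hi"
    }
  ],
  "completion_message": {
    "role": "assistant",
    "content": "Hi, how can I assist you today?"
  }
}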

model
string
required

Model used for the LLM inference. Default is an empty string. See the list of models here.

prompt_messages
array
required

An array of prompt messages. Default is an empty list.

"prompt_messages": [
  {
    "role": "user",
    "content": "Hi"
  },
  // optional: tool call result message
  {
    "role": "tool",
    "tool_call_id": "your tool call id",
    "content": "...." # tool call content
  }
],
completion_message
dict
required

Completion message in JSON format. Default is an empty dictionary.

"completion_message": {
    "role": "assistant",
    "content": "Hi, how can I assist you today?"
},
tool_choice
object

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message.
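
For example, to log that a specific function was requested, you could pass an OpenAI-style tool_choice object (the function name is a hypothetical example):

"tool_choice": {
  "type": "function",
  "function": {
    "name": "get_current_weather"
  }
},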

tools
array

A list of tools the model may call. Currently, only functions are supported as a tool.
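
A tools entry can follow the OpenAI function-tool shape; the function name, description, and parameters below are illustrative:

"tools": [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": { "type": "string" }
        },
        "required": ["city"]
      }
    }
  }
],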

stream
boolean

Whether the LLM inference was streamed. Default is false.

customer_identifier
string

An identifier for the customer that invoked this LLM inference; it helps with visualizing user activities. Default is an empty string. See the details of the customer identifier here.
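
For example (the value is illustrative; use whatever ID identifies the user in your own system):

"customer_identifier": "user_12345",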

metadata
dict

You can add any key-value pair to this metadata field for your reference.
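
For example, you might attach a deployment environment and your own session ID (the keys and values are illustrative):

"metadata": {
  "environment": "production",
  "session_id": "session_abc123"
},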

prompt_tokens
integer

Number of tokens in the prompt.

completion_tokens
integer

Number of tokens in the completion.

prompt_unit_price
number

Pass this parameter in if you want to log your self-hosted or fine-tuned model.

completion_unit_price
number

Pass this parameter in if you want to log your self-hosted or fine-tuned model.

cost
float
default: 0

Cost of the inference in US dollars.
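
Taken together, the usage and cost fields for one inference might be logged like this (the numbers are illustrative):

"prompt_tokens": 12,
"completion_tokens": 135,
"cost": 0.00042,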

generation_time
float
default: 0

Total generation time. Generation time = TTFT (Time To First Token) + TPOT (Time Per Output Token) × number of output tokens. Do not confuse this with ttft.
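
For example, assuming times are measured in seconds, a request with a TTFT of 0.5 and a TPOT of 0.02 that produced 100 output tokens would have a generation_time of 0.5 + 0.02 × 100 = 2.5.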

ttft
float
default: 0

Time to first token. The time it takes for the model to generate the first token after receiving a request.

tokens_per_second
float
default: 0

The number of tokens generated per second.

error_message
text

Error message if the LLM inference failed. Default is an empty string.

status_code
integer
default: 200

The status code of the LLM inference. Default is 200 (OK). See supported status codes here.
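
For example, a failed inference could be logged with a non-200 status and an error message (the values are illustrative):

"status_code": 500,
"error_message": "Upstream model provider timed out",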

has_warnings
boolean

Whether the LLM inference has any warnings. Default is false.

warnings
string

Any warnings that occurred during the LLM inference. You could pass a warning message here. Default is an empty string.

is_test
boolean

Whether the LLM inference is a test call. If set to true, you will only be able to see this log in Test Mode in the platform. Default is false.