Logging API
The Async logging endpoint allows you to directly log an LLM inference to Keywords AI, instead of using Keywords AI as a proxy with the chat completion endpoint.
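A minimal sketch of an async logging call, assuming the endpoint lives at https://api.keywordsai.co/api/request-logs/create/ and authenticates with a Bearer API key; the payload field names are illustrative mappings of the body parameters below, so confirm both against your Keywords AI dashboard before use.

```python
import requests

# Assumed endpoint URL and auth scheme -- verify in your Keywords AI dashboard.
url = "https://api.keywordsai.co/api/request-logs/create/"
headers = {
    "Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY",
    "Content-Type": "application/json",
}

# Illustrative field names based on the body parameters documented below.
payload = {
    "model": "gpt-4o-mini",
    "prompt_messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "completion_message": {"role": "assistant", "content": "Paris."},
    "prompt_tokens": 12,
    "completion_tokens": 3,
    "cost": 0.000045,        # in US dollars
    "generation_time": 0.8,  # seconds: TTFT + TPOT * #tokens
    "ttft": 0.2,             # seconds
}

response = requests.post(url, headers=headers, json=payload)
print(response.status_code, response.text)
```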
Body Parameters
Model used for the LLM inference. Default is an empty string. See the list of models here.
An array of prompt messages. Default is an empty list.
Completion message in JSON format. Default is an empty dictionary.
Cost of the inference in US dollars.
Number of tokens in the completion.
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
Parameters related to the customer. Default is an empty dictionary.
Properties
An identifier for the customer that invoked this LLM inference; it helps with visualizing user activities. Default is an empty string. See the details of the customer identifier here.
Name of the customer. Default is an empty string.
Email of the customer. Default is an empty string.
Example
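A sketch of the customer parameters object, assuming the properties above map to the keys customer_identifier, name, and email:

```python
# Illustrative values; customer_identifier ties the log to a specific end user.
customer_params = {
    "customer_identifier": "user_1234",
    "name": "Ada Lovelace",
    "email": "ada@example.com",
}
```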
Same functionality as metadata, but it's faster to query since it's indexed.
Error message if the LLM inference failed. Default is an empty string.
The full request object. Default is an empty dictionary. This is optional and is helpful for logging configurations such as temperature, presence_penalty, etc. completion_messages and tool_calls will be automatically extracted from full_request.
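A sketch of a full_request object, assuming it mirrors the original chat-completion request body (the inner keys are standard chat-completion fields, shown here for illustration):

```python
full_request = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "temperature": 0.7,
    "presence_penalty": 0.5,
    # completion_messages and tool_calls are extracted from this object automatically.
}
```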
Specify how much to penalize new tokens based on their existing frequency in the text so far. Decreases the model's likelihood of repeating the same line verbatim.
Total generation time. Generation time = TTFT (Time To First Token) + TPOT (Time Per Output Token) * #tokens. Do not confuse this with ttft. The unit of generation time is seconds.
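A worked instance of the formula above, with illustrative numbers:

```python
ttft = 0.2           # seconds until the first token
tpot = 0.01          # seconds per output token
output_tokens = 150
generation_time = ttft + tpot * output_tokens  # 0.2 + 1.5 = 1.7 seconds
```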
Group identifier. Use group identifier to group logs together.
Whether the prompt is a custom prompt. Default is False.
Use this parameter to control the behavior of the Keywords AI API. Default is an empty dictionary.
Properties
If false, the server will immediately return a status indicating whether the logging task was initialized successfully, without any log data.
Example
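A sketch of the controls object; the block key is an assumed name for the behavior described above, so check the full reference before relying on it:

```python
keywordsai_api_controls = {
    # Assumed property name: when False, the server responds as soon as the
    # logging task is queued, without returning the log data.
    "block": False,
}
```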
You can add any key-value pair to this metadata field for your reference.
Example
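A sketch of a metadata object with arbitrary key-value pairs:

```python
metadata = {
    "session_id": "sess_42",
    "environment": "staging",
    "experiment": "prompt_v2",
}
```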
Whether the user liked the output. True means the user liked the output.
Specify how much to penalize new tokens based on whether they appear in the text so far. Increases the model's likelihood of talking about new topics.
ID of the prompt. If you want to log a custom prompt ID, you need to pass is_custom_prompt as True. Otherwise, use the Prompt ID in Prompts.
Name of the prompt.
Number of tokens in the prompt.
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
Setting to { "type": "json_schema", "json_schema": {...} }
enables Structured Outputs which ensures the model will match your supplied JSON schema.
Setting to { "type": "json_object" }
enables the older JSON mode, which ensures the message the model generates is valid JSON. Using json_schema is preferred for models that support it.
Possible types
Default response format. Used to generate text responses.
Properties
The type of response format being defined. Always text.
JSON Schema response format. Used to generate structured JSON responses.
Properties
The type of response format being defined. Always json_schema.
Structured Outputs configuration options, including a JSON Schema.
Properties
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
A description of what the response format is for, used by the model to determine how to respond in the format.
Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true.
The schema for the response format, described as a JSON Schema object.
JSON object response format. An older method of generating JSON responses. Using json_schema is recommended for models that support it. Note that the model will not generate JSON without a system or user message instructing it to do so.
Properties
The type of response format being defined. Always json_object.
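A sketch of a json_schema response format assembled from the properties described above:

```python
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "capital_answer",
        "description": "A structured answer naming a country's capital city.",
        "strict": True,  # enforce exact schema adherence
        "schema": {
            "type": "object",
            "properties": {
                "country": {"type": "string"},
                "capital": {"type": "string"},
            },
            "required": ["country", "capital"],
            "additionalProperties": False,
        },
    },
}
```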
Whether the LLM inference was streamed. Default is false.
The status code of the LLM inference. Default is 200 (OK). See supported status codes here.
Supported status codes
We support all valid HTTP status codes: 200, 201, 204, 301, 304, 400, 401, 402, 403, 404, 405, 415, 422, 429, 500, 502, 503, 504, etc.
Stop sequence
Controls randomness in the output in the range of 0-2; a higher temperature produces a more random response.
A unique identifier for the thread.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
Time to first token. The time it takes for the model to generate the first token after receiving a request.
The unit of ttft is seconds.
Usage details for the LLM inference. Currently, only Prompt Caching is supported.
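A sketch of a usage object for prompt caching; the nested field names are assumptions modeled on the common cached-token convention, so confirm the exact shape in the full reference:

```python
usage = {
    # Assumed layout -- only prompt-caching details are shown.
    "prompt_tokens_details": {
        "cached_tokens": 1024,  # tokens served from the prompt cache
    },
}
```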
Any warnings that occurred during the LLM inference. You could pass a warning message here. Default is an empty string.