Parameters
These are the essential parameters needed for basic LLM request logging.
Core required fields
The model’s response message in JSON format.
Example
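Below is a minimal sketch of logging one request. The response message is passed in the same JSON shape that chat APIs return. The field names model, prompt_messages, and completion_message, as well as the endpoint URL and header, are assumptions for illustration; check the API reference for the exact names.

```python
# Minimal logging sketch; field names and endpoint are assumptions for illustration.
import requests

payload = {
    "model": "gpt-4o-mini",                                     # assumed field name
    "prompt_messages": [                                        # assumed field name
        {"role": "user", "content": "What is 2 + 2?"}
    ],
    # The model's response message in JSON format, as described above:
    "completion_message": {"role": "assistant", "content": "2 + 2 = 4."},
}

response = requests.post(
    "https://api.keywordsai.co/api/request-logs/create/",       # assumed endpoint
    headers={"Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY"},
    json=payload,
)
print(response.status_code)
```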
Telemetry
Performance metrics and cost tracking for monitoring LLM efficiency.
Number of tokens in the prompt.
Number of tokens in the completion.
Cost of the inference in US dollars.
Total generation time, in seconds. Generation time = TTFT (Time To First Token) + TPOT (Time Per Output Token) * number of output tokens. Do not confuse this with ttft.
Time to first token (ttft), in seconds. The time it takes for the model to generate the first token after receiving a request.
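As a quick sanity check on the formula above, here is a small worked example with made-up numbers showing how generation_time relates to ttft and TPOT:

```python
# Worked example of the generation time formula; the numbers are illustrative only.
ttft = 0.4               # seconds until the first token arrives
tpot = 0.02              # seconds per output token
completion_tokens = 80   # number of output tokens

generation_time = ttft + tpot * completion_tokens
print(generation_time)   # 0.4 + 0.02 * 80 = 2.0 seconds
```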
Metadata
Custom tracking and identification parameters for advanced analytics and filtering.
You can add any key-value pair to this metadata field for your reference.
Example
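A sketch of what the metadata field might look like. The keys are arbitrary; the ones shown here are made up for illustration.

```python
# Any key-value pairs you want to attach to the log; these keys are arbitrary examples.
metadata = {
    "environment": "staging",
    "feature": "chat-assistant",
    "session_id": "sess_123",
}
```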
Parameters related to the customer.
Properties
An identifier for the customer that invoked this LLM inference; it helps with visualizing user activities. See the details of the customer identifier here.
Name of the customer.
Email of the customer.
Example
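A sketch of the customer properties described above. The values are placeholders, and the field names (customer_identifier, name, email) are inferred from the descriptions rather than confirmed by this page.

```python
# Placeholder customer properties; field names are inferred from the descriptions above.
customer_params = {
    "customer_identifier": "user_456",
    "name": "Ada Lovelace",
    "email": "ada@example.com",
}
```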
Group identifier. Use it to group related logs together.
A unique identifier for the thread.
Same functionality as metadata, but it's faster to query since it's indexed.
Example
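For instance, an indexed identifier might be a single string you later filter logs by. The field name custom_identifier is an assumption used for illustration.

```python
# Indexed identifier for fast filtering; the field name is an assumption.
log_fields = {"custom_identifier": "checkout-flow-v2"}
```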
Advanced Parameters
Tool Calls and Function Calling
A list of tools the model may call. Currently, only functions are supported as a tool.
Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message.
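A sketch of what the tools and tool_choice fields could look like when logged, assuming they follow the OpenAI-style function-tool format (an assumption, not confirmed by this page; the function itself is made up).

```python
# OpenAI-style function tool definition; this shape is an assumption for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
tool_choice = "auto"  # or "none" to have the model generate a plain message
```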
Response Configuration
Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs, which ensures the model will match your supplied JSON schema.
Possible types
Default response format. Used to generate text responses.
Properties
The type of response format being defined. Always text.
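A sketch of a structured-output response format. The outer wrapper follows the { "type": "json_schema", "json_schema": {...} } shape shown above; the name and schema keys follow the OpenAI-style Structured Outputs convention, and the schema itself is made up.

```python
# Structured Outputs response format; the inner schema is illustrative only.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temperature_c": {"type": "number"},
            },
            "required": ["city", "temperature_c"],
        },
    },
}
```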
Model Configuration
Controls randomness in the output, in the range of 0-2; a higher temperature will result in more random responses.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature, but not both.
Penalizes new tokens based on their frequency in the text so far. Decreases the model’s likelihood of repeating the same line verbatim.
Penalizes new tokens based on whether they appear in the text so far. Increases the model’s likelihood of talking about new topics.
Stop sequences at which the API will stop generating further tokens.
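Putting the sampling parameters together, a configuration fragment might look like the sketch below. The values are arbitrary; only temperature, top_p, and presence_penalty are named explicitly on this page, and the remaining field names are inferred from the descriptions.

```python
# Illustrative sampling configuration; values are arbitrary.
model_config = {
    "temperature": 0.7,          # 0-2, higher means more random output
    "top_p": 1.0,                # nucleus sampling mass (tune this OR temperature)
    "frequency_penalty": 0.2,    # discourage verbatim repetition
    "presence_penalty": 0.1,     # encourage new topics
    "stop": ["\n\n", "END"],     # stop sequences
}
```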
Error Handling and Status
The status code of the LLM inference. Default is 200 (OK). See the supported status codes here.
Supported status codes
We support all valid HTTP status codes, for example 200, 201, 204, 301, 304, 400, 401, 402, 403, 404, 405, 415, 422, 429, 500, 502, 503, 504, etc.
Error message if the LLM inference failed. Default is an empty string.
Any warnings that occurred during the LLM inference. You could pass a warning message here. Default is an empty string.
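For example, a failed inference could be logged with the fields described above. The field names status_code, error_message, and warnings are inferred from the descriptions; confirm them against the API reference.

```python
# Logging a failed inference; field names are inferred from the descriptions above.
error_log_fields = {
    "status_code": 500,
    "error_message": "Upstream provider timed out after 30s",
    "warnings": "Retried twice before failing",
}
```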
Additional Configuration
Whether the LLM inference was streamed. Default is false.
Whether the prompt is a custom prompt. Default is False.
ID of the prompt. If you want to log a custom prompt ID, you need to pass is_custom_prompt as True. Otherwise, use the Prompt ID in Prompts.
Name of the prompt.
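Logging a custom prompt ID could look like the fragment below. is_custom_prompt comes from the description above; prompt_id and prompt_name are assumed field names.

```python
# Logging a custom prompt; prompt_id and prompt_name are assumed field names.
prompt_fields = {
    "is_custom_prompt": True,    # required when logging your own prompt ID
    "prompt_id": "my-prompt-001",
    "prompt_name": "Support triage prompt",
}
```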
The full request object. Default is an empty dictionary. This is optional and is helpful for logging configurations such as temperature, presence_penalty, etc. completion_messages and tool_calls will be automatically extracted from full_request.
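A sketch of a full_request object, assuming it mirrors the original chat-completion request body; the exact contents depend on your provider and request.

```python
# The original request body, logged as-is; contents depend on your provider.
full_request = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "temperature": 0.3,
    "presence_penalty": 0.0,
}
```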
Use this parameter to control the behavior of the Keywords AI API. Default is an empty dictionary.
Properties
If false, the server will immediately return a status of whether the logging task is initialized successfully with no log data.
Example
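A hypothetical sketch of this control object. Both the parameter name (keywordsai_api_controls) and the property name (block) are assumptions based on the description above; verify them against the API reference.

```python
# Hypothetical control object; names are assumptions based on the description above.
keywordsai_api_controls = {
    # If False, the server returns immediately with only the initialization status,
    # without the log data.
    "block": False,
}
```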
Pricing Configuration
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
Example
Pass this parameter in if you want to log your self-hosted / fine-tuned model.
Example
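A hypothetical sketch of custom pricing for a self-hosted or fine-tuned model. The field names (prompt_unit_price, completion_unit_price) and the pricing unit are assumptions, so check the API reference before relying on them.

```python
# Hypothetical custom pricing for a self-hosted / fine-tuned model.
# Field names and pricing unit are assumptions; verify against the API reference.
pricing_fields = {
    "prompt_unit_price": 0.50,       # e.g. USD per 1M prompt tokens (assumed unit)
    "completion_unit_price": 1.50,   # e.g. USD per 1M completion tokens (assumed unit)
}
```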
Usage Details
Usage details for the LLM inference. Currently, only Prompt Caching is supported.
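A sketch of what prompt-caching usage details might look like, assuming an OpenAI-style usage object with cached token counts; this shape is an assumption and is not confirmed by this page.

```python
# Assumed OpenAI-style usage object with prompt-caching details.
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 150,
    "prompt_tokens_details": {"cached_tokens": 1024},
}
```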
Whether the user liked the output. True means the user liked the output.