What is the Google Gen AI SDK?
Keywords AI is compatible with the official Google Gen AI SDK, enabling you to use Google’s Gemini models through our gateway with full observability, monitoring, and advanced features.
This integration is for the Keywords AI gateway.
Steps to use
Python
Step 1: Install the SDK
Install the official Google Gen AI SDK for Python.
pip install google-genai
Step 2: Initialize the client
Initialize the client with your Keywords AI API key and set the base URL to Keywords AI’s endpoint.
from google import genai
import os

client = genai.Client(
    api_key=os.environ.get("KEYWORDSAI_API_KEY"),
    http_options={
        "base_url": "https://api.keywordsai.co/api/google/gemini",
    },
)
The base_url can be either https://api.keywordsai.co/api/google/gemini or https://endpoint.keywordsai.co/api/google/gemini.
Step 3: Make your first request
Now you can use the client to make requests to Google’s models.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello, world!",
)
print(response.text)
Step 4: Switch models
To switch between different Google models, simply change the model parameter.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me a joke.",
)
Step 5: Configure parameters
Use GenerateContentConfig to control model behavior with various parameters.
from google.genai import types

config = types.GenerateContentConfig(
    temperature=0.9,
    top_k=1,
    top_p=1,
    max_output_tokens=2048,
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the capital of France?",
    config=config,
)
Step 6: Advanced configuration
Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.
from google import genai
from google.genai import types
import os

client = genai.Client(
    api_key=os.environ.get("KEYWORDSAI_API_KEY"),
    http_options={
        "base_url": "https://api.keywordsai.co/api/google/gemini",
    },
)
# Example: Configure tools for grounding
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)

# Example: Comprehensive GenerateContentConfig showcasing various parameters
config = types.GenerateContentConfig(
    # System instruction to guide the model's behavior
    system_instruction="You are a helpful assistant that provides accurate, concise information about sports events.",
    # Sampling parameters
    temperature=0.7,  # Controls randomness (0.0-1.0). Lower = more focused, higher = more creative
    top_p=0.95,  # Nucleus sampling. Tokens with cumulative probability up to this value are considered
    top_k=40,  # Top-k sampling. Considers this many top tokens at each step
    # Output controls
    max_output_tokens=1024,  # Maximum number of tokens in the response
    stop_sequences=["\n\n\n"],  # Sequences that will stop generation
    # Tools and function calling
    tools=[grounding_tool],  # Enable Google Search grounding
    # Thinking configuration (for models that support it)
    thinking_config=types.ThinkingConfig(thinking_budget=0),  # Disables thinking mode
    # Response format options
    # response_mime_type="application/json",  # Uncomment for JSON output
    # response_schema=types.Schema(  # Uncomment to enforce structured output
    #     type=types.Type.OBJECT,
    #     properties={
    #         "winner": types.Schema(type=types.Type.STRING),
    #         "year": types.Schema(type=types.Type.INTEGER),
    #     },
    # ),
    # Safety settings
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
    ],
    # Diversity controls
    presence_penalty=0.0,  # Penalize tokens based on presence in text (-2.0 to 2.0)
    frequency_penalty=0.0,  # Penalize tokens based on frequency in text (-2.0 to 2.0)
    # Reproducibility
    # seed=42,  # Uncomment to make responses more deterministic
    # Logprobs (for token analysis)
    # response_logprobs=True,  # Uncomment to get log probabilities
    # logprobs=5,  # Number of top candidate tokens to return logprobs for
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won Euro 2024?",
    config=config,
)
print(response.text)
TypeScript
Step 1: Install the SDK
Install the official Google Gen AI SDK for TypeScript.
npm install @google/genai
Step 2: Initialize the client
Initialize the client with your Keywords AI API key and set the base URL to Keywords AI’s endpoint.
import { GoogleGenAI } from "@google/genai";

const GenAI = new GoogleGenAI({
  apiKey: process.env.KEYWORDSAI_API_KEY,
  httpOptions: {
    baseUrl: "https://api.keywordsai.co/api/google/gemini",
  },
});
The baseUrl can be either https://api.keywordsai.co/api/google/gemini or https://endpoint.keywordsai.co/api/google/gemini.
Step 3: Make your first request
Now you can use the client to make requests to Google’s models.
async function run() {
  const result = await GenAI.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [{ role: "user", parts: [{ text: "Hello, world!" }] }],
  });
  console.log(result.text);
}
run();
Step 4: Switch models
To switch between different Google models, simply change the model parameter.
const result = await GenAI.models.generateContent({
  model: "gemini-2.0-flash-exp",
  contents: [{ role: "user", parts: [{ text: "Tell me a joke." }] }],
});
Step 5: Configure parameters
Use the config parameter to control model behavior with various parameters.
const result = await GenAI.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [{ role: "user", parts: [{ text: "What is the capital of France?" }] }],
  config: {
    temperature: 0.9,
    topK: 1,
    topP: 1,
    maxOutputTokens: 2048,
  },
});
Step 6: Advanced configuration
Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.
import { GoogleGenAI } from "@google/genai";

const GenAI = new GoogleGenAI({
  apiKey: process.env.KEYWORDSAI_API_KEY,
  httpOptions: {
    baseUrl: "https://api.keywordsai.co/api/google/gemini",
  },
});
// Example: Comprehensive config showcasing various parameters
async function run() {
  const response = await GenAI.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      {
        role: "user",
        parts: [{ text: "Who won Euro 2024?" }],
      },
    ],
    config: {
      // System instruction to guide the model's behavior
      systemInstruction: "You are a helpful assistant that provides accurate, concise information about sports events.",
      // Sampling parameters
      temperature: 0.7, // Controls randomness (0.0-1.0). Lower = more focused, higher = more creative
      topP: 0.95, // Nucleus sampling. Tokens with cumulative probability up to this value are considered
      topK: 40, // Top-k sampling. Considers this many top tokens at each step
      // Output controls
      maxOutputTokens: 1024, // Maximum number of tokens in the response
      stopSequences: ["\n\n\n"], // Sequences that will stop generation
      // Tools and function calling
      tools: [
        {
          googleSearch: {}, // Enable Google Search grounding
        },
      ],
      // Thinking configuration (for models that support it)
      thinkingConfig: {
        thinkingBudget: 0, // Disables thinking mode
      },
      // Response format options
      // responseMimeType: "application/json", // Uncomment for JSON output
      // responseSchema: { // Uncomment to enforce structured output
      //   type: "OBJECT",
      //   properties: {
      //     winner: { type: "STRING" },
      //     year: { type: "INTEGER" },
      //   },
      // },
      // Safety settings
      safetySettings: [
        {
          category: "HARM_CATEGORY_HATE_SPEECH",
          threshold: "BLOCK_MEDIUM_AND_ABOVE",
        },
        {
          category: "HARM_CATEGORY_DANGEROUS_CONTENT",
          threshold: "BLOCK_MEDIUM_AND_ABOVE",
        },
      ],
      // Diversity controls
      presencePenalty: 0.0, // Penalize tokens based on presence in text (-2.0 to 2.0)
      frequencyPenalty: 0.0, // Penalize tokens based on frequency in text (-2.0 to 2.0)
      // Reproducibility
      // seed: 42, // Uncomment to make responses more deterministic
      // Logprobs (for token analysis)
      // responseLogprobs: true, // Uncomment to get log probabilities
      // logprobs: 5, // Number of top candidate tokens to return logprobs for
    },
  });
  console.log(response.text);
}
run();
Configuration Parameters
The GenerateContentConfig supports a wide range of parameters to control model behavior:
System Instructions
system_instruction: Sets the role and behavior guidelines for the model. This helps maintain consistent personality and response style throughout the conversation.
Sampling Parameters
temperature (0.0-1.0): Controls randomness in responses. Lower values (0.0-0.3) make output more focused and deterministic, while higher values (0.7-1.0) increase creativity and variation.
top_p (0.0-1.0): Nucleus sampling parameter. The model considers tokens with cumulative probability up to this value. Lower values make responses more focused.
top_k: Limits the number of highest probability tokens considered at each step. Helps balance between creativity and coherence.
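Conceptually, the two filters compose: top-k first keeps the k most likely tokens, then top-p keeps the smallest prefix of that set whose probabilities sum to at least top_p, and the model samples from what remains. A minimal pure-Python sketch of the filtering step (illustrative only, not part of the SDK; the function name and toy probabilities are made up):

```python
def filter_candidates(probs, top_k, top_p):
    """Apply top-k, then top-p (nucleus) filtering to a token distribution."""
    # Sort tokens by probability, highest first
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Top-k: keep only the k most likely tokens
    ranked = ranked[:top_k]
    # Top-p: keep tokens until their cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = {"Paris": 0.70, "Lyon": 0.15, "Nice": 0.10, "Oslo": 0.05}
print(filter_candidates(probs, top_k=3, top_p=0.8))  # ['Paris', 'Lyon']
```

Lowering either parameter shrinks the candidate pool, which is why small top_k or top_p values produce more focused, repeatable output.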
Output Controls
max_output_tokens: Maximum number of tokens in the generated response. Helps control response length and costs.
stop_sequences: Array of strings that will stop generation when encountered. Useful for controlling output format.
Tools
tools: Array of tools the model can use, such as Google Search for grounding responses in real-time information.
google_search: Enables the model to search the web for up-to-date information before generating responses.
Thinking Configuration
thinking_config: Controls the model’s internal reasoning process for models that support thinking mode.
thinking_budget: Amount of tokens allocated for internal reasoning. Set to 0 to disable thinking mode.
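For example, to grant a modest reasoning budget instead of disabling thinking entirely (a sketch using the Python SDK from the steps above; the 1024-token budget is an arbitrary illustrative value, and not all models support thinking):

```python
from google.genai import types

# Allow up to 1024 tokens of internal reasoning; 0 would disable thinking.
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=1024),
)
```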
Structured Output
response_mime_type: Specify the output format (e.g., “application/json” for JSON responses).
response_schema: Define the exact structure of JSON output using a schema. Ensures responses follow a specific format.
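Combining the two, the commented-out schema from Step 6 can be enabled like this (a sketch; the winner/year schema is just the illustrative example from above):

```python
from google.genai import types

# Force JSON output that matches a fixed structure.
config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "winner": types.Schema(type=types.Type.STRING),
            "year": types.Schema(type=types.Type.INTEGER),
        },
    ),
)
```

With this config, response.text contains a JSON string you can parse directly instead of free-form prose.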
Safety Settings
safety_settings: Array of safety configurations to filter harmful content across different categories:
HARM_CATEGORY_HATE_SPEECH: Hate speech and discriminatory content
HARM_CATEGORY_DANGEROUS_CONTENT: Dangerous or harmful instructions
HARM_CATEGORY_HARASSMENT: Harassment and bullying
HARM_CATEGORY_SEXUALLY_EXPLICIT: Sexually explicit content
Threshold options:
BLOCK_NONE: Don’t block any content
BLOCK_ONLY_HIGH: Block only high-severity content
BLOCK_MEDIUM_AND_ABOVE: Block medium and high-severity content
BLOCK_LOW_AND_ABOVE: Block low, medium, and high-severity content
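Categories and thresholds mix and match per request. A sketch combining two of the categories above with different thresholds (the specific pairing is illustrative):

```python
from google.genai import types

# Stricter filtering for harassment, looser for hate speech.
config = types.GenerateContentConfig(
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HARASSMENT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        ),
    ],
)
```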
Diversity Controls
presence_penalty (-2.0 to 2.0): Penalizes tokens based on whether they appear in the text. Positive values encourage the model to talk about new topics.
frequency_penalty (-2.0 to 2.0): Penalizes tokens based on their frequency in the text. Positive values reduce repetition.
Reproducibility
seed: Integer value for deterministic output. Using the same seed with identical inputs will produce similar outputs (not guaranteed to be exactly identical due to model updates).
Token Analysis
response_logprobs: When enabled, returns log probabilities for generated tokens. Useful for analyzing model confidence.
logprobs: Number of top candidate tokens to return log probabilities for at each position.
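The two options work together: response_logprobs turns the feature on, and logprobs sets how many alternatives you get per position. A sketch (mirrors the commented-out lines in Step 6; the value 3 is arbitrary):

```python
from google.genai import types

# Return log probabilities for the top 3 candidate tokens at each position.
config = types.GenerateContentConfig(
    response_logprobs=True,
    logprobs=3,
)
```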