What is the Google Gen AI SDK?

Keywords AI is compatible with the official Google Gen AI SDK, enabling you to use Google’s Gemini models through our gateway with full observability, monitoring, and advanced features.
This integration is for the Keywords AI gateway.

Steps to use

Step 1: Install the SDK

Install the official Google Gen AI SDK for Python.
pip install google-genai
Step 2: Initialize the client

Initialize the client with your Keywords AI API key and set the base URL to Keywords AI’s endpoint.
from google import genai
import os

client = genai.Client(
    api_key=os.environ.get("KEYWORDSAI_API_KEY"),
    http_options={
        "base_url": "https://api.keywordsai.co/api/google/gemini",
    }
)
The base_url can be either https://api.keywordsai.co/api/google/gemini or https://endpoint.keywordsai.co/api/google/gemini.
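The client above reads the API key from the environment. If the variable isn't set yet, export it first (the value shown is a placeholder, not a real key):

```shell
# Replace the placeholder with your actual Keywords AI API key
export KEYWORDSAI_API_KEY="your-keywordsai-api-key"
```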
Step 3: Make your first request

Now you can use the client to make requests to Google’s models.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello, world!",
)

print(response.text)
Step 4: Switch models

To switch between different Google models, simply change the model parameter.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me a joke.",
)
Step 5: Configure parameters

Use GenerateContentConfig to control model behavior with various parameters.
from google.genai import types

config = types.GenerateContentConfig(
    temperature=0.9,
    top_k=1,
    top_p=1,
    max_output_tokens=2048,
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the capital of France?",
    config=config,
)
Step 6: Advanced configuration

Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.
from google import genai
from google.genai import types
import os

client = genai.Client(
    api_key=os.environ.get("KEYWORDSAI_API_KEY"),
    http_options={
        "base_url": "https://api.keywordsai.co/api/google/gemini",
    }
)

# Example: Configure tools for grounding
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)

# Example: Comprehensive GenerateContentConfig showcasing various parameters
config = types.GenerateContentConfig(
    # System instruction to guide the model's behavior
    system_instruction="You are a helpful assistant that provides accurate, concise information about sports events.",
    
    # Sampling parameters
    temperature=0.7,  # Controls randomness (0.0-2.0). Lower = more focused, Higher = more creative
    top_p=0.95,  # Nucleus sampling. Tokens with cumulative probability up to this value are considered
    top_k=40,  # Top-k sampling. Considers this many top tokens at each step
    
    # Output controls
    max_output_tokens=1024,  # Maximum number of tokens in the response
    stop_sequences=["\n\n\n"],  # Sequences that will stop generation
    
    # Tools and function calling
    tools=[grounding_tool],  # Enable Google Search grounding
    
    # Thinking configuration (for models that support it)
    thinking_config=types.ThinkingConfig(thinking_budget=0),  # Disables thinking mode
    
    # Response format options
    # response_mime_type="application/json",  # Uncomment for JSON output
    # response_schema=types.Schema(  # Uncomment to enforce structured output
    #     type=types.Type.OBJECT,
    #     properties={
    #         "winner": types.Schema(type=types.Type.STRING),
    #         "year": types.Schema(type=types.Type.INTEGER)
    #     }
    # ),
    
    # Safety settings
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ],
    
    # Diversity controls
    presence_penalty=0.0,  # Penalize tokens based on presence in text (-2.0 to 2.0)
    frequency_penalty=0.0,  # Penalize tokens based on frequency in text (-2.0 to 2.0)
    
    # Reproducibility
    # seed=42,  # Uncomment to make responses more deterministic
    
    # Logprobs (for token analysis)
    # response_logprobs=True,  # Uncomment to get log probabilities
    # logprobs=5,  # Number of top candidate tokens to return logprobs for
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won Euro 2024?",
    config=config,
)

print(response.text)

Configuration Parameters

The GenerateContentConfig supports a wide range of parameters to control model behavior:

System Instructions

  • system_instruction: Sets the role and behavior guidelines for the model. This helps maintain consistent personality and response style throughout the conversation.

Sampling Parameters

  • temperature (0.0-2.0): Controls randomness in responses. Lower values (0.0-0.3) make output more focused and deterministic, while higher values (1.0 and above) increase creativity and variation.
  • top_p (0.0-1.0): Nucleus sampling parameter. The model considers tokens with cumulative probability up to this value. Lower values make responses more focused.
  • top_k: Limits the number of highest probability tokens considered at each step. Helps balance between creativity and coherence.

Output Controls

  • max_output_tokens: Maximum number of tokens in the generated response. Helps control response length and costs.
  • stop_sequences: Array of strings that will stop generation when encountered. Useful for controlling output format.

Tools and Grounding

  • tools: Array of tools the model can use, such as Google Search for grounding responses in real-time information.
  • google_search: Enables the model to search the web for up-to-date information before generating responses.

Thinking Configuration

  • thinking_config: Controls the model’s internal reasoning process for models that support thinking mode.
  • thinking_budget: Amount of tokens allocated for internal reasoning. Set to 0 to disable thinking mode.

Structured Output

  • response_mime_type: Specify the output format (e.g., “application/json” for JSON responses).
  • response_schema: Define the exact structure of JSON output using a schema. Ensures responses follow a specific format.
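As a sketch of how structured output fits together: the schema below and the sample model output are illustrative, and the dict-style schema mirrors the types.Schema form used in the commented-out example above. Once response_mime_type="application/json" and a response_schema are set, response.text can be parsed directly:

```python
import json

# Hypothetical schema constraining the model to return a winner and a year.
schema = {
    "type": "OBJECT",
    "properties": {
        "winner": {"type": "STRING"},
        "year": {"type": "INTEGER"},
    },
    "required": ["winner", "year"],
}

# With the schema enforced via config, e.g.
#   config = types.GenerateContentConfig(
#       response_mime_type="application/json",
#       response_schema=schema,
#   )
# response.text is guaranteed to be valid JSON, for example:
sample_text = '{"winner": "Spain", "year": 2024}'
result = json.loads(sample_text)
print(result["winner"], result["year"])
```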

Safety Settings

  • safety_settings: Array of safety configurations to filter harmful content across different categories:
    • HARM_CATEGORY_HATE_SPEECH: Hate speech and discriminatory content
    • HARM_CATEGORY_DANGEROUS_CONTENT: Dangerous or harmful instructions
    • HARM_CATEGORY_HARASSMENT: Harassment and bullying
    • HARM_CATEGORY_SEXUALLY_EXPLICIT: Sexually explicit content
Threshold options:
  • BLOCK_NONE: Don’t block any content
  • BLOCK_ONLY_HIGH: Block only high-severity content
  • BLOCK_MEDIUM_AND_ABOVE: Block medium and high-severity content
  • BLOCK_LOW_AND_ABOVE: Block low, medium, and high-severity content
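Putting the categories and thresholds together, a config that applies the same threshold to all four harm categories might look like this (a sketch; the chosen threshold is a policy decision, not a recommendation):

```python
from google.genai import types

# Block medium-and-above content across all four supported harm categories.
safety_settings = [
    types.SafetySetting(
        category=category,
        threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    )
    for category in (
        types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        types.HarmCategory.HARM_CATEGORY_HARASSMENT,
        types.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    )
]

config = types.GenerateContentConfig(safety_settings=safety_settings)
```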

Diversity Controls

  • presence_penalty (-2.0 to 2.0): Penalizes tokens based on whether they appear in the text. Positive values encourage the model to talk about new topics.
  • frequency_penalty (-2.0 to 2.0): Penalizes tokens based on their frequency in the text. Positive values reduce repetition.

Reproducibility

  • seed: Integer value for deterministic output. Using the same seed with identical inputs will produce similar outputs (not guaranteed to be exactly identical due to model updates).

Token Analysis

  • response_logprobs: When enabled, returns log probabilities for generated tokens. Useful for analyzing model confidence.
  • logprobs: Number of top candidate tokens to return log probabilities for at each position.
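As a sketch, enabling both options looks like this; the commented response fields follow the google-genai SDK's logprobs_result shape, which may vary by SDK version:

```python
from google.genai import types

config = types.GenerateContentConfig(
    response_logprobs=True,  # include log probabilities in the response
    logprobs=5,              # return the top 5 candidate tokens per position
)

# response = client.models.generate_content(
#     model="gemini-2.5-flash", contents="Hello!", config=config
# )
# Each chosen token and its log probability can then be inspected, e.g.:
# for c in response.candidates[0].logprobs_result.chosen_candidates:
#     print(c.token, c.log_probability)
```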