The Keywords AI API will automatically balance the request load across the same model provided by different providers. This is achieved by hitting different provider endpoints based on their rate limit and their average latency across the day.

Keywords AI will ping the providers every hour to measure the average latency across the day. And update the latency data for the providers in the backend for this load-balancing calculation.

Under the hood, here is the pseudo-code of how the load balancing works:

# Received model in the request
model_name = "requested_model_from_request"
# Load the model's param and its respective providers
model: dict = keywords_ai_model_data[model_name] # A slice from our data

fallback_models = model["load_balancing_fallbacks"]

balancing_between = []
for fallback in fallback_models:
    fallback_param = pack_fallback_list_object(fallback)
    """
    A fallback_param looks something like this:
    {
        "model_name": "model_that_needs_a_fallback",
        "fallback_name": "model_to_fallback_to",
        "rate_limit": 1000,
        "latency": This is live data from the provider based on daily ping
    }
    """
    balancing_between.append(fallback_param)

# ...Omit code until generation...
# Randomly pick a provider based on the latency and rate_limit distribution
model = pick_model_to_use(balancing_between)
"""Under the hood
distribution = [rate_limit_1, rate_limit_2, rate_limit_3] / sum_of_rate_limits
weights = add_latency_weight_to_distribution(distribution, [latency_1, latency_2, latency_3])
model = random.choices(balancing_between, weights=weights)
"""
keywords_generation(module, **other_kwargs)