Load balancing distributes your request load across multiple models or deployments. You can assign a weight to each model or deployment based on its rate limit and your preference.

See all supported params here.

Load balancing between models

You can assign load balancing weights to different models. This is useful when you want to spread traffic across models from different providers.
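
To make the effect of weights concrete, here is a minimal sketch of weighted random selection. It only illustrates the general idea that a model with weight 3 should receive roughly three times the traffic of a model with weight 1; the platform's actual routing algorithm is not described on this page, and the model names and the `pick_model` and `traffic_share` helpers are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter

# Hypothetical model names and weights, for illustration only.
WEIGHTS = {"model-a": 3, "model-b": 1}


def pick_model(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one model with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def traffic_share(weights: dict[str, float]) -> dict[str, float]:
    """Return the expected fraction of requests each model receives."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulate 10,000 requests and count how often each model is chosen.
    counts = Counter(pick_model(WEIGHTS, rng) for _ in range(10_000))
    print(traffic_share(WEIGHTS))
    print(dict(counts))
```

With these weights, `model-a` is expected to handle about 75% of requests and `model-b` about 25%.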

1. Go to the Load balancing page

Go to the Load balancing page and click Create load balancing group.

2. Add models

Click Add model to add each model, then specify its weight and add your own credentials.

3. Copy group ID to your codebase

After you have added the models, copy the group ID (the blue text) into your codebase and pass it in your requests.

The model parameter overrides load_balance_group, so omit model when you use a group!
{
    // Omit the model parameter; if it is set, it overrides the load balance group.
    "messages": [
        {
            "role": "user",
            "content": "Hi, how are you?"
        }
    ],
    "load_balance_group": {
        "group_id":"THE_GROUP_ID" // from Load balancing page
    }
}
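
The same request body can be assembled in code. The following is a minimal Python sketch, assuming a hypothetical helper named `build_request_body`; it only builds and prints the JSON shown above and leaves out the HTTP call, since the endpoint and authentication details are not covered in this section.

```python
import json


def build_request_body(group_id: str, user_message: str) -> dict:
    """Build a request body that routes through a load balancing group.

    The "model" key is deliberately left out, because a model parameter
    would override the load balance group.
    """
    return {
        "messages": [{"role": "user", "content": user_message}],
        "load_balance_group": {"group_id": group_id},
    }


if __name__ == "__main__":
    # "THE_GROUP_ID" is a placeholder; use the ID from the Load balancing page.
    body = build_request_body("THE_GROUP_ID", "Hi, how are you?")
    # Send this JSON with your usual HTTP client and credentials.
    print(json.dumps(body, indent=4))
```

Unlike the annotated example above, the serialized output contains no `//` comments, so it is strictly valid JSON.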

4. Add load balancing group in code (optional)

You can also define the load balancing group directly in your codebase.

The models field overrides the load balancing group you configured in the UI.
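
As a rough sketch of what an in-code group might look like, the Python helper below builds a request body with a `models` list. This page does not show the exact shape of `models`, so two things here are assumptions: that `models` sits inside `load_balance_group`, and that each entry uses `model` and `weight` keys. The helper name `build_inline_group_body` and the model names are hypothetical; check the supported params reference before relying on this layout.

```python
from typing import Optional


def build_inline_group_body(
    user_message: str,
    weighted_models: dict,
    group_id: Optional[str] = None,
) -> dict:
    """Build a request body with the load balancing group defined inline.

    ASSUMPTION: "models" lives inside "load_balance_group" and each entry
    has the form {"model": <name>, "weight": <number>}.
    """
    if not weighted_models:
        raise ValueError("at least one model is required")
    if any(weight <= 0 for weight in weighted_models.values()):
        raise ValueError("weights must be positive")

    group = {
        "models": [
            {"model": name, "weight": weight}
            for name, weight in weighted_models.items()
        ]
    }
    if group_id is not None:
        # Optional: the models list overrides whatever the UI group defines.
        group["group_id"] = group_id

    return {
        "messages": [{"role": "user", "content": user_message}],
        "load_balance_group": group,
    }


if __name__ == "__main__":
    # Hypothetical model names and weights.
    print(build_inline_group_body("Hi, how are you?", {"model-a": 2, "model-b": 1}))
```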

You can also set up fallback models to avoid errors. If an outage occurs, requests fall back to the list of models you specified in the fallback field. See the Fallbacks section for more information.

Load balancing between deployments

On the platform, you can add multiple deployments for the same provider and assign a load balancing weight to each one. This helps when you want to raise your effective rate limit for a single provider.
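
One simple way to choose deployment weights is to make them proportional to each deployment's rate limit, so that no deployment is pushed past its quota before the others. The sketch below assumes integer requests-per-minute limits; the deployment names, the numbers, and the `weights_from_rate_limits` helper are hypothetical, and the platform may accept or interpret weights differently.

```python
from functools import reduce
from math import gcd


def weights_from_rate_limits(rate_limits: dict) -> dict:
    """Return the smallest integer weights proportional to the rate limits."""
    if not rate_limits or any(limit <= 0 for limit in rate_limits.values()):
        raise ValueError("rate limits must be positive integers")
    # Divide every limit by their greatest common divisor to simplify the ratio.
    divisor = reduce(gcd, rate_limits.values())
    return {name: limit // divisor for name, limit in rate_limits.items()}


if __name__ == "__main__":
    # Hypothetical deployments with requests-per-minute limits.
    limits = {"deployment-east": 600, "deployment-west": 400}
    print(weights_from_rate_limits(limits))
    print("combined limit:", sum(limits.values()))
```

With limits of 600 and 400 requests per minute, the weights reduce to 3 and 2, and the two deployments together can absorb up to 1,000 requests per minute when traffic follows those weights.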

Deprecated params

The loadbalance_models parameter is deprecated. Use the load_balance_group parameter instead.