Load balancing between models
You can specify load balancing weights for different models. This is useful when you want to balance the load between models from different providers.
Go to the Load balancing page
Go to the Load balancing page and click on Create new load balancer.

Add models
Click Add model to add models, specify the weight for each model, and add your own credentials.
Copy group ID to your codebase
After you have added the models, copy the group ID (the blue text) to your codebase and use it in your requests.
The model parameter will overwrite the load_balance_group!
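Here is a minimal sketch of what a request using the copied group ID could look like. The endpoint URL, header names, and the group_id field inside load_balance_group are illustrative assumptions, not a definitive payload shape:

```python
import requests

# Hypothetical endpoint and API key -- replace with your real values.
URL = "https://api.example.com/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    # No "model" here: a "model" parameter would overwrite the load_balance_group.
    "load_balance_group": {
        "group_id": "YOUR_GROUP_ID",  # the group ID copied from the platform
    },
    "messages": [{"role": "user", "content": "Hello!"}],
}

response = requests.post(URL, headers=HEADERS, json=payload)
print(response.json())
```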
Add load balancing group in code (Optional)
You can also add the load balancing group directly in your codebase. The models field will overwrite the load_balance_group you specified in the UI.
Example code
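Below is a sketch of defining the group directly in the request body, assuming each entry in models takes a model name and a weight; the model names and the group_id value are placeholders:

```python
payload = {
    "load_balance_group": {
        "group_id": "YOUR_GROUP_ID",
        # The "models" field overrides the models configured for this group in the UI.
        "models": [
            {"model": "gpt-3.5-turbo", "weight": 1},
            {"model": "claude-3-haiku-20240307", "weight": 1},
        ],
    },
    "messages": [{"role": "user", "content": "Hello!"}],
}
```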
Fallback usage in load balancing
You can also set up fallback models to avoid errors. Requests will fall back to the list of models you specified in the fallback field once any model in the group has an outage. Check out the Fallbacks section for more information.
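As a rough sketch, a request could combine the load balancing group with a fallback list; the model identifiers below are placeholders and the exact payload shape is an assumption:

```python
payload = {
    "load_balance_group": {"group_id": "YOUR_GROUP_ID"},
    # If the models in the group have an outage, requests fall back to these models in order.
    "fallback": ["gpt-4o-mini", "claude-3-haiku-20240307"],
    "messages": [{"role": "user", "content": "Hello!"}],
}
```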
Load balancing between deployments
What is a deployment?
A deployment basically means a credential. If you add one OpenAI API key, you have one deployment; if you add two OpenAI API keys, you have two deployments.
In the platform
You can go to the platform and add multiple deployments for the same provider and specify the load balancing weight for each deployment. This is helpful when you want to raise the rate limits for a single provider.
In the codebase
You can also load balance between deployments in your codebase. Add the deployments in the customer_credentials field and specify the weight for each deployment.
Example:
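A sketch of the idea, assuming customer_credentials takes one entry per deployment with that provider's credentials and a weight; the exact nesting and field names are assumptions:

```python
payload = {
    "customer_credentials": {
        # Two OpenAI deployments (two API keys), each with its own weight.
        "openai": [
            {"credentials": {"api_key": "sk-your-first-key"}, "weight": 0.5},
            {"credentials": {"api_key": "sk-your-second-key"}, "weight": 0.5},
        ],
    },
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}
```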
Specify available models
You can also specify the available models for load balancing. This is useful when you want to restrict which models a deployment serves. For example, if you only want to use gpt-3.5-turbo in an OpenAI deployment, specify it in the available_models field or configure it in the platform.
Learn more about how to specify available models in the platform here.
Example code:
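A sketch of what this could look like, reusing the assumed deployment shape from above and adding an available_models restriction to the first deployment:

```python
payload = {
    "customer_credentials": {
        "openai": [
            {
                "credentials": {"api_key": "sk-your-first-key"},
                "weight": 0.5,
                # This deployment only serves gpt-3.5-turbo, so gpt-4 is excluded from it.
                "available_models": ["gpt-3.5-turbo"],
            },
            {
                "credentials": {"api_key": "sk-your-second-key"},
                "weight": 0.5,
            },
        ],
    },
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}
```

In this example: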
- GPT-3.5-turbo requests are evenly split (50/50) between both deployments
- GPT-4 requests are routed exclusively to the second deployment since it’s excluded from the first
- All other model requests are distributed evenly between deployments according to their weights
Deprecated params
loadbalance_models
The loadbalance_models parameter is deprecated. You should use the load_balance_group parameter instead.
Example code
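For reference, a sketch of the deprecated usage, assuming loadbalance_models takes a list of model/weight pairs; the model names are placeholders:

```python
payload = {
    # Deprecated: prefer "load_balance_group" instead.
    "loadbalance_models": [
        {"model": "gpt-3.5-turbo", "weight": 1},
        {"model": "claude-3-haiku-20240307", "weight": 2},
    ],
    "messages": [{"role": "user", "content": "Hello!"}],
}
```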
Specify the weight of the model (has to be a positive integer). The higher the weight, the more requests will be sent to the model.