Enable load balancing
Increase your LLM rate limits with our load balancing feature.
Load balancing is a feature that distributes request load across different deployments. You can specify a weight for each deployment based on its rate limit and your preferences.
See all supported params here.
Load balancing between deployments
What is a deployment?
A deployment corresponds to a credential. If you add one OpenAI API key, you have one deployment; if you add two OpenAI API keys, you have two deployments.
In the platform
You can add multiple deployments for the same provider in the platform and specify a load balancing weight for each. This is helpful when you want to raise the effective rate limits for a single provider.
In the codebase
You can also load balance between deployments in your codebase. Add the deployments to the customer_credentials field and specify a weight for each one.
Example:
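Below is a minimal sketch of such a request. The endpoint URL, the per-deployment object shape, and the credentials and weight field names are assumptions for illustration; check the supported params reference for the exact schema.

```python
import requests

# Sketch only: two OpenAI deployments (two API keys) with equal weights,
# so requests are split roughly 50/50 between them. Field names inside each
# deployment object are assumed, not confirmed by this page.
response = requests.post(
    "https://api.keywordsai.co/api/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
        "customer_credentials": {
            "openai": [
                {"credentials": {"api_key": "OPENAI_KEY_1"}, "weight": 1},
                {"credentials": {"api_key": "OPENAI_KEY_2"}, "weight": 1},
            ]
        },
    },
)
print(response.json())
```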
In this example, requests to OpenAI models will be evenly distributed between the two deployments based on their specified weights.
Specify available models
You can also restrict which models each deployment serves during load balancing. This is useful when you only want certain models handled by a given deployment. For example, if you only want to use gpt-3.5-turbo in an OpenAI deployment, you can list it in the available_models field, or configure it in the platform.
Learn more about how to specify available models in the platform here.
Example code:
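Here is a minimal sketch of per-deployment model restrictions, reusing the assumed request shape from the previous example. The available_models field comes from this page; the exclude_models name is an assumption standing in for the exclusion behavior described below.

```python
import requests

# Sketch only: the first deployment serves gpt-3.5-turbo and explicitly
# excludes gpt-4; the second deployment can serve any OpenAI model.
response = requests.post(
    "https://api.keywordsai.co/api/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEYWORDSAI_API_KEY"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
        "customer_credentials": {
            "openai": [
                {
                    "credentials": {"api_key": "OPENAI_KEY_1"},
                    "weight": 1,
                    "available_models": ["gpt-3.5-turbo"],
                    "exclude_models": ["gpt-4"],  # assumed field name
                },
                {
                    "credentials": {"api_key": "OPENAI_KEY_2"},
                    "weight": 1,
                },
            ]
        },
    },
)
print(response.json())
```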
In this example, requests to OpenAI models will be distributed between the two deployments, with the first deployment only handling gpt-3.5-turbo requests and explicitly excluding gpt-4, while the second deployment can handle requests for any OpenAI model.
Based on the deployment weights and model configurations:
- GPT-3.5-turbo requests are evenly split (50/50) between both deployments
- GPT-4 requests are routed exclusively to the second deployment since it’s excluded from the first
- All other model requests are distributed evenly between deployments according to their weights
Deprecated params
loadbalance_models
The loadbalance_models parameter is deprecated. You should use the load_balance_group parameter instead.
load_balance_group
The load_balance_group parameter is deprecated. We currently don’t support load balancing between models. If you want to load balance between models, contact us at team@keywordsai.co.