Load balancing between models
You can specify load balancing weights for different models. This is useful when you want to balance the load between models from different providers.
Go to the Load balancing page
Go to the Load balancing page and click on Create new load balancer.

Add models
Click Add model to add models, specify the weight for each model, and add your own credentials.
Copy group ID to your codebase
After you have added the models, copy the group ID (the blue text) to your codebase and use it in your requests.
The model parameter will overwrite the load_balance_group!
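Here is a minimal sketch of what a request using the copied group ID could look like. The endpoint URL, header names, and the group_id field inside load_balance_group are illustrative assumptions, not a definitive payload shape:

```python
import requests

# Hypothetical endpoint and API key -- replace with your real values.
URL = "https://api.example.com/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    # No "model" here: a "model" parameter would overwrite the load_balance_group.
    "load_balance_group": {
        "group_id": "YOUR_GROUP_ID",  # the group ID copied from the platform
    },
    "messages": [{"role": "user", "content": "Hello!"}],
}

response = requests.post(URL, headers=HEADERS, json=payload)
print(response.json())
```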
Add load balancing group in code (Optional)
You can also add the load balancing group directly in your codebase. The models field will overwrite the load_balance_group you specified in the UI.
Example code
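Below is a sketch of defining the group directly in the request body, assuming each entry in models takes a model name and a weight; the model names and the group_id value are placeholders:

```python
payload = {
    "load_balance_group": {
        "group_id": "YOUR_GROUP_ID",
        # The "models" field overrides the models configured for this group in the UI.
        "models": [
            {"model": "gpt-3.5-turbo", "weight": 1},
            {"model": "claude-3-haiku-20240307", "weight": 1},
        ],
    },
    "messages": [{"role": "user", "content": "Hello!"}],
}
```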
Fallback usage in load balancing
You can also set up fallback models to avoid errors. Requests will fall back to the list of models you specified in the fallback field once any model in the group has an outage. Check out the Fallbacks section for more information.
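As a rough sketch, a request could combine the load balancing group with a fallback list; the model identifiers below are placeholders and the exact payload shape is an assumption:

```python
payload = {
    "load_balance_group": {"group_id": "YOUR_GROUP_ID"},
    # If the models in the group have an outage, requests fall back to these models in order.
    "fallback": ["gpt-4o-mini", "claude-3-haiku-20240307"],
    "messages": [{"role": "user", "content": "Hello!"}],
}
```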
Load balancing between deployments
What is a deployment?
A deployment basically means a credential. If you add one OpenAI API key, you have one deployment; if you add two OpenAI API keys, you have two deployments.
In the platform
You can go to the platform and add multiple deployments for the same provider and specify the load balancing weight for each deployment. This is helpful when you want to raise the rate limits for a single provider.
In the codebase
You can also load balance between deployments in your codebase. Add the deployments in the customer_credentials field and specify the weight for each deployment.
Example:
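A sketch of the idea, assuming customer_credentials takes one entry per deployment with that provider's credentials and a weight; the exact nesting and field names are assumptions:

```python
payload = {
    "customer_credentials": {
        # Two OpenAI deployments (two API keys), each with its own weight.
        "openai": [
            {"credentials": {"api_key": "sk-your-first-key"}, "weight": 0.5},
            {"credentials": {"api_key": "sk-your-second-key"}, "weight": 0.5},
        ],
    },
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}
```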
Specify available models
You can also specify the available models for load balancing. This is useful when you want to restrict which models a deployment serves. For example, if you only want to use gpt-3.5-turbo in an OpenAI deployment, specify it in the available_models field or configure it in the platform.
Learn more about how to specify available models in the platform here.
Example code:
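A sketch of what this could look like, reusing the assumed deployment shape from above and adding an available_models restriction to the first deployment:

```python
payload = {
    "customer_credentials": {
        "openai": [
            {
                "credentials": {"api_key": "sk-your-first-key"},
                "weight": 0.5,
                # This deployment only serves gpt-3.5-turbo, so gpt-4 is excluded from it.
                "available_models": ["gpt-3.5-turbo"],
            },
            {
                "credentials": {"api_key": "sk-your-second-key"},
                "weight": 0.5,
            },
        ],
    },
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}
```

In this example: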
- GPT-3.5-turbo requests are evenly split (50/50) between both deployments
- GPT-4 requests are routed exclusively to the second deployment since it’s excluded from the first
- All other model requests are distributed evenly between deployments according to their weights
Deprecated params
loadbalance_models
The loadbalance_models parameter is deprecated. You should use the load_balance_group parameter instead.
Example code
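For reference, a sketch of the deprecated usage, assuming loadbalance_models takes a list of model/weight pairs; the model names are placeholders:

```python
payload = {
    # Deprecated: prefer "load_balance_group" instead.
    "loadbalance_models": [
        {"model": "gpt-3.5-turbo", "weight": 1},
        {"model": "claude-3-haiku-20240307", "weight": 2},
    ],
    "messages": [{"role": "user", "content": "Hello!"}],
}
```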
Specify the weight of the model (has to be a positive integer). The higher the weight, the more requests will be sent to the model.