Streaming
Pushing the output token by token
When streaming is available, Keywords AI will forward the streaming response to your end token by token. This is useful when you want to process the output as soon as it is available, rather than waiting for the entire response to be received, and can significantly improve the user experience.
See all params here.
Streaming example
How it Works in Keywords AI (Optional reading)
Keywords AI runs on ASGI server to handle large loads of concurrent requests.
We receive the stream from our provider as synchronous generator, and we forward it to the frontend as an asynchronous generator as soon as we start receiving the data:
from asyncio import sleep as async_sleep
async def stream_response(response: Response):
wait_time = 0.001
async for chunk in response.iter_lines():
await async_sleep(1)
yield chunk
The wait_time will not add actual latency. It is necessary for the asynchronous event loop to “break” from this task and send the request chunk by chunk.
Deprecated
Mannually handling streaming
This one is deprecated, just for reference!