When streaming is available, Keywords AI forwards the streaming response to you token by token. This lets you process the output as soon as it arrives instead of waiting for the entire response, which can significantly improve the user experience.
Code Examples for Handling Streaming
type CallbackFunction = (line: string) => void;
type StreamComplete = (done?: boolean) => void;

const readStream = async (
  streamResponse: Response, // HTTP response from the Keywords AI API
  callbackFunction: CallbackFunction, // Called with each "token" from the stream
  streamComplete: StreamComplete = (done) => console.log("Stream done")
): Promise<() => void> => {
  /* Resolves to a function that aborts the stream */
  if (!streamResponse.body) {
    throw new Error("Response has no body to stream");
  }
  const reader = streamResponse.body.getReader();
  const decoder = new TextDecoder();
  const abortController = new AbortController();
  const signal = abortController.signal;

  // Start reading the stream in the background
  (async () => {
    let buffer = ""; // Holds a partial line that spans two chunks
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done || signal.aborted) {
          // Flush whatever is left, unless the caller aborted
          if (!signal.aborted && buffer.trim()) callbackFunction(buffer);
          streamComplete(done);
          break;
        }
        // stream: true keeps multi-byte characters split across chunks intact
        buffer += decoder.decode(value, { stream: true });
        // Split on the delimiter; the last piece may be incomplete, so keep it
        const lines = buffer.split("---");
        buffer = lines.pop() ?? "";
        for (const line of lines) {
          // Line is a JSON string
          if (line.trim()) callbackFunction(line);
        }
      }
    } catch (e) {
      console.error("Stream error:", e);
    }
  })();

  // Return a function to abort the stream from outside
  return () => {
    console.log("Aborting stream");
    abortController.abort();
    // Cancel the reader so a pending read() resolves and the loop can exit
    reader.cancel().catch(() => {});
  };
};
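Network chunks do not necessarily line up with the "---" delimiter, so a single JSON string can arrive split across two reads. The following is a minimal Python sketch of carrying the unfinished tail over from one chunk to the next. It assumes the same "---" delimiter and UTF-8 text as the TypeScript example; `iter_segments` and the sample chunks are hypothetical and not part of any Keywords AI SDK.

```python
import codecs
from typing import Iterable, Iterator

DELIMITER = "---"  # assumed to match the delimiter used in the TypeScript example


def iter_segments(chunks: Iterable[bytes], delimiter: str = DELIMITER) -> Iterator[str]:
    """Yield complete delimiter-separated segments from a stream of byte chunks."""
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Everything before the last delimiter is complete; the tail may be partial.
        *complete, buffer = buffer.split(delimiter)
        for segment in complete:
            if segment.strip():
                yield segment
    # Flush any bytes still held by the decoder, then the final tail.
    buffer += decoder.decode(b"", final=True)
    if buffer.strip():
        yield buffer


# Hypothetical chunks, deliberately cut mid-segment and mid-delimiter.
chunks = [b'{"token": "Hel', b'lo"}--', b'-{"token": " world"}---']
segments = list(iter_segments(chunks))
```

With the `requests` library, the byte chunks could come from `response.iter_content(chunk_size=None)` on a request made with `stream=True`. The sketch also assumes the delimiter never appears inside a JSON string itself.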
How it Works in Keywords AI (Optional reading)
Keywords AI runs on an ASGI server to handle large numbers of concurrent requests.
We receive the stream from our provider as a synchronous generator and forward it to the frontend as an asynchronous generator as soon as data starts arriving:
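from asyncio import sleep as async_sleep

# Response is the provider's HTTP response object
async def stream_response(response: Response):
    wait_time = 0.001  # seconds
    # iter_lines() is the provider's synchronous generator
    for chunk in response.iter_lines():
        # Hand control back to the event loop so each chunk is sent as it arrives
        await async_sleep(wait_time)
        yield chunk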
The wait_time is small enough that it adds no noticeable latency. It is needed so that the asynchronous event loop can “break” from this task and send the response chunk by chunk.
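To illustrate why the pause matters, here is a small self-contained sketch rather than the Keywords AI implementation. `provider_stream`, `forward`, and `consume` are hypothetical stand-ins: two synchronous generators are forwarded concurrently, and the short `asyncio.sleep` gives the event loop a chance to switch between them after every chunk.

```python
import asyncio
from typing import AsyncIterator, Iterable, Iterator, List


def provider_stream(name: str, n: int) -> Iterator[str]:
    """Stand-in for a provider's synchronous generator (hypothetical)."""
    for i in range(n):
        yield f"{name}{i}"


async def forward(chunks: Iterable[str], wait_time: float = 0.001) -> AsyncIterator[str]:
    """Re-expose a synchronous iterable as an asynchronous generator."""
    for chunk in chunks:
        # Awaiting here hands control back to the event loop between chunks.
        await asyncio.sleep(wait_time)
        yield chunk


async def consume(stream: AsyncIterator[str], log: List[str]) -> None:
    """Record chunks in the order the event loop delivers them."""
    async for chunk in stream:
        log.append(chunk)


async def main() -> List[str]:
    log: List[str] = []
    await asyncio.gather(
        consume(forward(provider_stream("a", 3)), log),
        consume(forward(provider_stream("b", 3)), log),
    )
    return log


log = asyncio.run(main())
print(log)  # chunks from "a" and "b" arrive interleaved
```

If the `await asyncio.sleep(wait_time)` line is removed, `forward` never suspends, so the first stream is drained completely before the second one gets to run.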