When streaming is available, Keywords AI forwards the streaming response to your client token by token. This lets you process the output as soon as it arrives, rather than waiting for the entire response, and can significantly improve the user experience.

Code Examples for Handling Streaming

type CallbackFunction = (line: string) => void;
type StreamComplete = (done?: boolean) => void;

const readStream = async (
  streamResponse: Response, // HTTP response from Keywords AI API
  callbackFunction: CallbackFunction, // The callback function to handle each "token" from the stream
  streamComplete: StreamComplete = (done) => console.log("Stream done")
): Promise<() => void> => {
  if (!streamResponse.body) {
    throw new Error("Response has no readable body to stream");
  }
  const reader = streamResponse.body.getReader();
  const decoder = new TextDecoder();
  const abortController = new AbortController();
  const signal = abortController.signal;

  // Start reading the stream
  (async () => {
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done || signal.aborted) {
          streamComplete(done);
          break;
        }
        // stream: true keeps multi-byte characters intact across chunk boundaries
        const message = decoder.decode(value, { stream: true });
        // Split the decoded chunk on the "---" delimiter
        for (const line of message.split("---")) {
          // Each line is a JSON string
          callbackFunction(line);
        }
      }
    } catch (e) {
      console.error("Stream error:", e);
    }
  })();

  // Return a function to abort the stream from outside
  return () => {
    console.log("Aborting stream");
    abortController.abort();
    // Cancel the reader so any pending read() settles immediately
    reader.cancel().catch(() => {});
  };
};
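
A minimal usage sketch follows. The endpoint URL, headers, and request body are placeholders, not the actual Keywords AI API shape; substitute your real streaming request.

// Hypothetical usage sketch: the URL and payload below are placeholders.
async function main() {
  const response = await fetch("https://example.com/api/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: "Hello", stream: true }),
  });

  const abort = await readStream(
    response,
    (line) => {
      // Each delimited line should be a JSON string; parse defensively,
      // since split() can produce empty or partial fragments
      if (!line.trim()) return;
      try {
        console.log("token:", JSON.parse(line));
      } catch {
        console.warn("Skipping non-JSON fragment:", line);
      }
    },
    () => console.log("All tokens received")
  );

  // Call abort() to stop mid-stream, e.g. when the user navigates away:
  // abort();
}

main();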

How it Works in Keywords AI (Optional reading)

Keywords AI runs on an ASGI server to handle large volumes of concurrent requests.

We receive the stream from our provider as a synchronous generator and forward it to the frontend as an asynchronous generator as soon as the data starts arriving:

from asyncio import sleep as async_sleep
from requests import Response  # assuming the provider response is a requests.Response

async def stream_response(response: Response):
    wait_time = 0.001
    for chunk in response.iter_lines():  # synchronous generator from the provider
        await async_sleep(wait_time)  # yield control back to the event loop
        yield chunk  # forwarded to the client as it arrives

The wait_time does not add meaningful latency. The brief await is what lets the asynchronous event loop “break” out of this task, so the response can be flushed to the client chunk by chunk instead of blocking until the synchronous iteration finishes.
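
For comparison, here is a sketch of the same pattern in TypeScript (the names are illustrative, not part of any API): an async generator wrapping a synchronous source, with a zero-delay await playing the role of async_sleep so the event loop can interleave other work between chunks.

// Illustrative sketch: wrap a synchronous iterable as an async generator,
// yielding to the event loop between chunks. The setTimeout(..., 0) here
// plays the same role as asyncio.sleep in the Python snippet above.
async function* forwardChunks(chunks: Iterable<string>): AsyncGenerator<string> {
  for (const chunk of chunks) {
    // Break out of this task so the event loop can flush the previous
    // chunk to the client and run other pending tasks
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
    yield chunk;
  }
}

// Usage: for await (const chunk of forwardChunks(["a", "b", "c"])) { ... }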