Streaming Responses in LangChain

Streaming responses in LangChain allow developers to receive output from a language model incrementally, token by token, instead of waiting for the entire response to complete. This approach creates a more interactive and responsive user experience, similar to live typing in chat applications. By leveraging streaming, applications can provide immediate feedback, reduce perceived latency and enable real-time interaction with LLMs. The key points are summarized below, followed by a minimal example.

  • Token-by-token output: Users see model responses as they are generated.
  • Improved interactivity: Makes applications feel faster and more responsive.
  • Real-time applications: Ideal for chatbots, assistants, or any live feedback systems.
  • Integration with LangChain: Works seamlessly with ChatOpenAI and other LangChain LLMs.
  • Underlying technology: Uses Server-Sent Events (SSE) to stream data to the front-end.
  • Flexible front-end support: Can be combined with JavaScript or frameworks like React for live updates.
  • Extendable: Supports conversation memory, multi-user setups and different LLM models.
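
As a quick illustration of token-by-token output, here is a minimal sketch that streams a response directly to the console. It assumes the langchain-openai package is installed and OPENAI_API_KEY is set in the environment (both are covered in the steps below).

Python
from langchain_openai import ChatOpenAI

# streaming=True makes .stream() yield chunks as the model generates them
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

for chunk in llm.stream("Explain streaming responses in one sentence."):
    print(chunk.content, end="", flush=True)  # print each token as it arrives
print()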

Implementation

Step 1: Set Up the Environment

We will create a Python virtual environment:

bash
python -m venv .venv

Step 2: Activate Environment

Now we need to activate the environment:

1. Windows:

bash
.venv\Scripts\activate

2. Linux / Mac:

bash
source .venv/bin/activate

Step 3: Install Packages

We will install the necessary packages:

bash
pip install --upgrade langchain langchain-openai python-dotenv fastapi uvicorn

Step 4: API Key Setup

We need to provide our OpenAI API key. Create a .env file in the project directory and store the key in it:

ini
OPENAI_API_KEY=your_openai_api_key_here

Step 5: Build the Streaming LLM Backend

We will now build the streaming LLM backend. Create a Python file for the server in the project directory; here it is named server_stream_better.py.

  • streaming=True enables token-by-token generation.
  • event_stream() yields each token in the SSE format.
  • /stream endpoint sends real-time data to the front-end.
Python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse, HTMLResponse
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from dotenv import load_dotenv
import os
import time

load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in .env")

app = FastAPI()
# streaming=True enables token-by-token generation from the model
llm = ChatOpenAI(streaming=True, temperature=0.7, model="gpt-4o-mini")

def event_stream(prompt_text: str):
    # Note: the prompt text is treated as a template string, so it should not
    # contain unescaped '{' or '}' characters
    prompt = ChatPromptTemplate.from_messages([
        HumanMessagePromptTemplate.from_template(prompt_text)
    ])
    # Yield each streamed chunk in the Server-Sent Events (SSE) wire format
    for chunk in llm.stream(prompt.format_messages()):
        yield f"data: {chunk.content}\n\n"
        time.sleep(0.01)  # small delay so the streaming effect is easy to see

@app.get("/stream")
def stream_response(prompt: str):
    return StreamingResponse(
        event_stream(prompt),
        media_type="text/event-stream"
    )
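
Once the server is running (Step 7), the /stream endpoint can also be tested outside the browser. The sketch below is a hypothetical client that uses the requests package (an extra dependency, not installed in Step 3) to read the SSE lines and print only the token payloads.

Python
import requests

# Hypothetical client for the /stream endpoint; assumes the server from this
# article is running on http://127.0.0.1:8000
with requests.get(
    "http://127.0.0.1:8000/stream",
    params={"prompt": "Tell me a one-line fun fact"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames produced by event_stream() look like "data: <token>"
        if line and line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)
print()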

Step 6: Add a Live HTML Front-End

We will extend the server with a minimal HTML front-end:

  • Displays output live as tokens arrive.
  • Auto-scroll ensures the latest content is always visible.
  • Simple input box and button for sending prompts.
Python
@app.get("/", response_class=HTMLResponse)
def home():
    return """
    <!DOCTYPE html>
    <html>
    <head>
        <title>Interactive LLM Streaming</title>
        <style>
            body { font-family: monospace; background-color: #1e1e1e; color: #d4d4d4; padding: 20px; }
            #output { border: 1px solid #555; padding: 10px; height: 400px; overflow-y: auto; white-space: pre-wrap; background-color: #252526; }
            input, button { font-size: 16px; padding: 5px; margin-right: 5px; }
            #form-container { margin-bottom: 10px; }
        </style>
    </head>
    <body>
        <h2>Interactive LLM Streaming</h2>
        <div id="form-container">
            <input type="text" id="prompt" placeholder="Enter your prompt here" size="50">
            <button onclick="startStream()">Send</button>
        </div>
        <div id="output"></div>

        <script>
            let evtSource;

            function startStream() {
                const output = document.getElementById("output");
                const prompt = document.getElementById("prompt").value.trim();

                if (!prompt) {
                    alert("Please enter a prompt!");
                    return;
                }

                output.textContent = "";

                if (evtSource) evtSource.close();

                const url = "/stream?prompt=" + encodeURIComponent(prompt);
                evtSource = new EventSource(url);

                evtSource.onmessage = function(e) {
                    output.textContent += e.data;
                    output.scrollTop = output.scrollHeight;
                };

                evtSource.onerror = function() {
                    output.textContent += "\\n[Connection closed]";
                    evtSource.close();
                };
            }
        </script>
    </body>
    </html>
    """

Step 7: Run the Application

We will start the FastAPI server by typing the following command in the terminal:

bash
uvicorn server_stream_better:app --reload

After a successful startup, we can see the following in the terminal:

[Screenshot: terminal showing the uvicorn startup logs]

Now open the browser and go to: http://127.0.0.1:8000

[Screenshot: browser interface at http://127.0.0.1:8000]

Here we can see the interface. Enter any prompt and click Send; tokens will appear live as the model generates them.

[Screenshot: streamed response appearing in the output box]

How Streaming Works

1. LLM Streaming

  • streaming=True in ChatOpenAI allows token-by-token output.
  • Without streaming, you only get the full response after generation finishes (see the sketch below).
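
A minimal sketch of the difference, assuming OPENAI_API_KEY is set in the environment:

Python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

# Without streaming: .invoke() blocks until the complete response is ready
full = llm.invoke("Explain SSE in one sentence.")
print(full.content)

# With streaming: .stream() yields chunks as they are generated
for chunk in llm.stream("Explain SSE in one sentence."):
    print(chunk.content, end="", flush=True)
print()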

2. Server-Sent Events (SSE)

  • The /stream endpoint sends each token in the format data: <token>\n\n.
  • Browser receives live updates using EventSource.

3. Front-End

  • JavaScript appends each token to a <div> dynamically.
  • Auto-scroll ensures the latest text is always visible.
  • Together, this creates a real-time interactive experience.
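
The same pattern extends to the conversation memory mentioned in the overview. A hypothetical single-user sketch (the names chat_history and chat_stream are illustrative, not part of the article's code) keeps the running message history in a list and passes it to llm.stream() on every turn:

Python
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
chat_history = []  # running list of HumanMessage / AIMessage objects

def chat_stream(user_text: str):
    # Record the user's turn, stream the reply and remember it for later turns
    chat_history.append(HumanMessage(content=user_text))
    reply = ""
    for chunk in llm.stream(chat_history):
        reply += chunk.content
        yield f"data: {chunk.content}\n\n"
    chat_history.append(AIMessage(content=reply))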

The source code can be downloaded from here.

Applications of Streaming in LangChain

  • Real-Time Chatbots: Enables token-by-token replies for faster, human-like conversations.
  • Coding Assistants: Streams code output live, improving interaction and usability.
  • Learning Platforms: Provides gradual explanations or hints in educational tools.
  • Content Generation: Allows live story writing or copy generation as text unfolds.
  • Data Summarization: Streams ongoing summaries of documents or logs in real time.
  • Voice and Speech Systems: Powers responsive voice assistants with live transcription.
  • Collaborative Tools: Supports multi-user AI brainstorming or writing platforms.
