Streaming Responses in LangChain

Streaming responses in LangChain allow developers to receive output from a language model incrementally, token by token, instead of waiting for the entire response to complete. This approach creates a more interactive and responsive user experience, similar to live typing in chat applications. By leveraging streaming, applications can provide immediate feedback, reduce perceived latency and enable real-time interaction with LLMs. The key points are summarized below, followed by a minimal example.

  • Token-by-token output: Users see model responses as they are generated.
  • Improved interactivity: Makes applications feel faster and more responsive.
  • Real-time applications: Ideal for chatbots, assistants, or any live feedback systems.
  • Integration with LangChain: Works seamlessly with ChatOpenAI and other LangChain LLMs.
  • Underlying technology: Uses Server-Sent Events (SSE) to stream data to the front-end.
  • Flexible front-end support: Can be combined with JavaScript or frameworks like React for live updates.
  • Extendable: Supports conversation memory, multi-user setups and different LLM models.
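
As a quick illustration of token-by-token output, here is a minimal sketch that streams a response directly to the console. It assumes the langchain-openai package is installed and OPENAI_API_KEY is set in the environment (both are covered in the steps below).

Python
from langchain_openai import ChatOpenAI

# streaming=True makes .stream() yield chunks as the model generates them
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

for chunk in llm.stream("Explain streaming responses in one sentence."):
    print(chunk.content, end="", flush=True)  # print each token as it arrives
print()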

Implementation

Step 1: Set Up the Environment

We will create a Python virtual environment:

bash
python -m venv .venv

Step 2: Activate Environment

Now we need to activate the environment:

1. Windows:

bash
.venv\Scripts\activate

2. Linux / Mac:

bash
source .venv/bin/activate

Step 3: Install Packages

We will install the necessary packages:

bash
pip install --upgrade langchain langchain-openai python-dotenv fastapi uvicorn

Step 4: API Key Setup

We need to provide our OpenAI API key. Create a .env file in the project directory and store the key in it:

ini
OPENAI_API_KEY=your_openai_api_key_here

Step 5: Build the Streaming LLM Backend

We will now build the streaming LLM backend. Create a Python file for the server in the project directory; here it is named server_stream_better.py.

  • streaming=True enables token-by-token generation.
  • event_stream() yields each token in the SSE format.
  • /stream endpoint sends real-time data to the front-end.
Python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse, HTMLResponse
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from dotenv import load_dotenv
import os
import time

load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in .env")

app = FastAPI()
# streaming=True enables token-by-token generation from the model
llm = ChatOpenAI(streaming=True, temperature=0.7, model="gpt-4o-mini")

def event_stream(prompt_text: str):
    # Note: the prompt text is treated as a template string, so it should not
    # contain unescaped '{' or '}' characters
    prompt = ChatPromptTemplate.from_messages([
        HumanMessagePromptTemplate.from_template(prompt_text)
    ])
    # Yield each streamed chunk in the Server-Sent Events (SSE) wire format
    for chunk in llm.stream(prompt.format_messages()):
        yield f"data: {chunk.content}\n\n"
        time.sleep(0.01)  # small delay so the streaming effect is easy to see

@app.get("/stream")
def stream_response(prompt: str):
    return StreamingResponse(
        event_stream(prompt),
        media_type="text/event-stream"
    )
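
Once the server is running (Step 7), the /stream endpoint can also be tested outside the browser. The sketch below is a hypothetical client that uses the requests package (an extra dependency, not installed in Step 3) to read the SSE lines and print only the token payloads.

Python
import requests

# Hypothetical client for the /stream endpoint; assumes the server from this
# article is running on http://127.0.0.1:8000
with requests.get(
    "http://127.0.0.1:8000/stream",
    params={"prompt": "Tell me a one-line fun fact"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames produced by event_stream() look like "data: <token>"
        if line and line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)
print()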

Step 6: Add a Live HTML Front-End

We will extend the server with a minimal HTML front-end:

  • Displays output live as tokens arrive.
  • Auto-scroll ensures the latest content is always visible.
  • Simple input box and button for sending prompts.
Python
@app.get("/", response_class=HTMLResponse)
def home():
    return """
    <!DOCTYPE html>
    <html>
    <head>
        <title>Interactive LLM Streaming</title>
        <style>
            body { font-family: monospace; background-color: #1e1e1e; color: #d4d4d4; padding: 20px; }
            #output { border: 1px solid #555; padding: 10px; height: 400px; overflow-y: auto; white-space: pre-wrap; background-color: #252526; }
            input, button { font-size: 16px; padding: 5px; margin-right: 5px; }
            #form-container { margin-bottom: 10px; }
        </style>
    </head>
    <body>
        <h2>Interactive LLM Streaming</h2>
        <div id="form-container">
            <input type="text" id="prompt" placeholder="Enter your prompt here" size="50">
            <button onclick="startStream()">Send</button>
        </div>
        <div id="output"></div>

        <script>
            let evtSource;

            function startStream() {
                const output = document.getElementById("output");
                const prompt = document.getElementById("prompt").value.trim();

                if (!prompt) {
                    alert("Please enter a prompt!");
                    return;
                }

                output.textContent = "";

                if (evtSource) evtSource.close();

                const url = "/stream?prompt=" + encodeURIComponent(prompt);
                evtSource = new EventSource(url);

                evtSource.onmessage = function(e) {
                    output.textContent += e.data;
                    output.scrollTop = output.scrollHeight;
                };

                evtSource.onerror = function() {
                    output.textContent += "\\n[Connection closed]";
                    evtSource.close();
                };
            }
        </script>
    </body>
    </html>
    """

Step 7: Run the Application

We will start the FastAPI server by typing the following command in the terminal:

bash
uvicorn server_stream_better:app --reload

After a successful startup, we can see the following in the terminal:

[Screenshot: terminal showing the uvicorn startup logs]

Now open the browser and go to: http://127.0.0.1:8000

[Screenshot: browser interface at http://127.0.0.1:8000]

Here we can see the interface. Enter any prompt and click Send; tokens will appear live as the model generates them.

[Screenshot: streamed response appearing in the output box]

How Streaming Works

1. LLM Streaming

  • streaming=True in ChatOpenAI allows token-by-token output.
  • Without streaming, you only get the full response after generation finishes (see the sketch below).
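
A minimal sketch of the difference, assuming OPENAI_API_KEY is set in the environment:

Python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)

# Without streaming: .invoke() blocks until the complete response is ready
full = llm.invoke("Explain SSE in one sentence.")
print(full.content)

# With streaming: .stream() yields chunks as they are generated
for chunk in llm.stream("Explain SSE in one sentence."):
    print(chunk.content, end="", flush=True)
print()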

2. Server-Sent Events (SSE)

  • The /stream endpoint sends each token in the format data: <token>\n\n.
  • Browser receives live updates using EventSource.

3. Front-End

  • JavaScript appends each token to a <div> dynamically.
  • Auto-scroll ensures the latest text is always visible.
  • Together, this creates a real-time interactive experience.
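
The same pattern extends to the conversation memory mentioned in the overview. A hypothetical single-user sketch (the names chat_history and chat_stream are illustrative, not part of the article's code) keeps the running message history in a list and passes it to llm.stream() on every turn:

Python
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
chat_history = []  # running list of HumanMessage / AIMessage objects

def chat_stream(user_text: str):
    # Record the user's turn, stream the reply and remember it for later turns
    chat_history.append(HumanMessage(content=user_text))
    reply = ""
    for chunk in llm.stream(chat_history):
        reply += chunk.content
        yield f"data: {chunk.content}\n\n"
    chat_history.append(AIMessage(content=reply))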

The source code can be downloaded from here.

Applications of Streaming in LangChain

  • Real-Time Chatbots: Enables token-by-token replies for faster, human-like conversations.
  • Coding Assistants: Streams code output live, improving interaction and usability.
  • Learning Platforms: Provides gradual explanations or hints in educational tools.
  • Content Generation: Allows live story writing or copy generation as text unfolds.
  • Data Summarization: Streams ongoing summaries of documents or logs in real time.
  • Voice and Speech Systems: Powers responsive voice assistants with live transcription.
  • Collaborative Tools: Supports multi-user AI brainstorming or writing platforms.
