Streaming Responses in LangChain
Last Updated: 27 Oct, 2025
Streaming responses in LangChain allow developers to receive output from a language model incrementally, token by token, instead of waiting for the entire response to complete. This creates a more interactive and responsive user experience, similar to live typing in chat applications. By leveraging streaming, applications can provide immediate feedback, reduce perceived latency and enable real-time interaction with LLMs.
- Token-by-token output: Users see model responses as they are generated.
- Improved interactivity: Makes applications feel faster and more responsive.
- Real-time applications: Ideal for chatbots, assistants, or any live feedback systems.
- Integration with LangChain: Works seamlessly with ChatOpenAI and other LangChain LLMs.
- Underlying technology: Uses Server-Sent Events (SSE) to stream data to the front-end.
- Flexible front-end support: Can be combined with JavaScript or frameworks like React for live updates.
- Extendable: Supports conversation memory, multi-user setups and different LLM models.
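To see the core idea in isolation, here is a minimal sketch of token-by-token streaming with ChatOpenAI's stream() method (it assumes OPENAI_API_KEY is already set in the environment):
Python
# Minimal streaming sketch: print tokens as they arrive
# (assumes OPENAI_API_KEY is already set in the environment)
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
for chunk in llm.stream("Explain streaming in one sentence."):
    print(chunk.content, end="", flush=True)
Each chunk is an AIMessageChunk whose content holds the newly generated text; printing with end="" reproduces the live-typing effect in a terminal.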
Implementation
Step 1: Set Up the Environment
We will create a Python virtual environment:
bash
python -m venv .venv
Step 2: Activate Environment
Now we need to activate the environment.
1. Windows:
bash
.venv\Scripts\activate
2. Linux / Mac:
bash
source .venv/bin/activate
Step 3: Install packages
We will install the necessary packages:
bash
pip install --upgrade langchain langchain-openai python-dotenv fastapi uvicorn
Step 4: API Key Setup
We need to provide our API key. Create a .env file in the project directory and store the key in it:
ini
OPENAI_API_KEY=your_openai_api_key_here
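Optionally, a quick sanity check confirms that python-dotenv picks up the key before we build the server:
Python
# Optional sanity check: verify the key is readable from .env
from dotenv import load_dotenv
import os

load_dotenv()  # reads .env from the current working directory
print("OPENAI_API_KEY loaded:", bool(os.getenv("OPENAI_API_KEY")))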
Step 5: Build the Streaming LLM Backend
We will now build our streaming LLM backend. Create a server file in the project directory; here we have named it server_stream_better.py.
- streaming=True enables token-by-token generation.
- event_stream() yields each token in the SSE format.
- /stream endpoint sends real-time data to the front-end.
Python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse, HTMLResponse
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from dotenv import load_dotenv
import os
import time

load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in .env")

app = FastAPI()

# streaming=True makes the model yield tokens as they are generated
llm = ChatOpenAI(streaming=True, temperature=0.7, model="gpt-4o-mini")

def event_stream(prompt_text: str):
    # Note: the raw prompt is used as a template string, so literal { or }
    # in user input would need escaping before reaching from_template()
    prompt = ChatPromptTemplate.from_messages([
        HumanMessagePromptTemplate.from_template(prompt_text)
    ])
    for chunk in llm.stream(prompt.format_messages()):
        # SSE frame: "data: <payload>" followed by a blank line
        yield f"data: {chunk.content}\n\n"
        time.sleep(0.01)  # small delay smooths rendering in the browser

@app.get("/stream")
def stream_response(prompt: str):
    return StreamingResponse(
        event_stream(prompt),
        media_type="text/event-stream"
    )
Step 6: Add a Live HTML Front-End
We will extend the code with a minimal HTML front-end:
- Displays output live as tokens arrive.
- Auto-scroll ensures the latest content is always visible.
- Simple input box and button for sending prompts.
Python
@app.get("/", response_class=HTMLResponse)
def home():
return """
<!DOCTYPE html>
<html>
<head>
<title>Interactive LLM Streaming</title>
<style>
body { font-family: monospace; background-color: #1e1e1e; color: #d4d4d4; padding: 20px; }
#output { border: 1px solid #555; padding: 10px; height: 400px; overflow-y: auto; white-space: pre-wrap; background-color: #252526; }
input, button { font-size: 16px; padding: 5px; margin-right: 5px; }
#form-container { margin-bottom: 10px; }
</style>
</head>
<body>
<h2>Interactive LLM Streaming</h2>
<div id="form-container">
<input type="text" id="prompt" placeholder="Enter your prompt here" size="50">
<button onclick="startStream()">Send</button>
</div>
<div id="output"></div>
<script>
let evtSource;
function startStream() {
const output = document.getElementById("output");
const prompt = document.getElementById("prompt").value.trim();
if (!prompt) {
alert("Please enter a prompt!");
return;
}
output.textContent = "";
if (evtSource) evtSource.close();
const url = "/stream?prompt=" + encodeURIComponent(prompt);
evtSource = new EventSource(url);
evtSource.onmessage = function(e) {
output.textContent += e.data;
output.scrollTop = output.scrollHeight;
};
evtSource.onerror = function() {
output.textContent += "\\n[Connection closed]";
evtSource.close();
};
}
</script>
</body>
</html>
"""
Step 7: Run the Application
We will start the FastAPI server. Type the following command in the terminal:
bash
uvicorn server_stream_better:app --reload
After a successful startup, Uvicorn prints the server address in the terminal. Now open the browser and go to: http://127.0.0.1:8000
Here we can see the interface: enter any prompt and click Send. Tokens will appear live as the model generates them.
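The endpoint can also be tested without the browser. curl's -N flag disables output buffering, so the SSE frames print as they arrive:
bash
curl -N "http://127.0.0.1:8000/stream?prompt=Tell%20me%20a%20joke"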
Let's understand how streaming works:
1. LLM Streaming
- streaming=True in ChatOpenAI allows token-by-token output.
- Without streaming, you only get the full response after generation finishes.
2. Server-Sent Events (SSE)
- The /stream endpoint sends each token in the format data: <token>\n\n.
- Browser receives live updates using EventSource.
3. Front-End
- JavaScript appends each token to a <div> dynamically.
- Auto-scroll ensures the latest text is always visible.
- Together, this creates a real-time interactive experience.
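Because the endpoint speaks plain SSE, any HTTP client can consume it, not just the browser's EventSource. Here is a minimal consumer sketch using the requests library (an extra dependency, not installed above):
Python
# Minimal SSE consumer sketch using requests (pip install requests)
import requests

url = "http://127.0.0.1:8000/stream"
with requests.get(url, params={"prompt": "Say hello"}, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames look like "data: <token>"; blank lines separate frames
        if line and line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)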
The source code can be downloaded from here.
Applications of Streaming in LangChain
- Real-Time Chatbots: Enables token-by-token replies for faster, human-like conversations.
- Coding Assistants: Streams code output live, improving interaction and usability.
- Learning Platforms: Provides gradual explanations or hints in educational tools.
- Content Generation: Allows live story writing or copy generation as text unfolds.
- Data Summarization: Streams ongoing summaries of documents or logs in real time.
- Voice and Speech Systems: Powers responsive voice assistants with live transcription.
- Collaborative Tools: Supports multi-user AI brainstorming or writing platforms.