WhisperLiveKit
Real-time & local speech-to-text, translation, and speaker diarization. With server & web UI.
Real-time, Fully Local Speech-to-Text with Speaker Identification
PyPI v0.2.7 · 18k installations · Python 3.9–3.13 · License: MIT / dual-licensed
Real-time speech transcription directly to your browser, with a ready-to-use backend+server and a simple frontend.
Powered by Leading Research:
SimulStreaming (SOTA 2025) - Ultra-low latency transcription with AlignAtt policy
WhisperStreaming (SOTA 2023) - Low latency transcription with LocalAgreement policy
Streaming Sortformer (SOTA 2025) - Advanced real-time speaker diarization
Diart (SOTA 2021) - Real-time speaker diarization
Silero VAD (2024) - Enterprise-grade Voice Activity Detection
Why not just run a simple Whisper model on every audio batch? Whisper is designed for complete utterances, not real-time chunks. Processing small segments loses context, cuts off words mid-syllable, and produces poor transcription. WhisperLiveKit uses state-of-the-art simultaneous speech research for intelligent buffering and incremental processing.
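To give a rough idea of what such a policy does, here is a minimal sketch of LocalAgreement-style buffering (the approach behind WhisperStreaming): keep a growing audio buffer, re-transcribe it on every new chunk, and commit only the prefix on which two consecutive hypotheses agree. This is purely illustrative and not WhisperLiveKit's actual implementation; transcribe is a hypothetical stand-in for any Whisper backend.

def common_prefix(a: list[str], b: list[str]) -> list[str]:
    # Longest shared word prefix of two hypotheses.
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out

class LocalAgreementBuffer:
    def __init__(self, transcribe):
        self.transcribe = transcribe          # hypothetical ASR callable: bytes -> str
        self.audio_buffer = b""
        self.previous: list[str] = []         # hypothesis from the previous pass
        self.committed: list[str] = []        # words considered final

    def on_new_chunk(self, chunk: bytes) -> list[str]:
        # Re-decode the whole buffer and commit only the stable prefix.
        self.audio_buffer += chunk
        hypothesis = self.transcribe(self.audio_buffer).split()
        agreed = common_prefix(self.previous, hypothesis)
        newly_committed = agreed[len(self.committed):]
        self.committed = agreed
        self.previous = hypothesis
        return newly_committed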
Architecture
The backend supports multiple concurrent users. Voice Activity Detection reduces overhead when no voice is detected.
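The VAD gating can be pictured as follows: run a cheap voice-activity check on each incoming chunk and only invoke the expensive ASR backend when speech is present. A minimal sketch using the public Silero VAD torch.hub interface, purely illustrative and not WhisperLiveKit's internal code (run_asr and the file name are hypothetical placeholders):

import torch

def run_asr(wav):
    # Hypothetical stand-in for handing audio to the Whisper backend
    ...

# Load Silero VAD once (small model, CPU-friendly)
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps, _, read_audio, *_ = utils

wav = read_audio('incoming_chunk.wav', sampling_rate=16000)
speech = get_speech_timestamps(wav, model, sampling_rate=16000)

if speech:
    run_asr(wav)   # speech detected: hand off to the costly backend
else:
    pass           # silence: skip ASR entirely and save compute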
Installation & Quick Start
pip install whisperlivekit
FFmpeg is required and must be installed before using WhisperLiveKit
OS How to install
Ubuntu/Debian sudo apt install ffmpeg
macOS brew install ffmpeg
Windows Download the .exe from https://ffmpeg.org/download.html and add it to your PATH
Quick Start
1. Start the transcription server:
whisperlivekit-server --model base --language en
2. Open your browser and navigate to http://localhost:8000. Start speaking and watch your words appear in real time!
See tokenizer.py for the list of all available languages.
For HTTPS requirements, see the Parameters section for SSL configuration options.
Optional Dependencies
Optional pip install
Speaker diarization with Sortformer git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]
Speaker diarization with Diart diart
Original Whisper backend whisper
Improved timestamps backend whisper-timestamped
Apple Silicon optimization backend mlx-whisper
OpenAI API backend openai
See Parameters & Configuration below on how to use them.
Usage Examples
Command-line Interface: Start the transcription server with various options:
# Use a better model than the default (small)
whisperlivekit-server --model large-v3
# Advanced configuration with diarization and language
whisperlivekit-server --host 0.0.0.0 --port 8000 --model medium --diarization --language fr
Python API Integration: Check basic_server for a more complete example of how to use the functions and classes.
from whisperlivekit import TranscriptionEngine, AudioProcessor, parse_args
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse
from contextlib import asynccontextmanager
import asyncio

transcription_engine = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the (heavy) model once at startup and share it across connections
    global transcription_engine
    transcription_engine = TranscriptionEngine(model="medium", diarization=True, lan="en")
    yield

app = FastAPI(lifespan=lifespan)

async def handle_websocket_results(websocket: WebSocket, results_generator):
    # Forward transcription results to the client as they are produced
    async for response in results_generator:
        await websocket.send_json(response)
    await websocket.send_json({"type": "ready_to_stop"})

@app.websocket("/asr")
async def websocket_endpoint(websocket: WebSocket):
    global transcription_engine
    # Create a new AudioProcessor for each connection, passing the shared engine
    audio_processor = AudioProcessor(transcription_engine=transcription_engine)
    results_generator = await audio_processor.create_tasks()
    results_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))
    await websocket.accept()
    while True:
        message = await websocket.receive_bytes()
        await audio_processor.process_audio(message)
Frontend Implementation: The package includes an HTML/JavaScript implementation here. You can also get it in Python with from whisperlivekit import get_inline_ui_html and page = get_inline_ui_html().
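For instance, a minimal sketch of serving that page from the FastAPI app in the example above (the route path is an arbitrary choice):

from fastapi.responses import HTMLResponse
from whisperlivekit import get_inline_ui_html

@app.get("/")
async def serve_ui():
    # Return the bundled web UI so the browser can connect to the /asr WebSocket
    return HTMLResponse(get_inline_ui_html())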
Parameters & Configuration
A large number of parameters can be changed. But what should you change?
the --model size. List and recommendations here
the --language. List here. If you use auto, the model attempts to detect the language automatically, but it tends to bias towards English.
the --backend. You can switch to --backend faster-whisper if simulstreaming does not work correctly or if you prefer to avoid the dual-license requirements.
--warmup-file, if you have one
--host, --port, --ssl-certfile, --ssl-keyfile, if you set up a server
--diarization, if you want to use it.
I don't recommend changing the rest, but all the options are listed below.
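As a quick illustration, a customized launch combining the recommended flags above might look like this (hostname and certificate paths are placeholders):

whisperlivekit-server --model large-v3 --language en --backend faster-whisper \
  --diarization --host 0.0.0.0 --port 8000 \
  --ssl-certfile /path/to/cert.pem --ssl-keyfile /path/to/key.pem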
Parameter Description Default
--model Whisper model size. small
--language Source language code or auto auto
--task transcribe or translate transcribe
--backend Processing backend simulstreaming
--min-chunk-size Minimum audio chunk size (seconds) 1.0
--no-vac Disable Voice Activity Controller False
--no-vad Disable Voice Activity Detection False
--warmup-file Audio file path for model warmup jfk.wav
--host Server host address localhost
--port Server port 8000
--ssl-certfile Path to the SSL certificate file (for HTTPS support) None
--ssl-keyfile Path to the SSL private key file (for HTTPS support) None
WhisperStreaming backend options Description Default
--confidence-validation Use confidence scores for faster validation False
--buffer_trimming Buffer trimming strategy ( sentence or segment ) segment
SimulStreaming backend options Description Default
--frame-threshold AlignAtt frame threshold (lower = faster, higher = more accurate) 25
--beams Number of beams for beam search (1 = greedy decoding) 1
--decoder Force decoder type ( beam or greedy ) auto
--audio-max-len Maximum audio buffer length (seconds) 30.0
--audio-min-len Minimum audio length to process (seconds) 0.0
--cif-ckpt-path Path to CIF model for word boundary detection None
--never-fire Never truncate incomplete words False
--init-prompt Initial prompt for the model None
--static-init-prompt Static prompt that doesn't scroll None
--max-context-tokens Maximum context tokens None
--model-path Direct path to the .pt model file. Downloaded if not found ./base.pt
--preloaded-model-count Optional. Number of models to preload in memory to speed up loading (set up to the expected number of concurrent users) 1
Diarization options Description Default
--diarization Enable speaker identification False
--diarization-backend diart or sortformer sortformer
--segmentation-model Hugging Face model ID for the Diart segmentation model. Available models pyannote/segmentation-3.0
--embedding-model Hugging Face model ID for the Diart embedding model. Available models speechbrain/spkrec-ecapa-voxceleb
For diarization using Diart, you need access to pyannote.audio models:
1. Accept user conditions for the pyannote/segmentation model
2. Accept user conditions for the pyannote/segmentation-3.0 model
3. Accept user conditions for the pyannote/embedding model
4. Login with HuggingFace: huggingface-cli login
Deployment Guide
To deploy WhisperLiveKit in production:
1. Server Setup: Install a production ASGI server and launch it with multiple workers
pip install uvicorn gunicorn
gunicorn -k uvicorn.workers.UvicornWorker -w 4 your_app:app
2. Frontend: Host your customized version of the HTML example and make sure the WebSocket URL points to your server
3. Nginx Configuration (recommended for production):
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;                    # required for the WebSocket upgrade
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
4. HTTPS Support: For secure deployments, use "wss://" instead of "ws://" in the WebSocket URL
Docker
Deploy the application easily using Docker with GPU or CPU support.
Prerequisites
Docker installed on your system
For GPU support: NVIDIA Docker runtime installed
Quick Start
With GPU acceleration (recommended):
docker build -t wlk .
docker run --gpus all -p 8000:8000 --name wlk wlk
CPU only:
docker build -f Dockerfile.cpu -t wlk .
docker run -p 8000:8000 --name wlk wlk
Advanced Usage
Custom configuration:
# Example with custom model and language
docker run --gpus all -p 8000:8000 --name wlk wlk --model large-v3 --language fr
Memory Requirements
Large models: Ensure your Docker runtime has sufficient memory allocated
Customization
--build-arg Options:
EXTRAS="whisper-timestamped" - Add extras to the image's installation (no spaces). Remember to set necessary container
options!
HF_PRECACHE_DIR="./.cache/" - Pre-load a model cache for faster first-time start
HF_TKN_FILE="./token" - Add your Hugging Face Hub access token to download gated models
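For example, a build that bakes in an extra backend, a pre-populated model cache, and a Hugging Face token file might look like this (the argument values come from the list above; adjust paths to your setup):

docker build -t wlk \
  --build-arg EXTRAS="whisper-timestamped" \
  --build-arg HF_PRECACHE_DIR="./.cache/" \
  --build-arg HF_TKN_FILE="./token" .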
Use Cases
Capture discussions in real time for meeting transcription, help hearing-impaired users follow conversations through accessibility tools, automatically transcribe podcasts or videos for content creation, or transcribe support calls with speaker identification for customer service, and more.