# telegram-summarizer

A Telegram bot that automatically transcribes and summarizes voice messages longer than 10 seconds using local AI models.
## Features

- 🎤 Transcribes voice messages using OpenAI Whisper large-v3 (optimized for Portuguese)
- 📝 Summarizes transcriptions using Ollama with llama3.1:8b (strict factual mode)
- ⚡ Processes voice messages > 10 seconds automatically
- 🔒 All processing done locally (no external APIs)
- 🇧🇷 Excellent Portuguese language support
## Requirements

- Python 3.10+
- uv - fast Python package installer
- Ollama - for running the local LLM

  ```bash
  # Install Ollama (macOS)
  brew install ollama

  # Start the Ollama service
  ollama serve

  # Pull the model (in another terminal)
  ollama pull llama3.1:8b
  ```

- FFmpeg - for audio conversion

  ```bash
  # macOS
  brew install ffmpeg
  ```

## Installation

1. Install dependencies:

   ```bash
   uv pip install -e .
   ```

2. Create a Telegram bot:
   - Talk to @BotFather on Telegram
   - Create a new bot with `/newbot`
   - Copy the bot token

3. Set environment variables:

   ```bash
   export TELEGRAM_BOT_TOKEN="your-token-here"

   # Optional: set a custom Ollama host (defaults to http://localhost:11434)
   export OLLAMA_HOST="http://localhost:11434"
   ```

4. Add the bot to a group:
   - Add your bot to a Telegram group
   - Make sure the bot has permission to read messages
## Usage

Run the bot directly:

```bash
telegram-summarizer
# or
python -m telegram_summarizer
```

Or run it with Docker:

```bash
docker run -d \
  -e TELEGRAM_BOT_TOKEN=your-token \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  -v whisper-cache:/data/.cache \
  ghcr.io/caarlos0/telegram-summarizer:latest
```

The `-v whisper-cache:/data/.cache` volume persists the ~3GB Whisper model between restarts.
## How it works

The bot will:

- Listen for voice messages in groups
- Ignore messages of 10 seconds or shorter
- For longer messages:
  - React with 🙉 while processing
  - Transcribe the audio and extract its core idea
  - Reply only if relevant content is found
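
The steps above can be sketched as a small pipeline. This is an illustrative sketch, not the actual code in `__main__.py`: the function names and the injected `transcribe`/`summarize` callables are hypothetical stand-ins for the real Whisper and Ollama calls.

```python
# Illustrative sketch of the voice-message pipeline; names are hypothetical.
from typing import Callable, Optional

MIN_DURATION = 10  # seconds; mirrors the `voice.duration <= 10` check


def should_process(duration: int) -> bool:
    """Only messages strictly longer than MIN_DURATION are handled."""
    return duration > MIN_DURATION


def handle_voice(
    duration: int,
    transcribe: Callable[[], str],    # Whisper large-v3 in the real bot
    summarize: Callable[[str], str],  # llama3.1:8b via Ollama in the real bot
) -> Optional[str]:
    """Return the reply text, or None when the bot stays silent."""
    if not should_process(duration):
        return None  # 10 seconds or shorter: ignored
    text = transcribe()
    summary = summarize(text)
    return summary or None  # reply only if relevant content was found
```

Injecting the transcription and summarization steps keeps the duration logic testable without loading any model.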
## Configuration

Environment variables:

- `TELEGRAM_BOT_TOKEN` - your bot token (required)
- `OLLAMA_HOST` - Ollama server URL (optional, defaults to `http://localhost:11434`)
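
A minimal sketch of reading these two variables, assuming the documented default; `load_config` is a hypothetical helper, not the bot's actual code:

```python
import os


def load_config(env=os.environ) -> dict:
    """Hypothetical helper: read the documented environment variables."""
    token = env.get("TELEGRAM_BOT_TOKEN")
    if not token:
        raise RuntimeError("TELEGRAM_BOT_TOKEN is required")
    return {
        "token": token,
        # Optional, with the documented default:
        "ollama_host": env.get("OLLAMA_HOST", "http://localhost:11434"),
    }
```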
Code customization - edit `src/telegram_summarizer/__main__.py`:

- `whisper.load_model("large-v3")` - current: best quality for Portuguese. Change to `medium`, `small`, or `base` for faster processing
- `llama3.1:8b` - current model. Use `llama3.2:3b` (faster) or `llama3.2:1b` (even faster but less reliable)
- `voice.duration <= 10` - change the minimum duration threshold
## Models

Current configuration (optimized for Portuguese accuracy):

- Whisper `large-v3`: ~3GB, best quality, especially for Portuguese
- `llama3.1:8b`: ~4.7GB, excellent accuracy with strict factual prompting

Alternative models (faster but less accurate):

- Whisper: `medium` (~1.5GB), `small` (~500MB), `base` (~140MB)
- Ollama: `llama3.2:3b` (~2GB), `llama3.2:1b` (~1.3GB)

For even better quality:

- Ollama: `llama3.1:70b` (~40GB) if you have powerful hardware
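
The size/quality trade-offs above can be summarized as presets. This table is illustrative only; the bot itself hard-codes its models in `__main__.py`:

```python
# Illustrative presets pairing the Whisper and Ollama models listed above.
PRESETS = {
    "fastest":  ("base",     "llama3.2:1b"),  # ~140MB + ~1.3GB
    "fast":     ("small",    "llama3.2:3b"),  # ~500MB + ~2GB
    "balanced": ("medium",   "llama3.1:8b"),  # ~1.5GB + ~4.7GB
    "best":     ("large-v3", "llama3.1:8b"),  # ~3GB + ~4.7GB (current default)
}


def pick_models(preset: str = "best") -> tuple[str, str]:
    """Return (whisper_model, ollama_model) for a preset."""
    return PRESETS[preset]
```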
## License

MIT