Audio Capture Layer

Purpose and Scope

This document details the cross-platform audio I/O implementation in the hypr-audio crate (crates/audio/). This layer provides low-level audio capture functionality for both microphone input and system audio (speaker) output across macOS, Windows, and Linux. It handles platform-specific audio APIs and implements ring buffer patterns for real-time thread communication between audio hardware callbacks and the async application layer.

For information about how captured audio is processed (VAD, AGC, AEC), see Audio Processing Pipeline. For details on how this audio capture integrates with the plugin system, see Listener Plugin.

Crate Structure and Dependencies

The hypr-audio crate is organized into several modules that handle different aspects of audio capture:

Module	Purpose	Platform Support
`mic`	Microphone input via cpal	All platforms
`speaker`	System audio capture	macOS, Windows, Linux
`device_monitor`	Audio device change detection	macOS, Linux
`norm`	Audio normalization utilities	All platforms
`utils`	Helper functions (e.g., headphone detection)	Platform-specific

Key Dependencies

Dependency	Purpose	Usage
`cpal`	Cross-platform audio library	Microphone input
`cidre` (macOS)	macOS Core Audio bindings	Speaker tap and device monitoring
`wasapi` (Windows)	Windows Audio Session API	Speaker loopback capture
`alsa` (Linux)	Advanced Linux Sound Architecture	Speaker monitor capture
`ringbuf`	Lock-free ring buffer	Real-time audio thread communication
`kalosm-sound`	Audio streaming traits	Stream abstractions

Sources: crates/audio/Cargo.toml1-42

Architecture Overview

Sources: crates/audio/src/lib.rs1-14 crates/audio/src/speaker/mod.rs1-71

Unified Audio Interface

The AudioInput struct provides a unified interface for both microphone and speaker audio sources:

AudioInput API

Method	Description	Returns
`from_mic(device_name)`	Create microphone input	`Result<AudioInput>`
`from_speaker()`	Create speaker output capture	`AudioInput`
`sample_rate()`	Get sample rate	`u32`
`device_name()`	Get device name	`String`
`stream()`	Create audio stream	`AudioStream`
`list_mic_devices()`	List available microphones	`Vec<String>`

The AudioStream enum unifies all audio sources and implements the kalosm_sound::AsyncSource trait for async streaming.

Sources: crates/audio/src/lib.rs72-209

Microphone Input (cpal-based)

The MicInput and MicStream structs handle microphone capture using the cross-platform cpal library.

MicInput Structure

Implementation Details

The microphone implementation uses a multi-threaded pattern:

Device Selection: Filters out the tap device (hypr-audio-tap) to avoid capture loops crates/audio/src/mic.rs12-22
Stream Creation: Spawns a background thread that manages the cpal stream crates/audio/src/mic.rs108-171
Format Handling: Supports multiple sample formats (I8, I16, I32, F32) with automatic conversion to f32 crates/audio/src/mic.rs133-142
Channel Reduction: Converts multi-channel audio to mono by stepping through channels crates/audio/src/mic.rs119-123
Async Bridge: Uses mpsc::unbounded to bridge the synchronous cpal callback to async Stream crates/audio/src/mic.rs102
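
The format handling and channel reduction steps above can be illustrated with a minimal, standard-library-only sketch. The function names, the 32768 divisor for i16 samples, and the choice of keeping the first channel of each interleaved frame are assumptions made for illustration; the crate's actual conversion code may differ.

```rust
/// Convert a signed 16-bit PCM sample to f32 in the range [-1.0, 1.0).
/// Assumption: normalization divides by 32768 (the magnitude of i16::MIN).
fn i16_to_f32(sample: i16) -> f32 {
    sample as f32 / 32768.0
}

/// Reduce interleaved multi-channel audio to mono by stepping through the
/// buffer one frame at a time and keeping the first channel's sample.
/// Assumption: this is what "stepping through channels" amounts to.
fn first_channel_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    // `step_by(0)` panics, so treat a zero channel count as mono.
    interleaved.iter().step_by(channels.max(1)).copied().collect()
}
```

For a stereo buffer laid out as L R L R, `first_channel_mono(&buf, 2)` keeps only the L samples. This is cheaper than averaging channels but discards the content of the other channels.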

Lifecycle Management

The MicStream implements a clean shutdown mechanism:

Holds a drop_tx sender crates/audio/src/mic.rs184
Background thread waits on drop_rx.recv() crates/audio/src/mic.rs167
On drop, signals the thread to stop and clean up the cpal stream crates/audio/src/mic.rs190-193
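
The shutdown sequence above can be sketched with the standard library alone. `StreamHandle`, its `spawn` constructor, and the `stopped` flag are hypothetical names used only for this illustration, and joining the worker in `drop` is an assumption rather than something the source states.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a stream handle that owns a worker thread.
struct StreamHandle {
    drop_tx: Sender<()>,
    worker: Option<JoinHandle<()>>,
}

impl StreamHandle {
    fn spawn(stopped: Arc<AtomicBool>) -> Self {
        let (drop_tx, drop_rx) = mpsc::channel::<()>();
        let worker = thread::spawn(move || {
            // A real worker would build and start the audio stream here and
            // keep it alive for as long as this thread stays blocked.
            let _ = drop_rx.recv(); // returns on a signal or if the sender is gone
            // The stream would be torn down here, on the thread that owns it.
            stopped.store(true, Ordering::SeqCst);
        });
        Self { drop_tx, worker: Some(worker) }
    }
}

impl Drop for StreamHandle {
    fn drop(&mut self) {
        // Signal the worker, then wait for it to finish its cleanup.
        let _ = self.drop_tx.send(());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

Keeping the stream on its own thread means the handle itself stays cheap to move around, and the cleanup always runs on the thread that created the stream.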

Sources: crates/audio/src/mic.rs24-223

Speaker Input - Platform Implementations

Speaker input (system audio capture) requires platform-specific implementations due to different audio APIs:

Platform	API	Device Type	Sample Format
macOS	Core Audio	Aggregate Device + Tap	Multiple formats via AVFoundation
Windows	WASAPI	Loopback Capture	Float32
Linux	ALSA	Monitor Device	Float32

Platform Abstraction Layer

The speaker/mod.rs module provides a unified interface and selects the platform backend through conditional compilation.

Both SpeakerInput and SpeakerStream wrap platform-specific implementations and provide consistent APIs. The SpeakerStream additionally buffers samples to present them one at a time via the Stream trait crates/audio/src/speaker/mod.rs73-117
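
The sample-at-a-time buffering can be sketched as follows. To stay runnable with the standard library only, this sketch uses a synchronous `Iterator` in place of the async Stream trait; `SampleAtATime` and its fields are hypothetical names, not the crate's types.

```rust
use std::collections::VecDeque;

/// Hypothetical adapter: pulls chunks from `source` and yields single samples.
struct SampleAtATime<I: Iterator<Item = Vec<f32>>> {
    source: I,
    pending: VecDeque<f32>,
}

impl<I: Iterator<Item = Vec<f32>>> SampleAtATime<I> {
    fn new(source: I) -> Self {
        Self { source, pending: VecDeque::new() }
    }
}

impl<I: Iterator<Item = Vec<f32>>> Iterator for SampleAtATime<I> {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        loop {
            if let Some(sample) = self.pending.pop_front() {
                return Some(sample);
            }
            // The local buffer is empty: refill it from the next chunk,
            // or end the sequence when the source is exhausted.
            let chunk = self.source.next()?;
            self.pending.extend(chunk);
        }
    }
}
```

The loop also handles empty chunks: it simply asks the source again instead of ending the sequence early.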

Sources: crates/audio/src/speaker/mod.rs1-134

macOS Implementation: Core Audio Tap

The macOS implementation uses Core Audio's tap functionality to capture system audio output.

Tap Configuration

The tap is configured with specific properties:

Tap Descriptor: Creates a mono global tap that excludes specific processes crates/audio/src/speaker/macos.rs54-55
Sub-device Configuration: Wraps tap UID in a sub-device dictionary crates/audio/src/speaker/macos.rs57-60
Aggregate Device: Creates a private aggregate device with tap auto-start disabled crates/audio/src/speaker/macos.rs62-77

Audio Processing Callback

The proc function is the real-time audio callback; it runs on the audio thread.

Key implementation details:

Sample Rate Tracking: Atomically tracks sample rate changes from the device crates/audio/src/speaker/macos.rs101-109
Format Conversion: Handles multiple PCM formats (F32, F64, I32, I16) with normalization crates/audio/src/speaker/macos.rs124-152
Zero-Copy Path: Uses av::AudioPcmBuf for direct f32 data access when possible crates/audio/src/speaker/macos.rs111-116
Ring Buffer Push: Writes samples to ring buffer and logs dropped samples if buffer is full crates/audio/src/speaker/macos.rs246-252
Waker Management: Wakes the async consumer only once when data becomes available crates/audio/src/speaker/macos.rs254-267

Sources: crates/audio/src/speaker/macos.rs17-298

Windows Implementation: WASAPI Loopback

The Windows implementation uses WASAPI loopback mode to capture system audio.

Initialization and Configuration

The WASAPI implementation initializes in several steps crates/audio/src/speaker/windows.rs77-98:

Get default render device (speakers)
Create audio client
Configure 32-bit float format at 44100Hz mono
Get minimum buffer period
Initialize in shared mode with loopback capture
Create event handle for synchronization
Get capture client interface
Start the audio stream

Capture Loop

The capture loop runs in a dedicated OS thread and performs the following:

Event Wait: Waits on event handle with 3-second timeout crates/audio/src/speaker/windows.rs112-115
Data Read: Reads available audio data into a temporary queue crates/audio/src/speaker/windows.rs117-121
Byte Assembly: Assembles 4 bytes into f32 samples crates/audio/src/speaker/windows.rs128-137
Queue Management: Appends samples to shared queue, drops oldest if over 8192 samples crates/audio/src/speaker/windows.rs140-150
Waker Notification: Wakes async consumer if new data arrived crates/audio/src/speaker/windows.rs152-162
Shutdown Check: Checks shutdown flag on each iteration crates/audio/src/speaker/windows.rs105-110
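
The byte assembly and queue management steps can be sketched with the standard library. The function names and the `MAX_QUEUED_SAMPLES` constant name are hypothetical; the 8192-sample limit comes from the description above, and little-endian byte order is an assumption that holds on the usual Windows targets.

```rust
use std::collections::VecDeque;

/// Queue limit taken from the description above; the constant name is hypothetical.
const MAX_QUEUED_SAMPLES: usize = 8192;

/// Decode little-endian 32-bit float PCM bytes into samples.
/// Trailing bytes that do not form a whole sample are ignored.
fn bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Append samples to the queue, then evict the oldest ones if the queue
/// exceeds the limit. Returns how many samples were dropped.
fn push_bounded(queue: &mut VecDeque<f32>, samples: &[f32]) -> usize {
    queue.extend(samples.iter().copied());
    let dropped = queue.len().saturating_sub(MAX_QUEUED_SAMPLES);
    drop(queue.drain(..dropped));
    dropped
}
```

Dropping the oldest samples keeps the queue close to real time when the consumer falls behind, at the cost of a gap in the captured audio.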

Async Integration

The SpeakerStream::poll_next implementation bridges the capture thread to async Rust:

Check shutdown flag crates/audio/src/speaker/windows.rs196-201
Try to pop samples from queue crates/audio/src/speaker/windows.rs203-212
Register waker if no data available crates/audio/src/speaker/windows.rs214-222
Double-check queue after registration to avoid race crates/audio/src/speaker/windows.rs224-236

Sources: crates/audio/src/speaker/windows.rs11-238

Linux Implementation: ALSA Monitor

The Linux implementation uses ALSA to capture from a monitor device.

Device Configuration

The Linux implementation configures ALSA with the following parameters crates/audio/src/speaker/linux.rs116-134:

Channels: 1 (mono)
Sample Rate: 48000Hz (or the nearest rate the device supports)
Format: Float32
Access: Interleaved read/write

Capture Loop

The capture loop runs in a dedicated thread crates/audio/src/speaker/linux.rs109-183:

Open PCM: Opens the capture device with configured parameters
Read Frames: Reads CHUNK_SIZE frames at a time using io.readi() crates/audio/src/speaker/linux.rs144
Push to Buffer: Pushes samples to ring buffer crates/audio/src/speaker/linux.rs148
Drop Tracking: Logs dropped samples if buffer is full crates/audio/src/speaker/linux.rs150-153
Wake Consumer: Wakes the async stream consumer when data arrives crates/audio/src/speaker/linux.rs155-168
Error Recovery: Attempts to recover from PCM errors crates/audio/src/speaker/linux.rs172-178
Stop Check: Checks atomic stop signal on each iteration crates/audio/src/speaker/linux.rs143

Sources: crates/audio/src/speaker/linux.rs13-226

Ring Buffer Pattern for Real-Time Communication

All speaker implementations use the ringbuf crate to safely communicate between real-time audio threads and async Rust code.

Ring Buffer Architecture

Buffer Configuration

All implementations use consistent buffer sizing:

Parameter	Value	Purpose
`CHUNK_SIZE`	256 samples	Size of each read/write operation
`BUFFER_SIZE`	1024 samples (256 * 4)	Total ring buffer capacity

This provides approximately 21ms of buffering at 48kHz (macOS/Linux) or 23ms at 44.1kHz (Windows).
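
The latency figures follow directly from the buffer size and the sample rate. A small sketch of the arithmetic, using the constants from the table above (`buffer_duration_ms` is a hypothetical helper name):

```rust
const CHUNK_SIZE: u32 = 256;
const BUFFER_SIZE: u32 = CHUNK_SIZE * 4; // 1024 samples

/// Duration in milliseconds covered by `samples` mono samples at `sample_rate` Hz.
fn buffer_duration_ms(samples: u32, sample_rate: u32) -> f64 {
    f64::from(samples) * 1000.0 / f64::from(sample_rate)
}
```

With these values, `buffer_duration_ms(BUFFER_SIZE, 48_000)` is about 21.3 and `buffer_duration_ms(BUFFER_SIZE, 44_100)` is about 23.2, matching the figures quoted above.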

Waker State Pattern

The implementations use a shared WakerState to coordinate between threads:

Producer side (audio thread):

Write samples to ring buffer
Lock WakerState briefly
If has_data is false, set it to true and take the waker
Drop the lock immediately
Call waker.wake() outside the lock

Consumer side (async task):

Try to read from ring buffer
If no data available, lock WakerState
Set has_data to false
Register current task's waker
Return Poll::Pending

This pattern ensures the audio thread spends minimal time in locks while correctly notifying the async consumer.
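
The producer and consumer steps can be sketched with the standard library. This is an illustration under assumptions, not the crate's code: a `Mutex<VecDeque<f32>>` stands in for the lock-free ring buffer, `Shared`, `produce`, and `poll_sample` are hypothetical names, and the final re-check after registering the waker mirrors the double-check described for the Windows `poll_next`.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::task::{Context, Poll, Waker};

#[derive(Default)]
struct WakerState {
    has_data: bool,
    waker: Option<Waker>,
}

/// Stand-in for the shared state; the Mutex<VecDeque> plays the role of
/// the lock-free ring buffer so the sketch needs no external crates.
#[derive(Default)]
struct Shared {
    samples: Mutex<VecDeque<f32>>,
    waker_state: Mutex<WakerState>,
}

/// Producer side (audio thread): write, then wake at most once per "empty" period.
fn produce(shared: &Shared, chunk: &[f32]) {
    shared.samples.lock().unwrap().extend(chunk.iter().copied());
    let waker = {
        let mut state = shared.waker_state.lock().unwrap();
        if state.has_data {
            None // the consumer has already been notified
        } else {
            state.has_data = true;
            state.waker.take()
        }
    }; // the lock is released here, before waking
    if let Some(waker) = waker {
        waker.wake();
    }
}

/// Consumer side (async task): read, or register the waker and report Pending.
fn poll_sample(shared: &Shared, cx: &mut Context<'_>) -> Poll<f32> {
    if let Some(sample) = shared.samples.lock().unwrap().pop_front() {
        return Poll::Ready(sample);
    }
    {
        let mut state = shared.waker_state.lock().unwrap();
        state.has_data = false;
        state.waker = Some(cx.waker().clone());
    }
    // Re-check after registering, in case the producer wrote in between.
    match shared.samples.lock().unwrap().pop_front() {
        Some(sample) => Poll::Ready(sample),
        None => Poll::Pending,
    }
}
```

Calling `wake()` after releasing the lock keeps the critical section on the audio thread down to a flag update and an `Option::take`.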

Sources: crates/audio/src/speaker/macos.rs43-269 crates/audio/src/speaker/windows.rs54-162 crates/audio/src/speaker/linux.rs18-168

Device Monitoring

The DeviceMonitor provides real-time notifications when audio devices change.

Device Events

Platform Implementations

macOS Device Monitoring

Uses Core Audio property listeners crates/audio/src/device_monitor.rs70-144:

Register listeners for HW_DEFAULT_INPUT_DEVICE and HW_DEFAULT_OUTPUT_DEVICE properties
Listener callback runs on Core Audio thread
Detects headphone connection via stream terminal type
Runs Core Foundation run loop until stop signal received
Unregisters listeners on shutdown

Linux Device Monitoring

Uses PulseAudio subscription API crates/audio/src/device_monitor.rs146-296:

Creates PulseAudio context and mainloop
Subscribes to SINK, SOURCE, and SERVER events
Callback fires on device changes
Detects headphone via active port name (headphone, headset, etc.)
Stops mainloop and disconnects on shutdown
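
The port-name check for headphones can be sketched as a simple substring match. The function name and the keyword list are assumptions: only "headphone" and "headset" are named above, and the crate may match further keywords.

```rust
/// Guess whether an active port name refers to headphones or a headset.
/// Assumption: a case-insensitive substring match on two keywords.
fn looks_like_headphones(port_name: &str) -> bool {
    let name = port_name.to_ascii_lowercase();
    ["headphone", "headset"]
        .iter()
        .any(|needle| name.contains(*needle))
}
```

For example, a port named `analog-output-headphones` would match, while `analog-output-speaker` would not. These port names are illustrative and depend on the sound card profile.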

Usage Pattern

The monitor spawns a background thread and returns a handle that automatically stops monitoring when dropped.

Sources: crates/audio/src/device_monitor.rs1-313

Sample Rate Handling

Each implementation determines its sample rate differently; only the macOS callback re-queries the device rate while capturing, so only it follows rate changes at runtime:

Implementation	Mechanism	Storage
macOS	Query `device.nominal_sample_rate()` in callback	`Arc<AtomicU32>`
Windows	Fixed at 44100Hz	Constant
Linux	Query `hwp.get_rate()` during initialization	`Arc<AtomicU32>`
Microphone	From `cpal::SupportedStreamConfig`	`cpal::SupportedStreamConfig`

The atomic sample rate storage allows the async consumer to query the current rate without blocking the audio thread crates/audio/src/speaker/macos.rs38-40 crates/audio/src/speaker/linux.rs187-189
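
A minimal sketch of this sharing pattern, assuming a thin wrapper around `Arc<AtomicU32>` (`SharedSampleRate` and its methods are hypothetical names):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper around the shared rate; clones share the same value.
#[derive(Clone)]
struct SharedSampleRate(Arc<AtomicU32>);

impl SharedSampleRate {
    fn new(initial_hz: u32) -> Self {
        Self(Arc::new(AtomicU32::new(initial_hz)))
    }

    /// Called from the capture side when the device reports a rate.
    fn set(&self, hz: u32) {
        self.0.store(hz, Ordering::Release);
    }

    /// Called from the async side; a plain atomic load that never blocks.
    fn get(&self) -> u32 {
        self.0.load(Ordering::Acquire)
    }
}
```

Because both sides only perform atomic loads and stores, neither can stall the other, which is the property the audio thread needs.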

Error Handling and Resilience

Microphone Input

Device enumeration filters out tap devices to prevent feedback crates/audio/src/mic.rs44
Falls back to default device if requested device not found crates/audio/src/mic.rs76-82
Logs stream errors but continues running crates/audio/src/mic.rs126

Speaker Input

macOS:

Returns errors for initialization failures crates/audio/src/speaker/macos.rs53-80
Logs but continues on format detection issues crates/audio/src/speaker/macos.rs151
Tracks dropped samples crates/audio/src/speaker/macos.rs249-252

Windows:

Sends initialization errors via channel crates/audio/src/speaker/windows.rs165-168
Logs capture failures and continues crates/audio/src/speaker/windows.rs118-121
Tracks dropped samples when queue exceeds 8192 crates/audio/src/speaker/windows.rs145-149

Linux:

Returns errors for PCM failures crates/audio/src/speaker/linux.rs94
Attempts PCM recovery on read errors crates/audio/src/speaker/linux.rs172-178
Tracks dropped samples crates/audio/src/speaker/linux.rs151-153

Sources: crates/audio/src/mic.rs69-92 crates/audio/src/speaker/macos.rs246-252 crates/audio/src/speaker/windows.rs145-149 crates/audio/src/speaker/linux.rs150-178