Audio Capture Layer

Purpose and Scope

This document details the cross-platform audio I/O implementation in the hypr-audio crate (crates/audio/). This layer provides low-level audio capture functionality for both microphone input and system audio (speaker) output across macOS, Windows, and Linux. It handles platform-specific audio APIs and implements ring buffer patterns for real-time thread communication between audio hardware callbacks and the async application layer.

For information about how captured audio is processed (VAD, AGC, AEC), see Audio Processing Pipeline. For details on how this audio capture integrates with the plugin system, see Listener Plugin.


Crate Structure and Dependencies

The hypr-audio crate is organized into several modules that handle different aspects of audio capture:

| Module | Purpose | Platform Support |
| --- | --- | --- |
| mic | Microphone input via cpal | All platforms |
| speaker | System audio capture | macOS, Windows, Linux |
| device_monitor | Audio device change detection | macOS, Linux |
| norm | Audio normalization utilities | All platforms |
| utils | Helper functions (e.g., headphone detection) | Platform-specific |

Key Dependencies

| Dependency | Purpose | Usage |
| --- | --- | --- |
| cpal | Cross-platform audio library | Microphone input |
| cidre (macOS) | macOS Core Audio bindings | Speaker tap and device monitoring |
| wasapi (Windows) | Windows Audio Session API | Speaker loopback capture |
| alsa (Linux) | Advanced Linux Sound Architecture | Speaker monitor capture |
| ringbuf | Lock-free ring buffer | Real-time audio thread communication |
| kalosm-sound | Audio streaming traits | Stream abstractions |

Sources: crates/audio/Cargo.toml:1-42


Architecture Overview

Sources: crates/audio/src/lib.rs:1-14, crates/audio/src/speaker/mod.rs:1-71


Unified Audio Interface

The AudioInput struct provides a unified interface for both microphone and speaker audio sources:

AudioInput API

| Method | Description | Returns |
| --- | --- | --- |
| from_mic(device_name) | Create microphone input | Result<AudioInput> |
| from_speaker() | Create speaker output capture | AudioInput |
| sample_rate() | Get sample rate | u32 |
| device_name() | Get device name | String |
| stream() | Create audio stream | AudioStream |
| list_mic_devices() | List available microphones | Vec<String> |

The AudioStream enum unifies all audio sources and implements the kalosm_sound::AsyncSource trait for async streaming.

Sources: crates/audio/src/lib.rs:72-209


Microphone Input (cpal-based)

The MicInput and MicStream structs handle microphone capture using the cross-platform cpal library.

MicInput Structure

Implementation Details

The microphone implementation uses a multi-threaded pattern:

  1. Device Selection: Filters out the tap device (hypr-audio-tap) to avoid capture loops (crates/audio/src/mic.rs:12-22)
  2. Stream Creation: Spawns a background thread that manages the cpal stream (crates/audio/src/mic.rs:108-171)
  3. Format Handling: Supports multiple sample formats (I8, I16, I32, F32) with automatic conversion to f32 (crates/audio/src/mic.rs:133-142)
  4. Channel Reduction: Converts multi-channel audio to mono by stepping through channels (crates/audio/src/mic.rs:119-123)
  5. Async Bridge: Uses mpsc::unbounded to bridge the synchronous cpal callback to an async Stream (crates/audio/src/mic.rs:102)
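
Steps 3 and 4 above can be illustrated with a small, std-only sketch. The function names are hypothetical and not taken from mic.rs, and the down-mix assumes that "stepping through channels" means keeping the first channel of every interleaved frame.

```rust
/// Convert signed 16-bit PCM samples to f32 in the range [-1.0, 1.0).
/// (Illustrative helper; the crate's own conversion may differ.)
pub fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

/// Reduce interleaved multi-channel audio to mono by keeping only the
/// first channel of each frame (assumed interpretation of "stepping").
pub fn first_channel(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved.iter().step_by(channels).copied().collect()
}
```

Keeping a single channel is cheaper than averaging all channels, at the cost of discarding anything that is only present in the other channels.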

Lifecycle Management

The MicStream implements a clean shutdown mechanism.

Sources: crates/audio/src/mic.rs:24-223


Speaker Input - Platform Implementations

Speaker input (system audio capture) requires platform-specific implementations due to different audio APIs:

| Platform | API | Device Type | Sample Format |
| --- | --- | --- | --- |
| macOS | Core Audio | Aggregate Device + Tap | Multiple formats via AVFoundation |
| Windows | WASAPI | Loopback Capture | Float32 |
| Linux | ALSA | Monitor Device | Float32 |

Platform Abstraction Layer

The speaker/mod.rs module provides a unified interface with conditional compilation.
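
As a rough illustration of the conditional-compilation approach, the sketch below selects one backend module per target OS behind a single function. The module name, constant, and strings are hypothetical and only stand in for the real platform implementations.

```rust
// Exactly one `platform` module is compiled, depending on the target OS.
#[cfg(target_os = "macos")]
mod platform {
    pub const BACKEND: &str = "Core Audio tap";
}

#[cfg(target_os = "windows")]
mod platform {
    pub const BACKEND: &str = "WASAPI loopback";
}

#[cfg(target_os = "linux")]
mod platform {
    pub const BACKEND: &str = "ALSA monitor";
}

// Fallback so the sketch still compiles on other targets.
#[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
mod platform {
    pub const BACKEND: &str = "unsupported";
}

/// Platform-independent entry point that forwards to the selected module.
pub fn speaker_backend() -> &'static str {
    platform::BACKEND
}
```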

Both SpeakerInput and SpeakerStream wrap platform-specific implementations and provide consistent APIs. The SpeakerStream additionally buffers samples to present them one at a time via the Stream trait (crates/audio/src/speaker/mod.rs:73-117).
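
The buffering idea can be sketched with a synchronous Iterator standing in for the async Stream trait. BufferedSamples is a hypothetical type: it pulls whole chunks from an inner source and hands out one sample per call.

```rust
use std::collections::VecDeque;

/// Hypothetical adapter: turns a source of sample chunks into a source
/// of individual samples by buffering the current chunk.
pub struct BufferedSamples<I: Iterator<Item = Vec<f32>>> {
    source: I,
    pending: VecDeque<f32>,
}

impl<I: Iterator<Item = Vec<f32>>> BufferedSamples<I> {
    pub fn new(source: I) -> Self {
        Self { source, pending: VecDeque::new() }
    }
}

impl<I: Iterator<Item = Vec<f32>>> Iterator for BufferedSamples<I> {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        loop {
            // Serve buffered samples first.
            if let Some(sample) = self.pending.pop_front() {
                return Some(sample);
            }
            // Buffer is empty: fetch the next chunk, or end the stream.
            // An empty chunk simply causes another loop iteration.
            let chunk = self.source.next()?;
            self.pending.extend(chunk);
        }
    }
}
```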

Sources: crates/audio/src/speaker/mod.rs:1-134


macOS Implementation: Core Audio Tap

The macOS implementation uses Core Audio's tap functionality to capture system audio output.

Tap Configuration

The tap is configured with specific properties:

  1. Tap Descriptor: Creates a mono global tap that excludes specific processes (crates/audio/src/speaker/macos.rs:54-55)
  2. Sub-device Configuration: Wraps tap UID in a sub-device dictionary (crates/audio/src/speaker/macos.rs:57-60)
  3. Aggregate Device: Creates a private aggregate device with tap auto-start disabled (crates/audio/src/speaker/macos.rs:62-77)

Audio Processing Callback

The proc function is the real-time audio callback that runs on the audio thread.

Key implementation details:

Sources: crates/audio/src/speaker/macos.rs:17-298


Windows Implementation: WASAPI Loopback

The Windows implementation uses WASAPI loopback mode to capture system audio.

Initialization and Configuration

The WASAPI implementation initializes in several steps (crates/audio/src/speaker/windows.rs:77-98):

  1. Get default render device (speakers)
  2. Create audio client
  3. Configure 32-bit float format at 44100Hz mono
  4. Get minimum buffer period
  5. Initialize in shared mode with loopback capture
  6. Create event handle for synchronization
  7. Get capture client interface
  8. Start the audio stream

Capture Loop

The capture loop runs on a dedicated OS thread and performs the following:

  1. Event Wait: Waits on the event handle with a 3-second timeout (crates/audio/src/speaker/windows.rs:112-115)
  2. Data Read: Reads available audio data into a temporary queue (crates/audio/src/speaker/windows.rs:117-121)
  3. Byte Assembly: Assembles each group of 4 bytes into an f32 sample (crates/audio/src/speaker/windows.rs:128-137)
  4. Queue Management: Appends samples to the shared queue and drops the oldest samples if it grows beyond 8192 samples (crates/audio/src/speaker/windows.rs:140-150)
  5. Waker Notification: Wakes the async consumer if new data arrived (crates/audio/src/speaker/windows.rs:152-162)
  6. Shutdown Check: Checks the shutdown flag on each iteration (crates/audio/src/speaker/windows.rs:105-110)
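
The byte assembly and queue management steps can be sketched without any WASAPI calls. The function names are hypothetical, and little-endian byte order is an assumption (it holds on the usual Windows targets); only the 8192-sample cap comes from the description above.

```rust
use std::collections::VecDeque;

/// Maximum number of samples kept in the shared queue.
pub const MAX_QUEUED: usize = 8192;

/// Assemble groups of 4 little-endian bytes into f32 samples.
/// Trailing bytes that do not form a full sample are ignored.
pub fn bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}

/// Append samples and drop the oldest ones beyond `cap`.
/// Returns how many samples were dropped.
pub fn push_bounded(queue: &mut VecDeque<f32>, samples: &[f32], cap: usize) -> usize {
    queue.extend(samples.iter().copied());
    let dropped = queue.len().saturating_sub(cap);
    drop(queue.drain(..dropped));
    dropped
}
```

Dropping the oldest samples keeps the queue close to real time when the consumer falls behind, at the cost of a gap in the captured audio.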

Async Integration

The SpeakerStream::poll_next implementation bridges the capture thread to async Rust:

  1. Check shutdown flag (crates/audio/src/speaker/windows.rs:196-201)
  2. Try to pop samples from queue (crates/audio/src/speaker/windows.rs:203-212)
  3. Register waker if no data available (crates/audio/src/speaker/windows.rs:214-222)
  4. Double-check queue after registration to avoid race (crates/audio/src/speaker/windows.rs:224-236)
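
A minimal sketch of these four steps, using only std::task types rather than the Stream trait. Shared, poll_next_sample, and NoopWake are hypothetical names, and the sketch yields one sample per poll for simplicity.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// State shared between the capture thread and the async consumer.
#[derive(Default)]
pub struct Shared {
    pub queue: Mutex<VecDeque<f32>>,
    pub waker: Mutex<Option<Waker>>,
    pub shutdown: AtomicBool,
}

pub fn poll_next_sample(shared: &Shared, cx: &mut Context<'_>) -> Poll<Option<f32>> {
    // 1. A set shutdown flag ends the stream.
    if shared.shutdown.load(Ordering::Acquire) {
        return Poll::Ready(None);
    }
    // 2. Fast path: data is already queued.
    if let Some(sample) = shared.queue.lock().unwrap().pop_front() {
        return Poll::Ready(Some(sample));
    }
    // 3. No data: register the waker so the capture thread can wake this task.
    *shared.waker.lock().unwrap() = Some(cx.waker().clone());
    // 4. Re-check: the producer may have pushed between steps 2 and 3,
    //    before the waker was visible to it.
    match shared.queue.lock().unwrap().pop_front() {
        Some(sample) => Poll::Ready(Some(sample)),
        None => Poll::Pending,
    }
}

/// Waker that does nothing; only used to drive the sketch by hand.
pub struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}
```

Without the second check, a sample pushed just before the waker is stored would never trigger a wake-up, and the task could stay pending until the next chunk arrives.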

Sources: crates/audio/src/speaker/windows.rs:11-238


Linux Implementation: ALSA Monitor

The Linux implementation uses ALSA to capture from a monitor device.

Device Configuration

The Linux implementation configures ALSA with the following parameters (crates/audio/src/speaker/linux.rs:116-134):

  • Channels: 1 (mono)
  • Sample Rate: 48000Hz (nearest supported rate)
  • Format: Float32
  • Access: Interleaved read/write

Capture Loop

The capture loop runs in a dedicated thread (crates/audio/src/speaker/linux.rs:109-183):

  1. Open PCM: Opens the capture device with configured parameters
  2. Read Frames: Reads CHUNK_SIZE frames at a time using io.readi() (crates/audio/src/speaker/linux.rs:144)
  3. Push to Buffer: Pushes samples to ring buffer (crates/audio/src/speaker/linux.rs:148)
  4. Drop Tracking: Logs dropped samples if buffer is full (crates/audio/src/speaker/linux.rs:150-153)
  5. Wake Consumer: Wakes the async stream consumer when data arrives (crates/audio/src/speaker/linux.rs:155-168)
  6. Error Recovery: Attempts to recover from PCM errors (crates/audio/src/speaker/linux.rs:172-178)
  7. Stop Check: Checks atomic stop signal on each iteration (crates/audio/src/speaker/linux.rs:143)
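
The shape of this loop can be sketched without ALSA. In the hypothetical capture_loop below, a closure stands in for io.readi() and a Mutex<VecDeque> stands in for the lock-free ring buffer, so only the stop check, the bounded push, and the drop counting are illustrated; waking the consumer and error recovery are omitted.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

pub const CHUNK_SIZE: usize = 256;
pub const BUFFER_SIZE: usize = CHUNK_SIZE * 4;

/// Runs until `stop` is set or `read_chunk` returns None.
/// `read_chunk` fills the chunk and returns the number of valid samples.
/// Returns the total number of samples dropped because the buffer was full.
pub fn capture_loop(
    stop: &AtomicBool,
    buffer: &Mutex<VecDeque<f32>>,
    mut read_chunk: impl FnMut(&mut [f32; CHUNK_SIZE]) -> Option<usize>,
) -> usize {
    let mut chunk = [0.0f32; CHUNK_SIZE];
    let mut dropped = 0;
    // Stop check on every iteration.
    while !stop.load(Ordering::Acquire) {
        let Some(n) = read_chunk(&mut chunk) else { break };
        let n = n.min(CHUNK_SIZE);
        let mut buf = buffer.lock().unwrap();
        // Accept only what fits; the remainder counts as dropped.
        let free = BUFFER_SIZE.saturating_sub(buf.len());
        let accepted = n.min(free);
        buf.extend(chunk[..accepted].iter().copied());
        dropped += n - accepted;
    }
    dropped
}
```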

Sources: crates/audio/src/speaker/linux.rs:13-226


Ring Buffer Pattern for Real-Time Communication

All speaker implementations use the ringbuf crate to safely communicate between real-time audio threads and async Rust code.

Ring Buffer Architecture

Buffer Configuration

All implementations use consistent buffer sizing:

| Parameter | Value | Purpose |
| --- | --- | --- |
| CHUNK_SIZE | 256 samples | Size of each read/write operation |
| BUFFER_SIZE | 1024 samples (256 * 4) | Total ring buffer capacity |

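This provides approximately 21ms of buffering at 48kHz (macOS/Linux) or 23ms at 44.1kHz (Windows). Note that the Windows capture loop described earlier additionally caps its shared sample queue at 8192 samples.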
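
The latency figures follow directly from capacity divided by sample rate. The helper below is a hypothetical illustration of that arithmetic, not a function from the crate.

```rust
pub const CHUNK_SIZE: usize = 256;
pub const BUFFER_SIZE: usize = CHUNK_SIZE * 4;

/// Duration in milliseconds covered by `samples` mono samples at `sample_rate_hz`.
pub fn buffer_duration_ms(samples: usize, sample_rate_hz: u32) -> f64 {
    samples as f64 * 1000.0 / sample_rate_hz as f64
}
```

For example, buffer_duration_ms(BUFFER_SIZE, 48_000) is about 21.3 and buffer_duration_ms(BUFFER_SIZE, 44_100) is about 23.2.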

Waker State Pattern

The implementations use a shared WakerState to coordinate between threads:

Producer side (audio thread):

  1. Write samples to ring buffer
  2. Lock WakerState briefly
  3. If has_data is false, set it to true and take the waker
  4. Drop the lock immediately
  5. Call waker.wake() outside the lock

Consumer side (async task):

  1. Try to read from ring buffer
  2. If no data available, lock WakerState
  3. Set has_data to false
  4. Register current task's waker
  5. Return Poll::Pending

This pattern ensures the audio thread spends minimal time in locks while correctly notifying the async consumer.
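
The producer and consumer steps above can be sketched as follows. The field names has_data and waker follow the description; the function names and the CountingWake helper are hypothetical, and a std Mutex is assumed as the lock.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

#[derive(Default)]
pub struct WakerState {
    pub has_data: bool,
    pub waker: Option<Waker>,
}

/// Producer side: call after writing samples to the ring buffer.
pub fn notify_consumer(state: &Mutex<WakerState>) {
    let mut guard = state.lock().unwrap();
    // Only the first write after the consumer went idle takes the waker.
    let waker = if guard.has_data {
        None
    } else {
        guard.has_data = true;
        guard.waker.take()
    };
    // Release the lock before waking so the wake-up never runs under it.
    drop(guard);
    if let Some(w) = waker {
        w.wake();
    }
}

/// Consumer side: call when the ring buffer was empty, before returning Poll::Pending.
pub fn register_consumer(state: &Mutex<WakerState>, waker: &Waker) {
    let mut guard = state.lock().unwrap();
    guard.has_data = false;
    guard.waker = Some(waker.clone());
}

/// Test helper that counts how often it was woken.
pub struct CountingWake(pub AtomicUsize);

impl Wake for CountingWake {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

Because has_data stays true until the consumer registers again, repeated writes from the audio thread cost one short lock each and trigger at most one wake-up.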

Sources: crates/audio/src/speaker/macos.rs:43-269, crates/audio/src/speaker/windows.rs:54-162, crates/audio/src/speaker/linux.rs:18-168


Device Monitoring

The DeviceMonitor provides real-time notifications when audio devices change.

Device Events

Platform Implementations

macOS Device Monitoring

Uses Core Audio property listeners (crates/audio/src/device_monitor.rs:70-144):

  1. Register listeners for HW_DEFAULT_INPUT_DEVICE and HW_DEFAULT_OUTPUT_DEVICE properties
  2. Listener callback runs on Core Audio thread
  3. Detects headphone connection via stream terminal type
  4. Runs Core Foundation run loop until stop signal received
  5. Unregisters listeners on shutdown

Linux Device Monitoring

Uses PulseAudio subscription API (crates/audio/src/device_monitor.rs:146-296):

  1. Creates PulseAudio context and mainloop
  2. Subscribes to SINK, SOURCE, and SERVER events
  3. Callback fires on device changes
  4. Detects headphone via active port name (headphone, headset, etc.)
  5. Stops mainloop and disconnects on shutdown
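
The port-name check in step 4 could look roughly like the sketch below. The function name is hypothetical, and the keyword list contains only the two names mentioned above; the actual list in device_monitor.rs may be longer.

```rust
/// Returns true if the active port name suggests headphones or a headset.
/// The keyword list is an assumption based on the two examples named in the text.
pub fn looks_like_headphone_port(port_name: &str) -> bool {
    const KEYWORDS: [&str; 2] = ["headphone", "headset"];
    let lower = port_name.to_ascii_lowercase();
    KEYWORDS.iter().any(|keyword| lower.contains(keyword))
}
```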

Usage Pattern

The monitor spawns a background thread and returns a handle that automatically stops monitoring when dropped.
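
A stop-on-drop handle of this kind can be sketched with std only. MonitorHandle and its polling loop are hypothetical: the real monitor runs a platform event loop rather than sleeping, but the ownership pattern is the same.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Owns a background thread and stops it when dropped.
pub struct MonitorHandle {
    stop: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl MonitorHandle {
    /// Spawns a thread that calls `on_tick` until the handle is dropped.
    pub fn spawn(mut on_tick: impl FnMut() + Send + 'static) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&stop);
        let thread = thread::spawn(move || {
            while !flag.load(Ordering::Acquire) {
                on_tick();
                thread::sleep(Duration::from_millis(1));
            }
        });
        Self { stop, thread: Some(thread) }
    }
}

impl Drop for MonitorHandle {
    fn drop(&mut self) {
        // Signal the thread, then wait for it so no callback runs after drop.
        self.stop.store(true, Ordering::Release);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

Tying shutdown to Drop means callers cannot forget to stop monitoring: letting the handle go out of scope is enough.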

Sources: crates/audio/src/device_monitor.rs:1-313


Sample Rate Handling

Each implementation exposes its sample rate, but only some of them track it dynamically to handle device changes:

| Implementation | Mechanism | Storage |
| --- | --- | --- |
| macOS | Query device.nominal_sample_rate() in callback | Arc<AtomicU32> |
| Windows | Fixed at 44100Hz | Constant |
| Linux | Query hwp.get_rate() during initialization | Arc<AtomicU32> |
| Microphone | From cpal::SupportedStreamConfig | cpal::SupportedStreamConfig |

The atomic sample rate storage allows the async consumer to query the current rate without blocking the audio thread (crates/audio/src/speaker/macos.rs:38-40, crates/audio/src/speaker/linux.rs:187-189).
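
A minimal sketch of this storage pattern, assuming a thin wrapper around Arc<AtomicU32>. The SharedSampleRate type and its method names are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Cloneable handle to a sample rate shared between threads.
#[derive(Clone)]
pub struct SharedSampleRate(Arc<AtomicU32>);

impl SharedSampleRate {
    pub fn new(initial_hz: u32) -> Self {
        Self(Arc::new(AtomicU32::new(initial_hz)))
    }

    /// Called from the capture side when the device rate is (re)read.
    pub fn store(&self, hz: u32) {
        self.0.store(hz, Ordering::Release);
    }

    /// Called from the async consumer; never blocks.
    pub fn load(&self) -> u32 {
        self.0.load(Ordering::Acquire)
    }
}
```

Both sides hold a clone of the handle, so a rate change written by the capture thread becomes visible to the consumer on its next load without any lock.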


Error Handling and Resilience

Microphone Input

Speaker Input

macOS:

Windows:

Linux:

Sources: crates/audio/src/mic.rs:69-92, crates/audio/src/speaker/macos.rs:246-252, crates/audio/src/speaker/windows.rs:145-149, crates/audio/src/speaker/linux.rs:150-178