This document details the cross-platform audio I/O implementation in the hypr-audio crate (crates/audio/). This layer provides low-level audio capture functionality for both microphone input and system audio (speaker) output across macOS, Windows, and Linux. It handles platform-specific audio APIs and implements ring buffer patterns for real-time thread communication between audio hardware callbacks and the async application layer.
For information about how captured audio is processed (VAD, AGC, AEC), see Audio Processing Pipeline. For details on how this audio capture integrates with the plugin system, see Listener Plugin.
The hypr-audio crate is organized into several modules that handle different aspects of audio capture:
| Module | Purpose | Platform Support |
|---|---|---|
| mic | Microphone input via cpal | All platforms |
| speaker | System audio capture | macOS, Windows, Linux |
| device_monitor | Audio device change detection | macOS, Linux |
| norm | Audio normalization utilities | All platforms |
| utils | Helper functions (e.g., headphone detection) | Platform-specific |
| Dependency | Purpose | Usage |
|---|---|---|
| cpal | Cross-platform audio library | Microphone input |
| cidre (macOS) | macOS Core Audio bindings | Speaker tap and device monitoring |
| wasapi (Windows) | Windows Audio Session API | Speaker loopback capture |
| alsa (Linux) | Advanced Linux Sound Architecture | Speaker monitor capture |
| ringbuf | Lock-free ring buffer | Real-time audio thread communication |
| kalosm-sound | Audio streaming traits | Stream abstractions |
Sources: crates/audio/Cargo.toml1-42
Sources: crates/audio/src/lib.rs1-14 crates/audio/src/speaker/mod.rs1-71
The AudioInput struct provides a unified interface for both microphone and speaker audio sources:
| Method | Description | Returns |
|---|---|---|
| from_mic(device_name) | Create microphone input | Result<AudioInput> |
| from_speaker() | Create speaker output capture | AudioInput |
| sample_rate() | Get sample rate | u32 |
| device_name() | Get device name | String |
| stream() | Create audio stream | AudioStream |
| list_mic_devices() | List available microphones | Vec<String> |
The AudioStream enum unifies all audio sources and implements the kalosm_sound::AsyncSource trait for async streaming:
Sources: crates/audio/src/lib.rs72-209
The MicInput and MicStream structs handle microphone capture using the cross-platform cpal library.
The microphone implementation uses a multi-threaded pattern:
- … hypr-audio-tap) to avoid capture loops crates/audio/src/mic.rs12-22
- Uses mpsc::unbounded to bridge the synchronous cpal callback to async Stream crates/audio/src/mic.rs102

The MicStream implements a clean shutdown mechanism:
- … drop_tx sender crates/audio/src/mic.rs184
- … drop_rx.recv() crates/audio/src/mic.rs167

Sources: crates/audio/src/mic.rs24-223
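The shutdown mechanism can be illustrated with a small, dependency-free sketch. It assumes a worker thread that blocks on `drop_rx.recv()` and a handle whose `Drop` implementation signals through `drop_tx`. The names `StreamHandle`, `spawn`, and `stopped` are hypothetical, `std::sync::mpsc` stands in for whatever channel type the crate actually uses, and joining the worker in `drop` is an assumption made so the effect is observable.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a stream handle that owns a capture worker.
pub struct StreamHandle {
    drop_tx: Sender<()>,
    worker: Option<JoinHandle<()>>,
}

impl StreamHandle {
    /// Spawns a worker that stays alive until it is told to stop.
    /// `stopped` is flipped when the worker exits (for observation only).
    pub fn spawn(stopped: Arc<AtomicBool>) -> Self {
        let (drop_tx, drop_rx) = mpsc::channel::<()>();
        let worker = thread::spawn(move || {
            // A real implementation would build and start the capture stream here.
            // Block until a shutdown message arrives or the sender is dropped.
            let _ = drop_rx.recv();
            stopped.store(true, Ordering::SeqCst);
        });
        Self { drop_tx, worker: Some(worker) }
    }
}

impl Drop for StreamHandle {
    fn drop(&mut self) {
        // Signal the worker; an error only means it has already exited.
        let _ = self.drop_tx.send(());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

Dropping the handle is the only shutdown API in this sketch, so the capture resource cannot outlive the stream that consumes it.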
Speaker input (system audio capture) requires platform-specific implementations due to different audio APIs:
| Platform | API | Device Type | Sample Format |
|---|---|---|---|
| macOS | Core Audio | Aggregate Device + Tap | Multiple formats via AVFoundation |
| Windows | WASAPI | Loopback Capture | Float32 |
| Linux | ALSA | Monitor Device | Float32 |
The speaker/mod.rs module provides a unified interface with conditional compilation:
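As a rough illustration of this kind of platform dispatch, the sketch below selects one `platform` module per target OS with `#[cfg]` attributes and exposes it through a single wrapper. The module contents, the `SpeakerBackend` type, and `api_name` are hypothetical and only show the pattern, not the crate's actual layout.

```rust
// Each `platform` module is compiled only for its target OS,
// so exactly one definition exists in any given build.
#[cfg(target_os = "macos")]
mod platform {
    pub const API: &str = "Core Audio tap";
}

#[cfg(target_os = "windows")]
mod platform {
    pub const API: &str = "WASAPI loopback";
}

#[cfg(target_os = "linux")]
mod platform {
    pub const API: &str = "ALSA monitor";
}

// Fallback so the sketch still compiles on other targets.
#[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
mod platform {
    pub const API: &str = "unsupported";
}

/// Hypothetical wrapper exposing one API regardless of platform.
pub struct SpeakerBackend;

impl SpeakerBackend {
    pub fn api_name() -> &'static str {
        platform::API
    }
}
```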
Both SpeakerInput and SpeakerStream wrap platform-specific implementations and provide consistent APIs. The SpeakerStream additionally buffers samples to present them one at a time via the Stream trait crates/audio/src/speaker/mod.rs73-117
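The per-sample buffering can be sketched without any async machinery. The example below uses `Iterator` in place of the `Stream` trait to stay dependency-free, and `SampleBuffer` is a hypothetical name. It pulls whole chunks from an inner source and hands out one `f32` at a time.

```rust
use std::collections::VecDeque;

/// Hypothetical adapter: pulls chunks from `source` and yields single samples.
pub struct SampleBuffer<I> {
    source: I,
    pending: VecDeque<f32>,
}

impl<I: Iterator<Item = Vec<f32>>> SampleBuffer<I> {
    pub fn new(source: I) -> Self {
        Self { source, pending: VecDeque::new() }
    }
}

impl<I: Iterator<Item = Vec<f32>>> Iterator for SampleBuffer<I> {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        loop {
            // Serve buffered samples first.
            if let Some(sample) = self.pending.pop_front() {
                return Some(sample);
            }
            // Buffer is empty: fetch the next chunk, or end when the source ends.
            let chunk = self.source.next()?;
            self.pending.extend(chunk);
        }
    }
}
```

The loop skips empty chunks instead of ending the sequence early, which matters when the underlying source occasionally delivers no data.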
Sources: crates/audio/src/speaker/mod.rs1-134
The macOS implementation uses Core Audio's tap functionality to capture system audio output.
The tap is configured with specific properties:
The proc function is the real-time audio callback that runs on the audio thread:
Key implementation details:
- Uses av::AudioPcmBuf for direct f32 data access when possible crates/audio/src/speaker/macos.rs111-116

Sources: crates/audio/src/speaker/macos.rs17-298
The Windows implementation uses WASAPI loopback mode to capture system audio.
The WASAPI implementation initializes in several steps crates/audio/src/speaker/windows.rs77-98:
The capture thread runs in a dedicated OS thread and performs the following:
The SpeakerStream::poll_next implementation bridges the capture thread to async Rust:
Sources: crates/audio/src/speaker/windows.rs11-238
The Linux implementation uses ALSA to capture from a monitor device.
The Linux implementation configures ALSA with the following parameters crates/audio/src/speaker/linux.rs116-134:
The capture loop runs in a dedicated thread crates/audio/src/speaker/linux.rs109-183:
- Reads interleaved frames with io.readi() crates/audio/src/speaker/linux.rs144

Sources: crates/audio/src/speaker/linux.rs13-226
All speaker implementations use the ringbuf crate to safely communicate between real-time audio threads and async Rust code.
All implementations use consistent buffer sizing:
| Parameter | Value | Purpose |
|---|---|---|
| CHUNK_SIZE | 256 samples | Size of each read/write operation |
| BUFFER_SIZE | 1024 samples (256 * 4) | Total ring buffer capacity |
This provides approximately 21ms of buffering at 48kHz (macOS/Linux) or 23ms at 44.1kHz (Windows).
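The figures follow from dividing the buffer capacity by the sample rate. The short sketch below reproduces the arithmetic. The helper `buffer_duration_ms` is hypothetical, and it assumes the 1024 samples belong to a single channel.

```rust
const CHUNK_SIZE: usize = 256;
const BUFFER_SIZE: usize = CHUNK_SIZE * 4;

/// Duration in milliseconds covered by `samples` at `sample_rate` Hz.
pub fn buffer_duration_ms(samples: usize, sample_rate: u32) -> f64 {
    samples as f64 / sample_rate as f64 * 1000.0
}
```

With these constants, `buffer_duration_ms(BUFFER_SIZE, 48_000)` is about 21.3 ms and `buffer_duration_ms(BUFFER_SIZE, 44_100)` is about 23.2 ms.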
The implementations use a shared WakerState to coordinate between threads:
Producer side (audio thread):
- Lock WakerState briefly
- If has_data is false, set it to true and take the waker
- Call waker.wake() outside the lock

Consumer side (async task):
- … WakerState
- Set has_data to false
- Return Poll::Pending

This pattern ensures the audio thread spends minimal time in locks while correctly notifying the async consumer.
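A minimal sketch of this coordination, using only the standard library, might look like the following. The field layout of `WakerState` and the function names `notify_consumer` and `park_consumer` are assumptions for illustration. The crate's real types and the exact order of the consumer steps may differ.

```rust
use std::sync::{Arc, Mutex};
use std::task::Waker;

/// Assumed shape of the shared state; the real struct may hold more.
#[derive(Default)]
pub struct WakerState {
    waker: Option<Waker>,
    has_data: bool,
}

/// Producer side: called from the audio thread after pushing samples.
pub fn notify_consumer(state: &Arc<Mutex<WakerState>>) {
    let waker = {
        let mut guard = state.lock().unwrap();
        if guard.has_data {
            // The consumer has already been notified; nothing to do.
            None
        } else {
            guard.has_data = true;
            guard.waker.take()
        }
    }; // The lock is released here, before waking.
    if let Some(waker) = waker {
        waker.wake();
    }
}

/// Consumer side: called when the ring buffer is empty, just before
/// the poll function returns `Poll::Pending`.
pub fn park_consumer(state: &Arc<Mutex<WakerState>>, waker: &Waker) {
    let mut guard = state.lock().unwrap();
    guard.has_data = false;
    guard.waker = Some(waker.clone());
}
```

Because the `has_data` flag suppresses repeated notifications, the audio thread calls `wake()` at most once per consumer park, however many chunks it writes in between.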
Sources: crates/audio/src/speaker/macos.rs43-269 crates/audio/src/speaker/windows.rs54-162 crates/audio/src/speaker/linux.rs18-168
The DeviceMonitor provides real-time notifications when audio devices change.
On macOS, the monitor uses Core Audio property listeners crates/audio/src/device_monitor.rs70-144:
- Listens to the HW_DEFAULT_INPUT_DEVICE and HW_DEFAULT_OUTPUT_DEVICE properties

On Linux, it uses the PulseAudio subscription API crates/audio/src/device_monitor.rs146-296:
- Subscribes to SINK, SOURCE, and SERVER events
- … headphone, headset, etc.)

The monitor spawns a background thread and returns a handle that automatically stops monitoring when dropped.
Sources: crates/audio/src/device_monitor.rs1-313
The implementations differ in how they determine and store the sample rate. Only some of them track it dynamically to handle device changes:
| Implementation | Mechanism | Storage |
|---|---|---|
| macOS | Query device.nominal_sample_rate() in callback | Arc<AtomicU32> |
| Windows | Fixed at 44100Hz | Constant |
| Linux | Query hwp.get_rate() during initialization | Arc<AtomicU32> |
| Microphone | From cpal::SupportedStreamConfig | cpal::SupportedStreamConfig |
The atomic sample rate storage allows the async consumer to query the current rate without blocking the audio thread crates/audio/src/speaker/macos.rs38-40 crates/audio/src/speaker/linux.rs187-189
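The idea can be sketched with an `Arc<AtomicU32>` shared between a writer thread and a reader handle. `RateHandle` and `run_capture` are hypothetical names, and the spawned thread merely simulates an audio callback observing a new device rate.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical consumer-side handle; reads never block the writer.
pub struct RateHandle {
    current: Arc<AtomicU32>,
}

impl RateHandle {
    pub fn sample_rate(&self) -> u32 {
        self.current.load(Ordering::Acquire)
    }
}

/// Simulates an audio thread that observes a device rate change.
pub fn run_capture(initial: u32, updated: u32) -> RateHandle {
    let current = Arc::new(AtomicU32::new(initial));
    let writer = Arc::clone(&current);
    thread::spawn(move || {
        // A real callback would query the device here.
        writer.store(updated, Ordering::Release);
    })
    .join()
    .expect("capture thread panicked");
    RateHandle { current }
}
```

An atomic load involves no lock, so a slow consumer cannot stall the thread that updates the rate.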
macOS:
Windows:
Linux:
Sources: crates/audio/src/mic.rs69-92 crates/audio/src/speaker/macos.rs246-252 crates/audio/src/speaker/windows.rs145-149 crates/audio/src/speaker/linux.rs150-178