Showing 9 open source projects for "mpeg audio decoder"

View related business solutions
  • Raima Database Manager is an embedded in-memory database for IoT and Edge devices Icon
    Raima Database Manager is an embedded in-memory database for IoT and Edge devices

    Built by Developers, for Developers

    Raima Database Manager (RDM) is an embedded relational database optimized to run on resource-constrained IoT edge devices that require real-time response. RDM enables intelligent decisions to be made at the device level within microseconds.
    Learn More
  • Software for Complex Engineering, Assembly, Testing, and Operations Icon
    Software for Complex Engineering, Assembly, Testing, and Operations

    Built for Space; Powerful Enough for Any Operation

    Epsilon3 is an AI-powered procedure and resource management tool designed for teams building, testing, and operating advanced products and systems.
    Learn More
  • 1
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    ...These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
    Downloads: 51 This Week
    Last Update:
    See Project
  • 2
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    ...It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning — meaning it can mimic a target speaker’s voice from a short reference sample — making it versatile for multi-voice uses. Compared to many open-source TTS tools, IndexTTS emphasizes efficiency and controllability: it offers faster inference, simpler training pipelines, and controllable speech parameters (like duration, pitch, and prosody), which is critical for production use.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 4
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Striven | All In One Business Management Software Icon
    Striven | All In One Business Management Software

    Striven is an all-in-one business management software suite with everything your organization needs for success.

    Striven is the all-in-one business management software that lowers your costs, improves your operations, and makes work easier. Make your company’s data coherent, connected, and relevant.
    Learn More
  • 5
    NÜWA - Pytorch

    NÜWA - Pytorch

    Implementation of NÜWA, attention network for text to video synthesis

    Implementation of NÜWA, state of the art attention network for text-to-video synthesis, in Pytorch. It also contains an extension into video and audio generation, using a dual decoder approach. It seems as though a diffusion-based method has taken the new throne for SOTA. However, I will continue on with NUWA, extending it to use multi-headed codes + hierarchical causal transformer. I think that direction is untapped for improving on this line of work. In the paper, they also present a way to condition the video generation based on segmentation mask(s). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Denoiser

    Denoiser

    Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

    Denoiser is a real-time speech enhancement model operating directly on raw waveforms, designed to clean noisy audio while running efficiently on CPU. It uses a causal encoder-decoder architecture with skip connections, optimized with losses defined both in the time domain and frequency domain to better suppress noise while preserving speech. Unlike models that operate on spectrograms alone, this design enables lower latency and coherent waveform output.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    wav2letter++

    wav2letter++

    Facebook AI research's automatic speech recognition toolkit

    First, install Flashlight (using the 0.3 branch is required) with the ASR application. This repository includes recipes to reproduce the following research papers as well as pre-trained models. All results reproduction must use Flashlight <= 0.3.2 for exact reproducibility. At least one of LZMA, BZip2, or Z is required for LM compression with KenLM. It is highly recommended to build KenLM with position-independent code (-fPIC) enabled, to enable python compatibility. After installing, run...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    JuliusModels

    JuliusModels

    Open source speech models for Julius in English and other languages.

    Open source speech models for Julius speech decoder. Its aim is to give access a wider community of speech recognition enthusiasts to quality models, which they can use in their own projects on different OS platforms (Unix, Windows, etc...) All of the models are based on HTK modelling software and data sets available freely on the Internet.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 9

    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation and echo cancellation algorithms. The Millennium ASR provides C++ and python libraries for automatic speech recognition. The Millennium ASR implements a weighted finite state transducer (WFST) decoder, training and adaptation methods. These toolkits are meant for facilitating research and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Retail Point of Sale and Inventory Management Software Icon
    Retail Point of Sale and Inventory Management Software

    RetailEdge retail point of sale software is designed for single and multi-store businesses

    Caters to small and mid-sized retailers with multiple locations who require a reliable point of sale and inventory management software solution.
    Learn More
  • Previous
  • You're on page 1
  • Next