Showing 16 open source projects for "mpeg audio decoder"

View related business solutions
  • Zenflow- The AI Workflow Engine for Software Devs Icon
    Zenflow- The AI Workflow Engine for Software Devs

    Parallel agents. Multi-agent orchestration. Specs that turn into shipped code. Zenflow automates planning, coding, testing, and verification.

    Zenflow is the AI workflow engine built for real teams. Parallel agents plan, code, test, and verify in one workflow. With spec-driven development and deep context, Zenflow turns requirements into production-ready output so teams ship faster and stay in flow.
    Try free now
  • Auth0 for AI Agents now in GA Icon
    Auth0 for AI Agents now in GA

    Ready to implement AI with confidence (without sacrificing security)?

    Connect your AI agents to apps and data more securely, give users control over the actions AI agents can perform and the data they can access, and enable human confirmation for critical agent actions.
    Start building today
  • 1
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    ...These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
    Downloads: 63 This Week
    Last Update:
    See Project
  • 3
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    ...It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning — meaning it can mimic a target speaker’s voice from a short reference sample — making it versatile for multi-voice uses. Compared to many open-source TTS tools, IndexTTS emphasizes efficiency and controllability: it offers faster inference, simpler training pipelines, and controllable speech parameters (like duration, pitch, and prosody), which is critical for production use.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 4
    Multimodal

    Multimodal

    TorchMultimodal is a PyTorch library

    ...The library provides modular building blocks such as encoders, fusion modules, loss functions, and transformations that support combining modalities (vision, text, audio, etc.) in unified architectures. It includes a collection of ready model classes—like ALBEF, CLIP, BLIP-2, COCA, FLAVA, MDETR, and Omnivore—that serve as reference implementations you can adopt or adapt. The design emphasizes composability: you can mix and match encoder, fusion, and decoder components rather than starting from monolithic models. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Grafana: The open and composable observability platform Icon
    Grafana: The open and composable observability platform

    Faster answers, predictable costs, and no lock-in built by the team helping to make observability accessible to anyone.

    Grafana is the open source analytics & monitoring solution for every database.
    Learn More
  • 5
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    NÜWA - Pytorch

    NÜWA - Pytorch

    Implementation of NÜWA, attention network for text to video synthesis

    Implementation of NÜWA, state of the art attention network for text-to-video synthesis, in Pytorch. It also contains an extension into video and audio generation, using a dual decoder approach. It seems as though a diffusion-based method has taken the new throne for SOTA. However, I will continue on with NUWA, extending it to use multi-headed codes + hierarchical causal transformer. I think that direction is untapped for improving on this line of work. In the paper, they also present a way to condition the video generation based on segmentation mask(s). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    EnCodec

    EnCodec

    State-of-the-art deep learning based audio codec

    Encodec is a neural audio codec developed by Meta for high-fidelity, low-bitrate audio compression using end-to-end deep learning. Unlike traditional codecs (like MP3 or Opus), Encodec uses a learned quantizer and decoder to reconstruct complex waveforms with remarkable accuracy at bitrates as low as 1.5 kbps. It employs a convolutional encoder–decoder architecture trained with perceptual loss functions that optimize for human auditory quality rather than raw waveform distance. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Denoiser

    Denoiser

    Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

    Denoiser is a real-time speech enhancement model operating directly on raw waveforms, designed to clean noisy audio while running efficiently on CPU. It uses a causal encoder-decoder architecture with skip connections, optimized with losses defined both in the time domain and frequency domain to better suppress noise while preserving speech. Unlike models that operate on spectrograms alone, this design enables lower latency and coherent waveform output.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Fast Forward

    Fast Forward

    Free video editor to convert, cut, trim, stream select and encode

    Fast Forward is free video editing software that allows you to convert, cut, trim, remove streams, encode and customise a variety of parameters such as frame rate, bitrate, frame size and output file size. Fast Forward can encode H264, MPEG2 or Xvid video, as well as Dolby Digital AC3, Dolby Digital Plus eAC3+, AAC and Vorbis audio. It is very useful for removing ads from recorded TV programs, or combining the .VOB files from a DVD file system. Thanks to FFmpeg, these processes are extremely streamlined and fast. To speed up your conversions, use the "Straight Copy" codec options (only useable under specific circumstances). Accepted formats include: *.3g2 *.3gp *.asf *.avi *.drc *.flv *.gif *.gifv *.m2ts *.m2v *.m4p *.m4v *.mkv *.mng *.mov *.mp4 *.mpeg *.mpg *.mxf *.nsv *.ogg *.ogv *.rm *.rmvb *.roq *.svi *.vob *.webm *.wmv *.yuv
    Downloads: 3 This Week
    Last Update:
    See Project
  • Cloud-based help desk software with ServoDesk Icon
    Cloud-based help desk software with ServoDesk

    Full access to Enterprise features. No credit card required.

    What if You Could Automate 90% of Your Repetitive Tasks in Under 30 Days? At ServoDesk, we help businesses like yours automate operations with AI, allowing you to cut service times in half and increase productivity by 25% - without hiring more staff.
    Try ServoDesk for free
  • 10

    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation and echo cancellation algorithms. The Millennium ASR provides C++ and python libraries for automatic speech recognition. The Millennium ASR implements a weighted finite state transducer (WFST) decoder, training and adaptation methods. These toolkits are meant for facilitating research and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    EnKoDeur-Mixeur
    EnKoDeur-Mixeur (EKD) is an open source software which makes videos, pictures and audio post-production. It can be also used to convert videos in many formats. It is written in python and use the PyQt4 bindings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12

    seqtonedecoder

    seqtonedecoder

    A sequential tone decoder geared towards the type of paging used by Fire/EMS. It is written in Python for portability and has been tested on Windows, Linux and Mac. It provides email notification of pages which includes the audio.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13

    toneDetect

    toneDetect

    A sequential tone decoder geared towards the type of paging used by Fire/EMS. It is written in Python for portability and has been tested on Windows, Linux and Mac. It provides email notification of pages which includes the audio.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    PyKaraoke is a cross-platform karaoke player. It currently supports CDG (MP3+G, OGG+G, WAV+G), MIDI (.KAR, .MID) and MPEG formats.
    Downloads: 43 This Week
    Last Update:
    See Project
  • 15
    Tool for encoding tags of mp3 files in the cyrillic charsets (cp1251, koi8-r) to unicode. Solution of problem with displaying tags in thе different charsets in a playlist. If you will find bug, mistake in the this text - mail me on hlamer@tut.by
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    This is a program for in-car use. Its designed in python using pygame. The main goal is to have an easy to navigate program while having it look good. The features will include mp3/divx/mpeg/dvd and other media files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next