Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,521 307 Updated Nov 5, 2024

gnobitab / RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,542 92 Updated Jul 20, 2024

showlab / Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

798 43 Updated Oct 10, 2025

sallymmx / m2clip

[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Python 72 4 Updated Dec 23, 2024

open-mmlab / FoleyCrafter

[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝

Python 643 65 Updated Jul 26, 2024

showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 636 70 Updated Nov 26, 2025

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Python 28,500 2,884 Updated Apr 30, 2025

yangdongchao / LLM-Codec

The open source code for LLM-Codec

Python 145 10 Updated Aug 18, 2024

Stability-AI / StableStudio

Community interface for generative AI

TypeScript 9,053 931 Updated Apr 30, 2024

jaeyeonkim99 / EnCLAP

Official Implementation of EnCLAP (ICASSP 2024)

Python 94 5 Updated Jun 2, 2024

kyegomez / USM

Implementation of Google's USM speech model in Pytorch

Python 34 5 Updated Jan 18, 2026

qiuqiangkong / audioset_tagging_cnn

Python 1,661 298 Updated Jul 25, 2024

sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 315 25 Updated Apr 29, 2025

RetroCirce / HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 468 75 Updated Sep 18, 2025

descriptinc / descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,708 168 Updated Jan 26, 2026

facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,879 348 Updated Jan 4, 2024

esbatmop / MNBVC

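MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) internet slang. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.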

4,115 287 Updated Jan 31, 2026

auzxb

Lists (1)

✨ Inspiration

Stars

Ereboas / MagiCodec

Gen-Verse / MMaDA

OpenLMLab / GAOKAO-Bench

openai / grade-school-math

Unakar / Logic-RL

deepseek-ai / FlashMLA

huggingface / open-r1

apple / ml-slowfast-llava

SakanaAI / self-adaptive-llms

deepseek-ai / DeepSeek-V3

Genesis-Embodied-AI / Genesis

zhenye234 / xcodec

gemelo-ai / vocos

gpt-omni / mini-omni