View auzxb's full-sized avatar
😌
I may be slow to respond.


A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.

Python 112 7 Updated Jun 4, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,560 83 Updated Nov 16, 2025

GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.

Python 700 54 Updated Jan 7, 2025

Reproduce R1 Zero on Logic Puzzle

Python 2,430 164 Updated Mar 20, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,378 967 Updated Jan 20, 2026

Fully open reproduction of DeepSeek-R1

Python 25,842 2,411 Updated Nov 24, 2025

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python 287 16 Updated Sep 16, 2024

A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!

Python 1,183 140 Updated Jan 30, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 28,021 2,594 Updated Jan 26, 2026

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 287 21 Updated Oct 12, 2025

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 1,050 122 Updated Aug 7, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,515 308 Updated Nov 5, 2024

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,534 92 Updated Jul 20, 2024

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

792 43 Updated Oct 10, 2025

[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Python 72 4 Updated Dec 23, 2024

[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 639 65 Updated Jul 26, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 622 68 Updated Nov 26, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 28,438 2,871 Updated Apr 30, 2025

The open source code for LLM-Codec

Python 146 10 Updated Aug 18, 2024

Community interface for generative AI

TypeScript 9,053 930 Updated Apr 30, 2024

Official Implementation of EnCLAP (ICASSP 2024)

Python 94 5 Updated Jun 2, 2024

Implementation of Google's USM speech model in PyTorch

Python 34 5 Updated Jan 18, 2026

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 313 25 Updated Apr 29, 2025

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 466 75 Updated Sep 18, 2025

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,703 167 Updated Jan 16, 2026

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,877 347 Updated Jan 4, 2024

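MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.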

4,103 287 Updated Jan 3, 2026