auzxb
😌
I may be slow to respond.


A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.

Python 112 7 Updated Jun 4, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,570 83 Updated Nov 16, 2025

GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.

Python 703 54 Updated Jan 7, 2025

Reproduce R1-Zero on logic puzzles

Python 2,431 164 Updated Mar 20, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,449 981 Updated Jan 20, 2026

Fully open reproduction of DeepSeek-R1

Python 25,858 2,410 Updated Nov 24, 2025

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python 286 16 Updated Sep 16, 2024

A self-adaptation framework 🐙 that adapts LLMs to unseen tasks in real time!

Python 1,186 140 Updated Jan 30, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 28,094 2,604 Updated Feb 6, 2026

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 289 21 Updated Oct 12, 2025

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 1,062 122 Updated Aug 7, 2024

An open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.

Python 3,521 307 Updated Nov 5, 2024

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,542 92 Updated Jul 20, 2024

📖 A repository organizing papers, code, and other resources related to unified multimodal models.

798 43 Updated Oct 10, 2025

[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Python 72 4 Updated Dec 23, 2024

[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝

Python 643 65 Updated Jul 26, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 636 70 Updated Nov 26, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 28,500 2,884 Updated Apr 30, 2025

The open source code for LLM-Codec

Python 145 10 Updated Aug 18, 2024

Community interface for generative AI

TypeScript 9,053 931 Updated Apr 30, 2024

Official Implementation of EnCLAP (ICASSP 2024)

Python 94 5 Updated Jun 2, 2024

Implementation of Google's USM speech model in PyTorch

Python 34 5 Updated Jan 18, 2026

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 315 25 Updated Apr 29, 2025

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 468 75 Updated Sep 18, 2025

State-of-the-art audio codec with a 90x compression factor. Supports 44.1 kHz, 24 kHz, and 16 kHz mono/stereo audio.

Python 1,708 168 Updated Jan 26, 2026

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,879 348 Updated Jan 4, 2024

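MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, down to "Martian script" (火星文) internet slang. It includes every form of plain-text Chinese data: news, essays, novels, books, magazines, academic papers, script dialogue, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.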

4,115 287 Updated Jan 31, 2026