GitHub - ysharma3501/LinaCodec: A highly compressive and high-quality neural audio codec for speech models.

LinaCodec: Highly compressive audio tokenizer for speech models.

LinaCodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48 kHz audio.
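
The two headline numbers can be related with a little arithmetic. The sketch below only uses the stated 12.5 tokens/sec and 171 bps figures; the implied codebook size is an inference from those numbers, not something the README states, and whether the 171 bps figure includes the global embedding is not specified. The helper name `clip_budget` is hypothetical.

```python
import math

TOKENS_PER_SECOND = 12.5   # stated token rate
BITS_PER_SECOND = 171      # stated bitrate

# Bits carried by each token, and the codebook size that would imply
# (an inference from the two stated figures, not a documented value).
bits_per_token = BITS_PER_SECOND / TOKENS_PER_SECOND
implied_codebook_size = 2 ** bits_per_token

def clip_budget(seconds):
    """Return (token count, payload bytes) for a clip of the given length."""
    tokens = TOKENS_PER_SECOND * seconds
    payload_bytes = BITS_PER_SECOND * seconds / 8
    return tokens, payload_bytes

print(f"{bits_per_token:.2f} bits/token, ~{implied_codebook_size:,.0f} codes")
print(clip_budget(60))  # token count and bytes for one minute of speech
```

Under these assumptions, one minute of speech is 750 tokens and roughly 1.3 kB of token payload.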

Key benefits

Compression: 12.5 tokens/sec (about 60x fewer tokens than DAC).
Audio Quality: 48 kHz output (much clearer than the standard 16 kHz/24 kHz).
Encoder Speed: 200x realtime.
Decoder Speed: 400x realtime (even faster with batching).
Many Tasks: Also indirectly supports voice conversion, audio super-resolution, and audio denoising.

Why is this even useful?

Audio tokenizers directly contribute to speed, quality, and capability of TTS/ASR models. LinaCodec massively improves upon previous codecs in these areas.

Inference Speed: Enables TTS models to run 800x realtime, 8x faster than MiraTTS!
Fast training: High-quality TTS models can be trained in less than 1 day.
Versatile: Works for both Text-to-Speech and Speech-to-Text, unlike most other codecs.

Comparisons

Model	Total Tokens/Sec	Sample Rate
LinaCodec	12.5	48 kHz
DAC	774	44.1 kHz
EnCodec	300	24 kHz
Xcodec2	50	16 kHz
Mimi	200	24 kHz

Lower tokens/sec means faster models and higher sample rate means more clarity.
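
To make the table concrete, the sketch below computes how many tokens a downstream model would have to process for a 10-second clip with each codec, and how that compares to LinaCodec. The rates are copied from the table above; the function names are hypothetical helpers for illustration.

```python
# Token rates (tokens/sec) copied from the comparison table.
TOKEN_RATES = {
    "LinaCodec": 12.5,
    "DAC": 774,
    "EnCodec": 300,
    "Xcodec2": 50,
    "Mimi": 200,
}

def tokens_for_clip(rate, seconds):
    """Number of tokens produced for a clip of the given duration."""
    return rate * seconds

def ratio_vs_linacodec(name):
    """How many times more tokens a codec emits than LinaCodec."""
    return TOKEN_RATES[name] / TOKEN_RATES["LinaCodec"]

for name, rate in TOKEN_RATES.items():
    print(
        f"{name:10s} {tokens_for_clip(rate, 10):7.0f} tokens per 10 s "
        f"({ratio_vs_linacodec(name):.1f}x LinaCodec)"
    )
```

The DAC ratio works out to about 62x, which matches the roughly 60x compression figure quoted under Key benefits.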

Usage

Simple one-line installation:

pip install git+https://github.com/ysharma3501/LinaCodec.git

Reconstruction

from IPython.display import Audio, display
from linacodec.codec import LinaCodec

# Load the model (downloads YatharthS/LinaCodec from Hugging Face)
lina_tokenizer = LinaCodec()

# Get speech tokens and the global embedding
speech_tokens, global_embedding = lina_tokenizer.encode("your_audio_path.wav")

# Decode them into 48 kHz audio
audio = lina_tokenizer.decode(speech_tokens, global_embedding)

# Play the audio (in a notebook)
display(Audio(audio.cpu(), rate=48000))
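
Outside a notebook there is no `display`, so you may want to write the result to disk instead. The following is a minimal standard-library sketch, not part of LinaCodec: `write_wav` is a hypothetical helper, and it assumes the decoded audio can be turned into a flat sequence of mono float samples in [-1, 1], which this README does not confirm. A sine tone stands in for the decoder output so the example runs on its own.

```python
import math
import struct
import wave

SAMPLE_RATE = 48_000  # LinaCodec's stated output sample rate

def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))                # clip to valid range
        pcm += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))

# Stand-in signal: 0.1 s of a 440 Hz tone (4800 samples at 48 kHz).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(4800)]
write_wav("tone_48k.wav", tone)
```

If `audio` is a tensor, as the `.cpu()` call suggests, something like `write_wav("out.wav", audio.cpu().flatten().tolist())` might work, but the output shape and value range are assumptions to check first.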

Voice conversion

# Assumes the model is already loaded as lina_tokenizer
source_wav = "source_wav.wav"  # the content you want
reference_wav = "reference_wav.wav"  # the timbre (style) you want

# Convert the voice
audio = lina_tokenizer.convert_voice(source_wav, reference_wav)

# Play the audio
display(Audio(audio.cpu(), rate=48000))

Audio super resolution

# Get speech tokens and the global embedding from a 24 kHz wav
speech_tokens, global_embedding = lina_tokenizer.encode("your_audio_path.wav")

# Decode them into 48 kHz audio (upsamples from 24 kHz to 48 kHz)
audio = lina_tokenizer.decode(speech_tokens, global_embedding)

# Play the audio
display(Audio(audio.cpu(), rate=48000))

Notes

This is heavily based on kanade-tokenizer, so massive thanks to them!

The key novel parts I added are:

Dual-Path Vocos Decoder: Enables high-quality 48 kHz reconstruction from the original 24 kHz Vocos using only 30 hours of training data (compared to the typical hundreds of hours).
Distilled WavLM Base+: Increases encoder speed while keeping similar quality.
Snake-based upsampling: Uses a custom upsampling block, built on the Snake activation from BigVGAN, to upscale features.
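
For readers unfamiliar with the Snake activation mentioned in the last item, here is a minimal scalar sketch of the function as it is commonly defined, x + sin²(αx)/α. This illustrates the activation only; it is not LinaCodec's upsampling block, whose layout is not described here. In BigVGAN α is a learned per-channel parameter, whereas this sketch takes it as a plain argument.

```python
import math

def snake(x, alpha=1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha.

    The periodic term gives the network an inductive bias toward
    periodic signals such as audio waveforms.
    """
    return x + math.sin(alpha * x) ** 2 / alpha

# Apply the activation element-wise to a few sample values.
values = [-1.0, 0.0, 0.5, math.pi / 2]
activated = [snake(v) for v in values]
print(activated)
```

A larger `alpha` makes the periodic component oscillate faster while shrinking its amplitude.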

Next steps

Release code and model
Release an article on why kanade and LinaCodec work so well at 12.5 tokens/sec compared to other codecs.
Possibly a paper on how these techniques can be applied to any codec.

Stars and likes would be appreciated if you find this helpful, thank you.

Model link: https://huggingface.co/YatharthS/LinaCodec
Email: [email protected]
