🤒 On the way
Beijing (UTC +08:00)
Stars
A PyTorch native library for training speculative decoding models
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Train your Agent model via our easy and efficient framework
My learning notes for ML SYS.
A highly optimized LLM inference acceleration engine for Llama and its variants.
GPU operators for sparse tensor operations
An easy-to-use package for implementing SmoothQuant for LLMs