howardlau1999 · 🎯 Focusing

Starred repositories

Helpful kernel tutorials and examples for tile-based GPU programming

Python · 524 stars · 29 forks · Updated Jan 2, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python · 2,721 stars · 266 forks · Updated Dec 30, 2025

The ultimate S3-compatible, high-performance object storage in the AI era.

Rust · 177 stars · 7 forks · Updated Dec 29, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python · 2,484 stars · 298 forks · Updated Dec 19, 2025

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python · 481 stars · 27 forks · Updated Nov 19, 2025

Nex Venus Communication Library

C++ · 68 stars · 6 forks · Updated Nov 17, 2025

A better wrapper for RDMA programming APIs, with a Rust flavor.

Rust · 65 stars · 5 forks · Updated Dec 12, 2025

Python · 657 stars · 68 forks · Updated Dec 31, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python · 32,291 stars · 6,654 forks · Updated Dec 31, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 12,514 stars · 1,991 forks · Updated Jan 2, 2026

Production-grade client-side tracing, profiling, and analysis for complex software systems.

C++ · 5,333 stars · 653 forks · Updated Jan 2, 2026

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp · 1,976 stars · 216 forks · Updated Dec 19, 2025

Materials for learning SGLang

709 stars · 51 forks · Updated Dec 15, 2025

slime is an LLM post-training framework for RL Scaling.

Python · 3,086 stars · 381 forks · Updated Jan 2, 2026

Tile primitives for speedy kernels

Cuda · 3,028 stars · 221 forks · Updated Dec 9, 2025

An exabyte-scale, multi-region distributed file system

C++ · 1,253 stars · 79 forks · Updated Dec 22, 2025

High-performance distributed multi-tier cache system. Built in Rust.

Rust · 557 stars · 66 forks · Updated Dec 31, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 1,216 stars · 84 forks · Updated Aug 28, 2025

A workshop that teaches you how to build your own coding agent, similar to Roo Code, Cline, Amp, Cursor, Windsurf, or OpenCode.

Go · 4,629 stars · 499 forks · Updated Nov 25, 2025

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda · 150 stars · 14 forks · Updated Sep 18, 2025

ZeroFS - The Filesystem That Makes S3 your Primary Storage. ZeroFS is 9P/NFS/NBD on top of S3. Initially built for www.merklemap.com

Rust · 1,444 stars · 50 forks · Updated Dec 31, 2025

The s6 supervision suite.

C · 879 stars · 43 forks · Updated Dec 24, 2025

CI at Scale

JavaScript · 421 stars · 262 forks · Updated Jan 1, 2026

C · 210 stars · 76 forks · Updated Sep 15, 2025

Forum for discussing Internet censorship circumvention

Python · 4,801 stars · 108 forks · Updated Sep 30, 2025

A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1

SystemVerilog · 1,102 stars · 86 forks · Updated Aug 21, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python · 436 stars · 15 forks · Updated Dec 16, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python · 8,691 stars · 842 forks · Updated Dec 18, 2025

VPS Fusion Monster Server Test, Go version. Aiming to be the most comprehensive server testing project, implemented in Go with zero environment d…

Go · 1,585 stars · 102 forks · Updated Jan 2, 2026