Howard Lau howardlau1999

🎯

Focusing

Former intern at @aliyun Object Storage Service, @pingcap, and @Tencent WXG

621 followers · 451 following

Sun Yat-sen University
Hangzhou
UTC +08:00
https://blog.howardlau.me
@howardlau1999
https://www.zhihu.com/people/liu-hao-hua-32
in/liuhaohua

Achievements

x3 x2

Starred repositories

NVIDIA / TileGym

Helpful kernel tutorials and examples for tile-based GPU programming

Python 524 29 Updated Jan 2, 2026

sgl-project / mini-sglang

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,721 266 Updated Dec 30, 2025

fractalbits-labs / fractalbits-main

The ultimate S3 compatible high performance object storage in the AI era.

Rust 177 7 Updated Dec 29, 2025

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,484 298 Updated Dec 19, 2025

deepseek-ai / LPLB

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 481 27 Updated Nov 19, 2025

nex-agi / NexVenusCL

Nex Venus Communication Library

C++ 68 6 Updated Nov 17, 2025

RDMA-Rust / sideway

A better wrapper for using RDMA programming APIs in Rust flavor

Rust 65 5 Updated Dec 12, 2025

radixark / miles

Python 657 68 Updated Dec 31, 2025

huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 32,291 6,654 Updated Dec 31, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,514 1,991 Updated Jan 2, 2026

google / perfetto

Production-grade client-side tracing, profiling, and analysis for complex software systems.

C++ 5,333 653 Updated Jan 2, 2026

google-coral / coralnpu

A machine learning accelerator core designed for energy-efficient AI at the edge.

Emacs Lisp 1,976 216 Updated Dec 19, 2025

sgl-project / sgl-learning-materials

Materials for learning SGLang

709 51 Updated Dec 15, 2025

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python 3,086 381 Updated Jan 2, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,028 221 Updated Dec 9, 2025

XTXMarkets / ternfs

An exabyte-scale, multi-region distributed file system

C++ 1,253 79 Updated Dec 22, 2025

CurvineIO / curvine

High-performance distributed multi-tier cache system. Built in Rust.

Rust 557 66 Updated Dec 31, 2025

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,216 84 Updated Aug 28, 2025

ghuntley / how-to-build-a-coding-agent

A workshop that teaches you how to build your own coding agent. Similar to Roo code, Cline, Amp, Cursor, Windsurf or OpenCode.

Go 4,629 499 Updated Nov 25, 2025

KuangjuX / NVSHMEM-Tutorial

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 150 14 Updated Sep 18, 2025

Barre / ZeroFS

ZeroFS - The Filesystem That Makes S3 your Primary Storage. ZeroFS is 9P/NFS/NBD on top of S3. Initially built for www.merklemap.com

Rust 1,444 50 Updated Dec 31, 2025

skarnet / s6

The s6 supervision suite.

C 879 43 Updated Dec 24, 2025

taskcluster / taskcluster

CI at Scale

JavaScript 421 262 Updated Jan 1, 2026

LGA1150 / nf_deaf

C 210 76 Updated Sep 15, 2025

net4people / bbs

Forum for discussing Internet censorship circumvention

Python 4,801 108 Updated Sep 30, 2025

tiny-tpu-v2 / tiny-tpu

A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1

SystemVerilog 1,102 86 Updated Aug 21, 2025

NVIDIA / tilus

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 436 15 Updated Dec 16, 2025

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,691 842 Updated Dec 18, 2025

oneclickvirt / ecs

VPS Fusion Monster Server Test, Go version. Aiming to be the most comprehensive server testing project, implemented in Go with zero environment d…

Starred repositories

mahchine-leaning

Natural language processing