- Sun Yat-sen University
- Hangzhou
- (UTC +08:00)
- https://blog.howardlau.me
- @howardlau1999
- https://www.zhihu.com/people/liu-hao-hua-32
- in/liuhaohua
Starred repositories
Helpful kernel tutorials and examples for tile-based GPU programming
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
The ultimate S3 compatible high performance object storage in the AI era.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
An early research stage expert-parallel load balancer for MoE models based on linear programming.
A better wrapper for using RDMA programming APIs in Rust flavor
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Production-grade client-side tracing, profiling, and analysis for complex software systems.
A machine learning accelerator core designed for energy-efficient AI at the edge.
slime is an LLM post-training framework for RL Scaling.
An exabyte-scale, multi-region distributed file system
High-performance distributed multi-tier cache system. Built in Rust.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A workshop that teaches you how to build your own coding agent. Similar to Roo code, Cline, Amp, Cursor, Windsurf or OpenCode.
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
ZeroFS - The Filesystem That Makes S3 your Primary Storage. ZeroFS is 9P/NFS/NBD on top of S3. Initially built for www.merklemap.com
Forum for discussing Internet censorship circumvention
A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
VPS Fusion Monster Server Test, Go version. Aiming to be the most comprehensive server testing project, implemented in Go with zero environment d…



