-
sgl-learning-materials Public
Forked from sgl-project/sgl-learning-materials
Materials for learning SGLang
MIT License Updated Aug 23, 2025 -
accelerated-computing-hub Public
Forked from NVIDIA/accelerated-computing-hub
NVIDIA curated collection of educational resources related to general purpose GPU programming.
Jupyter Notebook Other Updated Jun 7, 2025 -
sumi-emu Public
Forked from ovsky/sumi-emu
Sumi | The latest, best and especially most performant Nintendo Switch emulator! Run Nintendo Switch titles on your Android, Windows, Mac and Linux devices :)
C++ GNU General Public License v3.0 Updated Jun 4, 2025 -
mgpusim Public
Forked from sarchlab/mgpusim
A highly-flexible GPU simulator for AMD GPUs.
Go MIT License Updated May 30, 2025 -
astra-sim Public
Forked from astra-sim/astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
C++ MIT License Updated May 7, 2025 -
CutlassAcademy Public
Forked from MekkCyber/CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
1 Updated May 6, 2025 -
GenZ-LLM-Analyzer Public
Forked from abhibambhaniya/GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
Jupyter Notebook MIT License Updated Apr 30, 2025 -
scale-sim-v2 Public
Forked from scalesim-project/SCALE-Sim
Repository to host and maintain scale-sim-v2 code
Python MIT License Updated Apr 23, 2025 -
nvbandwidth Public
Forked from NVIDIA/nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
C++ Apache License 2.0 Updated Apr 15, 2025 -
fast.cu Public
Forked from pranjalssh/fast.cu
Fastest kernels written from scratch
Cuda MIT License Updated Apr 3, 2025 -
timeloop Public
Forked from NVlabs/timeloop
Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.
C++ BSD 3-Clause "New" or "Revised" License Updated Mar 31, 2025 -
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
-
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Mar 21, 2025 -
DocumentSASS Public
Forked from 0xD0GF00D/DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
Python MIT License Updated Mar 10, 2025 -
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Mar 6, 2025 -
DeepEP Public
Forked from deepseek-ai/DeepEP
DeepEP: an efficient expert-parallel communication library
-
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Feb 21, 2025 -
lectures Public
Forked from gpu-mode/lectures
Material for gpu-mode lectures
Jupyter Notebook Apache License 2.0 Updated Feb 9, 2025 -
cudahandbook Public
Forked from ArchaeaSoftware/cudahandbook
Source code that accompanies The CUDA Handbook.
Cuda Updated Feb 5, 2025 -
CUDA-Learn-Notes Public
Forked from xlite-dev/LeetCUDA
📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).


