Audience
ML infrastructure engineers and researchers seeking a tool offering a deployment framework for GPUs or heterogeneous devices
About Luminal
Luminal is a machine-learning framework built for speed, simplicity, and composability, focusing on static graphs and compiler-based optimization to deliver high performance even for complex neural networks. It compiles models into minimal “primops” (only 12 primitive operations) and then applies compiler passes to replace those with device-specific optimized kernels, enabling efficient execution on GPU or other backends. It supports modules (building blocks of networks with a standard forward API) and the GraphTensor interface (typed tensors and graphs at compile time) for model definition and execution. Luminal’s core remains intentionally small and hackable, with extensibility via external compilers for datatypes, devices, training, quantization, and more. Quick-start guidance shows how to clone the repo, build a “Hello World” example, or run a larger model like LLaMA 3 using GPU features.