Machine Learning

Modern implementations of LLM architectures, sharding strategies, and kernel optimizations.

Core

Transformer Architecture

Positional Encoder

  • Positional Encoder Sinusoidal in NumPy
  • RoPE in NumPy
  • RoPE GPT-NeoX in NumPy
📊 Positional encoding visualizations (images): Sinusoidal, RoPE, RoPE GPT-NeoX
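
For reference, here is a minimal NumPy sketch of the schemes listed above (assumptions mine: even d_model, base 10000, and the adjacent-pair RoPE layout rather than the NeoX half-split):

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """pe[pos, 2i] = sin(pos / 10000^(2i/d)), pe[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    inv_freq = 1.0 / 10000 ** (np.arange(0, d_model, 2) / d_model)
    angles = pos * inv_freq                                  # (seq_len, d/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def rope(x):
    """Rotate each adjacent (even, odd) feature pair of x by pos * theta_i."""
    seq_len, d = x.shape
    inv_freq = 1.0 / 10000 ** (np.arange(0, d, 2) / d)
    angles = np.arange(seq_len)[:, None] * inv_freq          # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

The GPT-NeoX variant differs only in layout: it rotates the first half of the feature dimension against the second half (x[:, :d//2] with x[:, d//2:]) instead of adjacent pairs.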

Sharding strategies

Scaling plots

The following are roofline analyses for different architectures, assuming non-fused operations.

  • MLP roofline analysis in NumPy
  • Multi-Head Attention roofline analysis in NumPy
📊 Roofline plots (images): MLP, Attention
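
A hedged sketch of the accounting behind these plots, for an unfused MLP y = relu(x @ W1) @ W2. The sizes and hardware peaks below are illustrative assumptions, not measurements from this repo:

```python
def mlp_roofline(B, d, f, bytes_per_el=2):
    """FLOPs, HBM bytes, and arithmetic intensity for y = relu(x @ W1) @ W2
    executed as three separate (non-fused) kernels."""
    flops = 2 * B * d * f + B * f + 2 * B * f * d
    bytes_moved = bytes_per_el * (
        (B * d + d * f + B * f)       # matmul1: read x, W1; write h
        + (B * f + B * f)             # relu: read h; write h
        + (B * f + f * d + B * d))    # matmul2: read h, W2; write y
    return flops, bytes_moved, flops / bytes_moved

# Hypothetical example: a chip with ~312 TFLOP/s bf16 and ~2 TB/s HBM has a
# machine balance of ~156 FLOPs/byte; below that, the op is memory-bound.
flops, byts, intensity = mlp_roofline(B=1, d=4096, f=16384)
print(f"intensity = {intensity:.1f} FLOPs/byte")  # ~1 at B=1: memory-bound
```

At small batch the weight reads dominate the byte count, which is why the unfused roofline sits far below peak until the batch grows.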

NumPy Tutorial

JAX Tutorial

PyTorch Notes

  • Torch distributed API.
  • Prefer the in-place tensor collectives such as dist.all_gather_into_tensor and dist.reduce_scatter_tensor, which aggregate along the first dimension, over the legacy list-based primitives.
  • Custom autograd ops for training require subclassing torch.autograd.Function, with @staticmethod forward/backward and ctx.save_for_backward for tensors reused in the backward pass (see the sketch below).
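
A minimal sketch tying the last two notes together (my own example, not code from this repo): a differentiable all-gather along dim 0, assuming an initialized default process group. The forward uses dist.all_gather_into_tensor; the backward reduce-scatters the incoming gradient so each rank receives the summed gradient for its shard.

```python
import torch
import torch.distributed as dist

class AllGather(torch.autograd.Function):
    """Gather dim-0 shards from all ranks, differentiably."""

    @staticmethod
    def forward(ctx, shard):
        world = dist.get_world_size()
        out = torch.empty(world * shard.shape[0], *shard.shape[1:],
                          dtype=shard.dtype, device=shard.device)
        dist.all_gather_into_tensor(out, shard.contiguous())
        ctx.shard_rows = shard.shape[0]  # only a shape is needed in backward
        return out

    @staticmethod
    def backward(ctx, grad_out):
        # Every rank holds gradients for the full gathered tensor; the gradient
        # of one shard is the sum over ranks of the slice that shard produced.
        grad_shard = torch.empty(ctx.shard_rows, *grad_out.shape[1:],
                                 dtype=grad_out.dtype, device=grad_out.device)
        dist.reduce_scatter_tensor(grad_shard, grad_out.contiguous())
        return grad_shard

# usage: full = AllGather.apply(local_shard)
```

Only a shape is needed in backward here, so it is stashed on ctx directly; ctx.save_for_backward is the sanctioned route when backward needs tensors produced in forward.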
