RoPE kernel #2415

Merged: ethche merged 1 commit into main from helion-rope-pretuned on May 13, 2026

@ethche (Contributor) commented May 13, 2026

Adds a Helion RoPE kernel along with pre-tuned configs for the tritonbench stock shapes. The kernel is structured as a fused Q/K RoPE pass over (batch, seq) tiles. Each program:

  • loads cos/sin once for the tile
  • uses hl.split / hl.join to operate on the two rotary halves
  • applies RoPE to both Q and K in the same kernel
  • writes the rotated outputs directly, avoiding separate per-tensor launches

The pre-tuned configs are saved in pretuned_kernels/rope.
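
For readers unfamiliar with the arithmetic behind the per-tile steps above, here is a minimal plain-Python reference, not the Helion kernel itself. It assumes the common "rotate-half" RoPE convention (first half of the head dimension paired with the second half) and a frequency base of 10000; the function names `rope_rotate_half` and `fused_qk_rope` are hypothetical and do not come from this PR. It only illustrates the idea of computing cos/sin once and reusing them for both Q and K.

```python
import math


def rope_rotate_half(x, cos, sin):
    """Rotate one head vector x (length D) using per-pair cos/sin.

    cos and sin have length D // 2: one entry per (x1[i], x2[i]) pair.
    """
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]  # the two rotary halves
    out1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    out2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return out1 + out2  # join the halves back together


def fused_qk_rope(q, k, position, base=10000.0):
    """Apply RoPE to one Q vector and one K vector at the same position.

    cos/sin are computed once and shared by Q and K, which is the
    reuse a fused Q/K pass is meant to exploit. Assumes len(q) == len(k).
    """
    half = len(q) // 2
    angles = [position * base ** (-i / half) for i in range(half)]
    cos = [math.cos(a) for a in angles]
    sin = [math.sin(a) for a in angles]
    return rope_rotate_half(q, cos, sin), rope_rotate_half(k, cos, sin)
```

Because each (x1[i], x2[i]) pair undergoes a plain 2D rotation, the output has the same norm as the input, which makes a convenient sanity check when comparing a kernel against a reference.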

This PR also changes test_pretuned_kernels.py. The main change restructures the performance test so that it can support hardware other than B200.

Summary results:

device  type  Liger avg speedup  Inductor avg speedup  Helion avg speedup  Helion accuracy
------  ----  -----------------  --------------------  ------------------  ---------------
H100    fwd               2.73x                 4.61x               5.60x                1
H100    bwd               2.37x                 4.91x               5.63x                1

All results:

## Forward, H100

(H, T)          baseline ms       Liger ms  Liger x   Inductor ms  Inductor x   Helion ms  Helion x  Helion acc
-------------  -----------  -------------  -------  ------------  ----------  ----------  --------  ----------
(8192, 1024)      0.230880       0.085056    2.71x      0.046240       4.99x    0.040512     5.70x           1
(8192, 2048)      0.450176       0.169280    2.66x      0.085408       5.27x    0.084864     5.30x           1
(8192, 4096)      0.899168       0.342080    2.63x      0.183712       4.89x    0.166528     5.40x           1
(8192, 8192)      1.778464       0.674784    2.64x      0.485568       3.66x    0.323904     5.49x           1
(8192, 16384)     3.898144       1.331168    2.93x      1.180832       3.30x    0.635168     6.14x           1
(512, 2048)       0.051968       0.018784    2.77x      0.010272       5.06x    0.008320     6.25x           1
(2048, 2048)      0.117920       0.041536    2.84x      0.026592       4.43x    0.021216     5.56x           1
average           0.984600       0.353972    2.73x      0.263068       4.61x    0.170636     5.64x           1

## Backward, H100

(H, T)          baseline ms       Liger ms  Liger x   Inductor ms  Inductor x   Helion ms  Helion x  Helion acc
-------------  -----------  -------------  -------  ------------  ----------  ----------  --------  ----------
(8192, 1024)      0.302944       0.129216    2.34x      0.073280       4.13x    0.072768     4.16x           1
(8192, 2048)      0.625056       0.266400    2.35x      0.085824       7.28x    0.085248     7.33x           1
(8192, 4096)      1.245312       0.526784    2.36x      0.164448       7.57x    0.166496     7.48x           1
(8192, 8192)      2.480416       1.029600    2.41x      0.444544       5.58x    0.323904     7.66x           1
(8192, 16384)     5.275840       2.034432    2.59x      1.181600       4.47x    0.634048     8.32x           1
(512, 2048)       0.151616       0.060800    2.49x      0.069152       2.19x    0.095552     1.59x           1
(2048, 2048)      0.256672       0.109024    2.35x      0.105120       2.44x    0.053088     4.83x           1
average           1.370692       0.552772    2.41x      0.276224       5.12x    0.189544     6.09x           1
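
As a reading aid for the tables above: the per-shape speedup columns appear to be the baseline time divided by each implementation's time. The sketch below reproduces two Helion entries from the tables under that assumption. The helper name `speedup` is hypothetical, and how the "average" rows are aggregated is not stated in this PR, so the sketch covers per-shape rows only.

```python
def speedup(baseline_ms, kernel_ms):
    """Speedup of a kernel relative to the baseline (higher is faster)."""
    return baseline_ms / kernel_ms


# Forward, H100, (8192, 1024): baseline vs. Helion
print(f"{speedup(0.230880, 0.040512):.2f}x")  # 5.70x

# Backward, H100, (512, 2048): baseline vs. Helion
print(f"{speedup(0.151616, 0.095552):.2f}x")  # 1.59x
```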

@meta-cla Bot added the CLA Signed label May 13, 2026
@ethche ethche force-pushed the helion-rope-pretuned branch from 4e6a6cc to 376cce3 Compare May 13, 2026 03:36
@ethche ethche force-pushed the helion-rope-pretuned branch from 376cce3 to ad4dfd7 Compare May 13, 2026 03:57
@ethche ethche merged commit 99d62e0 into main May 13, 2026
23 checks passed
@choijon5 (Contributor) commented

@ethche could you pretune this for B200 also?
