[Autotuner] Add FROM_BEST_AVAILABLE initial population strategy by fulvius31 · Pull Request #1365 · pytorch/helion

fulvius31 · 2026-01-30T17:39:17Z

Summary

Adds the FROM_BEST_AVAILABLE initial population strategy, which bootstraps autotuning from previously cached best configs. This should address the request for "bootstrapping from a known good config" in #1274.

Target use case: Developers iterating on kernel code who want faster autotuning without sacrificing kernel performance and without falling back to fixed, pre-defined configs.

How it works

Differential Evolution: Starts with default config plus up to 20 matching cached configs from prior runs, fills remainder with random configs to reach population size
Pattern Search: Uses default config plus cached configs directly as initial population (no random fill)
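The two seeding rules above can be sketched as follows. This is a minimal illustration, not Helion's implementation: `build_initial_population`, the dict-based configs, and `random_config` are hypothetical names, and only the cap of 20 cached configs and the fill / no-fill behaviour come from the description.

```python
import random
from typing import Callable, Dict, List, Optional

Config = Dict[str, int]  # hypothetical stand-in for a real config object


def build_initial_population(
    default: Config,
    cached: List[Config],
    population_size: Optional[int],
    random_config: Callable[[], Config],
    max_cached: int = 20,
) -> List[Config]:
    """Seed with the default config plus up to `max_cached` cached configs.

    With a `population_size` (differential-evolution style), pad with random
    configs up to that size. With None (pattern-search style), return the
    seeds unchanged.
    """
    population = [default] + cached[:max_cached]
    if population_size is not None:
        while len(population) < population_size:
            population.append(random_config())
    return population


rng = random.Random(0)
default = {"block_size": 64, "num_warps": 4}
cached = [{"block_size": 128, "num_warps": 8}, {"block_size": 32, "num_warps": 2}]


def random_config() -> Config:
    # Hypothetical search space, for illustration only.
    return {
        "block_size": rng.choice([16, 32, 64, 128]),
        "num_warps": rng.choice([1, 2, 4, 8]),
    }


de_population = build_initial_population(default, cached, 6, random_config)
ps_population = build_initial_population(default, cached, None, random_config)
```

Here `de_population` holds six configs (three seeds plus three random ones), while `ps_population` holds only the three seeds.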

Cache matching uses hardware name + normalized specialization key (tensor dtype, device, shape, strides), filtering out code object references so configs transfer across kernel edits.
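One way to picture the matching step: build a key from the hardware name and the tensor-related parts of the specialization key, and drop anything that refers to a code object. The `TensorSpec` and `CodeRef` types and the tuple layout below are assumptions made for illustration; the real specialization key format is not shown in this description.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical tensor entry of a specialization key."""
    dtype: str
    device: str
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]


@dataclass(frozen=True)
class CodeRef:
    """Hypothetical code-object reference; it changes on every kernel edit."""
    name: str
    code_hash: str


def normalized_cache_key(hardware: str, spec_key: Tuple[Any, ...]) -> Tuple[Any, ...]:
    """Keep the hardware name and tensor specs; drop code-object references."""
    kept = tuple(part for part in spec_key if not isinstance(part, CodeRef))
    return (hardware, kept)


x = TensorSpec("float16", "cuda", (1024, 1024), (1024, 1))
before_edit = (x, CodeRef("matmul", "abc123"))
after_edit = (x, CodeRef("matmul", "def456"))

# Same hardware and tensors, different code hash: the keys still match.
same = normalized_cache_key("RTX 5090", before_edit) == normalized_cache_key("RTX 5090", after_edit)
# Different hardware name: the keys do not match.
different = normalized_cache_key("RTX 5090", before_edit) == normalized_cache_key("H100", before_edit)
```

Because the code reference never enters the key, a config cached before a kernel edit remains a candidate seed after the edit.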

Benchmark results using 1 cached best_config and default PatternSearch

The kernels used are the ones from ~/examples.

Hardware: NVIDIA RTX 5090
torch Version: 2.10.0+cu130
helion Version: 0.2.11.dev7+ga7e94e60c
triton Version: 3.6.0+git9844da95

MatMul Benchmark

| Strategy | Autotune Time | Codegen Calls |
| --- | --- | --- |
| Full Random | 1238s | 5462 |
| FROM_DEFAULT | 65s | 671 |
| FROM_BEST_AVAILABLE (FROM_DEFAULT cache) | 95s | 1002 |
| FROM_BEST_AVAILABLE (FULL cache) | 92s | 855 |

| Implementation | Full Random | FROM_DEFAULT | BEST_AVAIL (default) | BEST_AVAIL (full) |
| --- | --- | --- | --- | --- |
| helion (1) | 0.0183ms (0.95x) | 0.024ms (0.73x) | 0.0186ms (0.91x) | 0.0183ms (0.94x) |
| helion (2) | 0.019ms (1.13x) | 0.0246ms (0.88x) | 0.0204ms (1.06x) | 0.019ms (1.13x) |
| helion (3) | 0.0184ms (1.31x) | 0.0225ms (1.09x) | 0.0225ms (1.11x) | 0.0184ms (1.35x) |
| helion (4) | 0.0184ms (1.27x) | 0.021ms (1.24x) | 0.0214ms (1.21x) | 0.0184ms (1.27x) |
| helion_matmul_autograd | 0.0184ms (0.95x) | 0.0232ms (0.75x) | 0.0187ms (0.90x) | 0.0183ms (0.94x) |
| helion_addmm_autograd | 0.0189ms (1.11x) | 0.0238ms (0.96x) | 0.0211ms (0.99x) | 0.0189ms (1.11x) |
| helion_addmm_autograd_scaled | 0.019ms (1.10x) | 0.024ms (0.96x) | 0.0212ms (0.99x) | 0.019ms (1.10x) |

Result: FROM_BEST_AVAILABLE with a full cache matches Full Random kernel times across all implementations at roughly 13x lower tuning cost (92s vs 1238s). FROM_DEFAULT tunes 19x faster than Full Random (65s vs 1238s) but produces 14-31% slower kernels.
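The ratios quoted in this result can be re-derived from the MatMul figures listed above. The snippet only recomputes numbers already reported; the variable names are ad hoc.

```python
# Autotune times in seconds, from the MatMul strategy table.
full_random_s = 1238
from_default_s = 65
best_avail_full_s = 92

tuning_speedup_best_avail = full_random_s / best_avail_full_s  # about 13.5
tuning_speedup_default = full_random_s / from_default_s        # about 19.0

# Kernel times in ms per implementation: (Full Random, FROM_DEFAULT).
kernel_ms = {
    "helion (1)": (0.0183, 0.024),
    "helion (2)": (0.019, 0.0246),
    "helion (3)": (0.0184, 0.0225),
    "helion (4)": (0.0184, 0.021),
    "helion_matmul_autograd": (0.0184, 0.0232),
    "helion_addmm_autograd": (0.0189, 0.0238),
    "helion_addmm_autograd_scaled": (0.019, 0.024),
}

# Percentage by which the FROM_DEFAULT kernel is slower than Full Random.
slowdown_pct = {
    name: (from_default / full_random - 1.0) * 100.0
    for name, (full_random, from_default) in kernel_ms.items()
}
min_slowdown = min(slowdown_pct.values())  # about 14% (helion (4))
max_slowdown = max(slowdown_pct.values())  # about 31% (helion (1))
```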

Softmax Benchmark

| Strategy | Autotune Time | Codegen Calls |
| --- | --- | --- |
| Full Random | 808s | 2627 |
| FROM_DEFAULT | 32s | 220 |
| FROM_BEST_AVAILABLE (FROM_DEFAULT cache) | 42s | 305 |
| FROM_BEST_AVAILABLE (FULL cache) | 53s | 423 |

| Implementation | Full Random | FROM_DEFAULT | BEST_AVAIL (default) | BEST_AVAIL (full) |
| --- | --- | --- | --- | --- |
| Helion Simple | 0.02ms (2.34x) | 0.0164ms (2.50x) | 0.0169ms (2.40x) | 0.0191ms (2.33x) |
| Helion Two Pass | 0.0199ms (2.35x) | 0.0225ms (1.83x) | 0.0168ms (2.43x) | 0.0169ms (2.63x) |
| Helion (Aggregate) | 0.0231ms (2.04x) | 0.0225ms (1.86x) | 0.019ms (2.16x) | 0.0231ms (2.07x) |

Result: Mixed outcome. FROM_DEFAULT wins for Helion Simple (best kernel at lowest cost), but FROM_BEST_AVAILABLE (default cache) wins for Helion Two Pass.

Key takeaways

FROM_BEST_AVAILABLE with a good cache matches Full Random quality at ~13x less cost (MatMul: all 7 implementations match or beat Full Random)
FROM_DEFAULT is fastest but can miss optimal configs when the default is far from the best (MatMul: 14-31% slower kernels)
Cache quality matters: Full effort cache outperforms default cache in MatMul; results vary for Softmax
Workload-dependent: When default is already near-optimal (Softmax Simple), cache-seeding adds overhead without benefit

When to use

FROM_BEST_AVAILABLE: When iterating on kernel code and you have prior tuning runs for similar shapes/hardware
FROM_DEFAULT: Quick iteration when no relevant cache exists or the default is known to be good. In any case, if effort is set to quick, FROM_BEST_AVAILABLE will use FROM_DEFAULT as a config anyway
Full Random: Offline profiling when compile time is not a concern

Usage

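HELION_AUTOTUNE_EFFORT=quick HELION_AUTOTUNER_INITIAL_POPULATION=from_best_available python examples/matmul.py
HELION_AUTOTUNE_EFFORT=quick HELION_AUTOTUNER_INITIAL_POPULATION=from_best_available python examples/softmax.py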

Configuration

| Env var | Default | Description |
| --- | --- | --- |
| `HELION_BEST_AVAILABLE_MAX_CONFIGS` | 20 | Max cached configs to seed |
| `HELION_BEST_AVAILABLE_MAX_CACHE_SCAN` | 500 | Max cache files to scan |
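A sketch of how such limits might be read, using the defaults listed above. `read_int_env` is a hypothetical helper and not Helion's settings API; only the variable names and default values are taken from this description.

```python
import os


def read_int_env(name: str, default: int) -> int:
    """Return the integer value of env var `name`, or `default` if unset or empty."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    return value


max_configs = read_int_env("HELION_BEST_AVAILABLE_MAX_CONFIGS", 20)
max_cache_scan = read_int_env("HELION_BEST_AVAILABLE_MAX_CACHE_SCAN", 500)
```

Lowering the scan limit would bound the time spent walking the cache directory, at the cost of possibly missing a matching config.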

fulvius31 · 2026-02-03T20:23:43Z

I think it's pretty ready. Could you take a look when you have a moment? @jansel @v0i0 @oulgen

fulvius31 · 2026-02-09T20:31:20Z

@jansel I don't think the failed test is related to this PR

jansel

@fulvius31 can you rebase and resolve the merge conflict? That might also fix the test.

fulvius31 · 2026-02-26T17:37:38Z

> Test failures related?

@jansel I don't think so. I think they have been failing since #1542.

fulvius31 · 2026-02-28T22:42:48Z

@jansel I don't think the test failures are related to this PR.

jansel · 2026-03-03T06:59:50Z

@fulvius31 can you rebase and fix merge conflicts?

fulvius31 · 2026-03-03T13:10:32Z

> @fulvius31 can you rebase and fix merge conflicts?

@jansel done

fulvius31 marked this pull request as draft January 30, 2026 17:39

meta-cla Bot added the CLA Signed label Jan 30, 2026

fulvius31 mentioned this pull request Jan 30, 2026

Tips for faster but still performant autotuning? #1274

Open

fulvius31 force-pushed the from-warm-start-test branch 4 times, most recently from 86dc446 to b4d04a7 Compare February 3, 2026 19:51

fulvius31 changed the title ~~[WIP][Autotuner] Add FROM_BEST_AVAILABLE initial population strategy~~ [Autotuner] Add FROM_BEST_AVAILABLE initial population strategy Feb 3, 2026

fulvius31 marked this pull request as ready for review February 3, 2026 20:21

fulvius31 force-pushed the from-warm-start-test branch 5 times, most recently from 7df1c0a to ff8d8c8 Compare February 6, 2026 16:06

fulvius31 added 12 commits February 8, 2026 07:27

from_best_available initial commit

c1d5178

Removed unnecessary shrink_config call

d9af6ec

Moved imports to top Made MAX_BEST_AVAILABLE_CONFIGS configurable Added cache scan limit configurable Updated tests accordingly

Optimizations

7348f0c

Rename test_warm_start to reflect from_best_available

STRUCTURAL_LIST_FIELDS moved

6a07392

_build_key_index_mapping now derives order programmatically from ConfigSpec attributes instead of hardcoded list

31fd068

Improvements

d4d5d82

kernel hash was not needed so removed

ca069e5

other improvements

take in consideration of non-nvidia devices

37a3355

avoid too verbose logging if not debug

e0023d3

rename to best_available

babc82d

fix test

705f74f

fix test

c10607d

fulvius31 force-pushed the from-warm-start-test branch from 2c448da to c10607d Compare February 8, 2026 13:27

jansel requested changes Feb 10, 2026

fulvius31 added 3 commits February 26, 2026 10:52

address feedback

22ea7d4

Merge remote-tracking branch 'upstream' into from-warm-start-test

53b4f6c

less verbose docstring

e96261b

fulvius31 requested a review from jansel February 26, 2026 17:37

fulvius31 added 5 commits February 26, 2026 16:58

Merge branch 'main' into from-warm-start-test

5a44e03

Merge branch 'main' into from-warm-start-test

fb32e44

fix lint

95e91fb

Merge branch 'main' into from-warm-start-test

cb5697c

Merge branch 'main' into from-warm-start-test

8095d82

jansel requested changes Feb 28, 2026

Comment thread helion/autotuner/config_spec.py Outdated

Comment thread helion/autotuner/local_cache.py

fulvius31 added 2 commits February 28, 2026 16:25

avoid isinstance

239f967

Merge remote-tracking branch 'upstream/main' into from-warm-start-test

ce72127

fulvius31 requested a review from jansel February 28, 2026 22:41

fulvius31 added 2 commits March 1, 2026 00:00

get hardware on local_cache using _compat if cuda is available

e21399a

get hardware unified

be08efe

jansel requested changes Mar 1, 2026

Comment thread helion/_compat.py Outdated

rename dev to device and use it even for cuda

9d84295

fulvius31 requested a review from jansel March 1, 2026 23:24

Merge branch 'main' into from-warm-start-test

d89d62b

jansel approved these changes Mar 3, 2026

fix conflict and update test

549bfef

fulvius31 force-pushed the from-warm-start-test branch from 653df7f to 549bfef Compare March 3, 2026 13:09

jansel merged commit 5b02214 into pytorch:main Mar 3, 2026
17 of 19 checks passed

nullplay pushed a commit to nullplay/helion that referenced this pull request Mar 17, 2026

[Autotuner] Add FROM_BEST_AVAILABLE initial population strategy (pytorch#1365)

8618a69

umechand-amd pushed a commit to umechand-amd/helion that referenced this pull request Mar 23, 2026

[Autotuner] Add FROM_BEST_AVAILABLE initial population strategy (pytorch#1365)

c1a6e03

fulvius31 mentioned this pull request Mar 24, 2026

Remove FROM_DEFAULT, unify initial population on FROM_BEST_AVAILABLE with pad control #1809

Merged

jansel merged 104 commits into pytorch:main from fulvius31:from-warm-start-test

2 participants
