Remove FROM_DEFAULT, unify initial population on FROM_BEST_AVAILABLE with pad control #1809
log: logging.Logger = logging.getLogger(__name__)

def should_skip_cache() -> bool:
Is it not possible to access this information from self.settings like other configurations?
@hinriksnaer Thanks for the comment!
I'm following the same behavior as before, see helion/helion/autotuner/base_cache.py, line 186 in f460986.
self._skip_cache is also used, though.
Might be cleaner to use the runtime settings if possible but I am not sure if there is anything stopping us from doing that.
Yes, HELION_SKIP_CACHE isn't wired through settings, unlike the other env vars, maybe because it's only for debugging? The one users probably need is HELION_FORCE_AUTOTUNE, which is already in settings. The two env vars partially overlap.
I'm not sure if we should add the env var to settings, let's wait for the maintainers to have a look!
I just opened #1815 related to this topic
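For readers following the thread, here is a minimal sketch of an environment-driven helper of the kind being discussed. It is not the code from this PR: the optional env argument, the set of values treated as "disabled", and the load_cached_configs call site are all assumptions made for illustration. Only the names should_skip_cache, HELION_SKIP_CACHE, and skip_cache come from the discussion.

```python
import os
from typing import Callable, Iterable, Mapping, Optional

# Values treated as "disabled". This set is an assumption for the sketch,
# not necessarily how Helion parses the variable.
_DISABLED_VALUES = frozenset({"", "0", "false", "no", "off"})


def should_skip_cache(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when HELION_SKIP_CACHE is set to an enabling value.

    The optional env mapping is hypothetical; it only exists so the
    helper can be exercised without touching the real environment.
    """
    source = os.environ if env is None else env
    raw = source.get("HELION_SKIP_CACHE", "")
    return raw.strip().lower() not in _DISABLED_VALUES


def load_cached_configs(
    read_cache: Callable[[], Iterable],
    skip_cache: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> list:
    """Hypothetical call site: every cache read goes through one check."""
    if skip_cache or should_skip_cache(env):
        return []
    return list(read_cache())
```

Routing every cache read through a single predicate is what makes the behavior consistent; whether that predicate reads the environment directly or a settings object is the open question in this thread.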
…LE with no random padding
- Remove FROM_DEFAULT enum value and all its branches
- Add best_available_pad_random flag: True (full) pads to budget with random configs, False (quick) uses only default + cached
- Quick effort profile now uses from_best_available + pad_random=False, degrading to just the default when no cache exists
- Extract should_skip_cache() helper; make _find_similar_cached_configs respect HELION_SKIP_CACHE and skip_cache param
- Fix DE to use random_flat() instead of random_population_flat() to avoid duplicate default
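The seeding behavior described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not Helion's implementation: build_initial_population, its parameters, the hashable Config stand-in, and the retry cap are all hypothetical. The only behavior taken from the commit message is that the population starts from the default plus cached configs, and that random padding up to the budget happens only when the pad flag is set.

```python
from typing import Callable, Hashable, List, Sequence

Config = Hashable  # stand-in for a real config object (assumption)


def build_initial_population(
    default: Config,
    cached: Sequence[Config],
    random_config: Callable[[], Config],
    budget: int,
    pad_random: bool,
    max_attempts_per_slot: int = 10,
) -> List[Config]:
    """Seed with default + cached configs, optionally pad with random ones."""
    if budget < 1:
        raise ValueError("budget must be at least 1")

    population: List[Config] = []
    seen = set()

    def add(cfg: Config) -> None:
        # Skip duplicates (e.g. a second copy of the default) and respect the budget.
        if cfg not in seen and len(population) < budget:
            seen.add(cfg)
            population.append(cfg)

    add(default)
    for cfg in cached:
        add(cfg)

    if pad_random:
        # Bounded retries so a tiny search space cannot loop forever.
        attempts_left = max_attempts_per_slot * budget
        while len(population) < budget and attempts_left > 0:
            add(random_config())
            attempts_left -= 1

    return population
```

With pad_random=False and an empty cache the result is just the default config, which matches the cold-cache behavior described for the quick profile. Deduplicating on insertion is also one way to avoid a duplicate default, though the PR addresses that in DE by switching to random_flat().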
Doesn't look like it is working in this repo yet.
Summary
- Remove the FROM_DEFAULT initial population strategy, since it's a strict subset of FROM_BEST_AVAILABLE with no cache
- Quick effort now uses from_best_available with best_available_pad_random=False: reuses cached configs when available, degrades to just the default on cold cache
- Full effort keeps from_best_available: pads with random configs to fill initial_population
- HELION_SKIP_CACHE now consistently skips all cache reads, including the config history scan in _find_similar_cached_configs
- Fix DE to avoid random_population_flat(), which injected a duplicate default config

Benchmark results
Benchmarks on RTX 5090 (torch 2.10.0+cu130, triton 3.6.0) show that FROM_BEST_AVAILABLE with a good cache matches Full Random kernel quality at 13x less tuning cost for MatMul (92s vs 1238s, 7 implementations). The removed FROM_DEFAULT was 19x faster to tune but produced 14-31% slower kernels. For Softmax, results are workload-dependent: cache-seeding helps when the default config is far from optimal but adds overhead when it's already near-optimal.

Full benchmark details and the original FROM_BEST_AVAILABLE introduction are in #1365.

Test
Adjusted test_best_available accordingly.