
How to use the new autotune introduced in https://github.com/pytorch/torchdynamo/pull/1338 #2023


Closed
ayoub-louati opened this issue Jan 9, 2023 · 3 comments

Comments

@ayoub-louati

Hello,
How can we use the new decorator related to caching_autotune, introduced in #1338, with a newly defined kernel? Here is an example of the kernel's signature:

def kernel_fma(
    C,  # Pointers to matrices
    ACT_INPUTS,
    A,
    B,
    bias,
    # Matrix dimensions
    M,
    N,
    K,
    CACHE_KEY_M,
    CACHE_KEY_N,
    CACHE_KEY_K,
    # The stride variables represent how much to increase the ptr by when moving by 1
    # element in a particular dimension. E.g. stride_am is how much to increase a_ptr
    # by to get the element one row down (A has M rows)
    stride_om,
    stride_on,
    stride_im,
    stride_ik,
    stride_wn,
    stride_wk,
    # Meta-parameters
    BLOCK_M: tl.constexpr,
    GROUP_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    BLOCK_K: tl.constexpr,
    # split k not used, not performant with activation, kept because early_config_prune is expecting it
    SPLIT_K: tl.constexpr,
    EVEN_K: tl.constexpr,
    BIAS: tl.constexpr,
    SAVE_ACT_INPUTS: tl.constexpr,
    ACTIVATION: tl.constexpr,
)

I introduced this decorator:

# assuming CachingAutotuner comes from inductor's internal autotune module,
# e.g. from torch._inductor.triton_ops.autotune import CachingAutotuner
def autotune(configs, meta, save_cache_hook=False):
    def decorator(fn):
        return CachingAutotuner(
            fn,
            meta=meta,
            configs=configs,
            # save_cache_hook=False forces autotuning on every run (nothing is cached to disk)
            save_cache_hook=save_cache_hook,
        )

    return decorator

based on this test example: pytorch/test_torchinductor.py at fae821c2f166fccab6a3c34e293c7268f61e82ba · pytorch/pytorch · GitHub
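
Concretely, I apply the wrapper roughly like this (just a sketch: the config values are placeholders, kernel_meta stands for the metadata dict I build by hand from the linked test, and it assumes kernel_fma is a @triton.jit function):

import triton

# illustrative tuning space; block sizes and warp counts are placeholders, not tuned values
configs = [
    triton.Config(
        {"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32, "SPLIT_K": 1, "GROUP_M": 8},
        num_stages=3,
        num_warps=4,
    ),
    triton.Config(
        {"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 32, "SPLIT_K": 1, "GROUP_M": 8},
        num_stages=3,
        num_warps=8,
    ),
]

# kernel_meta must be filled in by hand (argument signature, constants, ...)
# following the linked test; its exact contents are internal and version-specific
kernel_meta = {}

kernel_fma_tuned = autotune(configs=configs, meta=kernel_meta)(kernel_fma)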

But I thought there might be a better way to use the caching_autotune.

Thanks in advance,

@jansel
Contributor

jansel commented Jan 11, 2023

Are you trying to use this for handwritten Triton kernels without inductor? If so, why not just use triton.autotune? There is also an AOT compilation option in Triton.
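
For reference, a minimal sketch of the triton.autotune path for a handwritten kernel (the kernel and config values below are illustrative, not the kernel_fma from this issue). Note that triton.autotune keeps the chosen config in memory for the lifetime of the process rather than caching it to disk:

import torch
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK": 256}, num_warps=4),
        triton.Config({"BLOCK": 1024}, num_warps=8),
    ],
    key=["n_elements"],  # re-tune whenever the problem size changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK"]),)
    # BLOCK is supplied by the autotuner, so it is not passed at the call site
    add_kernel[grid](x, y, out, n_elements)
    return out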

@ayoub-louati
Author

ayoub-louati commented Jan 12, 2023

@jansel Yes, it is a handwritten Triton kernel without inductor. I'm trying to use this one because, as stated in the PR, it reduces CPU overheads when cudagraphs is disabled, and the cache it introduces is really interesting because it allows compiled kernels to be reused from one run to the next. Is that possible, or is it tied to inductor?

@jansel
Contributor

jansel commented Jan 14, 2023

This API is internal to inductor and not intended for handwritten kernels. You may be able to adapt it to your needs, but you will need to annotate the Triton signature/invariants/metadata manually, and you will have no backward compatibility guarantees.

Inductor generates the needed metadata here:
https://github.com/pytorch/pytorch/blob/d41b5d7c145f3e09c7223c2b707933266241ec9b/torch/_inductor/codegen/triton.py#L1063
which relies on some compiler analysis.
