Support cpp wrapper code #93522
Closed
Tracked by #93557
jgong5 opened this issue Oct 8, 2022 · 6 comments
Labels: oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

jgong5 (Collaborator) commented Oct 8, 2022

Currently, wrapper.py generates Python code that invokes the generated kernels and external kernels. This incurs Python overhead, which could be avoided if the wrapper instead generated C++ code that invokes these kernels directly.

cc @ezyang @soumith @msaroufim @wconstab @ngimel @bdhirsh
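
For context, a minimal way to exercise this end to end (a hedged sketch, not from the issue: it assumes a PyTorch build where `torch.compile` routes through Inductor, and uses the `config.cpp_wrapper` flag added by the PR referenced below):

```python
import torch
from torch._inductor import config

# Assumption: this flag (added by the PR referenced below) switches
# Inductor from emitting a Python wrapper to emitting a C++ wrapper.
config.cpp_wrapper = True

def f(a, b):
    return a + b, a * b

compiled = torch.compile(f, backend="inductor")
out_sum, out_prod = compiled(torch.randn(8, 8), torch.randn(8, 8))
```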

voznesenskym (Collaborator) commented:

> Currently, wrapper.py generates Python code that invokes the generated kernels and external kernels. This incurs Python overhead, which could be avoided if the wrapper instead generated C++ code that invokes these kernels directly.

Yes. It could also run where we don't have access to Python. In fact, this would go great with torchdynamo.export, for a general no-Python Inductor flow.

Do you plan on working on direct C++ invocation? If not, I can take a look.

jgong5 (Collaborator, Author) commented Oct 8, 2022

> Do you plan on working on direct C++ invocation? If not, I can take a look.

It is in our tentative plan. It would be great if you'd like to lend a hand. Feel free to let me know if you need any help!

jgong5 (Collaborator, Author) commented Oct 14, 2022

@voznesenskym Have you started working on it? If not, we will take over. :-)

chunyuan-w referenced this issue Nov 3, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline

wrapper = '''
#include <dlfcn.h>
#include <assert.h>

std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float);
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float);
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*, const float*, float*, float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1);
}
'''

module = load_inline(
    name='inline_extension',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g

call = _wrap_func(module.call_0)
```
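
A hypothetical driver for the snippet above (assuming, per the `call_0` signature, that `call` takes a tuple of input tensors and returns a tuple of outputs):

```python
import torch

a = torch.randn(8, 8)
b = torch.randn(8, 8)
# `call` wraps module.call_0: a tuple of inputs in, a tuple of outputs back.
buf0, buf1 = call((a, b))
```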

### Next steps
The items below will be addressed in upcoming PRs.
- [ ] Support Reduction
- [ ] Support None
- [ ] Support ExternKernel
- [ ] Cache the kernel loading part (see the sketch after this list)
- [ ] Support Constant


[ghstack-poisoned]
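
One plausible shape for the "cache the kernel loading part" item above (a hypothetical C++ sketch, not code from the PR; the `.so` path and names are illustrative): resolve the symbol once per process instead of on every call.

```cpp
#include <dlfcn.h>
#include <stdexcept>

// Illustrative kernel signature, matching the example above.
using kernel_fn = void (*)(const float*, const float*, float*, float*);

static kernel_fn load_kernel0() {
    // A function-local static is initialized exactly once (thread-safe),
    // so dlopen/dlsym run on the first call only.
    static kernel_fn cached = [] {
        void* lib = dlopen("/tmp/torchinductor_user/kn/<hash>.so", RTLD_NOW);
        if (lib == nullptr) throw std::runtime_error("dlopen failed");
        void* sym = dlsym(lib, "kernel");
        if (sym == nullptr) throw std::runtime_error("dlsym failed");
        return reinterpret_cast<kernel_fn>(sym);
    }();
    return cached;
}
```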
chunyuan-w referenced this issue Nov 30, 2022

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
chunyuan-w referenced this issue Nov 30, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline
wrapper = (
'''
#include <dlfcn.h>
#include <assert.h>
    std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float); 
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float); 
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*,const float*,float*,float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1); }''' )

module = load_inline(
    name='inline_extension_c64wpbccpbre3th2k6oxwrjy5bhvxnmkdxkhcfxlsw7xpsg4eabu',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared  -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g
call = _wrap_func(module.call_0)
```

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
chunyuan-w referenced this issue Nov 30, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline
wrapper = (
'''
#include <dlfcn.h>
#include <assert.h>
    std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float); 
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float); 
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*,const float*,float*,float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1); }''' )

module = load_inline(
    name='inline_extension_c64wpbccpbre3th2k6oxwrjy5bhvxnmkdxkhcfxlsw7xpsg4eabu',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared  -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g
call = _wrap_func(module.call_0)
```

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
chunyuan-w referenced this issue Nov 30, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline
wrapper = (
'''
#include <dlfcn.h>
#include <assert.h>
    std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float); 
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float); 
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*,const float*,float*,float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1); }''' )

module = load_inline(
    name='inline_extension_c64wpbccpbre3th2k6oxwrjy5bhvxnmkdxkhcfxlsw7xpsg4eabu',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared  -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g
call = _wrap_func(module.call_0)
```

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
chunyuan-w referenced this issue Nov 30, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline
wrapper = (
'''
#include <dlfcn.h>
#include <assert.h>
    std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float); 
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float); 
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*,const float*,float*,float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1); }''' )

module = load_inline(
    name='inline_extension_c64wpbccpbre3th2k6oxwrjy5bhvxnmkdxkhcfxlsw7xpsg4eabu',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared  -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g
call = _wrap_func(module.call_0)
```

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
chunyuan-w referenced this issue Nov 30, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline
wrapper = (
'''
#include <dlfcn.h>
#include <assert.h>
    std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float); 
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float); 
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*,const float*,float*,float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1); }''' )

module = load_inline(
    name='inline_extension_c64wpbccpbre3th2k6oxwrjy5bhvxnmkdxkhcfxlsw7xpsg4eabu',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared  -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g
call = _wrap_func(module.call_0)
```

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
chunyuan-w referenced this issue Nov 30, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline
wrapper = (
'''
#include <dlfcn.h>
#include <assert.h>
    std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float); 
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float); 
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*,const float*,float*,float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1); }''' )

module = load_inline(
    name='inline_extension_c64wpbccpbre3th2k6oxwrjy5bhvxnmkdxkhcfxlsw7xpsg4eabu',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared  -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g
call = _wrap_func(module.call_0)
```

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
chunyuan-w referenced this issue Nov 30, 2022


cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

## Description
Implements https://github.com/pytorch/torchdynamo/issues/1556.
This PR adds a cpp wrapper to invoke the generated kernels. The cpp wrapper is turned off by default and can be turned on by setting:
```python
from torch._inductor import config
config.cpp_wrapper = True
```

### Example
The main part of the generated code:
```python
from torch.utils.cpp_extension import load_inline
wrapper = (
'''
#include <dlfcn.h>
#include <assert.h>
    std::tuple<at::Tensor, at::Tensor> call_0(std::tuple<at::Tensor, at::Tensor> args) {
    at::Tensor arg0_1, arg1_1;
    std::tie(arg0_1, arg1_1) = args;
    auto buf0 = at::empty_strided({8, 8}, {8, 1}, at::ScalarType::Float); 
    auto buf1 = at::empty_strided({8, 8}, {1, 8}, at::ScalarType::Float); 
    auto kernel0_lib = dlopen("/tmp/torchinductor_user/kn/ckn7ubcn2qbkme2vx5r6antnh5sv6d3o3t6qwdfgfoupnxty6pnm.so", RTLD_NOW);
    assert(kernel0_lib != nullptr);
    void (*kernel0)(const float*,const float*,float*,float*);
    *(void **) (&kernel0) = dlsym(kernel0_lib, "kernel");
    kernel0((float*)(arg0_1.data_ptr()), (float*)(arg1_1.data_ptr()), (float*)(buf0.data_ptr()), (float*)(buf1.data_ptr()));
    arg0_1.reset();
    arg1_1.reset();
    return std::make_tuple(buf0, buf1); }''' )

module = load_inline(
    name='inline_extension_c64wpbccpbre3th2k6oxwrjy5bhvxnmkdxkhcfxlsw7xpsg4eabu',
    cpp_sources=[wrapper],
    functions=['call_0'],
    extra_cflags=['-fPIC -Wall -std=c++14 -Wno-unused-variable -march=native -O3 -ffast-math -fno-finite-math-only -fopenmp'],
    extra_ldflags=['-shared  -lgomp'],
    extra_include_paths=['-I/home/user/pytorch/torch/include -I/home/user/pytorch/torch/include/torch/csrc/api/include -I/home/user/pytorch/torch/include/TH -I/home/user/pytorch/torch/include/THC -I/home/user/miniconda3/envs/pytorch/include/python3.7m'])

def _wrap_func(f):
    def g(args):
        return f(args)
    return g
call = _wrap_func(module.call_0)
```

### Next steps
The below items will be addressed in upcoming PRs.
- [x] Support Reduction: #88561
- [x] Support None: #88560
- [ ] Support ExternKernel
   - [x] ATen GEMM-related OPs: #88667
   - [ ] ATen Conv
   - [ ] Conv/GEMM fusion OPs
- [x] Cache the kernel loading part: #89742
- [ ] De-allocate input buffers when possible by leveraging CPython APIs
- [ ] Support Constant


[ghstack-poisoned]
pytorchmergebot referenced this issue Nov 30, 2022
Pull Request resolved: #88167
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
kulinseth referenced this issue in kulinseth/pytorch Dec 10, 2022
@malfet malfet transferred this issue from pytorch/torchdynamo Feb 1, 2023
@ezyang
Contributor

ezyang commented Feb 1, 2023

I think this was done

@ezyang ezyang closed this as completed Feb 1, 2023
@github-project-automation github-project-automation bot moved this from TODO to Done in PyTorch Intel Feb 1, 2023
@jgong5
Collaborator Author

jgong5 commented Feb 2, 2023

> I think this was done

Oh, not yet. It is WIP: #89743

@jgong5 jgong5 reopened this Feb 2, 2023
@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 7, 2023
@ydwu4
Contributor

ydwu4 commented Nov 27, 2023

Seems #89743 is merged. Will close this issue but feel free to reopen if it's not.

@ydwu4 ydwu4 closed this as completed Nov 27, 2023
Labels
oncall: pt2 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
Status: Done
Development

No branches or pull requests

7 participants