inductor: fix issue of computing index_expr range #103147
XiaobingSuper wants to merge 7 commits into gh/XiaobingSuper/129/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103147
Note: Links to docs will display an error until the docs builds have been completed.
✅ 2 Unrelated Failures. As of commit e95601d: UNSTABLE. The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I just met one index expr which only has one
```python
# min_value may be greater than max_value, such as ModularIndexing(513*i2 + i3 + 262400, 512, 513),
# with vars_ranges is {i2: ValueRanges(lower=0, upper=256), i3: ValueRanges(lower=0, upper=513)}.
```
I guess a better approach is to deduce the range from the divisor. For example, we could create a new symbol with the range [0, divisor-1] (supposing the divisor is constant), and put this range into vars_ranges to calculate the min/max value with the algorithm in this function.
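A minimal sketch of that suggestion, outside the inductor codebase: `ModularIndexing` here is a plain sympy stand-in for the real op, and the `mod_index{n}` symbol names and `replace_modular_indexing` helper are illustrative, not the PR's actual diff.

```python
import itertools

import sympy

# Stand-in for inductor's ModularIndexing(x, y, z) == FloorDiv(x, y) % z.
ModularIndexing = sympy.Function("ModularIndexing")
cnt = itertools.count()

def replace_modular_indexing(expr, vars_ranges):
    """Replace each ModularIndexing(x, y, z) with a fresh symbol whose range
    is [0, z - 1] (sound when z is a positive constant), so the usual
    min/max propagation over vars_ranges can proceed on the result."""
    def rep(x, y, z):
        if z.is_constant():
            new_var = sympy.Symbol(
                f"mod_index{next(cnt)}", integer=True, nonnegative=True
            )
            vars_ranges[new_var] = (0, z - 1)
            return new_var
        return x / y  # non-constant divisor: fall back to the division rewrite
    return expr.replace(ModularIndexing, rep)
```

On the problematic expression from this PR, the whole `ModularIndexing(513*i2 + i3 + 262400, 512, 513)` collapses to one bounded symbol with range (0, 512), and no min > max inversion can occur.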
On the CPU inductor side, there is an optimization that converts an ```int64``` index_expr to ```int32``` for better performance (https://github.com/pytorch/pytorch/blob/main/torch/_inductor/codegen/cpp.py#L2034). But for a ```ModularIndexing``` expr, we replace it with a division (https://github.com/pytorch/pytorch/blob/main/torch/_inductor/optimize_indexing.py#L73; ```ModularIndexing``` doesn't have a derivative) to compute the derivative and then the expr's value range. This can hit an issue where the min value is greater than the max value (```ModularIndexing(513*i2 + i3 + 262400, 512, 513)```, with vars_ranges ```{i2: ValueRanges(lower=0, upper=256), i3: ValueRanges(lower=0, upper=513)}```). One solution is not to replace ```ModularIndexing```, but then we can't get the value range. Another solution is to return an ```inf``` range when the min val is greater than the max val. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225
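The "return an inf range when min > max" fallback can be sketched as a guard on the narrowing decision. This is a hypothetical helper (`can_use_int32` is not the PR's actual function name), assuming the optimization only narrows when the proven range fits in int32:

```python
import math

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1

def can_use_int32(lower, upper):
    """Decide whether an index expr with proven value range [lower, upper]
    may be emitted with 32-bit arithmetic instead of 64-bit."""
    # An invalid range (min > max) or an infinite one means we could not
    # prove anything, so conservatively keep int64.
    if math.isinf(lower) or math.isinf(upper) or lower > upper:
        return False
    return INT32_MIN <= lower and upper <= INT32_MAX
```

Returning an infinite range for the broken case then simply disables the conversion instead of producing wrong codegen.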
```python
if len(symbols) == 0:
    return ValueRanges(expr, expr)

vars_ranges_temp = vars_ranges.copy()
```
Simpler to do ```vars_ranges = vars_ranges.copy()```?
We should just return a (0, z-1) range for ModularIndexing, and not go through derivative computation. cc @eellison
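That range is easy to sanity-check outside sympy, modeling ModularIndexing as Python floor division followed by `%` (a stand-in model, not inductor code):

```python
def modular_indexing(x, y, z):
    # Python-level model of inductor's ModularIndexing: FloorDiv(x, y) % z.
    return (x // y) % z

# For a positive constant z, the result lies in [0, z - 1] for every x
# (even negative x, since Python's % follows the divisor's sign), so
# (0, z - 1) is a sound range with no derivative computation needed.
assert all(
    0 <= modular_indexing(x, 512, 513) <= 512
    for x in range(-100000, 100000, 997)
)
```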
Yes, for a simple expr which only has ModularIndexing, it is OK to just return (0, z-1); but for a complex expr, I think we still need to use derivatives and replace ModularIndexing with a variable that has the range (0, z-1). @eellison
```python
def mod_indexing_rep(x, y, z):
    if z.is_constant():
        return x / y
    new_var = sympy_symbol("mod_index" + f"{next(cnt)}")
```
Should we check if x / y has a range <= z and return x / y in that case?
If we want to return x/y, we need to check whether z is positive and whether the x/y range has a consistent sign. For example, if the x/y range is [-2, 2] and z is 4, we can't directly return x/y. Returning x/y would require handling many conditions, so I think using z's range is OK even if that range is huge in some cases.
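The counterexample can be checked directly, again modeling ModularIndexing as floor division followed by `%`:

```python
# With x/y in [-2, 2] and z = 4, ModularIndexing does not equal x / y:
x, y, z = -2, 1, 4
mod_val = (x // y) % z  # model of ModularIndexing(x, y, z)
assert mod_val == 2     # lands in [0, z - 1]
assert x / y == -2      # plain division disagrees, so returning x/y is unsound here
```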
Maybe add a TODO to optimize this more.
Yeah, #102722 is strictly better than this, but this is better than the current state so I'm fine with merging this one. FWIW, I'm going to be on PTO for the next 3 weeks, so let's merge this one now and I'll have the other one ready when I'm back. |
Added.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Successfully rebased |
```python
    (torch.randn(8),),
)

@patch("torch.cuda.is_available", lambda: False)
```
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.