Age | Commit message (Collapse) | Author |
|
[Bug #20653]
This commit refactors how Onigmo handles timeout. Instead of raising a
timeout error, onig_search will return a ONIGERR_TIMEOUT which the
caller can free memory, and then raise a timeout error.
This fixes a memory leak in String#start_with when the regexp times out.
For example:
regex = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
str = "a" * 1000000 + "x"
10.times do
100.times do
str.start_with?(regex)
rescue
end
puts `ps -o rss= -p #{$$}`
end
Before:
33216
51936
71152
81728
97152
103248
120384
133392
133520
133616
After:
14912
15376
15824
15824
16128
16128
16144
16144
16160
16160
Notes:
Merged: https://github.com/ruby/ruby/pull/11247
|
|
[Bug #20650]
The capture group allocates memory that is leaked when it times out.
For example:
re = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
str = "a" * 1000000 + "x"
10.times do
100.times do
re =~ str
rescue Regexp::TimeoutError
end
puts `ps -o rss= -p #{$$}`
end
Before:
34688
56416
78288
100368
120784
140704
161904
183568
204320
224800
After:
16288
16288
16880
16896
16912
16928
16944
17184
17184
17200
Notes:
Merged: https://github.com/ruby/ruby/pull/11238
|
|
https://bugs.ruby-lang.org/issues/20228 started freeing `stk_base` to
avoid a memory leak. But `stk_base` is sometimes stack allocated (using
`xalloca`), so the free only works if the regex stack has grown enough
to hit `stack_double` (which uses `xmalloc` and `xrealloc`).
To reproduce the problem on master and 3.3.1:
```ruby
Regexp.timeout = 0.001
/^(a*)x$/ =~ "a" * 1000000 + "x"'
```
Some details about this potential fix:
`stk_base == stk_alloc` on
[init](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1153),
so if `stk_base != stk_alloc` we can be sure we called
[`stack_double`](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1210)
and it's safe to free. It's also safe to free if we've
[saved](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1187-L1189)
the stack to `msa->stack_p`, since we do the `stk_base != stk_alloc`
check before saving.
This matches the check we do inside
[`stack_double`](https://github.com/ruby/ruby/blob/dde99215f2bc60c22a00fc941ff7f714f011e920/regexec.c#L1221)
|
|
Co-authored-by: Isaac Peka <[email protected]>
|
|
|
|
When matching against an incomplete character, some `enclen` calls are
expected not to exceed the limit, and some are expected to return the
required length and then the results are checked if it exceeds.
|
|
|
|
[Bug #20228]
If rb_reg_check_timeout raises a Regexp::TimeoutError, then the stk_base
will leak.
|
|
Fix [Bug #20207]
Fix [Bug #20212]
Handling consecutive lookarounds in init_cache_opcodes is buggy, so it
causes invalid memory access reported in [Bug #20207] and [Bug #20212].
This fixes it by using recursive functions to detected lookarounds
nesting correctly.
|
|
|
|
This commit also reduces the warning `'stkp' may be used
uninitialized in this function`.
|
|
|
|
|
|
Previously the following read and wrote 1 byte out-of-bounds:
$ valgrind ruby -e 'p /(\W+)[bx]\?/i.match? "aaaaaa aaaaaaaaa aaaa aaaaaaaa aaa aaaaxaaaaaaaaaaa aaaaa aaaaaaaaaaaa a ? aaa aaaa a ?"' 2> >(grep Invalid -A 30)
Because of the `match_cache_point_index + 1` in
memoize_extended_match_cache_point() and
check_extended_match_cache_point(), we need one more byte of space.
|
|
|
|
rb_reg_onig_match performs preparation, error handling, and cleanup for
matching a regex against a string. This reduces repetitive code and
removes the need for StringScanner to access internal data of regex.
Notes:
Merged: https://github.com/ruby/ruby/pull/8123
|
|
According to the C99 specification section 7.20.3.2 paragraph 2:
> If ptr is a null pointer, no action occurs.
So we do not need to check that the pointer is a null pointer.
Notes:
Merged: https://github.com/ruby/ruby/pull/8004
|
|
Notes:
Merged-By: makenowjust <[email protected]>
|
|
Notes:
Merged-By: makenowjust <[email protected]>
|
|
Notes:
Merged-By: makenowjust <[email protected]>
|
|
* Refactor Regexp#match cache implementation
Improved variable and function names
Fixed [Bug 19537] (Maybe fixed in https://github.com/ruby/ruby/pull/7694)
* Add a comment of the glossary for "match cache"
* Skip to reset match cache when no cache point on null check
Notes:
Merged-By: makenowjust <[email protected]>
|
|
On platforms where unaligned word access is not allowed, and if
`sizeof(val)` and `sizeof(type)` differ:
- `val` > `type`, `val` will be a garbage.
- `val` < `type`, outside `val` will be clobbered.
Notes:
Merged: https://github.com/ruby/ruby/pull/7723
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7694
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7694
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/7694
|
|
Notes:
Merged-By: makenowjust <[email protected]>
|
|
`OP_ANYCHAR_STAR_PEEK_NEXT` (#7454)
Notes:
Merged-By: makenowjust <[email protected]>
|
|
Notes:
Merged-By: makenowjust <[email protected]>
|
|
|
|
Notes:
Merged-By: makenowjust <[email protected]>
|
|
https://bugs.ruby-lang.org/issues/19104#change-100542
Notes:
Merged: https://github.com/ruby/ruby/pull/6902
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6744
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6744
|
|
|
|
|
|
|
|
Currently, the keys for CACHE_MATCH are handled as an `int` type. So we
should make sure the table size are smaller than the range of `int`.
|
|
```
regexec.c: In function ‘reset_match_cache’:
regexec.c:1259:56: warning: suggest parentheses around ‘-’ inside ‘<<’ [-Wparentheses]
1259 | match_cache[k1 >> 3] &= ((1 << (8 - (k2 & 7) - 1)) - 1 << ((k2 & 7) + 1)) | ((1 << (k1 & 7)) - 1);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
regexec.c:1269:60: warning: suggest parentheses around ‘-’ inside ‘<<’ [-Wparentheses]
1269 | match_cache[k2 >> 3] &= ((1 << (8 - (k2 & 7) - 1)) - 1 << ((k2 & 7) + 1));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
regexec.c: In function ‘find_cache_index_table’:
regexec.c:1192:11: warning: ‘m’ may be used uninitialized [-Wmaybe-uninitialized]
1192 | if (!(0 <= m && m < num_cache_table && table[m].addr == p)) {
| ~~^~~~
regexec.c: In function ‘match_at’:
regexec.c:1238:12: warning: ‘m1’ is used uninitialized [-Wuninitialized]
1238 | if (table[m1].addr < pbegin && m1 + 1 < num_cache_table) m1++;
| ^
regexec.c:1218:39: note: ‘m1’ was declared here
1218 | int l = 0, r = num_cache_table - 1, m1, m2;
| ^~
regexec.c:1239:12: warning: ‘m2’ is used uninitialized [-Wuninitialized]
1239 | if (table[m2].addr > pend && m2 - 1 > 0) m2--;
| ^
regexec.c:1218:43: note: ‘m2’ was declared here
1218 | int l = 0, r = num_cache_table - 1, m1, m2;
| ^~
```
|
|
|
|
This reverts commit 1e6673d6bbd2adbf555d82c7c0906ceb148ed6ee.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|