Fix GH-10634: Lexing memory corruption #10866

Merged (1 commit) on Mar 17, 2023

Conversation

nielsdos
Member

Fixes GH-10634

We don't rely on re2c's bounds-checking mechanism because re2c:yyfill:check = 0; is set. Instead, YYFILL just returns 0 when we read past the end of the input. The comment regexes used to use the "any character" wildcard.
That means that if we run past the end of the input inside a comment regex, we have no way of noticing: the 0 bytes are treated as if they were part of the token. Since a 0 byte is already considered end-of-file, we can simply exclude it in those regexes.

For the regexes that stop at a newline, I had to add not only \x00 to the denylist but also \n and \r; otherwise the negated character class would greedily match the line break and let a single-line comment run over multiple lines.
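
To make the two points above concrete, here is a minimal sketch using Python's `re` module as a stand-in for re2c. It is an analogy only, not the actual scanner rules from this PR: the patterns, the variable names, and the trailing zero padding (which imitates a YYFILL that returns 0 past the end of the input) are all assumptions made for illustration.

```python
import re

# Hypothetical input: an unterminated single-line comment, followed by the
# zero bytes a "return 0 past the end" YYFILL would hand to the lexer.
source = b"// unterminated comment"
padded = source + b"\x00" * 4

# "Any character" wildcard: the zero padding is swallowed into the token.
any_char = re.compile(rb"//.*", re.DOTALL)
# Denylist version: \x00, \n and \r all end the comment.
denylist = re.compile(rb"//[^\x00\n\r]*")

old_token = any_char.match(padded).group()   # includes the padding
new_token = denylist.match(padded).group()   # stops at the real end

# Excluding only \x00 is not enough for single-line comments: the negated
# class then matches the newline and keeps going onto the next line.
multi_line = b"// a\n$x = 1;"
only_nul = re.compile(rb"//[^\x00]*")
overrun = only_nul.match(multi_line).group()
single = denylist.match(multi_line).group()
```

In this sketch `old_token` carries the padding bytes while `new_token` equals `source`, and `overrun` spans both lines while `single` stops before the `\n`.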

Member

@iluuu1994 iluuu1994 left a comment

Thank you!

@iluuu1994
Member

Perfect, thank you! Feel free to merge. Maybe you could also add a short comment on why the \x00 is there for the next person that encounters it :)

@nielsdos nielsdos merged commit ac99645 into php:master Mar 17, 2023
Linked issue: GH-10634, Lexing memory corruption
2 participants