Error while lexing identifier if it contains an escaped newline followed by a Unicode character #65156

Closed
mattmanj17 opened this issue Sep 1, 2023 · 6 comments

mattmanj17 commented Sep 1, 2023

Minimal test case

int main(void) {
    int a\
ス = 42;
    return aス;
}

Godbolt example of failed compilation in clang 16
https://godbolt.org/z/TsMrbKqT6

Godbolt example of successful compilation with GCC 13.2
https://godbolt.org/z/hrzqcPKnn

Based on some debugging, I think tryConsumeIdentifierUTF8Char (downstream of LexIdentifierContinue) ends up looking at the '\\' itself instead of the character after the escaped newline.
Perhaps that function needs to take a Size argument like tryConsumeIdentifierUCN does, so that it can skip the escaped newline and correctly decode the Unicode char that follows it.
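
To illustrate the idea, here is a minimal standalone sketch, not Clang's actual lexer code. The names `skipEscapedNewlines`, `peekCodePoint`, and `Decoded` are hypothetical. It assumes that only a backslash immediately followed by "\n" or "\r\n" counts as an escaped newline (no trigraphs, no whitespace between the backslash and the newline), and it does only basic UTF-8 validation. The point is that the decoder reports a size covering both the skipped escaped newline and the decoded character, so the caller can advance past all of it.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a Clang API): number of bytes occupied by
// consecutive escaped newlines at the start of `text`.
std::size_t skipEscapedNewlines(std::string_view text) {
  std::size_t pos = 0;
  while (pos + 1 < text.size() && text[pos] == '\\') {
    if (text[pos + 1] == '\n') {
      pos += 2;  // backslash + LF
    } else if (text[pos + 1] == '\r' && pos + 2 < text.size() &&
               text[pos + 2] == '\n') {
      pos += 3;  // backslash + CR LF
    } else {
      break;     // a backslash that does not start an escaped newline
    }
  }
  return pos;
}

// Hypothetical result type: `size` counts the escaped newlines plus the
// UTF-8 sequence; size == 0 means nothing could be decoded.
struct Decoded {
  char32_t codePoint;
  std::size_t size;
};

// Skips escaped newlines, then decodes one UTF-8 sequence.
// Simplification: overlong encodings and surrogates are not rejected.
Decoded peekCodePoint(std::string_view text) {
  std::size_t pos = skipEscapedNewlines(text);
  if (pos >= text.size()) return {0, 0};

  unsigned char lead = static_cast<unsigned char>(text[pos]);
  std::size_t len = 0;
  char32_t cp = 0;
  if (lead < 0x80)                { len = 1; cp = lead; }
  else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
  else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
  else return {0, 0};  // stray continuation byte or invalid lead byte

  if (pos + len > text.size()) return {0, 0};  // truncated sequence
  for (std::size_t i = 1; i < len; ++i) {
    unsigned char c = static_cast<unsigned char>(text[pos + i]);
    if ((c & 0xC0) != 0x80) return {0, 0};     // not a continuation byte
    cp = (cp << 6) | (c & 0x3F);
  }
  return {cp, pos + len};
}
```

With an input such as a backslash, a newline, and then a three-byte UTF-8 character, the returned size covers all five bytes, which is the information a caller needs to move its cursor past both the escaped newline and the character.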

@EugeneZelenko EugeneZelenko added clang:frontend Language frontend issues, e.g. anything involving "Sema" and removed new issue labels Sep 1, 2023
@llvmbot
Member

llvmbot commented Sep 1, 2023

@llvm/issue-subscribers-clang-frontend

@danix800
Member

danix800 commented Sep 1, 2023

@mattmanj17
Author

@danix800

What part in particular on that page is relevant to the case I posted?

@danix800
Member

danix800 commented Sep 1, 2023

The GCC extension is not relevant to this issue.

This seems to be a bug.

@shafik
Collaborator

shafik commented Sep 1, 2023

CC @cor3ntin

@cor3ntin
Contributor

cor3ntin commented Sep 1, 2023

@shafik https://reviews.llvm.org/D159345 :)

@cor3ntin cor3ntin self-assigned this Sep 2, 2023
avillega pushed a commit to avillega/llvm-project that referenced this issue Sep 11, 2023
int a\
ス;

Failed to be parsed as a valid identifier.

Fixes llvm#65156

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D159345

6 participants