code points above Latin1 are not recognized as white space #665

@gibson042

Environment: XS 10.5.0

Description
In the ECMAScript lexical grammar, WhiteSpace comprises ASCII tab, vertical tab, form feed, and space; U+00A0 NO-BREAK SPACE; U+FEFF ZERO WIDTH NO-BREAK SPACE (the BOM); and any code point with the Unicode property “Space_Separator”. LineTerminator is any of U+000A LINE FEED, U+000D CARRIAGE RETURN, U+2028 LINE SEPARATOR, and U+2029 PARAGRAPH SEPARATOR. Both are required to be insignificant between expression tokens, but XS does not seem to recognize WhiteSpace code points above the Latin-1 Supplement block.
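
For reference, the two classes described above can be sketched as a pair of predicates. This is only an illustration, not how XS or any other engine implements its lexer; the names `isWhiteSpace` and `isLineTerminator` are hypothetical, and the sketch assumes a host that supports Unicode property escapes (`\p{Zs}` with the `u` flag).

```javascript
// Hypothetical helpers, not part of XS or of any standard API.
// WhiteSpace: TAB, VT, FF, U+FEFF, plus every Space_Separator (Zs) code point.
// U+0020 SPACE and U+00A0 NO-BREAK SPACE are themselves members of Zs.
const WHITE_SPACE = /^[\t\v\f\uFEFF\p{Zs}]$/u;

// LineTerminator: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
const LINE_TERMINATOR = /^[\n\r\u2028\u2029]$/u;

function isWhiteSpace(codePoint) {
  return WHITE_SPACE.test(String.fromCodePoint(codePoint));
}

function isLineTerminator(codePoint) {
  return LINE_TERMINATOR.test(String.fromCodePoint(codePoint));
}
```

Under these definitions U+1680 counts as WhiteSpace, while U+2028 and U+2029 are LineTerminators only, because their general categories are Line_Separator and Paragraph_Separator rather than Space_Separator.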

Steps to Reproduce

  1. Evaluate source text like 0  (i.e., a numeric literal followed by U+1680 OGHAM SPACE MARK).
  2. Check all WhiteSpace and LineTerminator characters (as identified at unicode.org):
    ["0009", "000A", "000B", "000C", "000D", "0020", "00A0",
     "1680", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "200A",
     "2028", "2029", "202F", "205F", "3000", "FEFF"
    ].filter(hex => {
      const cp = String.fromCharCode(parseInt(hex, 16));
      try {
        eval("[true]" + cp + "[0]");
        // No parsing error, suppress code point.
        return false;
      } catch ( ex ) {
        // Parsing error, keep code point.
        return true;
      }
    })
    

Expected behavior

  1. No error.
  2. An empty array.

Actual behavior

  1. SyntaxError: invalid character 5760
  2. 1680,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,200A,202F,205F,3000,FEFF
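
As a cross-check of the pattern in that output, the rejected list can be compared against "every tested WhiteSpace code point above U+00FF". The sketch below is meant to run in an engine that supports Unicode property escapes, which is an assumption and may not hold for XS itself; `reported`, `expectedRejects`, and `matches` are names introduced here for illustration.

```javascript
// Code points tested in the report (hex strings).
const tested = ["0009", "000A", "000B", "000C", "000D", "0020", "00A0",
  "1680", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "200A",
  "2028", "2029", "202F", "205F", "3000", "FEFF"];

// Result copied from the "Actual behavior" output above.
const reported = "1680,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,200A,202F,205F,3000,FEFF";

// WhiteSpace per the lexical grammar: TAB, VT, FF, U+FEFF, and Space_Separator.
const isWhiteSpace = ch => /^[\t\v\f\uFEFF\p{Zs}]$/u.test(ch);

// Keep only WhiteSpace code points beyond the Latin-1 Supplement block.
const expectedRejects = tested.filter(hex => {
  const n = parseInt(hex, 16);
  return n > 0xFF && isWhiteSpace(String.fromCodePoint(n));
});

// True when the reported rejects are exactly that set, in the same order.
const matches = expectedRejects.join(",") === reported;
```

The two lists agree: U+2028 and U+2029 lie above Latin-1 but are LineTerminators and are accepted, so the failure appears to be confined to WhiteSpace.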

Script

$ ./xs -v; cat /tmp/js; ./xs /tmp/js
XS 10.5.0
const ws = ["0009", "000A", "000B", "000C", "000D", "0020", "00A0",
  "1680", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "200A",
  "2028", "2029", "202F", "205F", "3000", "FEFF"
]
const rejected = ws.filter(hex => {
  const cp = String.fromCharCode(parseInt(hex, 16));
  try {
    eval("/x/" + cp);
    // No parsing error, suppress code point.
    return false;
  } catch ( ex ) {
    // Parsing error, keep code point.
    return true;
  }
});
print("rejected: " + rejected);
rejected: 1680,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,200A,202F,205F,3000,FEFF

Metadata

    Labels

    confirmed (issue reported has been reproduced)
    fixed - please verify (issue has been fixed; please verify and close)
