@jujn jujn commented Jan 2, 2026

What this PR does / why we need it?

Summary of your change

Please indicate you've done the following:

  • Made sure tests are passing and test coverage is added if needed.
  • Made sure commit messages follow the Conventional Commits specification.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed.

Copilot AI review requested due to automatic review settings January 2, 2026 06:23

Copilot AI left a comment

Pull request overview

This PR fixes issue #3928 by adding support for 4-byte UTF-8 characters (emojis) in JSON object keys. Previously, when parsing JSON with emoji characters in field names, the parser would throw a JSONException for malformed input.

Key Changes

  • Added case 15 handling to the getFieldName() method to decode 4-byte UTF-8 sequences into surrogate pairs
  • Modified character array resizing logic to accommodate surrogate pairs
  • Added comprehensive test coverage for emoji characters in object keys with various edge cases
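To make the first two changes concrete, here is a minimal sketch, not the PR's actual code. It assumes the parser dispatches on the high nibble of the lead byte (which the "case 15" wording suggests) and shows how one 4-byte sequence becomes two UTF-16 chars. The class and method names (`SurrogatePairSketch`, `decodeFourBytes`, `putSurrogatePair`) are hypothetical; only the `java.lang.Character` calls are real API.

```java
import java.nio.charset.StandardCharsets;

final class SurrogatePairSketch {
    /** Combines the payload bits of a 4-byte UTF-8 sequence starting at offset (no validation here). */
    static int decodeFourBytes(byte[] bytes, int offset) {
        int ch = bytes[offset] & 0xFF;     // lead byte, 11110xxx
        int b1 = bytes[offset + 1] & 0xFF; // continuation bytes, 10xxxxxx
        int b2 = bytes[offset + 2] & 0xFF;
        int b3 = bytes[offset + 3] & 0xFF;
        return ((ch & 0x07) << 18)
                | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6)
                | (b3 & 0x3F);
    }

    /** Stores a supplementary code point as two UTF-16 chars and returns the next write position. */
    static int putSurrogatePair(char[] chars, int pos, int codePoint) {
        chars[pos] = Character.highSurrogate(codePoint);
        chars[pos + 1] = Character.lowSurrogate(codePoint);
        return pos + 2;
    }

    public static void main(String[] args) {
        // U+1F607 is encoded in UTF-8 as F0 9F 98 87.
        byte[] utf8 = "\uD83D\uDE07".getBytes(StandardCharsets.UTF_8);
        // High nibble of the lead byte: the value a nibble-based switch would dispatch on.
        int highNibble = (utf8[0] & 0xFF) >> 4;
        char[] chars = new char[2];
        int end = putSurrogatePair(chars, 0, decodeFourBytes(utf8, 0));
        System.out.println(highNibble + " " + new String(chars, 0, end).equals("\uD83D\uDE07"));
    }
}
```

Because four input bytes produce only two chars, a char buffer sized to the byte length of the name is always large enough; the write position simply has to advance by two instead of one for this case.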

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| core/src/main/java/com/alibaba/fastjson2/JSONReaderUTF8.java | Implements 4-byte UTF-8 decoding (case 15) in the getFieldName method, converting to surrogate pairs and adjusting array handling |
| test-jdk17/src/test/java/com/alibaba/fastjson2/issues/Issue3928.java | Adds test cases covering emoji with backslashes, escaped unicode, mixed content, and double backslashes in object keys |

int codePoint = ((ch & 0x07) << 18)
        | ((b1 & 0x3F) << 12)
        | ((b2 & 0x3F) << 6)
        | (b3 & 0x3F);

Copilot AI Jan 2, 2026

The 4-byte UTF-8 decoding implementation is missing validation for the valid supplementary code point range. The decoded code point should be validated to ensure it falls within the valid range (0x10000 to 0x10FFFF). This validation is present in similar 4-byte UTF-8 handling code elsewhere in this file (see lines 2782-2787). Without this check, invalid UTF-8 sequences could be incorrectly decoded without throwing an exception.

Suggested change
-         | (b3 & 0x3F);
+         | (b3 & 0x3F);
+ if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
+     throw new JSONException("malformed input around byte " + offset);
+ }
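As an illustration of what the range check catches, the following sketch applies the same bit arithmetic and the same bounds to three inputs. It is an assumption-laden stand-in, not the library's code: `CodePointRangeSketch`, `decodeChecked`, and `isRejected` are hypothetical names, and `IllegalArgumentException` replaces `JSONException` so the example compiles without fastjson2.

```java
final class CodePointRangeSketch {
    /** Decodes four already-unsigned bytes and rejects results outside U+10000..U+10FFFF. */
    static int decodeChecked(int ch, int b1, int b2, int b3, int offset) {
        int codePoint = ((ch & 0x07) << 18)
                | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6)
                | (b3 & 0x3F);
        if (codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT   // 0x10000
                || codePoint > Character.MAX_CODE_POINT) {       // 0x10FFFF
            // Stand-in for the parser's JSONException (assumption for this sketch).
            throw new IllegalArgumentException("malformed input around byte " + offset);
        }
        return codePoint;
    }

    /** Convenience wrapper: true when the sequence fails the range check. */
    static boolean isRejected(int ch, int b1, int b2, int b3) {
        try {
            decodeChecked(ch, b1, b2, b3, 0);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(0xF0, 0x9F, 0x98, 0x87)); // U+1F607, in range
        System.out.println(isRejected(0xF0, 0x80, 0x80, 0x80)); // overlong form, decodes to 0
        System.out.println(isRejected(0xF4, 0x90, 0x80, 0x80)); // decodes to 0x110000, too large
    }
}
```

The lower bound rejects overlong encodings, where a 4-byte form carries a value that fits in fewer bytes. The upper bound rejects values past the end of the Unicode code space, which the 3 payload bits of the lead byte can still express.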

Copilot uses AI. Check for mistakes.
Comment on lines +3072 to +3074
int b1 = bytes[offset + 1] & 0xFF;
int b2 = bytes[offset + 2] & 0xFF;
int b3 = bytes[offset + 3] & 0xFF;

Copilot AI Jan 2, 2026

Missing bounds check before accessing bytes array at offset + 1, offset + 2, and offset + 3. If the byte array ends prematurely (i.e., offset + 3 >= bytes.length), this will cause an ArrayIndexOutOfBoundsException. Consider adding a check to ensure offset + 3 < bytes.length or offset + 3 < nameEnd before accessing these bytes.
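One possible shape for such a guard is sketched below. The helper names (`BoundsCheckSketch`, `canReadFourBytes`, `hasContinuationBytes`) and the `end` parameter, which stands for either `bytes.length` or the end of the field name, are assumptions made for illustration. The continuation-byte test goes beyond what the comment asks for and is included only as a closely related check.

```java
final class BoundsCheckSketch {
    /**
     * True when bytes[offset .. offset + 3] all lie before end.
     * Written as a subtraction so that offset + 3 cannot overflow.
     */
    static boolean canReadFourBytes(byte[] bytes, int offset, int end) {
        return offset >= 0 && end <= bytes.length && end - offset >= 4;
    }

    /** True when the three bytes after the lead byte have the 10xxxxxx continuation pattern. */
    static boolean hasContinuationBytes(byte[] bytes, int offset) {
        return (bytes[offset + 1] & 0xC0) == 0x80
                && (bytes[offset + 2] & 0xC0) == 0x80
                && (bytes[offset + 3] & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        byte[] truncated = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98}; // final byte missing
        int offset = 0;
        if (!canReadFourBytes(truncated, offset, truncated.length)) {
            // A parser would raise its own "malformed input" error here
            // instead of letting ArrayIndexOutOfBoundsException escape.
            System.out.println("malformed input around byte " + offset);
        }
    }
}
```

Checking the length first keeps the failure mode consistent: truncated input surfaces as a parse error rather than as an unchecked runtime exception from the array access.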

Comment on lines +12 to +21
@Test
public void testIssue() {
    Map<String, String> map = Map.of("\uD83D\uDE07\\", "");

    String jsonString = JSON.toJSONString(map);
    assertEquals(map, JSON.parseObject(jsonString, Map.class));

    byte[] jsonBytes = JSON.toJSONBytes(map, StandardCharsets.UTF_8);
    assertEquals(map, JSON.parseObject(jsonBytes, Map.class));
}

Copilot AI Jan 2, 2026

Test coverage is missing for the readFieldNameHashCode method with 4-byte UTF-8 characters in field names. The method (lines 2572-2590) does not handle case 15 (4-byte UTF-8 sequences) and will throw a JSONException when it encounters emoji or other 4-byte UTF-8 characters in field names, which is the same problem this PR fixes in getFieldName. Consider adding a test case that exercises readFieldNameHashCode with emoji characters in field names to confirm that it also handles 4-byte UTF-8 correctly.
