fix: support 4-byte UTF-8 characters (Emoji) in object keys, for issue #3928 #3929

jujn · 2026-01-02T06:23:45Z

What this PR does / why we need it?

Summary of your change

Please indicate you've done the following:

Made sure tests are passing and test coverage is added if needed.
Made sure commit messages follow the Conventional Commits specification.
Considered the docs impact and opened a new docs issue or PR with docs changes if needed.

Copilot

Pull request overview

This PR fixes issue #3928 by adding support for 4-byte UTF-8 characters (emojis) in JSON object keys. Previously, when parsing JSON with emoji characters in field names, the parser would throw a JSONException for malformed input.
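For context, the emoji in the Issue3928 test key (`\uD83D\uDE07`, U+1F607) is one code point stored in Java as two UTF-16 chars. In UTF-8 it becomes a single 4-byte sequence, which is the input this fix teaches the byte-level reader to accept. A small JDK-only illustration follows; the class name `EmojiKeyBytes` is ours and not part of the PR.

```java
import java.nio.charset.StandardCharsets;

// Prints how the emoji from the Issue3928 test key looks in UTF-16 and in UTF-8.
final class EmojiKeyBytes {
    // Formats bytes as space-separated, two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE07"; // U+1F607 as a UTF-16 surrogate pair
        System.out.println(emoji.length());                              // 2 UTF-16 chars
        System.out.println(emoji.codePointCount(0, emoji.length()));     // 1 code point
        System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8))); // F0 9F 98 87
    }
}
```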

Key Changes

Added case 15 handling in the getFieldName() method to decode 4-byte UTF-8 sequences into surrogate pairs
Modified the character-array resizing logic to make room for both chars of a surrogate pair
Added comprehensive test coverage for emoji characters in object keys with various edge cases
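The first two changes above can be sketched roughly as follows. This is a hypothetical helper, not the code in JSONReaderUTF8: `SurrogatePairAppender` and `appendFourByte` are illustrative names, and validation (discussed in the review comments) is deliberately left out. It reuses the bit arithmetic from the PR diff and converts the code point with the JDK's `Character.highSurrogate`/`Character.lowSurrogate`.

```java
import java.util.Arrays;

// Hypothetical helper, not the PR's code: decodes the 4-byte UTF-8 sequence starting at
// bytes[offset] and stores it at chars[pos] and chars[pos + 1] as a surrogate pair.
// No range or bounds validation is performed here.
final class SurrogatePairAppender {
    // Returns the array holding the result, grown if it was too small.
    static char[] appendFourByte(byte[] bytes, int offset, char[] chars, int pos) {
        int ch = bytes[offset] & 0xFF;
        int b1 = bytes[offset + 1] & 0xFF;
        int b2 = bytes[offset + 2] & 0xFF;
        int b3 = bytes[offset + 3] & 0xFF;
        int codePoint = ((ch & 0x07) << 18)
                | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6)
                | (b3 & 0x3F);

        // A supplementary code point occupies two chars, so the capacity check is pos + 2.
        if (pos + 2 > chars.length) {
            chars = Arrays.copyOf(chars, Math.max(chars.length * 2, pos + 2));
        }
        chars[pos] = Character.highSurrogate(codePoint);
        chars[pos + 1] = Character.lowSurrogate(codePoint);
        return chars;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x87};
        char[] chars = new char[1]; // deliberately too small for a surrogate pair
        chars = appendFourByte(utf8, 0, chars, 0);
        System.out.println(new String(chars, 0, 2).equals("\uD83D\uDE07")); // true
    }
}
```

Growing to at least `pos + 2` is the point of the resizing change: a capacity check that only reserves one slot per decoded sequence would overflow on the low surrogate.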

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| core/src/main/java/com/alibaba/fastjson2/JSONReaderUTF8.java | Implements 4-byte UTF-8 decoding (case 15) in getFieldName method, converting to surrogate pairs and adjusting array handling |
| test-jdk17/src/test/java/com/alibaba/fastjson2/issues/Issue3928.java | Adds test cases covering emoji with backslashes, escaped unicode, mixed content, and double backslashes in object keys |


Copilot · 2026-01-02T06:30:36Z

core/src/main/java/com/alibaba/fastjson2/JSONReaderUTF8.java

+                        int codePoint = ((ch & 0x07) << 18)
+                                | ((b1 & 0x3F) << 12)
+                                | ((b2 & 0x3F) << 6)
+                                | (b3 & 0x3F);


The 4-byte UTF-8 decoding implementation is missing validation for the valid supplementary code point range. The decoded code point should be validated to ensure it falls within the valid range (0x10000 to 0x10FFFF). This validation is present in similar 4-byte UTF-8 handling code elsewhere in this file (see lines 2782-2787). Without this check, invalid UTF-8 sequences could be incorrectly decoded without throwing an exception.

Suggested change

-                                | (b3 & 0x3F);
+                                | (b3 & 0x3F);
+                        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
+                            throw new JSONException("malformed input around byte " + offset);
+                        }
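To see what the range check suggested in this comment catches, here is a standalone sketch. `FourByteRangeCheck` is a hypothetical name, and `IllegalArgumentException` stands in for fastjson2's `JSONException` so the example compiles against the JDK alone. The overlong sequence `F0 80 80 80` decodes to 0 and `F4 90 80 80` decodes to 0x110000; without the check, both would be silently accepted.

```java
// Sketch of the suggested range check; IllegalArgumentException replaces JSONException.
final class FourByteRangeCheck {
    static int decode(byte[] bytes, int offset) {
        int ch = bytes[offset] & 0xFF;
        int b1 = bytes[offset + 1] & 0xFF;
        int b2 = bytes[offset + 2] & 0xFF;
        int b3 = bytes[offset + 3] & 0xFF;
        int codePoint = ((ch & 0x07) << 18)
                | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6)
                | (b3 & 0x3F);
        // A valid 4-byte sequence encodes U+10000..U+10FFFF. Anything lower is an
        // overlong encoding; anything higher is outside the Unicode code space.
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("malformed input around byte " + offset);
        }
        return codePoint;
    }

    // Builds a byte[] from unsigned values written as hex ints.
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // True if decoding the sequence at offset 0 is rejected by the range check.
    static boolean rejects(int... values) {
        try {
            decode(bytes(values), 0);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(decode(bytes(0xF0, 0x9F, 0x98, 0x87), 0))); // 1f607
        System.out.println(rejects(0xF0, 0x80, 0x80, 0x80)); // true (overlong, decodes to 0)
        System.out.println(rejects(0xF4, 0x90, 0x80, 0x80)); // true (decodes to 0x110000)
    }
}
```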

Copilot · 2026-01-02T06:30:36Z

core/src/main/java/com/alibaba/fastjson2/JSONReaderUTF8.java

+                        int b1 = bytes[offset + 1] & 0xFF;
+                        int b2 = bytes[offset + 2] & 0xFF;
+                        int b3 = bytes[offset + 3] & 0xFF;


Missing bounds check before accessing bytes array at offset + 1, offset + 2, and offset + 3. If the byte array ends prematurely (i.e., offset + 3 >= bytes.length), this will cause an ArrayIndexOutOfBoundsException. Consider adding a check to ensure offset + 3 < bytes.length or offset + 3 < nameEnd before accessing these bytes.
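A minimal sketch of the bounds check this comment asks for, assuming `end` plays the role of the `nameEnd` limit it mentions (exclusive, and never larger than `bytes.length`). The class name is hypothetical and `IllegalArgumentException` again stands in for `JSONException`. Checking before the reads turns a truncated sequence into a parse error instead of an `ArrayIndexOutOfBoundsException`, and it also rejects a sequence that runs past the end of the name even when the array itself is long enough.

```java
// Sketch of the suggested bounds check; "end" is an assumed exclusive limit (<= bytes.length).
final class FourByteBoundsCheck {
    static int decode(byte[] bytes, int offset, int end) {
        // bytes[offset] through bytes[offset + 3] must all lie before end.
        if (offset + 3 >= end) {
            throw new IllegalArgumentException("malformed input around byte " + offset);
        }
        int ch = bytes[offset] & 0xFF;
        int b1 = bytes[offset + 1] & 0xFF;
        int b2 = bytes[offset + 2] & 0xFF;
        int b3 = bytes[offset + 3] & 0xFF;
        return ((ch & 0x07) << 18)
                | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6)
                | (b3 & 0x3F);
    }

    // True if the check rejects the sequence; other exceptions propagate unchanged.
    static boolean rejects(byte[] bytes, int end) {
        try {
            decode(bytes, 0, end);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] full = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x87};
        byte[] truncated = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98};
        System.out.println(decode(full, 0, full.length) == 0x1F607); // true
        System.out.println(rejects(truncated, truncated.length));   // true
        System.out.println(rejects(full, 3));                        // true (crosses the name end)
    }
}
```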

Copilot · 2026-01-02T06:30:37Z

test-jdk17/src/test/java/com/alibaba/fastjson2/issues/Issue3928.java

+    @Test
+    public void testIssue() {
+        Map<String, String> map = Map.of("\uD83D\uDE07\\", "");
+
+        String jsonString = JSON.toJSONString(map);
+        assertEquals(map, JSON.parseObject(jsonString, Map.class));
+
+        byte[] jsonBytes = JSON.toJSONBytes(map, StandardCharsets.UTF_8);
+        assertEquals(map, JSON.parseObject(jsonBytes, Map.class));
+    }


Test coverage is missing for the readFieldNameHashCode method when field names contain 4-byte UTF-8 characters. The readFieldNameHashCode method at lines 2572-2590 does not handle case 15 (4-byte UTF-8 sequences) and will throw a JSONException when encountering emoji or other 4-byte UTF-8 characters in field names. This is the same issue that was fixed in getFieldName. Consider adding a test case that exercises readFieldNameHashCode with emoji characters in field names to ensure this method also handles 4-byte UTF-8 correctly.
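Whatever hash readFieldNameHashCode computes, a fix there has to hash the two surrogate chars of a 4-byte sequence in the same order that `String.charAt` yields them, or the hash read from bytes will not match the hash of the decoded key. The sketch below illustrates that property. The 64-bit FNV-1a hash over UTF-16 chars is an assumption chosen for illustration, not necessarily fastjson2's scheme, and `FieldNameHash`, `hashString`, and `hashUtf8` are hypothetical names. To stay short, the byte-level version handles only 1-byte and 4-byte sequences and skips JSON escapes and validation.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: 64-bit FNV-1a over UTF-16 chars, computed from a String and from UTF-8 bytes.
final class FieldNameHash {
    static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    static final long PRIME = 0x100000001b3L;

    static long mix(long hash, char c) {
        hash ^= c;
        return hash * PRIME;
    }

    // Reference hash over the chars of an already-decoded key.
    static long hashString(String name) {
        long hash = OFFSET_BASIS;
        for (int i = 0; i < name.length(); i++) {
            hash = mix(hash, name.charAt(i));
        }
        return hash;
    }

    // Hash computed directly from UTF-8 bytes (ASCII and 4-byte sequences only).
    static long hashUtf8(byte[] bytes) {
        long hash = OFFSET_BASIS;
        int i = 0;
        while (i < bytes.length) {
            int ch = bytes[i] & 0xFF;
            if (ch < 0x80) {
                hash = mix(hash, (char) ch);
                i += 1;
            } else if ((ch & 0xF8) == 0xF0 && i + 3 < bytes.length) {
                int codePoint = ((ch & 0x07) << 18)
                        | ((bytes[i + 1] & 0x3F) << 12)
                        | ((bytes[i + 2] & 0x3F) << 6)
                        | (bytes[i + 3] & 0x3F);
                // High surrogate first, then low: the same chars, in the same order, as the String.
                hash = mix(hash, Character.highSurrogate(codePoint));
                hash = mix(hash, Character.lowSurrogate(codePoint));
                i += 4;
            } else {
                // 2- and 3-byte sequences are out of scope for this sketch.
                throw new IllegalArgumentException("unsupported byte at " + i);
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        String key = "\uD83D\uDE07key";
        byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
        System.out.println(hashUtf8(utf8) == hashString(key)); // true
    }
}
```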

fix: support 4-byte UTF-8 characters (Emoji) in object keys, for issue alibaba#3928 (af67751)

Copilot AI review requested due to automatic review settings January 2, 2026 06:23

Copilot started reviewing on behalf of jujn January 2, 2026 06:24

Copilot AI reviewed Jan 2, 2026


optimized (19ce53e)

