[ruby-core:80334] [Ruby trunk Bug#13292] Invalid encodings in UTF-32

From: usa@...
Date: 2017-03-25 17:01:57 UTC
List: ruby-core #80334
Issue #13292 has been updated by usa (Usaku NAKAMURA).

Backport changed from 2.2: REQUIRED, 2.3: REQUIRED, 2.4: DONE to 2.2: DONE, 2.3: REQUIRED, 2.4: DONE

ruby_2_2 r58103 merged revision(s) 57816,57817.

----------------------------------------
Bug #13292: Invalid encodings in UTF-32
https://bugs.ruby-lang.org/issues/13292#change-63812

* Author: rbjl (Jan Lelis)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]
* Backport: 2.2: DONE, 2.3: REQUIRED, 2.4: DONE
----------------------------------------
Ruby is very strict about valid UTF-8 encodings, which is great.

Strings that encode surrogates (U+D800..U+DFFF) or code points above U+10FFFF are not valid.
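For comparison, here is a small sketch of the UTF-8 strictness described above. The variable names are only illustrative; the byte sequences are the generic UTF-8 bit patterns for U+D800 and U+110000, plus the highest valid code point U+10FFFF as a control.

```ruby
# UTF-8 pattern for the surrogate U+D800 (ED A0 80): rejected.
surrogate_utf8 = [0xED, 0xA0, 0x80].pack("C*").force_encoding("UTF-8")
surrogate_utf8.valid_encoding?   # => false

# UTF-8 pattern for U+110000 (F4 90 80 80): above U+10FFFF, rejected.
too_large_utf8 = [0xF4, 0x90, 0x80, 0x80].pack("C*").force_encoding("UTF-8")
too_large_utf8.valid_encoding?   # => false

# U+10FFFF (F4 8F BF BF) is the highest code point and is accepted.
max_utf8 = [0xF4, 0x8F, 0xBF, 0xBF].pack("C*").force_encoding("UTF-8")
max_utf8.valid_encoding?         # => true
```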

However, in UTF-32, it is possible to encode such values, and Ruby treats them as valid:

Example 1 (too large value)

```
# Bytes 00 00 11 00 read as little-endian give 0x00110000, i.e. U+110000 (above U+10FFFF).
a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE") # => "\u{110000}"
a.valid_encoding? # => true
```

Example 2 (surrogate)

```
# Bytes 00 D8 00 00 read as little-endian give 0x0000D800, i.e. the surrogate U+D800.
b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # => "\uD800"
b.valid_encoding? # => true
```

The behaviour should be changed so that `String#valid_encoding?` returns `false` for such strings.
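The requested rule can be sketched independently of Ruby's encoding machinery. The helper below, `valid_utf32le_bytes?`, is a hypothetical name and not part of Ruby's API or of the actual fix; it only illustrates the condition that every 32-bit unit must be a Unicode scalar value (at most 0x10FFFF and outside the surrogate range).

```ruby
# Hypothetical helper (not part of Ruby): checks whether a byte string
# consists solely of Unicode scalar values when read as UTF-32LE.
def valid_utf32le_bytes?(str)
  bytes = str.b                          # binary copy, ignores the tagged encoding
  return false unless (bytes.bytesize % 4).zero?

  # "V*" unpacks consecutive 32-bit unsigned little-endian integers.
  bytes.unpack("V*").all? do |cp|
    cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
  end
end

valid_utf32le_bytes?([0, 0, 17, 0].pack("C*"))    # => false (U+110000)
valid_utf32le_bytes?([0, 216, 0, 0].pack("C*"))   # => false (U+D800)
valid_utf32le_bytes?([0x41, 0, 0, 0].pack("C*"))  # => true  (U+0041)
```

On a Ruby build that includes the fix, `String#valid_encoding?` on a UTF-32LE string would be expected to agree with this check for the two examples above.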

For reference: http://unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (page 71)



-- 
https://bugs.ruby-lang.org/

