Age | Commit message | Author |
|
Add encoding conversion (transcoding) from UTF-8 to CESU-8
and back. CESU-8 is an encoding similar to UTF-8, but it encodes
codepoints above U+FFFF as a surrogate pair, with each surrogate
in turn encoded as if it were a UTF-8 codepoint. This
preserves the same binary sorting order as in UTF-16. It is
also somewhat similar (although not exactly identical) to an
encoding used internally by Java.
This completes issue #15995.
enc/trans/cesu_8.trans: Add encoding conversion from/to CESU-8
test/ruby/test_transcode.rb: Add tests for above
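
To illustrate the byte layout described in this commit message, here is a minimal pure-Ruby sketch. It is not the generated transcoder from enc/trans/cesu_8.trans; the helper names `cesu8_bytes` and `three_byte_form` are made up for this example.

```ruby
# Hypothetical helper: the 3-byte UTF-8-style form of a 16-bit unit.
def three_byte_form(unit)
  [0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)]
end

# Hypothetical helper: bytes of one codepoint as CESU-8 would encode it.
def cesu8_bytes(codepoint)
  # Codepoints up to U+FFFF are encoded exactly as in UTF-8.
  return [codepoint].pack("U").bytes if codepoint <= 0xFFFF

  # Above U+FFFF: split into a UTF-16 surrogate pair first ...
  offset = codepoint - 0x10000
  high = 0xD800 + (offset >> 10)
  low  = 0xDC00 + (offset & 0x3FF)
  # ... then encode each surrogate as if it were a codepoint of its own.
  three_byte_form(high) + three_byte_form(low)
end

puts cesu8_bytes(0x10000).map { |b| format("%02X", b) }.join(" ")
# prints "ED A0 80 ED B0 80"
```

On a Ruby build that includes this commit, `"\u{10000}".encode("CESU-8").bytes` would be expected to give the same six bytes, where UTF-8 itself uses four.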
|
|
FrozenError will be used instead of RuntimeError for exceptions
raised when there is an attempt to modify a frozen object. The
reason for this change is to differentiate exceptions related
to frozen objects from generic exceptions such as those generated
by Kernel#raise without an exception class.
From: Jeremy Evans
Signed-off-by: Urabe Shyouhei
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@61131 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
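
A short sketch of what this change looks like from user code, assuming a Ruby version that already defines FrozenError. Because FrozenError is a subclass of RuntimeError, code that rescues RuntimeError keeps working.

```ruby
# Assumes a Ruby that defines FrozenError (introduced by this change).
str = "immutable".freeze

raised = begin
  str << " text"      # attempt to modify a frozen object
  nil
rescue FrozenError => e
  e
end

puts raised.class                      # the specific exception class
puts raised.is_a?(RuntimeError)        # still a RuntimeError for old rescue clauses
```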
|
|
* test/ruby/test_transcode.rb: fix typo in comment
patched by larskanis (Lars Kanis) [GH-1681]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60323 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_enc_str_scrub): enc can differ from the actual
encoding of the string, the cached coderange is useless then.
[ruby-core:82674] [Bug #13874]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59763 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59521 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59189 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Add test method test_ill_formed_utf_8_replace to test/ruby/test_transcode.rb
to check for the recommended number of \uFFFD replacement characters.
This is the first part, using ill-formed prefixes, with suffixes up to
the length of the original UTF-8 structure (including overlongs and
the full 31-bit space).
For more details, see Unicode 9.0.0, Section 3.9, Best Practices for Using U+FFFD.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59026 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
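
The kind of check this test performs can be sketched as follows. The helper `replacement_count` and the sample byte sequences are chosen for illustration and are not taken from test_ill_formed_utf_8_replace; the counts printed for the multi-byte samples depend on the Ruby version in use.

```ruby
REPLACEMENT = "\uFFFD"

# Hypothetical helper: transcode ill-formed UTF-8 with invalid: :replace
# and count how many U+FFFD characters come out.
def replacement_count(bytes)
  ill_formed = bytes.pack("C*").force_encoding("UTF-8")
  utf16 = ill_formed.encode("UTF-16BE", "UTF-8", invalid: :replace)
  utf16.encode("UTF-8").count(REPLACEMENT)
end

samples = {
  "lone 0xFF"        => [0xFF],
  "truncated 3-byte" => [0xE3, 0x81],
  "overlong C0 80"   => [0xC0, 0x80],
}

samples.each do |label, bytes|
  puts "#{label}: #{replacement_count(bytes)} replacement character(s)"
end
```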
|
|
* enc/trans/windows-1255-tbl.rb: update mapping from 0xCA to
U+05BA. [Feature #12877]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56516 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
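
A quick way to observe this mapping, assuming a Ruby build that includes this table update:

```ruby
# Byte 0xCA tagged as Windows-1255, then transcoded to UTF-8.
byte = [0xCA].pack("C").force_encoding("Windows-1255")
decoded = byte.encode("UTF-8")

puts format("U+%04X", decoded.ord)
# expected per this commit: U+05BA
```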
|
|
* transcode.c (str_transcode0): scrub in the given encoding when
the source encoding is given, not in the encoding of the
receiver. [ruby-core:75732] [Bug #12431]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55181 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
When you change this to true, you may need to add more tests.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53141 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
test/ruby/test_transcode.rb: Fixed encoding name
to the correct one in the IANA registry (IBM037)
and added an alias (ebcdic-cp-us)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53124 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
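
A small sketch of the name and its alias in use, assuming a Ruby build with the EBCDIC transcoder from this series of commits:

```ruby
# Both names should resolve to the same Encoding object.
ibm037    = Encoding.find("IBM037")
alias_enc = Encoding.find("ebcdic-cp-us")
puts ibm037 == alias_enc

# Round-trip a letter through the EBCDIC encoding.
ebcdic = "A".encode("IBM037")
puts ebcdic.bytes.map { |b| format("%02X", b) }.join(" ")  # "A" is 0xC1 in EBCDIC
puts ebcdic.encode("UTF-8")
```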
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* enc/trans/ebcdic.trans: transcodings between EBCDIC-US
and iso-8859-1 [with code from Andrea Ribuoli]
* test/ruby/test_transcode.rb: tests for above
* tool/transcode-tblgen.rb: additional argument for
method transcode_tblgen
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53112 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* transcode.c (rb_econv_set_replacement): target encoding name can
be empty now. [ruby-core:69841] [Bug #11324]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51116 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* transcode.c (load_transcoder_entry): fix transcoder loading race
condition, by waiting in require. [ruby-dev:49106] [Bug #11277]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@51037 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* test/lib/find_executable.rb: Ditto.
* test/lib/memory_status.rb: Ditto.
* test/lib/test/unit.rb: require envutil.
* test/: Don't require envutil in test files.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48409 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* test/ruby/test_transcode.rb (test_valid_dummy_encoding): add
assertion messages and suppress a warning.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@44477 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* encoding.c (must_encindex, rb_enc_from_index, rb_obj_encoding): mask
encoding index and ignore dummy flags. [ruby-core:59354] [Bug #9314]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@44462 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
str.encode doesn't have explicit invalid: :replace.
workaround fix; see #8995
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@43802 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* test/ruby/test_transcode.rb (TestTranscode#test_pseudo_encoding_inspect):
test for proper base encoding. [ruby-core:57318] [Bug #8940]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@43024 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* enc/encdb.c, enc/utf_16_32.h (ENC_DUMMY_UNICODE): Unicode with BOM
must be based on big endian variants, so that actual encodings would
work. [ruby-core:57318] [Bug #8940]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@43023 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Fix conversion table and logic. [ruby-dev:47680]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@42823 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@41036 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
tool/transcode-tblgen.rb: change EUC-JP-2004 to EUC-JIS-2004.
This is follow up to changes in r41024.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@41035 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
by John Shahid
https://github.com/ruby/ruby/pull/148
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36808 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
[ruby-dev:45571] [Feature #6349]
Requested by Kyouhei Yanagita.
* enc/trans/japanese_euc.trans: ditto.
* enc/trans/JIS/JISX0213-[12]%UCS@{BMP,SIP}.src: JIS X 0213:2004 ->
Unicode mapping table from NetBSD.
* enc/trans/JIS/UCS@{BMP,SIP}%JISX0213-[12].src: Unicode -> JIS X
0213:2004 mapping table from NetBSD.
* tool/transcode-tblgen.rb: added SIP support.
* test/ruby/test_transcode.rb: tests of above changes.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
indent.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@31980 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
assertion and move back.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@30839 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@30837 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
hash, method, proc or [] method as fallback. [ruby-dev:42692]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@30118 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
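
The `fallback:` option of String#encode mentioned here can be sketched like this; the sample text and replacement values are made up for illustration.

```ruby
text = "caf\u00E9 \u2603"   # "é" and a snowman have no US-ASCII mapping

# Hash fallback: undefined characters are looked up as keys.
with_hash = text.encode("US-ASCII",
                        fallback: { "\u00E9" => "e", "\u2603" => "*" })

# Proc fallback: called with each undefined character.
with_proc = text.encode("US-ASCII",
                        fallback: ->(ch) { format("<U+%04X>", ch.ord) })

puts with_hash   # prints "cafe *"
puts with_proc   # prints "caf<U+00E9> <U+2603>"
```

A Method object, or any object that responds to `[]`, can be passed in the same way.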
|
|
- Removed commented-out options that are no longer under discussion.
- Added two more tests for forthcoming clarifications.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29970 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29927 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29895 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29894 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
surrogates.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* enc/trans/utf_16_32.trans: add a converter from UTF-16 to UTF-8.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29889 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
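
With this converter in place, byte strings that start with a BOM can be decoded from the generic "UTF-16" encoding. A minimal sketch, with sample bytes chosen for illustration:

```ruby
# "A" followed by HIRAGANA LETTER A (U+3042), with big- and little-endian BOMs.
big_endian    = ["feff00413042"].pack("H*")
little_endian = ["fffe41004230"].pack("H*")

# The source encoding is given explicitly; the BOM selects the byte order.
from_be = big_endian.encode("UTF-8", "UTF-16")
from_le = little_endian.encode("UTF-8", "UTF-16")

puts from_be == from_le
puts from_be.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```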
|
|
* enc/big5.c: split CP951 from Big5-HKSCS.
* enc/trans/big5.trans: import conversion table of Big5, Big5-HKSCS,
CP950, and CP951 from ICU. they need fallback conversions.
ref [ruby-core:33256]
http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/
* tool/transcode-tblgen.rb (import_ucm): add to import ucm files.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@29869 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
whose result is 2 bytes. [ruby-core:30751]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@28307 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* enc/trans/iso2022.trans: add converter for CP50220.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@27860 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
fallback hash has a related key. [ruby-dev:40540]
[ruby-dev:40829] #3036
* transcode.c (rb_econv_prepare_opts): pass to newhash
a value with the key :fallback.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@27326 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@27149 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26417 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
support for new transcoding instruction FUNsio (with Tatsuya Mizuno)
* enc/trans/gb18030.trans: Significantly reduced GB18030 conversion
table footprint using FUNsio and differences (with Tatsuya Mizuno)
* test/ruby/test_transcode.rb: Minor name fix (from Tatsuya Mizuno)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@26065 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
(from Tatsuya Mizuno)
* test/ruby/test_transcode.rb: Added test for converting full range of
Unicode codepoints from/to GB18030 (from Tatsuya Mizuno)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25980 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
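
A much smaller version of such a round-trip check, using a handful of sample codepoints picked for this sketch rather than the full range covered by the actual test:

```ruby
# Sample codepoints: ASCII, a C1 control, the euro sign, a CJK ideograph,
# and the first and last supplementary codepoints.
samples = [0x41, 0x80, 0x20AC, 0x4E2D, 0x10000, 0x10FFFF]

results = samples.map do |cp|
  utf8 = [cp].pack("U")
  gb   = utf8.encode("GB18030")
  back = gb.encode("UTF-8")
  hex  = gb.bytes.map { |b| format("%02X", b) }.join(" ")
  puts format("U+%04X -> %s (round trip: %s)", cp, hex, back == utf8)
  back == utf8
end

puts results.all?
```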
|
|
test/ruby/test_transcode.rb: Added Encoding 'Big5-UAO' and transcoding
for it (from Tatsuya Mizuno) (see Bug #1784)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@25822 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* test/ruby/test_transcode.rb: added tests for the above
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24322 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
new Chinese BIG5-HKSCS transcoding (with Tatsuya Mizuno)
* test/ruby/test_transcode.rb: added tests for the above
(with Tatsuya Mizuno)
* enc/big5.c: Added BIG5-HKSCS as a replicate encoding of BIG5
(short term solution, needs more work; with Tatsuya Mizuno)
* tool/transcode-tblgen.rb: made 'pat' directly accessible in
class StrSet
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@24264 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@22784 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@21741 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|