Age | Commit message (Collapse) | Author |
|
This exposes the rb_fstring internal function to return a
deduped and frozen string when a non-frozen string is given.
This is useful for writing all sorts of record processing key
values maybe stored, but certain keys and values are often
duplicated at a high frequency, so memory savings can
noticeable.
Use cases are many:
* email/NNTP header processing
There are some standard header keys everybody uses
(From/To/Cc/Date/Subject/Received/Message-ID/References/In-Reply-To),
as well as common ones specific to a certain lists:
(ruby-core has X-Redmine-* headers)
It is also useful to dedupe values, as most inboxes have
multiple messages from the same sender, or MUA.
* package management systems -
things like RubyGems stores identical strings for licenses,
dependency names, author names/emails, etc
* HTTP headers/trailers -
standard headers (Host/Accept/Accept-Encoding/User-Agent/...)
are common, but there are also uncommon ones.
Values may be deduped, as well, as it is likely a user
agent will make multiple/parallel requests to the same
server.
* version control systems -
this can be useful for deduplicating names of frequent
committers (like "nobu" :)
In linux.git and git.git, there are also common
trailers such as Signed-Off-By/Acked-by/Reviewed-by/Fixes/...
as well as less common ones.
* audio metadata -
There are commonly used tags (Artist/Album/Title/Tracknumber),
but Vorbis comments allows arbitrary key values to be stored.
Music collections contain songs by the same artist or mutiple
songs from the same album, so deduplicating values will be
helpful there, too.
* JSON, YAML, XML, HTML processing
Certain fields, tags and attributes are commonly used
across the same and multiple documents
There is no security concern in this being a DoS vector by
causing immortal strings. The fstring table is not a GC-root
and not walked during the mark phase. GC-able dynamic symbols
since Ruby 2.2 are handled in the same manner, and that
implementation also relies on the non-immortality of fstrings.
[Feature #13077] [ruby-core:79663]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57698 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (str_shared_replace): use RUBY_ASSERT for
pre-condition.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57628 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_enumerate_lines): initialize conditionally
used variable.
* thread.c (rb_fd_no_init): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57625 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_enumerate_lines): hint to suppress a
maybe-uninitialized warning by gcc.
* thread.c (rb_fd_no_init): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57618 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c: add example for Symbol#to_s.
The docs for Symbol#to_s only include an example for
Symbol#id2name, but not for #to_s which is an alias;
the docs should include examples for both methods.
From: Marcus Stollsteimer <[email protected]>
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57536 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Since the fstring table encompasses all strings in the
symbol table, we may reuse the fstring table walk to set
the class and eliminate the branch in rb_id2str.
* string.c (Init_String): use rb_cString immediately after definition
* symbol.c (rb_id2str): eliminate branch to set class
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57521 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Handle the embedded case first, since we may have an embedded
duplicate and non-embedded original string.
* string.c (rb_str_tmp_frozen_release): handled embedded strings
* test/ruby/test_io.rb (test_write_no_garbage): new test
[ruby-core:78898] [Bug #13085]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57471 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (STR_IS_SHARED_M): new flag to mark shared mulitple times
(STR_SET_SHARED): set STR_IS_SHARED_M
(rb_str_tmp_frozen_acquire, rb_str_tmp_frozen_release): new functions
(str_new_frozen): set/unset STR_IS_SHARED_M as appropriate
* internal.h: declare new functions
* io.c (fwrite_arg, fwrite_do, fwrite_end): new
(io_fwrite): use new functions
Introduce rb_str_tmp_frozen_acquire and rb_str_tmp_frozen_release
to manage a hidden, frozen string. Reuse one bit of the embed
length for shared strings as STR_IS_SHARED_M to indicate a string
has been shared multiple times. In the common case, the string
is only shared once so the object slot can be reclaimed immediately.
minimum results in each 3 measurements. (time and size)
Execution time (sec)
name trunk built
io_copy_stream_write 0.682 0.254
io_copy_stream_write_socket 1.225 0.751
Speedup ratio: compare with the result of `trunk' (greater is better)
name built
io_copy_stream_write 2.680
io_copy_stream_write_socket 1.630
Memory usage (last size) (B)
name trunk built
io_copy_stream_write 95436800.000 6512640.000
io_copy_stream_write_socket 117628928.000 7127040.000
Memory consuming ratio (size) with the result of `trunk' (greater is better)
name built
io_copy_stream_write 14.654
io_copy_stream_write_socket 16.505
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57469 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
This seems a bug introduced by r520 (1.4.0). [ruby-core:79110] [Bug #13135]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* file.c (rb_get_path_check_convert): refine the error message
when the path name contains null byte.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_enc_str_scrub): only one of replacement and block
is allowed. [ruby-core:79038] [Bug #13119]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57304 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_enc_str_scrub): yield the invalid part only with
ASCII-incompatible. [ruby-core:79039] [Bug #13120]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57303 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_enc_str_scrub): honor the given block with
ASCII-incompatible encoding. [ruby-core:79039] [Bug #13120]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57302 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_enumerate_lines): allow CRLF to separate
paragraphs.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57185 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_enumerate_lines): in paragraph mode, do not
include newlines which separate paragraphs, so that it will be
consistent with IO#each_line.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57184 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_casecmp_p): [DOC] use Unicode escape form to
get rid of warning C4819 by Microsoft Visual C++.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57154 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Add size_t cast to avoid signed integer overflow. r56157 ("string.c:
avoid signed integer overflow", 2016-09-13) missed this. Suppresses
UBSan.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (crypt.h): crypt_r() was added in FreeBSD 12.0 but is
declared in unistd.h. [ruby-core:78664] [Bug #13038]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57091 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (chomp_newline): fix chomping newline only line.
rb_enc_prev_char return NULL if no previous character and must
not call rb_enc_ascget on it. a patch by Ary Borenszweig
<asterite AT gmail.com> at [ruby-core:78666]. [Bug #13037]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57088 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_equal): [DOC] fix fallback method name. the
peer's == method will be used, not ===.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57056 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_match_m_p): inverse of Regexp#match?. based on
the patch by Herwin Weststrate <[email protected]>.
[Fix GH-1483] [Feature #12898]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57053 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_enumerate_lines): implement chomp option.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56972 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Fix documentation of String#casecmp? (examples didn't have the '?').
Add an example with non-ASCII characters. Clarify that casecmp,
unlike casecmp?, only does case-insensitivity on A-Z/a-z.
[ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56926 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_casemap): use xmalloc simply instead of
ALLOC_N.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56920 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (mapping_buffer): get rid of zero-length array member,
which is not a part of C90.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56915 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_casecmp_p): [DOC] move forward declaration of
rb_str_downcase to enable rdoc.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c: Implement String#casecmp? and Symbol#casecmp? by using
String#downcase :fold for Unicode case folding. This does not include
options such as :turkic, because these currently cannot be combined
with the :fold option. This implements feature #12786.
* test/ruby/test_string.rb/test_symbol.rb: Tests for above.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* io.c (extract_getline_opts): extract chomp option.
[Feature #12553]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56581 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* numeric.c: [DOC] update document for Integer class.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56492 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_sub, rb_str_gsub): [DOC] 'backlash' should read
'backslash'. [Fix GH-1461]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
a hash value of Object might be Bignum, but it causes many troubles
expecially the Object is used as a key of a hash. so I've gave up
to do so.
* array.c (rb_ary_hash): use above macro.
* bignum.c (rb_big_hash): ditto.
* hash.c (rb_obj_hash, rb_hash_hash): ditto.
* numeric.c (rb_dbl_hash): ditto.
* proc.c (proc_hash): ditto.
* re.c (rb_reg_hash, match_hash): ditto.
* string.c (rb_str_hash_m): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_hash_m): hash values may be negative.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56321 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
size with int, and of course also not guaranteed the value can be
Fixnum.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (lstrip_offset): add a fast path in the case of single
byte optimizable strings, as well as rstrip_offset.
[ruby-core:77392] [Feature #12788]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56250 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (enc_strlen, rb_enc_strlen_cr): Avoid signed integer
overflow. The result type of a pointer subtraction may have the same
size as long. This fixes String#size returning an negative value on
i686-linux environment:
str = "\x00" * ((1<<31)-2))
str.slice!(-3, 3)
str.force_encoding("UTF-32BE")
str << 1234
p str.size
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56247 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
actually check its availability rather to check GCC's version.
* configure.in (WARN_UNUSED_RESULT): moved to here.
* configure.in (RUBY_FUNC_ATTRIBUTE): change function declaration
to return int rather than void, because it makes no sense for a
warn_unused_result attributed function to return void.
Funny thing however is that it also makes no sense for noreturn
attributed function to return int. So there is a fundamental
conflict between them. While I tested this, I confirmed both
GCC 6 and Clang 3.8 prefers int over void to correctly detect
necessary attributes under this setup. Maybe subject to change
in future.
* internal.h (UNINITIALIZED_VAR): renamed to MAYBE_UNUSED, then
moved to configure.in for the same reason we move
WARN_UNUSED_RESULT.
* configure.in (MAYBE_UNUSED): moved to here.
* internal.h (__has_attribute): deleted, because it has no use now.
* string.c (rb_str_enumerate_lines): refactor macro rename.
* string.c (rb_str_enumerate_bytes): ditto.
* string.c (rb_str_enumerate_chars): ditto.
* string.c (rb_str_enumerate_codepoints): ditto.
* thread.c (do_select): ditto.
* vm_backtrace.c (rb_debug_inspector_open): ditto.
* vsnprintf.c (BSD_vfprintf): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56169 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
The behavior on signed integer overflow is undefined. On platform with
sizeof(long)==4, it's fairly easy that 'len + termlen' overflows, where
len is the string length and termlen is the terminator length.
So, prevent the integer overflow by avoiding adding to a string length,
or casting to size_t before adding where the total size is passed to
{RE,}ALLOC*().
* string.c (STR_HEAP_SIZE, RESIZE_CAPA_TERM, str_new0, rb_str_buf_new,
str_shared_replace, rb_str_init, str_make_independent_expand,
rb_str_resize): Avoid overflow by casting the length to size_t. size_t
should be able to represent LONG_MAX+termlen.
* string.c (rb_str_modify_expand): Check that the new length is in the
range of long before resizing. Also refactor to use RESIZE_CAPA_TERM
macro.
* string.c (str_buf_cat): Fix so that it does not create a negative
length String. Also fix the condition for 'string sizes too big', the
total length can be up to LONG_MAX.
* string.c (rb_str_plus): Check the resulting String length does not
exceed LONG_MAX.
* string.c (rb_str_dump): Fix integer overflow. The dump result will be
longer then the original String.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56157 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (STR_EMBEDDABLE_P): Renamed from STR_EMBEDABLE_P(). And use
it in more places.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56155 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (STR_EMBEDABLE_P): extract the predicate macro to tell
if the given length is capable in an embedded string, and fix
possible integer overflow.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_change_terminator_length): fix integer overflow
in the case growing the terminator length and the string length
is around LONG_MAX.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56149 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_set_len): The buffer overflow check is wrong. The
space for termlen is allocated outside the capacity returned by
rb_str_capacity(). This fixes r41920 ("string.c: multi-byte
terminator", 2013-07-11). [ruby-core:77257] [Bug #12757]
* test/-ext-/string/test_set_len.rb (test_capacity_equals_to_new_size):
Test for this change. Applying only the test will trigger [BUG].
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56102 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* array.c (rb_ary_concat_multi): take multiple arguments. based
on the patch by Satoru Horie. [Feature #12333]
* string.c (rb_str_concat_multi, rb_str_prepend_multi): ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_fs_setter): check and convert $; value at
assignment.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55990 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* string.c (rb_str_split_m): show $; name in error message when it
is a wrong object.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55986 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
of ISO-8859-1~16 is now supported. [ci skip]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55777 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* *.c: rename rb_funcall2 to rb_funcallv, except for extensions
which are/will be/may be gems. [Fix GH-1406]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55773 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
[Bug #12628]
This patch introduce many changes.
* Introduce concept of "Block Handler (BH)" to represent
passed blocks.
* move rb_control_frame_t::flag to ep[0] (as a special local
variable). This flags represents not only frame type, but also
env flags such as escaped.
* rename `rb_block_t` to `struct rb_block`.
* Make Proc, Binding and RubyVM::Env objects wb-protected.
Check [Bug #12628] for more details.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* vm.c (vm_set_main_stack): remove unnecessary check. toplevel
binding must be initialized. [Bug #12611] (N1)
* win32/win32.c (w32_symlink): fix return type. [Bug #12611] (N3)
* string.c (rb_str_split_m): simplify the condition.
[Bug #12611](N4)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55729 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
UTF-8 to use upper-case four-digit hexadecimal escapes without braces
where possible [Feature #12419].
* test/ruby/test_string.rb (test_dump): Add tests for above.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55728 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|