path: root/internal/string.h
Age | Commit message | Author
2025-04-18 | Lock-free hash set for fstrings [Feature #21268] | John Hawthorn
This implements a hash set which is wait-free for lookup and lock-free for insert (unless resizing) to use for fstring de-duplication.

As highlighted in https://bugs.ruby-lang.org/issues/19288, heavy use of fstrings (frozen interned strings) can significantly reduce the parallelism of Ractors.

I tried a few other approaches first: using an RWLock, striping a series of RWLocks (partitioning the hash N-ways to reduce lock contention), and putting a cache in front of it. All of these improved the situation, but were unsatisfying as all still required locks for writes (and granular locks are awkward, since we run the risk of needing to reach a VM barrier) and this table is somewhat write-heavy.

My main reference for this was Cliff Click's talk on a lock-free hash table for Java https://www.youtube.com/watch?v=HJ-719EGIts.

It turns out this lock-free hash set is made easier to implement by a few properties:

* We only need a hash set rather than a hash table (we only need keys, not values), and so the full entry can be written as a single VALUE
* As a set we only need lookup/insert/delete, no update
* Delete is only run inside GC so does not need to be atomic (It could be made concurrent)
* I use rb_vm_barrier for the (rare) table rebuilds (It could be made concurrent). We VM lock (but don't require other threads to stop) for table rebuilds, as those are rare
* The conservative garbage collector makes deferred replication easy, using a T_DATA object

Another benefit of having a table specific to fstrings is that we compare by value on lookup/insert, but by identity on delete, as we only want to remove the exact string which is being freed. This is faster and provides a second way to avoid the race condition in https://bugs.ruby-lang.org/issues/21172.

This is a pretty standard open-addressing hash table with quadratic probing, similar to our existing st_table or id_table. Deletes (which happen on GC) replace existing keys with a tombstone, which is the only type of update which can occur. Tombstones are only cleared out on resize.

Unlike st_table, the VALUEs are stored in the hash table itself (st_table's bins) rather than as a compact index. This avoids an extra pointer dereference and is possible because we don't need to preserve insertion order.

The table targets a load factor of 1/2 (it is enlarged once it is half full).

Notes: Merged: https://github.com/ruby/ruby/pull/12921
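The design described above can be sketched in a few lines of C11. This is only an illustration under simplifying assumptions, not the code from the commit: the `toy_*` names, the fixed capacity, the integer keys and the hash function are all hypothetical, there is no resize, and because keys are plain integers the value comparison on insert and the identity comparison on delete collapse into the same `==`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CAPACITY 64u                 /* power of two; fixed, no resize here */
#define TOY_EMPTY ((uintptr_t)0)         /* slot never used */
#define TOY_TOMBSTONE ((uintptr_t)1)     /* slot whose key was deleted */

typedef struct {
    _Atomic uintptr_t slots[TOY_CAPACITY];   /* keys live directly in the table */
} toy_set;

static void toy_init(toy_set *set) {
    for (size_t i = 0; i < TOY_CAPACITY; i++) atomic_init(&set->slots[i], TOY_EMPTY);
}

static size_t toy_hash(uintptr_t key) {
    return (size_t)(((uint64_t)key * 0x9E3779B97F4A7C15ull) >> 32);
}

/* Lookup only performs atomic loads, so it never waits on another thread. */
static bool toy_contains(toy_set *set, uintptr_t key) {
    size_t mask = TOY_CAPACITY - 1, idx = toy_hash(key) & mask;
    for (size_t step = 1; step <= TOY_CAPACITY; step++) {
        uintptr_t seen = atomic_load_explicit(&set->slots[idx], memory_order_acquire);
        if (seen == TOY_EMPTY) return false;
        if (seen == key) return true;
        idx = (idx + step) & mask;       /* quadratic (triangular) probing */
    }
    return false;
}

/* Returns the key now stored in the set, or TOY_EMPTY if the table is full. */
static uintptr_t toy_find_or_insert(toy_set *set, uintptr_t key) {
    assert(key > TOY_TOMBSTONE);         /* keys must not collide with sentinels */
    size_t mask = TOY_CAPACITY - 1, idx = toy_hash(key) & mask;
    for (size_t step = 1; step <= TOY_CAPACITY; step++) {
        uintptr_t seen = atomic_load_explicit(&set->slots[idx], memory_order_acquire);
        if (seen == TOY_EMPTY) {
            uintptr_t expected = TOY_EMPTY;
            if (atomic_compare_exchange_strong_explicit(
                    &set->slots[idx], &expected, key,
                    memory_order_acq_rel, memory_order_acquire)) {
                return key;              /* we claimed the empty slot */
            }
            seen = expected;             /* lost the race: inspect the winner */
        }
        if (seen == key) return seen;
        idx = (idx + step) & mask;
    }
    return TOY_EMPTY;
}

/* Delete overwrites the key with a tombstone so later probes keep going. */
static void toy_delete(toy_set *set, uintptr_t key) {
    size_t mask = TOY_CAPACITY - 1, idx = toy_hash(key) & mask;
    for (size_t step = 1; step <= TOY_CAPACITY; step++) {
        uintptr_t seen = atomic_load_explicit(&set->slots[idx], memory_order_acquire);
        if (seen == TOY_EMPTY) return;
        if (seen == key) {
            atomic_store_explicit(&set->slots[idx], TOY_TOMBSTONE, memory_order_release);
            return;
        }
        idx = (idx + step) & mask;
    }
}
```

In this sketch tombstones are never reused by insert, which matches the statement that they are only cleared out on resize; a real table would rebuild before probe sequences grow long.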
2025-04-18 | Extract rb_gc_free_fstring to string.c | John Hawthorn
This allows more flexibility in how we deal with the fstring table Notes: Merged: https://github.com/ruby/ruby/pull/12921
2025-03-27 | Freeze $/ and make it ractor safe | Étienne Barrié
[Feature #21109] By always freezing when setting the global rb_rs variable, we can ensure it is not modified and can be accessed from a ractor. We're also making sure it's an instance of String and does not have any instance variables. Of course, if $/ is changed at runtime, it may cause surprising behavior but doing so is deprecated already anyway. Co-authored-by: Jean Boussier <[email protected]> Notes: Merged: https://github.com/ruby/ruby/pull/12975
2024-11-13 | YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments (#12069) | Randy Stauner
* YJIT: Specialize `String#[]` (`String#slice`) with fixnum arguments

  String#[] is in the top few C calls of several YJIT benchmarks: liquid-compile, rubocop, mail, sudoku.

  This speeds up these benchmarks by 1-2%.

* YJIT: Try harder to get type info for `String#[]`

  In the large generated code of the mail gem the context doesn't have the type info. In that case if we peek at the stack and add a guard we can still apply the specialization and it speeds up the mail benchmark by 5%.

Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
Co-authored-by: Takashi Kokubun (k0kubun) <[email protected]>

Notes: Merged-By: maximecb <[email protected]>
2024-11-13 | Mark strings returned by Symbol#to_s as chilled (#12065) | Jean byroot Boussier
* Use FL_USER0 for ELTS_SHARED

  This makes space in RString for two bits for chilled strings.

* Mark strings returned by `Symbol#to_s` as chilled

  [Feature #20350]

  `STR_CHILLED` now spans two user flags. If one bit is set it marks a chilled string literal; if the other is set it marks a chilled string returned by `Symbol#to_s`.

  Since it's not possible, and doesn't make much sense, to include debug info when `--debug-frozen-string-literal` is set, we can't include the allocation source, but we can safely include the symbol name in the warning message, making it much easier to find the source of the issue.

Co-authored-by: Étienne Barrié <[email protected]>
Co-authored-by: Jean Boussier <[email protected]>
2024-10-21 | Show where mutated chilled strings were allocated | Étienne Barrié
[Feature #20205]

The warning now suggests running with --debug-frozen-string-literal:

```
test.rb:3: warning: literal string will be frozen in the future (run with --debug-frozen-string-literal for more information)
```

When using --debug-frozen-string-literal, the location where the string was created is shown:

```
test.rb:3: warning: literal string will be frozen in the future
test.rb:1: info: the string was created here
```

When resurrecting strings and debug mode is not enabled, the overhead is a simple FL_TEST_RAW. When mutating chilled strings and deprecation warnings are not enabled, the overhead is a simple warning category enabled check.

Co-authored-by: Jean Boussier <[email protected]>
Co-authored-by: Nobuyoshi Nakada <[email protected]>

Notes: Merged: https://github.com/ruby/ruby/pull/11893
2024-09-16 | Don't export unnecessary string functions | Peter Zhu
These functions are not used publicly, so we don't need to export them. Notes: Merged: https://github.com/ruby/ruby/pull/11634
2024-05-28 | Precompute embedded string literals hash code | Jean Boussier
With embedded strings we often have some space left in the slot, which we can use to store the string hash code.

It's probably only worth it for string literals, as they are the ones likely to be used as hash keys.

We chose to store the hash code right after the string terminator so as to make it easy/fast to compute, and not require one more union in RString.

```
compare-ruby: ruby 3.4.0dev (2024-04-22T06:32:21Z main f77618c1fa) [arm64-darwin23]
built-ruby: ruby 3.4.0dev (2024-04-22T10:13:03Z interned-string-ha.. 8a1a32331b) [arm64-darwin23]
last_commit=Precompute embedded string literals hash code

|            |compare-ruby|built-ruby|
|:-----------|-----------:|---------:|
|symbol      |     39.275M|   39.753M|
|            |           -|     1.01x|
|dyn_symbol  |     37.348M|   37.704M|
|            |           -|     1.01x|
|small_lit   |     29.514M|   33.948M|
|            |           -|     1.15x|
|frozen_lit  |     27.180M|   33.056M|
|            |           -|     1.22x|
|iseq_lit    |     27.391M|   32.242M|
|            |           -|     1.18x|
```

Co-Authored-By: Étienne Barrié <[email protected]>
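As a rough sketch of the layout idea, the following C stores a hash immediately after the NUL terminator when the slot has room. Everything here is an assumption made for illustration: the `toy_*` names, the 40-byte buffer, and the FNV-1a hash are hypothetical and are not RString's real layout or Ruby's hash function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOT_BYTES 40   /* hypothetical space available for an embedded string */

typedef struct {
    size_t len;                  /* string length, excluding the terminator */
    char buf[TOY_SLOT_BYTES];    /* bytes, NUL terminator, then (maybe) the hash */
} toy_embedded_str;

/* 64-bit FNV-1a, used only as a stand-in hash function. */
static uint64_t toy_fnv1a(const char *p, size_t len) {
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)p[i];
        h *= 1099511628211ull;
    }
    return h;
}

/* Returns 1 if the hash was stored after the terminator, 0 if the string fits
   but leaves no room for the hash, and -1 if the string does not fit at all. */
static int toy_store(toy_embedded_str *s, const char *src, size_t len) {
    if (len + 1 > TOY_SLOT_BYTES) return -1;
    memcpy(s->buf, src, len);
    s->buf[len] = '\0';
    s->len = len;
    if (len + 1 + sizeof(uint64_t) > TOY_SLOT_BYTES) return 0;
    uint64_t h = toy_fnv1a(src, len);
    memcpy(s->buf + len + 1, &h, sizeof h);   /* memcpy avoids alignment issues */
    return 1;
}

/* Only valid when toy_store returned 1 for this string. */
static uint64_t toy_precomputed_hash(const toy_embedded_str *s) {
    uint64_t h;
    memcpy(&h, s->buf + s->len + 1, sizeof h);
    return h;
}
```

Finding the stored value needs only the length already kept with the string, which is one way to read the "easy/fast to compute" remark above.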
2024-05-28 | Stop marking chilled strings as frozen | Étienne Barrié
They were initially made frozen to avoid false positives for cases such as:

    str = str.dup if str.frozen?

But this may cause bugs and is generally confusing for users.

[Feature #20205]

Co-authored-by: Jean Boussier <[email protected]>
2024-03-19 | Implement chilled strings | Étienne Barrié
[Feature #20205]

As a path toward enabling frozen string literals by default in the future, this commit introduces "chilled strings". From a user perspective chilled strings pretend to be frozen, but on the first attempt to mutate them, they lose their frozen status and emit a warning rather than raising a `FrozenError`.

Implementation-wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`.

When code is compiled with frozen string literals neither explicitly enabled nor disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags.

Chilled strings have the `FL_FREEZE` flag so as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions.

Notes:

- `String#freeze`: clears the chilled flag.
- `String#-@`: acts as if the string was mutable.
- `String#+@`: acts as if the string was mutable.
- `String#clone`: copies the chilled flag.

Co-authored-by: Jean Boussier <[email protected]>
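The behaviour described in this commit message can be modelled with two flag bits. The sketch below is a toy model under stated assumptions, not Ruby's implementation: `toy_str`, `TOY_FROZEN`, `TOY_CHILLED` and the helper names are hypothetical, and a `false` return stands in for raising `FrozenError`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits standing in for FL_FREEZE and STR_CHILLED. */
enum { TOY_FROZEN = 1 << 0, TOY_CHILLED = 1 << 1 };

typedef struct {
    unsigned flags;
    int warnings;    /* counts warnings emitted, to make the behaviour testable */
} toy_str;

/* What a putchilledstring-like instruction would produce in this model:
   the string carries both the frozen and the chilled bits. */
static toy_str toy_chilled_literal(void) {
    toy_str s = { TOY_FROZEN | TOY_CHILLED, 0 };
    return s;
}

/* Returns true if a mutation may proceed, false where Ruby would raise. */
static bool toy_check_modifiable(toy_str *s) {
    if (s->flags & TOY_CHILLED) {
        /* First mutation: warn once and drop the pretend-frozen status. */
        s->flags &= ~(unsigned)(TOY_CHILLED | TOY_FROZEN);
        s->warnings++;
        fprintf(stderr, "warning: literal string will be frozen in the future\n");
        return true;
    }
    return (s->flags & TOY_FROZEN) == 0;
}

/* An explicit freeze clears the chilled bit and makes the string really frozen. */
static void toy_freeze(toy_str *s) {
    s->flags &= ~(unsigned)TOY_CHILLED;
    s->flags |= TOY_FROZEN;
}
```

Because the chilled bit is checked before the frozen bit, code that only tests the frozen bit keeps working unchanged, which is the compatibility argument the message makes for setting `FL_FREEZE`.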
2024-02-19 | [Bug #20280] Check by `rb_parser_enc_str_coderange` | Nobuyoshi Nakada
Co-authored-by: Yuichiro Kaneko <[email protected]>
2024-02-19 | [Bug #20280] Raise SyntaxError on invalid encoding symbol | Nobuyoshi Nakada
2024-02-13 | Specialize String#byteslice(a, b) (#9939) | Aaron Patterson
* Specialize String#byteslice(a, b)

  This adds a specialization for String#byteslice when there are two parameters.

  This makes our protobuf parser go from 5.84x slower to 5.33x slower

```
Comparison:
decode upstream (53738 bytes): 7228.5 i/s
decode protobuff (53738 bytes): 1236.8 i/s - 5.84x slower

Comparison:
decode upstream (53738 bytes): 7024.8 i/s
decode protobuff (53738 bytes): 1318.5 i/s - 5.33x slower
```

* Update yjit/src/codegen.rs

Co-authored-by: Maxime Chevalier-Boisvert <[email protected]>
2024-02-05 | Make io_fwrite safe for compaction | Peter Zhu
[Bug #20169]

Embedded strings are not safe for system calls without the GVL because compaction can cause pages to be locked, causing the operation to fail with EFAULT. This commit changes io_fwrite to use rb_str_tmp_frozen_no_embed_acquire, which guarantees that the returned string is not embedded.
2023-12-01 | Pin embedded shared strings | Peter Zhu
Embedded shared strings cannot be moved because strings point into the slot of the shared string. There may be code using the RSTRING_PTR on the stack, which would pin the string but not pin the shared string, causing it to move.
2023-09-01 | Use end of char boundary in start_with? | John Hawthorn
Previously we used the next character following the found prefix to determine if the match ended on a broken character. This had caused surprising behaviour when a valid character was followed by a UTF-8 continuation byte. This commit changes the behaviour to instead look for the end of the last character in the prefix. [Bug #19784] Co-authored-by: ywenc <[email protected]> Co-authored-by: Nobuyoshi Nakada <[email protected]> Notes: Merged: https://github.com/ruby/ruby/pull/8348
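One way to picture "look for the end of the last character in the prefix" is the following UTF-8-only sketch. It is an assumption-laden illustration, not Ruby's encoding-aware code: the `toy_*` helpers are hypothetical, stray continuation bytes are treated as one-byte broken characters, and characters truncated by the end of the string are not handled specially.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool toy_is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Length a UTF-8 character claims from its first byte; stray or invalid
   bytes count as a one-byte broken character. */
static size_t toy_utf8_len(unsigned char b) {
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

/* True when the character containing byte pos-1 ends at or before pos,
   i.e. cutting the string at pos does not split a character. */
static bool toy_ends_on_char_boundary(const unsigned char *s, size_t len, size_t pos) {
    if (pos == 0) return true;
    if (pos > len) return false;
    size_t start = pos - 1, back = 0;
    /* Walk back over at most three continuation bytes to find the lead byte. */
    while (start > 0 && back < 3 && toy_is_continuation(s[start])) {
        start--;
        back++;
    }
    return start + toy_utf8_len(s[start]) <= pos;
}

/* Byte-wise prefix match that also requires the prefix to end on a boundary. */
static bool toy_start_with(const char *str, size_t len, const char *prefix, size_t plen) {
    if (plen > len || memcmp(str, prefix, plen) != 0) return false;
    return toy_ends_on_char_boundary((const unsigned char *)str, len, plen);
}
```

The check only inspects bytes inside the matched prefix, so whatever follows the prefix (for example a stray continuation byte after a valid character) cannot change the answer.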
2023-08-26 | Introduce `at_char_boundary` function | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/8296
2023-03-06 | Stop exporting symbols for MJIT | Takashi Kokubun
Notes: Merged: https://github.com/ruby/ruby/pull/7459
2022-11-02 | Use shared flags of the type | Peter Zhu
The ELTS_SHARED flag is generic, so we should prefer to use the flags specific to the type (STR_SHARED for strings and RARRAY_SHARED_FLAG for arrays).
2022-09-23 | Revert "Revert "error.c: Let Exception#inspect inspect its message"" | Yusuke Endoh
This reverts commit b9f030954a8a1572032f3548b39c5b8ac35792ce. [Bug #18170]
2022-08-31 | [Bug #18973] Promote US-ASCII to ASCII-8BIT when adding 8-bit char | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/6306
2022-06-13 | Move String RVALUES between pools | Matt Valentine-House
And re-embed any strings that can now fit inside the slot they've been moved to Notes: Merged: https://github.com/ruby/ruby/pull/5986
2022-06-07 | Revert "error.c: Let Exception#inspect inspect its message" | Yusuke Endoh
This reverts commit 9d927204e7b86eb00bfd07a060a6383139edf741. Notes: Merged: https://github.com/ruby/ruby/pull/5981
2022-06-07 | error.c: Let Exception#inspect inspect its message | Yusuke Endoh
... only when the message string has a newline. `p StandardError.new("foo\nbar")` now prints `#<StandardError: "foo\nbar">` instead of: #<StandardError: bar> [Bug #18170] Notes: Merged: https://github.com/ruby/ruby/pull/4857
2021-10-20 | Add comments about special runtime routines YJIT calls | Alan Wu
When YJIT makes calls to routines without reconstructing interpreter state through jit_prepare_routine_call(), it relies on the routine to never allocate, raise, or push/pop control frames. Comment about this on the routines that YJIT calls.

This is probably something we should dynamically verify on debug builds. It's hard to statically verify this as it requires verifying all functions in the call tree. Maybe something to look at in the future.
2021-10-01 | Skip broken strings as the locale encoding | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/4915
2021-09-10 | internal/*.h: skip doxygen | 卜部昌平
These contents are purely implementation details, not worth appearing in CAPI documents. [ci skip] Notes: Merged: https://github.com/ruby/ruby/pull/4815
2021-07-11 | Move rb_str_escape function declaration | S-H-GAMELINKS
Notes: Merged: https://github.com/ruby/ruby/pull/4607
2021-06-01 | Remove unneeded rb_str_initialize defination in internal/string.h (#4465) | S.H
Notes: Merged-By: k0kubun <[email protected]>
2020-12-07 | tuning trial: newobj with current ec | Koichi Sasada
Passing current ec can improve performance of newobj. This patch tries it for Array and String literals ([] and ''). Notes: Merged: https://github.com/ruby/ruby/pull/3842
2020-12-01 | should not use rb_str_modify(), too | Koichi Sasada
Same as 8247b8edde, should not use rb_str_modify() here. https://bugs.ruby-lang.org/issues/17343#change-88858 Notes: Merged: https://github.com/ruby/ruby/pull/3833
2020-05-11 | sed -i 's|ruby/impl|ruby/internal|' | 卜部昌平
To fix build failures. Notes: Merged: https://github.com/ruby/ruby/pull/3079
2020-05-11 | sed -i s|ruby/3|ruby/impl|g | 卜部昌平
This shall fix compile errors. Notes: Merged: https://github.com/ruby/ruby/pull/3079
2020-04-13 | add #include guard hack | 卜部昌平
According to the MSVC manual (*1), cl.exe can skip including a header file when that file:

- contains #pragma once, or
- starts with #ifndef, or
- starts with #if ! defined.

GCC has a similar trick (*2), but it is stricter (e.g. there must be _no tokens_ outside of #ifndef...#endif).

Sun C lacked #pragma once for a looong time. Oracle Developer Studio 12.5 finally implemented it, but we cannot assume such a recent version.

This changeset modifies header files so that each of them includes exactly one #ifndef...#endif. I believe this is the most portable way to trigger compiler optimizations. [Bug #16770]

*1: https://docs.microsoft.com/en-us/cpp/preprocessor/once
*2: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html

Notes: Merged: https://github.com/ruby/ruby/pull/3023
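A header following the shape described above might look like the sketch below. The guard macro `TOY_STRING_UTIL_H` and the function `toy_min_size` are hypothetical names chosen for illustration; the point is only that the file's tokens all sit between a single `#ifndef` and its `#endif` (comments are not tokens, so they may appear outside).

```c
#ifndef TOY_STRING_UTIL_H            /* first directive in the file */
#define TOY_STRING_UTIL_H

#include <stddef.h>

/* Every declaration lives inside the one guard. */
static inline size_t toy_min_size(size_t a, size_t b) {
    return a < b ? a : b;
}

#endif /* TOY_STRING_UTIL_H */       /* nothing but comments after this */
```

With this shape a preprocessor that recognises guard macros can skip reopening the file on a second `#include`, without relying on `#pragma once`.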
2020-04-08 | Merge pull request #2991 from shyouhei/ruby.h | 卜部昌平
Split ruby.h Notes: Merged-By: shyouhei <[email protected]>
2019-12-26 | decouple internal.h headers | 卜部昌平
Saves committers' daily life by avoiding #include-ing everything from internal.h, making each file do so instead. This would significantly speed up incremental builds.

We take the following inclusion order in this changeset:

1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very first thing among everything).
2. RUBY_EXTCONF_H if any.
3. Standard C headers, sorted alphabetically.
4. Other system headers, maybe guarded by #ifdef
5. Everything else, sorted alphabetically.

Exceptions are those win32-related headers, which tend not to be self-contained (headers have inclusion order dependencies).

Notes: Merged: https://github.com/ruby/ruby/pull/2711
2019-12-26 | internal/string.h rework | 卜部昌平
Reduced the number of macros defined in the file. Also made it explicit for MJIT_FUNC_EXPORTED functions to be so. Notes: Merged: https://github.com/ruby/ruby/pull/2711
2019-12-26 | internal/proc.h rework | 卜部昌平
Annotated MJIT_FUNC_EXPORTED functions as such. Declaration of rb_sym_to_proc is moved into this file because the function is defined in proc.c rather than string.c. Notes: Merged: https://github.com/ruby/ruby/pull/2711
2019-12-26 | split internal.h into files | 卜部昌平
One day, I could not resist the way it was written. I finally started to make the code clean. This changeset is the beginning of a series of housekeeping commits. It is a simple refactoring; split internal.h into files, so that we can divide and conquer in the upcoming commits. No lines of code are either added or removed, except the obvious file headers/footers. The generated binary is identical to the one before. Notes: Merged: https://github.com/ruby/ruby/pull/2711