From: "martinemde (Martin Emde) via ruby-core"
Date: 2024-05-14T21:51:09+00:00
Subject: [ruby-core:117881] [Ruby master Bug#20424] ZLib::GZipReader always double allocates strings when passed outbuf, significantly increasing memory usage

Issue #20424 has been updated by martinemde (Martin Emde).

[zlib #61 was merged](https://github.com/ruby/zlib/pull/61#event-12808578072). It seems like we can consider this ticket complete.

----------------------------------------
Bug #20424: ZLib::GZipReader always double allocates strings when passed outbuf, significantly increasing memory usage
https://bugs.ruby-lang.org/issues/20424#change-108296

* Author: martinemde (Martin Emde)
* Status: Open
* ruby -v: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
While trying to improve memory usage during rubygems gem installs, we previously found a bug in `eof?`. Further investigation of memory usage while fixing that bug uncovered wasteful string allocation in `readpartial` and `read`.

In Zlib, reading with `readpartial` or `read` always creates a new string for the bytes read from the buffer. The current approach allocates that string whether or not an outbuf is passed.

```
# vastly simplified pseudo implementation
def readpartial(len, dst = nil)
  if buffer.empty?
    buffer = gzipfile.readpartial(len, dst) # adds inflated bytes into dst if passed
  end
  dst = allocate_new_string(len) # make a new string for the destination
  dst << buffer.read(len)        # read from the buffer into the destination
end
```

As a result, `readpartial` always allocates at least double the bytes necessary.
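For context, here is a minimal sketch of the caller-side pattern affected by this: streaming a gzip payload through `Zlib::GzipReader#readpartial` while passing the same string as outbuf on every call. The helper name `gunzip_with_reused_buffer` and the 4096-byte chunk size are assumptions made for illustration only; they do not come from zlib or rubygems.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of zlib): inflate gzip data chunk by chunk,
# handing the same String to readpartial as outbuf on every call.
def gunzip_with_reused_buffer(gzipped, chunk_size = 4096)
  reader = Zlib::GzipReader.new(StringIO.new(gzipped))
  outbuf = String.new(capacity: chunk_size) # allocated once by the caller
  result = String.new
  reused = true

  begin
    loop do
      chunk = reader.readpartial(chunk_size, outbuf)
      reused &&= chunk.equal?(outbuf) # is the same object handed back?
      result << chunk                 # copy out before the next call overwrites outbuf
    end
  rescue EOFError
    # readpartial signals the end of the stream by raising EOFError
  ensure
    reader.close
  end

  [result, reused]
end

payload = "hello gzip " * 10_000
inflated, reused = gunzip_with_reused_buffer(Zlib.gzip(payload))
puts inflated.b == payload.b
puts reused
```

The caller allocates only one destination string here, so any additional string created inside `readpartial` for each chunk is the overhead this ticket describes.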
Samuel Giddins submitted, and I have tested and reviewed, [a pull request, zlib#61](https://github.com/ruby/zlib/pull/61) that resolves the issue. It vastly improves memory usage and speeds up GzipReader by avoiding unnecessary memcpy and rb_str_new calls.

This PR also adds an outbuf argument to `GzipReader#read` for improved memory management, very similar to [IO#read](https://ruby-doc.org/core-2.5.1/IO.html#method-i-read).

We appreciate your attention to this performance improvement. We believe it will further improve the performance of rubygems gem installs.

--
https://bugs.ruby-lang.org/