From: headius@...
Date: 2020-07-01T17:26:32+00:00
Subject: [ruby-core:99015] [Ruby master Feature#17002] Extend heap pages to exactly 16KiB

Issue #17002 has been updated by headius (Charles Nutter).

I chatted with @tenderlovemaking a bit about this.

I did some research as well, and my reading of this "extra" metadata from malloc seems to indicate that you really should not make assumptions about it being in a specific place, or a specific size, or on a specific page. It is a **malloc-internal** detail, and presumably a good implementation would not dirty a whole new page just to do this bookkeeping. Let malloc do malloc.

It seems clear from this issue that at least one of these assumptions (where the metadata goes) is not always correct. To me, that's as bad as never being correct.

@tenderlovemaking had already discovered this when I made the same suggestion based on my independent research, and we both came to the same conclusion.

I think this is the right change to make.

----------------------------------------
Feature #17002: Extend heap pages to exactly 16KiB
https://bugs.ruby-lang.org/issues/17002#change-86390

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Priority: Normal
----------------------------------------
Hi,

I would like to extend heap pages to be exactly 16KiB. Currently, `struct heap_page_body` is 16KiB - `(sizeof(size_t) * 5)`. Before I list the reasons I want to change, there are two important facts I want to list.

First, OS pages are 4KiB on platforms I tested (macOS, Ubuntu, Windows).

Second, when the GC allocates pages, it first allocates `struct heap_page_body` immediately followed by `struct heap_page`:

https://github.com/ruby/ruby/blob/289a28e68f30e879760fd000833b512d506a0805/gc.c#L1756-L1767

I want to make this change for a few reasons:

1. I would like `struct heap_page_body` to be a multiple of OS pages so that we can use `mprotect` on it (I want to implement read barriers on heap pages with `mprotect`, so this is my selfish reason)
2. Some allocators (specifically glibc) will put `struct heap_page` on the same OS page as `struct heap_page_body`. `struct heap_page` is frequently modified, so that OS page (including Ruby objects) will be copied. Extending `struct heap_page_body` to 16KiB can help prevent CoW faults. (see Note 1)
3. Allocating 16KiB can reduce overall memory consumption. Some allocators (specifically jemalloc) will round requested chunks to bin sizes. jemalloc has a 16KiB bin size, so our request for `16KiB - (sizeof(size_t) * 5)` is rounded up to 16KiB anyway, and `(sizeof(size_t) * 5)` is wasted. `(sizeof(size_t) * 5)` is enough room to fit one more Ruby object, so if we use that space for one more object, then we don't need to allocate as many pages, and memory usage can actually decrease.

My hypothesis is that this patch will either not change overall memory usage, or decrease overall memory usage. But in either case it will allow us to use `mprotect`, and improve CoW.

Tests
===

I tested this patch on an Ubuntu machine with jemalloc and glibc. Here is my system information:

Linux version:

```
aaron@whiteclaw ~> uname -a
Linux whiteclaw 5.4.0-37-generic #41-Ubuntu SMP Wed Jun 3 18:57:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
aaron@whiteclaw ~> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal
```

GLIBC version:

```
aaron@whiteclaw ~> ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
```

jemalloc version:

```
aaron@whiteclaw ~/git> apt list --installed | grep jemalloc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libjemalloc-dev/focal,now 5.2.1-1ubuntu1 amd64 [installed]
libjemalloc2/focal,now 5.2.1-1ubuntu1 amd64 [installed,automatic]
```

To test memory usage, I used this tool:

https://github.com/bpowers/mstat

`mstat` is a sampling profiler that will report memory usage over time. I generated RDoc for Ruby and took samples while documentation was generated. Here is the Ruby command I used:

```
./ruby --disable-gems "./libexec/rdoc" --root "." --encoding=UTF-8 --all --ri --op ".ext/rdoc" --page-dir "./doc" --no-force-update "."
```

I made 50 samples for each allocator and branch like this:

```
for x in (seq 50)
  sudo rm -rf .ext/rdoc; sudo ../src/mstat/mstat -o glibc-branch_$x.tsv ./ruby --disable-gems "./libexec/rdoc" --root "." --encoding=UTF-8 --all --ri --op ".ext/rdoc" --page-dir "./doc" --no-force-update "."
end
```

In other words I made 200 samples total (50 jemalloc + master, 50 jemalloc + branch, 50 glibc + master, 50 glibc + branch).

glibc
====

Here is a comparison of glibc over time (lower is better):

![glibc changes](https://user-images.githubusercontent.com/3124/86180732-947f4c80-bae1-11ea-8388-0d1ab121a270.png)

From this graph it looks like glibc is mostly the same, but sometimes lower. It looks like there are some outlier samples that go higher. I made a box plot to compare maximum RSS:

![glibc max boxplot](https://user-images.githubusercontent.com/3124/86180925-ef18a880-bae1-11ea-978a-e45a0b3d116f.png)

The box plot shows the max RSS is usually lower with some outliers that are higher.
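
Before turning to the jemalloc results, the size arithmetic behind reasons 1 and 3 can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the function names are hypothetical, `size_t` is taken as 8 bytes (a typical 64-bit build), the 4KiB OS page and 16KiB bin are the figures quoted in this issue, and `round_up_to_bin` is a simplified stand-in for an allocator's size-class rounding, not jemalloc's actual logic.

```python
KIB = 1024
HEAP_PAGE_TARGET = 16 * KIB  # proposed size of struct heap_page_body (from the issue)
OS_PAGE_SIZE = 4 * KIB       # OS page size reported for the tested platforms
HEADROOM_WORDS = 5           # the (sizeof(size_t) * 5) currently subtracted


def current_body_size(size_t_bytes=8):
    """Size of heap_page_body before the patch; size_t_bytes=8 is an assumption."""
    return HEAP_PAGE_TARGET - size_t_bytes * HEADROOM_WORDS


def is_os_page_multiple(nbytes, os_page=OS_PAGE_SIZE):
    """True if nbytes is a whole number of OS pages (the property reason 1 asks for)."""
    return nbytes % os_page == 0


def round_up_to_bin(nbytes, bin_size=HEAP_PAGE_TARGET):
    """Round a request up to a multiple of bin_size (simplified, hypothetical model)."""
    return -(-nbytes // bin_size) * bin_size


def wasted_bytes(nbytes, bin_size=HEAP_PAGE_TARGET):
    """Bytes handed out by the allocator model but not usable by the request."""
    return round_up_to_bin(nbytes, bin_size) - nbytes


if __name__ == "__main__":
    for label, size in (("current", current_body_size()), ("proposed", HEAP_PAGE_TARGET)):
        print(label, size, is_os_page_multiple(size), wasted_bytes(size))
```

Under these assumptions the current body is 16,344 bytes, which is not a multiple of 4,096, and rounding it up to a 16KiB bin leaves 40 bytes unused. The proposed 16,384-byte body is exactly four OS pages and leaves nothing unused in this model.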
jemalloc
====

Here is a comparison of jemalloc over time (lower is better):

![jemalloc over time](https://user-images.githubusercontent.com/3124/86181071-31da8080-bae2-11ea-9e9e-3c8dd868be4d.png)

According to this graph jemalloc is usually lower. I made another box plot to compare maximum RSS on jemalloc:

![jemalloc max RSS](https://user-images.githubusercontent.com/3124/86181149-5b93a780-bae2-11ea-88c8-ed17d34ab758.png)

The box plot shows that max RSS is typically lower on jemalloc.

CoW Performance
====

I didn't find a good way to measure CoW performance, but I don't think this patch could degrade it.

Summary
===

I would like to merge this patch because there are a few good points (ability to use `mprotect`, memory savings, possible CoW improvements), and I can't find any downsides.

Thanks John Hawthorn for helping me get the math right on the "end pointer" part.

Note 1: I was able to prove that `struct heap_page` will exist on the same OS page as `struct heap_page_body` here: https://github.com/ruby/ruby/pull/3253/commits/33390d15e7a6f803823efcb41205167c8b126fbb

---Files--------------------------------
0001-Expand-heap-pages-to-be-exactly-16kb.patch (4.51 KB)

--
https://bugs.ruby-lang.org/