1477622 - Add microbenchmarks measuring hash table performance

Assignee

Description

•

7 years ago

Hash tables are used a lot, and we have a bunch of different implementations. Microbenchmarks will give us a sense of how they compare, performance-wise.

Comment hidden (mozreview-request)

Nicholas Nethercote [inactive]

Assignee

Updated

•

7 years ago

Blocks: 1477626

Nicholas Nethercote [inactive]

Assignee

Updated

•

7 years ago

Blocks: 1477628

Nicholas Nethercote [inactive]

Assignee

Comment 2

•

7 years ago

Results from an opt build on my fast Linux box: > [ RUN ] BenchCollections.PLDHash > > succ_lookups fail_lookups insert_remove iterate > 39.0 ms 35.3 ms 16.9 ms 23.9 ms > 38.2 ms 35.2 ms 15.7 ms 24.4 ms > 38.4 ms 36.1 ms 16.5 ms 25.4 ms > 39.3 ms 36.2 ms 16.1 ms 24.6 ms > 39.9 ms 36.3 ms 16.0 ms 25.6 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 116225, "replicates": [115107,113632,116428,116225,117817], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.PLDHash (579 ms) > [ RUN ] BenchCollections.nsDataHash > > succ_lookups fail_lookups insert_remove iterate > 40.6 ms 40.1 ms 19.6 ms 24.9 ms > 39.2 ms 38.0 ms 18.7 ms 25.0 ms > 38.8 ms 38.0 ms 19.4 ms 25.3 ms > 38.6 ms 38.8 ms 18.7 ms 24.9 ms > 38.6 ms 37.8 ms 18.6 ms 25.0 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 121068, "replicates": [125224,120960,121527,121068,119899], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.nsDataHash (609 ms) > [ RUN ] BenchCollections.JSHash > > succ_lookups fail_lookups insert_remove iterate > 17.9 ms 22.0 ms 12.8 ms 6.8 ms > 18.2 ms 21.1 ms 13.0 ms 7.2 ms > 18.1 ms 21.5 ms 12.8 ms 7.0 ms > 17.3 ms 22.0 ms 13.0 ms 7.4 ms > 17.4 ms 21.8 ms 12.4 ms 7.2 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 59439, "replicates": [59474,59439,59405,59711,58833], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.JSHash (297 ms) > [ RUN ] BenchCollections.RustHash > > succ_lookups fail_lookups insert_remove iterate > 847.6 ms 830.6 ms 127.1 ms 107.7 ms > 865.8 ms 837.4 ms 128.4 ms 108.3 ms > 870.3 ms 819.4 ms 127.1 ms 107.8 ms > 1006.6 ms 853.4 ms 128.0 ms 115.3 ms > 884.0 ms 831.1 ms 126.5 ms 106.8 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 1939955, "replicates": [1912975,1939955,1924661,2103302,1948463], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustHash (9830 ms) > [ RUN ] BenchCollections.RustFnvHash > > succ_lookups fail_lookups insert_remove iterate > 701.5 ms 679.4 ms 118.9 ms 106.0 ms > 737.3 ms 784.3 ms 121.3 ms 105.8 ms > 703.2 ms 673.9 ms 119.4 ms 105.6 ms > 703.7 ms 676.4 ms 119.4 ms 108.1 ms > 708.8 ms 684.0 ms 121.9 ms 110.7 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 1607643, "replicates": [1605724,1748703,1602168,1607643,1625358], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFnvHash (8189 ms) > [ RUN ] BenchCollections.RustFxHash > > succ_lookups fail_lookups insert_remove iterate > 468.8 ms 334.1 ms 55.7 ms 118.1 ms > 384.4 ms 292.0 ms 54.6 ms 116.4 ms > 447.6 ms 348.5 ms 60.1 ms 125.5 ms > 459.9 ms 292.1 ms 54.9 ms 115.9 ms > 387.6 ms 296.2 ms 54.9 ms 117.1 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 922830, "replicates": [976660,847416,981651,922830,855663], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFxHash (4584 ms) Results from a release (i.e. --enable-release) build on my fast Linux box: > [ RUN ] BenchCollections.PLDHash > > succ_lookups fail_lookups insert_remove iterate > 43.2 ms 52.4 ms 21.0 ms 33.5 ms > 41.6 ms 52.2 ms 19.9 ms 33.7 ms > 41.3 ms 51.4 ms 19.3 ms 33.5 ms > 41.8 ms 51.1 ms 19.8 ms 34.2 ms > 41.3 ms 51.3 ms 19.3 ms 34.0 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 146968, "replicates": [150070,147343,145430,146968,145905], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.PLDHash (736 ms) > [ RUN ] BenchCollections.nsDataHash > > succ_lookups fail_lookups insert_remove iterate > 43.2 ms 55.6 ms 19.7 ms 34.0 ms > 43.6 ms 56.9 ms 19.6 ms 34.6 ms > 45.5 ms 55.6 ms 19.6 ms 33.8 ms > 43.1 ms 54.9 ms 19.1 ms 33.6 ms > 43.0 ms 55.2 ms 18.8 ms 34.4 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 152587, "replicates": [152587,154664,154490,150807,151499], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.nsDataHash (764 ms) > [ RUN ] BenchCollections.JSHash > > succ_lookups fail_lookups insert_remove iterate > 15.2 ms 17.8 ms 12.2 ms 5.5 ms > 15.5 ms 18.5 ms 11.7 ms 5.7 ms > 14.9 ms 18.5 ms 11.9 ms 5.4 ms > 15.3 ms 17.7 ms 11.9 ms 5.3 ms > 14.7 ms 18.3 ms 11.6 ms 5.5 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 50608, "replicates": [50650,51419,50608,50147,50102], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.JSHash (253 ms) > [ RUN ] BenchCollections.RustHash > > succ_lookups fail_lookups insert_remove iterate > 71.1 ms 46.7 ms 14.5 ms 8.2 ms > 66.0 ms 47.1 ms 15.0 ms 8.1 ms > 62.3 ms 46.4 ms 15.2 ms 8.1 ms > 66.4 ms 46.4 ms 14.6 ms 8.2 ms > 66.8 ms 46.9 ms 14.7 ms 8.1 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 136219, "replicates": [140511,136219,132036,135527,136441], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustHash (681 ms) > [ RUN ] BenchCollections.RustFnvHash > > succ_lookups fail_lookups insert_remove iterate > 55.7 ms 32.8 ms 12.8 ms 8.2 ms > 55.2 ms 33.5 ms 12.8 ms 8.2 ms > 54.0 ms 33.1 ms 12.8 ms 8.2 ms > 54.3 ms 32.4 ms 12.9 ms 8.1 ms > 55.0 ms 33.4 ms 12.9 ms 8.6 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 109523, "replicates": [109523,109718,108079,107759,109966], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFnvHash (545 ms) > [ RUN ] BenchCollections.RustFxHash > > succ_lookups fail_lookups insert_remove iterate > 9.2 ms 7.4 ms 5.2 ms 7.8 ms > 8.9 ms 7.4 ms 5.0 ms 7.9 ms > 8.8 ms 7.5 ms 5.0 ms 8.4 ms > 8.6 ms 7.6 ms 5.1 ms 7.9 ms > 8.6 ms 7.8 ms 5.0 ms 7.9 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 29352, "replicates": [29609,29286,29733,29298,29352], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFxHash (148 ms) Things of note when comparing the non-release build with the release build: - PLDHash and nsDataHash are noticeably slower with --enable-release. - JSHash is noticeably faster with --enable-release. - The Rust ones are massively (10--20x) faster with --enable-release. (The non-release Rust results are quite similar to a debug build.) I'm not sure what to make of that. Things of note within the release build results: - nsDataHash is the same or slightly slower than PLDHash. This makes sense, because it's layered on top of PLDHash. - JSHash is much faster than PLDHash/nsDataHash: about 2--6x faster, depending on the benchmark. They use almost identical double hashing algorithms, so the most likely reason for this is that PLDHash has "virtual ops" that require function calls, while JSHash uses templates and so can inline those calls. - Rust hash tables with default hashing are slower than PLDHash for succ_lookups but faster for the other benchmarks. Fnv hashing speeds things up a bit. Fx hashing speeds things up a *lot*. - FxHash is the fastest overall, and JSHash is second fastest. The others are significantly slower. Conclusions: - We should use JSHash instead of PLDHash/nsDataHash for hot tables such as CCGraph::mPtrToNodeMap (bug 1477627). This will require moving JSHash into MFBT (bug 1477626), as was done with js::Vector some time ago. Then hot PLDHash/nsDataHash instances can be converted to JSHash. (As well as being faster, JSHash also has a much nicer API.) - We should use FxHash instead of FnvHash within Rust code (bug 1477628). Unlike PLDHash-to-JSHash conversions, which must be done one at a time and only when it makes sense, FnvHash-to-FxHash conversions are trivial and can be done en masse. (BTW, this very change was a win for rustc back in 2016: https://github.com/rust-lang/rust/pull/37229). - Robin hood hashing may explain why FxHash is faster than JSHash. But it looks like there are some constant factors (e.g. avoiding virtual op calls) that can be fixed more easily and give bigger wins than changing the algorithm.

Nicholas Nethercote [inactive]

Assignee

Updated

•

7 years ago

Blocks: 1477632

Nathan Froyd [:froydnj]

Comment 3

•

7 years ago

mozreview-review

Comment on attachment 8994104 [details] Bug 1477622 - Add microbenchmarks measuring hash table performance. https://reviewboard.mozilla.org/r/258730/#review265684 Thanks. ::: xpcom/rust/gtest/bench-collections/Bench.cpp:261 (Diff revision 1) > +MOZ_GTEST_BENCH_F(BenchCollections, PLDHash, [this] { > + BenchImpl(Bench_Cpp_PLDHashTable); > +}); Please add a benchmark for `std::unordered_map`, so we have some data to point to if somebody comes along and asks why we're not using the `std::` data structures. (Or we have data indicating we should switch, but I expect that's unlikely.) Bonus points if you test it with the default hash and also whatever `mozilla::Hash` does. ::: xpcom/rust/gtest/bench-collections/Bench.cpp:265 (Diff revision 1) > +MOZ_GTEST_BENCH_F(BenchCollections, nsDataHash, [this] { > + BenchImpl(Bench_Cpp_nsDataHashtable); > +}); Why is this benchmark valuable compared to the `PLDHash` benchmark? ::: xpcom/rust/gtest/bench-collections/bench.rs:51 (Diff revision 1) > +fn Bench_Rust<H: std::hash::BuildHasher>(mut hs: HashSet<*const c_void, H>, params: *const Params, > + vals: *const *const c_void, len: usize) { I'm assuming that we're relying on the `assert!()` bits to prevent `rustc` from optimizing away the body of this function after inlining? Should we note that? Or should we use `black_box` (or whatever it's called) to achieve the same thing without the extra conditional branches?

Attachment #8994104 - Flags: review?(nfroyd) → review+

Nicholas Nethercote [inactive]

Assignee

Comment 4

•

7 years ago

> Please add a benchmark for `std::unordered_map` Good suggestion. Here are the results with the default hash function: > succ_lookups fail_lookups insert_remove iterate > 44.5 ms 51.1 ms 21.9 ms 5.0 ms > 43.7 ms 50.9 ms 21.5 ms 5.0 ms > 43.8 ms 51.3 ms 21.4 ms 4.9 ms > 43.7 ms 51.3 ms 21.6 ms 5.0 ms > 43.2 ms 50.9 ms 21.6 ms 5.0 ms Very similar to PLDHash, except that iteration over unordered_set is much faster: indeed, faster than any other hash table. > > + BenchImpl(Bench_Cpp_nsDataHashtable); > > Why is this benchmark valuable compared to the `PLDHash` benchmark? I was curious to see if the cost of the layering was significant, and the answer is "slightly". I'll remove it. > I'm assuming that we're relying on the `assert!()` bits to prevent `rustc` > from optimizing away the body of this function after inlining? Should we > note that? Or should we use `black_box` (or whatever it's called) to > achieve the same thing without the extra conditional branches? I used `assert!` because it seemed the most equivalent thing to `MOZ_RELEASE_ASSERT`. I'll add a note that the cross-language comparisons are less reliable than the intra-language comparisons.

Nicholas Nethercote [inactive]

Assignee

Comment 5

•

7 years ago

Here are Linux opt 64 measurements from a try push: > TEST-START | BenchCollections.unordered_set > succ_lookups fail_lookups insert_remove iterate total > 105.0 ms 118.6 ms 35.6 ms 7.3 ms 266.5 ms > 100.5 ms 119.3 ms 37.3 ms 7.3 ms 264.4 ms > 100.7 ms 119.0 ms 38.3 ms 7.3 ms 265.3 ms > 100.7 ms 118.7 ms 38.0 ms 7.3 ms 264.7 ms > 100.7 ms 119.4 ms 37.7 ms 7.3 ms 265.1 ms > TEST-START | BenchCollections.PLDHash > succ_lookups fail_lookups insert_remove iterate total > 75.1 ms 63.6 ms 29.6 ms 58.2 ms 226.5 ms > 74.6 ms 63.9 ms 29.6 ms 58.2 ms 226.3 ms > 74.5 ms 64.5 ms 29.3 ms 59.9 ms 228.2 ms > 74.7 ms 65.2 ms 29.4 ms 58.6 ms 228.0 ms > 74.8 ms 64.1 ms 29.3 ms 58.1 ms 226.2 ms > TEST-START | BenchCollections.JSHash > succ_lookups fail_lookups insert_remove iterate total > 29.0 ms 29.8 ms 17.7 ms 14.4 ms 91.0 ms > 28.5 ms 30.0 ms 17.6 ms 14.3 ms 90.4 ms > 28.6 ms 30.1 ms 17.6 ms 14.3 ms 90.6 ms > 28.3 ms 29.7 ms 17.7 ms 13.9 ms 89.6 ms > 28.1 ms 29.9 ms 17.6 ms 14.0 ms 89.6 ms > TEST-START | BenchCollections.RustHash > succ_lookups fail_lookups insert_remove iterate total > 93.4 ms 79.4 ms 21.7 ms 13.2 ms 207.6 ms > 93.2 ms 80.9 ms 21.3 ms 13.1 ms 208.5 ms > 94.4 ms 78.8 ms 21.4 ms 12.8 ms 207.5 ms > 95.6 ms 80.1 ms 21.5 ms 12.8 ms 210.0 ms > 94.1 ms 80.4 ms 22.3 ms 13.3 ms 210.0 ms > TEST-START | BenchCollections.RustFnvHash > succ_lookups fail_lookups insert_remove iterate total > 88.1 ms 57.1 ms 20.9 ms 13.1 ms 179.2 ms > 87.9 ms 56.6 ms 20.3 ms 13.1 ms 177.8 ms > 87.7 ms 56.9 ms 20.1 ms 13.3 ms 177.9 ms > 87.8 ms 56.6 ms 20.1 ms 13.2 ms 177.6 ms > 87.6 ms 56.8 ms 20.2 ms 13.2 ms 177.7 ms > TEST-START | BenchCollections.RustFxHash > succ_lookups fail_lookups insert_remove iterate total > 14.5 ms 10.8 ms 8.7 ms 13.0 ms 47.1 ms > 14.0 ms 10.8 ms 8.8 ms 13.0 ms 46.6 ms > 13.9 ms 10.8 ms 8.7 ms 12.9 ms 46.3 ms > 14.0 ms 10.8 ms 8.6 ms 13.0 ms 46.5 ms > 13.9 ms 10.8 ms 8.6 ms 12.9 ms 46.3 ms Roughly similar to what I get locally (albeit uniformly slower) except that unordered_set is clearly worse than PLDHash.

Pulsebot

Comment 6

•

7 years ago

Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/1ea12424829b Add microbenchmarks measuring hash table performance. r=froydnj

Raul Gurzau (:RaulG)

Comment 7

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1ea12424829b

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox63: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla63

Jeff Muizelaar [:jrmuizel]

Comment 8

•

7 years ago

It would probably be good to include the size of the hash table, if possible, in these benchmarks.

Mike Hommey [:glandium]

Comment 9

•

7 years ago

Relatedly, I found this in my RSS feed this morning: https://hpjansson.org/blag/2018/07/24/a-hash-table-re-hash/

Bugzilla

Add microbenchmarks measuring hash table performance

Categories

(Core :: XPCOM, enhancement)

Tracking

()

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Updated

Updated

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Attachment

General

Description

File Name

Content Type