Closed Bug 1477622 Opened 7 years ago Closed 7 years ago

Add microbenchmarks measuring hash table performance

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(1 file)

Hash tables are used a lot, and we have a bunch of different implementations. Microbenchmarks will give us a sense of how they compare, performance-wise.
Blocks: 1477626
Blocks: 1477628
Results from an opt build on my fast Linux box: > [ RUN ] BenchCollections.PLDHash > > succ_lookups fail_lookups insert_remove iterate > 39.0 ms 35.3 ms 16.9 ms 23.9 ms > 38.2 ms 35.2 ms 15.7 ms 24.4 ms > 38.4 ms 36.1 ms 16.5 ms 25.4 ms > 39.3 ms 36.2 ms 16.1 ms 24.6 ms > 39.9 ms 36.3 ms 16.0 ms 25.6 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 116225, "replicates": [115107,113632,116428,116225,117817], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.PLDHash (579 ms) > [ RUN ] BenchCollections.nsDataHash > > succ_lookups fail_lookups insert_remove iterate > 40.6 ms 40.1 ms 19.6 ms 24.9 ms > 39.2 ms 38.0 ms 18.7 ms 25.0 ms > 38.8 ms 38.0 ms 19.4 ms 25.3 ms > 38.6 ms 38.8 ms 18.7 ms 24.9 ms > 38.6 ms 37.8 ms 18.6 ms 25.0 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 121068, "replicates": [125224,120960,121527,121068,119899], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.nsDataHash (609 ms) > [ RUN ] BenchCollections.JSHash > > succ_lookups fail_lookups insert_remove iterate > 17.9 ms 22.0 ms 12.8 ms 6.8 ms > 18.2 ms 21.1 ms 13.0 ms 7.2 ms > 18.1 ms 21.5 ms 12.8 ms 7.0 ms > 17.3 ms 22.0 ms 13.0 ms 7.4 ms > 17.4 ms 21.8 ms 12.4 ms 7.2 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 59439, "replicates": [59474,59439,59405,59711,58833], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.JSHash (297 ms) > [ RUN ] BenchCollections.RustHash > > succ_lookups fail_lookups insert_remove iterate > 847.6 ms 830.6 ms 127.1 ms 107.7 ms > 865.8 ms 837.4 ms 128.4 ms 108.3 ms > 870.3 ms 819.4 ms 127.1 ms 107.8 ms > 1006.6 ms 853.4 ms 128.0 ms 115.3 ms > 884.0 ms 831.1 ms 126.5 ms 106.8 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 1939955, "replicates": [1912975,1939955,1924661,2103302,1948463], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustHash (9830 ms) > [ RUN ] BenchCollections.RustFnvHash > > succ_lookups fail_lookups insert_remove iterate > 701.5 ms 679.4 ms 118.9 ms 106.0 ms > 737.3 ms 784.3 ms 121.3 ms 105.8 ms > 703.2 ms 673.9 ms 119.4 ms 105.6 ms > 703.7 ms 676.4 ms 119.4 ms 108.1 ms > 708.8 ms 684.0 ms 121.9 ms 110.7 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 1607643, "replicates": [1605724,1748703,1602168,1607643,1625358], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFnvHash (8189 ms) > [ RUN ] BenchCollections.RustFxHash > > succ_lookups fail_lookups insert_remove iterate > 468.8 ms 334.1 ms 55.7 ms 118.1 ms > 384.4 ms 292.0 ms 54.6 ms 116.4 ms > 447.6 ms 348.5 ms 60.1 ms 125.5 ms > 459.9 ms 292.1 ms 54.9 ms 115.9 ms > 387.6 ms 296.2 ms 54.9 ms 117.1 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 922830, "replicates": [976660,847416,981651,922830,855663], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFxHash (4584 ms) Results from a release (i.e. --enable-release) build on my fast Linux box: > [ RUN ] BenchCollections.PLDHash > > succ_lookups fail_lookups insert_remove iterate > 43.2 ms 52.4 ms 21.0 ms 33.5 ms > 41.6 ms 52.2 ms 19.9 ms 33.7 ms > 41.3 ms 51.4 ms 19.3 ms 33.5 ms > 41.8 ms 51.1 ms 19.8 ms 34.2 ms > 41.3 ms 51.3 ms 19.3 ms 34.0 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 146968, "replicates": [150070,147343,145430,146968,145905], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.PLDHash (736 ms) > [ RUN ] BenchCollections.nsDataHash > > succ_lookups fail_lookups insert_remove iterate > 43.2 ms 55.6 ms 19.7 ms 34.0 ms > 43.6 ms 56.9 ms 19.6 ms 34.6 ms > 45.5 ms 55.6 ms 19.6 ms 33.8 ms > 43.1 ms 54.9 ms 19.1 ms 33.6 ms > 43.0 ms 55.2 ms 18.8 ms 34.4 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 152587, "replicates": [152587,154664,154490,150807,151499], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.nsDataHash (764 ms) > [ RUN ] BenchCollections.JSHash > > succ_lookups fail_lookups insert_remove iterate > 15.2 ms 17.8 ms 12.2 ms 5.5 ms > 15.5 ms 18.5 ms 11.7 ms 5.7 ms > 14.9 ms 18.5 ms 11.9 ms 5.4 ms > 15.3 ms 17.7 ms 11.9 ms 5.3 ms > 14.7 ms 18.3 ms 11.6 ms 5.5 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 50608, "replicates": [50650,51419,50608,50147,50102], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.JSHash (253 ms) > [ RUN ] BenchCollections.RustHash > > succ_lookups fail_lookups insert_remove iterate > 71.1 ms 46.7 ms 14.5 ms 8.2 ms > 66.0 ms 47.1 ms 15.0 ms 8.1 ms > 62.3 ms 46.4 ms 15.2 ms 8.1 ms > 66.4 ms 46.4 ms 14.6 ms 8.2 ms > 66.8 ms 46.9 ms 14.7 ms 8.1 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 136219, "replicates": [140511,136219,132036,135527,136441], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustHash (681 ms) > [ RUN ] BenchCollections.RustFnvHash > > succ_lookups fail_lookups insert_remove iterate > 55.7 ms 32.8 ms 12.8 ms 8.2 ms > 55.2 ms 33.5 ms 12.8 ms 8.2 ms > 54.0 ms 33.1 ms 12.8 ms 8.2 ms > 54.3 ms 32.4 ms 12.9 ms 8.1 ms > 55.0 ms 33.4 ms 12.9 ms 8.6 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 109523, "replicates": [109523,109718,108079,107759,109966], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFnvHash (545 ms) > [ RUN ] BenchCollections.RustFxHash > > succ_lookups fail_lookups insert_remove iterate > 9.2 ms 7.4 ms 5.2 ms 7.8 ms > 8.9 ms 7.4 ms 5.0 ms 7.9 ms > 8.8 ms 7.5 ms 5.0 ms 8.4 ms > 8.6 ms 7.6 ms 5.1 ms 7.9 ms > 8.6 ms 7.8 ms 5.0 ms 7.9 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 29352, "replicates": [29609,29286,29733,29298,29352], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFxHash (148 ms) Things of note when comparing the non-release build with the release build: - PLDHash and nsDataHash are noticeably slower with --enable-release. - JSHash is noticeably faster with --enable-release. - The Rust ones are massively (10--20x) faster with --enable-release. (The non-release Rust results are quite similar to a debug build.) I'm not sure what to make of that. Things of note within the release build results: - nsDataHash is the same or slightly slower than PLDHash. This makes sense, because it's layered on top of PLDHash. - JSHash is much faster than PLDHash/nsDataHash: about 2--6x faster, depending on the benchmark. They use almost identical double hashing algorithms, so the most likely reason for this is that PLDHash has "virtual ops" that require function calls, while JSHash uses templates and so can inline those calls. - Rust hash tables with default hashing are slower than PLDHash for succ_lookups but faster for the other benchmarks. Fnv hashing speeds things up a bit. Fx hashing speeds things up a *lot*. - FxHash is the fastest overall, and JSHash is second fastest. The others are significantly slower. Conclusions: - We should use JSHash instead of PLDHash/nsDataHash for hot tables such as CCGraph::mPtrToNodeMap (bug 1477627). This will require moving JSHash into MFBT (bug 1477626), as was done with js::Vector some time ago. Then hot PLDHash/nsDataHash instances can be converted to JSHash. (As well as being faster, JSHash also has a much nicer API.) - We should use FxHash instead of FnvHash within Rust code (bug 1477628). Unlike PLDHash-to-JSHash conversions, which must be done one at a time and only when it makes sense, FnvHash-to-FxHash conversions are trivial and can be done en masse. (BTW, this very change was a win for rustc back in 2016: https://github.com/rust-lang/rust/pull/37229). - Robin hood hashing may explain why FxHash is faster than JSHash. But it looks like there are some constant factors (e.g. avoiding virtual op calls) that can be fixed more easily and give bigger wins than changing the algorithm.
Blocks: 1477632
Comment on attachment 8994104 [details] Bug 1477622 - Add microbenchmarks measuring hash table performance. https://reviewboard.mozilla.org/r/258730/#review265684 Thanks. ::: xpcom/rust/gtest/bench-collections/Bench.cpp:261 (Diff revision 1) > +MOZ_GTEST_BENCH_F(BenchCollections, PLDHash, [this] { > + BenchImpl(Bench_Cpp_PLDHashTable); > +}); Please add a benchmark for `std::unordered_map`, so we have some data to point to if somebody comes along and asks why we're not using the `std::` data structures. (Or we have data indicating we should switch, but I expect that's unlikely.) Bonus points if you test it with the default hash and also whatever `mozilla::Hash` does. ::: xpcom/rust/gtest/bench-collections/Bench.cpp:265 (Diff revision 1) > +MOZ_GTEST_BENCH_F(BenchCollections, nsDataHash, [this] { > + BenchImpl(Bench_Cpp_nsDataHashtable); > +}); Why is this benchmark valuable compared to the `PLDHash` benchmark? ::: xpcom/rust/gtest/bench-collections/bench.rs:51 (Diff revision 1) > +fn Bench_Rust<H: std::hash::BuildHasher>(mut hs: HashSet<*const c_void, H>, params: *const Params, > + vals: *const *const c_void, len: usize) { I'm assuming that we're relying on the `assert!()` bits to prevent `rustc` from optimizing away the body of this function after inlining? Should we note that? Or should we use `black_box` (or whatever it's called) to achieve the same thing without the extra conditional branches?
Attachment #8994104 - Flags: review?(nfroyd) → review+
> Please add a benchmark for `std::unordered_map` Good suggestion. Here are the results with the default hash function: > succ_lookups fail_lookups insert_remove iterate > 44.5 ms 51.1 ms 21.9 ms 5.0 ms > 43.7 ms 50.9 ms 21.5 ms 5.0 ms > 43.8 ms 51.3 ms 21.4 ms 4.9 ms > 43.7 ms 51.3 ms 21.6 ms 5.0 ms > 43.2 ms 50.9 ms 21.6 ms 5.0 ms Very similar to PLDHash, except that iteration over unordered_set is much faster: indeed, faster than any other hash table. > > + BenchImpl(Bench_Cpp_nsDataHashtable); > > Why is this benchmark valuable compared to the `PLDHash` benchmark? I was curious to see if the cost of the layering was significant, and the answer is "slightly". I'll remove it. > I'm assuming that we're relying on the `assert!()` bits to prevent `rustc` > from optimizing away the body of this function after inlining? Should we > note that? Or should we use `black_box` (or whatever it's called) to > achieve the same thing without the extra conditional branches? I used `assert!` because it seemed the most equivalent thing to `MOZ_RELEASE_ASSERT`. I'll add a note that the cross-language comparisons are less reliable than the intra-language comparisons.
Here are Linux opt 64 measurements from a try push: > TEST-START | BenchCollections.unordered_set > succ_lookups fail_lookups insert_remove iterate total > 105.0 ms 118.6 ms 35.6 ms 7.3 ms 266.5 ms > 100.5 ms 119.3 ms 37.3 ms 7.3 ms 264.4 ms > 100.7 ms 119.0 ms 38.3 ms 7.3 ms 265.3 ms > 100.7 ms 118.7 ms 38.0 ms 7.3 ms 264.7 ms > 100.7 ms 119.4 ms 37.7 ms 7.3 ms 265.1 ms > TEST-START | BenchCollections.PLDHash > succ_lookups fail_lookups insert_remove iterate total > 75.1 ms 63.6 ms 29.6 ms 58.2 ms 226.5 ms > 74.6 ms 63.9 ms 29.6 ms 58.2 ms 226.3 ms > 74.5 ms 64.5 ms 29.3 ms 59.9 ms 228.2 ms > 74.7 ms 65.2 ms 29.4 ms 58.6 ms 228.0 ms > 74.8 ms 64.1 ms 29.3 ms 58.1 ms 226.2 ms > TEST-START | BenchCollections.JSHash > succ_lookups fail_lookups insert_remove iterate total > 29.0 ms 29.8 ms 17.7 ms 14.4 ms 91.0 ms > 28.5 ms 30.0 ms 17.6 ms 14.3 ms 90.4 ms > 28.6 ms 30.1 ms 17.6 ms 14.3 ms 90.6 ms > 28.3 ms 29.7 ms 17.7 ms 13.9 ms 89.6 ms > 28.1 ms 29.9 ms 17.6 ms 14.0 ms 89.6 ms > TEST-START | BenchCollections.RustHash > succ_lookups fail_lookups insert_remove iterate total > 93.4 ms 79.4 ms 21.7 ms 13.2 ms 207.6 ms > 93.2 ms 80.9 ms 21.3 ms 13.1 ms 208.5 ms > 94.4 ms 78.8 ms 21.4 ms 12.8 ms 207.5 ms > 95.6 ms 80.1 ms 21.5 ms 12.8 ms 210.0 ms > 94.1 ms 80.4 ms 22.3 ms 13.3 ms 210.0 ms > TEST-START | BenchCollections.RustFnvHash > succ_lookups fail_lookups insert_remove iterate total > 88.1 ms 57.1 ms 20.9 ms 13.1 ms 179.2 ms > 87.9 ms 56.6 ms 20.3 ms 13.1 ms 177.8 ms > 87.7 ms 56.9 ms 20.1 ms 13.3 ms 177.9 ms > 87.8 ms 56.6 ms 20.1 ms 13.2 ms 177.6 ms > 87.6 ms 56.8 ms 20.2 ms 13.2 ms 177.7 ms > TEST-START | BenchCollections.RustFxHash > succ_lookups fail_lookups insert_remove iterate total > 14.5 ms 10.8 ms 8.7 ms 13.0 ms 47.1 ms > 14.0 ms 10.8 ms 8.8 ms 13.0 ms 46.6 ms > 13.9 ms 10.8 ms 8.7 ms 12.9 ms 46.3 ms > 14.0 ms 10.8 ms 8.6 ms 13.0 ms 46.5 ms > 13.9 ms 10.8 ms 8.6 ms 12.9 ms 46.3 ms Roughly similar to what I get locally (albeit uniformly slower) except that unordered_set is clearly worse than PLDHash.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
It would probably be good to include the size of the hash table, if possible, in these benchmarks.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: