base repository: postgresql-cfbot/postgresql
base: cf/4620~1
head repository: postgresql-cfbot/postgresql
compare: cf/4620
- 4 commits
- 10 files changed
- 2 contributors
Commits on Mar 24, 2025
Execute hardware CRC computation in parallel
CRC computations on the current input word depend not only on that input, but also on the CRC of the previous input. This means that the speed is limited by the latency of the CRC instruction. Most modern CPUs can start executing a new CRC instruction before a currently executing one has finished, i.e. the reciprocal throughput is lower than latency. By computing partial CRCs of non-overlapping segments of the input, we can achieve the full throughput that the CPU is capable of. To preserve the correctness of the result, however, we must recombine the partial results using carryless multiplication with constants specific to the input length. We get these from a lookup table of pre-computed CRCs.

Because of the overhead of the recombination step, parallelism is only faster with inputs of at least a few hundred bytes.

For now we only implement parallelism for x86 and Arm. It might be worthwhile to apply this technique to LoongArch, depending on the throughput of CRC on that platform.

XXX The lookup table and supporting code are found in pg_crc32c_sb.c, which is now built unconditionally on all platforms. Perhaps s/sb8/common/ ?

This technique originated from the Intel white paper "Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction", by Vinodh Gopal et al., 2011. Thanks to Raghuveer Devulapalli for assistance in verifying the usability of this technique from a legal perspective.

Xiang Gao's original proposal was specific to the Arm architecture, computed in fixed-size chunks of 1024 bytes, and required hardware support for carryless multiplication. I added support for x86 and a wider range of chunk sizes, and switched to pure C for carryless multiplication. The portability of the latter is important for two reasons: 1) We may want to use this technique on architectures that don't have hardware carryless multiplication and 2) This is intended as a fallback, since if hardware carryless multiplication is available, there are other algorithms that are useful on much smaller inputs than this one.

Author: Xiang Gao <[email protected]>
Author: John Naylor <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Discussion: https://postgr.es/m/DB9PR08MB6991329A73923BF8ED4B3422F5DBA@DB9PR08MB6991.eurprd08.prod.outlook.com
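The recombination idea described in the commit message can be illustrated with a small portable sketch. This is not the code from the patch: every function name below is hypothetical, the CRC is computed bit by bit in software rather than with hardware CRC instructions, and the length-specific constants are derived by a simple loop where the commit reads them from a lookup table. It assumes the usual reflected CRC-32C conventions (polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF). The three segment CRCs do not depend on each other, which is what would let a CPU overlap them.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u	/* reflected Castagnoli polynomial */

/* Multiply a reflected-form polynomial by x, modulo the CRC-32C polynomial. */
static uint32_t
times_x(uint32_t v)
{
	return (v >> 1) ^ (CRC32C_POLY & (0u - (v & 1u)));
}

/* Bitwise register update, without the initial or final inversion. */
static uint32_t
crc_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--)
	{
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = times_x(crc);
	}
	return crc;
}

/* x^(8 * nbytes) mod P, computed by a loop here purely for clarity. */
static uint32_t
x_pow_8n(size_t nbytes)
{
	uint32_t	k = 0x80000000u;	/* the polynomial "1" in reflected form */

	for (size_t i = 0; i < 8 * nbytes; i++)
		k = times_x(k);
	return k;
}

/* Portable carryless multiplication of a and b, modulo P. */
static uint32_t
clmul_mod(uint32_t a, uint32_t b)
{
	uint32_t	prod = 0;

	for (uint32_t m = 0x80000000u; m != 0; m >>= 1)
	{
		if (a & m)
			prod ^= b;
		b = times_x(b);
	}
	return prod;
}

/* Reference: one serial pass over the whole input. */
static uint32_t
crc32c_serial(const uint8_t *p, size_t len)
{
	return crc_update(0xFFFFFFFFu, p, len) ^ 0xFFFFFFFFu;
}

/* Three independent segment CRCs, recombined at the end. */
static uint32_t
crc32c_segmented(const uint8_t *p, size_t len)
{
	size_t		seg = len / 3;
	size_t		last = len - 2 * seg;
	uint32_t	c0 = crc_update(0xFFFFFFFFu, p, seg);
	uint32_t	c1 = crc_update(0, p + seg, seg);
	uint32_t	c2 = crc_update(0, p + 2 * seg, last);

	/* Shift each partial CRC past the bytes that follow its segment. */
	c0 = clmul_mod(c0, x_pow_8n(seg + last));
	c1 = clmul_mod(c1, x_pow_8n(last));
	return (c0 ^ c1 ^ c2) ^ 0xFFFFFFFFu;
}
```

For any input, `crc32c_segmented()` should return the same value as `crc32c_serial()`. Only the first segment starts from the initial value; the others start from zero, so the partial results can be combined by XOR once each has been multiplied by the power of x matching the number of bytes after it.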
Commit 194112e
Commit 476f8e3
Commit 0aef52a
[CF 4620] v10 - CRC32C Parallel Computation Optimization on ARM
This branch was automatically generated by a robot using patches from an email thread registered at:

https://commitfest.postgresql.org/patch/4620

The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch.

Patch(es): https://www.postgresql.org/message-id/CANWCAZbdjPLkojSFo2kObBOsucvyExkAJ9rnTUneoAR=5mrQGQ@mail.gmail.com
Author(s): xiang gao
Commitfest Bot committed Mar 24, 2025
Commit 6f7d4bf
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff cf/4620~1...cf/4620