author     John Naylor  2025-04-06 07:04:30 +0000
committer  John Naylor  2025-04-06 07:04:30 +0000
commit     3c6e8c123896584f1be1fe69aaf68dcb5eb094d5
tree       ac5e8e8ffce6646927ef1981dd3aec47037a0ea1 /src/test/regress/expected/strings.out
parent     683df3f4de00bf50b20eae92369e006badf7cd57
Compute CRC32C using AVX-512 instructions where available

The previous implementation of CRC32C on x86 relied on the native
CRC32 instruction from the SSE 4.2 extension, which operates on
up to 8 bytes at a time. We can get a substantial speedup by using
carryless multiplication on SIMD registers, processing 64 bytes per
loop iteration. Shorter inputs fall back to ordinary CRC instructions.
On Intel Tiger Lake hardware (2020), CRC is now 50% faster for inputs
between 64 and 112 bytes, and 3x faster for 256 bytes.

The VPCLMULQDQ instruction on 512-bit registers has been available
on Intel hardware since 2019 and AMD since 2022. There is an older
variant for 128-bit registers, but at least on Zen 2 it performs worse
than normal CRC instructions for short inputs.

We must now do a runtime check, even for builds that target SSE 4.2.
This doesn't matter in practice for WAL (arguably the most critical
case), because since commit e2809e3a1 the final computation with the
20-byte WAL header is inlined and unrolled when targeting that
extension. Compared with two direct function calls, testing showed
equal or slightly faster performance in performing an indirect function
call on several dozen bytes followed by inlined instructions on
constant input of 20 bytes.

The MIT-licensed implementation was generated with the "generate"
program from

https://github.com/corsix/fast-crc32/

Based on: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ
Instruction" V. Gopal, E. Ozturk, et al., 2009

Co-authored-by: Raghuveer Devulapalli <[email protected]>
Co-authored-by: Paul Amonson <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Andres Freund <[email protected]> (earlier version)
Reviewed-by: Matthew Sterrett <[email protected]> (earlier version)
Tested-by: Raghuveer Devulapalli <[email protected]>
Tested-by: David Rowley <<[email protected]>> (earlier version)
Discussion: https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192@BL1PR11MB5304.namprd11.prod.outlook.com
Discussion: https://postgr.es/m/PH8PR11MB82869FF741DFA4E9A029FF13FBF72@PH8PR11MB8286.namprd11.prod.outlook.com
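
The runtime check and indirect function call described in the commit message can be pictured with a small, portable sketch. This is not the code from the commit. All names here (crc32c_update, crc32c_choose, crc32c_wide, cpu_has_wide_clmul) are hypothetical. The "wide" routine is a scalar stand-in that only mirrors the 64-bytes-per-iteration loop shape, and the CPU probe is a stub that always returns false where real code would query CPUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Portable bit-at-a-time CRC-32C update (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_scalar(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Stand-in for a SIMD routine: consumes 64 bytes per loop iteration and
 * leaves shorter inputs and the tail to the ordinary path. A real version
 * would fold each block with carryless multiplication; here the block is
 * simply handed to the scalar routine, so both paths give the same result.
 */
static uint32_t
crc32c_wide(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len >= 64)
    {
        crc = crc32c_scalar(crc, p, 64);    /* placeholder for SIMD folding */
        p += 64;
        len -= 64;
    }
    return crc32c_scalar(crc, p, len);
}

/* Hypothetical CPU probe; assumption: real code would inspect CPUID. */
static bool
cpu_has_wide_clmul(void)
{
    return false;
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Starts out pointing at the chooser, which replaces it on first use. */
static crc32c_fn crc32c_update = crc32c_choose;

static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    crc32c_update = cpu_has_wide_clmul() ? crc32c_wide : crc32c_scalar;
    return crc32c_update(crc, data, len);
}

/* One-shot CRC-32C: initial value and final XOR are both 0xFFFFFFFF. */
static uint32_t
crc32c(const void *data, size_t len)
{
    assert(data != NULL || len == 0);
    return crc32c_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

In this sketch only the first call pays for the probe. Every later call goes straight through the function pointer to the selected routine.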
Diffstat (limited to 'src/test/regress/expected/strings.out')
-rw-r--r-- src/test/regress/expected/strings.out 24
1 files changed, 24 insertions, 0 deletions
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index dc485735aa4..174f0a68331 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2330,6 +2330,30 @@ SELECT crc32c('The quick brown fox jumps over the lazy dog.');
419469235
(1 row)
+SELECT crc32c(repeat('A', 127)::bytea);
+ crc32c
+-----------
+ 291820082
+(1 row)
+
+SELECT crc32c(repeat('A', 128)::bytea);
+ crc32c
+-----------
+ 816091258
+(1 row)
+
+SELECT crc32c(repeat('A', 129)::bytea);
+ crc32c
+------------
+ 4213642571
+(1 row)
+
+SELECT crc32c(repeat('A', 800)::bytea);
+ crc32c
+------------
+ 3134039419
+(1 row)
+
--
-- encode/decode
--
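
The expected values added by this diff can be cross-checked outside the server with a table-driven CRC-32C. The sketch below is an independent reference, not PostgreSQL code, and the names crc32c_table, crc32c_build_table and crc32c_repeat are hypothetical. It assumes the SQL function returns the standard CRC-32C (reflected polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF). The new inputs are 127, 128 and 129 bytes long, which straddles a multiple of the 64-byte block size mentioned in the commit message, plus one longer input of 800 bytes.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32c_table[256];

/* Fill the 256-entry lookup table for the reflected polynomial 0x82F63B78. */
static void
crc32c_build_table(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;

        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0x82F63B78u : c >> 1;
        crc32c_table[i] = c;
    }
}

/* CRC-32C of n copies of one byte, e.g. repeat('A', n)::bytea in the tests. */
static uint32_t
crc32c_repeat(unsigned char byte, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    if (crc32c_table[1] == 0)   /* entry 1 is nonzero once the table is built */
        crc32c_build_table();

    while (n-- > 0)
        crc = crc32c_table[(crc ^ byte) & 0xFFu] ^ (crc >> 8);

    return crc ^ 0xFFFFFFFFu;
}
```

Calling crc32c_repeat('A', n) for n = 127, 128, 129 and 800 should reproduce the four values in the expected output above if that assumption holds.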