Optimize AVX(1) VPSHUFB by loading its memory operand into a register
instead of repeatedly reading it from memory
AsmVectorEquals16-32 2.208µ ± 2% 1.890µ ± 2% -14.42% (p=0.000 n=10)
Avoid BMI2 in the AVX(1) implementation
I could've also generated variants for every combination of BMI2/AVX2
support, but I simply went with having the fastest option require
AVX2+BMI2 and using AVX(1) as the fallback.
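A minimal Go sketch of that dispatch decision, assuming golang.org/x/sys/cpu for the feature checks; the equals* names and signatures below are illustrative stand-ins, not this repository's actual API:

    package main

    import (
        "fmt"

        "golang.org/x/sys/cpu"
    )

    // Portable fallback; also stands in here for the generated ASM variants.
    func equalsGeneric(a, b []uint64) bool {
        if len(a) != len(b) {
            return false
        }
        for i := range a {
            if a[i] != b[i] {
                return false
            }
        }
        return true
    }

    // Hypothetical stand-ins for the AVX2+BMI2 and AVX(1) assembly versions.
    func equalsAVX2BMI2(a, b []uint64) bool { return equalsGeneric(a, b) }
    func equalsAVX(a, b []uint64) bool      { return equalsGeneric(a, b) }

    // Chosen once at startup: the fastest path requires both AVX2 and BMI2,
    // plain AVX(1) is the fallback, and generic Go covers everything else.
    var vectorEquals = func() func(a, b []uint64) bool {
        switch {
        case cpu.X86.HasAVX2 && cpu.X86.HasBMI2:
            return equalsAVX2BMI2
        case cpu.X86.HasAVX:
            return equalsAVX
        default:
            return equalsGeneric
        }
    }()

    func main() {
        fmt.Println(vectorEquals([]uint64{1, 2}, []uint64{1, 2}))
    }

Picking the function once in a package-level variable avoids re-checking CPU features on every call.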
Enable AVX implementations for 64-bit-wide inputs
The XMM registers are 128 bits wide, so that's only 2 rows per round. I
implemented merging 4x 2-bit results together so we can write back a
whole byte. For that I increased the number of rounds for 64-bit inputs
to 4 (instead of the default 2).
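A scalar Go sketch of that merge step (not the actual ASM; the row layout and names are assumptions for illustration): each of the 4 rounds compares two 64-bit rows and contributes 2 result bits, which are shifted into place so a full byte can be written back at once.

    package main

    import "fmt"

    // equalRows compares 8 consecutive 64-bit rows against needle, two rows
    // per round over 4 rounds, and packs the per-row results into one byte
    // (bit i set when rows[i] == needle).
    func equalRows(rows *[8]uint64, needle uint64) byte {
        var out byte
        for round := 0; round < 4; round++ { // 4 rounds instead of the default 2
            var pair byte
            if rows[2*round] == needle {
                pair |= 1 // first row of the pair -> bit 0
            }
            if rows[2*round+1] == needle {
                pair |= 2 // second row of the pair -> bit 1
            }
            out |= pair << (2 * round) // merge the 2-bit result into the byte
        }
        return out
    }

    func main() {
        rows := [8]uint64{7, 1, 7, 7, 2, 7, 3, 4}
        fmt.Printf("%08b\n", equalRows(&rows, 7)) // prints 00101101
    }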
Call VZEROALL for a 30% benchmark improvement
Call VZEROALL before leaving any ASM function.
I didn't expect this large a speedup, so I'm a little suspicious, though
avoiding AVX-SSE transition penalties (dirty upper YMM state stalling the
SSE instructions in the surrounding code) could plausibly account for it.