Implement an SSE2 accelerated version of zend_adler32 to speedup startup time for the file cache #10507

Merged
merged 1 commit into from
Feb 5, 2023

nielsdos
Member

@nielsdos nielsdos commented Feb 4, 2023

When benchmarking opcache's file cache on index.php from a dummy WordPress install, I noticed that 36.42% of the time was spent in zend_adler32 verifying the checksums of the files. Callgrind reported 332,731,216 instructions executed during that run, and the average time to execute the index file was around 91ms.
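For readers unfamiliar with the checksum, here is a minimal scalar sketch of Adler-32. It is not the code from php-src: the name `adler32_scalar` and the two constants are illustrative, and the parameter order (running checksum, buffer, length) is only assumed to mirror zend_adler32. It shows why the function is hot: every input byte costs two dependent additions.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER32_BASE 65521u /* largest prime below 2^16 */
#define ADLER32_NMAX 5552u  /* most bytes that can be summed before s2 could overflow 32 bits */

/* Hypothetical reference implementation, for illustration only. */
uint32_t adler32_scalar(uint32_t checksum, const unsigned char *buf, size_t len)
{
    uint32_t s1 = checksum & 0xffffu;         /* running sum of bytes */
    uint32_t s2 = (checksum >> 16) & 0xffffu; /* running sum of s1 values */

    while (len > 0) {
        /* Defer the expensive modulo: reduce once per chunk, not per byte. */
        size_t n = len < ADLER32_NMAX ? len : ADLER32_NMAX;
        len -= n;
        while (n--) {
            s1 += *buf++;
            s2 += s1;
        }
        s1 %= ADLER32_BASE;
        s2 %= ADLER32_BASE;
    }
    return (s2 << 16) | s1;
}
```

Starting from the conventional initial value 1, `adler32_scalar(1, (const unsigned char *)"Wikipedia", 9)` yields `0x11E60398`.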

This patch implements an SSE2-accelerated version of zend_adler32, which reduces the number of instructions executed in that benchmark to 248,600,983, a reduction of ~25%. There is also a measurable decrease in wall-clock time of around 10ms. Now only 16.05% of the time is spent computing checksums.
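To illustrate the general idea, the following is a rough sketch of how Adler-32 can be vectorized with SSE2. It is not the merged patch and may differ from it substantially; `adler32_sse2_sketch`, `hsum_u32`, and the constants are hypothetical names. It relies on the identity that, for a 16-byte block with bytes b0..b15 and byte sum s1 before the block, s2 grows by 16*s1 + 16*b0 + 15*b1 + ... + 1*b15. On targets without SSE2 the sketch falls back to a plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define SK_BASE 65521u /* largest prime below 2^16 */
#define SK_NMAX 5552u  /* multiple of 16; chunk size that keeps the sums bounded */

#if defined(__SSE2__)
/* Add up the four 32-bit lanes of v. */
static uint64_t hsum_u32(__m128i v)
{
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return (uint64_t)lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
#endif

/* Hypothetical sketch, for illustration only. */
uint32_t adler32_sse2_sketch(uint32_t checksum, const unsigned char *buf, size_t len)
{
    uint32_t s1 = checksum & 0xffffu;
    uint32_t s2 = (checksum >> 16) & 0xffffu;

#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    /* Weights 16..9 for bytes 0..7 and 8..1 for bytes 8..15 (lowest lane is the last argument). */
    const __m128i w_lo = _mm_set_epi16(9, 10, 11, 12, 13, 14, 15, 16);
    const __m128i w_hi = _mm_set_epi16(1, 2, 3, 4, 5, 6, 7, 8);

    while (len >= 16) {
        size_t blocks = len / 16;
        if (blocks > SK_NMAX / 16) blocks = SK_NMAX / 16;
        len -= blocks * 16;

        __m128i v_s1 = zero; /* byte sums accumulated in this chunk */
        __m128i v_ps = zero; /* sum of v_s1 as it stood before each block */
        __m128i v_s2 = zero; /* position-weighted byte sums */
        for (size_t k = 0; k < blocks; k++) {
            __m128i bytes = _mm_loadu_si128((const __m128i *)buf);
            buf += 16;
            v_ps = _mm_add_epi32(v_ps, v_s1);
            v_s1 = _mm_add_epi32(v_s1, _mm_sad_epu8(bytes, zero));
            /* Widen bytes to 16 bits, then multiply by weights and add adjacent pairs. */
            v_s2 = _mm_add_epi32(v_s2, _mm_madd_epi16(_mm_unpacklo_epi8(bytes, zero), w_lo));
            v_s2 = _mm_add_epi32(v_s2, _mm_madd_epi16(_mm_unpackhi_epi8(bytes, zero), w_hi));
        }
        /* Fold the vector accumulators back into the scalar state, using 64-bit math. */
        uint64_t t = (uint64_t)s2
                   + 16u * ((uint64_t)blocks * s1 + hsum_u32(v_ps))
                   + hsum_u32(v_s2);
        s1 = (uint32_t)((s1 + hsum_u32(v_s1)) % SK_BASE);
        s2 = (uint32_t)(t % SK_BASE);
    }
#endif

    /* Scalar path: the tail of fewer than 16 bytes, or everything without SSE2. */
    while (len > 0) {
        size_t n = len < SK_NMAX ? len : SK_NMAX;
        len -= n;
        while (n--) {
            s1 += *buf++;
            s2 += s1;
        }
        s1 %= SK_BASE;
        s2 %= SK_BASE;
    }
    return (s2 << 16) | s1;
}
```

The per-byte dependency chain of the scalar loop is replaced by a handful of vector instructions per 16 bytes, which is consistent with the lower instruction count reported above, though the actual savings depend on the real implementation.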

The instruction counts were measured with Callgrind, and the wall-clock time with time. Each test was executed multiple times and the results were averaged. The WordPress install contains only two almost-blank posts.

@Girgias
Member

Girgias commented Feb 5, 2023

This looks sensible, but I would like @alexdowad to have a look at it, as he has a better understanding of SSE2 than I do :)

@alexdowad
Contributor

I don't know the adler32 algorithm, but otherwise the code looks reasonable. 👍

@Girgias Girgias merged commit 722fbd0 into php:master Feb 5, 2023