Skip to content

ext/opcache: remove option huge_code_pages #10301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

MaxKellermann
Copy link
Contributor

This feature is useless at best, but is fragile and increases memory usage because it creates a copy of the PHP executable in RAM, when another copy is already in the kernel's page cache.

Additionally, each PHP process has its own copy, multiplying the memory usage with the number of PHP processes.

Even worse: under memory pressure, the kernel would usually just discard those code pages, and reload them from disk on demand; but with huge_code_pages, the kernel cannot simply discard those pages, instead it has to allocate room in the swap partition and then has to write the pages to the swap partition before being able to discard them. This adds disk I/O to an already extremely constrained situation.

The feature was added 7 years ago in commit 669f0b3, and the commit message stated that this "provided 2% improvement on real-life apps, because of 2-3 times reduction in number of iTLB misses", but did not tell how these results were produced.

I tried, but failed to measure a difference. None of my tests went any faster with huge_code_pages enabled.

In a second attempt, I tried perf stat with events itlb_misses.miss_causes_a_walk, itlb_misses.stlb_hit, itlb_misses.walk_completed, itlb_misses.walk_duration and could not see any improvements here either. The results were unstable, but nothing suggested it was better with huge_code_pages.

Quite contrary, this operation delays PHP startup by 5 ms (measured on an Intel Xeon E5-2630).

phpinfo() with opcache.huge_code_pages=0:

         17.92 msec task-clock:u                     #    0.985 CPUs utilized
    29,804,809      cycles:u                         #    1.663 GHz
    44,512,668      instructions:u                   #    1.49  insn per cycle

phpinfo() with opcache.huge_code_pages=1:

         23.20 msec task-clock:u                     #    0.988 CPUs utilized
    35,709,593      cycles:u                         #    1.539 GHz
    45,940,779      instructions:u                   #    1.29  insn per cycle

In order to reduce PHP's memory size, reduce the load during PHP startup, simplify debugging and profiling, I propose removing this feature.

@MaxKellermann
Copy link
Contributor Author

Rebased & fixed conflict.

@MaxKellermann
Copy link
Contributor Author

Travis failure unrelated to this PR.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This technology definitely makes some speed improvement (not the same as 7 years ago, but still visible). All PHP workers inherit the same read-only mapping of huge code pages. It's done configurable on purpose and may be disabled/enabled by admin. I don't see any reason to remove this.

$ perf stat -e cycles:u -e instructions:u -e dTLB-load-misses -e iTLB-load-misses sapi/cgi/php-cgi -d opcache.jit=0 -d opcache.huge_code_pages=0 -T 5,10000 /var/www/html/bench/wordpress/index.php > /dev/null

Elapsed time: 25.027888 sec

 Performance counter stats for 'sapi/cgi/php-cgi -d opcache.jit=0 -d opcache.huge_code_pages=0 -T 5,10000 /var/www/html/bench/wordpress-3.6/index.php':

    92,924,210,183      cycles:u                                                           
   119,988,584,666      instructions:u                   #    1.29  insn per cycle         
         5,208,210      dTLB-load-misses:u                                                 
         7,231,349      iTLB-load-misses:u                                                 

      25.124572589 seconds time elapsed

      20.503810000 seconds user
       2.395449000 seconds sys


$ perf stat -e cycles:u -e instructions:u -e dTLB-load-misses -e iTLB-load-misses sapi/cgi/php-cgi -d opcache.jit=0 -d opcache.huge_code_pages=1 -T 5,10000 /var/www/html/bench/wordpress-3.6/index.php > /dev/null

Elapsed time: 24.738100 sec

 Performance counter stats for 'sapi/cgi/php-cgi -d opcache.jit=0 -d opcache.huge_code_pages=1 -T 5,10000 /var/www/html/bench/wordpress/index.php':

    91,544,779,717      cycles:u                                                           
   119,989,768,215      instructions:u                   #    1.31  insn per cycle         
         4,431,370      dTLB-load-misses:u                                                 
         6,454,243      iTLB-load-misses:u                                                 

      24.837880346 seconds time elapsed

      20.398353000 seconds user
       2.240638000 seconds sys

@MaxKellermann
Copy link
Contributor Author

@dstogov, I tried the same WordPress test, but could not get consistent results in repeated runs. On an otherwise idle machine (with cpu governor=performance, of course), instructions was pretty stable, but cycles and user jumped around by up to 1% with no clear winner.
dTLB-misses was more consistent, 9% less with huge_code_pages (in line with your data saying it's 15% less).
iTLB-misses was very unstable; sometimes, factors of 2 on repeated runs (with the same parameters), but usually a win for huge_code_pages.

Unlike my previous benchmark attempts, it shows the expected wins for huge pages on the *TLB-misses metric, but that did not result in any measurable practical speedup. I can cherry-pick the one result of repeated runs and demonstrate that an arbitrary option value is better, but that's pointless.

That is with a single PHP process running alone on a computer.

All PHP workers inherit the same read-only mapping of huge code pages.

That is only true if all PHP processes are clones of a single initial PHP process, e.g. if you have a single system-wide FPM daemon which is shared by all users and all applications.

All theoretical advantages collapse and turn to the opposite once you start a second PHP process, for example in a container (which is a very common thing nowadays).

//
Of course, we agree that huge pages are a good thing, and the Linux kernel people are working on making huge pages compatible with the page cache, which will make this PHP feature completely obsolete. What we don't agree on is that this feature is worth the complexity, fragility and memory overhead. And we don't agree that this feature really brings the speedups its documentation promises. Many users will not benefit at all, quite contrary, they will see increases in memory usage, worse behavior under memory pressure and reduced cache efficiency.

It's done configurable on purpose and may be disabled/enabled by admin.

Most admins probably don't understand what the feature means and don't know the caveats - why would they, the documentation doesn't tell them, it only says "should improve performance", and who does not want to "improve performance"?

Delegating responsibility for this under-documented and complex feature, one that can easily reduce performance, to admins doesn't sound like a plan that really works.

Instead of removing the feature, you can improve the documentation to describe the limitations of the feature, to keep people from using it unless they really meet the narrow preconditions that may bring some theoretical sub-1% performance gain in some situations.

As it is now, the feature is harmful, that's why I suggest removing it.

@devnexen
Copy link
Member

Just personal opinion but instead of removing it for 8.3, what if we document the feature (much) better about the possible pros and cons and then revisit the decision whether we remove it or not for 9.0, I think it will be smoother especially user-wise ?

@MaxKellermann
Copy link
Contributor Author

Just personal opinion but instead of removing it for 8.3, what if we document the feature (much) better about the possible pros and cons and then revisit the decision whether we remove it or not for 9.0, I think it will be smoother especially user-wise ?

I don't know the PHP roadmap, I only see the "master" branch but I don't know what it will become and when - but improving documentation is something that is hard to disagree on. And for this feature, the documentation is certainly misleading. I'll try to find where the documentation resides (doc-en?) and submit a PR over there.

If the feature may be removed for 9.0 or whatever version, we can keep this PR open and you merge it whenever you feel like it fits into the roadmap.

@devnexen
Copy link
Member

doc-en indeed I believe.
Anyway, by the time we reach 9.0, old kernels users population will be decreasing significantly enough by then, I think.

@otobio
Copy link

otobio commented Jan 15, 2023

Just personal opinion but instead of removing it for 8.3, what if we document the feature (much) better about the possible pros and cons and then revisit the decision whether we remove it or not for 9.0, I think it will be smoother especially user-wise ?

I don't know the PHP roadmap, I only see the "master" branch but I don't know what it will become and when - but improving documentation is something that is hard to disagree on. And for this feature, the documentation is certainly misleading. I'll try to find where the documentation resides (doc-en?) and submit a PR over there.

If the feature may be removed for 9.0 or whatever version, we can keep this PR open and you merge it whenever you feel like it fits into the roadmap.

@MaxKellermann You are quite so adamant on removing the feature even though it is an opt-in one. I would rather argue that it should be off by default on production ini and improve the documentation. That should do it.

@MaxKellermann
Copy link
Contributor Author

You are quite so adamant on removing the feature even though it is an opt-in one

I believe this is an anti-feature and I am convinced not having this option is better than having it. It has disadvantages, but no advantage.

I evaluated the feature last year - yes, of course, I want a performance improvement! For our web hosting product I was going to squeeze out the very last millisecond of performance (look at my older PRs here, 8.2 has benefited already a lot from my work, some of which I have successfully upstreamed).

Not only did PHP actually become slower with huge_code_pages (5 ms of added startup time, which I care a lot about), there were also lots of crashes which I spent some time debugging, only to be horrified to see how this feature is implemented. I wondered how anybody could ever think adding this kind of fragility and complexity would ever improve anything.

I felt that all the time spent on this "feature" was completely wasted, and I wanted to save others from wasting time with it, therefore I decided to suggest removing it. Here I am.

MaxKellermann added a commit to CM4all/php-src that referenced this pull request Jan 16, 2023
There are only very narrow circumstances under which this option has
been reported to provide 1% performance gain due to reduction of TLB
misses.  In many setups, this option only increases memory usage, and
will actually decrease performance.  To avoid this, let's leave it
disabled by default, and let it be an explicit decision to enable it.

For a discussion, see php#10301
MaxKellermann added a commit to CM4all/php-src that referenced this pull request Feb 7, 2023
There are only very narrow circumstances under which this option has
been reported to provide 1% performance gain due to reduction of TLB
misses.  In many setups, this option only increases memory usage, and
will actually decrease performance.  To avoid this, let's leave it
disabled by default, and let it be an explicit decision to enable it.

For a discussion, see php#10301
@MaxKellermann MaxKellermann force-pushed the no_huge_code_pages branch 2 times, most recently from c6afa5d to 5b7b07d Compare February 7, 2023 16:28
MaxKellermann added a commit to CM4all/php-src that referenced this pull request Feb 7, 2023
There are only very narrow circumstances under which this option has
been reported to provide 1% performance gain due to reduction of TLB
misses.  In many setups, this option only increases memory usage, and
will actually decrease performance.  To avoid this, let's leave it
disabled by default, and let it be an explicit decision to enable it.

For a discussion, see php#10301
MaxKellermann added a commit to CM4all/php-src that referenced this pull request Feb 7, 2023
There are only very narrow circumstances under which this option has
been reported to provide 1% performance gain due to reduction of TLB
misses.  In many setups, this option only increases memory usage, and
will actually decrease performance.  To avoid this, let's leave it
disabled by default, and let it be an explicit decision to enable it.

For a discussion, see php#10301
devnexen pushed a commit that referenced this pull request Feb 7, 2023
There are only very narrow circumstances under which this option has
been reported to provide 1% performance gain due to reduction of TLB
misses.  In many setups, this option only increases memory usage, and
will actually decrease performance.  To avoid this, let's leave it
disabled by default, and let it be an explicit decision to enable it.

For a discussion, see #10301

Closes GH-10336
MaxKellermann added a commit to CM4all/php-src that referenced this pull request Feb 14, 2023
There are only very narrow circumstances under which this option has
been reported to provide 1% performance gain due to reduction of TLB
misses.  In many setups, this option only increases memory usage, and
will actually decrease performance.  To avoid this, let's leave it
disabled by default, and let it be an explicit decision to enable it.

For a discussion, see php#10301

Closes phpGH-10336
This feature is useless at best, but is fragile and increases memory
usage because it creates a copy of the PHP executable in RAM, when
another copy is already in the kernel's page cache.

Additionally, each PHP process has its own copy, multiplying the
memory usage with the number of PHP processes.

Even worse: under memory pressure, the kernel would usually just
discard those code pages, and reload them from disk on demand; but
with `huge_code_pages`, the kernel cannot simply discard those pages,
instead it has to allocate room in the swap partition and then has to
write the pages to the swap partition before being able to discard
them.  This adds disk I/O to an already extremely constrained
situation.

The feature was added 7 years ago in commit 669f0b3, and the
commit message stated that this "provided 2% improvement on real-life
apps, because of 2-3 times reduction in number of iTLB misses", but
did not tell how these results were produced.

I tried, but failed to measure a difference.  None of my tests went
any faster with `huge_code_pages` enabled.

In a second attempt, I tried `perf stat` with events
`itlb_misses.miss_causes_a_walk`, `itlb_misses.stlb_hit`,
`itlb_misses.walk_completed`, `itlb_misses.walk_duration` and could
not see any improvements here either.  The results were unstable, but
nothing suggested it was better with `huge_code_pages`.

Quite contrary, this operation delays PHP startup by 5 ms (measured on
an Intel Xeon E5-2630).

phpinfo() with opcache.huge_code_pages=0:

             17.92 msec task-clock:u                     #    0.985 CPUs utilized
        29,804,809      cycles:u                         #    1.663 GHz
        44,512,668      instructions:u                   #    1.49  insn per cycle

phpinfo() with opcache.huge_code_pages=1:

             23.20 msec task-clock:u                     #    0.988 CPUs utilized
        35,709,593      cycles:u                         #    1.539 GHz
        45,940,779      instructions:u                   #    1.29  insn per cycle


In order to reduce PHP's memory size, reduce the load during PHP
startup, simplify debugging and profiling, I propose removing this
feature.
@MaxKellermann MaxKellermann deleted the no_huge_code_pages branch March 6, 2023 04:21
MaxKellermann added a commit to CM4all/php-src that referenced this pull request Mar 7, 2023
There are only very narrow circumstances under which this option has
been reported to provide 1% performance gain due to reduction of TLB
misses.  In many setups, this option only increases memory usage, and
will actually decrease performance.  To avoid this, let's leave it
disabled by default, and let it be an explicit decision to enable it.

For a discussion, see php#10301

Closes phpGH-10336
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants