Skip to content

Fix GH-9675: Re-adjust run_time_cache init for internal enum methods #10143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

spideyfusion
Copy link
Contributor

After bisecting the commit history in order to track down the cause of the reported issue, I started going through the origin pull request in order to understand the context behind the observed changes.

This discussion prompted me to dig deeper into the run_time_cache mechanism, as it was describing the issue we're currently having in web context (PHP-FPM).

I noticed in the two places where run_time_cache mechanism was introduced for internal functions, the way it was handled was inconsistent.

With the applied fix, we're not experiencing anymore errors in production. We are running a high-traffic PHP application inside a Kubernetes cluster with over 41k RPM.

It would be helpful if @PhilETaylor and @Zenexer could confirm if this fixes the issue for them as well.

@arnaud-lb
Copy link
Member

arnaud-lb commented Dec 21, 2022

Thank you!

This looks good to me. Waiting for additional feedbacks before merging.

@PhilETaylor
Copy link

Im sorry Im unable to participate as I have already left home for Christmas travelling to the USA and back to my island.

@Zenexer
Copy link

Zenexer commented Dec 21, 2022

I'll see if I can get this tested today.

@Zenexer
Copy link

Zenexer commented Dec 26, 2022

I wasn't able to get it compiled and deployed in time for the holidays. The error rate seems to have dropped without any change on my end, so it would be difficult to confirm the validity of the fix anyway.

@spideyfusion
Copy link
Contributor Author

As an additional verification measure, I cherry-picked my commit and put it on top of the php-8.2.0 tag.

Then, I created a release tarball using the scripts/dev/makedist script. With the resulting artifact, I was able to build a variant of the official Docker image using the bashbrew utility.

In order to help with the testing efforts, I uploaded all FPM image variants for the amd64 architecture to Docker Hub:

Of course, anyone who doesn't find compiling/packaging a daunting task can do all of this on their own. 😄

Nothing has changed on my side. Without the fix, there is a flood of errors on the initial application deployment. With the fix, there isn't a single error to be observed.

@Zenexer I should note that the fix eliminates all errors.

@@ -409,8 +409,11 @@ static void zend_enum_register_func(zend_class_entry *ce, zend_known_string_id n
zif->module = EG(current_module);
zif->scope = ce;
zif->T = ZEND_OBSERVER_ENABLED;
ZEND_MAP_PTR_NEW(zif->run_time_cache);
ZEND_MAP_PTR_SET(zif->run_time_cache, zend_arena_alloc(&CG(arena), zend_internal_run_time_cache_reserved_size()));
if (EG(active)) { // at run-time: this ought to only happen if registered with dl() or somehow temporarily at runtime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"somehow" - for enums that's probably quite a regular occurrence :-)

@bwoebi
Copy link
Member

bwoebi commented Jan 4, 2023

Hey, I've been on vacation recently, but looking at the fix this looks quite sensible to me.

@Zenexer
Copy link

Zenexer commented Jan 6, 2023

Starting next week, I'll be testing this with Docker on ARM64 rather than AMD64. Everything is built and ready to go, but I'm about to head on a plane and don't want to be stuck debugging on an unreliable internet connection if I can avoid it.

@gndk
Copy link

gndk commented Jan 7, 2023

Starting next week, I'll be testing this with Docker on ARM64 rather than AMD64. Everything is built and ready to go, but I'm about to head on a plane and don't want to be stuck debugging on an unreliable internet connection if I can avoid it.

@Zenexer I'm seeing those errors in production, too (on AWS ARM64 instances). Pretty reliably on the first request on every deployment, since every deployment is on a completely new instance with fresh Docker container / caches.

I'd like to help out with testing this, but I'm not familiar at all with building php from source, and even less inside a Docker image. Currently installing php from ppa:ondrej/php.

Would you mind sharing the relevant parts of your Dockerfile?

@Zenexer
Copy link

Zenexer commented Jan 10, 2023

I haven't seen this issue since applying the patch. It's hard to definitively say it works since the issue was sporadic, but so far so good.

@gndk I'm using a massive script that applies all sorts of patches, so it's difficult to extract the relevant parts, but here's a rough idea of the process:

  1. Install build-essentials
  2. Run apt-get build-dep and apt-get source on the relevant package
  3. Copy the patch file to the relevant quilt folder--I believe it's debian/patches
  4. Run dpkg-buildpackage -uc
  5. Use apt-get to install and remove the unmodified packages from Ondrej's repos, thereby ensuring you have all the dependencies
  6. Use dpkg to install the deb files generated by dpkg-buildpackage

@bwoebi
Copy link
Member

bwoebi commented Jan 16, 2023

We just got a proper reliable reproducer in DataDog/dd-trace-php#1859.

As such, I can definitely verify that this bugfix works.

@PhilETaylor
Copy link

Thanks for following this through to completion :) Appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants