Skip to content

php-fpm: fix Solaris port events.mechanism #6913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

psumbera
Copy link
Contributor

Bug #65800
Fix by: [email protected]

@nikic nikic requested a review from bukka April 26, 2021 08:59
ev = (struct fpm_event_s *)events[i].portev_user;

/* TODO: Escalate an error? */
fpm_event_port_add(ev);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part still needed? I remember that this might have not been port specific and possibly might be addressed by this commit 0ed6c37 from @nikic if I understand the description of the patch (comment from Rainer in the bug report). There has been some other changes to signal handling from @plmnikulin that could possibly resolve this.

@psumbera can you try it without this change and confirm if it's still needed?

@psumbera
Copy link
Contributor Author

psumbera commented May 4, 2021

I can confirm that it works fine even without the second part of the original patch.

@bukka
Copy link
Member

bukka commented May 8, 2021

@psumbera Looks good. Please can you squash it to a single commit (push force it to this PR) and if you think it should go to PHP-7.4 (it means if you tested that in 7.4), then please change base branch of this PR to PHP-7.4. I will merge it then.

@psumbera psumbera force-pushed the solaris-event-ports branch from 98ec621 to bc9e791 Compare May 10, 2021 08:57
@psumbera psumbera changed the base branch from master to PHP-7.4 May 10, 2021 09:06
@krakjoe
Copy link
Member

krakjoe commented May 11, 2021

@psumbera something has gone wrong with the rebase

@psumbera psumbera changed the base branch from PHP-7.4 to master May 11, 2021 06:17
@psumbera
Copy link
Contributor Author

Sorry, I don't really know how to change base branch to PHP-7.4 without making mess. Leaving it now at master branch.

@krakjoe
Copy link
Member

krakjoe commented May 11, 2021

Merged as 04078a5

@krakjoe krakjoe closed this May 11, 2021
@psumbera
Copy link
Contributor Author

@krakjoe not sure how this happened. But there is missing ';' after 'int ret'. Can you please have look at it?

@krakjoe
Copy link
Member

krakjoe commented May 11, 2021

@psumbera that was quickly followed up in ef3e0ee, that was my mistake in rebasing, sorry about the noise.

@bukka
Copy link
Member

bukka commented May 11, 2021

@psumbera has this been actually tested against 7.4? That was the main reason why I didn't merge it before...

@psumbera
Copy link
Contributor Author

Yes I have tested it on 7.4.

@steffen-moser
Copy link

Strangely, it seems that I do encounter the zombie behavior with PHP 7.4.26 on Solaris 11.4 SRU 44 running Nextcloud 22.2.7 as it is described in the third paragraph of Rainer Jung's message from [2015-09-20 08:33 UTC] posted here. The PHP binary I use comes from Oracle and has been built using GCC as described here.

The behavior I see is as follows: After a few days, I get more and more zombie processes of PHP-FPM workers on my system. When doing a "truss", I see a behavior which seems to fit quite well to Rainer Jung's description:

  1. The parent process (PID 22251) sends a SIGQUIT to one of its children (PID 22253):
22251:  port_getn(13, 0x3ADD67DF0, 1, 1, 0x7FE3EB787340) = 62 [0]
22251:  getpid()                                        = 22251 [21545]
22251:  kill(22253, SIGQUIT)                            = 0
22251:  getpid()                                        = 22251 [21545]
22253:      Received signal #3, SIGQUIT, in fcntl() [caught]
22253:        siginfo: SIGQUIT pid=22251 uid=0 SI_USER
22251:  getpid()                                        = 22251 [21545]
22253:  fcntl(9, F_SETLKW, 0x7FE3EB787330)              Err#4 EINTR
22253:  lwp_sigmask(SIG_SETMASK, 0xF83FF047, 0xFFFFFFF7, 0x000000FF, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
22253:  close(9)                                        = 0
22253:  so_socket(PF_UNIX, SOCK_STREAM, 0, NULL, SOV_XPG4_2) = 6
22253:  setcontext(0x7FE3EB785CC0)
22253:  munmap(0x7FD847E00000, 127889)                  = 0
22253:  munmap(0x7FD847F20000, 4816)                    = 0
22253:  munmap(0x7FD847C00000, 102353)                  = 0
22253:  munmap(0x7FD847D19000, 3676)                    = 0
22253:  munmap(0x7FD847A00000, 30369)                   = 0
[...]
  1. After a lot of munmaps, the child process sends a SIGCLD CHILD_EXITED to the parent:
[...]
22253:  munmap(0x7FD856600000, 2097152)                 = 0
22253:  close(3)                                        = 0
22253:  openat(AT_FDCWD, "/dev/dtrace/helper", O_RDWR)  = 3
22253:  ioctl(3, DTRACEHIOC_REMOVE, 0x00000000)         = 0
22253:  close(3)                                        = 0
22253:  issetugid()                                     = 0
22253:  _exit(0)
22251:      Received signal #18, SIGCLD, in port_getn() [caught]
22251:        siginfo: SIGCLD CLD_EXITED pid=22253 status=0x0000
22251:  port_getn(13, 0x3ADD67DF0, 1, 1, 0x7FE3EB787340) Err#4 EINTR
22251:  lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0xFFFFFFF7, 0x000000FF, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
22251:  getpid()                                        = 22251 [21545]
[...]
  1. The parent sends the "C":
[...]
22251:  write(11, " C", 1)                              = 1
22251:  setcontext(0x7FE3EB786010)
22251:  getpid()                                        = 22251 [21545]
[...]
  1. The child never dies, gets a zombie and the parent resends the SIGQUIT every few seconds forever:
[...]
22251:  port_getn(13, 0x3ADD67DF0, 1, 1, 0x7FE3EB787340) = 62 [0]
22251:  getpid()                                        = 22251 [21545]
22251:  kill(22253, SIGQUIT)                            = 0
22251:  getpid()                                        = 22251 [21545]
22251:  getpid()                                        = 22251 [21545]
[...]

While the server is running for a few days, I collect more and more zombies. I do see this on at least three non-global Solaris zones. Unfortunately, I cannot say, when the problem has started to occur. Due to a security problem in FPM a few months ago, I switched back from PHP-FPM to Apache module usage. In the former times (when I was using FPM), I never realized the problem.

@psumbera
Copy link
Contributor Author

I can confirm that it works fine even without the second part of the original patch.

I was wrong. Call to fpm_event_port_add() is really needed.

#9960

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants