Bug 2158 - Race condition in receiving SIGTERM
Summary: Race condition in receiving SIGTERM
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd
Version: 6.2p1
Hardware: All Linux
Importance: P5 minor
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_8_7
Reported: 2013-10-09 05:39 AEDT by Ben Maurer
Modified: 2022-02-25 13:59 AEDT

See Also:


Attachments
Mask sigterm and replace select with pselect in server_accept_loop (1.75 KB, patch)
2017-07-28 15:24 AEST, Darren Tucker
use pselect in server_accept_loop and wait_until_can_do_something (8.62 KB, patch)
2021-05-14 15:14 AEST, Darren Tucker
use pselect in server_accept_loop and wait_until_can_do_something (8.83 KB, patch)
2021-05-21 17:07 AEST, Darren Tucker

Description Ben Maurer 2013-10-09 05:39:53 AEDT
To handle SIGTERM, OpenSSH uses this handler:

static void
sigterm_handler(int sig)
{
	received_sigterm = sig;
}

In the select loop, it checks this flag:

ret = select(maxfd+1, fdset, NULL, NULL, NULL);
...
if (received_sigterm) {


select() returns -1 with errno set to EINTR when a signal interrupts it, so in most cases this shuts the process down. However, if sshd is executing anything other than the select() call when the signal arrives (e.g., accepting a new connection), it will not notice the SIGTERM until the next event wakes select().

This race showed up in a large, real-world deployment. The default init script in the openssh package sends SIGTERM to stop the process. On a small fraction of servers the race described here occurred: the new sshd process was launched while the old one was still running, and its bind() to the listening port failed.
Comment 1 Ben Maurer 2013-10-09 05:44:05 AEDT
Some potential strategies that come to mind to prevent this:

(1) Handle the termination in the signal handler. The termination path doesn't appear to use malloc; it just closes the listening ports and removes the PID file, so it could be done inside the signal handler.

(2) Create a pipe which signal handlers can write to in order to wake up the select loop.
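For illustration, strategy (2) could look roughly like the sketch below. It is not taken from the OpenSSH source: notify_pipe, setup_notify_pipe() and wait_for_event() are hypothetical names, and the single listen_fd stands in for sshd's real set of listening sockets.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical self-pipe: [0] is watched by select(), [1] is written by the handler. */
static int notify_pipe[2] = { -1, -1 };
static volatile sig_atomic_t received_sigterm = 0;

static void
sigterm_handler(int sig)
{
	int saved_errno = errno;
	ssize_t n;

	received_sigterm = sig;
	/* write() is async-signal-safe. A full pipe (EAGAIN) is harmless:
	 * it means a wakeup is already queued. */
	n = write(notify_pipe[1], "", 1);
	(void)n;
	errno = saved_errno;
}

/* Create the non-blocking pipe and install the handler. Returns 0 or -1. */
static int
setup_notify_pipe(void)
{
	struct sigaction sa;
	int i, flags;

	if (pipe(notify_pipe) == -1)
		return -1;
	for (i = 0; i < 2; i++) {
		flags = fcntl(notify_pipe[i], F_GETFL);
		if (flags == -1 ||
		    fcntl(notify_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1)
			return -1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigterm_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGTERM, &sa, NULL);
}

/* Returns 1 once SIGTERM was seen, 0 when listen_fd is readable, -1 on error. */
static int
wait_for_event(int listen_fd)
{
	fd_set readset;
	char buf[16];
	int maxfd, ret;

	maxfd = listen_fd > notify_pipe[0] ? listen_fd : notify_pipe[0];
	for (;;) {
		if (received_sigterm)
			return 1;
		FD_ZERO(&readset);
		FD_SET(listen_fd, &readset);
		FD_SET(notify_pipe[0], &readset);
		/* A signal that lands after the flag check above still leaves
		 * a byte in the pipe, so this select() returns at once. */
		ret = select(maxfd + 1, &readset, NULL, NULL, NULL);
		if (ret == -1 && errno != EINTR)
			return -1;
		if (ret > 0 && FD_ISSET(notify_pipe[0], &readset)) {
			/* Drain the wakeup bytes, then re-check the flag. */
			while (read(notify_pipe[0], buf, sizeof(buf)) > 0)
				;
			continue;
		}
		if (ret > 0 && FD_ISSET(listen_fd, &readset))
			return 0;
	}
}
```

Both ends of the pipe are non-blocking so the handler can never stall on a full pipe and the drain loop can never stall on an empty one.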
Comment 2 Damien Miller 2014-08-30 04:38:47 AEST
Retarget incomplete bugs to 6.8 release.
Comment 3 Damien Miller 2014-08-30 04:40:08 AEST
These bugs are no longer targeted at the imminent 6.7 release
Comment 4 Damien Miller 2015-03-03 07:59:42 AEDT
OpenSSH 6.8 is approaching release and closed for major work. Retarget these bugs for the next release.
Comment 5 Damien Miller 2015-03-03 08:01:14 AEDT
Retarget to 6.9
Comment 6 Damien Miller 2015-06-05 13:57:46 AEST
Retarget to 7.0 release, we'll probably add a notification fd
Comment 7 Damien Miller 2015-08-11 22:59:10 AEST
Retarget pending bugs to openssh-7.1
Comment 8 Damien Miller 2016-02-26 14:44:32 AEDT
Retarget to openssh-7.3
Comment 9 Damien Miller 2016-02-26 14:47:22 AEDT
Retarget to openssh-7.3
Comment 10 Damien Miller 2016-07-22 14:10:58 AEST
retarget unfinished bugs to next release
Comment 11 Damien Miller 2016-07-22 14:14:49 AEST
retarget unfinished bugs to next release
Comment 12 Damien Miller 2016-07-22 14:15:41 AEST
retarget unfinished bugs to next release
Comment 13 Damien Miller 2016-07-22 14:17:10 AEST
retarget unfinished bugs to next release
Comment 14 Damien Miller 2016-12-16 14:31:28 AEDT
OpenSSH 7.4 release is closing; punt the bugs to 7.5
Comment 15 Damien Miller 2017-06-30 13:43:15 AEST
Move incomplete bugs to openssh-7.6 target since 7.5 shipped a while back.

To calibrate expectations, there's little chance all of these are going to make 7.6.
Comment 16 Damien Miller 2017-06-30 13:44:23 AEST
remove 7.5 target
Comment 17 Darren Tucker 2017-07-28 15:21:06 AEST
(3) Mask the signals and use pselect() instead of select()?
Comment 18 Darren Tucker 2017-07-28 15:24:45 AEST
Created attachment 3023 [details]
Mask sigterm and replace select with pselect in server_accept_loop
Comment 19 Damien Miller 2018-04-06 13:12:16 AEST
Move to OpenSSH 7.8 tracking bug
Comment 20 Damien Miller 2018-08-10 11:37:59 AEST
Retarget remaining bugs planned for 7.8 release to 7.9
Comment 21 Damien Miller 2018-08-10 11:38:24 AEST
Retarget remaining bugs planned for 7.8 release to 7.9
Comment 22 Damien Miller 2018-10-19 17:13:42 AEDT
Retarget unfinished bugs to OpenSSH 8.0
Comment 23 Damien Miller 2018-10-19 17:14:48 AEDT
Retarget unfinished bugs to OpenSSH 8.0
Comment 24 Damien Miller 2018-10-19 17:15:49 AEDT
Retarget unfinished bugs to OpenSSH 8.0
Comment 25 Damien Miller 2019-04-03 10:10:34 AEDT
Retarget outstanding bugs at next release
Comment 26 Damien Miller 2019-10-09 15:07:25 AEDT
Retarget these bugs to 8.2 release
Comment 27 Damien Miller 2020-02-04 11:44:21 AEDT
Prepare for 8.2 release; retarget bugs
Comment 28 Damien Miller 2020-05-08 13:39:22 AEST
Retarget bugs to 8.4 release
Comment 29 Damien Miller 2021-03-04 09:46:59 AEDT
retarget to 8.6
Comment 30 Damien Miller 2021-04-23 14:50:14 AEST
retarget after 8.6p1 release
Comment 31 Darren Tucker 2021-05-14 15:14:59 AEST
Created attachment 3520 [details]
use pselect in server_accept_loop and wait_until_can_do_something
Comment 32 Darren Tucker 2021-05-21 17:07:11 AEST
Created attachment 3523 [details]
use pselect in server_accept_loop and wait_until_can_do_something

The previous patch had some problems (e.g., it broke SIGINT in the user's shell). This one seems to work OK so far.
Comment 33 Darren Tucker 2021-06-10 20:56:47 AEST
This was committed in https://github.com/openssh/openssh-portable/commit/771f57a8626709f2ad207058efd68fbf30d31553 and will be in the next major release. Thanks for the report.
Comment 34 Damien Miller 2022-02-25 13:59:40 AEDT
closing bugs resolved before openssh-8.9