Bug 1470 - adjust Linux out-of-memory killer to stop sshd being killed
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd
Version: -current
Hardware: All Linux
Importance: P2 enhancement
Assignee: Assigned to nobody
URL: http://bugs.debian.org/cgi-bin/bugrep...
Keywords:
Depends on:
Blocks: V_5_4
Reported: 2008-05-26 08:31 AEST by Colin Watson
Modified: 2023-01-13 13:16 AEDT

Attachments
adjust Linux out-of-memory killer (4.81 KB, patch)
2008-05-26 08:31 AEST, Colin Watson
no flags
Revised patch for Linux OOM killer (3.80 KB, patch)
2009-10-28 11:28 AEDT, Iain Morgan
no flags
openssh-linux-oom_kill.patch (4.62 KB, text/plain)
2009-12-07 12:04 AEDT, Darren Tucker
no flags
openssh-linux-oom_kill.patch (4.78 KB, patch)
2009-12-07 12:37 AEDT, Darren Tucker
djm: ok-
openssh-linux-oom_kill.patch (4.71 KB, text/plain)
2009-12-07 17:07 AEDT, Darren Tucker
djm: ok+

Description Colin Watson 2008-05-26 08:31:50 AEST
Created attachment 1507 [details]
adjust Linux out-of-memory killer

In some out-of-memory situations, the Linux kernel will look for a process to kill, employing some heuristics to try to guess what will help. It doesn't always get this right and can occasionally end up killing innocent bystanders (though as noted in the referenced bug log it's possible to tweak this to be more accurate).

It is useful to instruct the OOM killer never to kill sshd, since almost everyone wants it to keep on running so that they have a chance of dealing with the problem remotely. Originally I implemented this in an init script, by getting sshd's pid and writing to /proc/$pid/oom_adj, but Vaclav Ovsik pointed out in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480020 that this ends up immortalising child processes too.

The attached patch is based on Vaclav's, though I tidied it up and moved chunks of it into openbsd-compat/port-linux. The use of an environment variable for configuration is a bit odd. I didn't feel good about introducing a port-specific configuration file key, and the values you write into oom_adj have a pretty bizarre syntax (documented in http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/proc.txt;hb=HEAD) which I think is unlikely to be portable to other systems. I'd appreciate any better ideas here.
Comment 1 Damien Miller 2009-01-22 10:38:15 AEDT
some feedback on the diff:

Isn't /proc/self/oom_adj just an integer? I don't see any bizarre syntax in the referenced document, but the available docs are seriously deficient... If it is just an integer then why not just read and write it using stdio?

I don't think that it should be controlled by an environment variable - sshd should just set the "never kill me" flag for the master process unconditionally.

Failure to open() /proc/self/oom_adj shouldn't trigger a logit() - this will just spam older Linux systems that lack the control or don't have /proc mounted.

The port-linux.c code is incorrect: it doesn't handle EINTR errors on read()/write() - it should use atomicio() if it can't use stdio.
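
For illustration only, here is a minimal sketch of the stdio approach suggested above, assuming the file holds a single integer. The helper names read_int_file() and write_int_file() are hypothetical and are not taken from any of the attached patches; the path is a parameter so the helpers can be tried on an ordinary file instead of /proc/self/oom_adj.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): read one integer from path.
 * Returns 0 on success, -1 on failure.  It deliberately logs nothing when
 * the file cannot be opened, since the control may simply not exist.
 */
int
read_int_file(const char *path, int *out)
{
	FILE *fp;
	int ok;

	if ((fp = fopen(path, "r")) == NULL)
		return -1;
	ok = (fscanf(fp, "%d", out) == 1);
	fclose(fp);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper (not from the patch): write one integer to path.
 * The result of fclose() is checked as well, because a buffered write
 * may only fail when the stream is flushed.
 */
int
write_int_file(const char *path, int value)
{
	FILE *fp;
	int ok;

	if ((fp = fopen(path, "w")) == NULL)
		return -1;
	ok = (fprintf(fp, "%d\n", value) > 0);
	if (fclose(fp) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would pass "/proc/self/oom_adj" as the path and treat a -1 return as "feature unavailable" rather than as an error worth logging loudly.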
Comment 2 Scott Emery 2009-10-10 10:25:48 AEDT
Why is the oom_adj value being passed in as an environment variable?
I would have expected it to be a flag in /etc/ssh/sshd_config. Is it
bad form to have OS specific sshd_config flags?
Comment 3 Iain Morgan 2009-10-28 11:28:14 AEDT
Created attachment 1712 [details]
Revised patch for Linux OOM killer

Updated the previous patch based on Damien's feedback in Comment #1.
Limited testing indicates that the patch works. The one oddity is that
the message logged by verbose() when restoring the original oom_adj
value shows up three times in /var/log/syslog.
Comment 4 Darren Tucker 2009-12-07 12:02:36 AEDT
Add to list for 5.4
Comment 5 Darren Tucker 2009-12-07 12:04:46 AEDT
Created attachment 1740 [details]
openssh-linux-oom_kill.patch

Use the platform_* hooks to avoid sprinkling more #ifdefs into the main code.
Move the saved value to port-linux.c.  Add LINUX_ to the define since it is
Linux-specific.
Comment 6 Darren Tucker 2009-12-07 12:37:17 AEDT
Created attachment 1741 [details]
openssh-linux-oom_kill.patch

Create a platform_pre_listen hook and use that for the oom adjust.
Comment 7 Darren Tucker 2009-12-07 17:07:37 AEDT
Created attachment 1742 [details]
openssh-linux-oom_kill.patch

Don't try to restore a value that we did not save.
Comment 8 Darren Tucker 2009-12-07 17:14:16 AEDT
(In reply to comment #3)
> The one oddity is that
> the message logged by verbose() when restoring the original oom_adj
> value shows up three times in /var/log/syslog.

I think I can explain that:

#1: Despite what's implied by the message, oom_adjust_setup() actually logs the saved value, not the set value:

+	verbose("Set %s to %d", 
+	   OOM_ADJ_PATH, oom_adj_save);

#2: oom_adjust_setup() gets called a second time when sshd re-execs itself to randomize its address space.

#3: the real call to oom_adjust_restore()
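
A rough sketch of the save/set/restore pattern discussed here, written for illustration and not copied from the attachment. All sketch_* names are hypothetical, the path is a parameter so it can be exercised on a regular file, and INT_MIN is assumed as the "nothing saved" sentinel. The setup message names both the saved and the new value so the two cannot be confused, and restore refuses to write a value that was never saved.

```c
#include <limits.h>
#include <stdio.h>

#define SKETCH_OOM_NOKILL	-17	/* "never kill" value discussed in this bug */

/* Assumed sentinel: INT_MIN means no original value has been saved. */
static int sketch_saved = INT_MIN;

/* Read a single integer from path; 0 on success, -1 on failure. */
static int
sketch_read(const char *path, int *out)
{
	FILE *fp;
	int ok;

	if ((fp = fopen(path, "r")) == NULL)
		return -1;
	ok = (fscanf(fp, "%d", out) == 1);
	fclose(fp);
	return ok ? 0 : -1;
}

/* Write a single integer to path; 0 on success, -1 on failure. */
static int
sketch_write(const char *path, int value)
{
	FILE *fp;
	int ok;

	if ((fp = fopen(path, "w")) == NULL)
		return -1;
	ok = (fprintf(fp, "%d\n", value) > 0);
	if (fclose(fp) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Save the current value, then switch to the "never kill" value. */
int
sketch_oom_setup(const char *path)
{
	int cur;

	if (sketch_read(path, &cur) != 0)
		return -1;
	if (sketch_write(path, SKETCH_OOM_NOKILL) != 0)
		return -1;
	sketch_saved = cur;
	fprintf(stderr, "Set %s from %d to %d\n", path, cur, SKETCH_OOM_NOKILL);
	return 0;
}

/* Restore the saved value; do nothing if setup never saved one. */
int
sketch_oom_restore(const char *path)
{
	if (sketch_saved == INT_MIN)
		return -1;
	if (sketch_write(path, sketch_saved) != 0)
		return -1;
	fprintf(stderr, "Set %s to %d\n", path, sketch_saved);
	sketch_saved = INT_MIN;
	return 0;
}
```

In this sketch the saved value lives in process memory, so it would not survive a re-exec; a second setup call after re-exec would read back the value written by the first one.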
Comment 9 Damien Miller 2009-12-08 11:09:20 AEDT
Comment on attachment 1742 [details]
openssh-linux-oom_kill.patch

>Index: openbsd-compat/port-linux.c
>===================================================================
>RCS file: /var/cvs/openssh/openbsd-compat/port-linux.c,v
>retrieving revision 1.6
>diff -u -p -r1.6 port-linux.c
>--- openbsd-compat/port-linux.c	24 Oct 2009 04:04:13 -0000	1.6
>+++ openbsd-compat/port-linux.c	7 Dec 2009 06:06:11 -0000
>@@ -27,8 +27,15 @@
> #include <stdarg.h>
> #include <string.h>
> 
>-#ifdef WITH_SELINUX
>+#if defined(LINUX_OOM_ADJUST) || defined(WITH_SELINUX)
> #include "log.h"
>+#endif
>+
>+#ifdef LINUX_OOM_ADJUST
>+#include <stdio.h>
>+#endif
>+

I wouldn't bother slicing and dicing the header inclusion based on preprocessor symbols. There is little cost to including them unconditionally, or perhaps conditionally on the union of all supported symbols for this file.

>+#ifdef LINUX_OOM_ADJUST
>+#define OOM_ADJ_PATH	"/proc/self/oom_adj"
>+#define OOM_ADJ_NOKILL	-17  /* magic value to disable OOM killer */

FYI, -17 is documented in Documentation/filesystems/proc.txt in the Linux source. A stable URL for this is http://lxr.linux.no/#linux+v2.6.32/Documentation/filesystems/proc.txt if you want to include it.
Comment 10 Darren Tucker 2009-12-08 13:41:31 AEDT
(In reply to comment #9)
> perhaps conditionally on the union of all supported
> symbols for this file.

Done.

> FYI, -17 is documented in Documentation/filesystems/proc.txt in the
> Linux source. A stable URL for this is
> http://lxr.linux.no/#linux+v2.6.32/Documentation/filesystems/proc.txt
> if you want to include it.

I know, I read it :-).  Reference added.

Thanks all, the patch has been applied and will be in 5.4p1.
Comment 11 Colin Watson 2010-02-28 07:49:34 AEDT
The patch as applied has one flaw that I can see.  Apparently some virtualisation containers (vserver/OpenVZ) don't allow processes to write to /proc/self/oom_adj, and will return an error code if they try.  It would be a shame for sshd to unconditionally log an error on such systems; I think this was probably the main benefit of having it controlled by an environment variable, so that they could turn this feature off.

How about just lowering errors from writing to /proc/self/oom_adj to debug1(), rather than logit()?
Comment 12 Darren Tucker 2010-03-01 15:53:27 AEDT
(In reply to comment #11)
> The patch as applied has one flaw that I can see.  Apparently some
> virtualisation containers (vserver/OpenVZ) don't allow processes to
> write to /proc/self/oom_adj, and will return an error code if they try.
>  It would be a shame for sshd to unconditionally log an error on such
> systems; I think this was probably the main benefit of having it
> controlled by an environment variable, so that they could turn this
> feature off.
> 
> How about just lowering errors from writing to /proc/self/oom_adj to
> debug1(), rather than logit()?

I've lowered them to verbose(), same as the other calls.
Comment 13 Colin Watson 2010-03-01 20:56:31 AEDT
Thanks, that should do the job.
Comment 14 Darren Tucker 2010-03-26 10:51:49 AEDT
With the release of 5.4p1, this bug is now considered closed.