From: "hanazuki (Kasumi Hanazuki) via ruby-core" <ruby-core@...>
Date: 2024-02-05T04:59:21+00:00
Subject: [ruby-core:116581] [Ruby master Bug#20237] Unable to unshare(CLONE_NEWUSER) in Linux because of timer thread

Issue #20237 has been reported by hanazuki (Kasumi Hanazuki).

----------------------------------------
Bug #20237: Unable to unshare(CLONE_NEWUSER) in Linux because of timer thread
https://bugs.ruby-lang.org/issues/20237

* Author: hanazuki (Kasumi Hanazuki)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.4.0dev (2024-02-04T16:05:02Z master 8bc6fff322) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
## Background

[unshare(2)](https://man7.org/linux/man-pages/man2/unshare.2.html) is a Linux system call that lets the calling process disassociate parts of its execution context, such as namespaces, from other processes. With `unshare(CLONE_NEWUSER)`, a process moves into a new user namespace (see [user_namespaces(7)](https://man7.org/linux/man-pages/man7/user_namespaces.7.html)), where it gains a full set of capabilities over the resources within that namespace. This is fundamental to how Linux containers achieve privilege separation. `unshare(CLONE_NEWUSER)` requires the calling process to be single-threaded (no other threads may be running), so it is often invoked right after `fork(2)`: forking copies only the calling thread into the child process.
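On Linux, a process can approximate this single-threaded precondition from C by counting the entries in `/proc/self/task`, which holds one subdirectory per thread of the calling process. The sketch below assumes `/proc` is mounted; `count_threads` and `is_single_threaded` are hypothetical helper names, not part of Ruby or libc.

```c
#define _GNU_SOURCE 1
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: count the threads of the calling process by listing
 * /proc/self/task, which holds one subdirectory per thread (Linux-specific).
 * Returns -1 if /proc is not mounted or cannot be read. */
static int count_threads(void) {
  DIR *dir = opendir("/proc/self/task");
  if (dir == NULL) return -1;
  int n = 0;
  struct dirent *ent;
  while ((ent = readdir(dir)) != NULL) {
    if (ent->d_name[0] != '.') n++;  /* skip "." and ".." */
  }
  closedir(dir);
  return n;
}

/* Hypothetical helper: nonzero when exactly one thread is visible, i.e. the
 * "caller is not multithreaded" precondition of unshare(CLONE_NEWUSER)
 * appears to hold. */
static int is_single_threaded(void) {
  return count_threads() == 1;
}
```

In the child of a Ruby 3.3 `fork`, a check like this would be expected to see two threads, as the `ps` output below also shows.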

## Problem

This becomes a problem because Ruby 3.3 on Linux runs a timer thread even in an application that uses only a single `Thread`. Because `Kernel#fork` starts this thread in the child process before control returns to user code, Ruby code never gets a chance to call `unshare(CLONE_NEWUSER)` while the child is still single-threaded.

The following snippet reproduces the problem. The program forks, then prints the user namespace of the child process before and after calling unshare(2). It also lists the threads of the child process right after forking.

```ruby
p(RUBY_DESCRIPTION:)
require 'fiddle/import'
module C
  extend Fiddle::Importer
  dlload 'libc.so.6'

  extern 'int unshare(int flags)'
  CLONE_NEWUSER = 0x10000000

  def self.raise_system_call_error
    raise SystemCallError.new(Fiddle.last_error)
  end
end

pid = fork do
  system("ps -O tid -T -p #$$")
  system("ls -l /proc/self/ns/user")

  if C.unshare(C::CLONE_NEWUSER) != 0
    C.raise_system_call_error  # => EINVAL with Ruby 3.3
  end

  system("ls -l /proc/self/ns/user")
end

p Process.wait2(pid)
```

With Ruby 3.2 the program successfully changes the user namespace, but with Ruby 3.3 (and current master) unshare(2) fails with EINVAL. The `ps` output shows that under Ruby 3.3 the child already has two threads running right after forking.

```
% rbenv shell 3.2 && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]"}
    PID     TID S TTY          TIME COMMAND
1585787 1585787 S pts/12   00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
lrwxrwxrwx 1 nobody nogroup 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026532675]'
[1585787, #<Process::Status: pid 1585787 exit 0>]

% rbenv shell 3.3 && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]"}
    PID     TID S TTY          TIME COMMAND
1585849 1585849 S pts/12   00:00:00 ruby ./test.rb
1585849 1585851 S pts/12   00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
./test.rb:10:in `raise_system_call_error': Invalid argument (Errno::EINVAL)
        from ./test.rb:24:in `block in <main>'
        from ./test.rb:19:in `fork'
        from ./test.rb:19:in `<main>'
[1585849, #<Process::Status: pid 1585849 exit 1>]

% rbenv shell master && ruby ./test.rb
{:RUBY_DESCRIPTION=>"ruby 3.4.0dev (2024-02-04T16:05:02Z master 8bc6fff322) [x86_64-linux]"}
    PID     TID S TTY          TIME COMMAND
1585965 1585965 S pts/12   00:00:00 ruby ./test.rb
1585965 1585967 S pts/12   00:00:00 ruby ./test.rb
lrwxrwxrwx 1 kasumi kasumi 0 Feb  5 02:25 /proc/self/ns/user -> 'user:[4026531837]'
./test.rb:10:in `raise_system_call_error': Invalid argument (Errno::EINVAL)
        from ./test.rb:24:in `block in <main>'
        from ./test.rb:19:in `fork'
        from ./test.rb:19:in `<main>'
[1585965, #<Process::Status: pid 1585965 exit 1>]
```

## Workaround

My workaround is to rebuild Ruby with `rb_thread_stop_timer_thread` and `rb_thread_start_timer_thread` exported, and to use a C extension that stops the timer thread before calling `unshare` and restarts it afterwards. This does not seem robust, because the process cannot tell when the kernel has reclaimed the terminated thread; only after that point is the process considered single-threaded.

```c
#define _GNU_SOURCE 1
#include <errno.h>
#include <sched.h>
#include <unistd.h>
#include <ruby/ruby.h>

/* Not declared in the public headers; exported only by the patched Ruby build. */
void rb_thread_stop_timer_thread(void);
void rb_thread_start_timer_thread(void);

static VALUE Unshare_s_unshare(VALUE _self, VALUE rflags) {
  int const flags = NUM2INT(rflags);
  rb_thread_stop_timer_thread();
  usleep(1000);  // FIXME: It takes some time for the kernel to remove the stopped thread?
  int const ret = unshare(flags);
  int const saved_errno = errno;  // restarting the timer thread may clobber errno
  rb_thread_start_timer_thread();
  if (ret != 0) {
    errno = saved_errno;
    rb_sys_fail_str(rb_sprintf("unshare(%#x)", flags));
  }
  return Qnil;
}

RUBY_FUNC_EXPORTED void
Init_unshare(void) {
  VALUE rb_mUnshare = rb_define_module("Unshare");
  rb_define_singleton_method(rb_mUnshare, "unshare", Unshare_s_unshare, 1);
  rb_define_const(rb_mUnshare, "CLONE_NEWUSER", INT2FIX(CLONE_NEWUSER));
}
```
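Instead of sleeping for a fixed 1 ms, the extension could poll until the kernel reports a single thread, with an upper bound on the wait. This is a minimal sketch under that assumption, not a fix for the underlying issue: `count_threads` and `wait_until_single_threaded` are hypothetical helper names, the 100 µs polling step is an arbitrary choice, and the approach still relies on `/proc` being mounted.

```c
#define _GNU_SOURCE 1
#include <dirent.h>
#include <time.h>

/* Hypothetical helper: number of threads in the calling process, counted as
 * the entries of /proc/self/task (one per thread). Returns -1 without /proc. */
static int count_threads(void) {
  DIR *dir = opendir("/proc/self/task");
  if (dir == NULL) return -1;
  int n = 0;
  struct dirent *ent;
  while ((ent = readdir(dir)) != NULL) {
    if (ent->d_name[0] != '.') n++;  /* skip "." and ".." */
  }
  closedir(dir);
  return n;
}

/* Whole milliseconds elapsed between two CLOCK_MONOTONIC readings. */
static long long elapsed_ms(const struct timespec *from, const struct timespec *to) {
  long long ns = (long long)(to->tv_sec - from->tv_sec) * 1000000000LL
               + (to->tv_nsec - from->tv_nsec);
  return ns / 1000000LL;
}

/* Hypothetical helper: poll every 100 microseconds until only one thread is
 * visible. Returns 0 once single-threaded, -1 on timeout or if
 * /proc/self/task cannot be read. */
static int wait_until_single_threaded(long timeout_ms) {
  const struct timespec step = { .tv_sec = 0, .tv_nsec = 100 * 1000 };
  struct timespec start, now;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (;;) {
    int n = count_threads();
    if (n == 1) return 0;
    if (n < 0) return -1;
    clock_gettime(CLOCK_MONOTONIC, &now);
    if (elapsed_ms(&start, &now) >= timeout_ms) return -1;
    nanosleep(&step, NULL);
  }
}
```

In `Unshare_s_unshare`, the `usleep(1000)` call would then become something like `wait_until_single_threaded(100)`; if that returns -1, the extension can restart the timer thread and raise instead of calling `unshare`.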

## Questions

- Is this a limitation of Ruby?
- Is it safe (or even possible) to stop the timer thread during execution?
  - If so, can we export it as a public API?
  - However, that may not be very useful for this problem, as explained in the workaround.
- Is it guaranteed that no other threads are running after `fork`?
- Are there any better ways to solve this issue?
  - Can we somehow delay starting the timer thread after forking, or hook into `fork` to run some code in the child process immediately after it is created?
  - Could these be provided as a Ruby API instead of a C API?



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org