From: nobu@...
Date: 2017-12-15T04:31:15+00:00
Subject: [ruby-core:84277] [Ruby trunk Bug#14181] hangs or deadlocks from waitpid, threads, and trapping SIGCHLD

Issue #14181 has been updated by nobu (Nobuyoshi Nakada).

It seems the signal trap causes a thread switch, and then `Process.waitpid` exits.
That means the target thread's status can change during `RUBY_VM_CHECK_INTS_BLOCKING`, but `sleep_forever` doesn't re-check the condition being waited on (here, whether the target thread has been killed) at that moment.

```diff
diff --git a/thread.c b/thread.c
index baa50ea388..cc62ea3905 100644
--- a/thread.c
+++ b/thread.c
@@ -883,7 +883,13 @@ thread_join_sleep(VALUE arg)
 
     while (target_th->status != THREAD_KILLED) {
         if (forever) {
-            sleep_forever(th, TRUE, FALSE);
+            th->status = THREAD_STOPPED_FOREVER;
+            th->vm->sleeper++;
+            rb_check_deadlock(th->vm);
+            native_sleep(th, 0);
+            th->vm->sleeper--;
+            RUBY_VM_CHECK_INTS_BLOCKING(th->ec);
+            th->status = THREAD_RUNNABLE;
         }
         else {
             double now = timeofday();
```

----------------------------------------
Bug #14181: hangs or deadlocks from waitpid, threads, and trapping SIGCHLD
https://bugs.ruby-lang.org/issues/14181#change-68431

* Author: ccutrer (Cody Cutrer)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux-gnu]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
I'm not exactly sure what's going on here, but the end result is that a thread gets killed unexpectedly during a `waitpid` call while SIGCHLD is being handled. In a more complex scenario, we end up hanging because `Thread#join` ends up waiting on a thread that's already dead (presumably because it died in a non-standard way); in a simpler scenario, the output is:

```
loop 250
loop 251
/usr/lib/ruby/2.4.0/timeout.rb:97:in `join': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x00000000019205e0 main thread:0x00000000019205e0
* #
   rb_thread_t:0x00000000019205e0 native:0x00007f900a082700 int:0
   /usr/lib/ruby/2.4.0/timeout.rb:97:in `join'
   /usr/lib/ruby/2.4.0/timeout.rb:97:in `ensure in block in timeout'
   /usr/lib/ruby/2.4.0/timeout.rb:97:in `block in timeout'
   /usr/lib/ruby/2.4.0/timeout.rb:33:in `block in catch'
   /usr/lib/ruby/2.4.0/timeout.rb:33:in `catch'
   /usr/lib/ruby/2.4.0/timeout.rb:33:in `catch'
   /usr/lib/ruby/2.4.0/timeout.rb:108:in `timeout'
   ./test.rb:11:in `<main>'
	from /usr/lib/ruby/2.4.0/timeout.rb:97:in `ensure in block in timeout'
	from /usr/lib/ruby/2.4.0/timeout.rb:97:in `block in timeout'
	from /usr/lib/ruby/2.4.0/timeout.rb:33:in `block in catch'
	from /usr/lib/ruby/2.4.0/timeout.rb:33:in `catch'
	from /usr/lib/ruby/2.4.0/timeout.rb:33:in `catch'
	from /usr/lib/ruby/2.4.0/timeout.rb:108:in `timeout'
	from ./test.rb:11:in `<main>'
```

The simpler repro, where I'm obviously not doing anything I shouldn't be doing in the signal handler:

```
#!/usr/bin/env ruby
require 'timeout'

trap(:CHLD) { }

x = 0
while true
  puts "loop #{x += 1}"
  pid = Process.spawn('sleep 1')
  Timeout.timeout(30) do
    Process.waitpid(pid)
  end
end
```

A slightly more complex repro, where I'm still fairly sure that what I'm doing in the signal handler is okay, but which ends up hanging:

```
#!/usr/bin/env ruby
require 'timeout'

self_pipe = IO.pipe
signal_queue = []

def wake_up(self_pipe)
  self_pipe[1].write_nonblock('.', exception: false)
end

trap(:CHLD) { signal_queue << :CHLD; wake_up(self_pipe) }

signal_processor = Thread.new do
  loop do
    self_pipe[0].read(1)
    signal_queue.pop
  end
end

x = 0
while true
  puts "loop #{x += 1}"
  pid = Process.spawn('sleep 1')
  Timeout.timeout(30) do
    Process.waitpid(pid)
  end
end
```

In either case, it can take many iterations, up to a few hundred, before it fails. I've reproduced it on both Ubuntu Xenial and macOS 10.12.6 (the former with Ruby 2.4.2, the latter with Ruby 2.4.1).
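In both repros the failure surfaces in `Thread#join` inside `timeout.rb`, i.e. in the helper thread that `Timeout.timeout` creates and later joins. As a possible workaround on affected versions (untested against this exact bug, and it sidesteps rather than fixes the race nobu describes), the wait can be bounded without any helper thread by polling `Process.waitpid2` with `Process::WNOHANG` against a monotonic-clock deadline. `wait_with_deadline` and its `interval` parameter are hypothetical names for this sketch, not Ruby APIs; the 10 ms polling interval is an arbitrary assumption.

```ruby
#!/usr/bin/env ruby
# Sketch of a hypothetical workaround: reap a child with a deadline by
# polling Process.waitpid2 with WNOHANG, so no Timeout helper thread is
# created and Thread#join is never called.

trap(:CHLD) { } # same no-op trap as in the simpler repro

# wait_with_deadline is a hypothetical helper, not part of Ruby.
# Returns the child's Process::Status, or nil if `seconds` elapse first.
# On nil the child is still running and unreaped; the caller decides
# whether to kill it.
def wait_with_deadline(pid, seconds, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  loop do
    # With WNOHANG, waitpid2 returns nil right away if the child is still running,
    # otherwise [pid, status].
    reaped = Process.waitpid2(pid, Process::WNOHANG)
    return reaped[1] if reaped
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval # arbitrary polling interval (assumption)
  end
end

if __FILE__ == $0
  3.times do |i|
    puts "loop #{i + 1}"
    pid = Process.spawn('sleep 1')
    status = wait_with_deadline(pid, 30)
    raise "timed out waiting for #{pid}" unless status
  end
end
```

Polling costs up to one `interval` of extra latency per wait, in exchange for not depending on `Thread#join` at all; the trap is kept only to mirror the repro's conditions.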