[#46930] [ruby-trunk - Bug #6825][Open] forking and pthread_cond_timedwait: Invalid argument (EINVAL) on OS X / 1.9.3-p194 — "xentronium (Mark A)" <markizko@...>

29 messages 2012/08/02

[#46974] [ruby-trunk - Bug #6830][Assigned] test failure test_constants(OpenSSL::TestConfig) [/ruby/test/openssl/test_config.rb:27] on Mac + homebrew — "kosaki (Motohiro KOSAKI)" <kosaki.motohiro@...>

17 messages 2012/08/04

[#46975] [ruby-trunk - Bug #6831][Assigned] test_getpwuid() on Mountain Lion — "kosaki (Motohiro KOSAKI)" <kosaki.motohiro@...>

12 messages 2012/08/04

[#46996] [ruby-trunk - Bug #6836][Assigned] Improve File.expand_path performance in Windows — "luislavena (Luis Lavena)" <luislavena@...>

15 messages 2012/08/04

[#47036] [ruby-trunk - Feature #6841][Open] Shorthand for Assigning Return Value of Method to Self — "wardrop (Tom Wardrop)" <tom@...>

18 messages 2012/08/07

[#47108] [ruby-trunk - Feature #6852][Open] [].transpose should behave specially — "boris_stitnicky (Boris Stitnicky)" <boris@...>

13 messages 2012/08/10

[#47138] [ruby-trunk - Bug #6861][Open] ERB::Util.escape_html is not escaping single quotes — "spastorino (Santiago Pastorino)" <santiago@...>

14 messages 2012/08/12

[#47163] [ruby-trunk - Bug #6865][Open] GC::Profiler.report might create a huge String and invoke a few GC cycles — "Eregon (Benoit Daloze)" <redmine@...>

9 messages 2012/08/13

[#47189] [ruby-trunk - Feature #6868][Open] Make `do` in block syntax optional when the block is the last argument of a method and is not an optional argument — "alexeymuranov (Alexey Muranov)" <redmine@...>

8 messages 2012/08/14

[#47243] [ruby-trunk - Feature #6895][Open] TracePoint API — "ko1 (Koichi Sasada)" <redmine@...>

27 messages 2012/08/20

[#47267] [ruby-trunk - Bug #6903][Open] [[Ruby 1.9:]] --enable-load-relative broken on systems with /lib64 — "mpapis (Michal Papis)" <mpapis@...>

11 messages 2012/08/22

[#47309] [ruby-trunk - Bug #6929][Open] Documentation for Ripper — "zzak (Zachary Scott)" <zachary@...>

16 messages 2012/08/25

[#47345] [ruby-trunk - Feature #6946][Open] FIPS support? — "vo.x (Vit Ondruch)" <v.ondruch@...>

35 messages 2012/08/28

[ruby-core:46987] [ruby-trunk - Bug #6278] in `join': deadlock detected (fatal)

From: "thedarkone (Vit Z)" <thedarkone2@...>
Date: 2012-08-04 13:53:50 UTC
List: ruby-core #46987
Issue #6278 has been updated by thedarkone (Vit Z).

File deadlock_detection_fix.patch added


=begin
I've tracked this down to a spinlock/deadlock-detection race condition.

Here's a much shorter script to reproduce:

    require 'thread'
    
    class SyncHash < Hash
      WRITE_LOCK = Mutex.new
      
      def put_if_absent(key, value)
        unless self[key]
          WRITE_LOCK.synchronize do
            self[key] = value
          end
        end
      end
    end
    
    keys = []
    250_000.times {|i| keys << i}
    thread_count = 20
    10.times do |i|
      h = SyncHash.new
    
      (1..thread_count).map do
        Thread.new do
          my_keys = keys.shuffle
          my_keys.each {|k| h.put_if_absent(k, k)}
        end
      end.map(&:join)
    
      puts "---#{i}---"
    end

Typical run:
    $ ruby deadlock.rb 
    ---0---
    ---1---
    ---2---
    deadlock.rb:21:in `join': deadlock detected (fatal)
    	from deadlock.rb:21:in `map'
    	from deadlock.rb:21:in `block in <main>'
    	from deadlock.rb:18:in `times'
    	from deadlock.rb:18:in `<main>'

What happens is that a thread waiting for a lock times out from (({native_cond_timedwait})), decrements (({mutex->cond_waiting})) to (({0})) and blocks on (({GVL_UNLOCK_END();})) waiting to acquire the GVL. Meanwhile some other thread (while holding GVL) releases the lock ((({mutex->th = 0}))), calls (({rb_check_deadlock})) and declares a deadlock.

I'm attaching a proposed patch, that adds a (({spinlock_waiting_gvl})) flag to the thread: (({deadlock_detection_fix.patch})).
=end

----------------------------------------
Bug #6278: in `join': deadlock detected (fatal)
https://bugs.ruby-lang.org/issues/6278#change-28644

Author: [email protected] (Kurt Miller)
Status: Assigned
Priority: Normal
Assignee: ko1 (Koichi Sasada)
Category: 
Target version: 
ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [i386-openbsd]


There is a race condition between native_sleep(), thread_start_func_2() where thread_start_func_2() holds the gvl lock, then in this bit of code:

514             /* wake up joining threads */
515             join_th = th->join_list_head;
516             while (join_th) {
517                 if (join_th == main_th) errinfo = Qnil;
518                 rb_threadptr_interrupt(join_th);
519                 switch (join_th->status) {
520                   case THREAD_STOPPED: case THREAD_STOPPED_FOREVER:
521                     printf("%p %d\n", join_th, join_th->status);
522                     join_th->status = THREAD_RUNNABLE;
523                   default: break;
524                 }
525                 join_th = join_th->join_list_next;
526             }
527
528             rb_threadptr_unlock_all_locking_mutexes(th);
529             if (th != main_th) rb_check_deadlock(th->vm);

The other thread in native_sleep() is interrupted via the rb_threadptr_interrupt(join_th) call above.
Then the join_th->status is set to THREAD_RUNNABLE above.
The other thread blocks trying to get the gvl lock in native_sleep():

914         GVL_UNLOCK_END();

Then the thread in thread_start_func_2() calls rb_check_deadlock():

529             if (th != main_th) rb_check_deadlock(th->vm);

which thinks the thread in native_sleep is deadlocked since it is not in the correct state.

Here are the backtraces for the two threads, I added abort() in check_deadlock_i() when it decides there is a deadlock for a thread:

(gdb) thread 1
[Switching to thread 1 (process 930810973)]#0  0x0d9b248d in kill () from /usr/lib/libc.so.63.0
(gdb) bt
#0  0x0d9b248d in kill () from /usr/lib/libc.so.63.0
#1  0x0da1c925 in abort () at /usr/src/lib/libc/stdlib/abort.c:68
#2  0x0ddd5095 in check_deadlock_i (key=2081393040, val=695888864, found=0x7d300f38) at thread.c:4738
#3  0x0dd6b9ce in st_foreach (table=0x7c162d20, func=0xddd503f <check_deadlock_i>, arg=2100301624)
    at st.c:787
#4  0x0ddd526a in rb_check_deadlock (vm=0x7c034a00) at thread.c:4785
#5  0x0ddce8d1 in thread_start_func_2 (th=0x7c64be00, stack_start=0x7d301000) at thread.c:529
#6  0x0ddcd51e in thread_start_func_1 (th_ptr=0x7c64be00) at thread_pthread.c:653
#7  0x097a948e in _rthread_start (v=0x7c60f800) at rthread.c:113
#8  0x0d99ae81 in __tfork_thread () from /usr/lib/libc.so.63.0
(gdb) frame 5
#5  0x0ddce8d1 in thread_start_func_2 (th=0x7c64be00, stack_start=0x7d301000) at thread.c:529
529             if (th != main_th) rb_check_deadlock(th->vm);

(gdb) thread 2
[Switching to thread 2 (process 207424605)]#0  0x0d9a9715 in _thread_sys___thrsleep ()
(gdb) bt
#0  0x0d9a9715 in _thread_sys___thrsleep () from /usr/lib/libc.so.63.0
#1  0x097a7190 in pthread_cond_wait (condp=0x7c034a10, mutexp=0x7c034a08) at rthread_sync.c:473
#2  0x0ddcd0b4 in native_cond_wait (cond=0x7c034a10, mutex=0x7c034a08) at thread_pthread.c:304
#3  0x0ddccb15 in gvl_acquire_common (vm=0x7c034a00) at thread_pthread.c:61
#4  0x0ddccb7f in gvl_acquire (vm=0x7c034a00, th=0x7c034200) at thread_pthread.c:79
#5  0x0ddcd848 in native_sleep (th=0x7c034200, timeout_tv=0x0) at thread_pthread.c:914
#6  0x0ddcf2ea in sleep_forever (th=0x7c034200, deadlockable=1) at thread.c:856
#7  0x0ddcef97 in thread_join_sleep (arg=3485240628) at thread.c:689
#8  0x0dcbdda7 in rb_ensure (b_proc=0xddcef52 <thread_join_sleep>, data1=3485240628,
    e_proc=0xddceee6 <remove_from_join_list>, data2=3485240628) at eval.c:744
#9  0x0ddcf096 in thread_join (target_th=0x7c393600, delay=1e+30) at thread.c:722
#10 0x0ddcf191 in thread_join_m (argc=0, argv=0x7c05003c, self=2087355560) at thread.c:803
#11 0x0ddb7125 in call_cfunc (func=0xddcf113 <thread_join_m>, recv=2087355560, len=Variable "len" is not available.
) at vm_insnhelper.c:317
#12 0x0ddbf1bc in vm_call_method (th=0x7c034200, cfp=0x7c0cfef8, num=0, blockptr=0x1, flag=0, id=4736,
    me=0x7c166220, recv=2087355560) at vm_insnhelper.c:404
#13 0x0ddc2a60 in vm_exec_core (th=0x7c034200, initial=Variable "initial" is not available.
) at insns.def:1015
#14 0x0ddc9401 in vm_exec (th=0x7c034200) at vm.c:1220
#15 0x0ddc9bc0 in invoke_block_from_c (th=0x7c034200, block=0x7c0cff90, self=2081428100, argc=1,
    argv=0xcfbc9700, blockptr=0x0, cref=0x0) at vm.c:624
#16 0x0ddca167 in rb_yield (val=2087355560) at vm.c:654
#17 0x0dc8cad2 in rb_ary_each (array=2087355600) at array.c:1478
#18 0x0ddb710c in call_cfunc (func=0xdc8ca80 <rb_ary_each>, recv=2087355600, len=Variable "len" is not available.
) at vm_insnhelper.c:320
#19 0x0ddbf1bc in vm_call_method (th=0x7c034200, cfp=0x7c0cff7c, num=0, blockptr=0x7c0cff91, flag=0,
    id=424, me=0x7c135940, recv=2087355600) at vm_insnhelper.c:404
#20 0x0ddc2a60 in vm_exec_core (th=0x7c034200, initial=Variable "initial" is not available.
) at insns.def:1015
#21 0x0ddc9401 in vm_exec (th=0x7c034200) at vm.c:1220
#22 0x0ddc95b7 in rb_iseq_eval_main (iseqval=2082861760) at vm.c:1461
#23 0x0dcbea29 in ruby_exec_internal (n=0x7c25f2c0) at eval.c:204
#24 0x0dcbea55 in ruby_exec_node (n=0x7c25f2c0) at eval.c:251
#25 0x0dcbfe2e in ruby_run_node (n=0x7c25f2c0) at eval.c:244
#26 0x1c000984 in main (argc=4, argv=0xcfbca128) at main.c:38
(gdb) frame 5
#5  0x0ddcd848 in native_sleep (th=0x7c034200, timeout_tv=0x0) at thread_pthread.c:914
914         GVL_UNLOCK_END();

$ ruby19 -v
ruby 1.9.3p125 (2012-02-16 revision 34643) [i386-openbsd]

To reproduce install ruby-sequel from http://sequel.rubyforge.org/

$ cat cpu_bench.rb
require 'sequel'

def a
  1000.times{|j| Sequel.connect(:adapter=>:mock){}}
end

(0..10).map{Thread.new{a}}.each{|t| t.join}

$ cat doit
until [ $? -ne 0 ]; do
ruby19 -I sequel/lib cpu_bench.rb
done
echo $?



-- 
http://bugs.ruby-lang.org/

In This Thread