From: Serge Balyuk Date: 2011-04-26T03:13:43+09:00 Subject: [ruby-core:35891] [Ruby 1.9 - Bug #4612][Open] Segmentation fault in fiber GC mark cycle Issue #4612 has been reported by Serge Balyuk. ---------------------------------------- Bug #4612: Segmentation fault in fiber GC mark cycle http://redmine.ruby-lang.org/issues/4612 Author: Serge Balyuk Status: Open Priority: Normal Assignee: Category: Target version: ruby -v: ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-darwin10.7.0] ((|Fiber.current|)) can cause segfault on GC cycle when used in threads. Please find attached ruby sample which should help to pinpoint the problem. The coredump shows the following backtrace: .... #18 #19 0x000000010004cce5 in mark_locations_array (objspace=0x100838000, x=0x101dc8ab0, n=649) at gc.c:1315 #20 0x000000010004cea6 in gc_mark_locations (objspace=0x100838000, start=0x101dc8ab0, end=0x101dc9f00) at gc.c:1331 #21 0x000000010004f3dc in rb_gc_mark_machine_stack (th=0x1008a1048) at gc.c:2235 #22 0x0000000100177a4f in rb_thread_mark (ptr=0x1008a1048) at vm.c:1683 #23 0x000000010018179a in cont_mark (ptr=0x1008a1000) at cont.c:88 #24 0x0000000100181947 in fiber_mark (ptr=0x1008a1000) at cont.c:168 #25 0x000000010004dbb9 in gc_mark_children (objspace=0x100838000, ptr=4303907200, lev=1) at gc.c:1719 #26 0x000000010004d3fb in gc_mark (objspace=0x100838000, ptr=4303907200, lev=0) at gc.c:1514 #27 0x000000010004d428 in rb_gc_mark (ptr=4303907200) at gc.c:1520 #28 0x000000010017797f in rb_thread_mark (ptr=0x10049fa00) at vm.c:1673 .... It looks like in frame 28 ((|rb_thread_mark|)) is called on terminated thread, which is ok, but then it goes down to fiber's continuation member ((|saved_thread|)) in frame 22. I think saved_thread holds an earlier snapshot (still running) of the same thread that we see in 28 (thread_id are equal), and because its machine stack pointers are stale, ((|rb_gc_mark_machine_stack|)) starts marking inaccessible memory. One possible quick-fix is to set machine stack pointers to 0 in ((|cont_init()|)), given the original thread will take care of that stuff and free as needed. It cures segfaults, but I wonder if that doesn't break some other code. I was also wondering why continuation holds a copy of thread struct instead of a pointer to it. It's hard to correctly follow real thread life cycle with ((|saved_thread|)). So can it harm in other cases like the above? BTW, while following the code around fibers and continuations, I've found another curious thing: ((|fiber_free()|)) calls ((|cont_free(&fib->cont)|)) on its cont member, and ((|cont_free()|)) calls ((|ruby_xfree()|)) on it. Is that ok, given cont was allocated as part of fiber, and don't we need to ((|ruby_xfree|)) the ((|fib|)) itself? The attached patch is made against ruby_1_9_2 branch; trunk seems to have the segfault behavior too. -- http://redmine.ruby-lang.org