Skip to content

Problem with installing stackprof with native extensions #2044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matadcze opened this issue Jul 7, 2020 · 15 comments
Closed

Problem with installing stackprof with native extensions #2044

matadcze opened this issue Jul 7, 2020 · 15 comments

Comments

@matadcze
Copy link

matadcze commented Jul 7, 2020

Hi there!

I am playing a lot with truffle-ruby last time and during the installation of the gem, I have encountered a problem.

I have in the Gemfile

gem 'rbtrace'
gem 'stackprof'
gem 'flamegraph'
Installing stackprof 0.2.15 with native extensions
Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

current directory:
/gems/truffleruby/20.1.0/gems/stackprof-0.2.15/ext/stackprof
/opt/graalvm-ce-java11-20.1.0/languages/ruby/bin/truffleruby -I
/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/mri -r
./siteconf20200707-1-1x4ur8g.rb extconf.rb
checking for rb_postponed_job_register_one()... no
extconf.rb:8:in `<main>': missing API: are you using ruby 2.1+? (RuntimeError)
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
	--with-opt-dir
	--without-opt-dir
	--with-opt-include
	--without-opt-include=${opt-dir}/include
	--with-opt-lib
	--without-opt-lib=${opt-dir}/lib
	--with-make-prog
	--without-make-prog
	--srcdir=.
	--curdir
	--ruby=/opt/graalvm-ce-java11-20.1.0/languages/ruby/bin/truffleruby

Contents of mkmf.log:
have_func: checking for rb_postponed_job_register_one()... --------------------
no

"/opt/graalvm-ce-java11-20.1.0/languages/llvm/native/bin/graalvm-native-clang -o
conftest -I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include
-I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include/ruby/backward
-I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include -I.
-Wimplicit-function-declaration -Wno-int-conversion -Wno-int-to-pointer-cast
-Wno-incompatible-pointer-types -Wno-format-invalid-specifier
-Wno-format-extra-args -ferror-limit=500 conftest.c  -L.
-L/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib  -rpath
/opt/graalvm-ce-java11-20.1.0/languages/llvm/native/lib
-L/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext -rpath
/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext -ltruffleruby
-lpolyglot-mock   "
checked program was:
/* begin */
1: #include "ruby.h"
2:
3: int main(int argc, char **argv)
4: {
5:   return 0;
6: }
/* end */

conftest.c:14:57: error: use of undeclared identifier
'rb_postponed_job_register_one'
int t(void) { void ((*volatile p)()); p = (void
((*)()))rb_postponed_job_register_one; return !p; }
                                                        ^
1 error generated.
"/opt/graalvm-ce-java11-20.1.0/languages/llvm/native/bin/graalvm-native-clang -o
conftest -I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include
-I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include/ruby/backward
-I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include -I.
-Wimplicit-function-declaration -Wno-int-conversion -Wno-int-to-pointer-cast
-Wno-incompatible-pointer-types -Wno-format-invalid-specifier
-Wno-format-extra-args -ferror-limit=500 conftest.c  -L.
-L/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib  -rpath
/opt/graalvm-ce-java11-20.1.0/languages/llvm/native/lib
-L/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext -rpath
/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext -ltruffleruby
-lpolyglot-mock   "
Process failed: #<Process::Status: pid 4351 exit 1>
checked program was:
/* begin */
 1: #include "ruby.h"
 2:
 3: /*top*/
 4: extern int t(void);
 5: int main(int argc, char **argv)
 6: {
 7:   if (argc > 1000000) {
 8:     int (* volatile tp)(void)=(int (*)(void))&t;
 9:     printf("%d", (*tp)());
10:   }
11:
12:   return 0;
13: }
14: int t(void) { void ((*volatile p)()); p = (void
((*)()))rb_postponed_job_register_one; return !p; }
/* end */

ld.lld: error: undefined symbol: rb_postponed_job_register_one
>>> referenced by conftest.c:15
>>>               lto.tmp:(t)
clang-9: error: linker command failed with exit code 1 (use -v to see
invocation)
"/opt/graalvm-ce-java11-20.1.0/languages/llvm/native/bin/graalvm-native-clang -o
conftest -I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include
-I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include/ruby/backward
-I/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext/include -I.
-Wimplicit-function-declaration -Wno-int-conversion -Wno-int-to-pointer-cast
-Wno-incompatible-pointer-types -Wno-format-invalid-specifier
-Wno-format-extra-args -ferror-limit=500 conftest.c  -L.
-L/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib  -rpath
/opt/graalvm-ce-java11-20.1.0/languages/llvm/native/lib
-L/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext -rpath
/opt/graalvm-ce-java11-20.1.0/languages/ruby/lib/cext -ltruffleruby
-lpolyglot-mock   "
Process failed: #<Process::Status: pid 4355 exit 1>
checked program was:
/* begin */
 1: #include "ruby.h"
 2:
 3: /*top*/
 4: extern int t(void);
 5: int main(int argc, char **argv)
 6: {
 7:   if (argc > 1000000) {
 8:     int (* volatile tp)(void)=(int (*)(void))&t;
 9:     printf("%d", (*tp)());
10:   }
11:
12:   return 0;
13: }
14: extern void rb_postponed_job_register_one();
15: int t(void) { rb_postponed_job_register_one(); return 0; }
/* end */

--------------------


To see why this extension failed to compile, please check the mkmf.log which can
be found here:

/gems/truffleruby/20.1.0/extensions/x86_64-linux/20.1.0/stackprof-0.2.15/mkmf.log

extconf failed, exit code 1

Gem files will remain installed in
/gems/truffleruby/20.1.0/gems/stackprof-0.2.15 for inspection.
Results logged to
/gems/truffleruby/20.1.0/extensions/x86_64-linux/20.1.0/stackprof-0.2.15/gem_make.out

An error occurred while installing stackprof (0.2.15), and Bundler cannot
continue.
Make sure that `gem install stackprof -v '0.2.15' --source
'https://rubygems.org/'` succeeds before bundling.

In Gemfile:
  stackprof
ERROR: Service 'web' failed to build: The command '/bin/sh -c bundle install' returned a non-zero code: 5

I use the docker container with this Dockerfile. From stackprof#sampling about rb_postponed_job_register_one

signal handlers enqueue a sampling job using rb_postponed_job_register_one. this ensures callstack samples can be taken safely, in case the VM is garbage collecting or in some other inconsistent state during the interruption.

I think that it can be an interesting problem to solve. I would like to fix this issue. Is anybody interested in solving this problem with me?

@eregon eregon added the cexts label Jul 7, 2020
@eregon
Copy link
Member

eregon commented Jul 7, 2020

Related: #1937

First, it's worth noting TruffleRuby already has a builtin CPU sampler via --cpusampler.
See https://github.com/oracle/graal/blob/master/truffle/docs/Profiling.md
and https://github.com/oracle/truffleruby/blob/master/doc/user/tools.md#cpu-sampler
Is there a specific reason you want to use stackprof or just want to use some profiler (and anything missing in CPUSampler)?

I'm not sure what's a good way forward for stackprof.

  • Do nothing, it seems unlikely stackprof can work on TruffleRuby, it relies quite a bit on MRI's stack sampling functionality and AFAIK no other Ruby implementation supports it. Workaround for users is gem 'stackprof', platforms: :mri.
  • Make the gem install, but actually it would fail at runtime. Not sure it's helpful. Maybe print a message when installing that --cpusampler can be used instead?
  • Make the gem work without changing the gem (or not much), seems hard and seems very inefficient to collect stackframes via rb_profile_frames(). At least it wouldn't be allocation-free. We would need to wrap the Java data structures in interop objects to mimic the C API, and run quite a bit of the C extension code just to sample. I'm not sure what rb_postponed_job_register_one does. Sounds like a way to register some task after the signal handler returns?
  • Change the gem to use different APIs on TruffleRuby. For context, we have the CPUSampler tool which provides efficient sampling in Truffle. There is CPUSampler#takeSample. It could be a Ruby API, I don't see the point of using a C API here as it would just add overhead. And it could be a regular Ruby Thread or Ruby signal handler (which would need to run on its own thread for efficiency) that regularly captures the stacktraces (there is no GIL, so not an issue). Or we could even let CPUSampler capture every millisecond for some duration and then get all the data out at once, that would be a lot more efficient and avoid affecting (and potentially bias) the application with the extra logic to capture at a frequent interval. StackProf.run(mode: :cpu, out: 'tmp/stackprof-cpu-myapp.dump') could be supported by just dumping the existing CPUSampler JSON output there, and then add a way to interpret that JSON when reporting.

I'm not sure how valuable it is to get StackProf.run(mode: :cpu, out: 'tmp/stackprof-cpu-myapp.dump') working since --cpusampler on the command line can achieve similar things, yet without even needing to change the application code.
To have more control from when to when the profiler runs? Or to use StackProf's reporting? Something else?

@boris-spas FYI

@matadcze
Copy link
Author

matadcze commented Jul 18, 2020

Is there a specific reason you want to use stackprof or just want to use some profiler (and anything missing in CPUSampler)?

There is not a specific reason. I just wanted to experiment with some profiler and try to speed up some specs only for experimental development.

I was thinking about the issue, and I need to say that you are right. I hadn't known about --cpusampler before. I checked Profiling and there is even FlameGraph, so everything that developer can dream.

@eregon
Copy link
Member

eregon commented Jul 20, 2020

Thanks for the reply.
Maybe we should find a way to improve the discoverability of --cpusampler?

One thought could be to print a message when stackprof is installed.
As a variant maybe we could change stackprof to print such a message at runtime when run on TruffleRuby, since it doesn't seem likely supporting StackProf's current way of profiling on TruffleRuby is worth it in terms of overhead.

@matadcze
Copy link
Author

As a variant maybe we could change stackprof to print such a message at runtime when run on TruffleRuby, since it doesn't seem likely supporting StackProf's current way of profiling on TruffleRuby is worth it in terms of overhead.

I think that it is a good idea to show some information about profiling from TruffleRuby. Let's create some PR to stackprof.

@matadcze
Copy link
Author

matadcze commented Aug 1, 2020

I have some issues with --cpusampler and profiling in general. I run code on truffleruby 20.1.0, like ruby 2.6.5, GraalVM EE Native [x86_64-darwin with this testing code

require 'chunky_png'

a = ChunkyPNG::Image.from_file('a.png')
b = ChunkyPNG::Image.from_file('b.png')
c = a.compose(b, 0, 0)
c.save('c.png')

by invoking
ruby test.rb --cpusampler --cpusampler.Output=json > output.json.
I have downloaded some example png files from the Internet. It takes forever on TruffleRuby, and I do not see any results. On MRI, after two seconds, I can see two merged photos but without profiling, of course.

I followed https://www.graalvm.org/docs/tools/profiler for Javascript without problems, but for Ruby, it was very problematic.
Maybe it can be a problem with memory? I am not sure.

@eregon
Copy link
Member

eregon commented Aug 1, 2020

but for Ruby, it was very problematic.

Any issue, except that this specific program took too much memory?

See https://github.com/oracle/truffleruby/blob/master/doc/user/debugging.md to find out what TruffleRuby is doing if it appears to hang.

You can limit memory with e.g. --vm.Xmx2g (by default Native Image will use 80% of the memory as max heap).
So on this example, I see:
(Images used in https://gist.github.com/eregon/149116794343a15bbf606df27c54a82d )

$ ruby --cpusampler --vm.Xmx2g bench_chunky.rb
[ruby] WARNING OutOfMemoryError
/home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:406:in `setbyte': failed to allocate memory (NoMemoryError)
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:406:in `block in encode_png_str_scanline_up'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:404:in `downto'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:404:in `encode_png_str_scanline_up'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:228:in `block in encode_png_image_pass_to_stream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:225:in `downto'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:225:in `encode_png_image_pass_to_stream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:174:in `encode_png_image_without_interlacing'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:161:in `encode_png_pixelstream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:85:in `to_datastream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/image.rb:62:in `to_datastream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:35:in `write'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:43:in `block in save'
	from <internal:core> core/io.rb:729:in `open'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:43:in `save'
	from bench_chunky.rb:6:in `<main>'
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 2895 samples with period 1ms.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  Opt %: Percent of time spent in compiled and therefore non-interpreted code.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Thread: Thread[main,5,main]
 Name                                                                                |      Total Time     |  Opt % ||       Self Time     |  Opt % | Location             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up                  |       1340ms  46.3% |  25.6% ||       1340ms  46.3% |  25.6% | ../../../.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb~404:20111-20145 
 ChunkyPNG::Canvas::PNGDecoding#decode_png_pixels_from_scanline_truecolor_alpha_8bit |        428ms  14.8% |   0.0% ||        428ms  14.8% |   0.0% | ../../../.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_decoding.rb~252-254:12560-12716 
 <main>                                                                              |       2895ms 100.0% |   0.0% ||        359ms  12.4% |   0.0% | bench_chunky.rb~1-6:0-140 
 <top (required)>                                                                    |        195ms   6.7% |   0.0% ||        110ms   3.8% |   0.0% | ../../../.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png.rb~1-179:0-5528 

and to get more details:

$ ruby --vm.Xmx2g --experimental-options --cpusampler --cpusampler.Mode=roots --cpusampler.SampleInternal bench_chunky.rb
[ruby] WARNING OutOfMemoryError
/home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:406:in `setbyte': failed to allocate memory (NoMemoryError)
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:406:in `block in encode_png_str_scanline_up'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:404:in `downto'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:404:in `encode_png_str_scanline_up'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:228:in `block in encode_png_image_pass_to_stream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:225:in `downto'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:225:in `encode_png_image_pass_to_stream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:174:in `encode_png_image_without_interlacing'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:161:in `encode_png_pixelstream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:85:in `to_datastream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/image.rb:62:in `to_datastream'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:35:in `write'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:43:in `block in save'
	from <internal:core> core/io.rb:729:in `open'
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_encoding.rb:43:in `save'
	from bench_chunky.rb:6:in `<main>'
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 3510 samples with period 1ms.
  Self Time: Time spent on the top of the stack.
  Total Time: Time spent somewhere on the stack.
  Opt %: Percent of time spent in compiled and therefore non-interpreted code.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Thread: Thread[main,5,main]
 Name                                                                                |      Total Time     |  Opt % ||       Self Time     |  Opt % | Location             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 String#setbyte                                                                      |       1083ms  30.9% |  40.7% ||       1083ms  30.9% |  40.7% | (core)~1:0 
 String#setbyte                                                                      |        685ms  19.5% |   7.2% ||        685ms  19.5% |   7.2% | (core)~1:0 
 String#unpack                                                                       |        212ms   6.0% |   0.0% ||        212ms   6.0% |   0.0% | (core)~1:0 
 String#unpack                                                                       |        205ms   5.8% |   0.0% ||        205ms   5.8% |   0.0% | (core)~1:0 
 String#end_with?                                                                    |        110ms   3.1% |   0.0% ||         78ms   2.2% |   0.0% | resource:/truffleruby/core/string.rb~500-514:13169-13651 
 Array#pack                                                                          |         43ms   1.2% |   0.0% ||         43ms   1.2% |   0.0% | (core)~1:0 
 Array#each                                                                          |         83ms   2.4% |   0.0% ||         40ms   1.1% |   0.0% | (core)~1:0 
 Range#each                                                                          |        184ms   5.2% |   0.0% ||         39ms   1.1% |   0.0% | (core)~1:0 
 Kernel#gem_original_require                                                         |        740ms  21.1% |   0.0% ||         38ms   1.1% |   0.0% | resource:/truffleruby/core/kernel.rb~226-238:7091-7471 
 block (2 levels) in ChunkyPNG::Canvas::Operations#compose!                          |        145ms   4.1% |  11.0% ||         37ms   1.1% |  16.2% | ../../../.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/operations.rb~58:2331-2367 
 block in ChunkyPNG::Canvas::PNGDecoding#decode_png_str_scanline_up                  |         35ms   1.0% |  80.0% ||         33ms   0.9% |  81.8% | ../../../.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/canvas/png_decoding.rb~453:24009-24042 
 ChunkyPNG::Color#compose_quick                                                      |         57ms   1.6% |  17.5% ||         29ms   0.8% |   0.0% | ../../../.rubies/truffleruby-dev/lib/gems/gems/chunky_png-1.3.11/lib/chunky_png/color.rb~358-368:14300-14740 

Which sounds like we have hit a pathological case in setbyte that consumes way too much memory.

Using Ctrl+\ while it runs seems to show we spend a lot of time in:

  SP 00007ffcf5e86d60 IP 000055fae1e63bca  [image code] com.oracle.svm.core.genscavenge.ThreadLocalAllocation.allocateNewArray(ThreadLocalAllocation.java:241)
  SP 00007ffcf5e86da0 IP 000055fae1e646f8  [image code] com.oracle.svm.core.genscavenge.ThreadLocalAllocation.slowPathNewArrayWithoutAllocating(ThreadLocalAllocation.java:208)
  SP 00007ffcf5e86db0 IP 000055fae1e644d5  [image code] com.oracle.svm.core.genscavenge.ThreadLocalAllocation.slowPathNewArray(ThreadLocalAllocation.java:198)
  SP 00007ffcf5e86de0 IP 000055fae3e05036  [image code] org.truffleruby.core.rope.RopeOperations.flattenBytes(RopeOperations.java:298)
  SP 00007ffcf5e86e60 IP 000055fae3dd693c  [image code] org.truffleruby.core.rope.Rope.getBytesSlow(Rope.java:69)
  SP 00007ffcf5e86e70 IP 000055fae3e0438c  [image code] org.truffleruby.core.rope.ManagedRope.getBytes(ManagedRope.java:45)
  SP 00007ffcf5e86e70 IP 000055fae3e0438c  [image code] org.truffleruby.core.rope.RopeOperations.extractRange(RopeOperations.java:281)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.rope.RopeNodes$MakeSubstringRopeNode.makeSubstringNon7Bit(RopeNodes.java:253)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.rope.RopeNodesFactory$MakeSubstringRopeNodeGen.executeMake(RopeNodesFactory.java:429)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.rope.RopeNodes$SubstringNode.substringConcatRope(RopeNodes.java:201)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.rope.RopeNodesFactory$SubstringNodeGen.executeSubstring(RopeNodesFactory.java:140)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.rope.RopeNodes$SetByteNode.setByte(RopeNodes.java:1214)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.rope.RopeNodesFactory$SetByteNodeGen.executeSetByte(RopeNodesFactory.java:1869)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.string.StringNodes$SetByteNode.setByte(StringNodes.java:2149)
  SP 00007ffcf5e86ea0 IP 00007f98ba5428e9  [block in ChunkyPNG::Canvas::PNGEncoding#encode_png_str_scanline_up] org.truffleruby.core.string.StringNodesFactory$SetByteNodeFactory$SetByteNodeGen.execute(StringNodesFactory.java:5320)

@eregon
Copy link
Member

eregon commented Aug 1, 2020

This is a BINARY String on which setbyte is called, and that creates two SubstringRopes, which eagerly flatten every time to compute the CodeRange of each subrope (CR_7BIT or CR_VALID since it's binary, and the underlying contains non-7-bit bytes).

Maybe we can treat BINARY Strings specially, e.g., by computing the CodeRange lazily for them (like for NativeRope).
Or find a better way to compute the CodeRange without flattening in such a case.
Or have a buffer-like Rope that is mutable, much like NativeRope but still stored in managed memory.
Or let setbyte always return a NativeRope.

@matadcze
Copy link
Author

matadcze commented Aug 3, 2020

Maybe we can treat BINARY Strings specially, e.g., by computing the CodeRange lazily for them (like for NativeRope).
Or find a better way to compute the CodeRange without flattening in such a case.
Or have a buffer-like Rope that is mutable, much like NativeRope but still stored in managed memory.
Or let setbyte always return a NativeRope.

Thanks for your response. Your propositions sound like a good journey. GraalVM and its environments are pretty new for me, but I would like to dig deeper and learn.
Can you recommend me something about what to start?
Solution below

Or let setbyte always return a NativeRope.

sound like the easiest way of solving this problem, but yeah, exploring others can be interesting by itself.

@eregon
Copy link
Member

eregon commented Aug 4, 2020

Of these alternatives, I think:

Maybe we can treat BINARY Strings specially, e.g., by computing the CodeRange lazily for them (like for NativeRope),

Is one of the most general solutions and probably a good idea. Computing the CodeRange eagerly for BINARY seems wasteful, unless it's later converted to a non-BINARY String (but we could just compute it then). But since the CodeRange is currently encoded in the Rope class (e.g., AsciiOnlyLeafRope) it would need some non-trivial changes (probably a single LeafRope class?). It also begs the question whether we should make the CodeRange lazy for all Ropes when we don't immediately know and it's not just a literal.

Or find a better way to compute the CodeRange without flattening in such a case.

Seems hard for general cases. It seems there is no efficient way to compute the CodeRange for a substring of any String which is not CR_7BIT but also contains bytes >= 128 like here.

Or have a buffer-like Rope that is mutable, much like NativeRope but still stored in managed memory.

Could be valuable (also for other encodings) but probably high effort.

Or let setbyte always return a NativeRope.

The easiest to get started with.
Seems useful to experiment and see if this program/benchmark is better with just that or needs other performance fixes.
Using native memory has some disadvantages (e.g., slower allocation, harder to debug) but it's probably fine for now.

cc @nirvdrum who wrote a lot of the Rope code, any opinion?

@mc2dx let setbyte always return a NativeRope sounds like a good change to start if you would like to contribute.
Note there is StringToNativeNode which might be helpful here.
Please see https://github.com/oracle/truffleruby/blob/master/CONTRIBUTING.md to build TruffleRuby, etc and don't hesitate to ask questions on the GraalVM/TruffleRuby Slack.

@eregon
Copy link
Member

eregon commented Aug 14, 2020

I did not realize but the ChunkyPNG example is from https://github.com/oracle/truffleruby/blob/master/doc/user/tools.md
Probably we should use another example for profiling.

@matadcze
Copy link
Author

@eregon Indeed, I have used the example from this document. I will try to prepare some sets of illustrations for profiling.

@eregon
Copy link
Member

eregon commented Jun 4, 2021

I filed an issue to discuss on the stackprof repo: tmm1/stackprof#159
I think short term it would be nice to let the gem install, and fail at runtime with a clear message.

@eregon
Copy link
Member

eregon commented Aug 22, 2022

tmm1/stackprof#184 is merged upstream, so stackprof >= 0.2.21 installs fine on TruffleRuby.
At runtime, if StackProf.* is used it raises an error saying to use CPUSampler instead.

@eregon eregon closed this as completed Aug 22, 2022
@eregon eregon self-assigned this Aug 22, 2022
@eregon
Copy link
Member

eregon commented Aug 22, 2022

It works with the just-released stackprof 0.2.21:

gem i stackprof
Fetching stackprof-0.2.21.gem
Building native extensions. This could take a while...
Successfully installed stackprof-0.2.21
1 gem installed
gem i stackprof  11.01s user 0.41s system 369% cpu 3.088 total
$ ruby -rstackprof -e 'p :OK'
:OK
$ ruby -rstackprof -e 'p :OK; StackProf.run{}'
:OK
/home/eregon/.rubies/truffleruby-dev/lib/gems/gems/stackprof-0.2.21/lib/stackprof/truffleruby.rb:33:in `unimplemented':
Use --cpusampler=flamegraph or --cpusampler instead of StackProf on TruffleRuby. (RuntimeError)
See https://www.graalvm.org/tools/profiling/ and `ruby --help:cpusampler` for more details.
	from /home/eregon/.rubies/truffleruby-dev/lib/gems/gems/stackprof-0.2.21/lib/stackprof/truffleruby.rb:9:in `run'
	from -e:1:in `<main>'

@eregon
Copy link
Member

eregon commented Aug 22, 2022

Regarding the chunky_png benchmark mentioned in #2044 (comment), that's also solved since TruffleString was merged. Because we now use a MutableTruffleString on the first setbyte and then later we can just mutate the MutableTruffleString's bytes without creating new ConcatRopes/SubstringRopes.

CRuby: one-shot: 0.713s peak: 1.557i/s
TruffleRuby Native master: one-shot: 1.855s peak: 2.205i/s
TruffleRuby 22.2: had the issue, takes > 2min for one-shot.

So TruffleRuby is now faster in peak for that benchmark than CRuby.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants