From: ko1@...
Date: 2017-07-13T04:29:50+00:00
Subject: [ruby-core:82023] [Ruby trunk Feature#13434] better method definition in C API

Issue #13434 has been updated by ko1 (Koichi Sasada).


I wrote the following hastily, so sorry if there are English grammar problems.

normalperson (Eric Wong) wrote:
>  Sorry, I wasn't sure what you wanted the last time this came up.
>  I guess it was around https://bugs.ruby-lang.org/issues/11339
>  Particularly:
>  [ruby-core:69990] https://public-inbox.org/ruby-core/55A72930.40305@atdot.net/
>  
>  I remember not liking Ricsin syntax, but maybe a Ruby API can
>  be better than Ricsin...

Yes, I understand.

>  Yes, that would be great.  However, we will take into account
>  fork and CoW savings.

This table would probably be read-only, so there should be no CoW issue.

>  Should I try to implement your table idea?  Or did you already
>  start?  You are more familiar with this, but maybe I can try...

Sure. But first we need to decide on a strategy for this issue.
Note that lazy table loading applies to both of the following approaches, S1 and S2.

(Strategy-1: S1) Define a table in C and use it.

This is a very straightforward approach. Define a table:

```
struct method_define_table_entry {
  const char *method_name;
  ID method_id;            /* we can collaborate with id.c for built-in methods. */
                           /* if we can't (e.g. extension libraries), method_name is used */
  VALUE (*func)(ANYARGS);  /* the C function implementing the method */
  int arity;
  unsigned int type;       /* maybe a union of method type bits: visibility, and more */
};
```
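A minimal, self-contained sketch of how S1 could look from the extension side. To keep it compilable without `ruby.h`, `VALUE` is mocked as `uintptr_t`, and the hypothetical `toy_define_method()` and `toy_lookup()` stand in for `rb_define_method()` and the VM's method lookup; none of these names are existing MRI APIs. The point is only the shape: a `const` table that is walked once at init time and never written afterwards, which is what keeps it CoW-friendly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t VALUE;   /* mock of Ruby's VALUE, for illustration only */
typedef VALUE (*method_func)(int argc, const VALUE *argv, VALUE self);

/* Hypothetical S1 entry; a real one would also carry an ID and visibility bits. */
struct method_define_table_entry {
    const char *method_name;
    method_func func;
    int arity;             /* arity as it would be passed to rb_define_method */
};

static VALUE str_to_s(int argc, const VALUE *argv, VALUE self)
{
    (void)argc; (void)argv;
    return self;           /* String#to_s returns the receiver */
}

static VALUE str_size(int argc, const VALUE *argv, VALUE self)
{
    (void)argc; (void)argv; (void)self;
    return 3;              /* pretend result */
}

/* const: the table lives in read-only memory and is never written after fork. */
static const struct method_define_table_entry string_methods[] = {
    { "to_s", str_to_s, 0 },
    { "size", str_size, 0 },
};

/* Toy method table standing in for the class's real method table. */
#define MAX_METHODS 16
static const struct method_define_table_entry *defined[MAX_METHODS];
static size_t n_defined;

static void toy_define_method(const struct method_define_table_entry *e)
{
    assert(n_defined < MAX_METHODS);
    defined[n_defined++] = e;
}

/* Walk the table once, e.g. from an Init_ function. */
static void define_methods_from_table(const struct method_define_table_entry *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_define_method(&tbl[i]);
}

static const struct method_define_table_entry *toy_lookup(const char *name)
{
    for (size_t i = 0; i < n_defined; i++)
        if (strcmp(defined[i]->method_name, name) == 0)
            return defined[i];
    return NULL;
}
```

A real version would look methods up by `ID` (ideally a precomputed one from id.c, as the struct comment suggests) instead of comparing names with `strcmp()`.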

This is not a big jump from the current implementation, but it does not bring much benefit either.

(Strategy-2: S2) Use ISeq binaries for C methods as well.

To use the keyword (and rest) argument optimizations of ISeqs in C methods, we need an ISeq wrapper: we can wrap C methods with ISeqs. In other words, C methods are implemented as normal ISeq-type methods that invoke the C functions with a new instruction (or `opt_call_c_function`). The compiled ISeqs can be dumped in binary form, and MRI can load them.

We can aggregate all of the binaries into one, so the method table only needs to know the index of each iseq.

```
struct method_define_table_entry {
  long iseq_entry_index; /* because the iseq knows its name. */
                         /* however, if we want to encourage pre-defined IDs,
                            then we can add an ID to it */
};
```

Of course, we don't even need a struct; a plain array is enough.
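The S2 table, combined with lazy loading, can be sketched as a toy model. Assumptions: `iseq_blob` stands in for the aggregated ISeq binary, and `fake_decode()` (which just counts instruction names) stands in for MRI's real binary loader; all names here are hypothetical. The method table is only an array of indices, and each iseq is decoded on its first call and then cached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock of one dumped ISeq inside the aggregated binary; the iseq knows its own name. */
struct dumped_iseq {
    const char *name;
    const char *payload;   /* stands in for serialized instructions */
};

/* Mock of a loaded (decoded) ISeq. */
struct loaded_iseq {
    const char *name;
    size_t insn_count;     /* pretend decoding result */
};

/* All iseqs aggregated into one read-only blob. */
static const struct dumped_iseq iseq_blob[] = {
    { "to_s", "leave" },
    { "gsub", "getlocal getlocal opt_call_c_function leave" },
};
#define N_ISEQS (sizeof(iseq_blob) / sizeof(iseq_blob[0]))

/* S2 method table: just indices into the blob, no struct needed. */
static const long method_table[] = { 1, 0 };
#define N_METHODS (sizeof(method_table) / sizeof(method_table[0]))

static struct loaded_iseq cache[N_ISEQS];
static int loaded[N_ISEQS];
static int load_count;     /* how many iseqs were actually decoded */

/* Count space-separated words as a stand-in for real ISeq decoding. */
static size_t fake_decode(const char *payload)
{
    size_t n = 0;
    for (const char *p = payload; *p; p++)
        if (*p != ' ' && (p == payload || p[-1] == ' '))
            n++;
    return n;
}

/* Lazy loading: decode an iseq only on its first use, then reuse the cached copy. */
static const struct loaded_iseq *iseq_for_method(size_t method_slot)
{
    assert(method_slot < N_METHODS);
    long idx = method_table[method_slot];
    assert(idx >= 0 && (size_t)idx < N_ISEQS);
    if (!loaded[idx]) {
        cache[idx].name = iseq_blob[idx].name;
        cache[idx].insn_count = fake_decode(iseq_blob[idx].payload);
        loaded[idx] = 1;
        load_count++;
    }
    return &cache[idx];
}
```

Only `cache`, `loaded` and `load_count` are written at run time; the blob and the index array stay read-only, so they can remain shared between forked processes.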

This approach has further advantages:

* we can declare method parameters just like in normal Ruby methods.
* we can use Ruby's exception handling; `rb_protect()` and friends are difficult to use (and slow).
* (spec change) we can show definition locations (like lib/built-in/string.rb) in backtraces. We need to discuss whether this is preferable.
* we can clean up most of the C-func-related code because all methods will be unified as iseqs. For example, we can use the same trace point probes.

BTW, I introduced `VM_FRAME_FLAG_CFRAME` last year to prepare for this approach.

Issues with this approach:

* An ISeq call is slower than a C call for several reasons. I believe we can overcome this.
* The current ISeq binary dumper is not space-efficient (dumped iseqs are huge because I don't use any compression techniques). Of course we can improve it, but we need to keep loading time in mind.

>  In addition to improved kwarg handling for C methods, my other
>  goal is to be able to mark read-only/use-once/const/etc. args to
>  avoid unnecessary allocations at runtime.  This will be more
>  flexible than current optimizations (opt_aref_with, opt_aset_with, etc).

Sure. That has also been a long-term goal of mine (building a knowledge database of C-implemented behavior). While chatting with nobu, we considered several notations:

```
class String

  # @pure func <- comment notation
  def to_s
    C.attr :pure # <- method notation; it doesn't affect run-time behavior.
    self
  end

  # rep: ... <- comment notation
  def gsub(pat, rep = nil)
    C.attr(rep: %i(const dont_escape)) # <- method notation
    C.call(:str_gsub, pat, rep)
    # or, branching on the arguments (*1):
    #   if rep
    #     C.call(:str_gsub_with_repl, pat, rep)
    #   else
    #     C.call(:str_gsub_with_block, pat)
    #   end
  end
end
```

*1: For performance, we may need to introduce a special form to branch on arguments, but it should be used only in performance-critical cases, such as `Array#[]`.
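To make the "knowledge database" idea concrete, here is a C sketch of one possible way the VM might record attributes such as `:pure`, `:const` and `:dont_escape` and query them at a call site. The bit encoding, `attr_db`, and `literal_arg_needs_no_copy()` are invented for illustration. The rule encoded is an assumption: a string literal can be passed without allocating a fresh copy only if the callee neither modifies it nor retains it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute bits mirroring C.attr :pure and %i(const dont_escape). */
enum {
    ATTR_PURE        = 1u << 0,  /* method: no side effects */
    ATTR_CONST       = 1u << 1,  /* argument: never modified by the callee */
    ATTR_DONT_ESCAPE = 1u << 2   /* argument: not retained after the call returns */
};

#define MAX_ARGS 4

struct method_attrs {
    const char *name;
    unsigned method_attr;
    unsigned arg_attr[MAX_ARGS];
};

/* A tiny "knowledge database" of C-implemented behavior. */
static const struct method_attrs attr_db[] = {
    { "String#to_s", ATTR_PURE, { 0 } },
    { "String#gsub", 0, { 0, ATTR_CONST | ATTR_DONT_ESCAPE } },  /* pat, rep */
};

static const struct method_attrs *find_attrs(const char *name)
{
    for (size_t i = 0; i < sizeof(attr_db) / sizeof(attr_db[0]); i++)
        if (strcmp(attr_db[i].name, name) == 0)
            return &attr_db[i];
    return NULL;
}

/* Unknown methods or unannotated arguments get no optimization. */
static int literal_arg_needs_no_copy(const char *method, int argi)
{
    const struct method_attrs *a = find_attrs(method);
    unsigned need = ATTR_CONST | ATTR_DONT_ESCAPE;
    if (a == NULL || argi < 0 || argi >= MAX_ARGS)
        return 0;
    return (a->arg_attr[argi] & need) == need;
}
```

Defaulting to "no optimization" for anything missing from the database keeps unannotated (e.g. third-party) methods on the current, safe path.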


----------------------------------------
Feature #13434: better method definition in C API
https://bugs.ruby-lang.org/issues/13434#change-65752

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Current ways to define and parse arguments in the Ruby C API are clumsy,
slow, and impede potential optimizations.

The current C API for defining (rb_define_{singleton_,}method)
and parsing (rb_scan_args, rb_get_kwargs) is orthogonal but inefficient.

rb_get_kwargs creates garbage which pure Ruby kwarg methods do not.
[Feature #11339] was an ugly workaround to use Ruby wrapper methods
for IO#*nonblock methods to avoid garbage from rb_get_kwargs.

Furthermore, it should be possible to annotate args for C functions as
"read-only, use-once" or similar.  In other words, it should be possible to
implement my idea from [ruby-core:80626] where method lookup can be done
out-of-order in some cases, and allow optimizations such as replacing
"putstring" insns with garbage-free "putobject" insns for constant strings
without introducing backwards incompatibility for Rubyists.

We can also get rid of the limited basic op redefinition checks and
implement more generic versions of opt_aref_with / opt_aset_with
for more functions that can take frozen string args.

The "read-only, use-once" annotation can even make it safe for
dynamic strings to be immediately recycled to reduce garbage.

So we could annotate "puts" and IO#write in a way that causes the VM to
immediately recycle its argument if it's a dynamically-generated string:

	puts "#{dynamic} #{string(:here)}"
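A toy model of the "use-once" recycling idea, assuming the VM knows at the call site that the argument is a temporary created only for this call (as with the interpolated string above). `malloc()`/`free()` stand in for object allocation and GC, and `toy_write()` / `call_write_use_once()` are hypothetical names, not MRI functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy string object; "dynamic" marks strings built at runtime (e.g. by interpolation). */
struct toy_str {
    char *ptr;
    int dynamic;
};

static int recycled;   /* number of strings released right after a call */

/* Builds the equivalent of "#{a} #{b}". */
static struct toy_str *toy_str_dynamic(const char *a, const char *b)
{
    struct toy_str *s = malloc(sizeof(*s));
    size_t len = strlen(a) + 1 + strlen(b) + 1;
    assert(s != NULL);
    s->ptr = malloc(len);
    assert(s->ptr != NULL);
    snprintf(s->ptr, len, "%s %s", a, b);
    s->dynamic = 1;
    return s;
}

/* Stand-in for puts/IO#write: reads its argument and keeps no reference to it. */
static size_t toy_write(const struct toy_str *s)
{
    return strlen(s->ptr);
}

/* Call site for a parameter annotated "use once": a dynamic argument created only
 * for this call is unreachable once the callee returns, so free it without waiting for GC. */
static size_t call_write_use_once(struct toy_str *arg)
{
    size_t r = toy_write(arg);
    if (arg->dynamic) {
        free(arg->ptr);
        free(arg);
        recycled++;
    }
    return r;
}
```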

I am not good at API design, so I'm not sure what it should look like.

Perhaps sendmsg_nonblock may be implemented like:

```
struct rb_method_info {
    /* to be filled in by rb_def_method ... */
};

static VALUE
sendmsg_nonblock(struct rb_method_info *info, int argc, VALUE *argv, VALUE self)
{
    VALUE mesg, flags, dest_sockaddr, control, exception;

    rb_get_args(info, argc, argv,
                &mesg, &flags, &dest_sockaddr, &control, &exception);

    ...
}

/*
 * ALLCAPS variable names mean read-only (like "constants" in Ruby)
 * "1" prefix means use only once, eligible for immediate recycling
 * if it is a dynamic string
 */

rb_def_method(rb_cBasicSocket, sendmsg_nonblock,
              "sendmsg_nonblock(1MESG, "
              "1FLAGS = 0, "
              "1DEST_SOCKADDR = nil, "
              "*1CONTROL, exception: true)", -1);

/* rb_hash_aset can be defined as follows,
 * where 0KEY ("0" prefix, not "1") means it is constant and persistent,
 * and "val" (all lower case, no prefix) means it is a normal
 * variable which can persist after the function returns
 */
rb_def_method(rb_cHash, rb_hash_aset, "[0KEY]=val", 2);
```
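The prefix convention in the comments above is simple enough to decode mechanically. Below is a hedged sketch of a parser for a single parameter of the proposed signature string; `parse_param()` and the `P_*` flags are hypothetical, anything after the name (such as `= 0`) is skipped, and keyword arguments are not distinguished from positional ones.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Flags decoded from one parameter of the proposed signature string. */
enum {
    P_READONLY = 1u << 0,  /* ALLCAPS name */
    P_USE_ONCE = 1u << 1,  /* "1" prefix: may be recycled right after the call */
    P_CONSTANT = 1u << 2,  /* "0" prefix: constant and persistent */
    P_REST     = 1u << 3   /* "*" prefix */
};

struct param_info {
    char name[32];
    unsigned flags;
};

/* Parse one parameter such as "*1CONTROL", "1FLAGS = 0" or "val".
 * Returns 0 on success, -1 on an empty or too-long name. */
static int parse_param(const char *tok, struct param_info *out)
{
    size_t n = 0;
    int has_upper = 0, has_lower = 0;

    out->flags = 0;
    while (*tok == ' ')
        tok++;
    if (*tok == '*') { out->flags |= P_REST; tok++; }
    if (*tok == '1') { out->flags |= P_USE_ONCE; tok++; }
    else if (*tok == '0') { out->flags |= P_CONSTANT; tok++; }

    /* Copy the identifier, lower-cased, and note which letter cases it used. */
    for (; *tok && (isalnum((unsigned char)*tok) || *tok == '_'); tok++) {
        if (n + 1 >= sizeof(out->name))
            return -1;
        if (isupper((unsigned char)*tok)) has_upper = 1;
        if (islower((unsigned char)*tok)) has_lower = 1;
        out->name[n++] = (char)tolower((unsigned char)*tok);
    }
    out->name[n] = '\0';
    if (n == 0)
        return -1;
    if (has_upper && !has_lower)
        out->flags |= P_READONLY;
    return 0;
}
```

Splitting the full signature on commas and handling forms like `"[0KEY]=val"` are left out to keep the sketch short.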

Thoughts?

The existing C API must continue to work, so 3rd-party extensions can
migrate to the new API slowly.


-- 
https://bugs.ruby-lang.org/
