From: knu@...
Date: 2018-10-19T03:30:07+00:00
Subject: [ruby-core:89466] [Ruby trunk Feature#14781] Enumerator#generate

Issue #14781 has been updated by knu (Akinori MUSHA).


zverok (Victor Shepelev) wrote:
> @knu 
> The _ultimate_ goal of my proposal is, in fact, to promote Enumerator as the "Ruby way" of doing all-the-things with loops, not just a "new useful feature".
> 
> That's why I feel really uneasy about your changes to the proposal.

Thanks for your quick feedback, and for bringing up this issue.

> **drop**
> ```ruby
> # from: `drop: 2` is part of Enumerator.from API
> Enumerator.from([node], drop: 2, &:parent).map(&:name)
> # generate: `drop(2)` is part of standard Enumerator API
> Enumerator.generate(node, &:parent).take(6).map(&:name).drop(2)
> ```

I presume `.take(6)` was inserted by mistake, but with or without it, the `map` and `drop` calls that follow belong to Enumerable: they are Array-based operations that each create an intermediate array.  So I consider them part of the Array/Enumerable API rather than the Enumerator API.  Creating intermediate arrays not only wastes memory but also goes against the key concept of Enumerator: dealing with an object as a stream, which may be infinite.

Adding `.lazy` before `.drop(2)` can be a cure, but then what you get is a lazy enumerator, which is not interchangeable with a non-lazy one.  For instance, Lazy#map, Lazy#select, etc. return Lazy objects, so you can't always pass one to methods that expect a normal Enumerable object.
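To make the intermediate-array point concrete, here is a minimal sketch. `generate` below is a hypothetical stand-in for the proposed `Enumerator.generate`, built on `Enumerator.new`; it is not the proposal's reference implementation.

```ruby
# Hypothetical stand-in for the proposed Enumerator.generate:
# yields `initial`, then each value is the block applied to the previous one.
def generate(initial, &step)
  Enumerator.new do |y|
    value = initial
    loop do
      y << value
      value = step.call(value)
    end
  end
end

naturals = generate(1, &:succ) # 1, 2, 3, ... (infinite)

# Eager chain: `take`, `map` and `drop` each build a new Array.
# Without the bounding `take`, `map` would never return on this stream.
eager = naturals.take(6).map { |n| n * 10 }.drop(2)

# Lazy chain: elements pass through one at a time with no intermediate
# arrays; only the final `first(4)` materializes an Array.
streamed = naturals.lazy.map { |n| n * 10 }.drop(2).first(4)

p eager    # => [30, 40, 50, 60]
p streamed # => [30, 40, 50, 60]
```

The lazy version works on the infinite source, but every intermediate step (`naturals.lazy.map { ... }`) is an `Enumerator::Lazy`, which is exactly the compatibility concern raised above.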

I've always thought that a Lazy#eager, which turns a lazy enumerator back into a non-lazy one, would be nice, but `.lazy.map{}.eager` would look messy anyway.
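One way to get from a lazy chain back to a plain enumerator is sketched below. Ruby 2.7 later added `Enumerator::Lazy#eager` for this purpose; the sketch uses it when available and otherwise falls back to wrapping the lazy chain in `Enumerator.new`, a workaround assumed here for illustration.

```ruby
lazy = (1..Float::INFINITY).lazy.map { |n| n * 2 }.select { |n| (n % 3).zero? }

# Lazy#map / Lazy#select keep returning Enumerator::Lazy objects.
p lazy.class # => Enumerator::Lazy

# Turn it back into a plain (non-lazy) Enumerator.
plain =
  if lazy.respond_to?(:eager)
    lazy.eager # Ruby 2.7+
  else
    # Fallback for older Rubies: re-yield each lazy value from a plain Enumerator.
    Enumerator.new { |y| lazy.each { |v| y << v } }
  end

p plain.is_a?(Enumerator::Lazy) # => false
p plain.first(3)                # => [6, 12, 18]
```

Note that once `plain` is non-lazy, calling `map` on it without a bound would again try to build an infinite Array.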

> ```ruby
> # implicit "stop on nil" is part of the Enumerator.from convention that the code reader should be aware of
> ```

I think treating nil as the end is a good and reasonable default.  Taking your Octokit example, the block could be `{ |response| response.rels[:next]&.get }`, which would go through all pages and stop automatically if nil were treated as the end.  You omitted a `.take_while` in the example, but you'd get an error if there were fewer than 3 pages.  Without a default end, you'd almost always need to either raise StopIteration explicitly in the block or chain `.take_while`/`.take`, and the choice between them is not obvious.
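The difference between an implicit nil end and an explicit bound can be sketched with fake pages. `Page`, `generate`, and `generate_until_nil` are hypothetical names for illustration; `generate_until_nil` only approximates the nil-as-end behavior described above, and `&:next_page` plays the role of `response.rels[:next]&.get`.

```ruby
# Hypothetical stand-in for paginated responses: each page links to the next, or nil.
Page = Struct.new(:number, :next_page)
page3 = Page.new(3, nil)
page2 = Page.new(2, page3)
page1 = Page.new(1, page2)

# No default end: the block keeps being called, even on nil.
def generate(initial, &step)
  Enumerator.new do |y|
    value = initial
    loop do
      y << value
      value = step.call(value)
    end
  end
end

# nil as the default end: the stream stops as soon as the block returns nil.
def generate_until_nil(initial, &step)
  Enumerator.new do |y|
    value = initial
    until value.nil?
      y << value
      value = step.call(value)
    end
  end
end

# Goes through all pages and stops by itself.
p generate_until_nil(page1, &:next_page).map(&:number) # => [1, 2, 3]

# Without a default end, you must bound the stream yourself...
p generate(page1, &:next_page).take_while { |page| page }.map(&:number) # => [1, 2, 3]

# ...and a guessed bound larger than the page count ends up calling next_page on nil.
begin
  generate(page1, &:next_page).first(5)
rescue NoMethodError => e
  puts "first(5) failed: #{e.class}"
end
```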

> **start with array** (I believe 1 and 0 initial values are the MOST used cases)
> ```ruby
> # from: we should start from empty array, expressing nothing but Enumerator.from API limitation
> Enumerator.from([]) { 0 }.take(10)
> # generate: no start value
> Enumerator.generate { 0 }.take(10)
> ```

The limitation only came from how the word `from` sounds.  I picked the name `from`, and `Enumerator.from {}` just didn't sound right to me, so I made the argument mandatory.  You can simply default the first argument to `[]` if that reads and writes better, possibly under a name other than `from`, which I won't insist on.

> ```ruby
> # from: working with one value requires not forgetting to arrayify it
> Enumerator.from([1], &:succ).take(10)
> # generate: just use the value
> Enumerator.generate(1, &:succ).take(10)
> ```

Yeah, because our keyword arguments are pseudo ones, you can't use variable-length arguments for a list of objects that might end with a hash: a trailing hash would be taken as keyword options.  We'll hopefully get this right by Ruby 3.0.

There's much room for consideration of the name and method signature.  Perhaps multiple factory methods would work better.

> ```ruby
> # from: "we pass as many previous values as the initial array had" convention
> Enumerator.from([0, 1]) { |i, j| i + j }.take(10)
> # generate: regular value enumeration, next block receives exactly what previous returns
> Enumerator.generate([0, 1]) { |i, j| [j, i + j] }.take(10).map(&:last)
> # ^ yes, it will require an additional trick to include 0 in the final result, but I believe this is a worthy sacrifice
> ```

The former directly generates an infinite Fibonacci sequence, and that's a major difference.  Taking the first few elements with `.take` is just for testing (assertion) purposes and not part of the use case.  When solving a problem like "Find the least n such that \sum_{k=1}^{n} fib(k) >= 1000", `take` wouldn't work well, because you don't know in advance how many elements to take.
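A sketch of that use case, assuming a hypothetical `from` helper that yields its seeds, then applies the block to the last `seeds.size` values, and (following the nil-as-end default argued for earlier) stops on nil. It finds the answer by consuming the stream only as far as needed, with no guessed `take` bound.

```ruby
# Hypothetical approximation of the Enumerator.from semantics discussed here:
# seeds are yielded first, then each new element is the block applied to
# the previous `seeds.size` elements; a nil result ends the stream.
def from(seeds, &step)
  Enumerator.new do |y|
    window = seeds.dup
    window.each { |v| y << v }
    loop do
      nxt = step.call(*window)
      break if nxt.nil?
      y << nxt
      window = window.drop(1) << nxt
    end
  end
end

fib = from([0, 1]) { |i, j| i + j } # fib(0), fib(1), fib(2), ...

p fib.first(10) # => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Least n such that fib(1) + ... + fib(n) >= 1000.
sum = 0
least_n = fib.each_with_index do |f, k|
  next if k.zero? # the sum starts at fib(1)
  sum += f
  break k if sum >= 1000
end

p least_n # => 15
```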

> The problem with "API complication" is inconsistency. Like, a newcomer may ask: why does `Enumerator.from` have "this handy `drop: 2` initial arg" while `each` doesn't? Use cases could exist, too!

I understand that sentiment, but it's not surprising that a factory/constructor method of a dedicated class takes many tunables while individual instance methods do not.  If people all said they needed it as a generic feature, I'd be open to considering something like Enumerable#skip(n) that would return an offset enumerator.
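As a rough illustration of an offset enumerator, here is a hypothetical `Enumerable#skip(n)`; this method does not exist in Ruby, and reopening `Enumerable` is done only to keep the sketch short. Unlike `drop`, it returns an `Enumerator` instead of building an Array, so it also works on infinite sources.

```ruby
module Enumerable
  # Hypothetical: like `drop(n)`, but returns a (non-lazy) Enumerator that
  # skips the first n elements on the fly instead of building an Array.
  def skip(n)
    source = self
    Enumerator.new do |y|
      source.each_with_index { |value, index| y << value if index >= n }
    end
  end
end

p [10, 20, 30, 40].skip(2).to_a # => [30, 40]
p 1.step.skip(2).first(3)       # => [3, 4, 5]
p [10, 20, 30].skip(1).class    # => Enumerator
```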


----------------------------------------
Feature #14781: Enumerator#generate
https://bugs.ruby-lang.org/issues/14781#change-74506

* Author: zverok (Victor Shepelev)
* Status: Feedback
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
This is an alternative proposal to `Object#enumerate` (#14423), which many considered a good idea, but whose naming was uncertain and which was too radical (an `Object` extension). This one is _less_ radical and, at the same time, more powerful.

**Synopsis**: 
* `Enumerator.generate(initial, &block)`: produces an infinite sequence where each next element is calculated by applying the block to the previous one; `initial` is the first element of the sequence;
* `Enumerator.generate(&block)`: the same, except that the first element of the sequence is the result of calling the block with no arguments.
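The two forms in the synopsis can be approximated in a few lines of plain Ruby. This is a hypothetical sketch built on `Enumerator.new`, not the linked reference implementation. It uses a splat so that an explicit `nil` initial value is still distinguishable from no initial value and, as an assumption, passes the previous value to the block in the no-initial form too (blocks such as `{ rand(10) }` simply ignore it).

```ruby
# Hypothetical sketch of the proposed Enumerator.generate semantics.
def generate(*initial, &step)
  raise ArgumentError, "no block given" unless step
  raise ArgumentError, "expected at most 1 initial value" if initial.size > 1

  Enumerator.new do |y|
    # With an initial value it is the first element; otherwise the
    # first element is the result of calling the block with no args.
    value = initial.empty? ? step.call : initial.first
    loop do
      y << value
      value = step.call(value) # next element = block applied to previous
    end
  end
end

p generate(1, &:succ).take(5)
# => [1, 2, 3, 4, 5]

p generate([0, 1]) { |f0, f1| [f1, f0 + f1] }.take(10).map(&:first)
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

counter = 0
p generate { counter += 1 }.take(3)
# => [1, 2, 3]
```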

This method makes it possible to produce enumerators that replace many common `while` and `loop` cycles, in the same way that `#each` replaces `for`.
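For instance, a `while` loop that collects a Collatz sequence can be rewritten as a generator plus an ordinary Enumerable call. The Collatz example and the `generate` helper are illustrative assumptions (the helper approximates the proposed method); `slice_after { ... }.first` follows the same pattern as the `StringScanner` example below.

```ruby
# Hypothetical stand-in for the proposed Enumerator.generate(initial, &block).
def generate(initial, &step)
  Enumerator.new do |y|
    value = initial
    loop do
      y << value
      value = step.call(value)
    end
  end
end

collatz_step = ->(n) { n.even? ? n / 2 : 3 * n + 1 }

# Loop style: mutable state plus an explicit exit condition.
n = 6
with_while = [n]
while n != 1
  n = collatz_step.(n)
  with_while << n
end

# Enumerator style: an infinite stream, cut right after the first 1.
with_generate = generate(6, &collatz_step).slice_after { |x| x == 1 }.first

p with_while    # => [6, 3, 10, 5, 16, 8, 4, 2, 1]
p with_generate # => [6, 3, 10, 5, 16, 8, 4, 2, 1]
```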

**Examples:**

With initial value

```ruby
# Infinite sequence
p Enumerator.generate(1, &:succ).take(5)
# => [1, 2, 3, 4, 5]

# Easy Fibonacci
p Enumerator.generate([0, 1]) { |f0, f1| [f1, f0 + f1] }.take(10).map(&:first)
#=> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

require 'date'

# Find next Tuesday
p Enumerator.generate(Date.today, &:succ).detect { |d| d.wday == 2 }
# => #<Date: 2018-05-22 ((2458261j,0s,0n),+0s,2299161j)>

# Tree navigation
# ---------------
require 'nokogiri'
require 'open-uri'

# Find some element on page, then make list of all parents
p Nokogiri::HTML(URI.open('https://www.ruby-lang.org/en/'))
  .at('a:contains("Ruby 2.2.10 Released")')
  .yield_self { |a| Enumerator.generate(a, &:parent) }
  .take_while { |node| node.respond_to?(:parent)  }
  .map(&:name)
# => ["a", "h3", "div", "div", "div", "div", "div", "div", "body", "html"]

# Pagination
# ----------
require 'octokit'

Octokit.stargazers('rails/rails')
# ^ this method returns just an array, but sets `.last_response` to the full response, with data
# and pagination. So now we can do this:
p Enumerator.generate(Octokit.last_response) { |response| 
    response.rels[:next].get                         # pagination: `get` fetches next Response
  } 
  .first(3)                                          # take just 3 pages of stargazers
  .flat_map(&:data)                                  # `data` is parsed response content (stargazers themselves)
  .map { |h| h[:login] }
# => ["wycats", "brynary", "macournoyer", "topfunky", "tomtt", "jamesgolick", ...
```

Without initial value

```ruby
# Random search
target = 7
p Enumerator.generate { rand(10) }.take_while { |i| i != target }.to_a
# => [0, 6, 3, 5,....]

# External while condition
require 'strscan'
scanner = StringScanner.new('7+38/6')
p Enumerator.generate { scanner.scan(%r{\d+|[-+*/]}) }.slice_after { scanner.eos? }.first
# => ["7", "+", "38", "/", "6"]

# Potential message loop system:
Enumerator.generate { Message.receive }.take_while { |msg| msg != :exit }
```
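`Message.receive` above is a hypothetical API. A runnable approximation of the same message loop can use `Thread::Queue#pop`, which blocks until a message arrives, as the source. The inline `generate` helper is a simplified, assumed stand-in for the proposed no-initial form.

```ruby
# Hypothetical stand-in for Enumerator.generate { ... } (no initial value):
# every element is produced by calling the block again.
def generate(&step)
  Enumerator.new do |y|
    loop { y << step.call }
  end
end

inbox = Thread::Queue.new
producer = Thread.new do
  %i[hello ping work exit ignored].each { |msg| inbox << msg }
end

# Process messages until :exit arrives; later messages are never pulled.
handled = generate { inbox.pop }.take_while { |msg| msg != :exit }
producer.join

p handled # => [:hello, :ping, :work]
```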

**Reference implementation**: https://github.com/zverok/enumerator_generate

I want to **thank** all peers that participated in the discussion here, on Twitter and Reddit.

---Files--------------------------------
enumerator_from.rb (3.16 KB)


-- 
https://bugs.ruby-lang.org/
