From: eregontp@...
Date: 2018-02-20T14:13:43+00:00
Subject: [ruby-core:85693] [Ruby trunk Bug#14400] IO#ungetc and IO#ungetbyte documentation is inconsistent with the behavior

Issue #14400 has been updated by Eregon (Benoit Daloze).


Could the buffer grow automatically to allow pushing multiple bytes/characters back?
That's the current implementation in TruffleRuby and Rubinius: the buffer grows on ungetbyte/ungetc if there is not enough space.
The semantics for the user would then be much simpler.
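
To make the proposed semantics concrete, here is a minimal sketch of a pushback buffer that grows on demand. The class name `GrowablePushback` and its methods are hypothetical and written only for illustration; this is not the actual TruffleRuby or Rubinius code.

```ruby
# Hypothetical illustration only, not code from any Ruby implementation.
class GrowablePushback
  def initialize
    # Binary String used as the buffer; Ruby grows it as needed.
    @buf = String.new(encoding: Encoding::BINARY)
  end

  # Accepts an Integer byte, a String, or nil (assumed to be a no-op here).
  # The most recently pushed bytes are the first ones read back.
  def ungetbyte(arg)
    return nil if arg.nil?
    bytes = arg.is_a?(Integer) ? (arg & 0xff).chr : arg.b
    @buf.prepend(bytes)
    nil
  end

  # Returns the next pushed-back byte as an Integer, or nil when empty.
  def getbyte
    return nil if @buf.empty?
    byte = @buf.getbyte(0)
    @buf.slice!(0, 1)
    byte
  end

  def size
    @buf.bytesize
  end
end

buf = GrowablePushback.new
buf.ungetbyte("a" * 100_000)
buf.ungetbyte("a" * 100_000) # a second push just grows the buffer
```

With this model there is no size limit and no special case for consecutive calls, so the rule for the user is simply "whatever was pushed last is read first".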

----------------------------------------
Bug #14400: IO#ungetc and IO#ungetbyte documentation is inconsistent with the behavior
https://bugs.ruby-lang.org/issues/14400#change-70497

* Author: Eregon (Benoit Daloze)
* Status: Feedback
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-01-25 trunk 62035) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
The documentation of IO#ungetc states:

> Pushes back one character (passed as a parameter) onto ios, such that a
> subsequent buffered character read will return it. Only one character may be
> pushed back before a subsequent read operation (that is, you will be able to
> read only the last of several characters that have been pushed back). Has no
> effect with unbuffered reads (such as IO#sysread).

And similarly for IO#ungetbyte:

> Pushes back bytes (passed as a parameter) onto ios, such that a
> subsequent buffered read will return it. Only one byte may be pushed back
> before a subsequent read operation (that is, you will be able to read only the
> last of several bytes that have been pushed back). Has no effect with
> unbuffered reads (such as IO#sysread).

The part about only one byte/character is inconsistent with the actual behavior,
most notably because both of these methods accept a String with multiple characters as an argument.

~~~ ruby
STDIN.ungetc "Hello World!"
STDIN.read 12 #=> "Hello World!"

STDIN.ungetbyte "Foo Bar"
STDIN.read 7 #=> "Foo Bar"
~~~
(There are even specs for it:
https://github.com/ruby/spec/blob/7fa22023d69620ea3ff4d0ed2eb71fd7b02dd950/core/io/ungetc_spec.rb#L98
https://github.com/ruby/spec/blob/7fa22023d69620ea3ff4d0ed2eb71fd7b02dd950/core/io/ungetbyte_spec.rb#L21)

> that is, you will be able to read only the last of several characters that have been pushed back

contradicts what actually happens.

The behavior with large Strings is confusing.
It seems to allow arbitrarily large strings, but only if the buffer was empty (that is, if there was no earlier ungetbyte?).

~~~
$ pry
[1] pry(main)> STDIN.ungetbyte "a"*10_000
=> nil
[2] pry(main)> STDIN.ungetbyte "a"*10_000
IOError: ungetbyte failed

$ pry
[1] pry(main)> STDIN.ungetbyte "a"*100_000
=> nil
[2] pry(main)> STDIN.ungetbyte "a"*100_000
IOError: ungetbyte failed
from (pry):2:in `ungetbyte'

$ pry
[1] pry(main)> STDIN.ungetbyte "a"*100_000
=> nil
[2] pry(main)> STDIN.read(100_000).size
=> 100000
[3] pry(main)> STDIN.ungetbyte "a"*100_000
=> nil
[4] pry(main)> STDIN.read(100_000).size
=> 100000
~~~

And it's not as simple as two consecutive ungetbyte calls being forbidden:

~~~
$ pry
[1] pry(main)> STDIN.ungetbyte "a"*10_000_000
=> nil
[2] pry(main)> STDIN.ungetbyte "a"
IOError: ungetbyte failed
from (pry):2:in `ungetbyte'

$ pry
[1] pry(main)> STDIN.ungetbyte "a"
=> nil
[2] pry(main)> STDIN.ungetbyte "a"
=> nil
~~~

So how are those methods supposed to behave?
Can the documentation be updated to match the behavior and/or the behavior be fixed to be simpler?

I also wonder when those methods are useful.
There seem to be very few uses in the stdlib.
Maybe they should just be removed?
It seems easy to make a custom IO wrapper/buffer supporting pushing characters/bytes back.
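
As a rough sketch of what such a wrapper could look like (the `PushbackReader` name and its `unget`/`read` methods are hypothetical, and it only works at the byte level, ignoring character encodings):

```ruby
require "stringio"

# Hypothetical wrapper around any object that responds to read(length).
class PushbackReader
  def initialize(io)
    @io = io
    # Pending pushed-back bytes; a plain String, so it grows as needed.
    @pushed = String.new(encoding: Encoding::BINARY)
  end

  # Push a String back (nil is treated as a no-op).
  # The most recent push is returned first by the next read.
  def unget(str)
    @pushed.prepend(str.b) unless str.nil?
    nil
  end

  # Read up to +length+ bytes: pushed-back bytes first, then the wrapped IO.
  # Returns nil at end of input when length is positive, like IO#read.
  def read(length)
    out = @pushed.slice!(0, length)
    missing = length - out.bytesize
    if missing > 0
      rest = @io.read(missing)
      out << rest.b if rest
    end
    (out.empty? && length > 0) ? nil : out
  end
end

reader = PushbackReader.new(StringIO.new("World!"))
reader.unget("Hello ")
reader.read(12) # pushed-back bytes come first, then the wrapped IO's data
```

Because the buffer lives outside the IO, consecutive pushes of any size behave the same way, which avoids the cases shown above.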


-- 
https://bugs.ruby-lang.org/
