Feature #21552

closed

allow String.strip and similar to take a parameter similar to String.delete

Added by MSP-Greg (Greg L) 4 months ago. Updated 7 days ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:123063]

Description

Regarding String.strip (and lstrip, rstrip, and the ! versions)

Some text data representations differentiate between what one might call vertical and horizontal white space, and the 'strip' methods currently strip both.

It would be helpful if they had an optional parameter similar to String.delete, taking one multi-character selector, so one could do:

t = str.strip " \t"

One can use a regex for this, but this is much simpler.
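
For comparison, here is a minimal sketch of the regexp approach mentioned above, stripping only horizontal whitespace. The sample string is made up for illustration.

```ruby
# Hypothetical sample: spaces and tabs at both ends, blank lines before the end.
str = " \ttitle\n\n \t"

# Plain strip removes vertical whitespace as well.
all_stripped = str.strip

# Regexp workaround: remove only spaces and tabs at each end.
horizontal_only = str.sub(/\A[ \t]+/, "").sub(/[ \t]+\z/, "")

p all_stripped     # "title"
p horizontal_only  # "title\n\n"
```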


Related issues 1 (1 open, 0 closed)

Related to Ruby - Feature #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1) (Open)

Updated by Dan0042 (Daniel DeLorme) 3 months ago #1 [ruby-core:123233]

Agreed. I tend to use str.sub(/[\ \t]+\z/,'') for this, but an end-anchored regexp has pretty bad worst-case performance. Try benchmarking it with str = " "*1000+"a" 😦
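
A rough way to try this is sketched below. The helper rstrip_chars is hypothetical (it is not part of Ruby or of any patch in this ticket); it just scans backwards from the end, so it does a single pass. The regexp may retry the match from each of the 1000 spaces before giving up. Timings depend on the machine and Ruby version, so none are shown.

```ruby
require "benchmark"

# Worst case from the comment above: a long run of spaces that is not at the end.
str = " " * 1000 + "a"

# Hypothetical helper: drop trailing characters contained in chars, scanning from the end.
def rstrip_chars(str, chars)
  last = str.length
  last -= 1 while last > 0 && chars.include?(str[last - 1])
  str[0, last]
end

n = 200
regexp_time = Benchmark.realtime { n.times { str.sub(/[ \t]+\z/, "") } }
scan_time   = Benchmark.realtime { n.times { rstrip_chars(str, " \t") } }

puts format("regexp: %.6fs  scan: %.6fs", regexp_time, scan_time)
```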

Updated by mame (Yusuke Endoh) about 1 month ago #2

  • Related to Feature #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1) added

Updated by shugo (Shugo Maeda) 14 days ago #3 [ruby-core:124019]

I just heard someone ask for a strip function that doesn't remove NUL characters.
Since Python's str.strip takes an optional argument, it might be a good idea to introduce a similar feature.

I've created a pull request at https://github.com/ruby/ruby/pull/15400 and here's a benchmark result:

voyager:ruby$ cat benchmark_strip.rb
require "benchmark"

TARGET = " \t\r\n\f\v\0" + "x" * 1024 + "\0 \t\r\n\f\v"

Benchmark.bmbm do |x|
  x.report("strip") do
    10000.times do
      TARGET.strip
    end
  end

  x.report("gsub") do
    10000.times do
      TARGET.gsub(/\A\s+|\s+\z/, "")
    end
  end

  x.report('strip(" \t\r\n\f\v")') do
    10000.times do
      TARGET.strip(" \t\r\n\f\v")
    end
  end
end
voyager:ruby$ ./tool/runruby.rb benchmark_strip.rb                            (git)-[feature/allow-strip-to-take-chars]
Rehearsal --------------------------------------------------------
strip                  0.005475   0.000065   0.005540 (  0.005546)
gsub                   0.022467   0.000000   0.022467 (  0.022470)
strip(" \t\r\n\f\v")   0.004772   0.000000   0.004772 (  0.004773)
----------------------------------------------- total: 0.032779sec

                           user     system      total        real
strip                  0.000759   0.000961   0.001720 (  0.001720)
gsub                   0.019911   0.000000   0.019911 (  0.019912)
strip(" \t\r\n\f\v")   0.004958   0.000000   0.004958 (  0.004961)

Updated by shugo (Shugo Maeda) 14 days ago #4 [ruby-core:124021]

As suggested by nobu, I've added documentation and tests for character selectors: https://github.com/ruby/ruby/pull/15400/commits/a9ad44007dbb0ea543ce1eb8748edd4213083c5f

Examples:

"012abc345".strip("0-9") # "abc"
"012abc345".strip("^a-z") # "abc"

Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.
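
To experiment with the selector syntax on a Ruby without this feature, something like the following may help. It is only a sketch: strip_with_selector is a hypothetical helper, not the implementation in the pull request. It assumes that String#count accepts the same selector syntax as String#delete, so counting within a single character tells whether that character is selected.

```ruby
# Hypothetical helper, not the code from the pull request.
def strip_with_selector(str, selector)
  first = 0
  last = str.length
  # Advance past leading characters matched by the selector.
  first += 1 while first < last && str[first].count(selector) > 0
  # Step back over trailing characters matched by the selector.
  last -= 1 while last > first && str[last - 1].count(selector) > 0
  str[first...last]
end

p strip_with_selector("012abc345", "0-9")   # "abc"
p strip_with_selector("012abc345", "^a-z")  # "abc"
p strip_with_selector("012abc345", "a-z")   # "012abc345"
```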

Updated by shugo (Shugo Maeda) 13 days ago #5 [ruby-core:124031]

shugo (Shugo Maeda) wrote in #note-4:

Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.

I've noticed that String#count also takes multiple selectors, so I've applied the same change to String#strip etc. for consistency.
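
For reference, this is how multiple selectors behave in the existing String#delete and String#count: a character is selected only if every selector matches it (the selectors are intersected). The snippet uses built-in methods only; the last input string is made up for illustration.

```ruby
# Only "l" is in both selectors, so only "l" is deleted.
p "hello".delete("l", "lo")       # "heo"

# One selector: counts every "l" and "o".
p "hello world".count("lo")       # 5

# Two selectors: only "o" is in both.
p "hello world".count("lo", "o")  # 2

# Digits except "0": a range intersected with a negation.
p "0120".delete("0-9", "^0")      # "00"
```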

Updated by mame (Yusuke Endoh) 13 days ago #6 [ruby-core:124035]

I'm not strongly opposed, but this kind of API that uses a string to represent a collection of characters feels outdated. It is sometimes convenient, though.

Updated by KitaitiMakoto (真 北市) 13 days ago · Edited #7 [ruby-core:124039]

Thank you, shugo.

The "someone" he mentions is me. Here is my use case.

I want to extract chunks from a file and pass them to a neural network model to detect the file type. The model requires two chunks: the lstripped beginning portion and the rstripped ending portion, except that null characters must not be stripped. It would be useful if I could call:

beg_portion.lstrip("\t\n\v\f\r ") # or would ["\t", "\n", "\v", "\f", "\r", " "] or /\s/ be preferred?
end_portion.rstrip("\t\n\v\f\r ")

I'm not sure why the model requires such chunks, but I guess it was trained in a Python framework, and Python's strip family doesn't strip null characters by default.

As an aside, I was surprised to see null characters stripped by lstrip and rstrip, because I'm used to Regexp's \s as the meaning of "whitespace", although the String documentation does explain what it considers "whitespace". If the methods accepted an argument, that might prompt people to notice which characters are stripped.
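
The difference can be checked with a small sketch like the one below. The sample strings are hypothetical. It relies on Regexp's \s not matching the null character, whereas the strip family treats null as whitespace.

```ruby
# \s matches space, tab, LF, VT, FF and CR, but not NUL.
p "\0".match?(/\s/)                # false
p " \t\n\v\f\r".match?(/\A\s+\z/)  # true

# rstrip removes trailing NULs.
p "payload\0\0".rstrip == "payload"  # true

# A \s-based regexp stops at the NUL and keeps it.
tail = "payload\0 \n"
p tail.sub(/\s+\z/, "") == "payload\0"  # true
```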

Tips:
For the case of str = " "*1000+"a", reversing the string is faster than using \s+\z:

str.sub(/\A\s+/, "").reverse.sub(/\A\s+/, "").reverse

But I would not want to end up in a situation where many people use this trick just for speed.

Updated by shugo (Shugo Maeda) 8 days ago #8 [ruby-core:124115]

tr_setup_table_multi() was called twice in String#{strip,strip!}, so I've fixed it: https://github.com/ruby/ruby/pull/15400/commits/c9cb93f201644cd5e2fbbd6e83cf50acb27642de

Benchmark

https://gist.github.com/shugo/c6367f4139bc2d8df9f9199c49cbbcdf

Rehearsal -----------------------------------------------------------------------------
strip()                                     0.006303   0.001084   0.007387 (  0.007409)
lstrip("\0 \t-\r")                          0.003104   0.000000   0.003104 (  0.003106)
sub(/\A[\0\s]+/, "")                        0.004521   0.000000   0.004521 (  0.004522)
rstrip("\0 \t-\r")                          0.003187   0.000000   0.003187 (  0.003188)
sub(/[\0\s]+\z/, "")                        0.016442   0.000000   0.016442 (  0.016448)
strip("\0 \t-\r")                           0.003774   0.000000   0.003774 (  0.003781)
gsub(/\A[\0\s]+|[\0\s]+\z/, "")             0.022400   0.000000   0.022400 (  0.022404)
sub(/\A[\0\s]+/, "").sub(/[\0\s]+\z/, "")   0.016304   0.000000   0.016304 (  0.016320)
-------------------------------------------------------------------- total: 0.077119sec

                                                user     system      total        real
strip()                                     0.001528   0.000000   0.001528 (  0.001527)
lstrip("\0 \t-\r")                          0.002598   0.000000   0.002598 (  0.002599)
sub(/\A[\0\s]+/, "")                        0.004651   0.000000   0.004651 (  0.004657)
rstrip("\0 \t-\r")                          0.003305   0.000000   0.003305 (  0.003306)
sub(/[\0\s]+\z/, "")                        0.014502   0.000000   0.014502 (  0.014502)
strip("\0 \t-\r")                           0.003664   0.000000   0.003664 (  0.003664)
gsub(/\A[\0\s]+|[\0\s]+\z/, "")             0.022062   0.000000   0.022062 (  0.022077)
sub(/\A[\0\s]+/, "").sub(/[\0\s]+\z/, "")   0.017203   0.000000   0.017203 (  0.017207)
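
A note on the selector in the labels above: "\t-\r" is a character range from tab to carriage return, which covers tab, LF, VT, FF and CR. The sketch below checks this with existing built-in methods only; the sample strings are made up.

```ruby
# The five characters between tab (0x09) and CR (0x0D), inclusive.
range_chars = ("\t".."\r").to_a
p range_chars  # ["\t", "\n", "\v", "\f", "\r"]

# String#count and String#delete already understand the same range selector.
p "a\tb\nc\vd\fe\rf".count("\t-\r")  # 5
p "a\tb\nc".delete("\t-\r")          # "abc"
```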

Updated by Eregon (Benoit Daloze) 7 days ago #9 [ruby-core:124125]

This sounds like a lot of complexity for one specific use-case, which already has a good solution with sub.

From the benchmarks, lstrip("\0 \t-\r") and sub(/\A[\0\s]+/, "") are pretty close.
sub(/[\0\s]+\z/, "") is slower than rstrip("\0 \t-\r"), but that sounds more like something that could/should be optimized in the regexp engine (and would benefit far more cases than this specific one).

Updated by Eregon (Benoit Daloze) 7 days ago #10 [ruby-core:124126]

Eregon (Benoit Daloze) wrote in #note-9:

but that sounds more like something that could/should be optimized in the regexp engine

To substantiate that:

$ ruby -rbenchmark/ips -e 'SPACES = ["\0", *("\t".."\r"), " "].join; TARGET = SPACES + "x" * 1024 + SPACES; r=nil; Benchmark.ips { _1.report { r = TARGET.sub(/[\0\s]+\z/, "") } }'
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]
Warming up --------------------------------------
                         7.106k i/100ms
Calculating -------------------------------------
                         71.778k (± 1.8%) i/s   (13.93 μs/i) -    362.406k in   5.050632s

$ ruby -rbenchmark/ips -e 'SPACES = ["\0", *("\t".."\r"), " "].join; TARGET = SPACES + "x" * 1024 + SPACES; r=nil; Benchmark.ips { _1.report { r = TARGET.sub(/[\0\s]+\z/, "") } }'   
truffleruby 33.0.0-dev-bb226b84 (2025-12-01), like ruby 3.3.7, Oracle GraalVM Native [x86_64-linux]
Warming up --------------------------------------
                       475.108k i/100ms
Calculating -------------------------------------
                         25.222M (± 4.5%) i/s   (39.65 ns/i) -    125.904M in   5.008875s

Updated by Eregon (Benoit Daloze) 7 days ago #11 [ruby-core:124127]

Also, in practice you'd probably want to use sub! to mutate in place if the String is big.
That would avoid a copy, since CRuby doesn't create lazy (shared) substrings unless they share the same end as the original.
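
A minimal sketch of the in-place variant, with a made-up buffer. Note that sub! returns nil when nothing was substituted, so its return value should not be chained blindly.

```ruby
buf = +"payload \t\n"  # unary plus gives an unfrozen String

result = buf.sub!(/\s+\z/, "")

p buf                    # "payload"
p result.equal?(buf)     # true: the receiver itself was modified and returned
p buf.sub!(/\s+\z/, "")  # nil: nothing left to strip
```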

Updated by matz (Yukihiro Matsumoto) 7 days ago #12 [ruby-core:124130]

I accept the proposal.

Matz.

Updated by shugo (Shugo Maeda) 7 days ago #13

  • Status changed from Open to Closed

Applied in changeset git|c76ba839b153805f0498229284fea1a809308dbc.


Allow String#strip etc. to take optional character selectors

[Feature #21552]

Co-Authored-By: Claude
