Feature #21552
Closed
allow String.strip and similar to take a parameter similar to String.delete
Description
Regarding String.strip (and lstrip, rstrip, and the ! versions)
Some text data representations differentiate between what one might call vertical and horizontal white space, and the 'strip' methods currently strip both.
It would be helpful if they took an optional parameter, similar to String.delete but with a single multi-character selector, so one could do:
t = str.strip " \t"
One can use a regex for this, but this is much simpler.
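For comparison, the regex approach mentioned above can be wrapped in a small helper. This is only a minimal sketch, not the proposed implementation: `strip_chars` is a hypothetical name, and it treats its argument as a literal set of characters rather than a String.delete-style selector. It assumes a non-empty character set.

```ruby
# Hypothetical helper: strip only the given literal characters from both ends.
# Unlike a String#delete selector, "-" and "^" have no special meaning here.
def strip_chars(str, chars)
  set = Regexp.escape(chars) # escape anything special inside a character class
  str.sub(/\A[#{set}]+/, "").sub(/[#{set}]+\z/, "")
end

text = "\n  indented line \t"
horizontal_only = strip_chars(text, " \t") # the leading newline is kept
```

Here the leading "\n" stops the left-hand strip, so only the trailing space and tab are removed, whereas plain `text.strip` would remove the newline and the indentation too.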
Updated by Dan0042 (Daniel DeLorme) 3 months ago
Agreed. I tend to use str.sub(/[\ \t]+\z/,'') for this, but an end-anchored regexp has pretty bad worst-case performance. Try benchmarking that expression with str = " "*1000+"a" 😦
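A rough way to try this out is sketched below with the standard Benchmark library. The iteration count and report labels are arbitrary choices, and timings will vary by Ruby version and regexp engine, so no expected numbers are given.

```ruby
require "benchmark"

# Input from the comment above: a long run of spaces that is NOT at the end.
str = " " * 1000 + "a"

Benchmark.bm(8) do |x|
  # A backtracking engine may retry the match from each space before giving up.
  x.report("sub")    { 20.times { str.sub(/[ \t]+\z/, "") } }
  # rstrip only has to inspect the end of the string.
  x.report("rstrip") { 20.times { str.rstrip } }
end
```

Both calls return the string unchanged for this input; only the time spent finding that out differs.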
Updated by mame (Yusuke Endoh) about 1 month ago
- Related to Feature #7845: Strip doesn't handle unicode space characters in ruby 1.9.2 & 1.9.3 (does in 1.9.1) added
Updated by shugo (Shugo Maeda) 14 days ago
I just heard someone ask for a strip function that doesn't remove NUL characters.
Since Python's str.strip takes an optional argument, it might be a good idea to introduce a similar feature.
I've created a pull request at https://github.com/ruby/ruby/pull/15400 and here's a benchmark result:
voyager:ruby$ cat benchmark_strip.rb
require "benchmark"

TARGET = " \t\r\n\f\v\0" + "x" * 1024 + "\0 \t\r\n\f\v"

Benchmark.bmbm do |x|
  x.report("strip") do
    10000.times do
      TARGET.strip
    end
  end
  x.report("gsub") do
    10000.times do
      TARGET.gsub(/\A\s+|\s+\z/, "")
    end
  end
  x.report('strip(" \t\r\n\f\v")') do
    10000.times do
      TARGET.strip(" \t\r\n\f\v")
    end
  end
end
voyager:ruby$ ./tool/runruby.rb benchmark_strip.rb (git)-[feature/allow-strip-to-take-chars]
Rehearsal --------------------------------------------------------
strip                   0.005475   0.000065   0.005540 (  0.005546)
gsub                    0.022467   0.000000   0.022467 (  0.022470)
strip(" \t\r\n\f\v")    0.004772   0.000000   0.004772 (  0.004773)
----------------------------------------------- total: 0.032779sec

                            user     system      total        real
strip                   0.000759   0.000961   0.001720 (  0.001720)
gsub                    0.019911   0.000000   0.019911 (  0.019912)
strip(" \t\r\n\f\v")    0.004958   0.000000   0.004958 (  0.004961)
Updated by shugo (Shugo Maeda) 14 days ago
As suggested by nobu, I've added documentation and tests for character selectors: https://github.com/ruby/ruby/pull/15400/commits/a9ad44007dbb0ea543ce1eb8748edd4213083c5f
Examples:
"012abc345".strip("0-9") # "abc"
"012abc345".strip("^a-z") # "abc"
Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.
Updated by shugo (Shugo Maeda) 13 days ago
shugo (Shugo Maeda) wrote in #note-4:
Unlike String#delete, the current implementation doesn't take multiple arguments.
I'm not sure whether there's a use case for it.
I've noticed that String#count also takes multiple selectors, so I've applied the same changes to String#strip etc. for consistency.
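For reference, the selector syntax under discussion is the one String#delete and String#count already accept. The sketch below uses only those two existing methods rather than the new strip argument, to show how ranges, negation, and multiple selectors behave.

```ruby
s = "012abc345"

s.delete("0-9")        # => "abc"  ("0-9" is a range)
s.delete("^a-z")       # => "abc"  (a leading "^" negates the set)

# With several selectors, only characters matched by all of them are selected.
s.count("0-9", "^5")   # => 5      (digits except "5")
s.delete("0-9", "^5")  # => "abc5"
```

Multiple selectors are intersected, so the second argument narrows the first rather than adding to it.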
Updated by mame (Yusuke Endoh) 13 days ago
I'm not strongly opposed, but this kind of API, which uses a string to represent a collection of characters, feels outdated. It is sometimes convenient, though.
Updated by KitaitiMakoto (真 北市) 13 days ago
Thank you, shugo.
The "someone" he mentions is me. Here is my use case.
I want to extract chunks from a file and pass them to a neural network model to detect the file type. The model requires two chunks: the lstripped beginning portion and the rstripped ending portion, except that null characters must not be stripped. It would be useful if I could call:
beg_portion.lstrip("\t\n\v\f\r ") # ["\t", "\n", "\v", "\f", "\r", " "] or `/\s/` is preferred?
end_portion.rstrip("\t\n\v\f\r ")
I'm not sure why the model requires such chunks, but I guess it was trained in a Python framework, and Python's strip family doesn't strip null characters by default.
As an aside, I was surprised to see that lstrip and rstrip strip null characters, because I'm used to Regexp's \s as the definition of "whitespace", although the String documentation does explain what counts as "whitespace". If the methods accept an argument, that might itself signal to readers which characters are stripped.
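The difference between the two notions of "whitespace" can be checked with existing methods. The input string below is made up for illustration.

```ruby
s = " \t\0data\0\n"

s.strip                                   # => "data" (NUL is stripped too)
"\0".match?(/\s/)                         # => false  (\s does not match NUL)
kept = s.sub(/\A\s+/, "").sub(/\s+\z/, "") # => "\0data\0"
```

The regexp version stops at the first NUL on each side, which is the behavior the use case above asks for.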
Tips:
For the case of str = " "*1000+"a", reversing the string is faster than using \s+\z:
str.sub(/\A\s+/, "").reverse.sub(/\A\s+/, "").reverse
However, I would not want to see many people relying on this trick just for speed.
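The right-hand half of that trick can be isolated as follows. `rstrip_via_reverse` is a hypothetical name used only for this sketch; note that it allocates two extra reversed copies of the string.

```ruby
# Hypothetical helper: reverse first so the regexp is anchored at the start,
# strip the (now leading) whitespace, then reverse back.
def rstrip_via_reverse(str)
  str.reverse.sub(/\A\s+/, "").reverse
end

rstrip_via_reverse("  a b \t\n") # => "  a b"
```

It returns the same result as `str.sub(/\s+\z/, "")`, just without the end-anchored pattern.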
Updated by shugo (Shugo Maeda) 8 days ago
tr_setup_table_multi() was called twice in String#{strip,strip!}, so I've fixed it: https://github.com/ruby/ruby/pull/15400/commits/c9cb93f201644cd5e2fbbd6e83cf50acb27642de
Benchmark
https://gist.github.com/shugo/c6367f4139bc2d8df9f9199c49cbbcdf
Rehearsal -----------------------------------------------------------------------------
strip()                                      0.006303   0.001084   0.007387 (  0.007409)
lstrip("\0 \t-\r")                           0.003104   0.000000   0.003104 (  0.003106)
sub(/\A[\0\s]+/, "")                         0.004521   0.000000   0.004521 (  0.004522)
rstrip("\0 \t-\r")                           0.003187   0.000000   0.003187 (  0.003188)
sub(/[\0\s]+\z/, "")                         0.016442   0.000000   0.016442 (  0.016448)
strip("\0 \t-\r")                            0.003774   0.000000   0.003774 (  0.003781)
gsub(/\A[\0\s]+|[\0\s]+\z/, "")              0.022400   0.000000   0.022400 (  0.022404)
sub(/\A[\0\s]+/, "").sub(/[\0\s]+\z/, "")    0.016304   0.000000   0.016304 (  0.016320)
-------------------------------------------------------------------- total: 0.077119sec

                                                 user     system      total        real
strip()                                      0.001528   0.000000   0.001528 (  0.001527)
lstrip("\0 \t-\r")                           0.002598   0.000000   0.002598 (  0.002599)
sub(/\A[\0\s]+/, "")                         0.004651   0.000000   0.004651 (  0.004657)
rstrip("\0 \t-\r")                           0.003305   0.000000   0.003305 (  0.003306)
sub(/[\0\s]+\z/, "")                         0.014502   0.000000   0.014502 (  0.014502)
strip("\0 \t-\r")                            0.003664   0.000000   0.003664 (  0.003664)
gsub(/\A[\0\s]+|[\0\s]+\z/, "")              0.022062   0.000000   0.022062 (  0.022077)
sub(/\A[\0\s]+/, "").sub(/[\0\s]+\z/, "")    0.017203   0.000000   0.017203 (  0.017207)
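The selector "\0 \t-\r" used in these reports relies on the range syntax: "\t-\r" stands for every character from tab through carriage return. A quick check with String#count, which accepts the same selector syntax:

```ruby
selector = "\0 \t-\r"

# "\t-\r" is a range covering \t, \n, \v, \f and \r (bytes 0x09..0x0D).
"\t\n\v\f\r".count("\t-\r")      # => 5
"\0 \t\n\v\f\r".count(selector)  # => 7
"x-y".count(selector)            # => 0 ("-" acts as the range operator here)
```

So the selector covers NUL, space, and the five ASCII whitespace control characters, the same set that the [\0\s] character class matches in the regexp rows.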
Updated by Eregon (Benoit Daloze) 7 days ago
This sounds like a lot of complexity for one specific use-case, which already has a good solution with sub.
From the benchmarks, lstrip("\0 \t-\r") and sub(/\A[\0\s]+/, "") are pretty close.
sub(/[\0\s]+\z/, "") is slower than rstrip("\0 \t-\r"), but that sounds more like something that could/should be optimized in the regexp engine (and would benefit far more cases than this specific one).
Updated by Eregon (Benoit Daloze) 7 days ago
Eregon (Benoit Daloze) wrote in #note-9:
but that sounds more like something that could/should be optimized in the regexp engine
To substantiate that:
$ ruby -rbenchmark/ips -e 'SPACES = ["\0", *("\t".."\r"), " "].join; TARGET = SPACES + "x" * 1024 + SPACES; r=nil; Benchmark.ips { _1.report { r = TARGET.sub(/[\0\s]+\z/, "") } }'
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [x86_64-linux]
Warming up --------------------------------------
7.106k i/100ms
Calculating -------------------------------------
71.778k (± 1.8%) i/s (13.93 μs/i) - 362.406k in 5.050632s
$ ruby -rbenchmark/ips -e 'SPACES = ["\0", *("\t".."\r"), " "].join; TARGET = SPACES + "x" * 1024 + SPACES; r=nil; Benchmark.ips { _1.report { r = TARGET.sub(/[\0\s]+\z/, "") } }'
truffleruby 33.0.0-dev-bb226b84 (2025-12-01), like ruby 3.3.7, Oracle GraalVM Native [x86_64-linux]
Warming up --------------------------------------
475.108k i/100ms
Calculating -------------------------------------
25.222M (± 4.5%) i/s (39.65 ns/i) - 125.904M in 5.008875s
Updated by Eregon (Benoit Daloze) 7 days ago
Also, in practice you'd probably want to use sub! to mutate in place if the String is big.
That would avoid a copy, since CRuby does not create lazy (shared) substrings unless they share the same end as the original.
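A small sketch of the in-place variant, with a made-up buffer. One thing to watch for is that sub! returns nil when nothing was replaced.

```ruby
buf = +"payload \t\n"       # unary plus gives an unfrozen String

buf.sub!(/\s+\z/, "")       # mutates buf instead of building a new String
buf                         # => "payload"

again = buf.sub!(/\s+\z/, "") # => nil, because nothing was left to strip
```

Because of that nil return, the result of sub! should not be chained; keep using the receiver itself.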
Updated by matz (Yukihiro Matsumoto) 7 days ago
I accept the proposal.
Matz.
Updated by shugo (Shugo Maeda) 7 days ago
- Status changed from Open to Closed
Applied in changeset git|c76ba839b153805f0498229284fea1a809308dbc.
Allow String#strip etc. to take optional character selectors
[Feature #21552]
Co-Authored-By: Claude [email protected]