C++ - How Do I Iterate Over The Words of A String - Stack Overflow

This code provides two methods to split a string into a vector of substrings based on a delimiter. The first method puts the results into a pre-constructed vector, while the second returns a new vector. Both iterate through the string using getline to extract substrings between the delimiter into the output vector.

How do I iterate over the words of a string?

Asked 15 years, 5 months ago Modified 9 months ago Viewed 2.4m times

How do I iterate over the words of a string composed of words separated by whitespace?

Note that I'm not interested in C string functions or that kind of character manipulation/access. I prefer
elegance over efficiency. My current solution:
3356

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

c++ string split

Share Improve this question Follow edited Jul 4, 2022 at 21:01 community wiki
28 revs, 15 users 28%
Ashwin Nanjappa

690 Dude... Elegance is just a fancy way to say "efficiency-that-looks-pretty" in my book. Don't shy away from using C
functions and quick methods to accomplish anything just because it is not contained within a template ;)
– user19302 Oct 25, 2008 at 9:04

19 while (iss) { string subs; iss >> subs; cout << "Substring: " << sub << endl; } – isekaijin
Sep 29, 2009 at 15:47

@nlaq, Except that you'd have to convert your string object using c_str(), and back to a string again if you still
needed it to be a string, no? – Aaron H. Feb 15, 2011 at 0:00

28 @Eduardo: that's wrong too... you need to test iss between trying to stream another value and using that value,
i.e. string sub; while (iss >> sub) cout << "Substring: " << sub << '\n'; – Tony Delroy Apr 11,
2012 at 2:24
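
The point in the comment above can be sketched as a small helper: the extraction itself is the loop condition, so a failed read never reaches the loop body and no empty trailing entry appears. The name `words_of` is an assumption made for illustration only; it does not come from the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the whitespace-separated words of `s`.
std::vector<std::string> words_of(const std::string& s) {
    std::istringstream iss(s);
    std::vector<std::string> words;
    std::string word;
    // `iss >> word` is tested before `word` is used, so the body only
    // runs after a successful extraction.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

Leading, trailing and repeated whitespace is skipped by `operator>>`, so an empty or all-blank input should give an empty vector.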

14 Various options in C++ to do this by default: cplusplus.com/faq/sequences/strings/split – hB0 Oct 31, 2013 at
0:23

27 There's more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility.
IMHO Elegance is not a proxy for efficiency but maintainability. – Matt Mar 31, 2017 at 13:22

4 Most of the answers here are notably latin-centric. Many of the answers assume a single character can be used
as 'whitespace' even though the question defines the delimiter to be whitespace. Unicode has at least 25
whitespace characters. But word-delimiting is not merely a whitespace issue. For instance, in syllabic writing,
such as Tibetan, word delimitation is a semantic, rather than syntactic, problem. Therefore, using whitespace to
extract words is not a suitable approach for many languages. – Konchog Oct 29, 2018 at 12:08

Small addition to the above. You can add a locale facet that treats punctuation as space so you don't need to
handle that separately. codereview.stackexchange.com/a/57467/507 – Martin York Feb 20, 2019 at 21:26

83 Answers Sorted by: Highest score (default)


I use this to split string by a delimiter. The first puts the results in a pre-constructed vector, the second
returns a new vector.
2584
#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is
empty:

std::vector<std::string> x = split("one:two::three", ':');

Share Improve this answer Follow edited Jan 31, 2023 at 0:09 community wiki
22 revs, 17 users 38%
Evan Teran

elegant solution, I always forget about this particular "getline", though I do not believe it is aware of quotes and
escape sequences. – boskom May 27, 2010 at 13:32

@stijn: are you saying that split("one two three", ' '); returns a vector with 4 elements? I'm not sure
that is the case, but I'll test it. – Evan Teran Nov 9, 2010 at 15:45

wait, it seems the formatting removed some spaces (or I forgot them): I'm talking about the string "one two three"
with 2 spaces between "two" and "three" – stijn Nov 9, 2010 at 18:54

2 I liked this solution, however, I wrapped the function in a template, changing the vectors std::string template
parameter into a parameter. For me, I also used boost::lexical_cast on said template parameter in the push_back.
– Kit10 Aug 9, 2012 at 19:30

How can I modify it to work with std::wstring, std::getline won't work right? – キキジキ Nov 19, 2012 at 9:09

1 std::getline is templated, so it may "just work", if not see
en.cppreference.com/w/cpp/string/basic_string/getline to figure out how to tweak it. Passing a wchar_t character
as the delim may be enough to trigger the right template. – Evan Teran Nov 19, 2012 at 16:29

if you are enabling return value optimization, can't you make the function to return void? – Rozuur Jul 10, 2013 at
14:52

96 To make it skip empty tokens, do an empty() check: if (!item.empty()) elems.push_back(item)
– David G Nov 9, 2013 at 22:33
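
A minimal sketch of the empty() check from the comment above, folded into a single function. The name `split_skip_empty` is hypothetical; the body is otherwise the same getline loop as in the answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of the answer's split() that drops empty tokens.
std::vector<std::string> split_skip_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        if (!item.empty()) {  // skip the empty token between "::"
            elems.push_back(item);
        }
    }
    return elems;
}
```

With this check, "one:two::three" split on ':' should give three items instead of four.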

13 How about the delim contains two chars as -> ? – herohuyongtao Dec 26, 2013 at 8:15

9 @herohuyongtao, this solution only works for single char delimiters. – Evan Teran Dec 27, 2013 at 6:11

1 @Copperpot How did you do it in a template? – loop Jan 12, 2014 at 23:02

2 @EvanTeran This may not be about splitting the string but a general doubt about your code: the elems you are
passing as a reference argument and returning the reference again. I just wanted to know is there any reason for
that? – duslabo Jan 25, 2014 at 17:27

4 @JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this:
f(split(s, d, v)) while still having the benefit of a pre-allocated vector if you like. – Evan Teran Jan 25,
2014 at 17:50

10 Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value. – dshin Sep 9, 2015 at
19:04

3 almost perfect: split(":abc:def:", ':'); returns only 3 instead of 4 elements! – fmuecke Sep 9, 2015 at
20:31
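
The caveat in the two comments above comes from std::getline reporting failure when nothing is left after the final delimiter. One possible workaround, suggested further down in these comments, is to append the delimiter before reading and to treat an empty input as an empty list. The sketch below assumes that behaviour is what you want; `split_keep_trailing` is a made-up name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant that keeps an empty token after a trailing delimiter.
std::vector<std::string> split_keep_trailing(const std::string& s, char delim) {
    std::vector<std::string> elems;
    if (s.empty()) {
        return elems;  // special case: empty input gives an empty list
    }
    // Appending one delimiter turns the final (possibly empty) field into
    // a delimited field, so getline reports it instead of hitting EOF.
    std::istringstream iss(s + delim);
    std::string item;
    while (std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
```

Under these assumptions "a,b," and "a,b" no longer give the same result, at the cost of one extra string copy.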

Being able to set max number of returned elements is crucial to me. – Jonny Oct 29, 2015 at 1:25

1 @Jonny, should be trivial, just add an extra condition to the while loop comparing the vector 's size to the max.
Something like this: while (elems.size() < max_count && std::getline(ss, item, delim)) {
– Evan Teran Oct 29, 2015 at 5:57

@Jonny, I see. Your answer looks a bit more complex than necessary. If you make the max default to something
like size_t(-1) , that will effectively be "infinity" (it's the biggest size your system can represent, so you'll run out
of RAM before you hit this). Then you can make the condition as simple as my comment above. No more need to
double check the stream state and do a second read and such. Just a suggestion :-). – Evan Teran Oct 29, 2015
at 6:02
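
For illustration, the max-count idea from the two comments above could look roughly like this. The function name `split_max` and the parameter name `max_count` are assumptions; the default of size_t(-1) is the "effectively infinity" value mentioned in the comment.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant: stops after max_count tokens and ignores the rest.
std::vector<std::string> split_max(const std::string& s, char delim,
                                   std::size_t max_count = static_cast<std::size_t>(-1)) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    // The size check comes first, so no extra read happens once the
    // limit has been reached.
    while (elems.size() < max_count && std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
```

Note that this version discards whatever follows the last returned token; it does not put the remainder of the string into the final element.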

Might be wrong but you might lose the end of the string with that. Well basically I mimic the explode function of
php, or so I believe. – Jonny Oct 29, 2015 at 6:08

Gotcha. My solution will stop at max_count , skipping the rest of the string (since it found the amount it wanted). I
guess you are looking for something that will always make the last one the rest of the string. I have some functions
like that too here: github.com/eteran/cpp-utilities/blob/master/string.h Some are specifically designed to match
php's string manipulation functions as closely as possible :-) – Evan Teran Oct 29, 2015 at 6:21

Why not return split(s, delim, std::vector<std::string>()); ? – Gabriel Oct 29, 2015 at 19:53

2 @Gabriel, you could. But I think when it was written (a few years ago), having a named variable encouraged
NRVO more reliably. With C++11 move semantics, it may be a lot less of a difference. – Evan Teran Oct 30, 2015
at 3:16

be aware that if you are using OpenCV, split can be confused with split from OpenCV that splits images. – Diedre
Jun 20, 2017 at 16:07

3 I really wish they'd add a standard method with this signature: vector<string> std::string::split(char
delimiter = ' '); – doctorram Feb 2, 2018 at 22:26

@loop See gitlab.com/tbeu/wcx_setfolderdate/blob/master/src/splitstring.h for a templated implementation. – tbeu
Jul 7, 2019 at 20:56

@tbeu fixing your link: gitlab.com/tbeu/wcx_setfolderdate/-/blob/master/src/… – luizfls Mar 20, 2020 at 4:17

As others noted this does not correctly handle empty strings at the end. (This is not a matter of definition since
"a,b," and "a,b" both give the same result.) This can be fixed by initializing iss with s + delim and handling the
special case that an empty string should return an empty list explicitly. – Johannes Overmann Nov 11, 2021 at
23:58

For what it's worth, here's another way to extract tokens from an input string, relying only on standard
library facilities. It's an example of the power and elegance behind the design of the STL.
1509
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container,
using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};

Share Improve this answer Follow edited Aug 2, 2019 at 11:18 community wiki
9 revs, 9 users 70%
Zunino

178 Is it possible to specify a delimiter for this? Like for instance splitting on commas? – l3dx Aug 6, 2009 at 11:49

7 @l3dx: it seems that the parameter "\n" is the delimiter. This code is very nice, but I would like to know better
about it. Maybe somebody could explain each line of that snippet? – Jonathan Dec 11, 2009 at 17:30

17 @Jonathan: \n is not the delimiter in this case, it's the delimiter for outputting to cout. – huy Feb 3, 2010 at 12:37

1 based on this: cplusplus.com/reference/algorithm/copy no. The whitespace behavior is a function of the
istream_iterator. It would be more elegant to roll your own. – Wayne Werner Aug 4, 2010 at 17:59

5 @graham.reeds, @l3dx: Please don't write another CSV parser which can't handle quoted fields:
en.wikipedia.org/wiki/Comma-separated_values – Douglas Sep 1, 2010 at 9:30

803 This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable.
– SmallChess Jan 10, 2011 at 3:57

12 To people asking how this works: equivalent code using less of the STL would look like string token;
istringstream iss(sentence); while (iss >> token) { cout << token; } or {
tokens.push_back(token); } – user470379 Feb 7, 2011 at 5:11

Why do I get "error C2664: 'std::back_inserter' : cannot convert parameter 1 from 'std::vector<_Ty> (__cdecl *)
(void)' to 'std::vector<_Ty> &'" in VS2008? – szx Apr 17, 2011 at 10:22

The template argument to back_inserter should be string , not vector<string> . That is, it should be
back_inserter<string>(tokens) , not back_inserter<vector<string>>(tokens) . – Sarfaraz Nawaz
May 27, 2012 at 14:56

2 Take a look at ranges if you care about elegance in practical terms (i.e. do more with less code):
slideshare.net/rawwell/iteratorsmustgo – Alexei Sholik Oct 17, 2012 at 18:27

44 Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype
facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the
stringstream with that locale before extracting strings. – Jerry Coffin Dec 19, 2012 at 20:30
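
A rough sketch of the facet approach described in the comment above, assuming a comma should count as whitespace in addition to the usual blanks. The names `comma_is_space` and `split_on_commas_and_space` are invented for this example; std::ctype<char>, classic_table(), table_size and imbue() are the standard library pieces it relies on.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: a ctype<char> whose table also marks ',' as space.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Start from the classic "C" table, then add ',' to the space class.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        return table.data();
    }
};

// Hypothetical helper: extracts tokens separated by commas or whitespace.
std::vector<std::string> split_on_commas_and_space(const std::string& s) {
    std::istringstream iss(s);
    // The locale takes ownership of the facet and deletes it when unused.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    std::vector<std::string> tokens;
    std::string token;
    while (iss >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Because operator>> skips runs of whitespace, consecutive commas produce no empty tokens with this approach.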

1 The main purpose of istream_iterator is it can parse int, float, double, etc from an istream:
istream_iterator<double> does a decent job reading doubles separated by space. With a front or especially back
inserter it's a great combo! :) – Oleg Jan 11, 2013 at 2:48

5 vector has a ctor that takes a begin and end iterator, so no need for the copy call to insert them into a
container. – legends2k Jan 13, 2013 at 18:41

67 @Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm,
doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one.
– Christian Rau Feb 7, 2013 at 15:08

@Nawaz Why should it? You're inserting into a std::vector<std::string> and not into a std::string .
But then again, there shouldn't be an explicit template argument, anyway (well, there shouldn't even be a
back_inserter or copy , but ok). – Christian Rau Feb 7, 2013 at 15:12

@ChristianRau: Oh you're right; the first code-snippet probably confused me. Actually I should have said you
don't need to mention the template argument in std::back_inserter ; in fact, mentioning template argument
defies the very purpose of back_inserter . – Sarfaraz Nawaz Feb 7, 2013 at 16:30

2 why do you need to use curly brackets in vector<string> tokens{istream_iterator<string>{iss},
istream_iterator<string>{}}; is it because otherwise it looks like function call? – stewart99 Jan 7, 2014 at
5:06

Questions: 1. why would istream_iterator stop at white spaces? For me spaces are also part of the string;
2. why is it very inefficient? – Ziyuan Apr 22, 2015 at 12:23

14 The elegance in needing 5 includes, 3 lines (not counting using <namespace>) and quite cryptic code to... split
a string? dear god. – Michahell Apr 22, 2015 at 15:42

1 We could also have used STL to split a string. – Moiz Sajid Aug 30, 2015 at 11:31

This is much faster than Evan Teran's answer if you only need to split on whitespace. – noɥʇʎԀʎzɐɹƆ Jul 7, 2016
at 15:23

While the missing delimiter concern is correct one should take into account that the OPs solution couldn't handle
that either. So this seems to be not a requirement. – exilit Jul 21, 2016 at 20:40

@doorfly The only place where curly brackets are needed is istream_iterator<string>{} , because that
would otherwise be regarded as a function. – Seppo Enarvi Feb 28, 2017 at 20:31

If using wstring and your code breaks, check this answer for fixing the istream_iterator usage with wchar_t :
stackoverflow.com/a/20959347/3543437 – kayleeFrye_onDeck Jul 3, 2018 at 20:44

@l3dx Yes. You can add a specialized locale to the stream that makes a , a space (and all other characters not a
space). Then the code will work just the same. codereview.stackexchange.com/a/57467/507 – Martin York Feb
20, 2019 at 21:30

1 This code could really use some comments to explain what the purpose of every item is. A typical person asking
this question is only going to end up with more questions after reading this, e.g. what the purpose of the empty
istream_iterator is, or why the "create the vector directly" solution has so many brackets. – user2088639 Oct 14,
2019 at 21:17

1 I don't think there is any power or elegance in this, compared to just std::string::split() . Of course there
is not such split in STL – tjysdsg May 14, 2020 at 12:00

You can set the delimiter of istringstream stackoverflow.com/a/21814768/1943599 – Mellester Jul 2, 2020 at
17:44

A possible solution using Boost might be:

874
#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>

std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic
template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of
delimiters.

See the documentation for details.

Share Improve this answer Follow edited Aug 3, 2015 at 23:20 community wiki
3 revs, 3 users 67%
ididak

37 Speed is irrelevant here, as both of these cases are much slower than a strtok-like function. – Tom Mar 1, 2009 at
16:51

4 This is practical and quick enough if you know the line will contain just a few tokens, but if it contains many then
you will burn a ton of memory (and time) growing the vector. So no, it's not faster than the stringstream solution --
at least not for large n, which is the only case where speed matters. – j_random_hacker Aug 24, 2009 at 9:02

55 And for those who don't already have boost... bcp copies over 1,000 files for this :) – Roman Starkov Jun 9, 2010
at 20:12

14 Warning, when given an empty string (""), this method returns a vector containing the "" string. So add an "if
(!string_to_split.empty())" before the split. – Offirmo Oct 11, 2011 at 13:10

29 @Ian Embedded developers aren't all using boost. – ACK_stoverflow Jan 31, 2012 at 18:23

12 @ACK_stoverflow are embedded developers using C++ anyway? – WDRust Feb 25, 2012 at 8:10

3 bcp 'ing this brings forth libraries such as the MPL, which I think is really hardly needed to split text. Man it is a
PITA... – Luis Machuca Mar 20, 2012 at 18:24

3 @j_random_hacker: "at least not for large n, which is the only case where speed matters" - also for smallish n in a
large-n loop... – Tony Delroy Apr 11, 2012 at 2:29

6 @tuxSlayer: various POSIX/XOPEN/UNIX standards also specify strtok_r – Tony Delroy Apr 11, 2012 at 2:31

1 @TonyDelroy: Yeah, and it looks like in msvc it is called strtok_s (meaning safe?:)). Not too portable... – tuxSlayer
Apr 11, 2012 at 8:05

1 @tuxSlayer: if you'd prefer to write your own implementation instead of have a five line #if / #else / #endif
then knock yourself out.... – Tony Delroy Jun 6, 2012 at 15:43

1 Use std::string::find(..) and std::string::substr(..) no need to use boost. – Nils Jul 3, 2012 at 12:49

1 actually in our company we are not allowed to use boost due to security, yeah i know but suits have decided.
– AndersK Aug 27, 2012 at 6:20

33 as an addendum: I use boost only when I must, normally I prefer to add to my own library of code which is
standalone and portable so that I can achieve small precise specific code, which accomplishes a given aim. That
way the code is non-public, performant, trivial and portable. Boost has its place but I would suggest that it's a bit of
overkill for tokenising strings: you wouldn't have your whole house transported to an engineering firm to get a new
nail hammered into the wall to hang a picture.... they may do it extremely well, but the pros are by far outweighed
by the cons. – GMasucci May 22, 2013 at 8:19

1 nice it even works for calling of boost framework in xcode (iOS project) in cpp class – user2083364 Aug 21, 2013
at 9:48

6 My personal opinion is that C and C++ are languages not meant to be agile or to provide fast to market solutions,
using Boost is almost the same as choosing a higher level language that offers more abstraction, for those we
choose Java, C#, etc... Because for those we don't care for exactly what it's doing beneath the hood. Using Boost
would also mean that I would have to tell my client that I'm including a third party library. Thanks anyway. :) – Tiago
May 4, 2015 at 10:55

Can the boost::split really work on the utf-8 string? Can you share any documentation for that? I am trying to split a
utf-8 string at newlines. Will the boost::split work correctly if the string that I pass is using utf-8 encoding? – sajas
Mar 2, 2016 at 12:58

@Andrew: any_of has been part of the standard library since 2011:
en.cppreference.com/w/cpp/algorithm/all_any_none_of – graham.reeds May 24, 2017 at 10:56

413
#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;            // Have a buffer string
    std::stringstream ss(str);  // Insert the string into a stream

    std::vector<std::string> tokens;  // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}

Share Improve this answer Follow edited May 19, 2018 at 10:01 community wiki
2 revs, 2 users 82%
kev

25 You can also split on other delimiters if you use getline in the while condition e.g. to split by commas, use
while(getline(ss, buf, ',')) . – Ali Oct 6, 2018 at 20:20

1 I don't understand how this got 400 upvotes. This is basically the same as in OQ: use a stringstream and >> from
it. Exactly what OP did even in revision 1 of the question history. – Thomas Weller Oct 17, 2022 at 16:46

An efficient, small, and elegant solution using a template function:

201
template <class ContainerT>
void split(const std::string& str, ContainerT& tokens,
           const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();

    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
            pos = length;

        if (pos != lastPos || !trimEmpty)
            tokens.emplace_back(value_type(str.data() + lastPos,
                                           (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}

I usually choose to use std::vector<std::string> types as my second parameter ( ContainerT )... but
list<...> may sometimes be preferred over vector<...> .

It also allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it requires is std::string included via <string> . It does not use streams or the boost library
explicitly but will be able to accept some of these types.

Also, since C++17 you can use std::vector<std::string_view>, which avoids copying each token and is typically
faster and more memory-efficient than using std::string . Here is a revised version which also supports the container
as a return type:

#include <vector>
#include <string_view>
#include <utility>

template < typename StringT,
           typename DelimiterT = char,
           typename ContainerT = std::vector<std::string_view> >
ContainerT split(StringT const& str, DelimiterT const& delimiters = ' ',
                 bool trimEmpty = true, ContainerT&& tokens = {})
{
    typename StringT::size_type pos, lastPos = 0, length = str.length();

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == StringT::npos)
            pos = length;

        if (pos != lastPos || !trimEmpty)
            tokens.emplace_back(str.data() + lastPos, pos - lastPos);

        lastPos = pos + 1;
    }

    return std::forward<ContainerT>(tokens);
}

Care has been taken not to make any unneeded copies.

This will allow for either:

for (auto const& line : split(str, '\n'))

Or:

auto lines = split(str, '\n');

Both returning the default template container type of std::vector<std::string_view> .

To get a specific container type back, or to pass an existing container, use the tokens input parameter
with either a typed initial container or an existing container variable:

auto lines = split(str, '\n', false, std::vector<std::string>());


Or:

std::vector<std::string> lines;
split(str, '\n', false, lines);
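
To make the string_view idea easier to try out, here is a simplified, non-template sketch of the same loop. It is not the answer's function: the name `split_views` and its parameters are assumptions, and it always returns std::vector<std::string_view>.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper. The returned views point into `text`, so the
// underlying character data must outlive the result.
std::vector<std::string_view> split_views(std::string_view text,
                                          std::string_view delims = " ",
                                          bool trim_empty = true) {
    std::vector<std::string_view> tokens;
    std::size_t last = 0;
    while (last <= text.size()) {
        std::size_t pos = text.find_first_of(delims, last);
        if (pos == std::string_view::npos) {
            pos = text.size();  // no more delimiters: token runs to the end
        }
        if (pos != last || !trim_empty) {
            tokens.push_back(text.substr(last, pos - last));  // no copy made
        }
        last = pos + 1;
    }
    return tokens;
}
```

The main thing to watch is lifetime: splitting a temporary std::string and keeping the views afterwards would leave them dangling.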

Share Improve this answer Follow edited Jun 14, 2023 at 19:12 community wiki
18 revs, 5 users 90%
Marius

5 I'm quite a fan of this, but for g++ (and probably good practice) anyone using this will want typedefs and
typenames: typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef
typename ValueType::size_type SizeType; Then to substitute out the value_type and size_types
accordingly. – user199973 Nov 28, 2011 at 21:41

13 For those of us for whom the template stuff and the first comment are completely foreign, a usage example
complete with required includes would be lovely. – Wes Miller Aug 17, 2012 at 11:51

3 Ahh well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited
the tokens.push_back() lines to change the ContainerT::value_type to just ValueType and changed
(ContainerT::value_type::size_type) to (SizeType). Fixed the bits g++ had been whining about. Just invoke it as
tokenize( some_string, some_vector ); – Wes Miller Aug 17, 2012 at 14:23

2 Apart from running a few performance tests on sample data, primarily I've reduced it to as few as possible
instructions and also as little as possible memory copies enabled by the use of a substring class that only
references offsets/lengths in other strings. (I rolled my own, but there are some other implementations).
Unfortunately there is not too much else one can do to improve on this, but incremental increases were possible.
– Marius Nov 29, 2012 at 14:50

3 That's the correct output for when trimEmpty = true . Keep in mind that "abo" is not a delimiter in this
answer, but the list of delimiter characters. It would be simple to modify it to take a single delimiter string of
characters (I think str.find_first_of should change to str.find , but I could be wrong... can't test)
– Marius Aug 28, 2015 at 15:24

1 I had some issues initially, but this does in fact work with wstring / unicode if you update the template
accordingly. Be careful though; i ran into some easy to cause runtime errors that the compiler didn't catch in a
couple different places. – kayleeFrye_onDeck Jun 8, 2018 at 0:24

Your code is not working! Try string = "hih1ihi", substring = "hi". Your code is not giving the correct result. minus.
– Optimus1 Nov 17, 2021 at 14:57

1 @Optimus1 I think you assumed the delimiters parameter is not a character list of delimiters but rather a
substring. Therein lies the rub. – Marius Mar 23, 2022 at 18:30
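
The difference discussed in the last two comments can be shown side by side. In this sketch, `split_any_of` treats its second argument as a set of delimiter characters (via find_first_of), while `split_substring` treats it as one multi-character delimiter (via find). Both names are hypothetical, and neither function is the answer's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Every character in `chars` is a delimiter; empty tokens are dropped.
std::vector<std::string> split_any_of(const std::string& text,
                                      const std::string& chars) {
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(chars);
    while (start != std::string::npos) {
        std::size_t end = text.find_first_of(chars, start);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(chars, end);
    }
    return tokens;
}

// `sep` is a single multi-character delimiter; empty tokens are kept.
std::vector<std::string> split_substring(const std::string& text,
                                         const std::string& sep) {
    std::vector<std::string> tokens;
    if (sep.empty()) {  // nothing to split on
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
```

For the input "hih1ihi" with "hi", the character-set version should leave only "1", whereas the substring version should give an empty string, "h1i", and another empty string.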

Here's another solution. It's compact and reasonably efficient:

178
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}

It can easily be templatised to handle string separators, wide strings, etc.


Note that splitting "" results in a single empty string and splitting "," (i.e. sep) results in two empty
strings.
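
As a sketch of what "templatised" could mean here, the version below is parameterised on the character type and takes a string separator. The name `split_generic` is an assumption, and the empty-separator rule (return the whole text as one token) is a choice made for this example, not something the answer specifies.

```cpp
#include <string>
#include <vector>

// Hypothetical generic variant: works for std::string, std::wstring, etc.
template <typename CharT>
std::vector<std::basic_string<CharT>> split_generic(
        const std::basic_string<CharT>& text,
        const std::basic_string<CharT>& sep) {
    using string_type = std::basic_string<CharT>;
    std::vector<string_type> tokens;
    if (sep.empty()) {  // assumed behaviour: nothing to split on
        tokens.push_back(text);
        return tokens;
    }
    typename string_type::size_type start = 0, end = 0;
    while ((end = text.find(sep, start)) != string_type::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();  // skip the whole separator
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
```

Both arguments have to be the same string type for template deduction to work, so string literals need wrapping, for example std::wstring(L",").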

It can also be easily expanded to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    // end is npos here, so test the remaining length instead of end != start
    if (start < text.size()) {
        tokens.push_back(text.substr(start));
    }
    return tokens;
}

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be
used:

std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;

    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));

    return tokens;
}

Share Improve this answer Follow edited Oct 4, 2016 at 22:33 community wiki
13 revs, 7 users 48%
Alec Thomas

10 The first version is simple and gets the job done perfectly. The only change I would made would be to return the
result directly, instead of passing it as a parameter. – gregschlom Jan 19, 2012 at 2:25

3 The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the
vector, or a heap allocation which would then have to be freed. – Alec Thomas Feb 6, 2012 at 18:56

2 A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move
semantics. – Alec Thomas Jun 27, 2013 at 1:20

7 @AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1
anyway; very succinct) – Marcelo Cantos Aug 17, 2013 at 11:54

@Peter M I would rather have it be passed in by reference, just in case the vector<string> got large.
– Alex Spencer Nov 15, 2013 at 19:43

1 @Veritas In what way does it not work if the delimiter is the last character? Also, outputting empty tokens is
intentional, though it could obviously be easily modified to not do that if required. – Alec Thomas Apr 8, 2014 at
15:36

13 Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a
delimiter, although it's a less obvious solution. Does the c++11 standard not have anything for this? Does c++11
support punch cards these days? – Spacen Jasset Aug 11, 2015 at 15:15

If you pass in an empty string, it returns a vector with 1 element (empty string). If you pass in a string that's the
same as sep, then it returns a vector with 2 elements (both empty strings). Should have "if (end > 0) {" before the
push_back in while loop and "if (start > 0) {" before push_back below while loop to fix this. – CodeSmile Sep 26,
2015 at 17:12

3 @LearnCocos2D Please don't alter the meaning of a post with an edit. This behaviour is by design. It is identical
behaviour to Python's split operator. I'll add a note to make this clear. – Alec Thomas Sep 27, 2015 at 21:50

3 Suggest using std::string::size_type instead of int, as some compilers might spit out signed/unsigned warnings
otherwise. – Pascal Kesseli Nov 1, 2015 at 20:45

1 the first function in this answer is the best solution - works perfectly with a reverse join function - std::string
strJoin(const std::vector<std::string> v, const char& delimiter) { if(!v.empty()) { std::stringstream ss;
std::string str(1, delimiter); auto it = v.cbegin(); while(true) { ss << *it++; if(it != v.cend()) ss << delimiter;
else return ss.str(); } } return ""; } – Roman Shestakov Dec 16, 2017 at 15:45
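
For readers who want the reverse operation in a more readable form, here is a plain-string sketch of a join. It is an alternative written for illustration rather than a cleanup of the commenter's stringstream code, and the name `join` is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenates `parts`, placing `delimiter` between
// consecutive elements (not before the first or after the last).
std::string join(const std::vector<std::string>& parts, char delimiter) {
    std::string result;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            result += delimiter;
        }
        result += parts[i];
    }
    return result;
}
```

Since the first split() in this answer keeps empty tokens, joining its output with the same separator should reproduce the original text.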

This is my favorite way to iterate through a string. You can do whatever you want per word.

143
string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while( iss >> word )
{
    // Do something on `word` here...
}

Share Improve this answer Follow edited Apr 12, 2018 at 11:37 community wiki
4 revs, 2 users 86%
gnomed

1 Is it possible to declare word as a char ? – abatishchev Jun 26, 2010 at 17:23

1 Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop
through every character in each word. But right now I believe the current loop depends on spaces for word
separation. Unless you know that there is only a single character between every space, in which case you can just
cast "word" to a char... sorry I cant be of more help, ive been meaning to brush up on my C++ – gnomed Jun 30,
2010 at 22:18

12 if you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try:
stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;
– Wayne Werner Aug 4, 2010 at 18:03
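
The behaviour described in the comment above can be checked with a tiny helper; `non_space_chars` is a hypothetical name. Extracting into a char skips whitespace by default, so only the non-whitespace characters are seen.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the non-whitespace characters of `s`
// in order, as seen by formatted extraction into a char.
std::string non_space_chars(const std::string& s) {
    std::istringstream iss(s);
    std::string out;
    char c;
    while (iss >> c) {  // operator>> skips leading whitespace each time
        out += c;
    }
    return out;
}
```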

1 I don't understand how this got 140 upvotes. This is basically the same as in OQ: use a stringstream and >> from
it. Exactly what OP did even in revision 1 of the question history. – Thomas Weller Oct 17, 2022 at 16:46

This is similar to Stack Overflow question How do I tokenize a string in C++?. It requires the external Boost
library.
89
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}

Share Improve this answer Follow edited Oct 10, 2021 at 16:39 community wiki
6 revs, 5 users 76%
Ferruccio

Does this materialize a copy of all of the tokens, or does it only keep the start and end position of the current
token? – einpoklum Apr 9, 2018 at 19:47

I like the following because it puts the results into a vector, supports a string as the delimiter, and gives
control over keeping empty values. It is not as pretty, though.
#include <iostream> // cout
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        // Find the next occurrence of the whole delimiter string.
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}

Of course, Boost has a split() that works partially like that. And, if by 'white-space', you really do
mean any type of white-space, using Boost's split with is_any_of() works great.
Share Improve this answer Follow edited Jan 8, 2017 at 4:33 community wiki
3 revs, 2 users 70%
Shadow2531

Finally a solution that is handling empty tokens correctly at both sides of the string – fmuecke Sep 9, 2015 at 20:38

The STL does not have such a method available already.

However, you can either use C's strtok() function by using the std::string::c_str() member, or you
can write your own. Here is a code sample I found after a quick Google search ("STL string split"):

void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at the beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after that (the end of the first token).
    string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter (the end of the next token).
        pos = str.find_first_of(delimiters, lastPos);
    }
}

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.

And just because it does not implement a typedef called iterator or overload the << operator does not
mean it is bad code. I use C functions quite frequently. For example, printf and scanf both are faster
than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types,
and they also tend to produce smaller EXEs.

Don't get sold on this "Elegance over performance" deal.

Share Improve this answer Follow edited Apr 12, 2018 at 11:35 community wiki
3 revs, 2 users 82%
user19302

I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my
question). However, for this specific question, I'm looking for an elegant C++ solution. – Ashwin Nanjappa Oct 25,
2008 at 9:16

... and you dont want to just build a OO wrapper over the C functions why? – user19302 Oct 25, 2008 at 9:42

11 @Nelson LaQuet: Let me guess: Because strtok is not reentrant? – paercebal Oct 25, 2008 at 9:52

Why not use the C++ features that are meant for this job? – graham.reeds Oct 25, 2008 at 11:54

44 @Nelson don't ever pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each
found delimiter) and c_str() returns a non-modifiable string. – Evan Teran Oct 25, 2008 at 18:19

char* ch = new char[str.size()]; strcpy(ch, str.c_str()); ... delete[] ch; // problem solved. – user19302 Oct 26, 2008 at
0:20

3 @Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's
silly to avoid C functions for "aesthetic" reasons. – j_random_hacker Aug 24, 2009 at 9:08
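The point made in the comments above (strtok needs a writable buffer of size() + 1 characters, never c_str() itself) can be sketched as follows. The helper name `strtok_split` is hypothetical, and std::strtok is still not safe to call from several threads at once.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a private, writable copy of `s` with std::strtok,
// so the caller's string is never modified.
std::vector<std::string> strtok_split(const std::string& s, const char* delims) {
    // size() + 1 leaves room for the terminating '\0' that strtok relies on.
    std::vector<char> buf(s.c_str(), s.c_str() + s.size() + 1);
    std::vector<std::string> out;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        out.emplace_back(tok);
    }
    return out;
}
```

Using std::vector<char> for the copy avoids the manual new[]/delete[] pair. Note that strtok treats a run of delimiters as a single delimiter, so empty tokens are never produced.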

"For example, printf and scanf both are faster then cin and cout" only because synchronization is enabled by
default – paulm May 12, 2014 at 19:13

"The STL does not have such a method available already" - what's wrong with string's find_first_of and using
iterators to remember positions? Then, use substr to extract. – jww Sep 26, 2014 at 0:44

2 @paulm: No, the slowness of C++ streams is caused by facets. They're still slower than stdio.h functions even
when synchronization is disabled (and on stringstreams, which can't synchronize). – Ben Voigt Apr 12, 2015 at
23:55

Or you could use strsep() (though not as portable). If you don't care about more than one char as the delimiter
another answer gives an idea ( getdelim() ) but you could also iterate over the string with strchr() . Or...there
are many ways depending on what you are after and need. – Pryftan Jun 13, 2020 at 18:22

Here is a split function that:

is generic

uses standard C++ (no boost)

accepts multiple delimiters

ignores empty tokens (can easily be changed)

template<typename T>
vector<T>
split(const T & str, const T & delimiters) {
    vector<T> v;
    typename T::size_type start = 0;
    auto pos = str.find_first_of(delimiters, start);
    while(pos != T::npos) {
        if(pos != start) // ignore empty tokens
            v.emplace_back(str, start, pos - start);
        start = pos + 1;
        pos = str.find_first_of(delimiters, start);
    }
    if(start < str.length()) // ignore trailing delimiter
        v.emplace_back(str, start, str.length() - start); // add what's left of the string
    return v;
}

Example usage:

vector<string> v = split<string>("Hello, there; World", ";,");


vector<wstring> v = split<wstring>(L"Hello, there; World", L";,");

Share Improve this answer Follow edited May 23, 2017 at 22:17 community wiki
6 revs
Marco M.

You forgot to add to use list: "extremely inefficient" – Xander Tulip Mar 19, 2012 at 0:20

1 @XanderTulip, can you be more constructive and explain how or why? – Marco M. Mar 21, 2012 at 11:57

3 @XanderTulip: I assume you are referring to it returning the vector by value. The Return-Value-Optimization (RVO,
google it) should take care of this. Also in C++11 you could return by move reference. – Joseph Garvin May 7, 2012
at 13:56

3 This can actually be optimized further: instead of .push_back(str.substr(...)) one can use .emplace_back(str, start,
pos - start). This way the string object is constructed in the container and thus we avoid a move operation + other
shenanigans done by the .substr function. – Mihai Bişog Sep 5, 2012 at 13:50

@zoopp yes. Good idea. VS10 didn't have emplace_back support when I wrote this. I will update my answer.
Thanks – Marco M. Sep 12, 2012 at 13:03

Can someone please make it return up to a max N elements? Any remaining characters should end up in the last
element. – Jonny Oct 29, 2015 at 1:28
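One possible way to cap the number of elements, as asked in the comment above, is sketched here. It is an illustration written for std::string only, not the answer author's code; the name `split_max` and its `max_parts` parameter are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: returns at most max_parts elements; once the limit is
// reached, the unsplit remainder of the string becomes the last element.
std::vector<std::string> split_max(const std::string& str,
                                   const std::string& delimiters,
                                   std::size_t max_parts) {
    std::vector<std::string> v;
    if (max_parts == 0) return v;
    // Skip leading delimiters, as the original ignores empty tokens.
    std::string::size_type start = str.find_first_not_of(delimiters);
    while (start != std::string::npos) {
        if (v.size() + 1 == max_parts) {
            v.push_back(str.substr(start));           // remainder, delimiters included
            break;
        }
        const std::string::size_type pos = str.find_first_of(delimiters, start);
        if (pos == std::string::npos) {
            v.push_back(str.substr(start));           // final token
            break;
        }
        v.push_back(str.substr(start, pos - start));
        start = str.find_first_not_of(delimiters, pos); // skip the delimiter run
    }
    return v;
}
```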

Anyone else getting the error "missing 'typename' prior to dependent type name 'T::size_type'"? – Daniel Ryan May
23, 2017 at 3:06

I have a two-line solution to this problem:

char sep = ' ';
std::string s="1 This is an example";

for(size_t p=0, q=0; p!=s.npos; p=q)
    std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;

Then instead of printing you can put it in a vector.
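A more readable sketch of the "put it in a vector" variant is shown below. It follows the same find/substr idea but is not a line-for-line equivalent of the loop above (for instance, it yields an empty first element when the input starts with the separator); the name `split_on` is made up for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on a single separator character and keeps
// empty fields between consecutive separators.
std::vector<std::string> split_on(const std::string& s, char sep) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = s.find(sep, start);
        // When end is npos, substr clamps the count to the rest of the string.
        out.push_back(s.substr(start, end - start));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return out;
}
```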

Share Improve this answer Follow edited Jan 15, 2013 at 0:12 community wiki
2 revs, 2 users 94%
rhomu

2 it's only a two-liner because one of those two lines is huge and cryptic... no one who actually has to read code ever,
wants to read something like this, or would write it. contrived brevity is worse than tasteful verbosity.
– underscore_d Nov 12, 2021 at 23:23

You can even make it a one liner by putting everything on a single line! Isn't that wonderful? – rhomu Jan 11, 2023
at 18:17

Yet another flexible and fast way

#include <cstring> // std::strchr

template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    const char* e = s;
    while (*e != 0) {
        e = s;
        // Advance e to the next delimiter or to the end of the input.
        while (*e != 0 && std::strchr(delimiters, *e) == 0) ++e;
        if (e - s > 0) {
            op(s, e - s);
        }
        s = e + 1;
    }
}

To use it with a vector of strings (Edit: Since someone pointed out not to inherit STL classes... hrmf ;) ) :

template<class ContainerType>
class Appender {
public:
Appender(ContainerType& container) : container_(container) {;}
void operator() (const char* s, unsigned length) {
container_.push_back(std::string(s,length));
}
private:
ContainerType& container_;
};

std::vector<std::string> strVector;
// Spell out the template argument; deducing it requires C++17.
Appender<std::vector<std::string> > v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer, like how to just count words:

class WordCounter {
public:
WordCounter() : noOfWords(0) {}
void operator() (const char*, unsigned) {
++noOfWords;
}
unsigned noOfWords;
};

WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t");
ASSERT( wc.noOfWords == 7 );

Limited by imagination ;)
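With C++11, a lambda can play the role of the functor. The sketch below restates the tokenizer so that it is self-contained (with a slightly different loop that never forms a pointer past the terminator); `words_via_lambda` is a hypothetical name, not part of the answer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Same idea as tokenize() above, restated so this sketch compiles on its own.
template <typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
    const char* s = input;
    while (*s != '\0') {
        const char* e = s;
        // Advance e to the next delimiter or to the end of the input.
        while (*e != '\0' && std::strchr(delimiters, *e) == nullptr) ++e;
        if (e != s) op(s, static_cast<std::size_t>(e - s));
        if (*e == '\0') break;
        s = e + 1;
    }
}

// Hypothetical helper: a lambda replaces the Appender functor.
std::vector<std::string> words_via_lambda(const char* input) {
    std::vector<std::string> out;
    auto append = [&out](const char* s, std::size_t n) { out.emplace_back(s, n); };
    tokenize(append, input, " \t");
    return out;
}
```

The lambda has to be a named variable here because tokenize takes its operator by non-const reference.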

Share Improve this answer Follow edited Sep 11, 2013 at 8:11 community wiki
2 revs
Robert

Nice. Regarding Appender note "Why shouldn't we inherit a class from STL classes?" – Andreas Spindler Sep
10, 2013 at 12:07

Here's a simple solution that uses only the standard regex library

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> Tokenize( const std::string str, const std::regex regex )
{
    using namespace std;

    vector<string> result;

    // -1 selects the text between matches, i.e. the tokens
    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;

    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) // token could be empty: check
            result.emplace_back( it->str() );
    }

    return result;
}

The regex argument allows splitting on multiple delimiters (spaces, commas, etc.)

I usually only split on spaces and commas, so I also have this default function:

std::vector<std::string> TokenizeDefault( const std::string str )
{
    using namespace std;

    regex re( "[\\s,]+" );

    return Tokenize( str, re );
}

The "[\\s,]+" matches one or more whitespace characters ( \\s ) or commas ( , ).

Note, if you want to split wstring instead of string ,

change all std::regex to std::wregex

change all sregex_token_iterator to wsregex_token_iterator

Note, you might also want to take the string argument by reference, depending on your compiler.

Share Improve this answer Follow edited Jun 24, 2015 at 9:31 community wiki
2 revs, 2 users 99%
dk123

This would have been my favourite answer, but std::regex is broken in GCC 4.8. They said that they implemented it
correctly in GCC 4.9. I am still giving you my +1 – mchiasson Aug 19, 2014 at 12:27

1 This is my favorite with minor changes: vector returned as reference as you said, and the arguments "str" and
"regex" passed by references also. thx. – QuantumKarl Oct 16, 2015 at 15:06

1 Raw strings are pretty useful while dealing with regex patterns. That way, you don't have to use the escape
sequences... You can just use R"([\s,]+)" . – Sam Feb 17, 2018 at 17:42
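Combining the two suggestions from the comments (pass the arguments by const reference, and write the pattern as a raw string), a variant could look like the sketch below. The name `tokenize_by_ref` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical variant of Tokenize that takes both arguments by const reference.
std::vector<std::string> tokenize_by_ref(const std::string& str, const std::regex& re) {
    std::vector<std::string> result;
    // -1 selects the text between matches, i.e. the tokens themselves.
    std::sregex_token_iterator it(str.begin(), str.end(), re, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() != 0) {   // skip the empty token a leading delimiter produces
            result.push_back(it->str());
        }
    }
    return result;
}

// Raw string literal: no doubled backslashes needed in the pattern.
const std::regex spaces_and_commas(R"([\s,]+)");
```

Since `str` is a reference, the caller must pass a string that outlives the call, which any named std::string or temporary argument does.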

Using std::stringstream as you have works perfectly fine and does exactly what you wanted. If you're
just looking for a different way of doing things, though, you can use std::find() / std::find_first_of()
and std::string::substr() .

Here's an example:

#include <iostream>
#include <string>

int main()
{
    std::string s("Somewhere down the road");
    std::string::size_type prev_pos = 0, pos = 0;

    while( (pos = s.find(' ', pos)) != std::string::npos )
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );

        std::cout << substring << '\n';

        prev_pos = ++pos;
    }

    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
    std::cout << substring << '\n';

    return 0;
}
Share Improve this answer Follow edited Apr 12, 2018 at 11:42 community wiki
2 revs, 2 users 81%
KTC

1 This only works for single character delimiters. A simple change lets it work with multicharacter: prev_pos = pos
+= delimiter.length(); – David Doria Feb 5, 2016 at 14:48
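The change suggested in the comment above can be sketched as a function that returns a vector instead of printing. The name `split_by` and the handling of an empty delimiter are assumptions made for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: same find/substr loop, with a multi-character delimiter.
std::vector<std::string> split_by(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> out;
    if (delimiter.empty()) {      // assumption: an empty delimiter means "no split"
        out.push_back(s);
        return out;
    }
    std::string::size_type prev_pos = 0, pos = 0;
    while ((pos = s.find(delimiter, pos)) != std::string::npos) {
        out.push_back(s.substr(prev_pos, pos - prev_pos));
        prev_pos = pos += delimiter.length();   // jump past the whole delimiter
    }
    out.push_back(s.substr(prev_pos));          // last piece
    return out;
}
```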

If you like to use Boost but want to use a whole string as the delimiter (instead of single characters, as in
most of the previously proposed solutions), you can use Boost's split_iterator .
Example code including convenient template:

#include <iostream>
#include <iterator> // back_inserter, ostream_iterator
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

template<typename OutputIterator>
inline void split(
    const std::string& str,
    const std::string& delim,
    OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;

    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
        iter!=It();
        ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}

int main(int argc, char* argv[])
{
    using namespace std;

    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));

    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}

Share Improve this answer Follow edited Feb 9, 2012 at 9:32 community wiki
3 revs, 2 users 71%
zerm

Here's a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few
syntax errors, but this is at least the general idea.)

#include <regex>
#include <string>
#include <vector>

using namespace std;

vector<string> split(const string& s){
    regex r ("\\w+"); // matches whole words (greedy, so no word fragments)
    sregex_iterator rit ( s.begin(), s.end(), r );
    sregex_iterator rend; // default-constructed end-of-sequence iterator
    vector<string> result;
    for (; rit != rend; ++rit)
        result.push_back(rit->str()); // each match is one word
    return result;
}

Share Improve this answer Follow answered Oct 29, 2012 at 16:15 community wiki
AJMansfield

Similar responses with maybe better regex approach: here, and here. – Brent Bradburn Dec 5, 2014 at 23:25

C++20 finally blesses us with a split function. Or rather, a range adapter. Godbolt link.

#include <iostream>
#include <ranges>
#include <string_view>

namespace ranges = std::ranges;
namespace views = std::views;

using str = std::string_view;

auto view =
    // Wrap the literal in a string_view: splitting the raw char array would
    // also include its terminating '\0' in the last token.
    str{"Multiple words"}
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return str(r.begin(), r.end());
    });

auto main() -> int {
    for (str &&sv : view) {
        std::cout << sv << '\n';
    }
}

Share Improve this answer Follow edited Mar 17, 2023 at 21:29 community wiki
2 revs, 2 users 91%
J. Willus

This is mostly the same as stackoverflow.com/a/54134243/6655648. – Pablo H Sep 24, 2021 at 14:31

There is a function named strtok (the code below uses strtok_r, its reentrant POSIX variant).

#include <string.h> // strtok_r (POSIX)
#include <string>
#include <vector>
using namespace std;

vector<string> split(char* str,const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str,delim,&saveptr);

    vector<string> result;

    while(token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL,delim,&saveptr);
    }
    return result;
}

Share Improve this answer Follow edited May 2, 2014 at 14:49 community wiki
3 revs, 2 users 91%
Pratik Deoghare

4 strtok is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the
input string. – Kevin Panko Jun 14, 2010 at 14:07

14 Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL
is passed, it remembers what pointer should be used. If a second thread calls strtok when another thread is still
processing, this char pointer will be overwritten, and both threads will then have incorrect results.
mkssoftware.com/docs/man3/strtok.3.asp – Kevin Panko Jun 14, 2010 at 17:27

1 as mentioned before strtok is unsafe and even in C strtok_r is recommended for use – systemsfault Jul 6, 2010 at
12:17

4 strtok_r can be used if you are in a section of code that may be accessed. this is the only solution of all of the
above that isn't "line noise", and is a testament to what, exactly, is wrong with c++ – Erik Aronesty Oct 10, 2011 at
18:04

Updated so there can be no objections on the grounds of thread safety from C++ wonks. – Erik Aronesty May 2,
2014 at 14:50

1 strtok is evil. It treats two delimiters as a single delimiter if there is nothing between them. – EvilTeach Aug 10,
2014 at 23:53

A for() loop looks better. Like this davekb.com/browse_programming_tips:strtok_r_example:txt – Yetti99 Jun 9,
2015 at 8:12

Using std::string_view and Eric Niebler's range-v3 library:

https://wandbox.org/permlink/kW5lwRCL1pxjp2pW
#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
#include "range/v3/algorithm.hpp"

int main() {
std::string s = "Somewhere down the range v3 library";
ranges::for_each(s
| ranges::view::split(' ')
| ranges::view::transform([](auto &&sub) {
return std::string_view(&*sub.begin(), ranges::distance(sub));
}),
[](auto s) {std::cout << "Substring: " << s << "\n";}
);
}

By using a range for loop instead of ranges::for_each algorithm:

#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"

int main()
{
std::string str = "Somewhere down the range v3 library";
for (auto s : str | ranges::view::split(' ')
| ranges::view::transform([](auto&& sub) { return
std::string_view(&*sub.begin(), ranges::distance(sub)); }
))
{
std::cout << "Substring: " << s << "\n";
}
}

Share Improve this answer Follow edited May 19, 2019 at 20:47 community wiki
2 revs, 2 users 66%
Porsche9II

Yepp, the range for based looks better - I agree – Porsche9II May 17, 2019 at 8:57

The stringstream can be convenient if you need to parse the string by non-space symbols:

string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;

istringstream iss(s);
getline(iss, dummy, ':');  // skip the "Name" label
getline(iss, name, ';');
getline(iss, dummy, ':');  // skip the " Spouse" label
getline(iss, spouse, ';');
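The same getline-with-delimiter idea extends to any number of fields. The following is a sketch that assumes the input has the form `key:value; key:value; ...`; the function name `parse_fields` is made up for this example.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "key:value;" pairs into a map.
std::map<std::string, std::string> parse_fields(const std::string& s) {
    std::map<std::string, std::string> fields;
    std::istringstream iss(s);
    std::string key, value;
    // std::ws discards the blank that follows each ';' before the next key is read.
    while (std::getline(iss >> std::ws, key, ':') && std::getline(iss, value, ';')) {
        fields[key] = value;
    }
    return fields;
}
```

The loop stops as soon as either getline call fails, which happens once the stream is exhausted.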

Share Improve this answer Follow edited Jun 22, 2015 at 17:02 community wiki
2 revs, 2 users 95%
lukmac

So far I have used the one in Boost, but I needed something that doesn't depend on it, so I came up with this:

static void Split(std::vector<std::string>& lst, const std::string& input,
    const std::string& separators, bool remove_empty = true)
{
std::ostringstream word;
for (size_t n = 0; n < input.size(); ++n)
{
if (std::string::npos == separators.find(input[n]))
word << input[n];
else
{
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
word.str("");
}
}
if (!word.str().empty() || !remove_empty)
lst.push_back(word.str());
}

A good point is that in separators you can pass more than one character.

Share Improve this answer Follow edited May 22, 2011 at 23:02 community wiki
3 revs, 2 users 64%
Goran
Short and elegant

#include <vector>
#include <string>
using namespace std;

vector<string> split(string data, string token)
{
vector<string> output;
size_t pos = string::npos; // size_t to avoid improbable overflow
do
{
pos = data.find(token);
output.push_back(data.substr(0, pos));
if (string::npos != pos)
data = data.substr(pos + token.size());
} while (string::npos != pos);
return output;
}

can use any string as delimiter, also can be used with binary data (std::string supports binary data,
including nulls)

using:

auto a = split("this!!is!!!example!string", "!!");

output:

this
is
!example!string

Share Improve this answer Follow edited Jul 14, 2016 at 20:17 community wiki
2 revs, 2 users 98%
user1438233

1 I like this solution because it allows the separator to be a string and not a char, however, it is modifying in place the
string, so it is forcing the creation of a copy of the original string. – Alessandro Teruzzi Aug 1, 2016 at 15:30

I've rolled my own using strtok and used boost to split a string. The best method I have found is the C++
String Toolkit Library. It is incredibly flexible and fast.
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace = " \t\r\n\f";


const char *whitespace_and_punctuation = " \t\r\n\f;,=";

int main()
{
{ // normal parsing of a string into a vector of strings
std::string s("Somewhere down the road");
std::vector<std::string> result;
if( strtk::parse( s, whitespace, result ) )
{
for(size_t i = 0; i < result.size(); ++i )
std::cout << result[i] << std::endl;
}
}

{ // parsing a string into a vector of floats with other separators


// besides spaces

std::string s("3.0, 3.14; 4.0");


std::vector<float> values;
if( strtk::parse( s, whitespace_and_punctuation, values ) )
{
for(size_t i = 0; i < values.size(); ++i )
std::cout << values[i] << std::endl;
}
}

{ // parsing a string into specific variables

std::string s("angle = 45; radius = 9.9");


std::string w1, w2;
float v1, v2;
if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
{
std::cout << "word " << w1 << ", value " << v1 << std::endl;
std::cout << "word " << w2 << ", value " << v2 << std::endl;
}
}

return 0;
}

The toolkit has much more flexibility than this simple example shows but its utility in parsing a string into
useful elements is incredible.

Share Improve this answer Follow answered Jan 7, 2014 at 20:28 community wiki
DannyK

This answer takes the string and puts it into a vector of strings. It uses the boost library.

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

Share Improve this answer Follow answered Dec 9, 2017 at 21:14 community wiki
NL628

I made this because I needed an easy way to split strings and C-based strings. Hopefully someone else
can find it useful as well. Also, it doesn't rely on tokens, and you can use fields as delimiters, which is
another key I needed.

I'm sure there are improvements that can be made to even further improve its elegance, and please do
by all means.

StringSplitter.hpp:

#include <vector>
#include <iostream>
#include <string.h>
using namespace std;

class StringSplit
{
private:
void copy_fragment(char*, char*, char*);
void copy_fragment(char*, char*, char);
bool match_fragment(char*, char*, int);
int untilnextdelim(char*, char);
int untilnextdelim(char*, char*);
void assimilate(char*, char);
void assimilate(char*, char*);
bool string_contains(char*, char*);
long calc_string_size(char*);
void copy_string(char*, char*);

public:
vector<char*> split_cstr(char);
vector<char*> split_cstr(char*);
vector<string> split_string(char);
vector<string> split_string(char*);
char* String;
bool do_string;
bool keep_empty;
vector<char*> Container;
vector<string> ContainerS;

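StringSplit(char * in)
{
    String = in;
    do_string = false;  // set the flags explicitly: reading them uninitialized is undefined behavior
    keep_empty = false; // empties are excluded by default
}

StringSplit(string in)
{
    size_t len = calc_string_size((char*)in.c_str());
    String = new char[len + 1];
    memset(String, 0, len + 1);
    copy_string(String, (char*)in.c_str());
    do_string = true;
    keep_empty = false; // empties are excluded by default
}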

~StringSplit()
{
for (int i = 0; i < Container.size(); i++)
{
if (Container[i] != NULL)
{
delete[] Container[i];
}
}
if (do_string)
{
delete[] String;
}
}
};

StringSplitter.cpp:

#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplitter.hpp"

using namespace std;

void StringSplit::assimilate(char*src, char delim)


{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}

}
else
{
delete[] temp;
}
}
}

void StringSplit::assimilate(char*src, char* delim)


{
int until = untilnextdelim(src, delim);
if (until > 0)
{
char * temp = new char[until + 1];
memset(temp, 0, until + 1);
copy_fragment(temp, src, delim);
if (keep_empty || *temp != 0)
{
if (!do_string)
{
Container.push_back(temp);
}
else
{
string x = temp;
ContainerS.push_back(x);
}
}
else
{
delete[] temp;
}
}
}

long StringSplit::calc_string_size(char* _in)


{
long i = 0;
while (*_in++)
{
i++;
}
return i;
}

bool StringSplit::string_contains(char* haystack, char* needle)


{
size_t len = calc_string_size(needle);
size_t lenh = calc_string_size(haystack);
while (lenh--)
{
if (match_fragment(haystack + lenh, needle, len))
{
return true;
}
}
return false;
}

bool StringSplit::match_fragment(char* _src, char* cmp, int len)


{
while (len--)
{
if (*(_src + len) != *(cmp + len))
{
return false;
}
}
return true;
}

int StringSplit::untilnextdelim(char* _in, char delim)


{
size_t len = calc_string_size(_in);
if (*_in == delim)
{
_in += 1;
return len - 1;
}

int c = 0;
while (*(_in + c) != delim && c < len)
{
c++;
}

return c;
}

int StringSplit::untilnextdelim(char* _in, char* delim)


{
int s = calc_string_size(delim);
int c = 1 + s;

if (!string_contains(_in, delim))
{
return calc_string_size(_in);
}
else if (match_fragment(_in, delim, s))
{
_in += s;
return calc_string_size(_in);
}

while (!match_fragment(_in + c, delim, s))


{
c++;
}

return c;
}

void StringSplit::copy_fragment(char* dest, char* src, char delim)


{
if (*src == delim)
{
src++;
}

int c = 0;
while (*(src + c) != delim && *(src + c))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}

void StringSplit::copy_string(char* dest, char* src)


{
int i = 0;
while (*(src + i))
{
*(dest + i) = *(src + i);
i++;
}
}

void StringSplit::copy_fragment(char* dest, char* src, char* delim)


{
size_t len = calc_string_size(delim);
size_t lens = calc_string_size(src);

if (match_fragment(src, delim, len))


{
src += len;
lens -= len;
}

int c = 0;
while (!match_fragment(src + c, delim, len) && (c < lens))
{
*(dest + c) = *(src + c);
c++;
}
*(dest + c) = 0;
}

vector<char*> StringSplit::split_cstr(char Delimiter)


{
int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;

return Container;
}

vector<string> StringSplit::split_string(char Delimiter)


{
do_string = true;

int i = 0;
while (*String)
{
if (*String != Delimiter && i == 0)
{
assimilate(String, Delimiter);
}
if (*String == Delimiter)
{
assimilate(String, Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;
return ContainerS;
}

vector<char*> StringSplit::split_cstr(char* Delimiter)


{
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);

while(*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String,Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;

return Container;
}

vector<string> StringSplit::split_string(char* Delimiter)


{
do_string = true;
int i = 0;
size_t LenDelim = calc_string_size(Delimiter);

while (*String)
{
if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
{
assimilate(String, Delimiter);
}
if (match_fragment(String, Delimiter, LenDelim))
{
assimilate(String, Delimiter);
}
i++;
String++;
}

String -= i;
delete[] String;

return ContainerS;
}

Examples:

int main(int argc, char*argv[])


{
StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
vector<char*> Split = ss.split_cstr(":CUT:");

for (int i = 0; i < Split.size(); i++)


{
cout << Split[i] << endl;
}

return 0;
}
Will output:

This
is
an
example
cstring

int main(int argc, char*argv[])


{
StringSplit ss = "This:is:an:example:cstring";
vector<char*> Split = ss.split_cstr(':');

for (int i = 0; i < Split.size(); i++)


{
cout << Split[i] << endl;
}

return 0;
}

int main(int argc, char*argv[])


{
string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string("[SPLIT]");

for (int i = 0; i < Split.size(); i++)


{
cout << Split[i] << endl;
}

return 0;
}

int main(int argc, char*argv[])


{
string mystring = "This|is|an|example|string";
StringSplit ss = mystring;
vector<string> Split = ss.split_string('|');

for (int i = 0; i < Split.size(); i++)


{
cout << Split[i] << endl;
}

return 0;
}

To keep empty entries (by default empties will be excluded):

StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");

The goal was to make it similar to C#'s Split() method where splitting a string is as easy as:

String[] Split =
"Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut:"},
StringSplitOptions.None);

foreach(String X in Split)
{
Console.Write(X);
}

I hope someone else can find this as useful as I do.

Share Improve this answer Follow edited Mar 7, 2023 at 19:37 community wiki
3 revs, 3 users 70%
Steve Dell

Here's another way of doing it..

void split_string(string text,vector<string>& words)
{
int i=0;
char ch;
string word;

while(ch=text[i++])
{
if (isspace(ch))
{
if (!word.empty())
{
words.push_back(word);
}
word = "";
}
else
{
word += ch;
}
}
if (!word.empty())
{
words.push_back(word);
}
}

Share Improve this answer Follow edited Jan 8, 2010 at 3:27 community wiki
2 revs
user246110

I believe this could be optimized a bit by using word.clear() instead of word = "" . Calling the clear method
will empty the string but keep the already allocated buffer, which will be reused upon further concatenations. Right
now a new buffer is created for every word, resulting in extra allocations. – Teodor Maxim Apr 26, 2021 at 20:44
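The suggestion in the comment above might look like the sketch below. It also iterates with a range-based for loop and casts to unsigned char before calling std::isspace; both are changes made for this illustration, and the name `split_words` is hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical variant of split_string that resets the word with clear().
void split_words(const std::string& text, std::vector<std::string>& words) {
    std::string word;
    for (char ch : text) {
        // std::isspace expects a value representable as unsigned char.
        if (std::isspace(static_cast<unsigned char>(ch))) {
            if (!word.empty()) {
                words.push_back(word);
                word.clear();   // typically keeps the allocated buffer for reuse
            }
        } else {
            word += ch;
        }
    }
    if (!word.empty()) {
        words.push_back(word);
    }
}
```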

Recently I had to split a camel-cased word into subwords. There are no delimiters, just uppercase
characters.

#include <string>
#include <list>
#include <locale> // std::isupper(ch, loc), std::locale

template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
    std::list<String> R;
    String w;
    const std::locale loc; // copy of the current global locale

    for (typename String::const_iterator i = s.begin(); i < s.end(); ++i) {
        if (std::isupper(*i, loc)) { // locale-aware overload; works for char and wchar_t
            if (w.length()) {
                R.push_back(w);
                w.clear();
            }
        }
        w += *i;
    }

    if (w.length())
        R.push_back(w);
    return R;
}

For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow
and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung"
into "Raumfahrt", "Überwachungs" and "Verordnung".

Note that std::isupper should really be passed as a function template argument. Then the more generalized
form of this function can split at delimiters like "," , ";" or " " too.

Share Improve this answer Follow edited Sep 14, 2011 at 9:47 community wiki
2 revs
Andreas Spindler

2 There have been 2 revs. That's nice. Seems as if my English had to much of a "German". However, the revisionist
did not fixed two minor bugs maybe because they were obvious anyway: std::isupper could be passed as
argument, not std::upper . Second put a typename before the String::const_iterator .
– Andreas Spindler Apr 28, 2015 at 7:20

std::isupper is guaranteed to be defined only in <cctype> header (the C++ version of the C <ctype.h> header), so
you must include that. This is like relying we can use std::string by using the <iostream> header instead of the
<string> header. – Adola Jun 13, 2022 at 3:39

What about this:

#include <string>
#include <vector>

using namespace std;

vector<string> split(string str, const char delim) {
    vector<string> v;
    string tmp;

    for(string::const_iterator i = str.begin(); i != str.end(); ++i) {
        if(*i != delim) {
            tmp += *i;
        } else {
            v.push_back(tmp);
            tmp = "";
        }
    }
    v.push_back(tmp); // flush the last token without dereferencing end()

    return v;
}

Share Improve this answer Follow edited Dec 19, 2012 at 22:05 community wiki
3 revs, 3 users 89%
gibbz
This is the best answer here, if you only want to split on a single delimiter character. The original question wanted
to split on whitespace though, meaning any combination of one or more consecutive spaces or tabs. You have
actually answered stackoverflow.com/questions/53849 – Oktalist Dec 19, 2012 at 22:09

I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying
the splitting criteria.
#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {
    std::string line("A:::line::to:split");
    const boost::regex re(":+"); // one or more colons

    // -1 means find inverse matches aka split
    boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);
    boost::sregex_token_iterator end;

    for (; tokens != end; ++tokens)
        std::cout << *tokens << std::endl;
}

Share Improve this answer Follow answered Jun 12, 2011 at 9:25 community wiki
Marty B
