Tokenizing Strings in List of Strings - Python
Last Updated: 04 Feb, 2025
The task of tokenizing strings in a list of strings in Python involves splitting each string into smaller units, known as tokens, based on specific delimiters. For example, given the list a = ['Geeks for Geeks', 'is', 'best computer science portal'], the goal is to break each string into individual words or tokens, resulting in a list of lists: [['Geeks', 'for', 'Geeks'], ['is'], ['best', 'computer', 'science', 'portal']].
Using list comprehension
List comprehension is a concise way to build a list by looping over an iterable and applying an expression to each item. Combined with split(), it tokenizes every string in a single readable line. Called with no arguments, split() separates on runs of whitespace and discards leading and trailing whitespace.
Python
a = ['Geeks for Geeks', 'is', 'best computer science portal']
res = [sub.split() for sub in a]
print(res)
Output
[['Geeks', 'for', 'Geeks'], ['is'], ['best', 'computer', 'science', 'portal']]
Explanation: The list comprehension iterates over each string in a and calls split() on it, producing one list of words per string.
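The choice of argument to split() changes which tokens come back. The sketch below uses hypothetical sample data with irregular spacing (not taken from the example above) to compare split() with no argument against split(' ').

```python
# Hypothetical sample data with irregular spacing (an assumption for illustration).
a = ['  Geeks   for Geeks ', 'is']

# split() with no argument splits on runs of whitespace and drops
# leading/trailing whitespace, so no empty tokens appear.
default_tokens = [s.split() for s in a]

# split(' ') splits on every single space, so repeated or edge
# spaces produce empty strings.
space_tokens = [s.split(' ') for s in a]

print(default_tokens)
print(space_tokens)
```

For word-level tokenization of free text, the no-argument form is usually the safer default because it never returns empty tokens.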
Using re.split()
For more complex tokenization, where delimiters are not just spaces, re.split() from the re module can be used. It allows us to split strings based on regular expressions, making it suitable for handling multiple delimiters, such as spaces, punctuation, and other special characters.
Python
import re
a = ['Geeks for Geeks', 'is', 'best computer science portal']
res = [re.split(r'\s+', sub) for sub in a]
print(res)
Output
[['Geeks', 'for', 'Geeks'], ['is'], ['best', 'computer', 'science', 'portal']]
Explanation: The list comprehension splits each string in a into words using re.split() with the pattern r'\s+', which matches one or more whitespace characters. Unlike split() with no arguments, this pattern yields empty strings when a string begins or ends with whitespace.
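To illustrate splitting on several delimiters at once, here is a minimal sketch. The sample strings and the choice of delimiter characters are assumptions made for illustration, not part of the original example.

```python
import re

# Hypothetical sample data containing punctuation (an assumption for illustration).
a = ['Geeks, for; Geeks', ' is ', 'best-computer science.portal']

# Character class: one or more whitespace characters, commas,
# semicolons, periods, or hyphens act as a single delimiter.
pattern = re.compile(r'[\s,;.\-]+')

# re.split() returns empty strings when a delimiter sits at the start
# or end of the input, so they are filtered out here.
res = [[tok for tok in pattern.split(s) if tok] for s in a]
print(res)
```

Compiling the pattern once with re.compile() avoids repeating the pattern string inside the comprehension; the filtering step is one possible way to handle edge delimiters.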
Using map()
map() applies a given function to every item in an iterable. It is a functional programming tool often used in Python to avoid explicit loops. Passing str.split to map() tokenizes each string without writing a loop.
Python
a = ['Geeks for Geeks', 'is', 'best computer science portal']
res = list(map(str.split, a))
print(res)
Output
[['Geeks', 'for', 'Geeks'], ['is'], ['best', 'computer', 'science', 'portal']]
Explanation: map() applies str.split to each string in a, splitting it into words. Because map() returns an iterator rather than a list, list() is used to collect the results.
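When a delimiter other than whitespace is needed, map() can take a small function instead of str.split directly. The sketch below assumes hypothetical comma-separated data and also shows that map() does its work lazily.

```python
# Hypothetical comma-separated sample data (an assumption for illustration).
a = ['Geeks,for,Geeks', 'is', 'best,computer,science,portal']

# map() returns a lazy iterator; nothing is split until it is consumed.
tokens = map(lambda s: s.split(','), a)

first = next(tokens)   # splits only the first string
rest = list(tokens)    # consumes the remaining strings

print(first)
print(rest)
```

Keeping the result as an iterator can be useful when the list of strings is large and the tokens are processed one string at a time.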