Python – Group list by first character of string
Last Updated: 23 Mar, 2023
Sometimes we need to group strings by some property, such as their first letter. Problems of this kind resemble database queries, so they come up often in web development. This article covers grouping a list of strings by their first character and walks through several ways to do it.
Method #1: Using next() + lambda + loop
This is the naive approach. A helper function (written here as a named function, though a lambda works equally well) checks whether two strings share the same first character. next() returns the first existing group that matches, or a new empty list when no group matches yet.
Python3
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']
print("The original list : " + str(test_list))
# True when both strings start with the same character
def util_func(x, y):
    return x[0] == y[0]
res = []
for sub in test_list:
    # Look for an existing group whose first member shares the initial
    ele = next((x for x in res if util_func(sub, x[0])), [])
    # A fresh empty list means no group matched, so register it
    if ele == []:
        res.append(ele)
    ele.append(sub)
print("The list after Categorization : " + str(res))
Output :
The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]
Time Complexity: O(n^2) in the worst case, where n is the number of elements in the list, because each element is compared against the existing groups.
Auxiliary Space: O(n), where n is the number of elements in the list.
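The heading mentions a lambda, while the listing above uses a named helper. As a minimal sketch, the same idea can be wrapped in a function with the comparison written as a lambda. The names group_by_first_char and same_initial are illustrative and not part of the original listing, and the sketch assumes every string is non-empty.

```python
def group_by_first_char(words):
    """Group strings by first character, keeping first-seen group order."""
    # Lambda comparing the first characters of two strings.
    same_initial = lambda x, y: x[0] == y[0]

    groups = []
    for word in words:
        # Find the existing group whose first member shares the initial;
        # next() returns None when no group matches.
        group = next((g for g in groups if same_initial(word, g[0])), None)
        if group is None:
            group = []
            groups.append(group)
        group.append(word)
    return groups


result = group_by_first_char(['an', 'a', 'geek', 'for', 'g', 'free'])
print(result)
# [['an', 'a'], ['geek', 'g'], ['for', 'free']]
```

Using None as the sentinel avoids comparing against an empty list, but the behavior is otherwise the same as the listing above.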
Method #2: Using sorted() + groupby()
This task can also be solved with the groupby function from itertools. Because groupby only groups adjacent items, the list is first sorted by initial character with sorted() and then fed to groupby.
Python3
from itertools import groupby
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']
print("The original list : " + str(test_list))
# Key function: the first character of a string
def util_func(x):
    return x[0]
# Sort by the same key so that equal initials become adjacent
temp = sorted(test_list, key=util_func)
res = [list(ele) for i, ele in groupby(temp, util_func)]
print("The list after Categorization : " + str(res))
Output :
The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['for', 'free'], ['geek', 'g']]
Time complexity: O(nlogn), where n is the length of the input list.
Auxiliary space: O(n), where n is the length of the input list.
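To see why the sort matters, the following sketch runs groupby on the same sample list with and without sorting. The variable names are illustrative; only itertools.groupby and sorted() from the standard library are used.

```python
from itertools import groupby


def first_char(s):
    # Key function: first character of the string
    return s[0]


words = ['an', 'a', 'geek', 'for', 'g', 'free']

# Without sorting, 'geek' and 'g' are separated by 'for',
# so groupby treats them as two separate runs.
unsorted_groups = [list(g) for _, g in groupby(words, key=first_char)]
print(unsorted_groups)
# [['an', 'a'], ['geek'], ['for'], ['g'], ['free']]

# Sorting by the same key makes equal initials adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=first_char), key=first_char)]
print(sorted_groups)
# [['an', 'a'], ['for', 'free'], ['geek', 'g']]
```

Since sorted() is stable, the words inside each group keep their original relative order, but the groups themselves come out in alphabetical order of the initial.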
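Method #3: Using for loop
First collect the distinct first characters in order of appearance, then build one group per character by scanning the list again.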
Python3
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']
print("The original list : " + str(test_list))
res = []
# Distinct first characters, in order of first appearance
x = []
for i in test_list:
    if i[0] not in x:
        x.append(i[0])
# Build one group per first character
for i in x:
    p = []
    for j in test_list:
        if j[0] == i:
            p.append(j)
    res.append(p)
print("The list after Categorization : " + str(res))
Output
The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]
Time complexity: O(n^2) in the worst case, where n is the length of the input list, because the list is scanned once for every distinct first character.
Auxiliary space: O(n), where n is the length of the input list.
Method #4: Using defaultdict
Python3
from collections import defaultdict
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']
print("The original list : " + str(test_list))
# Missing keys start out as an empty list
res = defaultdict(list)
for i in test_list:
    res[i[0]].append(i)
print("The list after Categorization : " + str(list(res.values())))
Output
The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['an', 'a'], ['geek', 'g'], ['for', 'free']]
Time Complexity: O(n) where n is the number of elements in test_list
Auxiliary Space: O(n) as we use a defaultdict to store the result
Explanation:
- We use a defaultdict to store the result, where each key is the first character of a string and each value is the list of all strings that start with that character.
- The defaultdict automatically initializes the value to an empty list when a key is not present, so we don't have to check for the key ourselves.
- We iterate through test_list and append each element to the list under its key.
- Finally, we convert the defaultdict values to a list to get the final result.
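The automatic initialization described above can be observed directly. The sketch below first shows a missing key being created on access, then shows a plain-dict equivalent based on dict.setdefault for comparison. The variable names groups and plain are illustrative.

```python
from collections import defaultdict

words = ['an', 'a', 'geek', 'for', 'g', 'free']

groups = defaultdict(list)
print('a' in groups)      # False: nothing stored yet
groups['a'].append('an')  # missing key: list() is called, then 'an' is appended
print(groups['a'])        # ['an']

# Plain dict equivalent: setdefault inserts [] only when the key is missing.
plain = {}
for word in words:
    plain.setdefault(word[0], []).append(word)
print(list(plain.values()))
# [['an', 'a'], ['geek', 'g'], ['for', 'free']]
```

Both variants make a single pass over the list, and because dictionaries preserve insertion order, the groups appear in the order their initials were first seen.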
Method #5: Using dictionary comprehension
Use dictionary comprehension to categorize the words based on their first character.
- First, we create a set of unique first characters by passing a list comprehension to set(): set([word[0] for word in test_list]).
- Next, we create a dictionary comprehension where the keys are the first characters and the values are the lists of words starting with that character: {char: [word for word in test_list if word.startswith(char)] for char in set([word[0] for word in test_list])}.
- Finally, we print the result. Unlike the earlier methods, the result is a dictionary rather than a list of lists, and because a set is unordered, the keys may appear in a different order from run to run.
Python3
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']
print("The original list : " + str(test_list))
res = {char: [word for word in test_list if word.startswith(char)] for char in set([word[0] for word in test_list])}
print("The list after Categorization : " + str(res))
Output
The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : {'g': ['geek', 'g'], 'f': ['for', 'free'], 'a': ['an', 'a']}
Time complexity: O(n^2) in the worst case, where n is the length of the input list. This is because the inner list comprehension scans the whole list once for every distinct first character; the startswith() check on a single character is itself constant time.
Auxiliary space: O(n), where n is the length of the input list. This is because we create a dictionary where each key has a list of words starting with that character.
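If a reproducible key order is wanted, one option is to replace the set with dict.fromkeys, which keeps the first occurrence of each key in insertion order. This is a sketch of that variation rather than the method shown above, and the name initials is illustrative.

```python
words = ['an', 'a', 'geek', 'for', 'g', 'free']

# dict.fromkeys keeps each first character once, in order of first appearance.
initials = dict.fromkeys(word[0] for word in words)

res = {char: [word for word in words if word[0] == char] for char in initials}
print(res)
# {'a': ['an', 'a'], 'g': ['geek', 'g'], 'f': ['for', 'free']}
```

The time complexity is unchanged, since the list is still scanned once per distinct first character.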
Method #6: Using itertools.groupby() with sorted()
Use the itertools.groupby() function in combination with sorted() to group the words based on their first character.
Step-by-step approach:
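- First, sort the input list using sorted(). With no key argument the strings are compared in full, so words sharing a first character end up adjacent and each group comes out in alphabetical order.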
- Then, use itertools.groupby() to group the words based on their first character. groupby() returns an iterator of (key, group) pairs, where key is the first character and group is an iterator of words starting with that character.
- Iterate over the (key, group) pairs, convert the group iterator to a list using list(), and append it to the result list res.
- Finally, print the result.
Below is the implementation of the above approach:
Python3
import itertools
test_list = ['an', 'a', 'geek', 'for', 'g', 'free']
print("The original list : " + str(test_list))
res = []
# k is the first character, g is an iterator over the words in that group
for k, g in itertools.groupby(sorted(test_list), key=lambda x: x[0]):
    res.append(list(g))
print("The list after Categorization : " + str(res))
Output
The original list : ['an', 'a', 'geek', 'for', 'g', 'free']
The list after Categorization : [['a', 'an'], ['for', 'free'], ['g', 'geek']]
Time complexity: O(n log n), where n is the length of the input list. This is because we use sorted() which has a time complexity of O(n log n).
Auxiliary space: O(n), where n is the length of the input list. This is because we create a list of lists, where each inner list contains the words starting with the same character.
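The listing above discards the key from each (key, group) pair. As a small sketch under the same approach, the key can be kept by building a dictionary instead of a list; the name by_initial is illustrative.

```python
from itertools import groupby

words = ['an', 'a', 'geek', 'for', 'g', 'free']

# Each (key, group) pair becomes one dictionary entry.
by_initial = {
    key: list(group)
    for key, group in groupby(sorted(words), key=lambda w: w[0])
}
print(by_initial)
# {'a': ['a', 'an'], 'f': ['for', 'free'], 'g': ['g', 'geek']}
```

Each group iterator must be consumed (here with list()) before groupby advances to the next key, which is why the conversion happens inside the comprehension.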