Python NLTK | tokenize.regexp()

Last Updated : 7 Jun, 2019
The NLTK tokenize.regexp module provides the RegexpTokenizer class, which extracts tokens from a string using a regular expression.
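Syntax : tokenize.RegexpTokenizer(pattern, gaps=False, discard_empty=True)
Return : The tokenize() method returns a list of the tokens found with the regular expression. With gaps=False (the default) the pattern matches the tokens themselves; with gaps=True it matches the separators between tokens.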
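Both examples in this article pass gaps=True. For contrast, here is a minimal sketch of the default mode (gaps=False), in which the pattern describes what a token looks like rather than what separates tokens. It assumes NLTK is installed; the input sentence is made up for illustration.

```python
from nltk.tokenize import RegexpTokenizer

# With the default gaps=False, the pattern describes the tokens themselves.
# Here a token is a run of word characters, so punctuation is dropped.
word_tk = RegexpTokenizer(r'\w+')

tokens = word_tk.tokenize("Hello, world! NLTK is fun.")
print(tokens)  # ['Hello', 'world', 'NLTK', 'is', 'fun']
```

In this mode the tokenizer behaves like re.findall(), so anything the pattern does not match (commas, spaces, the final period) simply disappears from the output.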
Example #1 : In this example we use RegexpTokenizer() with the pattern r'\s+' and gaps=True, so the regular expression matches the runs of whitespace between words and the string is split into tokens at those gaps.
# import the RegexpTokenizer class from nltk
from nltk.tokenize import RegexpTokenizer

# Create a tokenizer whose pattern matches the whitespace gaps between tokens
tk = RegexpTokenizer(r'\s+', gaps=True)
  
# Create a string input
gfg = "I love Python"
  
# Use tokenize method
geek = tk.tokenize(gfg)
  
print(geek)
Output :
['I', 'love', 'Python']
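When gaps=True, the discard_empty parameter (True by default) controls whether empty strings produced by adjacent separators are kept. A minimal sketch, assuming NLTK is installed; the double space in the input and the single-character pattern r'\s' are chosen only to produce an empty token:

```python
from nltk.tokenize import RegexpTokenizer

text = "I  love Python"  # note the two spaces after "I"

# The pattern matches a single whitespace character, so the double space
# leaves an empty string between the two separators.
keep_empty = RegexpTokenizer(r'\s', gaps=True, discard_empty=False)
drop_empty = RegexpTokenizer(r'\s', gaps=True)  # discard_empty=True by default

kept = keep_empty.tokenize(text)
dropped = drop_empty.tokenize(text)
print(kept)     # ['I', '', 'love', 'Python']
print(dropped)  # ['I', 'love', 'Python']
```

Using r'\s+' instead, as in Example #1, avoids the empty token in the first place because the whole run of spaces counts as one gap.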
Example #2 : The same whitespace-gap tokenizer applied to a different input string.
# import the RegexpTokenizer class from nltk
from nltk.tokenize import RegexpTokenizer

# Create a tokenizer whose pattern matches the whitespace gaps between tokens
tk = RegexpTokenizer(r'\s+', gaps=True)
  
# Create a string input
gfg = "Geeks for Geeks"
  
# Use tokenize method
geek = tk.tokenize(gfg)
  
print(geek)
Output :
['Geeks', 'for', 'Geeks']
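Besides tokenize(), RegexpTokenizer also offers span_tokenize(), which yields (start, end) character offsets instead of substrings, and NLTK exposes a regexp_tokenize() function as a one-call shortcut. A minimal sketch reusing the input from Example #2, assuming NLTK is installed:

```python
from nltk.tokenize import RegexpTokenizer, regexp_tokenize

text = "Geeks for Geeks"
tk = RegexpTokenizer(r'\s+', gaps=True)

# span_tokenize() is a generator of (start, end) offsets into the string
spans = list(tk.span_tokenize(text))
print(spans)  # [(0, 5), (6, 9), (10, 15)]

# Slicing the original string with each span recovers the tokens
pieces = [text[start:end] for start, end in spans]
print(pieces)  # ['Geeks', 'for', 'Geeks']

# regexp_tokenize() takes the text first, then the same pattern and gaps arguments
shortcut = regexp_tokenize(text, r'\s+', gaps=True)
print(shortcut)  # ['Geeks', 'for', 'Geeks']
```

Offsets are useful when tokens must be mapped back to their position in the source text, for example to highlight matches.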