NLP Lab manual
No: 2: Perform Preprocessing (Tokenization, Script Validation, Stop Word Removal and Stemming) of Text in R Programming
Algorithm:
STEP 5: Stop words are common words (like "the", "is", "in") that are often removed during text preprocessing because they add little meaning to the analysis.
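Stop word removal can be sketched in base R as a set-membership filter. The token vector below is copied from the output of this exercise; the short stop word list is a hypothetical, hand-picked subset chosen for illustration, not the list the original program used (a fuller list, such as the one returned by tm's stopwords("english"), is a common choice).

```r
# Tokens as printed in the output of this exercise
tokens <- c("this", "is", "a", "simple", "text", "i", "am", "learning",
            "text", "mining", "in", "r", "its", "very", "interesting")

# Hypothetical hand-picked stop word list (an assumption for this sketch);
# a fuller list such as tm::stopwords("english") is often used instead
stop_words <- c("the", "is", "in", "this", "a", "i", "am", "its", "very")

# Keep only the tokens that do not appear in the stop word list
filtered <- tokens[!tokens %in% stop_words]
print(filtered)
```

Because %in% tests membership against the whole list at once, the filter stays a single vectorised expression regardless of how long the stop word list grows.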
Program:
Cleaned Text: this is a simple text i am learning text mining in r its very interesting
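One way to produce the cleaned text above is sketched below in base R. The original input sentence is not shown in this manual, so raw_text is an assumed example, and the cleaning rules (lower-casing, stripping punctuation, collapsing whitespace) are inferred from the printed result rather than taken from the original program.

```r
# Hypothetical raw input: the program's real input string is not shown,
# so this sentence is an assumption chosen to match the printed output
raw_text <- "This is a simple text! I am learning Text Mining in R. It's very interesting."

clean_text <- function(x) {
  x <- tolower(x)                    # normalise case
  x <- gsub("[[:punct:]]", "", x)    # drop punctuation, so "it's" becomes "its"
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of whitespace into one space
  trimws(x)                          # remove leading and trailing spaces
}

cleaned <- clean_text(raw_text)
cat("Cleaned Text:", cleaned, "\n")
```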
Tokens: [1] "this" "is" "a" "simple" "text" "i" "am" "learning" "text" "mining" "in" "r" "its" "very" "interesting"
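Tokenization of the cleaned text can be sketched with base R's strsplit(), splitting on runs of whitespace. This is an assumed approach; the original program may have used a package tokenizer instead.

```r
# Cleaned text as printed in the output of this exercise
cleaned <- "this is a simple text i am learning text mining in r its very interesting"

# strsplit() returns a list with one element per input string,
# so [[1]] extracts the character vector of tokens for our single string
tokens <- strsplit(cleaned, "[[:space:]]+")[[1]]
print(tokens)
cat("Number of tokens:", length(tokens), "\n")
```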
Tokens without stopwords: [1] "simple" "text" "learning" "text" "mining" "r" "interesting"
Stemmed Tokens: [1] "simpl" "text" "learn" "text" "mine" "r" "interest"
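The stemmed tokens above are consistent with the Porter stemming algorithm (for example, "mining" becomes "mine" and "simple" becomes "simpl"). A minimal sketch, assuming the SnowballC package is installed, applies its wordStem() function to the stop-word-filtered tokens; whether the original program called SnowballC directly or through another package is not shown.

```r
# Assumes the SnowballC package is installed: install.packages("SnowballC")
library(SnowballC)

# Tokens without stop words, as printed in the output of this exercise
filtered <- c("simple", "text", "learning", "text", "mining", "r", "interesting")

# wordStem() strips suffixes using the Porter algorithm when language = "porter"
stemmed <- wordStem(filtered, language = "porter")
print(stemmed)
```

Stemming is rule-based suffix stripping, so stems such as "simpl" need not be dictionary words; they only need to map related word forms to the same key.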