NLP Lab manual

The document outlines a procedure for text preprocessing in R, including steps for installing necessary libraries, cleaning text, tokenization, stop word removal, and stemming. It provides a sample program demonstrating these techniques using the tm and SnowballC packages. The output showcases the cleaned text, tokens, tokens without stop words, and the stemmed tokens.

Uploaded by

SRIVARSHIKA Sudhakar


Ex. No: 2: Perform Preprocessing (Tokenization, Script Validation, Stop Word Removal and Stemming) of Text in R Programming

Algorithm:

STEP 1: Install and load the necessary libraries (tm, SnowballC and stringr).

STEP 2: Create a sample text.

STEP 3: Validate the script: ensure that the text contains only valid characters. A regular expression removes any unwanted (non-alphabetic) characters.

STEP 4: Tokenize the text, that is, break it into individual words (tokens). The program lowercases the cleaned text and splits it on whitespace with base R's strsplit.

STEP 5: Remove stop words. Stop words are common words (like "the", "is", "in") that are often removed during text preprocessing. The tm package provides a list of common English stop words.

STEP 6: Stem the remaining tokens. Stemming is the process of reducing words to their root form. For example, "running" becomes "run". We use the SnowballC package for stemming.
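Steps 3, 5 and 6 can each be tried in isolation before running the full program. The following is a minimal sketch, assuming the tm and SnowballC packages are already installed; `is_valid_script` is a hypothetical helper introduced only for this illustration, and the sample strings are made up.

```r
library(tm)         # provides stopwords()
library(SnowballC)  # provides wordStem()

# Step 3: check whether a string holds only letters and whitespace.
# is_valid_script is a hypothetical helper, not part of any package.
is_valid_script <- function(x) grepl("^[[:alpha:][:space:]]*$", x)
valid_flags <- is_valid_script(c("plain text", "text with 123!"))
print(valid_flags)   # TRUE FALSE

# Step 5: look words up in tm's English stop word list.
stop_flags <- c("the", "is", "mining") %in% stopwords("en")
print(stop_flags)    # TRUE TRUE FALSE

# Step 6: inflected forms collapse to a single stem.
stems <- wordStem(c("running", "runs", "run"))
print(stems)         # "run" "run" "run"
```

Checking validity first, rather than silently cleaning, makes it easier to see which inputs the regular expression in the program will actually change.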

Program:

# Install any missing packages, then load them
pkgs <- c("tm", "SnowballC", "stringr")
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) install.packages(to_install)
library(tm)         # stopwords()
library(SnowballC)  # wordStem()
library(stringr)    # str_replace_all()

# Sample text
text <- "This is a simple text. I am learning text mining in R! It's very interesting."

# Script validation: remove every character that is not a letter or whitespace
text_clean <- str_replace_all(text, "[^[:alpha:]\\s]", "")
cat("Cleaned Text:", text_clean, "\n")

# Tokenization: lowercase, then split on runs of whitespace
tokens <- unlist(strsplit(tolower(text_clean), "\\s+"))
cat("Tokens: ")
print(tokens)  # spacing and line wrapping depend on the console width

# Stop word removal using tm's English stop word list
stopwords_list <- stopwords("en")
tokens_no_stopwords <- tokens[!tokens %in% stopwords_list]
cat("Tokens without stopwords: ")
print(tokens_no_stopwords)

# Stemming with the Porter stemmer (the default for wordStem)
stemmed_tokens <- wordStem(tokens_no_stopwords)
cat("Stemmed Tokens: ")
print(stemmed_tokens)

Output:

Cleaned Text: This is a simple text I am learning text mining in R Its very interesting
Tokens: [1] "this" "is" "a" "simple" "text" "i" "am" "learning" "text" "mining" "in" "r" "its" "very" "interesting"
Tokens without stopwords: [1] "simple" "text" "learning" "text" "mining" "r" "interesting"
Stemmed Tokens: [1] "simpl" "text" "learn" "text" "mine" "r" "interest"
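
The same preprocessing steps can also be expressed as tm corpus transformations instead of operating on a plain character vector. The sketch below is one possible arrangement, not the method used in the program above; it assumes tm and SnowballC are installed, and the name `corpus_tokens` is chosen only for this illustration. Stems may differ slightly from the program's output because stemDocument selects its stemmer from the document language.

```r
library(tm)  # stemDocument() also needs the SnowballC package installed

text <- "This is a simple text. I am learning text mining in R! It's very interesting."

# Wrap the text in a one-document corpus
corpus <- VCorpus(VectorSource(text))

# Apply the cleaning steps as corpus transformations, in order
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

# Extract the processed text and split it into tokens
processed <- trimws(content(corpus[[1]]))
corpus_tokens <- unlist(strsplit(processed, "\\s+"))
print(corpus_tokens)
```

Lowercasing runs before stop word removal because the stop word list is lowercase; reversing that order would leave capitalized words such as "This" in the text.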
