Noida Institute of Engineering and Technology, Greater Noida
Introduction
Unit I
NLP: ACSA10712
Ankit Kumar Sharma
(Assistant Professor)
Course Details
B. Tech. 7th Semester
Evaluation Scheme
NLP Syllabus
[Link]. FOURTH YEAR
Course Code: ACSAI0712    L-T-P: 3-0-0    Credits: 3
Course Title: Natural Language Processing
Course objective:
The course aims to provide an understanding of the foundational concepts and techniques in NLP. The focus is on providing application-based knowledge.
Pre-requisites: Programming Skills, Data Structures, Algorithms, Probability and Statistics, Machine Learning.
Course Contents/Syllabus
UNIT-I Overview of Natural Language Processing    8 Hours
Definition, applications and emerging trends in NLP, challenges, ambiguity. NLP tasks using NLTK: tokenization, stemming, lemmatization, stop-word removal, POS tagging, parsing, named entity recognition, coreference resolution.
UNIT-II Regular Expressions and Data Preprocessing    8 Hours
Data preprocessing: convert to lower case; handle email IDs, HTML tags, URLs, emojis, repeated characters; normalization of data (contractions, standardization), etc. Vocabulary, corpora, and linguistic resources. Linguistic foundations: morphology, syntax, semantics and pragmatics. Language models: unigram, bigram, n-grams.
UNIT-III Text Analysis and Similarity    8 Hours
Text vectorization: Bag-of-Words model and vector space models, term presence, term frequency, TF-IDF. Textual similarity: cosine similarity, Word Mover's distance. Word embeddings: Word2Vec, GloVe.
UNIT-IV Text Classification & NLP Applications    8 Hours
Text classification: implementation of NLP applications using text classification (sentiment analysis, topic modeling, spam detection). High-level NLP applications: machine translation (rule-based and statistical approaches), text summarization, dialog systems, conversational agents and chatbots.
Subject Code: ACSA10712
Subject Name: Natural Language Processing
Unit 1: Overview of Natural Language Processing
Definition, applications and emerging trends in NLP, challenges, ambiguity; NLP tasks using NLTK: tokenization, stemming, lemmatization, stop-word removal, POS tagging, parsing, named entity recognition, coreference resolution.
Natural Language Processing
What is NLP?
Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) and computational linguistics focused on enabling computers to understand, interpret, and respond to human language in a way that is both meaningful and useful.
Key Areas in NLP
• Text Analysis:
– Tokenization: Breaking down text into smaller units, such as words or
sentences.
– Part-of-Speech Tagging: Identifying the grammatical parts of speech
(nouns, verbs, adjectives, etc.) in a sentence.
– Named Entity Recognition (NER): Identifying and classifying named
entities (people, organizations, locations, etc.) in text.
– Sentiment Analysis: Determining the sentiment or emotion expressed in a
piece of text, such as positive, negative, or neutral.
• Speech Recognition:
– Converting spoken language into text, enabling voice-activated systems like
virtual assistants (e.g., Siri, Alexa) to understand and process voice
commands.
Key Areas of NLP
• Machine Translation:
– Automatically translating text or speech from one language to another, as
seen in services like Google Translate.
• Text Generation:
– Automatically generating coherent and contextually relevant text, such as
in chatbots, content creation, or language models like GPT.
• Question Answering:
– Building systems that can answer questions posed in natural language by
retrieving and summarizing relevant information from a large dataset.
• Text Summarization:
– Condensing a large piece of text into a shorter version while preserving its
meaning and key information.
Applications of NLP
• Virtual Assistants: NLP is used in virtual assistants like Siri, Alexa, and
Google Assistant to understand voice commands and respond
appropriately.
• Chatbots: Many businesses use NLP-based chatbots to provide
customer support and answer common queries.
• Search Engines: NLP helps search engines like Google understand user
queries and deliver relevant search results.
• Translation Services: NLP powers translation tools that convert text
from one language to another.
• Sentiment Analysis: Companies use sentiment analysis to monitor
social media, reviews, and other forms of user feedback to gauge public
opinion.
Challenges in NLP
• Ambiguity: Natural language is often ambiguous, with words having
multiple meanings or sentences that can be interpreted in different ways.
• Context: Understanding the context of a conversation or text is essential
for accurate processing, which can be challenging.
• Cultural Differences: Language use varies across cultures, making it
difficult to create models that work universally.
• Evolving Language: Language constantly evolves, with new slang,
idioms, and usage patterns emerging regularly.
Challenges in NLP - Ambiguity
Natural language ambiguity refers to situations where a word, phrase, or
sentence has multiple meanings, making it challenging to interpret correctly.
Some common forms of ambiguity include
1. Lexical Ambiguity
2. Syntactic Ambiguity
3. Semantic Ambiguity
4. Referential (Anaphoric) Ambiguity
5. Contextual (Pragmatic) Ambiguity
Challenges in NLP - Ambiguity
1. Lexical ambiguity
• Lexical means relating to the words of a language.
• During lexical analysis, the given text is broken down into words or tokens, and each token has a specific meaning.
There can be instances where a single word can be interpreted in multiple ways. Ambiguity caused by the word alone, rather than by the context, is known as lexical ambiguity.
Example: "Give me the bat!"
Here "bat" has more than one meaning: the animal or a cricket bat.
Challenges in NLP – Ambiguity (continued)
Lexical ambiguity divides into two categories:
1. Polysemy
○ One word has many meanings
○ Determining the sense of a word in a particular context
■ He sat on the bank of a river/Withdraw money from the bank
■ Maruti has built a plant to manufacture cars/A man was planted in the
audience to raise anti-political slogans
2. Homonymy
o It refers to a single word having multiple but unrelated meanings.
Examples: bear, left, Pole
Challenges in NLP - Ambiguity
A bear (the animal) can bear (tolerate) very cold temperatures.
The driver turned left (opposite of right) and left (departed from) the main
road.
Pole and pole — the first "Pole" refers to a citizen of Poland, who could be called Polish or a Pole; the second "pole" refers to a bamboo pole or any other wooden pole.
Challenges in NLP - Ambiguity
2. Syntactic Ambiguity/ Structural ambiguity
Syntactic meaning refers to the grammatical structure and rules that define how
words should be combined to form sentences and phrases. A sentence can be
interpreted in more than one way due to its structure or syntax such ambiguity is
referred to as Syntactic Ambiguity.
Example 1: “Old men and women”
There are two possible meanings:
old men, and women (of any age);
old men and old women.
Example 2: "John saw the boy with a telescope."
Two possible meanings:
John saw the boy through his telescope.
John saw the boy who was holding a telescope.
Challenges in NLP - Ambiguity
3. Semantic Ambiguity
Semantics is nothing but "meaning".
• The semantics of a word or phrase refers to the way it is typically
understood or interpreted by people.
• Syntax describes the rules by which words can be combined into sentences, while
semantics describes what they mean.
This type of ambiguity occurs when a sentence has more than one interpretation or
meaning.
Example 1 “The chicken is ready to eat.”
The chicken (as food) is cooked and ready to be eaten.
The chicken (the animal) is hungry and ready to eat something.
Example 2: "Seema loves her mother and Sriya does too."
There are two interpretations: Sriya loves Seema's mother, or Sriya loves her own mother.
Challenges in NLP - Ambiguity
4. Anaphoric (Referential) Ambiguity — when a pronoun stands in for a noun.
A word that gets its meaning from a preceding word or phrase is called an anaphor; ambiguity arises when the anaphor has more than one possible antecedent.
Example 1: "The house is on the longest street. It is very dirty."
Here it is unclear whether "It" refers to the house or to the street.
Example 2: "I went to the hospital, and they told me to go home and rest."
Here "they" does not literally refer to the hospital; it refers to the doctor or staff who attended to the patient.
5. Pragmatic Ambiguity — pragmatics focuses on the real-time usage of language: what the speaker wants to convey and how the listener infers it.
Example: "Do you know what time it is?"
This can be a genuine request for the time, or an expression of annoyance that someone is late.
Natural Language ToolKit (NLTK)
The Natural Language Toolkit (NLTK) is a Python library for building applications for statistical natural language processing (NLP).
1. Tokenization
2. Sentence Segmentation
3. Corpus and Vocabulary
4. Stop words
5. Stemming and Lemmatization
6. Named Entity Recognition
7. Co-referencing Resolution
8. POS tagging
9. Parsing
Natural Language ToolKit (NLTK) – 1. Tokenization
● Tokenization method is used to split a sentence, paragraph, or
full-text document into smaller units - tokens
I love NLP.
['I', 'love', 'NLP', '.']
○ The basic unit of a language
○ It helps to interpret the meaning of the text by analysing the words
present in the text
○ Count the number and frequency of words in the text
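The word/punctuation split described above can be sketched with a single regular expression. This toy tokenizer is an illustrative assumption, not NLTK's implementation (nltk.word_tokenize is far more robust):

```python
import re

def tokenize(text):
    # Match runs of word characters, or any single non-space symbol,
    # so punctuation becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("I love NLP."))  # → ['I', 'love', 'NLP', '.']
```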
Natural Language ToolKit (NLTK) – 2. Sentence Segmentation
You first need to break the entire document down into its constituent sentences. You can do this by segmenting the text at punctuation marks such as full stops and question marks.
● Splitting the given input text into sentences
● Characters used for defining sentence end - ‘!’, ‘?’, ‘.’
● Ambiguities:
○ The yearly results of Yahoo! are promising.
○ We are using the .NET framework for our project.
○ Susan scored 78.5% marks in her exams
○ Mr. Mehta is doing a great job.
● Disambiguation rules are needed, e.g. checking whether digits surround the '.'
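One way to encode such rules is to split after sentence-ending punctuation and then re-attach pieces that end in a known abbreviation. A minimal sketch (the abbreviation list and function name are illustrative assumptions):

```python
import re

ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "Ms."}  # tiny hand-picked list (assumption)

def split_sentences(text):
    # Split after '.', '!' or '?' only when followed by whitespace,
    # so decimals like 78.5 are never split.
    pieces = re.split(r"(?<=[.!?])\s+", text)
    sentences = []
    for piece in pieces:
        # Re-attach a piece if the previous one ended in an abbreviation.
        if sentences and sentences[-1].split()[-1] in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences

print(split_sentences("Mr. Mehta is doing a great job. Susan scored 78.5% marks."))
```

Cases like "Yahoo!" or ".NET" would still need extra rules, which is why nltk.sent_tokenize uses a trained model instead of hand-written patterns.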
Natural Language ToolKit (NLTK) – 3. Corpus and Vocabulary
● text corpus
○ The set of text documents used for the model.
○ e.g. for a model that analyzes movie reviews, the corpus is the set of documents, each containing one movie review.
○ The document set is divided into training/testing subsets for the model.
● Vocabulary
○ The unique set of words in the entire corpus
○ Usually, the feature vector is based on the vocabulary of the corpus
○ Vocabulary size – number of words in the vocabulary
● Freely available corpus
○ Links to some freely available corpora - [Link]
Natural Language ToolKit (NLTK) – 3. Corpus and Vocabulary (continued)
Some popular corpora available:
● Movie review dataset (IMDB dataset)
○ Consisting of 1000 negative and 1000 positive labeled movie reviews
○ [Link]
● Amazon product review datasets
○ DVD dataset, Sports and Outdoor datasets
○ Each consisting of 1000 negative and 1000 positive labeled product reviews.
○ [Link]
Natural Language ToolKit (NLTK) – 4. Stop words
Very common words in a language that carry little useful information.
● Articles, prepositions, pronouns, conjunctions, etc.
○ e.g. "the", "in", "of", "his", "and", etc.
● Removal of stop-word tokens
○ Focuses attention on the important information
○ Reduces the dataset size and training time
● Stop words may still be needed for
○ Relational queries – "flights to Tokyo"
○ Phrases like "To be or not to be"
○ Prediction of sentiment – "I told you that she was not happy" → "told, happy" loses the negation
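Stop-word removal is a simple filter over the token list. A sketch with a tiny hand-picked stop list (an assumption for illustration; NLTK ships a fuller list in nltk.corpus.stopwords after nltk.download('stopwords')):

```python
STOP_WORDS = {"the", "in", "of", "his", "and", "is", "to", "a"}  # illustrative subset

def remove_stop_words(tokens):
    # Keep only tokens that are not in the stop list (case-insensitive).
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["the", "new", "policy", "of", "the", "government", "is", "good"]))
# → ['new', 'policy', 'government', 'good']
```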
Natural Language ToolKit (NLTK) – 5. Stemming and Lemmatization
● Reduces the form of a word to the common base form.
e.g. (go, going, gone) → go; (running, ran, runs, run) → run
● To prepare text, words, and documents for further processing.
● When we search for a word on the web it also retrieves
variations of the word. If we search for say ‘kill’, it may also
return words like killer, killing, killed, kills.
● Here kill is the stem for killer, killed, killing, kills. It conveys
that each of these has the idea of ‘kill’.
Natural Language ToolKit (NLTK) – 5. Stemming and Lemmatization
● Stemming uses heuristics that
○ chop off letters from the end of the word
○ transform the end letters
● Lemmatization groups together the inflected forms of a word into its base dictionary form, called the lemma.
Stemming examples:
Word | Suffix removed | Stem
Was | – | was
Cats | s | cat
Changing | ing | chang
Studies | es | studi
Studying | ing | study

Lemmatization examples:
Words | Lemma
Is, was, were | be
Cats | cat
Changing, changed, change | change
Studies, studying | study
Natural Language ToolKit (NLTK) – 5. Stemming and Lemmatization
Stemming vs Lemmatization:
Stemming | Lemmatization
Fast and simple, pattern-based | Needs POS tagging and dictionaries
Tools: Snowball, Porter | Tools: LemmaGen, Morpha
Returns the stem of a word, which may not be in the vocabulary | Returns a proper word, the lemma
Crude, less useful | More informative
● Stemming and lemmatization are methods used by search engines and
chatbots to analyze the meaning behind a word.
● Stemming uses the stem of the word, while lemmatization uses the context in
which the word is being used.
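The "chops off letters from the end" idea behind stemming can be sketched with a few suffix rules. This is an illustrative toy, not the Porter algorithm (use nltk.stem.PorterStemmer for real work); the suffix list and length check are assumptions:

```python
SUFFIXES = ["ing", "es", "ed", "s"]  # checked in order, longer matches first

def crude_stem(word):
    w = word.lower()
    for suffix in SUFFIXES:
        # Strip the suffix only if a reasonably long stem remains.
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

for w in ["cats", "changing", "studies", "killed"]:
    print(w, "->", crude_stem(w))  # cat, chang, studi, kill
```

These match the stems shown in the table above; a lemmatizer (nltk.stem.WordNetLemmatizer) would instead return dictionary words such as "study".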
Natural Language ToolKit (NLTK) – 6. Named Entity Recognition
● Named Entities (NEs) are proper names in text, i.e. the names of people, organizations, locations, times and quantities.
● NER processes a text and identifies the named entities in it.
● Applications:
○ Helps identify the key elements in a text
■ Helps sort unstructured data and detect important information
○ Useful in question-answering systems
■ "Where was Mahatma Gandhi born?"
Example (entity labels in brackets): "Hi, my name is Shubhangi Deb [PERSON]. I am from Australia [GPE]. I want to work with Amazon [ORG]. Jeff Bezos [PERSON] is my inspiration."
(Figure: named entity recognition with machine learning)
Natural Language ToolKit (NLTK) – 6. Named Entity Recognition
● Applications
○ Processing résumés
■ Extracting information from résumés that are formatted differently
■ Personal information, experience, skills, degrees, etc.
○ Gaining insights from customer feedback
■ Organize customer feedback and pinpoint recurring problem areas
■ Identify what customers like, dislike, and want improved
○ Content recommendation
■ If you watch a lot of comedies on Netflix, you'll get more recommendations that have been classified under the entity 'Comedy'.
Natural Language ToolKit (NLTK) – 7. Coreference Resolution
● Identify all expressions that refer to the same object.
○ Mohan went to McDonald's to buy a burger. He visits the store very often and loves its food. ("He" = Mohan; "the store", "its" = McDonald's)
"I voted for Biden because he was most aligned to my principles", Jenna said.
(original sentence)
"Jenna voted for Biden because Biden was most aligned to Jenna's principles", Jenna said.
(sentence with resolved coreferences)
● Uses – it is an important step for many higher-level NLP tasks that involve natural language understanding.
Natural Language ToolKit (NLTK) – 7. Coreference Resolution
● Uses
○ Document summarization
○ Question answering
○ Machine translation
● Anaphora (backward references)
○ Refers to any reference that “points backward” to information that was
presented earlier in the text
“The apple on the table was rotten. It had been there for three days.”
● Cataphora (forward references)
○ Refers to any reference that “points forward” to information that will be
presented later in the text
"It has four legs. The cow is a domestic animal."
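NLTK itself does not ship a coreference resolver; real systems use learned models (e.g. spaCy extensions or AllenNLP). Purely to illustrate why resolution is hard, here is a naive "most recent noun" heuristic — the pronoun and noun lists are hand-made assumptions for this demo:

```python
PRONOUNS = {"it", "he", "she", "they"}
NOUNS = {"apple", "table", "cow", "house", "street"}  # hand-listed (assumption)

def resolve(tokens):
    last_noun = None
    out = []
    for tok in tokens:
        low = tok.lower().strip(".")
        if low in PRONOUNS and last_noun:
            out.append(last_noun)      # substitute the most recent noun seen
        else:
            out.append(tok)
            if low in NOUNS:
                last_noun = tok
    return out

print(resolve("The apple on the table was rotten . It had been there".split()))
```

The heuristic resolves "It" to "table" (the most recent noun) even though the intended antecedent is "apple" — exactly the anaphoric ambiguity discussed above, and why real resolvers need far more than recency.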
Natural Language ToolKit (NLTK) – 8. POS tagging
● Assign a part of speech to each word in the text.
○ Nouns: define an object or entity
○ Verbs: define an action
○ Pronouns: can replace a noun – she, him
○ Adjectives: describe a noun/pronoun
(Figure: the parts of speech – noun, pronoun, verb, adjective, adverb, preposition, conjunction, interjection)
● In a sentence, every word is associated with a proper POS tag:
Puja (Proper Noun, NNP) bought (Verb, VBD) a (Determiner, DT) new (Adjective, JJ) phone (Noun, NN) from (Preposition, IN) Samsung (Proper Noun, NNP) Store (Noun, NN)
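Tagging assigns one label per token. In practice this is nltk.pos_tag (after downloading the tagger data); the dictionary-lookup version below is only a sketch of the idea, with a hand-made lexicon and a default tag as assumptions:

```python
TAG_LEXICON = {  # hand-made for the example sentence (assumption)
    "Puja": "NNP", "bought": "VBD", "a": "DT", "new": "JJ",
    "phone": "NN", "from": "IN", "Samsung": "NNP", "Store": "NN",
}

def tag(tokens):
    # Unknown words default to NN, a common baseline choice.
    return [(t, TAG_LEXICON.get(t, "NN")) for t in tokens]

print(tag("Puja bought a new phone from Samsung Store".split()))
```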
Natural Language ToolKit (NLTK) – 9. NLTK
NLTK (Natural Language Toolkit) is a library for NLP in Python.
A powerful tool to preprocess text data for further analysis, e.g. as input to machine learning algorithms.
Tokenization, POS tagging, stemming, etc.
It includes many corpora and lexical resources (such as WordNet).
Natural Language ToolKit (NLTK) – 9. NLTK
Installation
pip install nltk
Downloading the datasets:
import nltk
nltk.download()
Choose from the screen whatever packages you want to download.
Source:
[Link]
Natural Language ToolKit (NLTK) – 9. NLTK
Operations using NLTK:
Tokenization
import nltk
text = "First sentence. Second sentence."
nltk.word_tokenize(text)
Output: ['First', 'sentence', '.', 'Second', 'sentence', '.']
Sentence splitting
nltk.sent_tokenize(text)
Output: ['First sentence.', 'Second sentence.']
Natural Language ToolKit (NLTK) – 9. NLTK
Accessing Text Corpora in NLTK:
● Corpus: a set of text documents, usually sharing some common characteristic. "Corpora" is the plural of "corpus".
a set of online books
a set of movie reviews
a collection of tweets
• The NLTK corpus collection is a set of natural language datasets in the nltk_data directory.
Some corpora included in NLTK:
Gutenberg corpus (online books)
Brown corpus (categorized text – news, humor, etc.)
Reuters corpus (news)
WordNet – the most advanced; contains words, synonyms, antonyms, etc.
from nltk.corpus import gutenberg
Natural Language ToolKit (NLTK) – NLTK
Accessing Gutenberg corpus:
● NLTK includes a small selection of texts from the Project Gutenberg electronic text archive,
which contains some 25,000 free electronic books
See website for full list - [Link]
To download the corpus – nltk.download('gutenberg')
To list the Gutenberg files downloaded with the NLTK package – gutenberg.fileids()
To get the words in a text file – gutenberg.words(fileid)
To get the sentences in a file – gutenberg.sents(fileid)
To get the text of the file as a string – gutenberg.raw(fileid)
Searching for patterns in text:
To check whether the phone number "0120-4543466" occurs in the string "His contact number is 0120-4543466", Python's membership test works:
"0120-4543466" in "His contact number is 0120-4543466"
But if we only know the format – ####-####### (phone number) or ##/##/#### (date) – a literal substring test is not enough: we need regular expressions to search for these patterns.
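The pattern search just described is exactly what Python's re module provides. A sketch (variable names are illustrative; the date value is made up for the demo):

```python
import re

text = "His contact number is 0120-4543466 and the date is 22/09/2025."

phone = re.search(r"\d{4}-\d{7}", text)       # the ####-####### pattern
date = re.search(r"\d{2}/\d{2}/\d{4}", text)  # the ##/##/#### pattern
print(phone.group())  # → 0120-4543466
print(date.group())   # → 22/09/2025
```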
Natural Language ToolKit (NLTK) – NLTK
Accessing the Brown corpus:
● Contains text categorized by genre, such as news, humor, etc.
● To find the categories:
from nltk.corpus import brown
brown.categories()
['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance', 'science_fiction']
To find the files in the category 'humor':
brown.fileids(categories='humor')
['cr01', 'cr02', 'cr03', 'cr04', 'cr05', 'cr06', 'cr07', 'cr08', 'cr09']
To find the words in the file 'cr01':
brown.words(fileids=['cr01'])
['It', 'was', 'among', 'these', 'that', 'Hinkle', ...]
For details refer : [Link]
Text Preprocessing in NLP
● Convert raw text into a set of tokens that the computer can understand and use, ready for feature extraction.
Data cleaning and pre-processing are critical for the quality of further analysis.
Pre-processing steps depend on:
a. The data – structured (movie reviews) or unstructured (tweets)
b. The application for which the data needs to be used.
Text Preprocessing in NLP
Steps in Text Preprocessing:
● Convert all characters to lower case
○ e.g. Hello, HELLO, hello, hellO → hello
● Remove HTML tags, URLs, email IDs
○ a URL such as [Link]
○ an email address such as anuj123@[Link]
○ text between HTML tags
● Convert data to a standard form
○ 2mrw, tmrw → tomorrow; btwn, btw → between; b4 → before
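Standardization is usually done with a lookup table mapping informal spellings to canonical words. A sketch (this small map is an assumption; production systems use curated slang and contraction dictionaries):

```python
NORMALIZE = {  # informal form -> standard form (hand-made for illustration)
    "2mrw": "tomorrow", "tmrw": "tomorrow",
    "btwn": "between", "btw": "between", "b4": "before",
}

def standardize(tokens):
    # Replace a token if it appears in the map; otherwise keep it unchanged.
    return [NORMALIZE.get(t.lower(), t) for t in tokens]

print(standardize("see you 2mrw b4 noon".split()))
# → ['see', 'you', 'tomorrow', 'before', 'noon']
```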
Text Preprocessing in NLP
● Emojis
○ Remove them / replace with a word / use them for sentiment analysis
● Replace characters repeated for emphasis (common on Twitter)
○ e.g. it was ssssoooo nice → it was so nice
● Replace contractions (expand short forms to full words)
○ I'm → I am, didn't → did not
● Removal of punctuation
● Removal of rare/very frequent words
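Repeated-character cleanup is a one-line regular expression. This sketch assumes runs of three or more identical letters are noise, so legitimate doubles like "too" survive:

```python
import re

def collapse_repeats(text):
    # (\w)\1{2,} matches a letter followed by 2+ copies of itself;
    # the whole run is replaced by a single copy.
    return re.sub(r"(\w)\1{2,}", r"\1", text)

print(collapse_repeats("it was ssssoooo nice"))  # → it was so nice
```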
Text Preprocessing in NLP
● Tokenization
'the new policy of the government is good'
● Sentence segmentation (if required)
text = "First in class. Last in class."
Output: ['First in class.', 'Last in class.']
● Remove stop words
Tokens as input – ['the', 'new', 'policy', 'of', 'the', 'government', 'is', 'good']
Output – ['new', 'policy', 'government', 'good']
Text Preprocessing in NLP
● Parts of Speech (POS) tagging
Tokens as input – ['he', 'loves', 'to', 'play', 'with', 'toys', 'in', 'morning']
Output – [('he', 'PRP'), ('loves', 'VBZ'), ('to', 'TO'), ('play', 'VB'), ('with', 'IN'), ('toys', 'NN'), ('in', 'IN'), ('morning', 'NN')]
● Stemming
Tokens as input – ['Stemming', 'usually', 'tries', 'to', 'convert', 'the', 'word', 'into', 'its', 'root', 'format']
Output – ['stem', 'us', 'tri', 'to', 'convert', 'the', 'word', 'into', 'it', 'root', 'form']
● Lemmatization
Tokens as input – ['Stemming', 'usually', 'tries', 'to', 'convert', 'the', 'word', 'into', 'its', 'root', 'format']
Output – ['Stemming', 'usually', 'try', 'to', 'convert', 'the', 'word', 'into', 'it', 'root', 'format']
Text Preprocessing in NLP
● Named Entity Recognition
String 'text' as input:
text = "Tom is good at playing football and stays in London."
tokens = word_tokenize(text)
pos_text = nltk.pos_tag(tokens)

nes = nltk.ne_chunk(pos_text, binary=True)
OUTPUT:
(NE Tom/NNP) ('is', 'VBZ') ('good', 'JJ') ('at', 'IN') ('playing', 'VBG') ('football', 'NN') ('and', 'CC') ('stays', 'NNS') ('in', 'IN') (NE London/NNP) ('.', '.')

nes = nltk.ne_chunk(pos_text, binary=False)
OUTPUT:
(PERSON Tom/NNP) ('is', 'VBZ') ('good', 'JJ') ('at', 'IN') ('playing', 'VBG') ('football', 'NN') ('and', 'CC') ('stays', 'NNS') ('in', 'IN') (GPE London/NNP) ('.', '.')

With binary=True all entities are tagged simply as NE; with binary=False they get specific labels such as PERSON and GPE.
Text Preprocessing in NLP
How do we implement these NLP concepts:
Language: Python
● Why Python?
○ Has simple syntax
○ Has extensive collection of NLP tools and libraries
● Python Libraries used in this course
○ Numpy
○ Pandas
○ Matplotlib
○ SciKit-Learn
○ NLTK
Text Preprocessing in NLP
Summary
● Now you have an idea of what NLP is, its applications and the emerging
trends and challenges in using it.
● You also learnt about the basic concepts of NLP like
○ Corpus and Vocabulary
○ Text Normalization (Tokenization , Stemming and Lemmatization,
Stop words, Sentence segmentation )
○ POS tagging, Named Entity Recognition, Co-referencing Resolution
○ Parsing
● Implementation of the above concepts using NLTK