• Natural language is a means for us to express our thoughts and ideas.
• Language is a mutually agreed upon set of protocols involving words/sounds that
we use to communicate with each other.
• In this era of digitization and computation, we are constantly interacting with
machines around us through various means, such as voice commands and typing
instructions in the form of words.
• NLP can be defined as a field of computer science that is concerned with enabling
computer algorithms to understand, analyze and generate natural languages.
• Most of us have interacted with Siri or Alexa at some point.
• Siri and Alexa combine techniques such as Speech to Text with a search
engine to do their magic.
• Speech to Text is an application of NLP.
Stages in a Comprehensive NLP System
Tokenization
Morphological Analysis
Syntactic Analysis
Semantic Analysis (lexical and compositional)
Pragmatics and Discourse Analysis
Knowledge-Based Reasoning
Text Generation
• NLP works at different levels, which means that machines process and understand
natural language at different levels.
• These levels are :
• Morphological level: This level deals with understanding word structure and word
information.
• Lexical level: This level deals with understanding the part of speech of the word.
• Syntactic level: This level deals with understanding the syntactic analysis of a sentence, or
parsing a sentence.
• Semantic level: This level deals with understanding the actual meaning of a sentence.
• Discourse level: This level deals with understanding the meaning of a sentence beyond just
the sentence level, that is, considering the context.
• Pragmatic level: This level deals with using real-world knowledge to understand the
sentence.
History of NLP
• NLP is a field that has emerged from various other fields such as AI, linguistics,
and data science.
• The idea emerged from the need for machine translation (MT) in the 1940s.
• The original language pair was English and Russian.
• Other languages, such as Chinese, were also taken up in the early 1960s.
• A bleak era for MT/NLP began in 1966 with the ALPAC report, which concluded
that research in the area was not making sufficient progress; as a result,
MT/NLP work nearly died out.
• Conditions improved again in the 1980s, when MT/NLP products started
delivering useful results to customers.
• After nearly dying out in the 1960s, NLP/MT got a new life when the idea and
need of Artificial Intelligence emerged. LUNAR, developed in the early 1970s by
W. A. Woods, could analyze, compare, and evaluate the chemical data on lunar
rock and soil composition that was accumulating from the Apollo moon
missions, and could answer related questions.
• In the 1980s, computational grammar became a very active field of research,
linked with reasoning about meaning and taking the user's beliefs and
intentions into account.
• In the 1990s, the pace of growth of NLP/MT increased. Grammars, tools, and
practical NLP/MT resources, along with parsers, became available.
• Research on core topics such as word sense disambiguation and statistically
oriented NLP gave new direction to work on the lexicon.
• This growth of NLP was joined by other essential topics such as statistical
language processing, information extraction, and automatic summarization.
• No discussion of the history of NLP is complete without mentioning ELIZA, a
chatbot program developed between 1964 and 1966 at the MIT Artificial
Intelligence Laboratory.
• It was created by Joseph Weizenbaum.
• Its best-known script, DOCTOR, simulated a Rogerian psychotherapist and used
pattern-matching rules to respond to users' statements.
• It was one of the few programs of its time capable of attempting the Turing
test.
• Previously, a traditional rule-based system was used for computations, in which
you had to explicitly write hardcoded rules.
• Today, computations on natural language are being done using ML and DL
techniques.
• Let’s say we have to extract the names of politicians from a set of political
news articles. If we want to apply rule-based grammar, we must manually craft
certain rules based on human understanding of language.
• As we can see, a rule-based system like this would not yield very accurate
results.
• One major disadvantage is that the same rule is not applicable in all cases,
given the complex and nuanced nature of most language.
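As a minimal sketch of such a hand-crafted rule, the hypothetical pattern below assumes a politician's name is one or more capitalized words immediately following a known title (the title list and function names are illustrative, not from the source):

```python
import re

# Hypothetical hand-crafted rule: a name is assumed to be one or more
# capitalized words immediately following a known political title.
TITLES = r"(?:Senator|President|Minister|Governor)"
NAME_RULE = re.compile(TITLES + r"\s+((?:[A-Z][a-z]+\s?)+)")

def extract_names(text):
    """Return candidate names matched by the title-based rule."""
    return [m.strip() for m in NAME_RULE.findall(text)]

article = "Senator Jane Smith met President John Doe yesterday."
print(extract_names(article))  # ['Jane Smith', 'John Doe']
```

Note how brittle this is: any name not preceded by one of the listed titles is silently missed, which is exactly the weakness described above.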
Basic Concepts
• Text corpus or corpora
• Paragraph
• Sentences
• Phrases and words
• N-grams
• Bag-of-words
Text Corpus or corpora
• The language data that all NLP tasks depend upon is called the text corpus or
simply corpus.
• A corpus is a large set of text data in a given language, such as English or
French.
• The corpus can consist of a single document or a bunch of documents.
• The source of the text corpus can be social network sites like Twitter, blog sites,
open discussion forums like Stack Overflow, books, and several others.
• In some tasks, like machine translation, we require a multilingual corpus.
• For example, we might need both the English and French translations of the same
document content for developing a machine translation model.
• For speech tasks, we would also need human voice recordings and the
corresponding transcribed corpus.
• For many NLP tasks, the corpus is split into chunks for further analysis.
• These chunks could be at the paragraph, sentence, or word level.
Paragraph
• A paragraph is the largest unit of text handled by an NLP task.
• Paragraph-level boundaries by themselves may not be of much use unless broken
down into sentences.
• Sometimes, though, a paragraph is treated as a context boundary.
• Tokenizers that can split a document into paragraphs are available in some of the
Python libraries.
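As a simple sketch (not a specific library's API), paragraphs in plain text can often be recovered by splitting on blank lines:

```python
import re

def split_paragraphs(document):
    """Split a document into paragraphs on blank lines.
    A naive sketch; library tokenizers are more robust."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

doc = "First paragraph.\n\nSecond paragraph,\nstill the same one.\n\nThird."
print(split_paragraphs(doc))
```

This assumes paragraphs are separated by at least one empty line, which holds for much plain-text data but not for all formats.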
Sentences
• Sentences are the next level of lexical unit of language data.
• A sentence encapsulates a complete meaning or thought and context.
• It is usually extracted from a paragraph based on boundaries determined by
punctuation such as the period.
• A sentence may also convey the opinion or sentiment expressed in it.
• In general, sentences consist of parts of speech (POS) such as nouns, verbs,
adjectives, and so on.
• There are tokenizers available to split paragraphs into sentences based on
punctuation.
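A minimal sketch of such a sentence tokenizer, splitting on sentence-final punctuation followed by whitespace (real tokenizers also handle abbreviations such as "Dr." and "e.g."):

```python
import re

def split_sentences(paragraph):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

text = "It rained all day. The match was cancelled! Will it rain tomorrow?"
print(split_sentences(text))
```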
Phrases and words
• Phrases are a group of consecutive words within a sentence that can convey a
specific meaning.
• For example, in the sentence "Tomorrow is going to be a rainy day", the part
"going to be a rainy day" expresses a specific thought.
• Some of the NLP tasks extract key phrases from sentences for search and retrieval
applications.
• The next smallest unit of text is the word.
• Common tokenizers split sentences into words based on delimiters such as spaces
and commas.
• One of the problems in NLP is ambiguity: the same word can carry different
meanings in different contexts.
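A minimal word-tokenizer sketch along these lines, using a regular expression to keep alphanumeric runs and drop punctuation (a simplification; library tokenizers treat contractions and hyphens more carefully):

```python
import re

def tokenize_words(sentence):
    """Split a sentence into word tokens, discarding punctuation."""
    return re.findall(r"[A-Za-z0-9']+", sentence)

print(tokenize_words("Tomorrow is going to be a rainy day."))
```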
N-gram
• A sequence of characters or words forms an N-gram.
• For example, a character unigram consists of a single character.
• A character bigram consists of a sequence of two characters, and so on.
• Similarly, word N-grams consist of sequences of n words.
• In NLP, N-grams are used as features for tasks like text classification.
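Both word and character N-grams can be generated with the same sliding-window sketch:

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a sequence of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the quick brown fox".split()
print(ngrams(words, 2))        # word bigrams
print(ngrams(list("abc"), 2))  # character bigrams
```

The same function works for characters or words because an N-gram is defined over any token sequence.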
Bag-of-words
• Bag-of-words, in contrast to N-grams, does not consider word order or sequence.
• It captures the word occurrence frequencies in the text corpus.
• Bag-of-words is also used as features in tasks like sentiment analysis and topic
identification.
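A minimal bag-of-words sketch using a frequency counter; note that word order is discarded and only counts remain:

```python
from collections import Counter

def bag_of_words(text):
    """Map a text to its word-frequency counts, ignoring word order."""
    return Counter(text.lower().split())

print(bag_of_words("the cat sat on the mat"))
```

Because order is dropped, "the cat sat" and "sat the cat" produce the same bag, which is exactly the trade-off versus N-gram features.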
Applications
• Analyzing sentiment
• Recognizing named entities
• Linking entities
• Translating text
• Natural language interfaces
• Semantic Role Labeling
• Relation extraction
• SQL query generation, or semantic parsing
• Machine Comprehension
• Textual entailment
• Coreference resolution
• Searching
• Question answering and chatbots
• Converting text to voice
• Converting voice to text
• Speaker identification
• Spoken dialog systems
• Other applications