Module 1.1

Lecture Notes for Natural Language Processing: a brief history of natural language processing, language challenges, applications, classical vs. statistical vs. deep learning-based approaches; basic concepts in linguistic data structure (morphology, syntax, semantics, pragmatics); tokenized text and pattern matching; recognizing names, stemming, tagging.


Brief History of NLP

o 1950s: Alan Turing proposed the Turing Test to evaluate a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
o 1960s: Development of early NLP systems such as ELIZA, a computer program by Joseph Weizenbaum that simulated conversation.
o 1970s-1980s: Introduction of rule-based systems like SHRDLU and the development of the Chomsky hierarchy in linguistics.
o 1990s: Statistical approaches began to dominate NLP, utilizing probabilistic models to handle large corpora of text data.
o 2000s: The rise of machine learning techniques led to significant advancements in NLP, such as statistical machine translation.
o 2010s-Present: Development of powerful deep learning models like Word2Vec, GloVe, BERT, and GPT, which have revolutionized NLP tasks such as language translation, sentiment analysis, and text generation.

Language Challenges in NLP

o Ambiguity: Words and sentences can have multiple meanings.
   Example: "The farmer went to the bank." (Is "bank" referring to the side of a river or a financial institution?)
o Context: Understanding the context is crucial for accurate interpretation.
   Example: "He banked the plane" vs. "He went to the bank."
o Sarcasm and Irony: Detecting sarcasm and irony can be challenging.
   Example: "Oh, great! Another homework assignment."
o Diverse Syntax and Grammar: Different languages have different syntax and grammar rules.
   Example: Subject-Verb-Object (SVO) order in English ("She eats an apple") vs. Subject-Object-Verb (SOV) order in Japanese ("Kanojo wa ringo o taberu").
o Idioms and Phrases: Recognizing and interpreting idiomatic expressions.
   Example: "Kick the bucket" meaning "to die."

Applications of NLP

o Machine Translation: Translating text from one language to another.
   Example: Google Translate translating "Hello, world!" into Spanish as "¡Hola, mundo!"
o Sentiment Analysis: Determining the sentiment (positive, negative, neutral) of a text.
   Example: Analyzing product reviews to determine customer satisfaction.
o Chatbots: Automated systems that interact with users via text or speech.
   Example: Customer support chatbots like those used by banks or online retailers.
o Information Retrieval: Extracting relevant information from large datasets.
   Example: Search engines like Google retrieving relevant web pages based on user queries.
o Speech Recognition: Converting spoken language into text.
   Example: Voice assistants like Siri, Alexa, and Google Assistant.

Classical vs. Statistical vs. Deep Learning-based NLP

 Classical NLP:
  o Rule-based Approaches: Utilize hand-crafted rules to process language.
     Example: Parsing sentences using grammar rules.
  o Manual Feature Engineering: Involves defining specific linguistic features for analysis.
     Example: Identifying parts of speech (POS) using predefined rules.
 Statistical NLP:
  o Probabilistic Models: Use statistical methods to model and predict language patterns.
     Example: Hidden Markov Models (HMMs) for POS tagging.
  o Large Amounts of Data: Relies on extensive corpora to learn patterns.
     Example: Using n-grams to predict the next word in a sentence.
 Deep Learning-based NLP:
  o Neural Networks: Employ deep neural networks to learn from raw text data.
     Example: Recurrent Neural Networks (RNNs) for sequence prediction.
  o End-to-End Learning: Models can learn to perform tasks directly from data without explicit feature engineering.
     Example: Transformers like BERT and GPT for various NLP tasks.
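The n-gram idea above can be sketched with a tiny bigram model; the corpus below is made up for illustration, and a real system would train on millions of sentences:

```python
from collections import Counter, defaultdict

# Count bigram frequencies over a toy corpus (invented for this sketch).
corpus = "the cat sat on the mat . the cat ran".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the word most often observed after `word`, or None."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> 'cat'
```

This is the core of statistical NLP in miniature: the model knows nothing about grammar, only observed co-occurrence counts.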
Basic Concepts in Linguistic Data Structure

 Morphology:
  o Study of word structure and formation.
  o Example: Analyzing the root, prefix, and suffix of words like "unhappiness" (un- + happy + -ness).
 Syntax:
  o Rules that govern sentence structure.
  o Example: English follows Subject-Verb-Object (SVO) order: "She (S) loves (V) music (O)."
 Semantics:
  o Meaning of words and sentences.
  o Example: Understanding that "bark" can refer to the sound a dog makes or the outer covering of a tree.
 Pragmatics:
  o Contextual use of language.
  o Example: Interpreting "Can you pass the salt?" as a request rather than a question about ability.
Tokenized Text and Pattern Matching

o Tokenization: Splitting text into individual tokens (words or sentences).
o Example:
   Input Text: "Natural Language Processing is fascinating."
   Tokenized Text: ['Natural', 'Language', 'Processing', 'is', 'fascinating', '.']
   Explanation: The sentence is divided into individual words and punctuation marks.
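A tokenization like the one above can be sketched with Python's standard `re` module; real tokenizers (e.g. in NLTK or spaCy) handle many more edge cases such as contractions and abbreviations:

```python
import re

text = "Natural Language Processing is fascinating."
# \w+ grabs runs of word characters; [^\w\s] grabs single punctuation marks.
tokens = re.findall(r"\w+|[^\w\s]", text)
print(tokens)  # ['Natural', 'Language', 'Processing', 'is', 'fascinating', '.']
```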

o Pattern Matching: Identifying patterns within tokenized text using regular expressions.
o Example:
   Input Text: "The quick brown fox jumps over the lazy dog."
   Pattern: Words with exactly 4 letters.
   Matched Words: ['over', 'lazy']
   Explanation: The pattern identifies words that are exactly four letters long within the sentence ("quick" and "jumps" have five letters, so they do not match).
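This pattern match can be written directly as a regular expression; note that 'over' and 'lazy' are the only exactly-four-letter words in this sentence:

```python
import re

text = "The quick brown fox jumps over the lazy dog."
# \b\w{4}\b matches words of exactly four word-characters.
four_letter_words = re.findall(r"\b\w{4}\b", text)
print(four_letter_words)  # ['over', 'lazy']
```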

Recognizing Names

o Named Entity Recognition (NER): Identifies proper nouns and classifies them as people, organizations, etc.
o Example:
   Input Text: "Barack Obama was the 44th President of the United States."
   Recognized Entities:
     'Barack Obama' as PERSON
     '44th President' as TITLE
     'United States' as GPE (Geopolitical Entity)
   Explanation: The NER system identifies and categorizes names and titles within the text.
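Real NER systems use trained models (for example spaCy's `en_core_web_sm`); a crude capitalization heuristic can nevertheless sketch the idea. This is an illustration only, not real NER: it over-matches ordinary capitalized words and cannot assign labels like PERSON or GPE:

```python
import re

text = "Barack Obama was the 44th President of the United States."
# Naive heuristic: runs of capitalized words are entity candidates.
# It wrongly picks up "President" here and would miss lowercase entities.
candidates = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)
print(candidates)  # ['Barack Obama', 'President', 'United States']
```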

Stemming and Lemmatization

 Stemming:
  o Reduces words to their base form by removing prefixes or suffixes.
  o Example:
     Input Words: ['running', 'jumps', 'easily', 'fairly']
     Stemmed Words: ['run', 'jump', 'easili', 'fairli']
     Explanation: The words are reduced to their root forms, which may not always be meaningful.
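A minimal suffix-stripping stemmer, written for illustration (a real Porter stemmer applies many more rules and produces the 'easili'/'fairli' forms shown above; this toy version simply drops the suffix):

```python
def simple_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undouble a trailing consonant left by "ing" removal (runn -> run).
            if suffix == "ing" and len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word

words = ["running", "jumps", "easily", "fairly"]
print([simple_stem(w) for w in words])  # ['run', 'jump', 'easi', 'fair']
```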

 Lemmatization:
  o Reduces words to their meaningful base form using vocabulary and morphological analysis.
  o Example:
     Input Words: ['running', 'jumps', 'easily', 'fairly']
     Lemmatized Words: ['run', 'jump', 'easy', 'fair']
     Explanation: The words are reduced to their base or dictionary forms, ensuring they remain meaningful.
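Because lemmatization needs a vocabulary, even a sketch requires a lookup table; the tiny table below is hand-made for this demo (a real lemmatizer, such as NLTK's WordNet lemmatizer, consults a full dictionary plus morphological rules):

```python
# Hand-made lemma table, for illustration only.
LEMMAS = {"running": "run", "jumps": "jump", "easily": "easy", "fairly": "fair"}

def lemmatize(word):
    """Dictionary lookup; unknown words pass through unchanged."""
    return LEMMAS.get(word, word)

words = ["running", "jumps", "easily", "fairly"]
print([lemmatize(w) for w in words])  # ['run', 'jump', 'easy', 'fair']
```

The contrast with stemming is the key point: the lookup guarantees a meaningful dictionary form ('easy', not 'easili'), at the cost of needing vocabulary coverage.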

Tagging Parts of Speech


o POS Tagging: Assigns part-of-speech tags to each word in a sentence.
o Example:
   Input Text: "The quick brown fox jumps over the lazy dog."
   POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
   Explanation: Each word is tagged with its corresponding part of speech, such as determiner (DT), adjective (JJ), noun (NN), verb (VBZ), and preposition (IN).
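A toy dictionary tagger reproduces the tags above for this one sentence; real taggers (e.g. NLTK's `pos_tag`) are trained statistically and handle ambiguous words by context, which a pure lookup cannot:

```python
# Hand-built lexicon covering just this example sentence.
LEXICON = {"the": "DT", "quick": "JJ", "brown": "JJ", "lazy": "JJ",
           "fox": "NN", "dog": "NN", "jumps": "VBZ", "over": "IN"}

def tag(tokens):
    """Look each token up in the lexicon; default unknown words to NN."""
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in tokens]

tokens = "The quick brown fox jumps over the lazy dog".split()
print(tag(tokens))  # [('The', 'DT'), ('quick', 'JJ'), ..., ('dog', 'NN')]
```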

Constituent Structure

o Constituent Structure Analysis: Breaks down sentences into their sub-parts (constituents).
o Example:
   Input Text: "The quick brown fox jumped over the lazy dog."
   Constituent Structure:
     Sentence (S)
       Noun Phrase (NP): "The quick brown fox"
         Determiner (DT): "The"
         Adjectives (JJ): "quick", "brown"
         Noun (NN): "fox"
       Verb Phrase (VP): "jumped over the lazy dog"
         Verb (VBD): "jumped"
         Prepositional Phrase (PP): "over the lazy dog"
           Preposition (IN): "over"
           Noun Phrase (NP): "the lazy dog"
             Determiner (DT): "the"
             Adjective (JJ): "lazy"
             Noun (NN): "dog"
   Explanation: The sentence is parsed into a hierarchical structure, showing the relationships between words and phrases.
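A parse like this can be represented as nested tuples; the helper below flattens a tree back to its words in order. The tree here is transcribed by hand from the example, not produced by a parser:

```python
# (label, children...) for phrases; (tag, word) for leaves.
TREE = ("S",
        ("NP", ("DT", "The"), ("JJ", "quick"), ("JJ", "brown"), ("NN", "fox")),
        ("VP", ("VBD", "jumped"),
         ("PP", ("IN", "over"),
          ("NP", ("DT", "the"), ("JJ", "lazy"), ("NN", "dog")))))

def leaves(node):
    """Collect the words at the leaves of a parse tree, left to right."""
    _label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # leaf node: (tag, word)
    return [word for child in children for word in leaves(child)]

print(" ".join(leaves(TREE)))  # The quick brown fox jumped over the lazy dog
```

Walking the structure this way makes the hierarchy concrete: each phrase is just a labeled grouping of its children, and the original sentence is recovered by an in-order traversal.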
