Syntactic AI in Natural Language Processing

The document provides an overview of Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP), detailing their definitions and interrelations. It discusses the importance of NLP in processing large volumes of textual data and highlights various applications such as sentiment analysis, machine translation, and text summarization. Additionally, it covers regular expressions, finite automata, and their roles in computational linguistics, along with references for further reading.

Uploaded by

fpar570


Speech & Natural Language Processing

Curated by
Dr. Tohida Rehman
Assistant Professor
Department of Information Technology
Jadavpur University
Contents
1. Introduction to AI, ML, and DL
2. Overview of NLP
3. Applications of NLP
4. Regular expressions in detail
5. References
Introduction[1/2]
What is AI?
• The term artificial intelligence (AI) describes machines that can mimic or simulate human thinking capabilities and behavior.
What is ML?
• Machine learning (ML) is a subfield of AI that enables machines to learn from data and predict outcomes more accurately without being explicitly programmed to do so.
What is DL?
• Deep learning (DL) is a subset of ML that uses multi-layered artificial neural networks, trained on large amounts of data, to model complex human-like behavior.
What is NLP?
• Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written, referred to as natural language.
• DL ⊆ ML ⊆ AI and NLP ⊆ AI; NLP overlaps with both ML and DL (NLP ∩ ML ≠ ∅, NLP ∩ DL ≠ ∅).
Introduction[2/2]
What is Linguistics?
Linguistics is the scientific study of language. Its focus is the systematic investigation of the properties of particular languages as well as the characteristics of language in general.
Important subfields of linguistics include:
• Phonetics - the study of how speech sounds are produced and perceived
• Phonology - the study of sound patterns and changes
• Morphology - the study of how words are formed from smaller meaningful units called morphemes
• Syntax - the study of the rules that govern sentence structure and word order
• Semantics - the study of the meaning of words, phrases, and sentences
• Pragmatics - the study of how language is used in context (interpretation of meaning)
• Historical Linguistics - the study of language change
• Sociolinguistics - the study of the relation between language and society
• Computational Linguistics - the study of how computers can process human language
• Psycholinguistics - the study of how language is learned, understood, and produced by the human mind
Why is NLP important?
• There is a large volume of textual data on the web and social media.
• NLP enables computers to communicate with humans in their native language.
• It makes sense of a highly unstructured data source.
• Human language is incredible in its complexity and variety.
• We can express ourselves verbally and in writing in many ways.
• We need machine learning methods that capture syntactic and semantic context in order to model our languages.
Information Extraction
Email:
Subject: NLP Meeting
Date: January 15, 20
To: T Rehman
Hi Rehman, we've now scheduled the NLP meeting.
It will be in the SMCC building tomorrow from 9:00-10:00 a.m.
-Poly

Create new Calendar entry:
Event: NLP Meeting
Date: Nov-13-2022
Start: 09:00am
End: 10:00am
Where: SMCC building
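As a rough sketch of how the location and time fields could be pulled out of the email body above, the following Python snippet uses one hand-written regular expression. The pattern, the name extract_meeting, and the assumption that the location always looks like "<NAME> building" are illustrative only; real information extraction systems are considerably more robust.

```python
import re

# Hypothetical pattern for this one email: a capitalized name followed by
# "building", then a time range such as 9:00-10:00.
MEETING_RE = re.compile(
    r"in the (?P<where>[A-Z]\w* building).*?"
    r"from (?P<start>\d{1,2}:\d{2})-(?P<end>\d{1,2}:\d{2})"
)

def extract_meeting(body):
    """Return a dict with where/start/end, or None if nothing matches."""
    match = MEETING_RE.search(body)
    return match.groupdict() if match else None

email_body = ("Hi Rehman, we've now scheduled the NLP meeting. "
              "It will be in the SMCC building tomorrow from 9:00-10:00 a.m.")
print(extract_meeting(email_body))
```

Resolving "tomorrow" to a calendar date would additionally need the date the email was sent, which this sketch does not attempt.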
Information Extraction & Sentiment Analysis
Attributes:
• zoom
• affordability
• size and weight
• flash
• ease of use
Size and weight
✓ nice and compact to carry!
✓ since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either!
✗ the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera

Machine Translation
• Automatically translate text from one language to another without human involvement.

Source Text (English)

Which free courses will help you to learn English?

Translated Text (Bengali)

কোন ফ্রি কোর্স আপনাকে ইংরেজি শিখতে সাহায্য করবে?

Text Summarization
• People nowadays use search engines like Google, Yahoo, and Bing to find information on the Internet.
• Due to the explosion of data, it helps users to be given relevant summaries of the search results.
• Text summarization has become a vital approach to help consumers swiftly grasp vast amounts of information.
• Given a long text, humans have a natural tendency to remember its most important points in summary form. The volume of data around us is growing to the point that we need a solution that delivers accurate and timely summaries.
• This requires a tool or approach for extracting an accurate summary from a large amount of data.
Extractive vs Abstractive Summarization
• Extractive and abstractive summarization are two types of text summarization methods.
• Extractive summarization: select essential sentences or paragraphs from the source text and combine them into a shorter text.
• Abstractive summarization: express the text's primary idea in natural language without the verbatim use of terms from the text.
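To make the extractive idea concrete, here is a minimal sketch in Python that scores each sentence by the average frequency of its words and keeps the top-scoring ones. This word-frequency heuristic is only one simple choice made for illustration (it is not taken from the slides), and the function name extractive_summary and the sample text are hypothetical.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Pick the sentences whose words are most frequent in the whole text."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(word_counts[w] for w in words) / max(len(words), 1)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Return the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)

document = "Cats sleep a lot. Cats also like to play. Dogs bark."
print(extractive_summary(document, num_sentences=1))
```

Every sentence in the output is copied verbatim from the input, which is what distinguishes extractive from abstractive summarization. A practical system would at least remove stop words and use a proper sentence tokenizer.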
Language Technology
Mostly solved:
• Spam detection: "Let's go to Shimla!" ✓ / "Buy V1AGRA …" ✗
• Part-of-speech (POS) tagging: Colorless (ADJ) green (ADJ) ideas (NOUN) sleep (VERB) furiously (ADV).
• Named entity recognition (NER): Einstein (PERSON) met with UN (ORG) officials in Princeton (LOC)
Making good progress:
• Sentiment analysis: "Best roast chicken in San Francisco!" / "The waiter ignored us for 20 minutes."
• Coreference resolution: "Carter told Mubarak he shouldn't run again."
• Word sense disambiguation (WSD): "I need new batteries for my mouse."
• Parsing: "I can see Alcatraz from the window!"
• Machine translation (MT): "We drink coffee every morning." → "আমরা প্রতিদিন সকালে কফি পান করি।"
• Information extraction (IE): "You're invited to our dinner party, Friday May 27 at 8:30" → Party, May 27, add
Still really hard? (Good progress in 2025...)
• Question answering (QA): "Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?"
• Paraphrase: "XYZ acquired ABC yesterday" / "ABC has been taken over by XYZ"
• Summarization: "The Dow Jones is up", "The S&P500 jumped", "Housing prices rose" → "Economy is good"
• Dialog: "Where is Citizen Kane playing in SF?" / "Castro Theatre at 7:30. Do you want a ticket?"
Ambiguity makes NLP hard: "Crash blossoms"
• Ambiguity is an intrinsic characteristic of human conversations.
• Ambiguity makes natural language understanding (NLU) very challenging.
• A single word, phrase, or sentence can have multiple meanings depending on context.
• Crash blossom (plural crash blossoms): a sentence, often a news headline, that is subject to incorrect interpretation due to syntactic and/or lexical ambiguity.
Example 1: Teacher strikes idle kids
• Has two meanings: "Teacher hits lazy kids" and "Teacher walkouts leave kids idle."
• Identification and explanation: "strikes" can occur as either a verb meaning to hit or a noun meaning refusals to work. Meanwhile, "idle" can occur as either a verb or an adjective.
Example 2: Hospitals Are Sued by 7 Foot Doctors
• Has two meanings: "Seven foot doctors (podiatrists) are suing the hospitals" or "Doctors who are 7 feet in height are suing the hospitals."
• More information can be found in the New York Times article.
Assignment
• Find at least 10 crash blossoms and identify the type of ambiguity in each.
Skills prerequisite
• Simple linear algebra (vectors, matrices)
• Basic probability theory
• Java or Python programming
• Packages/tools such as NLTK, spaCy, and many more
Regular Expression
• One of the unsung successes in standardization in computer science has been the regular expression (RE), a language for specifying text search strings.
• Regular expression search requires a pattern that we want to search for and a corpus of texts to search through.
• Regular expressions are case sensitive by default.
• The American mathematician Stephen Cole Kleene formalized the regular expression language.
• Regular expressions play a surprisingly large role:
• Sophisticated sequences of regular expressions are often the first model for any text processing task.
• For many hard tasks, we use machine learning classifiers.
• But regular expressions are used as features in the classifiers.
• They can be very useful in capturing generalizations.
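The remark that regular expressions are used as features in classifiers can be sketched as follows in Python. The three patterns and the name regex_features are hypothetical choices for illustration; the resulting 0/1 values would be fed to whatever classifier is being trained.

```python
import re

# Hypothetical hand-written patterns; each one becomes a binary feature.
FEATURE_PATTERNS = {
    "has_money": re.compile(r"\$\d+"),
    "has_digit_inside_word": re.compile(r"[A-Za-z]\d+[A-Za-z]"),
    "has_exclamation": re.compile(r"!"),
}

def regex_features(text):
    """Map a text to a dict of 0/1 features, one per pattern."""
    return {name: int(bool(pattern.search(text)))
            for name, pattern in FEATURE_PATTERNS.items()}

print(regex_features("Buy V1AGRA now for $10"))
print(regex_features("Let's go to Shimla!"))
```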
DESCRIPTION OF A FINITE AUTOMATON[1/2]
Analytically, a deterministic finite automaton (DFA) can be represented by a 5-tuple (Q, Σ, δ, q0, F), where
(i) Q is a finite nonempty set of states.
(ii) Σ is a finite nonempty set of inputs called the input alphabet.
(iii) δ (delta) is a function that maps Q × Σ into Q and is usually called the direct transition function. It describes the change of state during a transition and is usually represented by a transition table or a transition diagram.
(iv) q0 ∈ Q is the initial state.
(v) F ⊆ Q is the set of final states. It is assumed here that there may be more than one final state.
N.B.: The behavior of a deterministic finite automaton (DFA) is fully determined by the state it is in and the input symbol it reads.
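A minimal Python sketch of the 5-tuple definition may help: δ is stored as a dictionary keyed by (state, symbol), and running the machine just follows one transition per input symbol. The example automaton (even number of 1s) and the name run_dfa are assumptions made for illustration; it is not the machine from the textbook example.

```python
def run_dfa(delta, start, finals, string):
    """Return (sequence of visited states, accepted?) for a DFA."""
    state = start
    trace = [state]
    for symbol in string:
        state = delta[(state, symbol)]  # exactly one next state
        trace.append(state)
    return trace, state in finals

# Hypothetical DFA over {0, 1}: accepts strings with an even number of 1s.
delta = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
trace, accepted = run_dfa(delta, "even", {"even"}, "110001")
print(trace, accepted)
```

The returned trace is the "entire sequence of states" for the input, and the Boolean says whether the last state is in F.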
DESCRIPTION OF A FINITE AUTOMATON[2/2]
Analytically, a nondeterministic finite automaton (NDFA) can be represented by a 5-tuple (Q, Σ, δ, q0, F), where
(i) Q is a finite nonempty set of states.
(ii) Σ is a finite nonempty set of inputs called the input alphabet.
(iii) δ (delta) is a function that maps Q × Σ into 2^Q (the set of all subsets of Q) and is usually called the direct transition function. It describes the change of states during a transition and is usually represented by a transition table or a transition diagram.
(iv) q0 ∈ Q is the initial state.
(v) F ⊆ Q is the set of final states. It is assumed here that there may be more than one final state.
N.B.: The difference between deterministic and nondeterministic automata is only in δ. For a deterministic automaton (DFA), the outcome is a state, i.e. an element of Q; for a nondeterministic automaton, the outcome is a subset of Q.
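The only change needed to simulate the nondeterministic case is to track a set of possible states instead of a single state. The sketch below assumes an NDFA without ε-moves; the example machine (strings ending in 01) and the name run_nfa are hypothetical.

```python
def run_nfa(delta, start, finals, string):
    """Return (sequence of state sets, accepted?) for an NFA without ε-moves."""
    current = {start}
    trace = [current]
    for symbol in string:
        # Union of the successor sets of every state we might be in.
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
        trace.append(current)
    return trace, bool(current & finals)

# Hypothetical NFA over {0, 1}: accepts strings that end in "01".
delta = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
print(run_nfa(delta, "q0", {"q2"}, "1101"))
print(run_nfa(delta, "q0", {"q2"}, "0100"))
```

The string is accepted if at least one state in the final set belongs to F.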
Give the entire sequence of states for the input string 110001

Example Taken from K L P Mishra

Solution
H.W.
Check whether the input string 0100 is accepted or not.

Example Taken from K L P Mishra

Finite Automata, Regular Grammars, Regular Expressions
• The theoretical basis of computational work: finite state automata (FSA).
• For description: regular expressions (RE).
• Any regular expression can be implemented as a finite state automaton, and any FSA can be described with an RE.
• The regular expression is more than just a convenient metalanguage for text searching.
• A regular expression is one way of characterizing a particular kind of formal language called a regular language.
• Both regular expressions and finite-state automata can be used to describe regular languages.
Regular Expression
We give a formal recursive definition of regular expressions over Σ as follows:
1. Any terminal symbol (i.e. an element of Σ), the empty string (denoted ε or sometimes Λ), and the empty set (denoted ∅) are regular expressions. When we view a in Σ as a regular expression, we denote it by a (in boldface).
2. The union of two regular expressions R1 and R2, written as R1 + R2, is also a regular expression.
3. The concatenation of two regular expressions R1 and R2, written as R1.R2, is also a regular expression.
4. The iteration (or closure) of a regular expression R, written as R*, is also a regular expression.
5. If R is a regular expression, then (R) is also a regular expression.
6. The regular expressions over Σ are precisely those obtained recursively by the application of the rules 1-5 once or several times.
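As a small illustration of how the formal operators map onto a practical regex engine, the sketch below takes the formal expression (a + b)*.a.b.b over Σ = {a, b} and writes it in Python's re syntax, where union is | and concatenation is juxtaposition. The particular expression is an assumed example, not one from the slides.

```python
import re

# Formal notation (a + b)* a b b  is written (a|b)*abb in Python's re syntax.
pattern = re.compile(r"(a|b)*abb")

for candidate in ["abb", "ababb", "abba", ""]:
    # fullmatch requires the whole string to belong to the language.
    print(repr(candidate), bool(pattern.fullmatch(candidate)))
```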
Regular expressions
• A formal language for specifying text strings
• How can we search for any of these?
  • woodchuck
  • woodchucks
  • Woodchuck
  • Woodchucks
Regular Expressions: Disjunctions
• Letters inside square brackets []
Pattern         Matches
[wW]oodchuck    Woodchuck, woodchuck
[1234567890]    Any digit
• Ranges [A-Z]
Pattern    Matches                 Example
[A-Z]      An upper case letter    Drenched Blossoms
[a-z]      A lower case letter     my beans were impatient
[0-9]      A single digit          Chapter 1: Down the Rabbit Hole
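The bracket patterns above can be tried directly with Python's re module. In this short sketch, first_match is a hypothetical helper that returns the first substring a pattern matches.

```python
import re

def first_match(pattern, text):
    """Return the first substring matched by pattern, or None."""
    found = re.search(pattern, text)
    return found.group() if found else None

print(re.findall(r"[wW]oodchuck", "Woodchuck or woodchuck?"))
print(first_match(r"[A-Z]", "Drenched Blossoms"))                 # first upper case letter
print(first_match(r"[a-z]", "my beans were impatient"))           # first lower case letter
print(first_match(r"[0-9]", "Chapter 1: Down the Rabbit Hole"))   # first digit
```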
Regular Expressions: Negation in Disjunction
• Negations [^Ss]
• Caret means negation only when first in []
Pattern    Matches                     Example
[^A-Z]     Not an upper case letter    Oyfn pripetchik
[^Ss]      Neither 'S' nor 's'         "I have no exquisite reason"
a^b        The pattern a caret b       Look up a^b now
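A short Python sketch of the negation rows follows. One caveat worth knowing when trying these in Python: outside square brackets, Python's re always treats ^ as an anchor, so the literal "a caret b" pattern has to be written with a backslash as a\^b.

```python
import re

# First character that is NOT an upper case letter.
print(re.search(r"[^A-Z]", "Oyfn pripetchik").group())
# First character that is neither 'S' nor 's'.
print(re.search(r"[^Ss]", "I have no exquisite reason").group())
# A caret that is not first inside [] is an ordinary character.
print(re.findall(r"[e^]", "e^x"))
# In Python the literal caret outside [] must be escaped.
print(re.search(r"a\^b", "Look up a^b now").group())
print(re.search(r"a^b", "Look up a^b now"))  # unescaped: no match
```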
Regular Expressions: More Disjunction
• Woodchuck is another name for groundhog!
• The pipe | for disjunction
Pattern                      Matches
groundhog|woodchuck          groundhog, woodchuck
yours|mine                   yours, mine
a|b|c                        same as [abc]
[gG]roundhog|[Ww]oodchuck    Groundhog, groundhog, Woodchuck, woodchuck
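The pipe patterns can be checked with a few lines of Python; the sample sentences here are made up for the demonstration.

```python
import re

sentence = "A Groundhog is a woodchuck, and a Woodchuck is a groundhog."
print(re.findall(r"groundhog|woodchuck", sentence))
print(re.findall(r"[gG]roundhog|[Ww]oodchuck", sentence))

# a|b|c and [abc] match the same single characters.
print(re.findall(r"a|b|c", "cabbage") == re.findall(r"[abc]", "cabbage"))
```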
Regular Expressions: ? * + . {}
Pattern    Meaning                                        Matches
colou?r    Optional previous char                         color colour
oo*h!      0 or more of previous char                     oh! ooh! oooh! ooooh!
o+h!       1 or more of previous char                     oh! ooh! oooh! ooooh!
baa+                                                      baa baaa baaaa baaaaa
beg.n                                                     begin begun begun beg3n
{}         Exactly the specified number of occurrences    "ma.{2}als"
# "ma.{2}als" searches for a sequence that starts with "ma", followed by exactly 2 (any) characters, and then "als".
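Each quantifier row above can be verified with Python's re.findall. The test strings are assumptions chosen so that each pattern has both matching and non-matching words.

```python
import re

print(re.findall(r"colou?r", "color colour colr"))        # ? makes 'u' optional
print(re.findall(r"oo*h!", "oh! ooh! oooh! ooooh!"))      # one 'o', then 0 or more 'o'
print(re.findall(r"o+h!", "oh! ooh! oooh! ooooh!"))       # 1 or more 'o'
print(re.findall(r"baa+", "ba baa baaa"))                 # 'ba' plus 1 or more 'a'
print(re.findall(r"beg.n", "begin begun beg3n begn"))     # . is any single character
print(re.search(r"ma.{2}als", "mammals").group())         # exactly 2 characters between
```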
Regular Expressions: Anchors ^ $
• The most common anchors are the caret ^ and the dollar sign $.
• The caret ^ has three uses: to match the start of a line, to indicate a negation inside square brackets, and just to mean a caret.
• The dollar sign $ matches the end of a line. So the pattern " $" (a space followed by $) is useful for matching a space at the end of a line, and /^The dog\.$/ matches a line that contains only the phrase "The dog."
• We have to use the backslash here since we want the . to mean "period" and not the wildcard.
Pattern       Matches
^[A-Z]        Palo Alto
^[^A-Za-z]    1 "Hello"
\.$           The end.
.$            The end? The end!
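The anchor rows can be tried in Python as sketched below. Note that Python's re matches ^ and $ per line only when the re.MULTILINE flag is given; without it they refer to the start and end of the whole string. The multi-line sample text is an assumption for illustration.

```python
import re

print(bool(re.search(r"^The dog\.$", "The dog.")))          # whole line is "The dog."
print(bool(re.search(r"^The dog\.$", "The dog is here.")))  # extra words: no match
print(re.search(r"^[^A-Za-z]", '1 "Hello"').group())        # line starts with a non-letter
print(re.search(r".$", "The end?").group())                 # any last character
print(re.search(r"\.$", "The end?"))                        # last character must be a period

# With MULTILINE, ^ matches at the start of every line.
lines = "Palo Alto\nnew York\nBoston"
print(re.findall(r"^[A-Z]", lines, flags=re.MULTILINE))
```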
More operators
Example
• Find me all instances of the word "the" in a text.
Pattern                       Problem
the                           Misses capitalized examples
[tT]he                        Incorrectly returns other or theology
[^a-zA-Z][tT]he[^a-zA-Z]      Requires a non-letter on both sides of the word
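The three attempts can be compared on a made-up sentence in Python. The last line uses the word-boundary operator \b, which is an alternative not shown in the table above; it is included because the bracketed version cannot match "The" at the very start of the text, where no preceding character exists.

```python
import re

text = "The other day the theology student left."

print(re.findall(r"the", text))                       # misses "The", hits other/theology
print(re.findall(r"[tT]he", text))                    # still hits other/theology
print(re.findall(r"[^a-zA-Z][tT]he[^a-zA-Z]", text))  # only " the " in the middle
print(re.findall(r"\b[tT]he\b", text))                # word boundaries: The, the
```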

• Regular expressions are supported in many programming languages, such as Java, Python, Ruby, Swift, Scala, Groovy, C#, PHP, and JavaScript.
RegEx Functions (you may try these using Python)
Function    Description
findall     Returns a list containing all matches
search      Returns a Match object if there is a match anywhere in the string
split       Returns a list where the string has been split at each match
sub         Replaces one or many matches with a string
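The four functions listed above all live in Python's standard re module. A minimal demonstration on an assumed sample string:

```python
import re

text = "The rain in Spain"

print(re.findall(r"ai", text))        # list of all matches
match = re.search(r"\s", text)        # Match object for the first whitespace
print(match.start())                  # index where that match begins
print(re.split(r"\s", text))          # split the string at each whitespace
print(re.sub(r"\s", "_", text))       # replace every whitespace with "_"
```

re.search returns None when there is no match, so it is worth checking the result before calling methods such as start() or group() on it.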
Reference Books
1. Daniel Jurafsky and James H. Martin. 2020. Speech and Language Processing. 3rd Edition.
2. Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
3. Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, Harshit Surana. 2020. Practical
Natural Language Processing. O'Reilly.
4. NPTEL NLP course.
5. [Link]
6. Coursera course - Natural Language Processing
