Lec-1 Introduction
https://en.wikipedia.org/wiki/NLP
• Course Instructor: Tanmoy Chakraborty (tanmoychak.com)
(NLP, Social Media, Graph Neural Networks)
[email protected]
• Guest Lecture: TBD
• Course page: https://sites.google.com/view/ell881-iitd/home
• Piazza: https://piazza.com/iitd.ac.in/spring2023/ell881
• TAs:
• Kshitij Alwadhi ([email protected])
• Gurusha Juneja ([email protected])
• Group Email: TBD
Useful resources/tools/libraries
• Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
• Natural Language Processing, Jacob Eisenstein
https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
• A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
http://u.cs.biu.ac.il/~yogo/nnlp.pdf
• Journals
• Computational Linguistics, Natural Language Engineering, TACL, KBS, ACM TALLIP, ....
• Conferences
• ACL, EMNLP, NAACL, COLING, AAAI, IJCNLP, ICML, NIPS, WWW, KDD, SIGIR, ….
Research papers repository
https://aclanthology.org/
https://arxiv.org/list/cs.CL/recent
Prerequisite
Mandatory
• Data Structures & Algorithms
• Machine Learning
• Python programming
Desirable
• Deep learning
• Strongly recommended to learn ML. This class will not cover fundamentals of ML.
• Instructor/TAs may cover DL-related prerequisites
Course Directives
• Class Time: Mon & Thu, 2 pm – 3:30 pm
• Office Hour: Mon 5-6 pm
• Room: LH-519
• Meet your instructor at least once per 15 days to resolve your doubts.
• Mon 5-5:30 pm (appointment based, email me at least 1 hr before coming)
* You are welcome to propose your own idea for a mini project if you find it fascinating. The instructor will decide whether it qualifies.
List of Projects
• TBD
Content (Tentative)
• Introduction
• Classical NLP
• Regular Expressions, Text Normalization, and Edit Distance
• Morphology & Finite-state Transducers
1980-2010
• In-context learning
date
• Setup
• Two rooms, two humans, and a computer.
• Room 1: One human (C)
• Room 2: One computer (A) and one human (B)
https://en.wikipedia.org/wiki/History_of_natural_language_processing
Why is NLP challenging?
Ambiguity
The real reason why NLP is hard
“Rohit Sharma was on fire last night. He totally destroyed the other teams”
Ambiguity
Duck or Rabbit?
shadakhtar:nlp:iiitd:2022:intro
Who has the telescope?
Ambiguity in language
No ambiguity!
Ambiguity in language
OR
Who’ll gift whom?
Ambiguity in language
Ambiguity in language
Public demand: OR Public demand:
Ambiguity in language
IN OUT
Baby changing room
Ambiguity in language
Ambiguity and Punctuation!
Ambiguity makes NLP hard
Surface form has multiple interpretations
• Syntactic Ambiguity
• Violinist Linked to JAL Crash Blossoms => main verb?
Buffalo buffalo, whom other Buffalo buffalo buffalo, buffalo Buffalo buffalo
The sentence uses a restrictive clause, so there are no commas, nor is there the word "which," as in, "Buffalo buffalo, which Buffalo buffalo buffalo, buffalo
Buffalo buffalo." This clause is also a reduced relative clause, so the word that, which could appear between the second and third words of the sentence, is
omitted.
Dmitri Borgmann's Beyond Language: Adventures in Word and Thought. 1967.
Why else is natural language understanding difficult?
• non-standard English
• Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
• segmentation issues
• the New York-New Haven Railroad
• the New York-New Haven Railroad
• Idioms/Multiword
• dark horse
• get cold feet
• lose face
• throw in the towel
• Khana-wana (Echo)
NLP trinity
DL
Word and Token
● Word:
○ Smallest sequence of phonemes of a spoken language that can be uttered in isolation
● Word Segmentation/Tokenization:
○ Breaking a string of characters into a sequence of words.
○ Smallest sequence of graphemes that are delimited by some predefined characters (space, comma, full-stop, etc.).
Ram, Shyam, and Mohan are playing. ⇒ [Ram] [,] [Shyam] [,] [and] [Mohan] [are] [playing] [.]
21,53,010 COVID cases in India. ⇒ [21] [,] [53] [,] [010] [COVID] [cases] [in] [India] [.]
Check this out…https://www.abc.com ⇒ [Check] [this] [out] [.] [.] [.] [https] [:] [/] [/] [www] [.] [abc] [.] [com]
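The three examples above can be reproduced with a very small delimiter-based tokenizer. This is a minimal sketch, not the method used in the course: the function name `naive_tokenize` and the regular expression are assumptions chosen for illustration, and the ellipsis is written as three ASCII dots.

```python
import re

def naive_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one non-space symbol.
    return re.findall(r"\w+|[^\w\s]", text)

for sentence in [
    "Ram, Shyam, and Mohan are playing.",
    "21,53,010 COVID cases in India.",
    "Check this out...https://www.abc.com",
]:
    print(naive_tokenize(sentence))
```

The last two outputs show why this rule is too naive: the number 21,53,010 and the URL are each broken into many pieces, although a reader treats each as a single unit.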
Parts-of-Speech (POS) Tags
● Grammatical class of the word.
PRP: Personal Pronoun
VBD: Verb, Past
DT: Determiner
NN: Noun, Singular, Mass
TO: to
IN: Preposition
He ate an apple .
PRP VBD DT NN .
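The tagging of "He ate an apple ." can be sketched as a plain dictionary lookup. This is only an illustration under stated assumptions: `TOY_LEXICON`, `lookup_tag`, and the fallback tag are hypothetical, and a lookup table assigns one fixed tag per word, so it cannot handle words that belong to more than one grammatical class.

```python
# Hypothetical toy lexicon covering only the example sentence.
TOY_LEXICON = {"he": "PRP", "ate": "VBD", "an": "DT", "apple": "NN", ".": "."}

def lookup_tag(tokens, lexicon=TOY_LEXICON, default="NN"):
    """Tag each token by dictionary lookup; unknown words get the default tag."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]

print(lookup_tag(["He", "ate", "an", "apple", "."]))
```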
● PoS disambiguation
○ A word can belong to different grammatical classes.
PRP VBD TO DT NN IN DT NN .
PRP VBD TO VB DT NN IN DT NN .
Chunking
○ Mumbai green lights women icons on traffic signals earns global praise. ⇒
[NP Mumbai green lights women icons] [PP on] [NP traffic signals] [VP earns] [NP global praise]
Syntax Processing
● Validate the grammatical structure of the sentence.
● Let, vocabulary = [the, mango, he, eats, ...]
○ He eats a mango. ⇒ ✅
○ He mango eats a. ⇒ ❌
[Parse tree: S → NP VP . over "He eats a mango"]
Syntax Processing
● Every language has a grammar G = <V, T, P, S>.
[Parse tree: S → NP VP . with preterminals PRP, DT, NN over "He eats a mango"]
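Checking whether a sentence is grammatical under a grammar G = <V, T, P, S> (non-terminals, terminals, productions, start symbol) can be sketched with a small CYK recognizer. The toy grammar below is an assumption made for illustration: it is written in Chomsky normal form, covers only the example vocabulary, and maps the pronoun directly to NP; the names `BINARY_RULES`, `LEXICON`, and `is_grammatical` are hypothetical.

```python
from itertools import product

# Productions P of a toy grammar in Chomsky normal form (illustrative only).
BINARY_RULES = {
    ("NP", "VP"): {"S"},     # S  -> NP VP
    ("VBZ", "NP"): {"VP"},   # VP -> VBZ NP
    ("DT", "NN"): {"NP"},    # NP -> DT NN
}
# Lexical productions: terminal (word) -> non-terminals that derive it.
LEXICON = {
    "he": {"NP"},
    "eats": {"VBZ"},
    "a": {"DT"},
    "the": {"DT"},
    "mango": {"NN"},
}

def is_grammatical(sentence, start="S"):
    """Return True if the toy grammar derives the sentence (CYK recognition)."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the non-terminals that derive words[i..j] inclusive.
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        chart[i][i] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between left and right parts
                for left, right in product(chart[i][k], chart[k + 1][j]):
                    chart[i][j] |= BINARY_RULES.get((left, right), set())
    return start in chart[0][n - 1]

print(is_grammatical("He eats a mango."))  # accepted by the toy grammar
print(is_grammatical("He mango eats a."))  # rejected by the toy grammar
```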
Syntactic Ambiguity
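Two parse trees for "I saw a girl with a telescope":
Parse 1 (PP attached to the NP: the girl has the telescope)
(S (NP (PRP I)) (VP (VBZ saw) (NP (DT a) (NN girl) (PP (IN with) (NP (DT a) (NN telescope))))) .)
Parse 2 (PP attached to the VP: the seeing is done with the telescope)
(S (NP (PRP I)) (VP (VBZ saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))) .)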
Semantic Role Labelling (SRL)
● Identify the semantic role of each argument (noun phrase) w.r.t. the predicate (main
verb) of the sentence
Textual Entailment
● Determine whether one natural language sentence entails (implies) another under an
ordinary interpretation
(Ram hit Shyam with a hockey stick yesterday. → Shyam got hurt) ⇒ Positive TE
(Ram hit Shyam with a hockey stick yesterday. → Shyam did not get hurt) ⇒ Negative TE
(Ram hit Shyam with a hockey stick yesterday. → Shyam got hospitalized) ⇒ non TE
Pragmatics
○ Intention:
■ Utterance: Can you pass the water bottle?
■ Literal meaning: Are you able to pass the water bottle? (Response: Yes, I can.)
■ Pragmatic meaning: Pass me the water bottle. (Response: Hand over the water bottle)
Discourse
Mother said to John: Go to school. It is open today. Are you planning to bunk? Father
will be very angry.
Coreference Resolution
● Two referring expressions used to refer to the same entity are said to corefer.
● Determine which phrases in a document corefer.
John showed Bob his Toyota yesterday. It’s similar to the one I bought five years ago.
That was really nice, but he likes this one even better.
Information Extraction
● Relation extraction:
○ Relation among entities
■ CEO(Sundar Pichai, Google), CEO(Sundar Pichai, Alphabet), Born-at(Sundar Pichai,
India), ParentOrg(Alphabet, Google)
Word Sense Disambiguation (WSD)
Sentiment Analysis
○ It's a mass Chinese product. Too expensive. Thin and useless ⇒ Negative
○ My neighbours are home and it’s good to wake up at 3am in the morning. ⇒ Negative?
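The question mark on the second example can be illustrated with a word-counting classifier. This is a minimal sketch under stated assumptions: the word lists `POSITIVE` and `NEGATIVE` and the function `lexicon_sentiment` are hypothetical and far too small for real use. Such a scorer only counts polarity words, so it has no way to notice sarcasm.

```python
import re

# Hypothetical, deliberately tiny polarity lexicons.
POSITIVE = {"good", "great", "nice"}
NEGATIVE = {"expensive", "useless", "bad"}

def lexicon_sentiment(text):
    """Label text by counting positive and negative lexicon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(lexicon_sentiment("It's a mass Chinese product. Too expensive. Thin and useless"))
print(lexicon_sentiment("My neighbours are home and it's good to wake up at 3am in the morning."))
```

The first sentence is labelled Negative, while the sarcastic second sentence is labelled Positive because of the single word "good", which is the opposite of the intended reading.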
Machine Translation
● Given a sentence in the source language L1, convert it to the target language L2 such that the semantics (adequacy and fluency) are preserved.
Summarization
● Given a document, summarize the semantics (extract relevant information) in shorter length text.
● Document
○ Sen. Barack Obama sealed the Democratic presidential nomination last night after a grueling
and history-making campaign against Sen. Hillary Rodham Clinton that will make him the first
African American to head a major-party ticket.
● Summary
○ Barack Obama is the Democratic presidential candidate.
Question Answering
● Factoid Questions
○ Question: Who is the author of the book Wings of Fire?
○ Answer: A. P. J. Abdul Kalam
● List Questions
○ Question: What are the islands in India?
○ Answer: Andaman Island, Nicobar Island, Labyrinth Island, Barren Island
● Descriptive Questions
○ Question: What is the greenhouse effect?
○ Answer: The analogy used to describe the ability of gases in the atmosphere to absorb
heat from the earth’s surface.
Dialog System and Chatbot
Hate Speech
• Any post that targets a specific individual/group of people based on their ethnicity, religious beliefs,
geographical belonging, race, etc., with malicious intentions of disseminating hate or emboldening
violence.
• #BuildThatWall #BuildTheDamnWall I’m sorry my Lord #Jesus but people are just deaf down
here
• Related terms
Fake News
Language Technology
Why Study NLP?
• To get a job in industry
• e.g., many current job listings are CL jobs
• Google Inc.
• Amazon Inc.
• Facebook Inc.
• Flipkart Inc., etc.