Chapter 1: Introduction, Chapter 2: Language Characteristics
AC3110E
Chapter 1: Introduction
Natural language processing (NLP)
Examples of NLP Applications
NLP applications
Domains:
• Banking & Financial Services Industry
• IT and ITeS
• Retail and eCommerce
• Healthcare and Life Sciences
• Transportation & Logistics
• Government & Public Sector
• Media & Entertainment
• Manufacturing, Education, Automotive, etc.
Source: https://www.statista.com/statistics/607891/worldwide-natural-language-processing-market-revenues/
NLP Components
Technologies in NLP
• Speech processing
    • Speech synthesis
    • Speech recognition
    • Speaker recognition
    • Speaker verification
    • Speech encoding
    • etc.
• Text processing
    • Language modeling
    • Text summarization
    • Text classification
    • Text retrieval
    • Text data mining
    • Question answering
    • Report generation
    • Translation technologies
    • Dialogue systems
    • etc.
• Dictionaries
• Morphological and syntactic grammars
• Rules for semantic interpretation
• Pronunciation
• Intonation
THIS COURSE!
Ambiguity!
• Phonetic ambiguity
• Lexical ambiguity
    • Is “board” a noun or a verb?
• Syntax-level ambiguity
    • “He lifted the beetle with red cap.” Did he use a red cap to lift the beetle, or did he lift a beetle that had a red cap?
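One way to make the syntactic ambiguity above concrete is to count parse trees. The sketch below is a minimal CKY parse counter in Python, assuming a tiny hand-written grammar in Chomsky normal form; the grammar, the lexicon, and the name `count_parses` are hypothetical and exist only for this illustration. Because the grammar lets a prepositional phrase (PP) attach either to the verb phrase or to the noun phrase, the example sentence receives exactly two trees, one per reading.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase: lifted ... (using) red cap
    ("NP", "Det", "N"),
    ("NP", "Adj", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase: beetle (that has) red cap
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: each word maps to the preterminal categories it can take.
LEXICON = {
    "he": ["NP"],
    "lifted": ["V"],
    "the": ["Det"],
    "beetle": ["N"],
    "with": ["P"],
    "red": ["Adj"],
    "cap": ["N"],
}


def count_parses(tokens, start="S"):
    """Count distinct parse trees of `tokens` rooted in `start` using CKY."""
    n = len(tokens)
    # chart[i][j][X] = number of ways category X can span tokens[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(tokens):
        for category in LEXICON[word]:  # raises KeyError for out-of-lexicon words
            chart[i][i + 1][category] += 1
    for length in range(2, n + 1):           # span length
        for i in range(n - length + 1):      # span start
            j = i + length
            for k in range(i + 1, j):        # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][start]


sentence = "he lifted the beetle with red cap".split()
print(count_parses(sentence))  # 2: one tree per attachment of "with red cap"
```

Removing the rule `("NP", "NP", "PP")` leaves only the reading in which the red cap is the instrument, and the count drops to 1.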
Why is NLP Hard?
NLP Terminology
Levels of analysis in natural language processing
• Pragmatic analysis
• Discourse integration
• Semantic analysis
• Syntactic analysis
• Lexical/Morphology analysis
(Figure: worked example with the labels “lpr /ali/stuff.init” and “Output: run”)
Approaches to NLP
Objectives of the course
Chapter 2: Language Characteristics
2.1 Languages and Dialects
• Languages
    • Different languages are not mutually intelligible
    • They need to be explicitly learned
• Dialects
    • Regional variants of a language
    • Involve modifications at the lexical and grammatical levels
    • Dialects of the same language are assumed to be mutually intelligible
• Accent
    • A regional variant affecting only pronunciation
2.2 Linguistic Description and Classification
Source: https://www.theguardian.com/education/gallery/2015/jan/23/a-language-family-tree-in-pictures
Vietnamese language
Source: https://en.wikipedia.org/wiki/Austroasiatic_languages
• 54 ethnic groups
• Vietnamese (Kinh ethnic group): 86% of the population
• 5 ethnic groups with fewer than 1,000 people
Source: r/MapPorn
2.2 Linguistic Description and Classification
• ...
• Sentence structures
    • The relative ordering of subject (S), verb (V), and object (O)
    • Six possible word orders result: SOV, SVO, VSO, VOS, OVS, and OSV
    • Most common: SOV and SVO
    • Many languages are not limited to just one of these types but allow several different word orders
• Phrasal structure
    • Whether modifiers (adjectives or relative clauses) typically precede or follow the head words they modify
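The six word orders listed above are simply the 3! = 6 permutations of S, V, and O. A minimal Python sketch enumerates them with `itertools.permutations` and places a modifier before or after its head; the helper names (`all_word_orders`, `linearize`, `noun_phrase`) and the placeholder clause are hypothetical, and real grammars are far less rigid than a fixed template.

```python
from itertools import permutations

# The three constituents whose relative order defines the basic word-order type.
CONSTITUENTS = ("S", "V", "O")


def all_word_orders():
    """Return the 3! = 6 orderings of S, V and O as strings."""
    return ["".join(order) for order in permutations(CONSTITUENTS)]


def linearize(clause, order):
    """Arrange a clause given as {"S": ..., "V": ..., "O": ...} in the given order."""
    return " ".join(clause[role] for role in order)


def noun_phrase(head, modifier, head_first):
    """Put the modifier after the head (head_first=True) or before it."""
    return f"{head} {modifier}" if head_first else f"{modifier} {head}"


print(all_word_orders())  # ['SVO', 'SOV', 'VSO', 'VOS', 'OSV', 'OVS']

clause = {"S": "I", "V": "eat", "O": "cake"}
print(linearize(clause, "SVO"))  # I eat cake  (SVO, e.g. Vietnamese, French)
print(linearize(clause, "SOV"))  # I cake eat  (SOV, e.g. Japanese)

print(noun_phrase("bánh", "to", head_first=True))    # bánh to: head first (cake big)
print(noun_phrase("cake", "big", head_first=False))  # big cake: modifier first
```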
2.3 Writing Systems
• Orthography
Source: https://en.wikipedia.org/wiki/List_of_writing_systems
Source: https://wals.info/chapter/141
Source: https://www.britannica.com
Example: Vietnamese, Japanese, and French compared
• Language family and type
    • Vietnamese: Austro-Asiatic (Mon-Khmer branch); isolating language
    • Japanese: Japonic; agglutinative language
    • French: Indo-European (Romance branch); inflected language
• Word formation
    • Vietnamese: syllables are separated by white space; a word is formed from one, two, or more syllables
    • Japanese: words consist of multiple morphemes in a linear sequence; there are no spaces between words
    • French: a word can have many forms expressing grammatical categories such as tense, number, and gender
• Sentence structure
    • Vietnamese: subject–verb–object; word order determines grammatical relationships
    • Japanese: subject–object–verb; verbs are conjugated primarily for tense and voice (e.g. passive); nouns have no grammatical number or gender
    • French: subject–verb–object; verbs are conjugated to agree with the subject; nouns and adjectives carry gender and number
• Phrasal structure
    • Vietnamese: modified word, then modifier
    • Japanese: modifier, then modified word
    • French: modified word, then modifier, with many irregular cases
• Examples
    • Vietnamese:
        + tôi ăn một cái bánh to (I eat one CLASSIFIER cake big: “I eat a big cake”)
        + tôi đang ăn (I’m eating)
        + tôi đã ăn (I ate)
        + tôi sẽ ăn (I will eat)
        + đi ăn (go for eating) vs. ăn đi (let’s eat)
    • Japanese: 日本の家電製品は世界で有名です。 (Japanese electronic products are famous worldwide)
        + 日本 (Japan) の (of) 家電 (electronics) 製品 (product) は (topic marker) 世界で (worldwide) 有名 (famous) です (is)
    • French:
        + J’ai un produit naturel (I have a natural product)
        + Cette peinture a beaucoup de couleurs naturelles (This painting has a lot of natural colors)
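The word-formation comparison above has a direct consequence for tokenization: splitting on white space recovers Vietnamese syllables (not necessarily words), while an unspaced Japanese sentence stays in one piece. A minimal sketch of greedy longest-match ("maximum matching") segmentation follows, assuming tiny hypothetical dictionaries: the Japanese one is built from the glosses in the example, and the Vietnamese word "sinh viên" (student) is an added illustration of a two-syllable word. Practical segmenters rely on large lexicons or statistical and neural models rather than this greedy rule.

```python
import unicodedata

# Hypothetical toy dictionaries for illustration only.
JA_DICT = {"日本", "の", "家電", "製品", "は", "世界で", "有名", "です", "。"}
VI_DICT = {"tôi", "là", "sinh viên"}  # "sinh viên" (student): one word, two syllables


def max_match(units, dictionary, joiner, max_len=4):
    """Greedy longest-match segmentation over a sequence of units.

    `units` are characters (Japanese) or syllables (Vietnamese). At each position the
    longest run of up to `max_len` units found in `dictionary` is taken; if none is
    found, a single unit is emitted on its own.
    """
    words, i = [], 0
    while i < len(units):
        for length in range(min(max_len, len(units) - i), 0, -1):
            candidate = joiner.join(units[i:i + length])
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def segment_japanese(text):
    # No spaces between words, so the units are individual characters.
    return max_match(list(text), JA_DICT, joiner="")


def segment_vietnamese(text):
    # Syllables are separated by white space; NFC makes accented syllables comparable.
    syllables = unicodedata.normalize("NFC", text).split()
    nfc_dict = {unicodedata.normalize("NFC", w) for w in VI_DICT}
    return max_match(syllables, nfc_dict, joiner=" ")


print("日本の家電製品は世界で有名です。".split())  # a single piece: no white space to split on
print(segment_japanese("日本の家電製品は世界で有名です。"))
# ['日本', 'の', '家電', '製品', 'は', '世界で', '有名', 'です', '。']
print(segment_vietnamese("tôi là sinh viên"))  # ['tôi', 'là', 'sinh viên']
```

The same `max_match` routine serves both languages because only the unit (character vs. syllable) and the joiner change.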
• End of chapter 2