Computational Linguistics Information Engineering Software Engineering Artificial Intelligence

The document discusses natural language processing (NLP) and provides definitions and key aspects. NLP aims to allow computers to understand, interpret, and generate human language in a manner similar to humans. It involves techniques from computer science, artificial intelligence, and linguistics. NLP has applications in areas like information retrieval, question answering, and machine translation.

Uploaded by Saikat Mondal

NLP

Computational Linguistics
Information engineering
Software engineering
Artificial intelligence
•Language is one of humanity's oldest creations for advancing science, business, and research. With the advent of computers in all walks of modern life, there is a need for technology that can interpret natural language directly. This requires the machine to be intelligent.

•Natural language processing, technically known as computational linguistics, aims at the interpretation of natural language text by machines at a level comparable with humans. Computers are nowadays used for the acquisition, transmission, monitoring, storage, analysis, and transformation of information. Hence, endowing them with the ability to understand and generate information expressed in natural language is a major field of research in Artificial Intelligence.
•Natural Language Processing (NLP) is the computerized approach to analyzing text that is based on both a set of theories and a set of technologies. Being a very active area of research and development, there is no single agreed-upon definition that would satisfy everyone, but there are some aspects that would be part of any knowledgeable person's definition.
•Definition: Natural Language Processing is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications.
•Several elements of this definition can be further detailed. 'Range of computational techniques' is necessarily imprecise because there are multiple methods or techniques from which to choose to accomplish a particular type of language analysis. 'Naturally occurring texts' can be of any language, mode, genre, etc.; the texts can be oral or written. The only requirement is that they be in a language used by humans to communicate with one another.
History of NLP
• The history of NLP generally starts in the 1950s, although work can be found from
earlier periods. In 1950, Alan Turing published an article titled "Computing Machinery
and Intelligence" which proposed what is now called the Turing test as a criterion of
intelligence.
• During the 1970s many programmers began to write 'conceptual ontologies', which
structured real-world information into computer-understandable data. Examples are
MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin
(Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units
(Lehnert 1981). During this time, many chatterbots were written
including PARRY, Racter, and Jabberwacky.
• Up to the 1980s, most NLP systems were based on complex sets of hand-written rules.
Starting in the late 1980s, however, there was a revolution in NLP with the introduction
of machine learning algorithms for language processing. This was due to both the
steady increase in computational power (see Moore's Law) and the gradual lessening of
the dominance of Chomskyan theories of linguistics (e.g. transformational grammar)
• Why does NEP 2020 focus so much on interdisciplinarity and language?
• Language (engineering)
• Speech to text
• Speech recognition
• NLP
• AI
• Computer science
• Electronics
• Physics
• Mathematics
• Philosophy (Ontology – Epistemology)
• Inability to express knowledge in language
Artificial intelligence
• "The study and design of intelligent agents", in which an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1955, defines it as "the science and engineering of making intelligent machines".

•The central problems (or goals) of AI research include:
• Reasoning
• Knowledge
• Planning
• Learning
• Natural language processing
• Perception
• The ability to move and manipulate objects


Other parameters and applications
• Logic
• Emotion
• Cognition
• Knowledge representation
• Pattern matching
• Psychology
• Philosophy
• Epistemology
• Ontology
• Economics
• Planning
• Problem solving
• Unsupervised learning
• Understanding
• Mathematical optimization
• Constraint satisfaction
• Gaming
• Robotics
• Genetic / biotech
• Heuristic search algorithms, etc.
• Disaster management
• Security
Logic (तर्क, "reasoning"): the science used in reasoning
• Propositional logic
• Predicate logic – Prolog, Lisp
• Fuzzy logic – perception


Reasoning: decision making using logical thinking
• Logic + reasoning = rule development and fact production, leading to knowledge base creation
• Monotonic reasoning
• Non-monotonic reasoning (conclusions may be retracted on contradiction)


Knowledge Representation
• Building a knowledge base
• Semantic networks
• Frames – procedural and declarative
• Conceptual dependency
• Scripts
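The semantic-network idea above can be sketched as a tiny Python structure. The nodes, edges, and `lookup` helper here are illustrative assumptions, not a standard API:

```python
# A minimal semantic-network sketch: nodes linked by "is-a" edges plus
# property edges, with properties inherited up the is-a hierarchy.
network = {
    "canary": {"is-a": "bird", "color": "yellow"},
    "bird":   {"is-a": "animal", "can-fly": True},
    "animal": {"breathes": True},
}

def lookup(node, prop):
    """Walk up the is-a hierarchy until the property is found."""
    while node is not None:
        facts = network.get(node, {})
        if prop in facts:
            return facts[prop]
        node = facts.get("is-a")
    return None
```

Asking `lookup("canary", "breathes")` returns `True` by inheritance from "animal", which is exactly the kind of inference semantic networks were designed for.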
Philosophy : knowledge : truth : How? : What?
• Epistemology : How?
• The study of the nature of knowledge
• Ontology : metaphysics : information science : What?
• The essence of things
• Stemming
• Synonyms
• Ontology is the philosophical field revolving around (the study
of) the nature of reality (all that is or exists), and the different
entities and categories within reality. 
• Epistemology is the philosophical field revolving around (the
study of) knowledge and how to reach it. One might say that it
includes the ontology of knowledge. 
• Examples of theories within the field of ontology are: ontological
monism, pluralism, idealism, materialism, dualism, etc.
• Examples of theories within the field of epistemology are:
realism, relativism, rationalism, irrationalism, etc.
Cognition: knowledge processing through sense organs, environment, and experiences.

• Can we implement cognition in machines?


Heuristics
• AI problem-solving algorithms

Blind search:
• Brute force search
• DFS
• BFS
• A star
• Best first search
• Greedy algorithm
• AND-OR search

Heuristic search:
• Generate and test
• Hill climbing
• Feedback
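One of the heuristic searches listed above, hill climbing, can be sketched in a few lines. The objective function and step size in the usage example are illustrative assumptions:

```python
# Greedy local search (hill climbing): repeatedly move to a better
# neighbor; stop at a local maximum where no neighbor improves.
def hill_climb(f, start, step=1.0, max_iters=1000):
    x = start
    for _ in range(max_iters):
        neighbors = [x - step, x + step]
        best = max(neighbors, key=f)
        if f(best) <= f(x):      # no improving neighbor: local maximum
            return x
        x = best
    return x

# Example: maximize f(x) = -(x - 3)^2, whose peak is at x = 3.
peak = hill_climb(lambda x: -(x - 3) ** 2, 0.0)
```

Hill climbing is fast but, as the "local minima and maxima" discussion later in these notes points out, it can get stuck on a local peak that is not the global one.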

Natural language processing
• NLP, technically known as CL, is an applied area of AI and CS used for natural language manipulation by computers in the same way as done by humans.
• Natural language generation: how is language generated, and can a machine generate language using big data and information?
• Natural language understanding
• Natural language learning


Phonology – deals with patterns present in sound and speech, treating sound as a physical entity (word-level consonants and vowels).

Lexical – the study at the level of words with respect to their lexical meaning and part of speech.

Morphology – deals with the structure of words and the systematic relations between them: the internal structure of a word and its minimal meaningful units (morphemes).

Syntax – deals with the structure of sentences.

Semantics – deals with the literal meaning of words, phrases, and sentences.

Pragmatics – the study of language in its contexts of use (context-based meaning).

Pragmatics and Discourse Analysis both involve the study of language in its contexts of use: Pragmatics focuses on the effects of context on meaning, and Discourse Analysis studies written and spoken language in relation to its social context.
Pre - processing in NLP
• Splitting

• Chunking

• Structurization

• Organizing
Segmentation
• Speech segmentation,

• Text segmentation,

• Topic segmentation,

• Word segmentation
Tokenization
• Words as tokens

• Sentences as tokens

• POS tokens

• Bag of words

• Domains as tokens, etc.

• Others
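Word- and sentence-level tokenization can be sketched in pure Python. The regexes here are illustrative; real projects typically use NLTK or spaCy tokenizers:

```python
import re

def word_tokens(text):
    """Split text into lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def sentence_tokens(text):
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
```

For example, `word_tokens("NLP is fun!")` yields `["nlp", "is", "fun"]`; abbreviations like "Dr." would break the naive sentence splitter, which is why library tokenizers are trained or rule-rich.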
Stop words
• from nltk.corpus import stopwords
sw = stopwords.words("english")

• ["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours",
"yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it",
"its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom",
"this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have",
"has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or",
"because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between",
"into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down",
"in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there",
"when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other",
"some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t",
"can", "will", "just", "don", "should", "now"]
NLP TOOLKITS
• BORIS
• CMU Sphinx
• Corpora
• GATE-8.1
• Speaktoit
• LOLITA
• Snowball
• Festival speech synthesis system
• cTAKES
• Maluuba
• Regulus Grammar Compiler
• Never-Ending Language Learning (NELL)
• ScalaNLP
• JGibbLDA-v1.0
• ETAP-3
• Apache Lucene Core
• Stanford NLP
• ChatGPT
• NLP TOOLKITS w.r.t. programming language
• DELPH-IN (CH, LISP)
• Deeplearning4j (Scala and also Java)
• MALLET (Java)
• Natural Language Toolkit (Python)
• LinguaStream (Java)
• Distinguo (C++)
• Modular Audio Recognition Framework (Java)
• MontyLingua (Python and also Java)
• Gensim (Python-based)
• TextBlob (Python)
Stemming
Common suffixes stripped by a stemmer:
-ing, -ed, -ive, -ble, -ous, -ly, -ions, -ize, -ate, -ion, -s, -e
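The suffix list above can be turned into a naive suffix-stripping stemmer. This is an illustration of the idea only; a real system would use a proper algorithm such as NLTK's PorterStemmer:

```python
# Suffixes from the list above, ordered longest-first so that e.g.
# "ions" is tried before "ion" and "s".
SUFFIXES = ["ions", "ing", "ive", "ble", "ous", "ate", "ion", "ize",
            "ly", "ed", "s", "e"]

def naive_stem(word):
    """Strip the first matching suffix, keeping at least a 3-letter stem."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word
```

So `naive_stem("playing")` gives `"play"`; but `naive_stem("nations")` gives `"nat"`, showing why crude suffix stripping over-stems and motivated the rule conditions in the Porter algorithm.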
Lemmatization (lemma)
• WordNet, NLTK
• Both stemming and lemmatization reduce words to a root form, but lemmatization can map "better" and "best" to GOOD because it is context- and dictionary-driven.
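The dictionary-driven behavior can be sketched with a toy exception table plus a fallback rule. The tiny table is an assumption for illustration; real lemmatizers such as NLTK's WordNetLemmatizer consult the full WordNet lexicon:

```python
# Irregular forms are looked up directly; regular forms fall back to
# crude suffix rules. A real lemmatizer also needs the word's POS.
IRREGULAR = {"better": "good", "best": "good", "ran": "run", "mice": "mouse"}

def lemmatize(word):
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies"):
        return word[:-3] + "y"      # e.g. "studies" -> "study"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]            # crude plural stripping
    return word
```

Unlike the stemmer above, this returns real dictionary words ("good", "study") rather than truncated stems.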
POS Taggers
Part of Speech (POS)
Parsing
• Compiler design vs NLP (syntax and operators vs NP and VP)
• Deep parsing vs shallow parsing (meaning, relationships, POS)
• Chunking
• Syntax tree
• Grammatical rules
• Semantic tree?
Lexical analysis in NLP deals with the study at the level of words with respect to their lexical meaning and part of speech. ... A lexeme is a basic unit of lexical meaning: an abstract unit of morphological analysis that represents the set of forms or "senses" taken by a single morpheme.
Optical character recognition
Handwriting recognizer
Michael Nielsen, Neural Networks and Deep Learning

Perceptron decision rule: output 1 if w · x + b = Σᵢ wᵢxᵢ + b > 0, else output 0 (each input xᵢ lies between 0 and 1).

Local minima and maxima, cost function
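The perceptron decision rule above can be written directly as code (a minimal sketch, not tied to any particular library; the AND-gate weights in the example are an illustrative choice):

```python
# Perceptron: output 1 if the weighted sum w·x + b exceeds 0, else 0.
def perceptron(weights, bias, x):
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

# With weights [1, 1] and bias -1.5 this unit computes logical AND:
# only the input (1, 1) pushes the sum above zero.
```

Training (adjusting the weights and bias to reduce a cost function) is where the local-minima issue mentioned above arises.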


Sentiment analysis / opinion mining
• Emotions?
• Steps for generalized emotional analysis
• Knowledge base vs neural networks
• Types of SA
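The knowledge-base approach can be sketched as a lexicon-based scorer. The tiny word lists are illustrative assumptions; real systems use large sentiment lexicons or trained neural models:

```python
# Lexicon-based sentiment: count positive hits minus negative hits.
POSITIVE = {"good", "great", "happy", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "sad", "awful", "hate"}

def sentiment(tokens):
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Such a scorer ignores negation ("not good") and context, which is exactly the gap that neural-network approaches try to close.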
Speech recognition
• Automated speech recognition
• STT – TTS
• 1952, at Bell Labs ……. called a voice recognizer
• Speaker recognition
• 1960s: Leonard Baum developed the mathematics of Markov chains
• James Baker and Janet M. Baker began using the Hidden Markov Model (HMM) for speech recognition
• 1980s: n-grams (unigram, bigram, trigram)
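Extracting the unigrams, bigrams, and trigrams mentioned above can be sketched as:

```python
# Slide a window of size n over the token sequence.
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `ngrams(["to", "be", "or", "not"], 2)` yields `[("to", "be"), ("be", "or"), ("or", "not")]`; counting such n-grams over a corpus is the basis of the statistical language models used in 1980s-era recognizers.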
Speech recognition
• Cataloging
• Database
• Voice base
• Digital to analog
• Pattern matching algorithms
• Phonemes / phonetics
• IPA
• Praat
• Heuristic models
• HMM
• ANN
• Fuzzy logic
• AI models
• Knowledge base
• Rule base
• Speech analysis:
• spectral analysis (spectrograms)
• pitch analysis
• formant analysis
• intensity analysis
• jitter, shimmer, voice breaks
• cochleagram
• excitation pattern
Semantic role labeling
• Shallow semantic parsing
• Slot filling
• Used to classify text into meaningful slots and labels according to their role in the text.
• FrameNet – manual representation (1970s)


Named Entity Recognition
• Applied area of information retrieval

• Technically implements dynamic SRL
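A deliberately naive NER heuristic, treating runs of capitalized words as candidate entities, can be sketched as follows. Real NER uses gazetteers, CRFs, or neural sequence models; this only conveys the idea:

```python
import re

# Candidate entities = maximal runs of capitalized words.
# Note: sentence-initial words are not filtered out, so ordinary words
# that start a sentence will also be flagged.
def candidate_entities(text):
    return re.findall(r"(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*", text)
```

On "Alan Turing studied at Princeton University in America." this yields `["Alan Turing", "Princeton University", "America"]`, multi-word spans included.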


Text Summarization
• Condensing a large discourse into a short text or combination of texts without changing its meaning; technically, cutting the original text down by more than 50%.
• Essays, passages, news, information, books, research papers, etc.
• Can text mining be used as preprocessing?
• Information extraction?
• Internet information.
• Images, audio, video.
• Term filtering and word-frequency analysis are carried out (low-frequency terms are removed); sentences are weighted by the significant terms they contain; sentence segmentation and extraction are performed.
• TF-IDF
• Lexical and morphological analysis
• Corpus
• Anaphora (reference back) and cataphora (reference forward)
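The TF-IDF weighting mentioned above can be sketched as follows, using the common count/length tf and log(N/df) idf conventions; libraries such as scikit-learn apply slightly different smoothing:

```python
import math

def tf_idf(term, doc, corpus):
    """tf = count in doc / doc length; idf = log(N / #docs containing term)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf
```

A term like "sat" that appears in only one document gets a higher weight than "the", which appears in most documents; sentence weighting for extractive summarization sums these scores over each sentence's terms.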
Sparck Jones: Interpretation, Transformation, Generation (1990–2000)
• interpretation of the source text to obtain a text representation,
• transformation of the text representation into a summary representation, and
• finally, generation of the summary text from the summary representation.
Sparck Jones distinguishes three classes of context:
• Input factors. The features of the text to be summarized crucially determine the way a summary can be obtained. These fall into three groups: text form (e.g. document structure); subject type (ordinary, specialized or restricted); and unit (single or multiple documents as input).
• Purpose factors. These are the most important factors. They fall under three categories: situation refers to the context within which the summary is to be used; audience (i.e. summary readers); and use (what is the summary for?).
• Output factors. In this class we can group: material (i.e. content), format, and style.
Mani and Maybury (1999 onwards): surface to discourse level

• Surface level
1. Thematic features: important words, keywords, significant words, TF-IDF over a corpus, frequency, removal of unwanted words.
2. Location of the above words, i.e. whether they appear in the title, lead, or cue-word category (both relevant and irrelevant).
3. Background: start, user, title, etc.
• Entity level – representation, modelling, relationships
1. Similarity – same stem, same context, same meaning
2. Proximity – closeness
3. Co-occurrence – words or phrases can be related if they occur in common texts
4. Thesaural relationships among words
5. Coreference – the idea behind coreference is that referring expressions can be linked
6. Logical relations such as agreement, contradiction, entailment, and consistency
7. Syntactic relations, based on parse trees
8. Meaning-representation-based relations
• Discourse level
context extraction
emotion?
pragmatics
Word sense disambiguation
• Warren Weaver, 1950
• A famous example is to determine the sense of pen in the following passage (Bar-Hillel, 1960):
“Little John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.”
• WordNet lists five senses for the word pen:
1. pen — a writing implement with a point from which ink flows.
2. pen — an enclosure for confining livestock.
3. playpen, pen — a portable enclosure in which babies may be left to play.
4. penitentiary, pen — a correctional institution for those convicted of major crimes.
5. pen — female swan.
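A classic way to attack such examples is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the context. A sketch using the (abridged) WordNet glosses listed above; the sense labels are illustrative:

```python
# Simplified Lesk: score each sense by word overlap between its gloss
# and the context, and return the best-scoring sense.
SENSES = {
    "writing implement": "a writing implement with a point from which ink flows",
    "livestock enclosure": "an enclosure for confining livestock",
    "playpen": "a portable enclosure in which babies may be left to play",
    "penitentiary": "a correctional institution for those convicted of major crimes",
    "female swan": "female swan",
}

def simplified_lesk(context):
    ctx = set(context.lower().split())
    def overlap(gloss):
        return len(ctx & set(gloss.split()))
    return max(SENSES, key=lambda s: overlap(SENSES[s]))
```

Note that on Bar-Hillel's actual passage the overlaps are nearly all zero, which is precisely his point: resolving "pen" there needs world knowledge (a box fits in a playpen, not in a fountain pen), not just gloss matching.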
Machine translation
