Computational Linguistics Information Engineering Software Engineering Artificial Intelligence
Computational Linguistics
Information engineering
Software engineering
Artificial intelligence
• Language is one of mankind's oldest creations, and it underpins science,
business and research. Now that computers are present in every walk of modern
life, there is a need for technology that lets computers interpret natural
language directly. This requires the machine to be intelligent.
• Reasoning
• Knowledge
• Planning
• Learning
• Perception
• Monotonic reasoning
Lexical
Pragmatics – The study of language in its contexts of use (context-based meaning).
Morphology – The study of the internal structure of words and the systematic relations
between them; the morpheme is the minimal meaningful unit of a word.
Semantics – The study of the literal meaning of words, phrases and sentences.
• Chunking
• Structurization
• Organizing
Segmentation
• Speech segmentation,
• Text segmentation,
• Topic segmentation,
• Word segmentation
Tokenization
• Words as tokens
• Sentences as tokens
• POS tokens
• Bag of words
• Others
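The token types listed above can be illustrated with a minimal sketch that uses only the Python standard library. The regular expressions and the function names (`sentence_tokens`, `word_tokens`, `bag_of_words`) are illustrative assumptions, not the behaviour of any particular toolkit; real tokenizers treat abbreviations, contractions and punctuation far more carefully.

```python
import re
from collections import Counter

def sentence_tokens(text):
    """Split text into sentences at ., ! or ? followed by whitespace (naive)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def word_tokens(text):
    """Lowercase the text and return runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(text):
    """Count how often each word token occurs, ignoring word order."""
    return Counter(word_tokens(text))

text = "The cat sat on the mat. The dog barked!"
sentences = sentence_tokens(text)   # sentences as tokens
words = word_tokens(text)           # words as tokens
bag = bag_of_words(text)            # bag of words
```

A bag of words discards word order on purpose, so it suits frequency-based features but not tasks that depend on syntax.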
Stop words
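import nltk
nltk.download("stopwords")  # one-time download of the stop-word corpus
from nltk.corpus import stopwords
sw = stopwords.words("english")  # list of English stop words, as shown below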
• ["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours",
"yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it",
"its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom",
"this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have",
"has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or",
"because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between",
"into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down",
"in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there",
"when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other",
"some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t",
"can", "will", "just", "don", "should", "now"]
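A stop-word list like the one above is typically used as a filter on a token stream. The following is a minimal sketch, assuming a small hand-picked subset of that list so that it runs without any corpus download; `STOP_WORDS` and `remove_stop_words` are illustrative names.

```python
# A small illustrative subset of the English stop-word list shown above.
STOP_WORDS = {"i", "me", "my", "the", "a", "an", "and", "is", "are", "on", "in", "of", "to"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words (comparison is case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["The", "cat", "is", "on", "the", "mat"]
content_words = remove_stop_words(tokens)
```

Storing the stop words in a set keeps each membership test constant time, which matters when filtering large corpora.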
NLP TOOLKITS
• BORIS
• CMU Sphinx
• Corpora
• GATE-8.1
• Speaktoit
• LOLITA
• Snowball
• Festival speech synthesis system
• cTAKES
• Maluuba
• Regulus Grammar Compiler
• Never-Ending Language Learning (NELL)
• ScalaNLP
• JGibbLDA v1.0
• ETAP-3
• Apache Lucene Core
• Stanford NLP
• ChatGPT
NLP TOOLKITS w.r.t. Programming Language
• DELPH-IN (CH, LISP)
• Deeplearning4j (Scala and Java)
• Mallet (Java)
• Natural Language Toolkit, NLTK (Python)
• LinguaStream (Java)
• Distinguo (C++)
• Modular Audio Recognition Framework (Java)
• MontyLingua (Python and Java)
• Gensim (Python)
• TextBlob (Python)
Stemming
• Suffixes stripped: -ing, -ed, -ive, -ble, -ous, -ly, -ions, -ize, -ate, -ion, -s
e
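Suffix stripping of the kind listed above can be sketched as follows. This is a deliberately naive illustration, not the Porter or Snowball algorithm: the function name `naive_stem`, the longest-suffix-first ordering and the minimum stem length of three characters are all assumptions made for the example.

```python
# Suffixes from the list above, ordered longest first so that
# "ions" is tried before "ion" and "s".
SUFFIXES = sorted(
    ["ing", "ed", "ive", "ble", "ous", "ly", "ions", "ize", "ate", "ion", "s"],
    key=len,
    reverse=True,
)

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["playing", "played", "plays", "quickly", "actions"]]
```

Because the rules look only at the end of the string, a stemmer of this kind can produce stems that are not dictionary words; that is the usual trade-off for its speed.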
Lemmatization (lemma)
• WordNet
• NLTK
Stemming and lemmatization both reduce a word to its root form, but lemmatization
• can map "better" and "best" to "good", because it is context driven
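The difference from stemming can be sketched with a dictionary lookup keyed on the word and its part of speech. The table below is a hypothetical, hand-written stand-in for a lexical database such as WordNet, and `lemmatize` is an illustrative name, not a toolkit function.

```python
# Hypothetical hand-written lemma table keyed by (word, part of speech).
# A real lemmatizer would consult a lexical database such as WordNet instead.
LEMMA_TABLE = {
    ("better", "adj"): "good",
    ("best", "adj"): "good",
    ("better", "adv"): "well",
    ("ran", "verb"): "run",
    ("mice", "noun"): "mouse",
}

def lemmatize(word, pos):
    """Look up the dictionary form for a word and part of speech; fall back to the word."""
    word = word.lower()
    return LEMMA_TABLE.get((word, pos), word)

lemmas = [
    lemmatize("better", "adj"),
    lemmatize("best", "adj"),
    lemmatize("better", "adv"),
    lemmatize("cat", "noun"),
]
```

The part-of-speech argument is what makes the lookup context driven: the same surface form "better" maps to "good" as an adjective and to "well" as an adverb, which no suffix rule could produce.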
POS Taggers
Part of Speech (POS)
Parsing
• Compiler design vs. NLP (syntax, operators vs. NP, VP)
• Chunking
• Syntax tree
• Grammatical rules
• Semantic tree ?
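Chunking, as listed above, can be illustrated with a minimal noun-phrase chunker over tokens that are already tagged. The tag names (`DET`, `ADJ`, `NOUN`, `VERB`), the input sentence and the function name `chunk_noun_phrases` are assumptions for this sketch; a full parser would build a syntax tree from grammatical rules rather than grouping flat runs of tags.

```python
def chunk_noun_phrases(tagged_tokens):
    """Group maximal runs of DET/ADJ/NOUN tags that contain a noun into NP chunks."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in {"DET", "ADJ", "NOUN"}:
            current.append((word, tag))
        else:
            # Any other tag closes the current run.
            if any(t == "NOUN" for _, t in current):
                chunks.append(" ".join(w for w, _ in current))
            current = []
    # Flush the final run, if it contains a noun.
    if any(t == "NOUN" for _, t in current):
        chunks.append(" ".join(w for w, _ in current))
    return chunks

tagged = [
    ("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
    ("chased", "VERB"),
    ("a", "DET"), ("rabbit", "NOUN"),
]
noun_phrases = chunk_noun_phrases(tagged)
```

Here the verb separates the two noun phrases, mirroring the NP VP split mentioned above.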
Lexical analysis in NLP studies text at the level of words, with respect to
their lexical meaning and part of speech. ... A lexeme is a basic unit of lexical meaning: an
abstract unit of morphological analysis that represents the set of forms or "senses" taken by a single
morpheme.
Optical character recognition
Handwriting recognizer
Michael Nielsen, "Neural Networks and Deep Learning"
f(x) = 1 if w · x + b > 0, otherwise 0, where w · x = Σ_i w_i x_i and each input x_i is between 0 and 1
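The threshold rule above, in which the unit fires when the weighted sum of its inputs plus the bias is positive, can be sketched in a few lines. The weights and bias are hypothetical values chosen so that the unit behaves like a NAND gate on binary inputs; `perceptron` is an illustrative name.

```python
def perceptron(x, w, b):
    """Return 1 if the weighted sum w . x + b is positive, otherwise 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum + b > 0 else 0

# Hypothetical weights and bias that make the perceptron act as a NAND gate.
w, b = [-2, -2], 3
outputs = [perceptron([x1, x2], w, b) for x1 in (0, 1) for x2 in (0, 1)]
```

Only the input (1, 1) drives the weighted sum below zero (-2 - 2 + 3 = -1), so it is the only case in which the unit outputs 0.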
Emotions ?
Steps for Generalized Emotional analysis
Knowledge base VS Neural networks
Types of sentiment analysis (SA)
Speech recognition
• Automated speech recognition
• Speech-to-text (STT) and text-to-speech (TTS)
• Slot filling
• Used to classify text into meaningful slots and labels according to
their role in the text.
• Surface level
1. Thematic features: important words, keywords, significant words, TF-IDF over the corpus,
frequency, removal of unwanted words.
2. Location of the above words, i.e. whether they appear in the title, the lead or the cue-word
category, both relevant and irrelevant.
3. Background: start, user, title, etc.
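The TF-IDF weighting named among the thematic features can be sketched as follows. This assumes one common variant (term frequency as count divided by document length, inverse document frequency as the natural log of N / df); other variants add smoothing or normalisation, and `tf_idf` is an illustrative name.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = [["nlp", "is", "fun"], ["nlp", "is", "hard"], ["parsing", "is", "hard"]]
scores = tf_idf(docs)
```

A word such as "is" that appears in every document gets a score of zero, which is why TF-IDF highlights thematic words rather than frequent but uninformative ones.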
• Entity level – representation, modelling, relationship
1. Similarity – same stem, same context, same meaning
2. Proximity – closeness
3. Co-occurrence – words or phrases can be related if they occur in common texts
4. Thesaural relationships among words
5. Coreference – referring expressions that denote the same entity
can be linked
6. Logical relations such as agreement, contradiction, entailment, and
consistency
7. Syntactic relations are based on parse trees
8. Meaning representation-based relations
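Of the entity-level relations above, co-occurrence is the simplest to compute. The sketch below counts, for every unordered pair of words, how many documents contain both; the toy documents and the name `cooccurrence_counts` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the number of documents containing both words."""
    pair_counts = Counter()
    for doc in documents:
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        for pair in combinations(sorted(set(doc)), 2):
            pair_counts[pair] += 1
    return pair_counts

docs = [
    ["stock", "market", "crash"],
    ["stock", "market", "rally"],
    ["market", "garden"],
]
pairs = cooccurrence_counts(docs)
```

Pairs with high counts, such as ("market", "stock") here, are candidates for being related, although raw counts favour frequent words and are usually normalised in practice.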
• Discourse level
context extraction
emotion ?
pragmatics
Word sense disambiguation
Warren Weaver 1950