Natural Language Processing
Presented by
M.Mohana
What is NLP?
• NLP stands for Natural Language Processing
• Branch of artificial intelligence (AI) that focuses on the interaction
between computers and human language
• Helps computers to understand human language and also allows
machines to communicate with us.
• Algorithms and models that enable computers to understand,
interpret, and generate human language in a meaningful way.
• For instance, Google’s keyboard suggests auto-corrections and predicts the next word
while writing emails.
• Translation systems use language modeling to work efficiently
with multiple languages.
https://blog.dataiku.com/nlp-metamorphosis
Components of NLP
Natural Language Understanding (NLU)
• NLU focuses on interpreting and extracting meaning from human language input.
• It involves techniques such as text parsing, entity recognition, sentiment analysis, and intent detection.
• NLU systems aim to comprehend the content of text or speech input to extract relevant information and understand the user's intentions or queries.
• Examples of NLU applications include chatbots that understand user queries, sentiment analysis tools that analyze emotions in text, and voice assistants that interpret spoken commands.
Natural Language Generation (NLG)
• NLG, on the other hand, deals with creating human-like text or speech output based on structured data or input from NLU systems.
• NLG systems generate coherent and contextually relevant text or speech by combining linguistic rules, templates, and sometimes machine learning models.
• NLG applications include text summarization, language translation, chatbot responses, and content generation for news articles or reports.
• Stemming – Normalizes words into their base or root form (celebrate ← celebrates, celebrated, celebrating); the resulting stem is sometimes not a meaningful word.
• Lemmatization – Similar to stemming, but the root word (the lemma) is a meaningful word; used to group different inflected forms of a word.
• Identifying Stop Words – Words such as “is”, “and”, “the”, “a” (stop words) may be filtered out before doing any statistical analysis.
• Dependency Parsing – Finds how all the words in a sentence are related to each other.
• POS tags – Parts of speech; indicate how a word functions, in meaning as well as grammatically, within the sentence.
• Named Entity Recognition – Detects named entities such as person names, movie names, organization names, or locations.
• Chunking – Collects individual pieces of information and groups them into bigger pieces (phrases) of sentences.
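To make these terms concrete, here is a minimal sketch using NLTK (an assumed library choice; the example sentence and resource downloads are illustrative) that contrasts stop-word removal, stemming, and lemmatization:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time resource downloads; exact resource names can vary by NLTK version.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The children were celebrating and celebrated again after the celebrations."
tokens = word_tokenize(text.lower())

# Remove stop words and punctuation before any statistical analysis.
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print([stemmer.stem(t) for t in filtered])              # stems, e.g. 'celebr' (not always a real word)
print([lemmatizer.lemmatize(t, pos="v") for t in filtered])  # lemmas, e.g. 'celebrate'
```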
Phases of NLP
• Discourse Integration – The meaning of a sentence depends upon the sentences that precede it and also invokes the meaning of the sentences that follow it.
NLP Pre-Requisites: basic ML algorithms, basic deep learning concepts, Python for NLP, and mathematics.
Topics to cover:
• Text processing techniques
• Word Embedding
• Deep Learning Networks for NLP (CNN, LSTM, GRUs, Encoder and Decoder)
• Attention mechanism, Transfer learning in NLP
• Transformers (BERT, GPT, ALBERT and so on)
• Fine-tuning NLP tasks
• Large Language Models (LLMs)
https://youtu.be/7vHquWmUriE?si=JkiXw-b6i371hyQX
Research Papers
• BERT – https://arxiv.org/abs/1810.04805
• GPT-1 – https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf
• GPT-2 – https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
• https://www.kaggle.com/discussions/general/236973
• RLHF – https://arxiv.org/abs/2203.02155
• Llama 2 – https://arxiv.org/abs/2307.09288
• PaLM 2 – https://blog.google/technology/ai/google-palm-2-ai-large-language-model/
• Mistral: Mixture of Experts – https://mistral.ai/news/mixtral-of-experts/
• GPT-4 – https://openai.com/research/gpt-4
• Gemini AI – https://blog.google/technology/ai/google-gemini-ai
Projects we should try for a better understanding of NLP
Sentiment Analysis
Question Answering System
Named Entity Recognition
Fake News Detection
Topic Modeling
Text Similarity
Text summarization and machine translation
Next word Prediction
LLM applications using RAG
Fine-tune a model for a specific NLP task
Constructing your own LLM, inspired by models like Llama 2
Common preprocessing steps applied to the input text:
• Stemming – Reducing words to their base or root form.
• Lemmatization – Converting words to their canonical form based on their part of speech.
• Numeric Token Removal – Eliminating numerical tokens.
• Whitespace Removal – Eliminating extra spaces, tabs, or newline characters.
• Spell Checking and Correction – Detecting and correcting spelling errors.
• Text Normalization – Standardizing text by converting abbreviations or variations to their full forms.
• Entity Recognition and Masking – Identifying and masking named entities.
• Removing HTML Tags and Special Characters – Eliminating HTML tags and special characters.
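A minimal illustration of a few of these cleaning steps (HTML tag removal, special-character and numeric removal, whitespace normalization) using only Python's re module; the regexes and example string are illustrative assumptions, not a prescribed pipeline:

```python
import re

def clean_text(raw: str) -> str:
    """Minimal cleaning sketch: HTML tags, special characters, digits, whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)        # strip HTML tags
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # drop digits and special characters
    text = re.sub(r"\s+", " ", text).strip()   # collapse spaces, tabs, newlines
    return text.lower()

print(clean_text("<p>Order #123 arrived!!   Great &amp; fast.</p>"))
# -> "order arrived great amp fast"
```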
• Part-of-speech (POS) tagging – Assigns a part-of-speech tag to each word in text; used for named entity recognition and text classification.
• N-grams – Sequences of contiguous words or characters, capturing local word dependencies; capture context and word order information, useful in language modeling, machine translation, and text generation.
• Named Entity Recognition (NER) – Identifies and classifies named entities (e.g., person names, organizations, locations) in text; used for information extraction, entity linking, and improving search engine results.
• Dependency Parsing – Analyzes the grammatical structure of a sentence by identifying relationships between words; captures syntactic dependencies, useful for machine translation, question answering, and information retrieval.
• Syntax Tree-Based Features – Capture syntactic and semantic structures and relationships in sentences; include subtree patterns, syntactic paths, and tree kernels for parsing and analysis.
https://medium.com/@eskandar.sahel/exploring-feature-extraction-techniques-for-natural-language-processing-46052ee6514
Word Embedding or Word Vector
• Represent words as dense vectors in a continuous vector space, where words with similar meanings are closer to each other in the space.
• Numeric representations of words in a lower-dimensional space.
• Try to capture semantic and syntactic information.
• Examples: Word2Vec, GloVe (Global Vectors for Word Representation), and fastText.
• A method of extracting features from text so that text data can be fed into an ML model.
Need for Word Embedding
• To reduce dimensionality
• To use a word to predict the words around it
• Inter-word semantics must be captured
https://www.geeksforgeeks.org/word-embeddings-in-nlp/
Types of Word Vector Embedding
• Frequency-based Embedding – TF-IDF vector, co-occurrence matrix
• Prediction-based Embedding – CBOW, Skip-Gram model
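As a rough sketch of prediction-based embeddings, the snippet below trains a tiny Word2Vec model with gensim (an assumed library choice); sg=0 selects CBOW and sg=1 selects skip-gram. A toy corpus this small will not produce meaningful neighbours; it only shows the API shape:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (purely illustrative).
sentences = [
    ["nlp", "makes", "computers", "understand", "language"],
    ["word", "embeddings", "map", "words", "to", "dense", "vectors"],
    ["similar", "words", "get", "similar", "vectors"],
]

# sg=0 -> CBOW, sg=1 -> skip-gram (both prediction-based embeddings).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["words"][:5])                   # first 5 dimensions of the vector for "words"
print(model.wv.most_similar("words", topn=3))  # nearest neighbours in the embedding space
```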
• Cosine Similarity – Measures the cosine of the angle between two vectors representing the text; used with word embeddings or TF-IDF vectors to compute similarity.
• Jaccard Similarity – Measures the similarity between two sets by dividing the size of their intersection by the size of their union.
• Edit Distance – Calculates the minimum number of operations (insertions, deletions, substitutions) required to transform one text into another; useful for measuring similarity between texts with similar structures but potentially different words.
• Word Embeddings – Used to compute similarity between texts by averaging or combining word vectors to represent the entire text.
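A small, hedged example of three of these measures: cosine similarity over TF-IDF vectors (via scikit-learn, an assumed choice), Jaccard similarity over word sets, and a hand-rolled Levenshtein edit distance:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

a = "the order arrived after three weeks"
b = "my order took three weeks to arrive"

# Cosine similarity on TF-IDF vectors.
tfidf = TfidfVectorizer().fit_transform([a, b])
print("cosine:", cosine_similarity(tfidf[0], tfidf[1])[0, 0])

# Jaccard similarity on word sets.
sa, sb = set(a.split()), set(b.split())
print("jaccard:", len(sa & sb) / len(sa | sb))

# Edit (Levenshtein) distance via a single-row dynamic program.
def edit_distance(s, t):
    dp = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        prev, dp[0] = dp[0], i
        for j, ct in enumerate(t, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (cs != ct))
    return dp[-1]

print("edit distance:", edit_distance(a, b))
```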
Several approaches to POS tagging
Rule-Based Tagging
• This approach uses handcrafted rules based on linguistic knowledge to assign
POS tags to words.
• For example, a rule might specify that words ending in "-ing" are typically
gerunds (verbs).
Probabilistic Tagging
• This approach uses statistical models to assign POS tags based on the
probability of a word occurring with a particular tag.
• Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs) are commonly used probabilistic models for POS tagging.
Deep Learning Tagging
• With the rise of deep learning, neural network-based approaches have become
popular for POS tagging.
• Models like Bidirectional LSTMs (Long Short-Term Memory networks) or
Transformers can learn complex patterns and dependencies in text to predict
POS tags.
Hybrid Approaches
• Some systems combine rule-based and probabilistic or deep-learning
techniques to improve tagging accuracy.
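For instance, an off-the-shelf statistical tagger can be tried in a few lines; the sketch below uses NLTK's averaged-perceptron tagger (an assumed tool choice, resource names may vary by version):

```python
import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The striped bats are hanging on their feet"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('striped', 'JJ'), ('bats', 'NNS'), ('are', 'VBP'), ('hanging', 'VBG'), ...]
```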
Challenges in Part-of-Speech Tagging
Approaches to Named Entity Recognition fall into three broad categories:
• Rule-based approaches – a set of rules for the grammar
of a language
• Machine learning approaches - machine learning model
on a labeled dataset using algorithms like conditional
random fields and maximum entropy
• Hybrid approaches – a rule-based system to quickly
identify easy-to-recognize entities and a machine
learning system to identify more complex entities
Key steps and techniques involved in Named Entity Extraction
Tokenization and Part-of-Speech (POS) Tagging
• Input text is tokenized into words or tokens, and each token is assigned a part-of-speech tag (e.g., noun, verb, adjective) using POS tagging techniques.
Named Entity Recognition
• NER algorithms then identify and label tokens that correspond to named entities in the text. Commonly used techniques for NER include:
• Rule-based approaches: using handcrafted rules and patterns to match and classify named entities based on linguistic features (e.g., capitalization, word position, context).
• Machine learning models: training supervised learning models such as Conditional Random Fields (CRFs), Support Vector Machines (SVMs), or deep learning models like Bidirectional LSTMs (Bi-LSTMs) or Transformers (e.g., BERT, RoBERTa) on annotated NER datasets to predict named entity labels for tokens.
Post-processing and Entity Classification
• After identifying named entities, post-processing steps may involve resolving entity boundaries, handling overlapping entities, and classifying entities into specific categories (e.g., person, organization, location).
Evaluation and Validation
• NER systems are evaluated using metrics such as precision, recall, and F1-score, comparing the model's predictions against manually annotated ground truth data.
Named Entity Linking (NEL)
• In some cases, NER is followed by Named Entity Linking, where identified named entities are linked to corresponding entries in knowledge bases or ontologies to enrich their semantic representation.
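A minimal NER sketch using spaCy's small English model (an assumed choice; the model must be downloaded separately and entity labels depend on the model):

```python
import spacy

# Assumes the small English model has been installed beforehand:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is looking at buying a U.K. startup for $1 billion, said Tim Cook in London.")
for ent in doc.ents:
    # e.g. Apple ORG, U.K. GPE, $1 billion MONEY, Tim Cook PERSON, London GPE
    print(ent.text, ent.label_)
```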
What is Semantic Similarity?
• Refers to the measure of how similar two pieces of text are in terms of their meaning or semantics.
• Measures how close or how different two pieces of word or text are in terms of their meaning and context.
• Used in NLP applications such as information retrieval, question answering, text summarization, and recommendation systems.
There are certain approaches for measuring semantic similarity in natural language processing:
• Word Embedding – skip-gram, CBOW, GloVe, and fastText
• Word2Vec – Continuous Bag of Words (CBOW), Skip-gram
• Doc2Vec – an extension of Word2Vec to documents
• SBERT – Transformer-based model in which the encoder part captures the meaning of words in a sentence.
• InferSent – uses a bi-directional LSTM to encode sentences and infer semantics.
• USE (Universal Sentence Encoder) – a model trained by Google that generates fixed-size embeddings for sentences that can be used for any NLP task.
https://www.geeksforgeeks.org/different-techniques-for-sentence-semantic-similarity-in-nlp/
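As an illustration of the SBERT-style approach, the sketch below assumes the sentence-transformers library and the 'all-MiniLM-L6-v2' checkpoint (both are assumptions; any sentence encoder could be substituted):

```python
from sentence_transformers import SentenceTransformer, util

# Downloads the checkpoint on first use.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["A man is playing a guitar.",
             "Someone is performing music on a stringed instrument.",
             "The stock market fell sharply today."]

embeddings = model.encode(sentences, convert_to_tensor=True)
scores = util.cos_sim(embeddings, embeddings)   # pairwise cosine similarity matrix
print(scores)   # the first two sentences should score much closer than the third
```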
Types of Semantic Similarity
Knowledge-Based Similarity
• Determines the semantic similarity between concepts.
• Represents each concept by a node in an ontology graph; also called the topological method because the graph is used as a representation of the corpus concepts.
Statistical-Based Similarity
• Calculates the semantic similarity based on feature vectors learned from the corpus.
• Examples: term counts or TF-IDF in LSA, weights of Wikipedia concepts in ESA, synonyms in PMI, and co-occurring words of a set of predefined words in HAL.
Common techniques and methods used for measuring semantic similarity:
• Topic Modeling – Techniques like Latent Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA) can be used to model topics in documents and compute similarity based on topic distributions.
• Doc2Vec – An extension of Word2Vec that learns document embeddings, enabling similarity computation at the document level.
• Graph-Based Models – Represent documents as nodes in a graph and compute similarity based on graph-based algorithms like Personalized PageRank or graph neural networks.
• STS Benchmarks – Datasets like the STS Benchmark provide pairs of text with human-annotated similarity scores, allowing the evaluation of semantic similarity models.
• Embedding-Based Approaches – Utilize pre-trained word or sentence embeddings to compute semantic similarity between texts, often fine-tuned on STS datasets for improved performance.
Applications of text classification:
• Resume Screening – Categorizing job applications based on skills, experience, and qualifications.
• Content Recommendation – Recommending relevant articles, blogs, or products to users based on their interests.
• Fraud Detection – Classifying financial texts to detect fraudulent activities and identify potential risks.
https://www.analyticsvidhya.com/blog/2020/12/understanding-text-classification-in-nlp-with-movie-review-example-example/
Steps Involved in Text Classification
• Pre-processing – Tokenizing the text, removing stop words, performing stemming or lemmatization, and handling any noise.
• Training and Evaluation
• Applications – Spam detection, topic categorization, language identification, document classification, and content recommendation systems.
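A compact sketch of the full pipeline (pre-processing and feature extraction, training, evaluation) using a scikit-learn TF-IDF plus logistic regression classifier; the tiny spam/ham dataset is made up purely for illustration:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Tiny illustrative dataset; a real project would use a labelled corpus.
texts = ["free prize click now", "win money fast", "meeting at 10am tomorrow",
         "project report attached", "claim your reward today", "lunch with the team"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.33, random_state=0)

clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),   # pre-processing + feature extraction
    ("model", LogisticRegression(max_iter=1000)),       # training
])
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))  # evaluation
```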
Why is it important?
• Provide objective insights
• Analyze at scale
• Real-time analysis
Use cases
• Market research
• Brand monitoring
• Social media monitoring
• Track campaign performance
• Improve customer service
• Build better products and services
How does it work?
• Pre-processing (tokenization, lemmatization, stop-word removal)
• Keyword analysis (extract keywords and give them a sentiment score)
Challenges
• Neutral statements
• Negation ("I wouldn't say the subscription was expensive")
• Sarcasm ("Yeah, great. It took three weeks for my order to arrive.")
• Multipolarity ("I'm happy with the sturdy build but not impressed with the color")
• Emoji form
https://aws.amazon.com/what-is/sentiment-analysis/#:~:text=Sentiment%20analysis%20is%20an%20application,before%20providing%20the%20final%20result.
Sentiment Analysis Vs Semantic Analysis
Sentiment Analysis
• Focuses on determining the emotional tone expressed in a piece of text.
• Classifies the sentiment as positive, negative, or neutral; especially valuable in understanding customer opinions, reviews, and social media comments.
• Used to identify the prevailing sentiment and gauge public or individual reactions to products, services, or events.
Semantic Analysis
• Aims to comprehend the meaning and context of the text.
• Understands the relationships between words, phrases, and concepts in a given piece of content.
• Considers the underlying meaning, intent, and the way different elements in a sentence relate to each other.
• Crucial for tasks such as question answering, language translation, and content summarization, where a deeper understanding of context and semantics is required.
Approaches in Sentiment Analysis (a rule-based example is sketched below)
• Rule-based (Lexicon-based, VADER)
• Machine Learning
• Neural Network
• Hybrid Approach
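A minimal rule-based (lexicon/VADER) example, assuming NLTK and its vader_lexicon resource; note how the sarcastic example from the challenges list can still score as positive, which illustrates why sarcasm is hard for lexicon methods:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")

sia = SentimentIntensityAnalyzer()
reviews = [
    "I'm happy with the sturdy build but not impressed with the color",   # multipolarity
    "Yeah, great. It took three weeks for my order to arrive.",           # sarcasm
]
for review in reviews:
    # polarity_scores returns neg/neu/pos plus a compound score in [-1, 1].
    print(review, "->", sia.polarity_scores(review))
```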
NLP in chatbots involves text preprocessing and text generation, enabling improved user interaction, personalized recommendations, and personal coaching.
https://doubletick.io/blog/nlp-chatbots
• Natural Language Understanding (NLU) – Intent classification, entity recognition, sentiment analysis, context understanding, and discourse analysis; built on text preprocessing, tokenization, part-of-speech tagging, named entity recognition, and syntactic analysis; applied in chatbots, voice assistants, search engines, information retrieval, and sentiment analysis.
• Natural Language Generation (NLG) – Converting data or information into natural language text, generating summaries, creating stories, composing emails; approaches include template-based generation, rule-based generation, statistical approaches, and neural network-based generation.
• Conversational systems – Encompass both NLU and NLG capabilities to interpret user queries or commands, extract meaning, and generate appropriate responses; examples include chatbots, virtual assistants, voice recognition systems, and smart home devices.
1. Need to decode the meaning of the source text in its entirety.
2. There is a need to interpret and analyze all the features of the text available in the corpus.
3. Require an in-depth knowledge of the grammar, semantics, syntax, idioms, etc. of the source language for this process.
4. Need to re-encode this meaning in the target language, which also needs the same in-depth knowledge as the source language to replicate the meaning in the target language.
https://www.scaler.com/topics/nlp/machine-translation-in-nlp/
Ambiguity: Many words and phrases have multiple meanings or interpretations, leading to
translation ambiguity.
Idioms and Cultural Nuances: Idiomatic expressions, cultural references, and context-
specific language nuances can be challenging for machine translation systems.
Rare Languages: Limited availability of training data and resources for less common
languages can hinder accurate translations.
https://deepgram.com/ai-glossary/text-to-speech-models
Text Input: The process begins with a piece of written text as input; the text can be in various languages and formats, such as plain text, web pages, or documents.
Text Analysis: The text undergoes linguistic analysis to understand its structure, including sentence segmentation, part-of-speech tagging, and syntactic parsing. This analysis helps in generating more natural-sounding speech.
Speech Synthesis: Synthesized speech generation involves converting the analyzed text into spoken words. This step includes prosody (intonation, rhythm, and stress) modeling to mimic human-like speech patterns.
Audio Output: The final output is an audio file or real-time speech output that can be played through speakers, headphones, or integrated into applications.
The analog-to-digital-converter takes sounds from an audio file, measures the waves in detail, and filters them to
distinguish the relevant sounds.
The sounds are then segmented into hundredths or thousandths of seconds and are then matched to phonemes.
A phoneme is a unit of sound that distinguishes one word from another in any given language. For example, there are
approximately 40 phonemes in the English language.
The phonemes are then run through a network via a mathematical model that compares them to well-known
sentences, words, and phrases.
The text is then presented as text or a computer-based command, based on the audio's most likely version.
Attention Mechanism
• Technique used in neural network models to focus on
relevant parts of the input sequence when making
predictions or generating output sequences
• Developed to increase the performance of the encoder-decoder (seq2seq) RNN model.
• A solution to the limitation of the encoder-decoder model, which encodes the input sequence into one fixed-length vector from which to decode the output at each time step.
• Allows the model to "pay attention" to certain parts of the data and to give them more weight when making predictions.
• Preserves the context of every word in a sentence by assigning an attention weight relative to all other words.
• Even if the sentence is large, the model can preserve the contextual importance of each word.
Before learning about the attention mechanism, we should learn how RNNs, LSTMs, and GRUs work.
https://medium.com/analytics-vidhya/https-medium-com-understanding-attention-mechanism-natural-language-processing-9744ab6aed6a
How attention mechanisms work in NLP
Contextual Relevance
• In NLP tasks, especially those dealing with sequences like sentences or documents, different parts of the input sequence may contribute
differently to the output. For example, in machine translation, certain words in the input sentence may have more influence on the translation
of specific words in the output sentence.
Attention Weights
• Attention mechanisms assign weights to each element in the input sequence, indicating how relevant that element is to the current step in
processing. These weights are often computed using neural network layers, such as softmax layers, based on the similarity between the
current state of the model and each element in the input sequence.
Attention Scores
• To calculate the attention weights, attention mechanisms typically compute attention scores. These scores measure the similarity or relevance
between the current state of the model (e.g., the decoder state in machine translation) and each element in the input sequence. Common
methods for computing attention scores include dot product attention, additive attention, and multiplicative attention.
Soft Attention
• After computing the attention scores, a softmax function is often applied to convert the scores into attention weights. Soft attention allows
the model to consider the entire input sequence but with varying degrees of importance for each element.
Context Vector
• Once the attention weights are determined, a context vector is computed as a weighted sum of the input elements, where the weights are
given by the attention weights. This context vector represents the focused information from the input sequence that is relevant to the current
step in the model's processing.
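Putting these pieces together, here is a minimal NumPy sketch of scaled dot-product attention: scores, a softmax into attention weights, and the weighted sum that forms the context vector. Shapes and random values are made up purely for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                             # attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)     # softmax -> attention weights
    context = weights @ V                                       # weighted sum -> context vectors
    return context, weights

# 3 query positions attending over 4 input positions, dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
context, weights = scaled_dot_product_attention(Q, K, V)
print(weights.sum(axis=-1))   # each row of attention weights sums to 1
```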
Types of Attention
• Global attention considers the entire input sequence when computing attention weights.
• Local attention restricts the attention mechanism to a specific region or window of the input sequence.
• Hard attention selects a subset of the input elements to attend to during processing; this selection process is often based on learned probabilities or heuristics.
• Soft attention considers the entire input sequence but with varying degrees of importance for each element (see Soft Attention above).
• Structured attention allows the attention weights to be learned using a structured prediction model, such as a conditional random field.
• Dot-product attention computes the attention weights as the dot product of the query and key vectors.
• Scaled dot-product attention, a variant of dot-product attention, scales the dot product by the square root of the key dimension.
• Multi-head attention runs several attention operations in parallel, each with its own learned projections, and combines their outputs.
Transformer-based models and architectures
• XLNet
• DistilBERT
• T5 (Text-to-Text Transfer Transformer)
• BART (a denoising autoencoder for pretraining sequence-to-sequence models)
• ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
• BERT-based Multilingual Models
• Contextual Representations – LLMs learn to generate contextual representations of words and phrases based on their surrounding context; the model captures these contextual variations.
• Attention Mechanisms – LLMs leverage attention mechanisms to weigh the importance of different words or tokens in a sequence.
• Fine-tuning – After pre-training, LLMs can be fine-tuned on specific tasks or domains by further training on task-specific data.
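A hedged fine-tuning sketch using the Hugging Face transformers and datasets libraries; the DistilBERT checkpoint, the IMDB dataset, the subset sizes, and the hyperparameters are all illustrative assumptions rather than a recommended recipe:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Hypothetical choices: a DistilBERT checkpoint fine-tuned on IMDB sentiment labels.
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"].shuffle(seed=0).select(range(2000)),
                  eval_dataset=encoded["test"].select(range(500)))
trainer.train()          # further training on task-specific data
print(trainer.evaluate())
```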
Visualization in NLP
• Word clouds display words from a text corpus, with font size indicating word frequency.
• Bar charts display word frequencies or other textual data categories.
• Histograms show the distribution of word lengths, sentence lengths, or other text-related metrics.
• Scatter plots can represent relationships between words or text features.
• Tree maps show hierarchical data structures, such as folder structures or topic hierarchies in text.
• Heatmaps represent text-related metrics using color gradients.
• Network graphs depict relationships between entities in text, such as co-occurrence networks or social networks based on mentions.
• Topic visualizations show topics and their word distributions using bar charts, word clouds, or interactive topic models.
• Time series plots show changes in text-related metrics over time, such as word frequencies in social media posts or news articles.
• Geographic maps display text data spatially, such as sentiment analysis results across regions or locations mentioned in text.
• Annotations, labels, or tooltips add context and insights about specific text elements to visualizations.
Explainable Artificial Intelligence (XAI) techniques in NLP
Attention Mechanisms: attention maps show which words or phrases contribute most to the model's output, providing
insights into the model's decision-making process.
Feature Importance: identify the most important features or words in a text that influence the model's predictions (e.g., feature permutation importance or SHAP (SHapley Additive exPlanations)).
LIME (Local Interpretable Model-agnostic Explanations): used to explain why a specific text input led to a particular
prediction from a machine learning model.
ELI5 (Explain Like I'm 5): explain the predictions of models such as text classifiers by highlighting important words or
phrases in the input text that contribute to the predicted class.
Integrated Gradients: assigns importance scores to each word or feature in the input text based on how they contribute to
the model's output
Saliency Maps: show which words or tokens have the highest impact on the model's decision, aiding in interpretability.
Rule-based Explanations: creates human-readable rules that describe how the model makes predictions based on specific
linguistic patterns or features in the text.
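To make the LIME idea concrete, here is a small illustration: a toy TF-IDF plus logistic regression classifier is fitted inline (the data is made up), and LimeTextExplainer reports which words pushed the prediction toward spam or ham. The dataset and class names are assumptions for the sketch:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from lime.lime_text import LimeTextExplainer

# Toy classifier; a real explanation target would be a properly trained model.
texts = ["free prize click now", "win money fast", "meeting at 10am tomorrow",
         "project report attached", "claim your reward today", "lunch with the team"]
labels = [1, 1, 0, 0, 1, 0]     # 1 = spam, 0 = ham

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)).fit(texts, labels)

explainer = LimeTextExplainer(class_names=["ham", "spam"])
explanation = explainer.explain_instance(
    "claim your free prize now",   # the specific input whose prediction we want explained
    clf.predict_proba,             # any function mapping a list of texts to class probabilities
    num_features=5,
)
print(explanation.as_list())       # (word, weight) pairs: each word's contribution to the prediction
```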