Unsmoothed N-grams in NLP Analysis
VIVEKANANDHA COLLEGE OF ARTS AND SCIENCES FOR WOMEN (Autonomous)
Course Material
Department: Computer Science and Applications
Class & Section: I CS, II Sem
Academic Year: 2024-25
Unit-II
Word Level Analysis : Unsmoothed N-grams
In Natural Language Processing (NLP), unsmoothed N-grams refer to sequences of n words
analyzed without applying any smoothing techniques to handle zero probabilities for unseen word
sequences. This raw approach can provide insights into the exact frequency and co-occurrence of words in a
text corpus, making it foundational for many tasks in NLP.
Key Concepts in Unsmoothed N-grams
1. N-grams:
o An N-gram is a contiguous sequence of n words from a given text.
 Unigram: n = 1 (single words).
 Bigram: n = 2 (word pairs).
 Trigram: n = 3 (three-word sequences).
 And so on.
2. Unsmoothing:
o In unsmoothed N-grams, the probabilities of word sequences are calculated directly from
their observed frequencies in the corpus:
P(w_1, w_2, ..., w_n) = Count(w_1, w_2, ..., w_n) / (Total Count of N-grams in the Corpus)
o If a sequence does not occur in the training data, its probability is zero.
3. Challenges:
o Data Sparsity: Many word sequences may not appear in the training corpus, resulting in zero
probabilities.
o Unseen Events: The model cannot generalize to unseen sequences, limiting its applicability
in real-world scenarios.
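For the conditional form usually used in language modeling, the unsmoothed (maximum-likelihood) estimate is P(w2 | w1) = Count(w1, w2) / Count(w1). A minimal sketch in Python, using a toy corpus invented here for illustration:

```python
from collections import Counter

# Toy corpus; in practice this would be a large tokenized text.
tokens = "the cat sat on the mat the cat ran".split()

# Count unigrams and bigrams directly from the observed data.
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(w1, w2):
    """Unsmoothed MLE: P(w2 | w1) = Count(w1, w2) / Count(w1)."""
    if unigram_counts[w1] == 0:
        return 0.0
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("the", "cat"))   # 2/3: "the" occurs 3 times, "the cat" twice
print(bigram_prob("cat", "flew"))  # 0.0: an unseen bigram gets zero probability
```

The second call shows the data-sparsity problem directly: any sequence absent from the corpus receives probability zero.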
Applications of Unsmoothed N-grams
1. Language Modeling:
o N-grams are used to model the probability distribution of word sequences, forming the basis
of many simple language models.
2. Text Analysis:
o Analyze word frequency, co-occurrence, and patterns in text without adjusting probabilities
for rare events.
3. Machine Translation and Summarization:
o Evaluate the structure of phrases and sentences in a corpus.
4. Information Retrieval:
o Identify key phrases or sequences in a text to match user queries.
Advantages of Unsmoothed N-grams
1. Simplicity:
o Easy to calculate and understand.
2. Exact Representation:
o Provides raw insights into word distribution and sequence occurrence.
3. Interpretability:
o No additional assumptions (e.g., smoothing) obscure the results.
Limitations
1. Zero Probabilities:
o Any sequence not in the training data has a probability of zero, making the model unusable in
those cases.
2. Data Dependency:
o Requires large corpora to cover a wide range of word sequences.
3. Scalability:
o As n increases, the number of possible N-grams grows exponentially, leading to the "curse
of dimensionality."
When to Use Unsmoothed N-grams?
1. Exploratory Analysis:
o When analyzing raw word-level patterns in text data.
2. Baseline Models:
o For establishing a benchmark before applying advanced techniques like smoothing or neural
models.
3. Specific NLP Tasks:
o When zero probabilities are not critical, or the focus is on observed patterns only.
Smoothing techniques address the problem of zero probabilities in N-gram models by redistributing some
probability mass from seen N-grams to unseen ones. Evaluating the performance of smoothing methods is
crucial to assess how well a language model generalizes to unseen data and avoids overfitting.
1. Why Smoothing Is Important in N-gram Models
Unseen N-grams: Without smoothing, any unseen N-gram will have a probability of zero, making
the model unusable for predicting sequences containing those N-grams.
Better Generalization: Smoothing ensures the model can handle rare or unseen word sequences
effectively.
Improved Perplexity: By redistributing probability mass, smoothing generally leads to lower
perplexity on test data, indicating better predictions.
2. Common Smoothing Techniques
1. Laplace (Add-One) Smoothing:
o Adds 1 to the count of every possible N-gram to avoid zero probabilities.
o Formula: P(w_n | w_{n-1}) = (Count(w_{n-1}, w_n) + 1) / (Count(w_{n-1}) + V),
where V is the vocabulary size.
2. Add-k Smoothing:
o Generalizes Laplace smoothing by adding a smaller constant k > 0.
o Reduces the overestimation of probabilities for unseen N-grams compared to Laplace
smoothing.
3. Good-Turing Smoothing:
o Adjusts the probability of seen and unseen N-grams based on the counts of N-grams with
similar frequencies.
o Effective for redistributing probability mass to unseen events.
4. Kneser-Ney Smoothing:
o Combines absolute discounting with backing off to lower-order models.
o Specifically designed for language modeling and is often the most effective for N-grams.
o Captures the diversity of contexts where a word appears.
5. Backoff and Interpolation:
o Backoff: Uses lower-order N-grams when higher-order N-grams are unavailable.
o Interpolation: Combines probabilities from higher- and lower-order N-grams.
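The add-k formula above (with k = 1 giving Laplace smoothing) can be sketched in a few lines of Python; the toy corpus is invented for illustration:

```python
from collections import Counter

tokens = "the cat sat on the mat".split()
V = len(set(tokens))  # vocabulary size

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def laplace_bigram_prob(w1, w2, k=1):
    """Add-k smoothed bigram probability:
    P(w2 | w1) = (Count(w1, w2) + k) / (Count(w1) + k * V)."""
    return (bigram_counts[(w1, w2)] + k) / (unigram_counts[w1] + k * V)

print(laplace_bigram_prob("the", "cat"))  # 2/7: seen bigram, count boosted by 1
print(laplace_bigram_prob("cat", "mat"))  # 1/6: unseen bigram, small but non-zero
```

Note how the unseen bigram ("cat", "mat") now gets a small positive probability instead of zero, at the cost of slightly deflating the seen bigrams.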
3. Metrics for Evaluating Smoothing Techniques
3.1 Perplexity
Measures how well the model predicts a test dataset.
Lower perplexity indicates better predictions.
Use: Compare perplexity across different smoothing methods to determine the most effective one.
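Perplexity is the exponentiated average negative log-probability the model assigns to the test words: PP = exp(-(1/N) Σ log p_i). A minimal sketch, using hypothetical per-word probabilities:

```python
import math

# Hypothetical probabilities a language model assigns to each word of a test sentence.
probs = [0.2, 0.1, 0.25, 0.05]

# Perplexity = exp(-(1/N) * sum(log p_i)); lower means better predictions.
N = len(probs)
perplexity = math.exp(-sum(math.log(p) for p in probs) / N)
print(round(perplexity, 2))  # ≈ 7.95
```

Equivalently, perplexity is the inverse geometric mean of the probabilities, so a model that is "less surprised" by the test data scores lower.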
3.2 Coverage
Measures how many N-grams in the test set have non-zero probabilities after smoothing.
Higher coverage: Indicates that the smoothing method successfully handles unseen N-grams.
3.3 Precision and Recall
Evaluate how accurately the smoothed model predicts N-grams compared to reference sequences.
Use: Helpful in tasks like machine translation or text generation.
3.4 BLEU/ROUGE Scores
Evaluate the impact of smoothing on downstream tasks like machine translation (BLEU) or
summarization (ROUGE).
Higher scores: Indicate that the smoothing method improves the quality of generated text.
4. Practical Considerations
1. Dataset Size:
o Smaller datasets often require more aggressive smoothing techniques like Laplace or Add-k.
o Larger datasets can benefit from advanced methods like Kneser-Ney smoothing.
2. Vocabulary Size:
o Larger vocabularies increase the number of unseen N-grams, making effective smoothing
essential.
3. Higher-Order N-grams:
o Higher n (e.g., trigrams, 4-grams) suffer more from data sparsity, making advanced
smoothing methods critical.
4. Task-Specific Requirements:
o Some tasks (e.g., ASR, machine translation) may benefit more from sophisticated smoothing
techniques like Kneser-Ney due to their contextual sensitivity.
5. Interpreting Results
Perplexity: Use test data to compare perplexity scores across smoothing methods.
Probabilities: Compare how each method redistributes probability mass to unseen or rare N-grams.
Task Performance: Evaluate BLEU/ROUGE scores or other task-specific metrics to determine how
smoothing impacts downstream tasks.
1. Interpolation
Definition
 Interpolation combines probabilities from higher-order and lower-order N-grams, rather than relying
solely on the highest available N-gram.
 The idea is to use information from all N-gram levels, weighting them appropriately.
Mathematical Representation
For a trigram model (n = 3):
P(w_i | w_{i-2}, w_{i-1}) = λ1 · P(w_i | w_{i-2}, w_{i-1}) + λ2 · P(w_i | w_{i-1}) + λ3 · P(w_i),
where λ1 + λ2 + λ3 = 1.
Characteristics
All levels (unigram, bigram, trigram, etc.) contribute to the final probability.
Weights (λ) can be determined through techniques like grid search or expectation-
maximization (EM) based on a held-out dataset.
Advantages
Smoother probability distribution compared to relying solely on higher-order N-grams.
Reduces the impact of data sparsity by leveraging lower-order N-grams.
Applications
Language modeling (e.g., speech recognition, machine translation).
Predictive text generation.
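The interpolation formula for a trigram model is simple to sketch; the component probabilities and λ weights below are invented for illustration (in practice the λs are tuned on held-out data):

```python
# Hypothetical component probabilities estimated from a corpus.
p_trigram = 0.0    # P(w | w1, w2): the trigram was never seen
p_bigram = 0.3     # P(w | w2)
p_unigram = 0.05   # P(w)

# Interpolation weights; must sum to 1.
l1, l2, l3 = 0.6, 0.3, 0.1

p_interp = l1 * p_trigram + l2 * p_bigram + l3 * p_unigram
print(p_interp)  # 0.095: non-zero even though the trigram count is zero
```

Because every level contributes, an unseen trigram no longer forces a zero probability; the bigram and unigram terms carry it.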
2. Backoff
Definition
Backoff uses lower-order N-grams only when higher-order N-grams are unavailable or have zero
probability.
Unlike interpolation, backoff does not combine probabilities; it falls back to lower-order
probabilities as needed.
Mathematical Representation
For a trigram model:
P(w_i | w_{i-2}, w_{i-1}) =
 P(w_i | w_{i-2}, w_{i-1})   if the trigram exists
 α · P(w_i | w_{i-1})        otherwise (back off)
 α: Backoff weight to ensure proper normalization of probabilities.
Characteristics
Probabilities from lower-order N-grams are used only when necessary.
A normalization factor (α) ensures the model’s probabilities sum to 1.
Advantages
Simpler implementation compared to interpolation.
Efficient when the corpus contains sufficient higher-order N-grams.
Applications
Language modeling in applications like text-to-speech (TTS) and auto-completion.
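A minimal sketch of the backoff idea, in the style of "stupid backoff": the fixed α = 0.4 here is an illustrative constant, whereas true Katz backoff computes α per context so that the probabilities stay normalized. The toy corpus is invented:

```python
from collections import Counter

tokens = "the cat sat on the mat".split()
trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)

def backoff_prob(w1, w2, w3, alpha=0.4):
    """Use the trigram estimate if the trigram was seen;
    otherwise back off to the (weighted) bigram, then unigram."""
    if trigram_counts[(w1, w2, w3)] > 0:
        return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]
    if bigram_counts[(w2, w3)] > 0:
        return alpha * bigram_counts[(w2, w3)] / unigram_counts[w2]
    return alpha * alpha * unigram_counts[w3] / len(tokens)

print(backoff_prob("the", "cat", "sat"))  # 1.0: trigram seen, used directly
print(backoff_prob("cat", "the", "mat"))  # 0.2: unseen trigram, backs off to bigram
```

Unlike interpolation, only one level's estimate is used per query; lower orders are consulted strictly as a fallback.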
Word Classes
In Natural Language Processing (NLP), word classes (also referred to as parts of speech
(POS) or syntactic categories) are used to group words based on their grammatical roles, syntactic behavior,
and function within a sentence. Word classes help in analyzing, understanding, and generating human
language computationally.
Part-of-Speech Tagging
Part-of-Speech Tagging is the process of assigning word classes or grammatical categories
(e.g., noun, verb, adjective) to each word in a given text based on its context. It is a fundamental step in
many NLP applications as it helps in understanding the syntactic structure and meaning of a sentence.
1. Why POS Tagging is Important
1. Syntactic Analysis:
o Identifies the grammatical structure of sentences for parsing and sentence analysis.
2. Semantic Understanding:
o Determines word meaning based on context (e.g., "book" as a noun or verb).
3. Downstream Applications:
o Named Entity Recognition (NER): Identifies proper nouns.
o Machine Translation: Ensures grammatically correct output.
o Text Summarization: Extracts key phrases based on POS.
o Sentiment Analysis: Leverages adjectives and adverbs to detect sentiment.
2. How POS Tagging Works
Steps in POS Tagging:
1. Tokenization:
o Split the input text into individual words or tokens.
o Example: "The cat sat on the mat." → ["The", "cat", "sat", "on", "the", "mat"]
2. Assigning POS Tags:
o Each token is assigned a tag based on:
Rule-Based Methods: Grammar rules.
Statistical Models: Probabilities derived from training data.
Deep Learning Models: Neural networks that learn contextual relationships.
POS Tags
Commonly used POS tagging schemes:
1. Penn Treebank POS Tag Set (for English):
NN: Noun (singular)
VB: Verb (base form)
JJ: Adjective
RB: Adverb
IN: Preposition
2. Universal POS Tag Set:
NOUN, VERB, ADJ, ADV, PRON, etc.
3. Techniques for POS Tagging
1. Rule-Based Tagging
Relies on manually defined grammar rules.
Example:
o "If a word ends with '-ing', tag it as a verb (VB)."
Limitation:
o Cannot handle complex or ambiguous contexts effectively.
2. Statistical Tagging
Uses probabilistic models trained on labeled data.
Examples:
1. NLTK:
o The Natural Language Toolkit provides pos_tag for statistical POS tagging:
import nltk
from nltk import word_tokenize, pos_tag
sentence = "The quick brown fox jumps over the lazy dog."
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
print(pos_tags)
2. SpaCy:
o Provides fast and efficient POS tagging.
import spacy
nlp = spacy.load("en_core_web_sm")
sentence = "The quick brown fox jumps over the lazy dog."
doc = nlp(sentence)
for token in doc:
    print(f"{token.text}: {token.pos_}")
3. Stanford CoreNLP:
o A highly accurate library for POS tagging, using statistical models.
4. Flair:
o A deep learning library specialized in POS tagging and other NLP tasks.
5. BERT-based Models:
o Pre-trained transformer models like BERT can perform POS tagging with fine-tuning.
Rule-Based NLP
Rule-Based NLP involves the use of hand-crafted linguistic rules and patterns to process,
analyze, and generate human language. It is one of the oldest approaches in NLP and relies on predefined
rules, lexicons, and grammar to achieve language understanding or generation.
1. What is Rule-Based NLP?
In a rule-based system, language processing is based on a set of manually defined rules. These rules are
created by linguists or domain experts and are used to identify patterns in text or to define how language
elements interact.
Example:
o Rule: If a word ends with "-ing," it is likely a verb.
o Rule: If "not" appears before an adjective, classify it as negative sentiment.
Key Components:
1. Lexicons: Word lists or dictionaries with associated features (e.g., part of speech, polarity).
2. Grammar Rules: Syntax and morphology rules (e.g., subject-verb agreement, noun phrase
structure).
3. Pattern Matching: Matching text to specific patterns (e.g., regular expressions).
4. Rule Engine: A system that applies rules to text.
2. Applications of Rule-Based NLP
1. Part-of-Speech (POS) Tagging
Rule-based taggers assign POS tags to words using linguistic rules.
Example Rule:
o If the preceding word is a determiner (e.g., "the"), tag the current word as a noun.
2. Named Entity Recognition (NER)
Detect entities like names, dates, and locations using patterns.
Example Rule:
o If a word starts with a capital letter and is followed by "Inc." or "Ltd.," classify it as an
organization.
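A rule like this can be expressed directly as a regular expression; the sample sentence and company names are invented for illustration:

```python
import re

# Hypothetical rule: a capitalized word followed by "Inc." or "Ltd." is an organization.
text = "She works at Acme Inc. and visited Globex Ltd. last year."
orgs = re.findall(r'\b([A-Z][a-zA-Z]+)\s+(?:Inc|Ltd)\.', text)
print(orgs)  # ['Acme', 'Globex']
```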
3. Sentiment Analysis
Identify positive or negative sentiment using sentiment word lexicons and negation rules.
Example Rule:
o If "not" appears before a positive word (e.g., "not good"), classify it as negative.
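The negation rule above can be sketched as a toy lexicon-based classifier; the word list and scoring scheme are simplified assumptions, not a production sentiment system:

```python
positive_words = {"good", "great", "excellent"}

def simple_sentiment(sentence):
    """Toy lexicon rule: a positive word counts as negative if 'not' precedes it."""
    tokens = sentence.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        if tok in positive_words:
            if i > 0 and tokens[i - 1] == "not":
                score -= 1
            else:
                score += 1
    return "negative" if score < 0 else "positive" if score > 0 else "neutral"

print(simple_sentiment("The food was good"))      # positive
print(simple_sentiment("The food was not good"))  # negative
```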
4. Text Normalization
Handle text preprocessing tasks like stemming and lemmatization using rules.
Example Rule:
o If a word ends in "ing," remove "ing" (e.g., "running" → "run").
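As a sketch, the "-ing" rule can be written as a naive suffix stripper; the consonant-doubling check is an added simplification, not real morphological analysis like a lemmatizer performs:

```python
def strip_ing(word):
    """Toy normalization rule: drop a trailing 'ing'."""
    if word.endswith("ing") and len(word) > 4:
        stem = word[:-3]
        # Undo consonant doubling: "running" -> "runn" -> "run".
        if len(stem) > 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    return word

print(strip_ing("running"))  # run
print(strip_ing("reading"))  # read
print(strip_ing("sing"))     # sing (too short to be a suffix)
```

The last case shows why such rules need guards: blindly stripping "ing" would turn "sing" into "s".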
5. Spell Checking
Correct spelling errors by comparing against a dictionary and applying transformation rules.
6. Information Extraction
Extract structured information from unstructured text using templates and rules.
Example:
o Extract dates in the format "DD-MM-YYYY" using regex patterns.
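The DD-MM-YYYY template maps directly onto a regex; the sentence is invented for illustration:

```python
import re

text = "The meeting moved from 03-01-2025 to 15-02-2025."
dates = re.findall(r'\b\d{2}-\d{2}-\d{4}\b', text)
print(dates)  # ['03-01-2025', '15-02-2025']
```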
7. Question Answering
Use rules to detect question types and retrieve relevant information.
Example Rule:
o If a question starts with "Who," retrieve entities tagged as "Person."
3. Advantages of Rule-Based NLP
1. Interpretability:
o Rules are explicit and easy to understand.
o Useful in domains where decisions need to be explainable.
2. Domain Adaptability:
o Rules can be customized for specific languages, industries, or tasks.
3. Low Data Dependency:
o Does not require large labeled datasets for training.
4. Deterministic Behavior:
o Outputs are predictable and consistent.
4. Limitations of Rule-Based NLP
1. Scalability:
o Creating and maintaining a large number of rules is time-consuming and labor-intensive.
2. Coverage:
o Rules may fail to handle edge cases, ambiguities, or new language patterns.
3. Context Sensitivity:
o Difficult to account for context or nuances of natural language effectively.
4. Maintenance:
o Rules need to be updated frequently to keep up with evolving language and domain-specific
terms.
5. Generalization:
o Rule-based systems often struggle with unseen data or out-of-vocabulary words.
5. Examples of Rule-Based NLP Techniques
1. Regular Expressions (Regex)
Used for pattern matching in text.
Example:
o Extract email addresses:
import re
text = "Contact us at support@example.com."
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)
Output: ['support@example.com']
2. Rule-Based POS Tagging
Example using NLTK:
from nltk.tag import RegexpTagger
patterns = [(r'.*ing$', 'VBG'), (r'.*ed$', 'VBD'), (r'.*', 'NN')]
tagger = RegexpTagger(patterns)
print(tagger.tag(["running", "jumped", "dog"]))
6. Context Sensitivity
POS tags often depend on the broader sentence or paragraph context, which simple models may fail
to capture.
o Example:
"He likes to fish" ("fish" as a verb)
"The fish is fresh" ("fish" as a noun)
7. Inconsistent Annotation Standards
Different corpora use different POS tagsets or annotation guidelines, leading to inconsistency in
tagging models.
o Example:
Universal POS Tagset (simpler): "book" → VERB
Penn Treebank Tagset (granular): "book" → VB
8. Polysemy and Homonymy
Polysemy: Words with multiple related meanings.
o Example: "run" → a physical action (VB) or a race (NN).
Homonymy: Words with unrelated meanings but identical spelling.
o Example: "bank" → a financial institution (NN) or the side of a river (NN).
9. Noisy Text
Tagging becomes difficult in non-standard or informal text formats, such as:
o Social Media Text: Contains abbreviations, emojis, and slang.
Example: "u r gr8" → "you are great."
o Speech Transcriptions: May include disfluencies and fillers.
Example: "Um, I think I like, uh, coffee."
10. Compound Words
Words like "ice-cream" or "well-being" can be misinterpreted as separate tokens or misclassified.
11. Handling Code-Switching
In multilingual contexts, speakers often switch between languages mid-sentence.
o Example:
"I need to book a taxi जल्दी से" (English + Hindi; "जल्दी से" means "quickly").
12. Evaluation and Metrics
Evaluating POS taggers is challenging due to:
o Different annotation schemes.
o Disagreement between annotators in ambiguous cases.
o Metric limitations: Precision, recall, and F1 may not always reflect real-world performance.
13. Dependency on Training Data
Quality of Training Data:
o Poorly annotated corpora result in models learning incorrect patterns.
Bias in Data:
o Models trained on biased datasets may perform poorly in diverse contexts.
14. Memory and Computational Constraints
Resource-heavy models like CRFs or neural networks may not work well on devices with limited
computational power.
1. Hidden Markov Models (HMM)
An HMM tagger treats tags as hidden states and words as observations, using transition and
emission probabilities learned from a tagged corpus.
 Viterbi Algorithm:
o A dynamic programming algorithm to compute the most probable tag sequence efficiently.
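A compact sketch of the Viterbi algorithm over a two-tag toy model; all tag names and probabilities below are invented for illustration, not learned from data:

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under an HMM, via dynamic programming."""
    # V[t] maps each tag to (best path probability ending in that tag, that path).
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), [t]) for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            # Best predecessor tag for reaching `t` at this position.
            best_prev = max(tags, key=lambda p: V[-1][p][0] * trans_p[p][t])
            prob = V[-1][best_prev][0] * trans_p[best_prev][t] * emit_p[t].get(w, 0.0)
            col[t] = (prob, V[-1][best_prev][1] + [t])
        V.append(col)
    return max(V[-1].values(), key=lambda x: x[0])[1]

# Hypothetical two-tag model.
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.2, "VERB": 0.8},
           "VERB": {"NOUN": 0.7, "VERB": 0.3}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1},
          "VERB": {"dogs": 0.1, "bark": 0.6}}

print(viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p))  # ['NOUN', 'VERB']
```

By keeping only the best path into each tag at each position, Viterbi avoids enumerating all tag sequences, which would be exponential in sentence length.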
Advantages of HMM
Simplicity: Easy to implement and interpret.
Probabilistic Framework: Provides a natural way to handle uncertainty in language.
Disadvantages of HMM
1. Strong Independence Assumptions:
o Assumes that the current state depends only on the previous state and the current word.
2. Data Sparsity:
o Struggles with unseen words or rare transitions.
3. Fixed Features:
o Cannot incorporate rich contextual features easily.
2. Maximum Entropy Models (MaxEnt)
Maximum Entropy Models, also known as log-linear models, are discriminative models used for
classification tasks. MaxEnt models predict the conditional probability of a class (e.g., a tag) given an input
feature vector.
Key Concepts in MaxEnt
1. Feature Representation:
o Captures contextual information, e.g., surrounding words, word suffixes, and capitalization.
o Example Features:
Current Word: "cat"
Previous Word: "The"
Is Capitalized: False
2. Conditional Probability:
o Computes the probability of a class (tag) given the features.
3. Training:
o Maximize the log-likelihood of the training data using optimization algorithms (e.g., gradient
descent).
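The feature representation described above can be sketched as a simple extractor function; the feature names here are hypothetical, chosen to mirror the examples in the list:

```python
def extract_features(tokens, i):
    """Build a feature dict for token i: identity, left context, shape, suffix."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "is_capitalized": word[0].isupper(),
        "suffix_2": word[-2:],
    }

tokens = ["The", "cat", "sat"]
print(extract_features(tokens, 1))
# {'word': 'cat', 'prev_word': 'the', 'is_capitalized': False, 'suffix_2': 'at'}
```

Dictionaries like this are what a MaxEnt (log-linear) classifier consumes: each key-value pair becomes a weighted feature in the model, and the features may freely overlap, unlike in an HMM.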
Advantages of MaxEnt
1. Rich Features:
o Can incorporate arbitrary, overlapping, and non-independent features.
2. Flexibility:
o No need for independence assumptions (unlike HMM).
3. Discriminative:
o Directly models the conditional probability P(y | x).
Disadvantages of MaxEnt
1. Computational Cost:
o Training can be expensive, especially with many features.
2. Overfitting:
o Requires regularization to avoid overfitting on the training data.
3. Data Dependence:
o Performance depends heavily on feature engineering and quality of labeled data.
3. Comparison of HMM and MaxEnt
Aspect      | HMM                                           | MaxEnt
Model Type  | Generative: models joint probability P(x, y)  | Discriminative: models conditional probability P(y | x)