Multimedia
Application
By
Minhaz Uddin Ahmed, PhD
Department of Computer Engineering
Inha University in Tashkent.
Email: [Link]@[Link]
Content
Language Models
N-Grams
3.2 Evaluating Language Models: Training and Test Sets
3.3 Evaluating Language Models: Perplexity
3.4 Sampling sentences from a language model
3.5 Generalization and Zeros
3.6 Smoothing
3.8 Advanced: Kneser-Ney Smoothing
Language Modeling
Language modeling involves predicting the probability distribution of
words or tokens in a sequence of text. The goal of language modeling
is to capture the underlying structure and patterns of natural
language, allowing computers to generate coherent and
grammatically correct text.
There are several approaches to language modeling, including:
i) N-gram Models
ii) Neural Network Models
iii) Transformer Models
Language Modeling
Tashkent is the capital of ---------------?
i) India
ii) China
iii) Uzbekistan
Language model applications
Spell checking
Grammar checking
Machine translation
Summarization
Question answering
Speech recognition
Probabilistic Language Models
Assign a probability to a sentence
Application:
Machine Translation:
P(high winds tonite) > P(large winds tonite)
Spell Correction
The office is about fifteen minuets from my house
P(about fifteen minutes from) > P(about fifteen minuets from)
Speech Recognition
P(I saw a van) >> P(eyes awe of an)
Also: summarization, question answering, etc.
Probability of sentence
Grammar correction
I go to school
I going to school
Probability score: P(I go to school) > P(I going to school)
Correct: "go to school"; wrong: "going to school"
Probability of sentence or words
Compute the probability of a sentence or sequence of words:
=> P(W) = P(w1, w2, w3, w4, w5, …, wn)
Probability of an upcoming word:
=> P(w5| w1,w2,w3,w4)
P(Uzbekistan | Tashkent , is, the, capital, of)
A model that computes either of these :
P(W) or P(wn|w1, w2…wn-1) is called a language model.
How to compute P(W)
How to compute this joint probability:
P(its, water, is, so, transparent, that)
Intuition: let’s rely on the Chain Rule of Probability
P(A,B) = p(A|B) p(B)
We can extend this for three variables:
P(A,B,C) = P(A| B,C) P(B,C) = P(A|B,C) P(B|C) P(C)
and in general to n variables:
P(A1, A2, ..., An) = P(A1|A2, ..., An) P(A2|A3, ..., An) …
P(An-1|An) P(An)
In general we refer to this as the chain rule
the joint probability of all the random variables can be calculated by
multiplying the probability of each variable conditioned on all the previous
variables
Chain Rule of Probability
Conditional probabilities
=> P(B|A) = P(A,B) / P(A)
Rewriting : P(A,B) = P(A)P(B|A)
More variables: P(A,B,C,D) = P(A) P(B|A) P (C|A, B) P(D|A,B,C)
The chain rule in general
=> P(x1, x2, x3, …, xn) = P(x1) P(x2|x1) P(x3|x1, x2) … P(xn|x1, …, xn-1)
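The chain rule can be checked numerically. This is a sketch on a toy joint distribution over three binary variables (the probabilities are made up for illustration); it verifies that P(x1) P(x2|x1) P(x3|x1, x2) reproduces the joint probability:

```python
# A toy joint distribution over three binary variables (values are
# illustrative and sum to 1), used to check the chain rule:
# P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2)
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.10, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}

def marginal(prefix):
    """P(x1, ..., xk): sum the joint over outcomes matching the prefix."""
    return sum(p for outcome, p in joint.items()
               if outcome[:len(prefix)] == prefix)

for (x1, x2, x3), p_joint in joint.items():
    p_x1 = marginal((x1,))
    p_x2_given_x1 = marginal((x1, x2)) / p_x1
    p_x3_given_x1_x2 = p_joint / marginal((x1, x2))
    # Product of the conditionals equals the joint probability.
    assert abs(p_x1 * p_x2_given_x1 * p_x3_given_x1_x2 - p_joint) < 1e-12
```

Each conditional is obtained from the joint by the definition P(B|A) = P(A, B) / P(A), so the check holds for any valid distribution, not just this one.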
Chain Rule of Probability
Chain rule : P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
Example
= P(Tashkent is the capital of Uzbekistan)
= P(Tashkent) × P(is | Tashkent) × P(the | Tashkent, is) × P(capital | Tashkent, is, the)
× P(of | Tashkent, is, the, capital) × P(Uzbekistan | Tashkent, is, the, capital, of)
Chain Rule of Probability
Calculation
P(Uzbekistan | Tashkent, is, the, capital, of)
= count(Tashkent is the capital of Uzbekistan) / count(Tashkent is the capital of)
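The count-and-divide estimate above can be sketched directly in code. The three-sentence corpus below is hypothetical, invented only to make the counts concrete:

```python
# A tiny hypothetical corpus to illustrate the count-and-divide estimate.
corpus = [
    "Tashkent is the capital of Uzbekistan",
    "Tashkent is the capital of the republic",
    "Tashkent is a large city",
]

def count(phrase):
    """Number of corpus sentences containing the phrase."""
    return sum(phrase in sentence for sentence in corpus)

# P(Uzbekistan | Tashkent, is, the, capital, of)
#   = count("Tashkent is the capital of Uzbekistan")
#   / count("Tashkent is the capital of")
p = count("Tashkent is the capital of Uzbekistan") / count("Tashkent is the capital of")
print(p)  # 0.5
```

With real language this approach breaks down, which is exactly the point of the next slide: most long word sequences never occur even once in any corpus.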
The Chain Rule applied to compute
joint probability of words in
sentence
P(“its water is so transparent”) =
P(its) × P(water|its) × P(is|its water)
× P(so|its water is) × P(transparent|its water is so)
How to estimate these probabilities
Could we just count and divide?
No! Too many possible sentences!
We’ll never see enough data for estimating these
Markov Assumption
Simplifying assumption
P(Uzbekistan | Tashkent, is, the, capital, of)
≈ P(Uzbekistan | of)
Andrei Markov
or, conditioning on the two previous words: P(Uzbekistan | capital, of)
The assumption that the probability of a word depends only on the
previous word is called the Markov assumption.
Simplest case: Unigram model
Some automatically generated sentences from a unigram model
fifth, an, of, futures, the, an, incorporated, a,
a, the, inflation, most, dollars, quarter, in, is,
mass
thrift, did, eighty, said, hard, 'm, july, bullish
that, or, limited, the
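Sentences like the ones above come from drawing each word independently. A minimal sketch of unigram sampling, with made-up word counts standing in for a real corpus:

```python
import random

# A unigram model ignores context entirely: each word is drawn
# independently with probability count(w) / total. The counts below
# are invented for illustration.
unigram_counts = {"the": 50, "of": 30, "a": 25, "is": 20,
                  "in": 15, "inflation": 5, "dollars": 4, "quarter": 3}
total = sum(unigram_counts.values())

words = list(unigram_counts)
weights = [c / total for c in unigram_counts.values()]

random.seed(0)  # fixed seed so the sample is reproducible
sample = random.choices(words, weights=weights, k=10)
print(" ".join(sample))  # an incoherent word salad, as expected
```

Because no word conditions on its neighbors, the output has roughly the right word frequencies but no grammatical structure, matching the generated examples above.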
Bigram model
Condition on previous word
Please bring me a glass of water.
(history: "Please bring me a glass of" → predicted word: "water")
Estimating bigram probabilities
The Maximum Likelihood Estimate:
P(wi | wi-1) = count(wi-1, wi) / count(wi-1)
Bigram model
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
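From this mini-corpus the MLE bigram probabilities can be computed directly by counting, a sketch of which is:

```python
from collections import Counter

# The three-sentence mini-corpus from the slide.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    """MLE bigram probability: count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("I", "<s>"))   # 2/3: two of the three sentences start with "I"
print(p("am", "I"))    # 2/3
print(p("Sam", "am"))  # 1/2
```

The `<s>` and `</s>` markers are counted like ordinary tokens, which is what lets the model assign probabilities to sentence beginnings and endings.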
Estimated bigram probabilities
P(<s> I want English food </s>) = P(I|<s>) × P(want|I) × P(English|want)
× P(food|English) × P(</s>|food) ≈ 0.000031
Given that
P(I|<s>) = 0.25
P(want|I) = 0.33
P(English|want) = 0.0011
P(food|English) = 0.5
P(</s>|food) = 0.68
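Multiplying the given bigram probabilities reproduces the sentence probability:

```python
# Multiply the given bigram probabilities for "<s> I want English food </s>".
bigram_probs = [
    0.25,    # P(I | <s>)
    0.33,    # P(want | I)
    0.0011,  # P(English | want)
    0.5,     # P(food | English)
    0.68,    # P(</s> | food)
]

p_sentence = 1.0
for p in bigram_probs:
    p_sentence *= p

print(f"{p_sentence:.6f}")  # 0.000031
```

In practice these products get vanishingly small for long sentences, which is why real implementations sum log probabilities instead of multiplying raw ones.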
N-gram models
We can extend to trigrams, 4-grams, 5-grams.
In general this is an insufficient model of language, because
language has long-distance dependencies:
“The computer which I had just put into the machine room on the
fifth floor crashed.”
But we can often get away with N-gram models.
N-gram models
An n-gram is a sequence of n successive items in a text, which may
include words, numbers, symbols, and punctuation. N-gram models are
useful in many text analytics applications where sequences of words
are relevant, such as sentiment analysis, text classification, and
text generation.
In deep learning, language models are trained with much longer
contexts (higher-order models) over large datasets.
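Extracting n-grams from a token sequence is a one-liner; a minimal sketch:

```python
def ngrams(tokens, n):
    """All sequences of n successive tokens in the input."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "Tashkent is the capital of Uzbekistan".split()
print(ngrams(tokens, 2))
# [('Tashkent', 'is'), ('is', 'the'), ('the', 'capital'),
#  ('capital', 'of'), ('of', 'Uzbekistan')]
print(ngrams(tokens, 3)[0])  # ('Tashkent', 'is', 'the')
```

A sequence of L tokens yields L - n + 1 n-grams, so the bigram list above has five entries for a six-word sentence.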
N-gram models
Google Ngram Viewer displays user-selected words or phrases (ngrams)
in a graph that shows how often those phrases have occurred in a
corpus. Google Ngram Viewer's corpus is made up of the scanned books
available in Google Books.
Once the language model is built, it can then be used with machine
learning algorithms to build predictive models for text analytics
applications
Google N-Gram Release, August 2006
…
Evaluating Language Models:
Training and Test Sets
"Extrinsic (in-vivo) Evaluation"
To compare models A and B
1. Put each model in a real task
• Machine Translation, speech recognition, etc.
2. Run the task, get a score for A and for B
• How many words translated correctly
• How many words transcribed correctly
3. Compare accuracy for A and B
Intrinsic (in-vitro) evaluation
Extrinsic evaluation not always possible
• Expensive, time-consuming
• Doesn't always generalize to other applications
Intrinsic evaluation: perplexity
• Directly measures language model performance at predicting words.
• Doesn't necessarily correspond with real application performance
• But gives us a single general metric for language models
• Useful for large language models (LLMs) as well as n-grams
Training sets and test sets
We train parameters of our model on a training set.
We test the model’s performance on data we haven’t
seen.
A test set is an unseen dataset; different from training set.
Intuition: we want to measure generalization to unseen data
An evaluation metric (like perplexity) tells us how well
our model does on the test set.
Perplexity
Perplexity is the standard metric for measuring the quality of a
language model: the inverse probability of the test set, normalized
by the number of words N.
PP(W) = P(w1 w2 … wN)^(-1/N)
Chain rule:
PP(W) = ( ∏i 1/P(wi | w1 … wi-1) )^(1/N)
Bigrams:
PP(W) = ( ∏i 1/P(wi | wi-1) )^(1/N)
Minimizing perplexity is the same as maximizing probability.
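As a sketch, perplexity can be computed from per-word probabilities in log space, which avoids numerical underflow on long test sets (the probabilities below are made up for illustration):

```python
import math

def perplexity(word_probs):
    """PP(W) = (product over i of 1 / P(wi | context)) ** (1/N),
    computed via log probabilities to avoid underflow."""
    n = len(word_probs)
    log_prob = sum(math.log(p) for p in word_probs)
    return math.exp(-log_prob / n)

# Hypothetical per-word probabilities for a 4-word test sentence.
pp = perplexity([0.2, 0.5, 0.1, 0.25])
print(round(pp, 4))  # 4.4721, i.e. (1 / 0.0025) ** 0.25
```

Lower perplexity means the model assigned higher probability to the test words; a model that always predicted each word with probability 1 would have perplexity 1.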
Perplexity
Calculate the perplexity of a sentence for the task of recognizing
spoken digits in English.
A sentence consists of N random digits; each digit has probability p = 1/10.
PP = ( (1/10)^N )^(-1/N) = 10
Minimizing perplexity is the same as maximizing probability.
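The digit example can be checked numerically; for any sentence length N the perplexity comes out to 10, the effective number of equally likely choices at each step:

```python
# Digit-recognition task: a "sentence" of N random digits, each with
# probability p = 1/10 independent of context.
N = 5
p_sentence = (1 / 10) ** N       # probability of one specific digit string
pp = p_sentence ** (-1 / N)      # inverse probability, normalized by N
print(round(pp, 6))  # 10.0
```

This is why perplexity is often described as a branching factor: a perplexity of 10 means the model is, on average, as uncertain as if it were choosing uniformly among 10 options per word.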
Choosing training and test sets
• If we're building an LM for a specific task
• The test set should reflect the task language we want
to use the model for
• If we're building a general-purpose model
• We'll need lots of different kinds of training data
• We don't want the training set or the test set to be
just from one domain or author or language.
Training on the test set
We can’t allow test sentences into the training set
• Or else the LM will assign that sentence an artificially high probability
when we see it in the test set
• And hence assign the whole test set a falsely high probability.
• Making the LM look better than it really is
This is called “Training on the test set”
Dev sets
• If we test on the test set many times we might implicitly tune to its characteristics
• Noticing which changes make the model better.
• So we run on the test set only once, or a few times
• That means we need a third dataset:
A development test set, or devset.
• We test our LM on the devset until the very end
• And then test our LM on the test set once
Reference
Chapter 3, Speech and Language Processing (Jurafsky & Martin)
Question
Thank you