NLP - AI2214601: Unit 1 to Unit 5 Notes
UNIT I INTRODUCTION 9
Origins and challenges of NLP – Language Modeling: Grammar-based LM, Statistical LM – Regular Expressions, Finite-State Automata – English Morphology, Transducers for lexicon and rules, Tokenization, Detecting and Correcting Spelling Errors, Minimum Edit Distance
The origins of Natural Language Processing (NLP) can be traced back to the 1950s and
1960s with the development of early computational linguistics and machine translation
systems. Some
key milestones include the development of the Georgetown-IBM experiment in 1954, which
translated Russian sentences into English, and the creation of the first chatbot, ELIZA, in the
mid-1960s.
Challenges in NLP stem from the complexity and ambiguity inherent in natural language.
Here are some of the key challenges:
1. Ambiguity: Natural language is highly ambiguous, with words and phrases often
having multiple meanings depending on context. Resolving this ambiguity is a major
challenge in tasks such as parsing, word sense disambiguation, and machine translation.
2. Syntax and Semantics: Understanding the syntactic and semantic structure of
sentences is crucial for NLP tasks. However, natural language exhibits complex
syntactic and semantic patterns that can be difficult for machines to parse and
understand accurately.
3. Context Dependency: The meaning of a word or phrase can vary depending on the
surrounding context. Capturing and modeling context dependencies is essential for
tasks like sentiment analysis, named entity recognition, and question answering.
4. Lack of Annotated Data: Many NLP tasks require large amounts of annotated data for
training machine learning models. However, creating high-quality annotated datasets
can be time-consuming and expensive, especially for languages with limited resources.
5. Domain Specificity: Natural language varies greatly across different domains and
genres (e.g., medical texts, legal documents, social media posts). Building NLP
systems that perform well across diverse domains is challenging due to the need for
domain adaptation and specialized knowledge.
6. Commonsense Reasoning: Understanding and reasoning about commonsense
knowledge is essential for many NLP tasks, such as language understanding and
generation. However, capturing and representing commonsense knowledge in a
machine-readable format is still an ongoing research challenge.
7. Ethical and Bias Concerns: NLP systems can inadvertently perpetuate biases present
in the data they are trained on, leading to issues such as algorithmic bias and fairness
concerns. Addressing these ethical considerations is crucial for the responsible
development and deployment of NLP technologies.
Despite these challenges, significant progress has been made in NLP in recent years, driven
by advances in machine learning, deep learning, and computational linguistics. Ongoing
research continues to push the boundaries of what is possible in natural language
understanding and generation.
2. Language Modeling
Language modeling is a fundamental task in natural language processing (NLP) that involves
predicting the next word in a sequence of words. The goal is to capture the statistical
structure of language and generate coherent and contextually relevant text. A language model
typically operates in the following steps:
1. Input Sequence: A language model takes as input a sequence of words or tokens. This
sequence can be a sentence, paragraph, or longer text.
2. Context Encoding: The input sequence is encoded into a numerical representation
that can be processed by the language model. This encoding captures the contextual
information of the input, such as the meaning of words and their relationships within
the sequence.
3. Prediction: Based on the encoded context, the language model predicts the probability
distribution over the vocabulary of possible next words. This distribution indicates the
likelihood of each word occurring given the context provided by the input sequence.
4. Sampling: To generate text, the language model can either select the word with the
highest probability (greedy decoding) or sample from the probability distribution to
introduce randomness and generate diverse text.
Overall, language modeling plays a crucial role in various NLP tasks and continues to be an
active area of research and development.
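As a rough illustration of the prediction and sampling steps above, the sketch below assumes we already have a next-word probability distribution for some context (the words and probabilities are made up) and contrasts greedy decoding with sampling. It is only a sketch of the decoding step, not a trained language model.

import random

# Hypothetical next-word distribution for the context "the cat sat on the"
# (the words and probabilities are invented for illustration).
next_word_probs = {"mat": 0.6, "floor": 0.25, "roof": 0.1, "banana": 0.05}

# Greedy decoding: pick the single most probable next word.
greedy_choice = max(next_word_probs, key=next_word_probs.get)

# Sampling: draw a word according to the probability distribution,
# which introduces randomness and produces more diverse continuations.
words = list(next_word_probs)
weights = list(next_word_probs.values())
sampled_choice = random.choices(words, weights=weights, k=1)[0]

print("greedy:", greedy_choice)
print("sampled:", sampled_choice)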
Grammar-based language models (LMs) are a class of language models that rely on explicit
grammar rules to generate or understand natural language text. These models are based on
linguistic theories and formal grammars, which define the syntax and structure of a language.
Here's how grammar-based language models typically work:
1. Grammar Rules: Grammar-based LMs start with a set of grammar rules that describe
the syntactic structure of the language. These rules define how words and phrases can
be combined to form grammatically correct sentences.
2. Parsing: When generating or understanding text, the input is parsed according to the
grammar rules to identify the syntactic structure of the input sentence. This involves
breaking down the input into its constituent parts, such as words, phrases, and clauses,
and determining how they relate to each other.
3. Rule Application: The grammar rules are then applied to the parsed input to generate
or interpret text. These rules govern how words and phrases can be combined to form
valid sentences according to the grammar of the language.
4. Constraints: Grammar-based LMs may incorporate additional constraints to ensure
that the generated text adheres to specific criteria, such as style, domain-specific
vocabulary, or semantic coherence.
5. Evaluation: The generated text is evaluated based on its grammaticality and
coherence according to the rules of the grammar. This evaluation may involve
checking for violations of grammar rules, semantic inconsistencies, or other linguistic
criteria.
For example, suppose we have a small context-free grammar with the rules S -> NP VP,
NP -> Det N, VP -> V NP PP, PP -> P NP, together with lexical rules such as Det -> "the" | "a",
N -> "cat" | "dog" | "ball", V -> "chased", and P -> "on". Starting from the start symbol S, we
can derive a sentence step by step:
1. S
2. NP VP (using the rule S -> NP VP)
3. Det N VP (using the rule NP -> Det N)
4. Det N V NP PP (using the rule VP -> V NP PP)
5. Det N V Det N PP (using the rule NP -> Det N)
6. Det N V Det N P NP (using the rule PP -> P NP)
7. Det N V Det N P Det N (using the rule NP -> Det N)
Replacing each pre-terminal with a word from the lexical rules gives a complete sentence: "the cat chased the dog on a ball."
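The same toy grammar can be used to parse the sentence programmatically. The sketch below is a minimal illustration assuming NLTK is installed; the grammar is only the small example grammar above, not a grammar of English.

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP PP
PP -> P NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog' | 'ball'
V -> 'chased'
P -> 'on'
""")

parser = nltk.ChartParser(grammar)
sentence = "the cat chased the dog on a ball".split()

# Print every parse tree the grammar licenses for the sentence.
for tree in parser.parse(sentence):
    print(tree)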
1. Training Data: Statistical language models are trained on large amounts of text data,
known as a corpus. This corpus contains sequences of words along with their
frequencies of occurrence.
2. n-gram Models: One of the simplest approaches to statistical language modeling is
the n-gram model, where the probability of a word sequence is estimated based on the
frequencies of occurrence of n-length sequences of words (n-grams) in the training
data. For example, a bigram model (n=2) estimates the probability of a word given its
preceding word, while a trigram model (n=3) estimates the probability of a word given
its two preceding words.
3. Estimating Probabilities: Given a sequence of words w1, w2, ..., wn, the chain rule
expresses the probability of the entire sequence as the product of the conditional
probabilities of each word given its preceding context:
P(w1, w2, ..., wn) = P(w1) * P(w2|w1) * P(w3|w1, w2) * ... * P(wn|w1, ..., wn−1)
An n-gram model approximates each conditional probability by limiting the history to the
previous n−1 words; for example, a bigram model uses P(wi|wi−1) and a trigram model uses
P(wi|wi−2, wi−1). These conditional probabilities are estimated from the frequencies of
n-grams in the training data using techniques such as maximum likelihood estimation (MLE) or
smoothed estimation methods like add-one smoothing or Kneser-Ney smoothing.
4. Backoff and Interpolation: To address data sparsity issues and improve the
robustness of n-gram models, techniques like backoff and interpolation are often
employed. Backoff involves using lower-order n-grams when higher-order n-grams
have zero counts, while interpolation combines probabilities from different n-gram
orders to smooth the probability estimates.
5. Application: Once trained, a statistical language model can be used for various NLP
tasks. For example, in speech recognition, the language model helps to recognize the
most likely sequence of words given the input speech signal. In machine translation, it
guides the generation of fluent and grammatically correct translations.
Statistical language modeling provides a simple yet effective framework for capturing the
statistical properties of natural language. However, it has limitations such as the inability to
capture long-range dependencies and the need for large amounts of training data to achieve
good performance. More sophisticated approaches, such as neural language models, have
been developed to address these limitations and achieve state-of-the-art results in many NLP
tasks.
Here is a simple example of statistical language modeling using a bigram model. Suppose we have a
small corpus consisting of the following sentences:
"i like to eat apples"
"apples are delicious"
"i like to eat bananas"
We can use this corpus to build a bigram language model, which estimates the probability of
each word given its preceding word. Counting the bigrams in the corpus gives:
("i", "like"): 2
("like", "to"): 2
("to", "eat"): 2
("eat", "apples"): 1
("apples", "are"): 1
("are", "delicious"): 1
("eat", "bananas"): 1
Now, we have a bigram language model that can estimate the probability of word
sequences. For example, if we want to compute the probability of the sentence "I like
to eat bananas," we can multiply the probabilities of the bigrams:
P("i") * P("like" | "i") * P("to" | "like") * P("eat" | "to") * P("bananas" | "eat")
= 1.0 * 1.0 * 1.0 * 1.0 * 0.5
= 0.5
This shows that according to our bigram model, the probability of the sentence "I like
to eat bananas" is 0.5.
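The same computation can be reproduced in a few lines of code. The sketch below (a minimal illustration, not a production language model) counts bigrams in the toy corpus and multiplies the conditional probabilities for the test sentence, treating the probability of the first word as 1.0, as in the worked example above.

from collections import Counter

corpus = [
    "i like to eat apples",
    "apples are delicious",
    "i like to eat bananas",
]

# Count unigrams and bigrams over the corpus.
unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_prob(prev_word, word):
    """MLE estimate of P(word | prev_word)."""
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

test = "i like to eat bananas".split()
prob = 1.0  # P("i") is taken as 1.0, as in the worked example above.
for prev_word, word in zip(test, test[1:]):
    prob *= bigram_prob(prev_word, word)

print(prob)  # 0.5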
Regular expressions (regex) are powerful tools used in natural language processing (NLP)
for pattern matching and text processing tasks. They allow for efficient searching, extraction,
and manipulation of text based on specified patterns. Here are some common applications of
regular expressions in NLP:
1. Tokenization: Regular expressions can be used to split a text into tokens, such as
words or sentences. For example, \w+ matches one or more word characters,
effectively tokenizing words in a sentence.
2. Text Cleaning: Regular expressions are useful for cleaning and preprocessing text
data by removing unwanted characters, punctuation, or formatting. For instance, \W
matches any non-word character, which can be used to remove punctuation marks
from text.
3. Pattern Matching: Regular expressions enable the extraction of specific patterns or
entities from text data. For example, \b\d{3}-\d{3}-\d{4}\b matches phone numbers in
the format XXX-XXX-XXXX.
4. Named Entity Recognition (NER): Regular expressions can be used as simple rules
for identifying named entities such as dates, emails, or URLs in text. For example, a
regex pattern can match strings that resemble email addresses
(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b).
5. Information Extraction: Regular expressions can aid in extracting structured
information from unstructured text, such as dates, addresses, or numerical data. For
instance, \b\d{2}/\d{2}/\d{4}\b matches dates in the format MM/DD/YYYY.
6. Text Normalization: Regular expressions can be used to normalize text by converting
it to a standard format. For example, \b[A-Z]+\b matches all uppercase words, which
can be converted to lowercase for normalization.
7. Text Segmentation: Regular expressions can help in segmenting text into meaningful
units, such as paragraphs or sections. For example, \n\n matches two consecutive
newline characters, which can be used to split text into paragraphs.
While regular expressions are powerful, they also have limitations. They may not handle
complex patterns or variations in text well, and writing and maintaining complex regex
patterns can be challenging. Additionally, regular expressions are often not robust to noisy or
ambiguous text data. In such cases, more advanced techniques, such as rule-based systems or
machine learning models, may be more suitable.
Example: the short sketch below extracts e-mail addresses from a string with re.findall, using
an e-mail pattern like the one above (the addresses in the sample text are made up for
illustration).
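import re

text = "Contact alice@example.com for sales and bob@test.org for support."

# Pattern for substrings that look like e-mail addresses (same idea as above).
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"

emails = re.findall(pattern, text)
print(emails)  # ['alice@example.com', 'bob@test.org']

In this example, the pattern matches substrings that look like e-mail addresses, and re.findall returns all non-overlapping matches as a list.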
Finite State Automata (FSA) or Finite State Machines (FSM), which are models used in
computer science and mathematics to represent systems that can be in only a finite number of
states at any given time. These automata are widely used in various fields, including natural
language processing, compiler design, and digital circuit design.
1. Definition: A Finite State Automaton is defined by a finite set of states, a finite set of
input symbols, a transition function that describes how the automaton transitions
between states based on input symbols, a start state, and a set of accept states.
2. Types of FSAs:
o Deterministic Finite Automaton (DFA): In a DFA, for each state and input
symbol, there is exactly one transition leading to a next state. DFAs are
commonly used in lexical analysis and pattern matching.
o Nondeterministic Finite Automaton (NFA): In an NFA, there can be
multiple transitions for a given state and input symbol, or there can be ε-
transitions (transitions without consuming an input symbol). NFAs are often
used in regular expression matching.
3. Operations on FSAs:
o Union, intersection, and complementation of automata.
o Concatenation and Kleene star (closure) of automata.
o Minimization of DFAs to reduce the number of states while preserving the
language recognized by the automaton.
4. Applications:
o Regular expression matching: FSAs are used to implement regular expression
engines.
o Lexical analysis: DFAs are used to recognize tokens in programming languages.
o Pattern recognition: FSAs can be used to model and recognize patterns in data.
5. Limitations:
o FSAs are limited in their expressive power compared to more complex
automata models like pushdown automata and Turing machines.
o They can only recognize regular languages, which are a subset of the
languages recognized by context-free grammars.
Let's consider a simple example of a deterministic finite automaton (DFA) that recognizes
strings over the alphabet {0, 1} that end with "01".
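A small sketch of this DFA in code is given below (the state names q0, q1, and q2 are arbitrary labels): q0 is the start state, q1 records that the last symbol read was "0", and q2, the accepting state, records that the last two symbols read were "01".

# Transition table: transitions[state][symbol] -> next state
transitions = {
    "q0": {"0": "q1", "1": "q0"},
    "q1": {"0": "q1", "1": "q2"},
    "q2": {"0": "q1", "1": "q0"},
}
START, ACCEPT = "q0", {"q2"}

def accepts(string):
    """Return True if the DFA accepts the string (i.e. it ends with '01')."""
    state = START
    for symbol in string:
        state = transitions[state][symbol]
    return state in ACCEPT

for s in ["01", "1101", "0", "010", "111"]:
    print(s, accepts(s))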
English morphology is an essential aspect of natural language processing (NLP) that deals
with the structure and formation of words in the English language. It encompasses various
morphological processes, such as inflection, derivation, compounding, and others.
Understanding English morphology is crucial for tasks like tokenization, stemming,
lemmatization, and part-of-speech tagging. Here's a brief overview of some key concepts in
English morphology and their relevance in NLP:
In NLP, algorithms and models are developed to handle these morphological processes
efficiently, enabling tasks such as text normalization, syntactic analysis, semantic analysis,
and more. Proper handling of English morphology enhances the accuracy and effectiveness of
NLP systems across a wide range of applications.
English morphology in natural language processing (NLP) involves analyzing the structure
and formation of words in the English language. Morphology deals with the internal structure
of words and how they are formed from smaller meaningful units called morphemes. Here's
an example illustrating English morphology: the word "unhappiness" is built from the prefix
"un-", the root "happy", and the suffix "-ness"; recognizing these morphemes lets a system
relate "unhappiness" to "happy" and "happiness".
Understanding English morphology helps NLP systems better comprehend and process text,
enabling tasks such as sentiment analysis, machine translation, information retrieval, and
more.
Transducers for Lexicon and Rules:
1. Lexical Transduction:
o Lexical transduction refers to the process of mapping words from one form to
another based on specific rules or patterns. This could involve transformations
such as stemming or lemmatization, where words are reduced to their base or
dictionary forms.
o For example, in English morphology, converting the word "running" to its base
form "run" involves a lexical transduction rule that removes the suffix "-ing."
2. Rules for Lexical Transduction:
o Lexical transduction rules are typically based on linguistic knowledge and
patterns observed in the language. These rules define how words are
transformed from one form to another.
o Rules can involve the application of affix stripping, suffix removal, or applying
irregular transformation patterns.
o Example lexical transduction rule: "If a word ends with '-ing', remove the
suffix to obtain the base form." (A short code sketch of this rule appears after this list.)
3. Grammatical Transduction:
o Grammatical transduction refers to the process of transforming sentences or
phrases from one grammatical form to another. This could involve tasks such
as converting active voice to passive voice, changing tense, or altering
sentence structure.
o Example: Converting the sentence "The cat chased the mouse" from active
voice to passive voice results in "The mouse was chased by the cat."
4. Rules for Grammatical Transduction:
o Grammatical transduction rules are based on syntactic and grammatical
structures. These rules define how sentences or phrases are transformed while
preserving their meaning.
o Rules can involve rearranging word order, changing verb conjugation, or
altering grammatical features.
o Example grammatical transduction rule: "To convert active voice to passive
voice, move the object of the active sentence to the subject position and change
the verb form to the passive voice."
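As referenced in the lexical transduction rules above, the sketch below implements the naive "-ing" stripping rule and compares it with NLTK's WordNet lemmatizer (assuming NLTK and its WordNet data are installed). The naive rule also shows why real systems need more than simple suffix removal.

from nltk.stem import WordNetLemmatizer

def strip_ing(word):
    """Naive lexical transduction rule: if a word ends with '-ing', remove the suffix."""
    return word[:-3] if word.endswith("ing") else word

lemmatizer = WordNetLemmatizer()

for word in ["running", "sitting", "sing"]:
    # Compare the naive rule with a dictionary-based lemmatizer.
    print(word, strip_ing(word), lemmatizer.lemmatize(word, pos="v"))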
Tokenization:
1. Word Tokenization:
o Word tokenization, also known as word segmentation or word splitting,
involves dividing a text into individual words based on whitespace or
punctuation boundaries.
o Example: The sentence "Tokenization is an important NLP task" can be
tokenized into ["Tokenization", "is", "an", "important", "NLP", "task"].
2. Sentence Tokenization:
o Sentence tokenization involves splitting a text into individual sentences based
on punctuation marks like periods, exclamation marks, and question marks.
o Example: The paragraph "This is the first sentence. This is the second
sentence! And this is the third sentence?" can be tokenized into ["This is the
first sentence.", "This is the second sentence!", "And this is the third
sentence?"].
3. Subword Tokenization:
o Subword tokenization involves dividing words into smaller units, such as
morphemes or character n-grams. This approach is commonly used in
languages with complex morphology or for handling out-of-vocabulary words.
o Example: In subword tokenization, the word "tokenization" can be split into
["to", "ken", "iza", "tion"] or ["token", "iza", "tion"].
4. Tokenization Challenges:
o Tokenization can be challenging for languages with complex word boundaries
or agglutinative morphology.
o Ambiguity in tokenization can arise due to punctuation marks, abbreviations,
contractions, and compound words.
5. Tokenization Libraries:
o Various NLP libraries provide built-in functions for tokenization, including
NLTK (Natural Language Toolkit), spaCy, and the tokenization module in the
TensorFlow and PyTorch frameworks.
6. Preprocessing:
o Tokenization is typically the first step in text preprocessing, followed by tasks
such as lowercasing, stemming, lemmatization, and stop word removal.
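The sketch below illustrates word and sentence tokenization (items 1 and 2 above) with NLTK, assuming NLTK and its "punkt" tokenizer models are installed.

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# nltk.download("punkt")  # uncomment on first run to fetch the tokenizer models

text = ("This is the first sentence. This is the second sentence! "
        "And this is the third sentence?")

print(sent_tokenize(text))   # list of the three sentences
print(word_tokenize("Tokenization is an important NLP task"))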
Detecting and correcting spelling errors is an important task in natural language processing
(NLP) and can significantly improve the accuracy and readability of text. Here's an overview
of how spelling errors are detected and corrected:
1. Spell Checking:
o Spell checking involves identifying words in a text that are not found in a
dictionary or known vocabulary.
o Spell checkers compare each word in the text against a dictionary or a list of
known words to determine if it is spelled correctly.
o Words that are not found in the dictionary are flagged as potential spelling
errors.
2. Candidate Generation:
o Once spelling errors are detected, candidate words are generated as potential
replacements for the misspelled words.
o Candidate generation techniques may involve:
Generating possible corrections by applying operations such as
insertion, deletion, substitution, or transposition of characters.
Using statistical language models to suggest the most likely
replacements based on context.
3. Candidate Ranking:
o After generating candidate replacements, a ranking algorithm is applied to
score and rank the candidate corrections.
o Ranking algorithms consider factors such as:
Edit distance: How many edits are required to transform the misspelled
word into each candidate.
Language model probabilities: How likely each candidate is based on
the surrounding context.
Frequency of occurrence: How frequently each candidate appears in a
large corpus of text.
4. Correction Selection:
o The correction selection process involves choosing the highest-ranked
candidate as the replacement for the misspelled word.
o In some cases, multiple candidate corrections may be suggested to the user for
manual selection.
5. Contextual Spelling Correction:
o Contextual spelling correction takes surrounding context into account when
detecting and correcting spelling errors.
o Contextual information, such as adjacent words, grammar, syntax, and
semantics, can help improve the accuracy of spelling correction.
6. Evaluation and Feedback:
o Spell checkers are often evaluated using manually annotated datasets or user
feedback to assess their accuracy and effectiveness.
o Continuous improvement based on user feedback helps refine and enhance
spelling correction algorithms over time.
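To make the candidate generation step (item 2 above) concrete, the sketch below generates all strings one edit away from a word (insertions, deletions, substitutions, and transpositions) and keeps only those found in a dictionary. It is a minimal illustration in the spirit of classic spell checkers, and the word list used as the dictionary is made up.

import string

def edits1(word):
    """All strings that are one edit (insert, delete, substitute, transpose) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    substitutes = [left + c + right[1:]
                   for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + substitutes + inserts)

dictionary = {"spelling", "spewing", "spell"}  # toy word list

misspelled = "speling"
candidates = edits1(misspelled) & dictionary
print(sorted(candidates))  # ['spelling', 'spewing']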
In natural language processing (NLP), the minimum edit distance (also known as
Levenshtein distance) is a metric used to quantify the similarity between two strings by
measuring the minimum number of single-character edits (insertions, deletions, or
substitutions) required to transform one string into the other. It's a fundamental concept used
in various NLP tasks, such as spell checking, text correction, and approximate string
matching. Here's how it works:
1. Definition:
o Given two strings, A of length m and B of length n, the minimum edit distance
between them, denoted as D(A, B), is the minimum number of edits required to
transform string A into string B.
2. Operations:
o Insertion: Add a character to string A.
o Deletion: Remove a character from string A.
o Substitution: Replace a character in string A with another character.
3. Dynamic Programming Algorithm:
o The minimum edit distance can be efficiently computed using dynamic
programming.
o The algorithm fills in a matrix where each cell (i, j) represents the minimum
edit distance between the substrings A[0:i] and B[0:j].
o The algorithm iterates through each position in the matrix, updating the values
based on the minimum cost of the possible edit operations.
o The final value in the bottom-right corner of the matrix represents the
minimum edit distance between the two strings.
4. Applications:
o Spell Checking: Determine the closest words to a misspelled word by
computing the minimum edit distance between the misspelled word and all
words in a dictionary.
o Approximate String Matching: Find strings in a database that are similar to a
given query string by computing the minimum edit distance between the query
string and database strings.
o OCR (Optical Character Recognition): Correct errors in OCR output by
comparing the recognized text with the original text using minimum edit
distance.
5. Example:
o For example, consider the strings "kitten" and "sitting":
The minimum edit distance between them is 3.
One possible sequence of edit operations is: substitute 'k' with 's',
substitute 'e' with 'i', and insert 'g' at the end.
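A compact sketch of the dynamic programming algorithm described above is shown below; it fills an (m+1) x (n+1) matrix of subproblem distances and returns the value in the bottom-right corner.

def min_edit_distance(a, b):
    """Levenshtein distance between a and b (insert/delete/substitute each cost 1)."""
    m, n = len(a), len(b)
    # dp[i][j] = minimum edit distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all characters of a[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all characters of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution (or match)
    return dp[m][n]

print(min_edit_distance("kitten", "sitting"))  # 3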
Unsmoothed N-grams:
1. Definition:
o An n-gram is a contiguous sequence of n items (words, characters, or tokens)
within a larger sequence of text.
o Unsmoothed n-grams involve calculating the probability of observing each n-
gram in the training data directly from the counts of those n-grams, without
any adjustments for unseen or rare events.
2. Probability Estimation:
o Given a corpus of text, the probability of a word sequence is estimated by
counting the occurrences of each n-gram in the training data and dividing by
the total count of all n-grams.
o For example, the probability of observing the word sequence "the cat sat"
using trigrams would be estimated by counting the number of occurrences of
the trigram "the cat sat" and dividing by the total count of all trigrams in the
corpus.
3. Challenges:
o Unsmoothed n-grams can suffer from data sparsity issues, especially for
higher-order n-grams or in corpora with limited training data.
o If an n-gram is not observed in the training data, its probability will be zero,
which can lead to severe underestimation of the likelihood of unseen word
sequences.
4. Usage:
o Despite their limitations, unsmoothed n-grams can still be useful in certain
contexts, particularly for small or specialized corpora where data sparsity is
less of an issue.
o Unsmoothed n-grams can serve as a baseline model for comparison with more
sophisticated language models that incorporate smoothing techniques.
5. Evaluation:
o The performance of unsmoothed n-gram models can be evaluated using
standard metrics such as perplexity or accuracy on a held-out test set.
o Perplexity measures how well the model predicts the test data and can indicate
the effectiveness of the language model in capturing the distribution of word
sequences in the training corpus.
Unsmoothed N-grams
Language modeling is the way of determining the probability of any sequence of words.
Language modeling is used in a wide variety of applications such as speech recognition,
spam filtering, etc. In fact, language modeling is the key aim behind the implementation of
many state-of-the-art Natural Language Processing models.
Methods of Language Modeling:
There are two types of language modeling:
• Statistical Language Modeling: Statistical language modeling is the development of
probabilistic models that are able to predict the next word in a sequence given the
words that precede it. N-gram language modeling is an example.
• Neural Language Modeling: Neural network methods are achieving better results
than classical methods, both as standalone language models and when incorporated
into larger models for challenging tasks like speech recognition and machine
translation. One way of building a neural language model is through word
embeddings.
N-gram
N-gram can be defined as the contiguous sequence of n items from a given sample of text
or speech. The items can be letters, words, or base pairs according to the application. The
N-grams typically are collected from a text or speech corpus (A long text dataset).
N-gram Language Model:
An N-gram language model predicts the probability of a given N-gram within any
sequence of words in the language. A good N-gram model can predict the next word in a
sentence, i.e., the value of p(w|h).
Examples of N-grams: unigrams ("This", "article", "is", "on", "NLP") or bigrams
("This article", "article is", "is on", "on NLP").
Now, we will establish how to find the next word in a sentence using an N-gram model.
We need to calculate p(w|h), where w is the candidate for the next word and h is the history of
preceding words. For example, in the sentence above, suppose we want to calculate the
probability of the last word being "NLP" given the previous words "This article is on":
• For a unigram model: p("NLP") = count("NLP") / (total number of words in the corpus); the history is ignored.
• For a bigram model: p("NLP" | "on") = count("on NLP") / count("on").
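The sketch below (a minimal illustration using NLTK's ngrams helper and the collections module; the tiny corpus is made up) collects unigram and bigram counts and uses them to estimate p("nlp" | "on").

from collections import Counter
from nltk.util import ngrams

corpus = "this article is on nlp . this article is about n-gram models".split()

unigrams = Counter(ngrams(corpus, 1))
bigrams = Counter(ngrams(corpus, 2))

# MLE estimate of p("nlp" | "on") = count("on nlp") / count("on")
p = bigrams[("on", "nlp")] / unigrams[("on",)]
print(p)  # 1.0 in this tiny corpus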
Evaluating N-grams
When different n-gram orders are combined by interpolation, each order gets a weight (lambda).
The values of these lambdas can be calculated if we have a dev set (held-out data): the lambdas
are set to maximize the likelihood of that data under the interpolated model.
Open vs Closed Vocabulary tasks: If we know all the words in advance, then it is a closed
vocabulary task. But often we do not know all the words, and the test set may contain OOV
(Out Of Vocabulary) words. One way to deal with them is to create a special token called
<UNK>. Any word which has low probability during training is changed to <UNK>, and the
model is then trained as if <UNK> were an ordinary word. At decoding time, we change any
unseen word to <UNK> and compute its probability with respect to the language model.
Smoothing for Large N-Grams (Web-scale N-Grams): We use the Stupid Backoff technique.
P(Wi | Wi-k ... Wi-1) is given by the usual relative-frequency estimate, but if
count(Wi-k ... Wi) is below some threshold we back off and use
α * P(Wi | Wi-k+1 ... Wi-1), where α is a fixed fraction whose best value can be chosen using
dev-set data. Stupid Backoff produces scores rather than probabilities, and it works quite
well for large-scale N-grams.
Add-k smoothing does not work well for language modeling, while Stupid Backoff works well
for large N-gram models. Let us now look at Good-Turing smoothing.
Good-Turing Smoothing
Let Nc be the number of n-gram types that occur exactly c times in the training data, and let N
be the total number of observed n-gram tokens. For example, in a toy corpus we might have
N1 = 1 and N2 = 3. The intuition behind Good-Turing smoothing can be derived by leave-one-out
validation or by using a held-out set.
P(seeing a new, previously unseen event) = N1 / N. Generalizing, the adjusted count is
c* = (c + 1) * Nc+1 / Nc, and the smoothed probability of an event seen c times is P = c* / N.
For unseen events the raw count c is 0, since we have not observed them before. After the first
few counts (N0, N1, ..., N10), the Nc values become sparse and some may be zero (say, N127
could be zero), so in practice the observed Nc values are replaced by a smooth best-fit function
before applying the formula.
Kneser-Ney Smoothing
First do absolute discounting (motivated by Good-Turing smoothing): the discount suggested by
Good-Turing is nearly constant, roughly d = 0.75. So we apply absolute discounting and then
interpolate: discounted bigram estimate + ( L(Wi-1) * P(w) ).
Instead of P(w) (the unigram probability, "how likely is w"), Pcontinuation(w) ("how likely is w
to appear as a novel continuation") is a better estimate.
Pcontinuation(w) = (number of distinct words that precede w) / (total number of distinct word bigram types).
A frequent word like "Francisco" will have a low continuation probability because it appears
almost only after "San".
Pkn(Wi | Wi-1) = max(count(Wi-1 Wi) − d, 0) / count(Wi-1) + L(Wi-1) * Pcontinuation(Wi)
WORD CLASSES:
Words can be grouped into classes referred to as Part of Speech (PoS) or morphological
classes
Traditional grammar is based on a few types of PoS (noun, verb, adjective, preposition, adverb,
conjunction, etc.). More recent models are based on a larger number of classes:
45 tags: Penn Treebank
87 tags: Brown corpus
146 tags: C7 tagset
The PoS of a word provides crucial information for determining the role of the word itself and of
the words close to it in the sentence.
Knowing whether a word is a personal pronoun (I, you, he/she, ...) or a possessive pronoun (my,
your, his/her, ...) allows a more accurate selection of the most probable words that appear in
its neighborhood, since syntactic rules are often based on the PoS of words,
e.g. possessive pronoun + noun vs. personal pronoun + verb.
The 4 largest open classes of words, present in most languages, are nouns, verbs, adjectives, and adverbs.
Stochastic POS Tagging
Another technique of tagging is Stochastic POS Tagging. Now, the question that arises here
is which model can be stochastic. The model that includes frequency or probability
(statistics) can be called stochastic. Any number of different approaches to the problem of
part-of-speech tagging can be referred to as stochastic tagger.
The simplest stochastic tagger applies the following approaches for POS tagging −
Word Frequency Approach
In this approach, the stochastic taggers disambiguate the words based on the probability that a
word occurs with a particular tag. We can also say that the tag encountered most frequently
with the word in the training set is the one assigned to an ambiguous instance of that word.
The main issue with this approach is that it may yield an inadmissible sequence of tags.
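A rough sketch of the word frequency approach is shown below: for each word in a tagged corpus, remember the tag it occurs with most often, and assign that tag at tagging time. It assumes NLTK and its Penn Treebank sample corpus are installed.

import nltk
from nltk.corpus import treebank

# nltk.download("treebank")  # uncomment on first run to fetch the corpus sample

# For every word, count how often it appears with each tag.
cfd = nltk.ConditionalFreqDist(
    (word.lower(), tag) for word, tag in treebank.tagged_words()
)

def most_frequent_tag(word, default="NN"):
    """Assign the tag seen most often with this word in the training corpus."""
    freqs = cfd[word.lower()]
    return freqs.max() if freqs else default

for w in ["the", "dog", "runs"]:
    print(w, most_frequent_tag(w))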
Tag Sequence Probabilities
It is another approach to stochastic tagging, where the tagger calculates the probability of a
given sequence of tags occurring. It is also called the n-gram approach, because the best tag
for a given word is determined by the probability with which it occurs with the n previous tags.
Properties of Stochastic POST Tagging
Stochastic POS taggers possess the following properties −
• This POS tagging is based on the probability of a tag occurring.
• It requires a training corpus.
• There is no probability for words that do not exist in the training corpus.
• It uses a different testing corpus (other than the training corpus).
• It is the simplest form of POS tagging because it chooses the most frequent tag
associated with a word in the training corpus.
Transformation-based Tagging
Transformation-based tagging is also called Brill tagging. It is an instance of
transformation-based learning (TBL), which is a rule-based algorithm for automatic tagging
of POS in a given text. TBL allows us to have linguistic knowledge in a readable form and
transforms one state to another state by using transformation rules.
It draws inspiration from both of the previously explained taggers − rule-based and stochastic.
Like a rule-based tagger, it is based on rules that specify what tags need to be assigned to
what words. On the other hand, like a stochastic tagger, it is a machine learning technique in
which rules are automatically induced from the data.
Working of Transformation Based Learning(TBL)
In order to understand the working and concept of transformation-based taggers, we need to
understand the working of transformation-based learning. Consider the following steps to
understand the working of TBL −
• Start with the solution − The TBL usually starts with some solution to the
problem and works in cycles.
• Most beneficial transformation chosen − In each cycle, TBL will choose the most
beneficial transformation.
• Apply to the problem − The transformation chosen in the last step will be applied
to the problem.
The algorithm stops when the transformation selected in step 2 no longer adds value or when
there are no more transformations to be selected. This kind of learning is best suited to
classification tasks.
Advantages of Transformation-based Learning (TBL)
The advantages of TBL are as follows −
• We learn a small set of simple rules, and these rules are enough for tagging.
• Development as well as debugging is very easy in TBL because the learned rules are
easy to understand.
• Complexity in tagging is reduced because TBL interlaces machine-learned and
human-generated rules.
• A transformation-based tagger is much faster than a Markov-model tagger.
Disadvantages of Transformation-based Learning (TBL)
The disadvantages of TBL are as follows −
• Transformation-based learning (TBL) does not provide tag probabilities.
• In TBL, the training time is very long especially on large corpora.
UNIT-3 Syntactic Analysis
Examples of auxiliary verbs include “be,” “do,” “have,” “will,” “shall,” “may,” “can,” “must,” “ought,”
“should,” “could,” and “would.”
The sentence "That cold, empty sky was full of fire and light" is broken down
into its grammatical components. The subject of the sentence is the noun
phrase "That cold, empty sky," and the predicate is "was full of fire and
light," with "full" as the predicate adjective and "of fire and light" as the
prepositional phrase that modifies "full."
Sentence:
"That cold, empty sky was full of fire and light."
Structure:
1. S (Sentence): The root node, representing the entire sentence.
2. NP-SBJ (Noun Phrase - Subject): This is the subject of the sentence. It contains:
o DT (Determiner): "That"
o JJ (Adjective): "cold"
o , (Comma): Separates adjectives.
o JJ (Adjective): "empty"
o NN (Noun): "sky"
3. VP (Verb Phrase): This represents the action or predicate in the sentence. It contains:
o VBD (Verb, Past Tense): "was" (the linking verb in past tense).
o ADJP-PRD (Adjective Phrase - Predicate): The predicate describing the
subject.
JJ (Adjective): "full" (describing the state of the subject).
PP (Prepositional Phrase): Begins with a preposition and includes:
IN (Preposition): "of"
NP (Noun Phrase): The object of the preposition "of," consisting of:
NN (Noun): "fire"
CC (Coordinating Conjunction): "and"
NN (Noun): "light"
Example:
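The analysis above corresponds to a Penn Treebank style bracketing, which can be built and displayed with NLTK's Tree class; the sketch below is a minimal illustration assuming NLTK is installed.

from nltk import Tree

bracketing = """(S
  (NP-SBJ (DT That) (JJ cold) (, ,) (JJ empty) (NN sky))
  (VP (VBD was)
    (ADJP-PRD (JJ full)
      (PP (IN of)
        (NP (NN fire) (CC and) (NN light))))))"""

tree = Tree.fromstring(bracketing)
tree.pretty_print()    # draws the tree as ASCII art
print(tree.leaves())   # the words of the sentence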
Dynamic Programming Parsing
Shallow Parsing
3.7 Probabilistic Cocke-Younger-Kasami (PCYK/PCKY)
CYK another Example:
PCYK /PCKY Example:
Example of PCYK:
https://youtu.be/fairHU-DEVY?si=mIxsNEpCXb9wtIRV
3.8 Lexicalized PCFGs or Probabilistic Lexicalized CFGs:
https://www.youtube.com/watch?v=LuXv9T6KdV4
Second Assumption
The second probability in equation (1) above can be approximated by assuming that a word
appears in a category independent of the words in the preceding or succeeding categories
which can be explained mathematically as follows −
PROB (W1,..., WT | C1,..., CT) = Πi=1..T PROB (Wi|Ci)
Now, on the basis of the above two assumptions, our goal reduces to finding a sequence C
which maximizes
Πi=1...T PROB(Ci|Ci-1) * PROB(Wi|Ci)
Now the question that arises here is has converting the problem to the above form really
helped us. The answer is - yes, it has. If we have a large tagged corpus, then the two
probabilities in the above formula can be calculated as −
PROB (Ci=VERB|Ci-1=NOUN) = (# of instances where Verb follows Noun) / (# of
instances where Noun appears) (2)
PROB (Wi|Ci) = (# of instances where Wi appears in Ci) /(# of instances where Ci appears)
(3)
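The two quantities in equations (2) and (3) can be estimated directly from a tagged corpus by counting. The sketch below (assuming NLTK and its Penn Treebank sample are installed) estimates tag-transition probabilities, as in equation (2), and word-emission probabilities, as in equation (3), from simple counts.

from collections import Counter
from nltk.corpus import treebank

tagged = treebank.tagged_words()          # sequence of (word, tag) pairs
tags = [tag for _, tag in tagged]

tag_counts = Counter(tags)
transition_counts = Counter(zip(tags, tags[1:]))          # (previous tag, tag)
emission_counts = Counter((tag, word.lower()) for word, tag in tagged)

def transition_prob(prev_tag, tag):
    """Estimate of PROB(Ci = tag | Ci-1 = prev_tag), as in equation (2)."""
    return transition_counts[(prev_tag, tag)] / tag_counts[prev_tag]

def emission_prob(word, tag):
    """Estimate of PROB(Wi = word | Ci = tag), as in equation (3)."""
    return emission_counts[(tag, word.lower())] / tag_counts[tag]

print(transition_prob("DT", "NN"))   # how often a noun follows a determiner
print(emission_prob("the", "DT"))    # how often "the" realizes the DT tag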
We can see any problems in natural language processing as linguistic classification problems
in which linguistic contexts are used to predict linguistic classes. Maximum entropy models
are a clean way to combine various pieces of contextual evidence to estimate the probability
of a particular linguistic class occurring with a specific linguistic context.
Maximum entropy classification is a method that generalizes logistic regression to multiclass
problems. The Maximum Entropy model is a type of log-linear model.
If we are given some data and told to make a decision, we could think of attributes of the
data, i.e., features. Some of these features might be more important than others.
We apply a weight to each feature found in the data, and we add up all of the features.
Finally, the weighted sum is normalized to give a fraction between 0 and 1. We can use this
fraction to tell us the score of how confident we might be in making a decision.
Maximum Likelihood
The principle of maximum likelihood says that we have to find the parameter values w that
model the input data x with the maximum probability. The aim is to find the weight
parameters that maximize the likelihood of the training data.
Assume we have a random sample with a training set of n examples. We assume the input
values to be independent, so the probability function f(x, w) is the product of the
probabilities of each input.
As in maximum likelihood estimation, for the conditional probability we choose the parameter
estimate w_hat that maximizes the product of f(yi | xi, w) over the training examples.
The function f(x, y) is a feature function that can account for relations between the data and
the labels. It expresses some characteristic of the data point and takes the value 0 or 1
depending on the absence or presence of that characteristic. The weight wj of a feature
function captures how closely the feature is related to the given label. In the training
process, wj is initialized randomly, and the training process then learns the weights through
gradient descent with some optimization method.
Approach
In the training phase, we have to find the weights w. Let us start with the log-likelihood function:
L(w) = Σi log P(yi | xi; w)
This function L(w) measures how well w explains the labeled data. The higher the value of
P(y|x; w), the greater the value of L(w). The maximum-likelihood estimate uses the argmax
function to find the best values for the parameter w:
w_hat = argmax_w Σi log P(yi | xi; w)
The maximum entropy model is log-linear. MaxEnt handles multinomial distributions. The
maximum entropy principle states that we have to model the given set of data by choosing, among
all distributions that satisfy the constraints of our prior knowledge, the one with the highest entropy.
To find the probability of each class y given input x, Maximum Entropy is defined as:
P(y | x; w) = exp( Σj wj fj(x, y) ) / Σy' exp( Σj wj fj(x, y') )
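As a concrete illustration, a maximum entropy classifier can be trained as multinomial logistic regression with scikit-learn. The sketch below is a toy sentiment example on a handful of made-up sentences (assuming scikit-learn is installed); real applications would use far more data and richer features.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "I love this movie", "What a great film", "Fantastic acting",
    "I hate this movie", "What a terrible film", "Awful acting",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Bag-of-words features play the role of the feature functions f(x, y).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Logistic regression is the MaxEnt / log-linear classifier.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["great fantastic movie"])))  # expected: ['pos']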
Applications
MaxEnt classification is a more classical machine learning task and solves problems beyond
natural language processing. Here are a few:
• Sentiment analysis (e.g., given a product review, the reviewer likes and dislikes about
the product).
• Preferences (e.g., Given a person's demographics, who will a person vote for? Would
they prefer Superman, Batman, or the Teenage Mutant Ninja Turtles? etc.).
• Diagnosis (e.g., Given characteristics of several medical images and patient history,
what medical condition is a person at risk of having?).
Maximum Entropy Markov Model
There are many systems where there is a time or state dependency. These systems evolve
through a sequence of states, and past states influence the current state. For example, stock
prices, DNA sequencing, human speech, or words in a sentence.
A Maximum Entropy Markov Model makes use of state-time dependencies, i.e., it uses
predictions about the past and the current observation to make the current prediction.
In image analysis, we're required to classify the object into one of many classes. We estimate
the probability for each class. Rather than take a hard decision on one of the outcomes, it's
better to output probabilities, which will benefit downstream tasks. Multinomial logistic
regression is also called softmax regression or Maximum Entropy (MaxEnt) classifier.
Entropy's related to the disorder. Higher the disorder, less predictable the outcomes, and
hence more information. For example, an unbiased coin has more information (and entropy)
than one that mostly lands up heads.
MaxEnt is about picking a probability distribution that maximizes the entropy.
Then, there's Markov Chain. It models a system as a set of states with probabilities assigned
to state transitions. While MaxEnt computes probabilities for each input independently, the
Markov chain recognizes a dependency from one state to the next. Thus, MEMM
maximizes entropy plus using state dependencies (Markov Model).
The MEMM has explicit dependencies between each state and the full observation sequence.
The MEMM has only one transition probability matrix: it maps pairs of the previous state
y(i−1) and the current observation x(i) seen in the training data to the current state y(i).
Our goal is to find P(y1, y2, ..., yn | x1, x2, ..., xn). This is given by:
P(y1, ..., yn | x1, ..., xn) = Πi=1..n P(yi | yi−1, xi)
As in an HMM, the current state depends only on the previous state, so we can limit the
conditioning of y(i) to y(i−1) together with the current observation x(i). This is the Markov
independence assumption.
Shortcomings Of MEMM
MEMM suffers from what's called the label bias problem. Once we're in a state or
label, the following observation will select one of the transitions leaving that state.
However, the model as a whole has many more transitions. If a state has only one
outgoing transition, the observation has no influence on the choice. Simply put, transition
scores are normalized on a per-state basis.
UNIT 4 SEMANTICS AND PRAGMATICS
Requirement for representation, First-Order Logic, Description Logics, Syntax-Driven Semantic Analysis,
Semantic Attachments, Word Senses, Relations between Senses, Thematic Roles, Selectional Restrictions,
Word Sense Disambiguation, WSD using Supervised Methods, Word Similarity using Thesaurus and Distributional Methods.
What is Semantics?
Study of words, phrases and sentences in a language.
Explores how words and grammatical structures contribute to the meaning of sentences, and how
meaning is composed and interpreted.
focuses on the literal meaning of language, and aims to understand how meaning is derived from the
linguistic form.
What is Pragmatics?
Study of how language is used in context.
Investigates how meaning is affected by factors such as the speaker's intentions, the listener's knowledge,
and the communicative situation.
focuses on the non-literal or implied meaning of language, and aims to understand how meaning
is derived from the use of language in social interaction.
Requirements for Meaning Representation:
● Compositionality: The meaning of a sentence is composed of the meanings of each word and the way
they are combined. This means that the meaning of a sentence can be derived from the meanings of its
parts.
● Truth conditions: A representation must specify the truth conditions for a sentence, i.e., the
conditions under which the sentence would be true or false.
● Context sensitivity: The meaning of a sentence may depend on the context in which it is used.
Therefore, a representation must be able to account for the effects of context on meaning.
● Pragmatic relevance: A representation must be relevant to the communicative situation. This means
that it should take into account the speaker's intended meaning and the listener's interpretation.
● Consistency: A representation should be consistent with other linguistic and cognitive theories, and
should not lead to contradictions or inconsistencies.
First-order Logic:
● First-order logic (FOL) is a formal language that has been used in semantics and pragmatics to represent
the meaning of sentences in a structured and logical way.
● FOL allows us to represent the relationships between objects, properties, and events in a precise and
formal manner, which can be useful for analyzing and understanding the meaning of natural language
expressions.
● In semantics:
○ FOL allows us to represent the meanings of words and sentences regarding their truth
conditions.
■ Example: The sentence "John is a doctor" can be represented in FOL as
"Doctor(John)", which means that the object "John" has the property of being a
doctor.
○ FOL allows us to represent the logical structure of sentences, including their subject-
predicate structure and the relationships between different parts of the sentence.
■ Example: The sentence "All dogs bark" can be represented in FOL as "For all x,
if x is a dog, then x barks", where "For all x" is a quantifier that means "for
every x", "if x is a dog" is a predicate that describes the property of being a dog,
and "x barks" is a predicate that describes the action of barking.
○ FOL representations can help to avoid ambiguity and inconsistency in
interpretation.
■ Example: The sentence "Every student passed the exam" can be represented in
FOL as "For all x, if x is a student, then x passed the exam", which avoids the
ambiguity of the sentence "Every student passed the exam with flying colours",
since the latter may suggest that all students scored exceptionally well, which is
not necessarily implied by the former.
● In pragmatics:
○ FOL allows us to represent the speaker's intended meaning and the listener's
interpretation of a sentence.
■ Example: The sentence "I need help with my homework" can be represented in
FOL as a request for assistance, such as "Request(Assistance, Speaker,
Homework)", where "Request" is a predicate that describes the communicative
act of making a request, "Assistance" is a variable that represents the object
being requested, "Speaker" is a variable that represents the speaker, and
"Homework" is a variable that represents the object for which assistance is
needed.
○ FOL allows us to represent the context in which a sentence is used, including the
speaker's beliefs, intentions, and assumptions, and the listener's knowledge and
expectations.
■ Example: The sentence "Do you have the time?" can be represented in FOL as
a request for information, such as "Request(Time, Listener)", where "Request"
is a predicate that describes the communicative act of making a request,
"Time" is a variable that represents the object being requested, and "Listener"
is a variable that represents the person being addressed.
○ FOL representations can help to capture the communicative function of a
sentence and its relationship to other expressions in the discourse.
■ Example: The sentence "I'm sorry, I can't come to your party tonight" can be
represented in FOL as a polite refusal, such as "Refusal(Party, Speaker,
Listener)", where "Refusal" is a predicate that describes the
communicative act of refusing an invitation, "Party" is a variable that represents the event being refused,
"Speaker" is a variable that represents the person making the refusal, and "Listener" is a variable that represents
the person being addressed.
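First-order logic formulas like the ones above can also be written and manipulated programmatically. The sketch below is a minimal illustration using NLTK's logic package (assuming NLTK is installed) and parses two of the FOL representations mentioned above.

from nltk.sem import Expression

read_expr = Expression.fromstring

# "John is a doctor"  ->  Doctor(John)
e1 = read_expr("Doctor(John)")

# "All dogs bark"  ->  for all x, if x is a dog then x barks
e2 = read_expr("all x.(dog(x) -> bark(x))")

print(e1)
print(e2)
print(e2.free())   # no free variables: x is bound by the quantifier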
● Description Logics (DLs) are a family of formal knowledge representation languages used to
represent and reason complex concepts and relationships in a structured and logical manner.
● DLs are a subset of first-order logic (FOL) specifically designed for representing knowledge in a way
that is both expressive and computationally tractable.
● Provides formal semantics for natural language expressions, allowing us to represent their meaning
in a structured and logical way.
● Used to construct ontologies, which are structured representations of knowledge in a particular domain.
● Allows us to reason about the relationships between concepts and instances, and to infer new
knowledge based on existing knowledge.
● Often uses inference engines to perform reasoning tasks, such as consistency checking, classification,
and query answering. DL operates under an open-world assumption, which allows for more flexible and
incremental development of ontologies.
Syntax-Driven Semantic Analysis:
● A type of DL-based approach.
● The syntax of natural language expression is used to drive the process of semantic analysis.
● Involves mapping the syntax of a sentence onto a formal logical structure, such as a DL
ontology, to derive its meaning.
● This approach allows for a more efficient and accurate analysis of natural language expressions, as the
syntactic structure can provide important cues for determining the meaning of ambiguous or complex
expressions.
● The grammar of the language works as a guide.
● The assumption is that the grammatical structure of a sentence reflects its underlying meaning and that
by analyzing this structure, we can infer the meaning of the sentence.
● For example, the sentence "John is a doctor who specializes in cardiology" can be analyzed
syntactically to identify the subclauses "John is a doctor" and "who specializes in cardiology", which
can then be mapped onto corresponding concepts in a DL ontology to derive the overall meaning of the
sentence.
Semantic Attachments:
● Semantic attachments, also known as semantic roles or theta roles, are a linguistic concept that
describes the relationship between the semantic content of a sentence and its syntactic structure.
In other words, they represent the different roles that words or phrases play in a sentence based on their
meaning.
● For example, in the sentence "John ate the pizza with a fork," the word "John" is the agent who
performs the action of eating, "pizza" is the patient that undergoes the action of being eaten, and "fork" is the instrument
that John uses to eat the pizza. These different roles are represented as semantic attachments associated
with each word or phrase in the sentence.
Word Senses:
● A word sense is a specific meaning of a word that is determined by its context. Semantic
attachments are a way of representing the meaning of a word in context by linking it to the concepts
or entities that it refers to.
● For example, consider the word "bank." Depending on the context in which it appears, it could refer to
a financial institution or the side of a river. In semantic attachments, we might represent these two senses
of the word as follows:
○ For the financial institution sense:
■ Word: "bank"
■ Sense: "financial institution"
■ Attachment: links to the concept of a financial institution, such as a bank
account, loans, or mortgages.
○ For the side of a river sense:
■ Word: "bank"
■ Sense: "river bank"
○ Attachment: links to the concept of a river, such as water, shore, or sediment.
● By representing word senses in this way, we can better understand the meaning of words in context
and use this information for various NLP tasks, such as information retrieval, machine translation, and
sentiment analysis.
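The different senses of a word such as "bank" can be inspected through WordNet, as in the sketch below (assuming NLTK and its WordNet data are installed).

from nltk.corpus import wordnet as wn

# Each synset is one sense of the word "bank".
for synset in wn.synsets("bank")[:5]:
    print(synset.name(), "-", synset.definition())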
Relations between senses:
● The relationship between senses is typically represented by semantic relations or roles.
● These relations capture the semantic relationships between the different senses of a word, as well as
the relationships between different words in a sentence or discourse.
● Some common types of semantic relations include:
○ Hyponymy/Hypernymy:
■ Captures the relationship between a specific instance of a concept
(hyponym) and its more general category (hypernym).
■ For example, "dog" is a hyponym of "animal" and "animal" is a hypernym of
"dog".
○ Synonymy:
■ Captures the relationship between different words or senses that have the same or
similar meaning.
■ For example, "car" and "automobile" are synonyms.
○ Antonymy:
■ Captures the relationship between words or senses that are opposite in
meaning.
■ For example, "hot" and "cold" are antonyms.
○ Meronymy/Holonymy:
■ Captures the relationship between a part and a whole.
■ For example, "wheel" is a meronym of "car" and "car" is a holonym of
"wheel".
○ Troponymy:
■ Captures the relationship between a verb and a more specific way in which
the action is carried out.
■ For example, "walk" is a troponym of "move" and "stroll" is a troponym of
"walk".
● These relations can be used to build a network of interconnected senses and concepts, which can be
used for various NLP tasks such as word sense disambiguation, information retrieval, and machine
translation.
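Several of these relations are encoded directly in WordNet and can be queried as in the sketch below (assuming NLTK and its WordNet data are installed).

from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
car = wn.synset("car.n.01")

print(dog.hypernyms())          # more general categories of "dog" (hypernymy)
print(dog.hyponyms()[:3])       # more specific kinds of dog (hyponymy)
print(car.part_meronyms()[:3])  # parts of a car (meronymy)
print(car.lemma_names())        # synonymous lemma names, e.g. 'car', 'automobile'

hot = wn.lemma("hot.a.01.hot")
print(hot.antonyms())           # antonymy, e.g. the lemma for "cold"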
Thematic roles:
Selectional Restrictions:
Word Sense Disambiguation (WSD) Approaches:
Supervised Approach:
● This approach uses labelled examples to train a machine learning model to predict the correct
sense of a word in context.
● A popular algorithm for supervised WSD is the Naive Bayes classifier.
● Example: In the sentence "I went to the bank to deposit my paycheck," the word "bank" could
refer to a financial institution or a river bank. A supervised WSD model would use labelled
examples to learn how to predict the correct sense based on the context of the sentence.
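A rough sketch of such a supervised WSD classifier is shown below, using a bag-of-words representation of the context and a Naive Bayes model from scikit-learn (assuming it is installed). The tiny sense-labelled examples for "bank" are made up for illustration; a real system would be trained on a sense-annotated corpus.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy sense-labelled contexts for the ambiguous word "bank".
contexts = [
    "went to the bank to deposit my paycheck",
    "the bank approved my loan application",
    "opened a savings account at the bank",
    "sat on the bank of the river fishing",
    "the river overflowed its bank after the storm",
    "walked along the grassy bank of the stream",
]
senses = ["finance", "finance", "finance", "river", "river", "river"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(contexts)

clf = MultinomialNB()
clf.fit(X, senses)

test = ["the bank charged a fee on my account"]
print(clf.predict(vectorizer.transform(test)))  # expected: ['finance']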
Dictionary-Based Approach:
● This approach uses a dictionary or lexical database that provides information on the
different senses of a word.
● When presented with a word in context, the approach looks up the word in the dictionary and
chooses the sense that best fits the context.
● Example: In the sentence "I love to play the bass guitar," the word "bass" could refer to a fish or a
low-pitched musical instrument. A dictionary-based WSD approach would look up
"bass" and choose the sense that matches the context of the sentence.
Thesaurus-Based Approach:
● This approach uses a thesaurus or semantic network that groups words based on their semantic
similarity.
● When presented with a word in context, the approach identifies related words in the
thesaurus and chooses the sense that best fits the context.
● Example: In the sentence "The company's revenue has been steadily increasing," the word
"increase" could refer to an uptick in profits or a general upward trend. A thesaurus-based WSD
approach would identify related words like "growth" and "improvement" and choose the sense that
matches the context of the sentence.
Bootstrapping methods in NLP:
1. Self-training:
a. Definition: A method where a model is iteratively trained on a small labelled dataset and
then uses its predictions on a larger unlabeled dataset to expand the training set.
b. Example: In part-of-speech tagging, a model might be trained on a small labelled set of
sentences with their corresponding POS tags, and then use its predictions on a larger set
of unlabeled sentences to identify and add new POS-tagged examples to the training set.
2. Co-training:
a. Definition: A method where two or more models are trained independently on
different views of the same data, and then use their predictions on a larger unlabeled
dataset to improve each other's performance.
b. Example: In sentiment analysis, one model might be trained on a small labelled dataset
of tweets, while another is trained on a small labelled dataset of news articles. The
models can then use their predictions on a larger set of unlabeled social media data to
improve each other's accuracy.
3. Active learning:
a. Definition: A method where a model is trained on a small labelled dataset and
then used to select the most informative examples from an unlabeled dataset for
annotation. These examples are then added to the labelled dataset, and the model is
retrained on the expanded dataset.
b. Example: In named entity recognition, a model might be trained on a small labelled
dataset of sentences with named entities, and then use active learning to select the most
uncertain examples from a larger set of unlabeled sentences for manual annotation. These
labelled examples can then be used to improve the model's performance.
4. Semi-supervised learning:
a. Definition: A method where a model is trained on a small labelled dataset and a larger
unlabeled dataset, to leverage the structure of the unlabeled data to improve
performance on the labelled data.
b. Example: In machine translation, a model might be trained on a small labelled
dataset of sentence pairs in two languages, then use a larger set of unlabeled sentences
in both languages to improve the model's accuracy.
Word similarity using a thesaurus and distributional methods:
Word similarity is an important task in natural language processing (NLP) that involves measuring the degree
of relatedness between pairs of words. There are various approaches to measuring word similarity, including
the use of thesaurus and distributional methods.
Thesaurus-based method:
1. Thesaurus-based methods use a pre-existing vocabulary of synonyms, antonyms, and other lexical
relations to measure word similarity.
2. These methods rely on the idea that words with similar meanings tend to have similar or related
entries in a thesaurus.
3. These methods often involve mapping words to a set of synsets, which are groups of words that share a
common meaning.
4. Synsets are typically organized in a hierarchical or network structure that reflects the
relationships between words at different levels of abstraction or specificity.
5. To measure word similarity using a thesaurus-based method, one common approach is to compute
the distance or similarity between pairs of synsets.
6. This can be done using various metrics, such as the shortest path distance between synsets in a graph or
the amount of shared information between synsets based on their properties.
7. Example: One commonly used thesaurus for NLP is WordNet, which provides a hierarchical network
of synonym sets (synsets) for thousands of words. Word similarity can be computed using the shortest
path between two synsets in the WordNet graph (see the sketch below).
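The WordNet example in point 7 can be tried directly with NLTK, assuming the WordNet corpus has been
downloaded via nltk.download('wordnet'); the word pair below is only illustrative.

from nltk.corpus import wordnet as wn

dog = wn.synsets('dog', pos=wn.NOUN)[0]   # Synset('dog.n.01')
cat = wn.synsets('cat', pos=wn.NOUN)[0]   # Synset('cat.n.01')

# Shortest-path similarity in the hypernym hierarchy (1.0 means identical synsets)
print("path similarity :", dog.path_similarity(cat))

# Wu-Palmer similarity, based on the depth of the lowest common hypernym
print("wup similarity  :", dog.wup_similarity(cat))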
Distributional-based method:
1. Distributional methods, on the other hand, use statistical information about the co-occurrence
patterns of words in large text corpora to estimate their similarity.
2. The intuition behind distributional methods is that words that occur in similar
contexts are likely to have similar meanings.
3. To apply a distributional method for measuring word similarity, one first needs to
represent words as high-dimensional vectors that capture their distributional
patterns in a corpus.
4. There are various ways to do this, such as counting the frequency of words in a
fixed-size window of text, using neural network models that learn dense
embeddings of words, or applying matrix factorization techniques that
decompose the co-occurrence matrix of words into low-rank components.
5. Once words are represented as vectors, their similarity can be computed using
various distance or similarity metrics, such as cosine similarity, Euclidean
distance, or Mahalanobis distance.
6. These metrics capture the degree of overlap or distance between the vectors,
which reflects the degree of similarity or dissimilarity between the
corresponding words.
7. Example: One popular distributional method for word similarity is the cosine
similarity between word vectors. Word vectors are high-dimensional representations
of words that capture their semantic properties based on their distributional patterns
in a corpus. The cosine similarity between two word vectors measures the degree
of similarity between the corresponding words (a sketch follows below).
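As a small illustration of points 3 to 7 above, the sketch below builds window-based co-occurrence
vectors from a toy corpus and compares two words with cosine similarity. The corpus, window size, and
word pair are illustrative assumptions.

# Window-based co-occurrence counts followed by cosine similarity
import numpy as np
from collections import defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
window = 2
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[word][tokens[j]] += 1

vocab = sorted(counts)

def vector(word):
    # Row of the co-occurrence matrix for this word
    return np.array([counts[word][context] for context in vocab], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words that appear in similar contexts get a high cosine similarity
print(cosine(vector("cat"), vector("dog")))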
Both thesaurus-based and distributional methods have their strengths and weaknesses, and their
choice depends on the specific application and resources available. Thesaurus-based methods can
be more precise in measuring word similarity, but they rely on a fixed vocabulary that may not
be exhaustive or flexible enough for some tasks. Distributional methods, on the other hand, can
capture more nuanced and context-dependent meanings, but they may require more data and
computational resources to train and apply.
UNIT V
DISCOURSE ANALYSIS AND LEXICAL RESOURCES
Discourse segmentation, Coherence – Reference Phenomena, Anaphora Resolution using
Hobbs and Centering Algorithm – Coreference Resolution – Resources: Porter Stemmer,
Lemmatizer, Penn Treebank, Brill's Tagger, WordNet, PropBank, FrameNet, Brown Corpus,
British National Corpus (BNC).
Result
It infers that the state asserted by S0 could cause the state asserted by S1. For example, the
following two statements show the result relation: Ram was caught in the fire. His skin burned.
Explanation
It infers that the state asserted by S1 could cause the state asserted by S0. For example, two
statements show the relationship − Ram fought with Shyam’s friend. He was drunk.
Parallel
It infers p(a1, a2, …) from the assertion of S0 and p(b1, b2, …) from the assertion of S1, where ai and bi
are similar for all i. For example, the following two statements are parallel: Ram wanted a car. Shyam
wanted money.
Elaboration
It infers the same proposition P from both assertions S0 and S1. For example, the following two
statements show the elaboration relation: Ram is from Chandigarh. He comes from the capital of
Punjab and Haryana.
Occasion
It happens when a change of state can be inferred from the assertion of S0, final state of
which can be inferred from S1 and vice-versa. For example, the two statements show the
relation occasion: Ram picked up the book. He gave it to Shyam.
Building Hierarchical Discourse Structure
The coherence of an entire discourse can also be considered in terms of a hierarchical structure built
from coherence relations between its segments. For example, a passage beginning as follows can be
represented as a hierarchical structure:
S1 − Ram went to the bank to deposit money.
Reference Resolution
Interpreting the sentences of a discourse requires knowing who or what entity is being talked
about; here, the interpretation of references is the key element. A reference may be defined as a
linguistic expression used to denote an entity or individual. For example, in the passage Ram, the
manager of ABC bank, saw his friend Shyam at a shop. He went to meet him, the linguistic
expressions Ram, his, and He are references.
On the same note, reference resolution may be defined as the task of determining what
entities are referred to by which linguistic expression.
Terminology Used in Reference Resolution
We use the following terminologies in reference resolution −
Referring expression − The natural language expression that is used to perform reference is
called a referring expression. For example, Ram, his, and He in the passage above are referring expressions.
Referent − It is the entity that is referred. For example, in the last given example Ram is a
referent.
Corefer − When two expressions are used to refer to the same entity, they are called corefers.
For example, Ram and he are corefers.
Antecedent − The term that licenses the use of another referring expression. For example, Ram is the
antecedent of the reference he.
Anaphora & Anaphoric − Anaphora is reference to an entity that has been previously introduced
into the discourse; the referring expression used in this way is called anaphoric.
Discourse model − The model that contains the representations of the entities that have been
referred to in the discourse and the relationship they are engaged in.
Types of Referring Expressions
Let us now see the different types of referring expressions. The five types of referring
expressions are described below −
Indefinite Noun Phrases
Such a reference introduces entities that are new to the hearer into the discourse context. For
example, in the sentence Ram had gone around one day to bring him some food, the phrase some
food is an indefinite reference.
Definite Noun Phrases
In contrast to the above, this kind of reference represents entities that are not new, i.e. that are
already identifiable to the hearer in the discourse context. For example, in the sentence I used to
read The Times of India, The Times of India is a definite reference.
Pronouns
It is a form of definite reference. For example, Ram laughed as loud as he
could. The word he represents pronoun referring expression.
Demonstratives
These point to entities and behave differently from simple definite noun phrases. For example,
this and that are demonstrative pronouns.
Names
This is the simplest type of referring expression. It can be the name of a person, organization,
or location. For example, in the examples above, Ram is a name referring expression.
Reference Resolution Tasks
The two reference resolution tasks are described below.
Coreference Resolution
It is the task of finding referring expressions in a text that refer to the same entity. In simple
words, it is the task of finding corefer expressions. A set of coreferring expressions is called a
coreference chain. For example, in the passage given earlier, the expressions Ram, the manager of
ABC bank, his, and He all refer to the same entity and form a coreference chain.
Constraint on Coreference Resolution
In English, the main problem for coreference resolution is the pronoun it, because it has many
uses. It can refer to an entity much as he and she do, but it can also be used without referring to
any specific entity at all, as in It's raining or It is really good.
Pronominal Anaphora Resolution
Unlike coreference resolution, pronominal anaphora resolution may be defined as the task
of finding the antecedent of a single pronoun. For example, given the pronoun his in the passage
above, the task is to find its antecedent, Ram.
Stemming is the process of reducing morphological variants of a word to a common root/base form.
Stemming programs are commonly referred to as stemming algorithms or stemmers. A stemming
algorithm reduces the words “chocolates”, “chocolatey”, “choco” to the root word
“chocolate”, and “retrieval”, “retrieved”, “retrieves” to the stem “retrieve”. Stemming
is an important part of the pipelining process in Natural language processing. The input to
the stemmer is tokenized words. How do we get these tokenized words? Well, tokenization
involves breaking down the document into different words.
Stemming is a natural language processing technique that is used to reduce words to their
base form, also known as the root form. The process of stemming is used to normalize text
and make it easier to process. It is an important step in text pre-processing, and it is
commonly used in information retrieval and text mining applications.
There are several different algorithms for stemming, including the Porter stemmer, Snowball
stemmer, and the Lancaster stemmer. The Porter stemmer is the most widely used algorithm,
and it is based on a set of heuristics that are used to remove common suffixes from words.
The Snowball stemmer is a more advanced algorithm that is based on the Porter stemmer, but
it also supports several other languages in addition to English. The Lancaster stemmer is a
more aggressive stemmer and it is less accurate than the Porter stemmer and Snowball
stemmer.
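All three stemmers mentioned above are available in NLTK; the short comparison below shows how
their outputs can differ (the word list is illustrative).

from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")
lancaster = LancasterStemmer()

# The Lancaster stemmer is typically the most aggressive of the three
for word in ["chocolates", "retrieval", "retrieved", "happily", "running"]:
    print(word, "->", porter.stem(word), snowball.stem(word), lancaster.stem(word))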
Stemming can be useful for several natural language processing tasks such as text
classification, information retrieval, and text summarization. However, stemming can also
have some negative effects such as reducing the readability of the text, and it may not always
produce the correct root form of a word.
It is important to note that stemming is different from lemmatization. Lemmatization is
the process of reducing a word to its base form while taking into account the context of the
word, and it produces a valid word, whereas stemming may produce a non-word as the
root form.
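A minimal sketch of this difference, using NLTK's WordNetLemmatizer (this assumes the WordNet
corpus has been downloaded; the example words are illustrative):

from nltk.stem import WordNetLemmatizer, PorterStemmer

lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Stemming may produce a non-word ("studi"); lemmatization returns a valid word ("study")
print(stemmer.stem("studies"), lemmatizer.lemmatize("studies"))

# With the part of speech supplied, the lemmatizer can map "better" (adjective) to "good"
print(stemmer.stem("better"), lemmatizer.lemmatize("better", pos="a"))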
Under-stemming occurs when two words that should be reduced to the same root are instead
stemmed to different roots; it can be interpreted as a false negative.
Under-stemming is a problem that can occur when using stemming algorithms in
natural language processing. It refers to the situation where a stemmer does not produce the
correct root form of a word or does not reduce a word to its base form. This can happen
when the stemmer is not aggressive enough in removing suffixes or when it is not designed
for the specific task or language.
Automatically interpreting and analysing the meaning of words and pre-processing
textual input can be complex in Natural Language Processing (NLP).
We frequently employ lexicons to help with this. A lexicon (also called a word-hoard,
wordbook, or word-stock) is the vocabulary of a person, a language, or a branch of
knowledge. We frequently link the text in our data to a lexicon, which helps us
understand the relationships between those terms.
Wordnet
WordNet is a massive lexical database of English words. Nouns, verbs, adjectives, and
adverbs are arranged into 'synsets', which are collections of cognitive synonyms, each
expressing a distinct concept. Synsets are interlinked by conceptual-semantic and lexical
relations such as hyponymy and antonymy.
WordNet is similar to a thesaurus in that it groups words according to their meanings.
# Inspecting a synset's definition and part-of-speech (POS) tag with NLTK's WordNet interface
# ('hello' and 'beautiful' are illustrative noun and adjective examples)
from nltk.corpus import wordnet

# Word definition
syn = wordnet.synsets('hello')[0]
print("Meaning of Synset : ", syn.definition())

# POS tag of the first synset of each word
print("Syntax : ", wordnet.synsets('hello')[0].pos())
print("Syntax : ", wordnet.synsets('doing')[0].pos())
print("Syntax : ", wordnet.synsets('beautiful')[0].pos())
print("Syntax : ", wordnet.synsets('quickly')[0].pos())
Output:
Syntax : n
Syntax : v
Syntax : a
Syntax : r
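The hyponymy and antonymy links mentioned above can also be inspected directly from the synsets;
a short sketch (the words chosen are illustrative):

from nltk.corpus import wordnet as wn

car = wn.synsets('car')[0]                 # Synset('car.n.01')
print(car.hypernyms())                     # more general synsets, e.g. motor_vehicle.n.01
print(car.hyponyms()[:3])                  # a few more specific synsets

good = wn.synsets('good', pos=wn.ADJ)[0]
print(good.lemmas()[0].antonyms())         # antonymy is a lemma-level relation, e.g. 'bad'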
The FrameNet corpus is a lexical database of English that is both human- and machine-
readable, based on annotating examples of how words are used in actual texts. FrameNet is
based on a theory of meaning called Frame Semantics, deriving from the work of Charles J.
Fillmore and colleagues. The basic idea is straightforward: that the meanings of most words
can best be understood on the basis of a semantic frame: a description of a type of event,
relation, or entity and the participants in it. For example, the concept of cooking typically
involves a person doing the cooking (Cook), the food that is to be cooked (Food), something
to hold the food while cooking (Container) and a source of heat (Heating_instrument). In the
FrameNet project, this is represented as a frame called Apply_heat, and the Cook, Food,
Heating_instrument and Container are called frame elements (FEs). Words that evoke this
frame, such as fry, bake, boil, and broil, are called lexical units (LUs) of the Apply_heat
frame. The job of FrameNet is to define the frames and to annotate sentences to show how
the FEs fit syntactically around the word that evokes the frame.
A Frame is a script-like conceptual structure that describes a particular type of situation,
object, or event along with the participants and props that are needed for that Frame. For
example, the “Apply_heat” frame describes a common situation involving a Cook, some
Food, and a Heating_Instrument, and is evoked by words such as bake, blanch, boil, broil,
brown, simmer, steam, etc.
We call the roles of a Frame “frame elements” (FEs) and the frame-evoking words are called
“lexical units” (LUs).
FrameNet includes relations between Frames. Several types of relations are defined, of which
the most important are:
• Inheritance: An IS-A relation. The child frame is a subtype of the parent frame,
and each FE in the parent is bound to a corresponding FE in the child. An example is the
“Revenge” frame which inherits from the “Rewards_and_punishments” frame.
• Using: The child frame presupposes the parent frame as background, e.g. the “Speed”
frame “uses” (or presupposes) the “Motion” frame; however, not all parent FEs need to be
bound to child FEs.
• Subframe: The child frame is a subevent of a complex event represented by the
parent, e.g. the “Criminal_process” frame has subframes of “Arrest”, “Arraignment”, “Trial”,
and “Sentencing”.
• Perspective_on: The child frame provides a particular perspective on an un-perspectivized
parent frame. A pair of examples consists of the “Hiring” and
“Get_a_job” frames, which perspectivize the “Employment_start” frame from the
Employer’s and the Employee’s point of view, respectively.
To get a list of all of the Frames in FrameNet, you can use the frames() function. If you
supply a regular expression pattern to the frames() function, you will get a list of all Frames
whose names match that pattern:
>>> from nltk.corpus import framenet as fn
>>> from operator import itemgetter
>>> x = fn.frames(r'(?i)crim')
>>> x.sort(key=itemgetter('ID'))
>>> x
To get the details of a particular Frame, you can use the frame() function passing in the frame
number:
>>> from pprint import pprint
>>> f = fn.frame(202)
>>> f.ID
202
>>> f.name
'Arrest'
>>> f.definition
"Authorities charge a Suspect, who is under suspicion of having committed a
crime..."
>>> pprint(sorted(f.FE.keys()))
[...,
'Co-participant',
'Manner',
'Means',
'Offense',
'Place',
'Purpose',
'Source_of_legal_authority',
'Suspect',
'Time',
'Type']
The frame() function shown above returns a dict object containing detailed information about
the Frame. See the documentation on the frame() function for the specifics.
You can also search for Frames by their Lexical Units (LUs).
The frames_by_lemma() function returns a list of all frames that contain LUs in which the
‘name’ attribute of the LU matches the given regular expression. Note that LU names are
composed of “lemma.POS”, where the “lemma” part can be made up of either a single
lexeme (e.g. ‘run’) or multiple lexemes (e.g. ‘a little’) (see below).
>>> from nltk.corpus.reader.framenet import PrettyList
>>> PrettyList(sorted(fn.frames_by_lemma(r'(?i)a little'), key=itemgetter('ID')))
A lexical unit (LU) is a pairing of a word with a meaning. For example, the “Apply_heat”
Frame describes a common situation involving a Cook, some Food, and a Heating
Instrument, and is _evoked_ by words such as bake, blanch, boil, broil, brown, simmer,
steam, etc. These frame-evoking words are the LUs in the Apply_heat frame. Each sense of
a polysemous word is a different LU.
We have used the word “word” in talking about LUs. The reality is actually rather complex.
When we say that the word “bake” is polysemous, we mean that the lemma “bake.v” (which
has the word-forms “bake”, “bakes”, “baked”, and “baking”) is linked to three different
frames:
• Apply_heat: “Michelle baked the potatoes for 45 minutes.”
• Cooking_creation: “Michelle baked her mother a cake for her birthday.”
• Absorb_heat: “The potatoes have to bake for more than 30 minutes.”
These constitute three different LUs, with different definitions.
Multiword expressions such as “given name” and hyphenated words like “shut-eye” can also
be LUs. Idiomatic phrases such as “middle of nowhere” and “give the slip (to)” are also
defined as LUs in the appropriate frames (“Isolated_places” and “Evading”, respectively),
and their internal structure is not analyzed.
Framenet provides multiple annotated examples of each sense of a word (i.e. each LU).
Moreover, the set of examples (approximately 20 per LU) illustrates all of the combinatorial
possibilities of the lexical unit.
Each LU is linked to a Frame, and hence to the other words which evoke that Frame. This
makes the FrameNet database similar to a thesaurus, grouping together semantically similar
words.
In the simplest case, frame-evoking words are verbs such as “fried” in:
“Matilde fried the catfish in a heavy iron skillet.”
Sometimes event nouns may evoke a Frame. For example, “reduction” evokes
“Cause_change_of_scalar_position” in:
“…the reduction of debt levels to $665 million from $2.6 billion.”
Adjectives may also evoke a Frame. For example, “asleep” may evoke the “Sleep” frame as
in:
“They were asleep for hours.”
Many common nouns, such as artifacts like “hat” or “tower”, typically serve as dependents
rather than clearly evoking their own frames.
Details for a specific lexical unit can be obtained using this class’s lus() function, which takes
an optional regular expression pattern that will be matched against the name of the lexical
unit:
>>> from pprint import pprint
>>> PrettyList(sorted(fn.lus(r'(?i)a little'), key=itemgetter('ID')))
[<lu ID=14733 name=a little.n>, <lu ID=14743 name=a little.adv>, ...]
You can obtain detailed information on a particular LU by calling the lu() function and
passing in an LU's 'ID' number:
>>> from pprint import pprint
>>> fn.lu(256).definition
'COD: be aware of beforehand; predict.'
Note that LU names take the form of a dotted string (e.g. “run.v” or “a little.adv”) in which a
lemma precedes the “.” and a part of speech (POS) follows the dot. The lemma may be
composed of a single lexeme (e.g. “run”) or of multiple lexemes (e.g. “a little”). The list of
POSs used in the LUs is:
v - verb, n - noun, a - adjective, adv - adverb, prep - preposition, num - numbers, intj -
interjection, art - article, c - conjunction, scon - subordinating conjunction
For more detailed information about the info that is contained in the dict that is returned by
the lu() function, see the documentation on the lu() function.
Annotated Documents
The FrameNet corpus contains a small set of annotated documents. A list of these documents
can be obtained by calling the docs() function:
>>> from pprint import pprint
>>> d = fn.docs('BellRinging')[0]
>>> d.corpname
'PropBank'
>>> d.sentence[49]
full-text sentence (...) in BellRinging:
[POS_tagset] PENN
[text] + [annotationSet]
[PT] 2 phrases
Brown Corpus
The Brown corpus consists of one million words of American English texts printed in 1961. The
texts for the corpus were sampled from 15 different text categories to make the corpus a good
standard reference. Today, this corpus is considered small, and slightly dated. The corpus is,
however, still used. Much of its usefulness lies in the fact that the Brown corpus lay-out has
been copied by other corpus compilers. The LOB corpus (British English) and the Kolhapur
Corpus (Indian English) are two examples of corpora made to match the Brown corpus. The
availability of corpora which are so similar in structure is a valuable resource for researchers
interested in comparing different language varieties, for example.
For a long time, the Brown and LOB corpora were almost the only easily available computer
readable corpora. Much research within the field of corpus linguistics has therefore been
made using these data. By studying the same data from different angles, in different kinds of
studies, researchers can compare their findings without
having to take into consideration possible variation caused by the use of different data.
At the University of Freiburg, Germany, researchers are compiling new versions of the LOB
and Brown corpora with texts from 1991. This will undoubtedly be a valuable resource for
studies of language change in a near diachronic perspective.
The Brown corpus consists of 500 texts, each consisting of just over 2,000 words. The texts
were sampled from 15 different text categories. The number of texts in each category varies
(see below).
More comprehensive information about the Brown corpus can be found in the
Brown Corpus Manual.
British National Corpus (BNC)
The BNC is distributed in a format which makes possible almost any kind of computer-based
research on the nature of the language. Obvious application areas include lexicography,
natural language understanding (NLU) systems, and all branches of applied and
theoretical linguistics.
Uses of the BNC
Large language corpora can help provide answers to questions about what words mean and how
they are used -- if only
because they encourage linguists, lexicographers, and all who work with language to ask
them. The purpose of a language corpus is to provide language workers with evidence of
how language is really used, evidence that can then be used to inform and substantiate
individual theories about what words might or should mean. Traditional grammars and
dictionaries tell us what a word ought to mean, but only experience can tell us what a word
is used to mean. This is why dictionary publishers, grammar writers, language teachers, and
developers of natural language processing software alike have been turning to corpus
evidence as a means of extending and organizing that experience.