Semantic Analysis in NLP Explained

Module 4 covers semantic analysis in NLP, focusing on meaning representation, lexical semantics, and corpus studies. It discusses various approaches to meaning representation, including First Order Predicate Logic and Semantic Nets, as well as the importance of lexical resources like WordNet and BabelNet. Additionally, it highlights the significance of word sense disambiguation (WSD) in resolving semantic ambiguity for effective NLP applications.

MODULE 4

SEMANTIC ANALYSIS
PART 1 − TOPICS:
• Meaning Representation

• Lexical Semantics

• Corpus Study

• Language Dictionaries: WordNet, BabelNet.

• Lexemes: Relations & Senses.


• Semantic analysis in NLP refers to the process of understanding the meaning
of text by analyzing its structure and identifying the relationships between
words, phrases, and sentences.

• Unlike syntactic analysis, which focuses on grammatical correctness, semantic analysis seeks to extract the meaning and interpret the intent or knowledge conveyed by language.
Meaning Representation

• In Natural Language Processing (NLP), meaning representation refers to the formal way of capturing the meaning or semantics of a piece of text (such as a word, phrase, or sentence).

• The goal is to create a machine-understandable structure that reflects the information conveyed by the text.
Building Blocks of Semantic System
• In representing the meaning of words, the following building blocks are considered:
• Entities − Represent individuals, such as a particular person or location. For example, Haryana, India, Ram.
• Concepts − Represent general categories of individuals, such as a person or a city.
• Relations − Represent relationships between entities and concepts. For example, Ram is a person.
• Predicates − Represent verb structures. A predicate is the part of a sentence, or a clause, that tells what the subject is doing or what the subject is.
• E.g.: "The exam was difficult." (subject: "The exam"; predicate: "was difficult")
Approaches of Meaning Representation
• Different approaches used for meaning representation are:

1. First Order Predicate Logic

2. Semantic Net

3. Case Grammar
• First Order Predicate Logic (FOPL)
It converts a sentence into logical form.

• Example 1:

• Jack loves Jill.

• loves(Jack, Jill)

• Example 2:

• Seema takes Engineering or Pharmacy.

• takes(Seema, Engineering) ∨ takes(Seema, Pharmacy)
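The FOPL examples above can be sketched in code. This is a minimal illustration, not a theorem prover: ground facts are stored as (predicate, arguments) tuples, and the disjunction from Example 2 is evaluated with Python's `or`. The function name `holds` is illustrative.

```python
# Minimal sketch: FOPL-style ground facts as (predicate, arguments) tuples.
facts = {
    ("loves", ("jack", "jill")),
    ("takes", ("seema", "engineering")),
}

def holds(predicate, *args):
    """Check whether a ground atom such as loves(jack, jill) is a known fact."""
    return (predicate, args) in facts

# takes(Seema, Engineering) V takes(Seema, Pharmacy) is true if either disjunct holds.
disjunction = holds("takes", "seema", "engineering") or holds("takes", "seema", "pharmacy")
print(holds("loves", "jack", "jill"))  # True
print(disjunction)                     # True
```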


• Semantic Net:
• The idea is that we can store our knowledge in the form of a
graph, with nodes representing objects, and arcs
representing relationships between those objects.
• It shows semantic relations between concepts and entity.

[Semantic net diagram: nodes include cat, dog, fish, pet, tail, and water; edges include "cat is a pet", "dog is a pet", "dog has a tail", and "fish lives in water".]
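The semantic net above can be sketched as an adjacency list, with each node mapped to its outgoing (relation, target) edges. The edges follow the slide's example; the variable and function names are illustrative.

```python
# A toy semantic net as an adjacency list: node -> list of (relation, target) edges.
semantic_net = {
    "cat":  [("is", "pet")],
    "dog":  [("is", "pet"), ("has", "tail")],
    "fish": [("lives in", "water")],
}

def relations(entity):
    """Return the outgoing (relation, target) edges for an entity."""
    return semantic_net.get(entity, [])

print(relations("dog"))  # [('is', 'pet'), ('has', 'tail')]
```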
• Case Grammar
• It analyzes the structure of a sentence by identifying the case role each phrase plays.

• Ex: "We gave our dog a bone"
subject: "we"; verb: "gave"; indirect object (recipient): "our dog"; object: "a bone"
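The case-grammar analysis of the example sentence can be captured as a role-to-filler mapping. This is only a sketch; the role names follow the slide and the dictionary layout is illustrative.

```python
# Sketch: case-grammar roles for "We gave our dog a bone" as role -> filler.
case_frame = {
    "verb": "gave",
    "subject": "we",
    "indirect object (recipient)": "our dog",
    "object": "a bone",
}

for role, filler in case_frame.items():
    print(f"{role}: {filler}")
```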
Lexical Semantics
• Lexical semantics in NLP is the study of how words convey meaning and
how individual word meanings combine to form more complex meanings.

• It focuses on the nature of word meanings, relationships between words, and the structure of the lexicon (vocabulary) within a language.

• Understanding lexical semantics is essential for many NLP tasks like word
sense disambiguation, machine translation, and question answering.
Corpus Study
• What is Corpus?
• A corpus is a large, structured set of texts that is typically representative of
a particular language, genre, or domain.

• In NLP, a corpus serves as the source of data for machine learning algorithms and statistical analysis.

• It can consist of documents, sentences, words, and even annotated text, such as part-of-speech tags or named entities.
• Types of corpus:

• Monolingual corpus: Text in a single language.

• Multilingual corpus: Texts in multiple languages, often aligned at the sentence level for tasks like machine translation (e.g., Europarl, the European Parliament Proceedings Parallel Corpus).

• Annotated corpus: A corpus where the text is enriched with linguistic annotations, such as syntactic structure, part-of-speech tags, or semantic roles.
Examples of Corpora
• British National Corpus (BNC): A large corpus of written and spoken English
from a variety of genres.

• Corpus of Contemporary American English (COCA): Contains texts from different genres like fiction, news, and academic writing.
Corpus Annotation
Corpus studies often involve text annotation, which adds additional information to the
raw text. Common types of annotations include:
• Part-of-Speech (POS) tagging: Annotates each word with its part of speech (e.g., noun,
verb, adjective).
• Named Entity Recognition (NER): Identifies and tags named entities (e.g., persons,
organizations, locations).
• Syntactic Parsing: Annotates the syntactic structure of sentences (e.g., dependency or
constituency parsing).
• Sentiment Labels: Annotates texts with sentiment (e.g., positive, negative, neutral).
• Word Senses: Tags words with their meanings (e.g., in Word Sense Disambiguation
tasks).
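The annotation layers listed above can be illustrated with one sentence from a hypothetical annotated corpus: each token carries a POS tag and a named-entity label alongside the raw text. The tokens, tag values, and field names here are made up for illustration.

```python
# Illustration: one sentence from a hypothetical annotated corpus, with
# part-of-speech tags and named-entity labels ("O" = not an entity).
annotated_sentence = [
    {"token": "Ram",   "pos": "NOUN", "ner": "PERSON"},
    {"token": "lives", "pos": "VERB", "ner": "O"},
    {"token": "in",    "pos": "ADP",  "ner": "O"},
    {"token": "India", "pos": "NOUN", "ner": "LOCATION"},
]

# Read the NER layer: collect every token tagged as a named entity.
entities = [t["token"] for t in annotated_sentence if t["ner"] != "O"]
print(entities)  # ['Ram', 'India']
```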
Language Dictionaries: WordNet
• WordNet (often mistakenly referred to as WORLDNET) is a large lexical
database of the English language, widely used in Natural Language
Processing (NLP) and computational linguistics.
Key Features of WordNet:
1. Synsets (Synonym Sets):
1. Words that are considered synonymous are grouped together in a synset.

2. Each synset represents a distinct concept or sense of a word.

Example: The synset for the word "car" includes other synonyms like "automobile", and provides a definition such as "a motor vehicle with four wheels; usually propelled by an internal combustion engine."
2. Lexical Relations: WordNet also encodes relationships between words and
synsets.
• Hypernyms (Superordinate terms): Denotes a broader category.
Example: “Color" is a hypernym of “blue".
• Hyponyms (Subordinate terms): Denotes more specific concepts.
Example: “blue" is a hyponym of "color".
• Meronyms (Part-whole relationship): Represents the relationship where one word
is a part of another.
Example: "Wheel" is a meronym of "car".
3. Part-of-Speech Tags: WordNet organizes words based on their part of
speech, such as nouns, verbs, adjectives, and adverbs. The same word can have
different meanings based on its usage as different parts of speech.

4. Glosses: Each synset in WordNet has a gloss, which provides a brief definition of the word's meaning and, sometimes, example sentences for context.

5. Semantic Similarity: The relationships between words in WordNet allow for the computation of semantic similarity between word senses.
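The hypernym/hyponym relations described above can be modeled with a toy WordNet-style lexicon: each word maps to its hypernym, and walking those links yields the taxonomy chain. The entries and function name below are illustrative, not real WordNet data (the real database can be queried with NLTK's WordNet interface).

```python
# Toy WordNet-like lexicon: word -> its hypernym (broader term).
# Entries are illustrative, following the slide's "blue" / "color" example.
hypernym_of = {"blue": "color", "color": "property", "dog": "animal", "cat": "animal"}

def hypernym_chain(word):
    """Walk the hypernym links from a word up to the most general concept."""
    chain = [word]
    while chain[-1] in hypernym_of:
        chain.append(hypernym_of[chain[-1]])
    return chain

print(hypernym_chain("blue"))  # ['blue', 'color', 'property']
```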
Types of Dictionaries
• Specialized Dictionaries:
• They focus on a specific subject field.
• E.g., business dictionary, law dictionary.
• Historical Dictionaries: They describe the development of words and senses over time, using original source material to support their conclusions.
2. BabelNet Dictionary
• BabelNet is a large-scale multilingual lexical knowledge base and semantic
network, widely used in Natural Language Processing (NLP) tasks.

• It integrates information from various sources such as WordNet, Wikipedia, and other lexicographic resources, providing a comprehensive resource for both monolingual and multilingual NLP.
Key Features of BabelNet
• Multilingual Resource:

• BabelNet is available in over 500 languages, making it one of the largest and most comprehensive
multilingual dictionaries and semantic networks. It connects lexical items across languages,
facilitating tasks like machine translation and cross-lingual NLP.

• Integration of Resources:

• BabelNet combines data from multiple resources, including:


• WordNet (for English lexical information).

• Wikipedia (for multilingual encyclopedic knowledge).

• Wiktionary (for lexicographic and linguistic data).

• OmegaWiki (for multilingual dictionary content).

• Open Multilingual WordNet.


• Synsets:

• Similar to WordNet, BabelNet organizes words into synsets: sets of synonyms that represent a specific concept or meaning. Each synset is assigned a unique identifier, called a BabelSynset.

• Cross-lingual Mapping:

• BabelNet's synsets connect words across different languages, making it possible to perform cross-lingual tasks like machine translation, cross-lingual information retrieval, and multilingual sentiment analysis.
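The cross-lingual mapping idea can be sketched with a toy BabelNet-style synset: one concept carrying lemmas in several languages. The synset identifier, lemmas, and function name below are made up for illustration, not real BabelNet data.

```python
# Toy BabelNet-style multilingual synset: one concept id with lemmas
# in several languages (the id and lemmas are illustrative).
babel_synset = {
    "id": "bn:car-example",
    "lemmas": {"en": ["car", "automobile"], "fr": ["voiture"], "de": ["Auto"]},
}

def translate(synset, lemma, target_lang):
    """If the lemma belongs to the synset, return its target-language lemmas."""
    if any(lemma in words for words in synset["lemmas"].values()):
        return synset["lemmas"].get(target_lang, [])
    return []

print(translate(babel_synset, "car", "fr"))  # ['voiture']
```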
Applications of BabelNet in NLP
1. Word Sense Disambiguation (WSD):
1. BabelNet is commonly used in word sense disambiguation tasks to determine the intended meaning of a word
in context. It offers multiple senses for words in various languages, and NLP models can use this resource to
disambiguate word senses based on surrounding text.

2. Machine Translation:
1. BabelNet is highly useful for machine translation systems due to its multilingual nature. By mapping concepts
across languages, BabelNet enables translation models to understand and align word meanings across
languages more effectively.

3. Entity Linking and Named Entity Recognition (NER):


1. BabelNet’s integration of encyclopedic knowledge from Wikipedia makes it suitable for entity linking tasks,
where a system links words or phrases in a text to specific entities (e.g., linking “Paris” to the city in France).
2. In Named Entity Recognition, BabelNet can help identify and classify entities in multiple languages, improving
multilingual NER systems.
RELATIONS AMONG LEXEMES AND THEIR SENSES
• Semantic analysis can be divided into two parts:

• 1. Lexical Semantics: the study of the meaning of individual words.

• 2. Compositional Semantics: how individual word meanings combine to convey the meaning of sentences.

• Important Elements of Semantic Analysis

1. Hyponymy

2. Homonymy

3. Polysemy

4. Synonymy

5. Antonymy
1. Hyponymy

• Hyponymy is the relationship between a more specific lexeme (hyponym) and a more general lexeme (hypernym or superordinate).

• Example: "Dog" is a hyponym of "animal"; "animal" is a hypernym of "dog."

• This relation is hierarchical and is useful for organizing words in taxonomies, such as in WordNet or BabelNet.
2. Homonymy
Homonymy occurs when two lexemes share the same form (spelling or
pronunciation) but have completely unrelated meanings.

Example: "Bat" (the flying mammal) and "bat" (the sports equipment used in
baseball).
3. Polysemy:
• Polysemy is the phenomenon where a single lexeme has multiple related senses.

• Unlike homonyms, whose senses are distinct and unrelated, the senses of a polysemous word are related.

• E.g.: "bank" and "blood bank".

• Both are repositories.

4. Synonymy

• Synonymy refers to the relationship between two lexemes that have the same or nearly the
same meaning.
• Example: "Big" and "large" are synonyms.
• Synonymy is the basis for many tasks like paraphrase generation and query expansion in information
retrieval.
5. Antonymy

• Antonymy refers to the relationship between two lexemes that have opposite meanings.
• Example: "Hot" and "cold" are antonyms.

• Antonymy is useful in sentiment analysis, where identifying opposites can aid in understanding the
sentiment polarity of a text.
⚫ Semantic Ambiguity- This kind of ambiguity occurs when the meaning of the words
themselves can be misinterpreted.
⚫ In other words, semantic ambiguity happens when a sentence contains an ambiguous
word or phrase.
⚫ For example,
⚫ “The car hit the pole while it was moving” has semantic ambiguity because the interpretations can be “The car, while moving, hit the pole”
and
⚫ “The car hit the pole while the pole was moving”.
⚫ The problem of resolving semantic ambiguity is called WSD (word sense
disambiguation).

⚫ Resolving semantic ambiguity is harder than resolving syntactic ambiguity.


Word Sense Disambiguation
• Word Sense Disambiguation (WSD) is a fundamental task in Natural
Language Processing (NLP) that involves determining the correct meaning
(or sense) of a word based on the context in which it is used.

• This task is challenging because many words in a language have multiple senses or meanings.
• For example, consider the following distinct senses that exist for the word “bank” −
• The bank will not be accepting cash on Saturdays.
• The river overflowed the bank.
• In the first sentence, it means commercial (finance) bank.
• In the second sentence; it refers to the river bank.
• Hence, if disambiguated by WSD, the correct meanings can be assigned to the above sentences as follows −
• The bank/financial institution will not be accepting cash on Saturdays.
• The river overflowed the bank/river bank.
• Similarly, consider the following distinct senses that exist for the word “bass” −
• I can hear bass sound.
• He likes to eat grilled bass.
• Each occurrence of the word “bass” clearly denotes a distinct meaning.
• In the first sentence, it means (low) frequency; in the second, it means a fish.
• Hence, if disambiguated by WSD, the correct meanings can be assigned to the above sentences as follows −
• I can hear bass/frequency sound.
• He likes to eat grilled bass/fish.
Why WSD Is Important in NLP
• Machine Translation: WSD is critical for translating ambiguous words correctly in
different languages.
Example: The word "bank" must be translated differently depending on
whether it refers to a financial institution or the side of a river.
• Information Retrieval: Search engines need WSD to deliver relevant results
when a user searches for ambiguous words.
Example: A search for "apple" could refer to the fruit or the technology
company.
• Question Answering and Chatbots: Understanding the meaning of a word
is crucial for accurate responses.

• Speech Recognition and Summarization: WSD enhances the system’s ability to produce coherent summaries or transcriptions by interpreting words correctly in context.
Types of Word Sense Disambiguation Approaches

• WSD can be approached using several methods:

• 1. Knowledge-based / dictionary-based − e.g., the Lesk algorithm

• 2. Supervised − e.g., the Naïve Bayes approach

• 3. Unsupervised

• 4. Semi-supervised
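The dictionary-based approach can be sketched with a simplified Lesk algorithm: pick the sense whose gloss shares the most content words with the context. The sense inventory, glosses, and stopword list below are toy examples for the "bank" sentences above, not real dictionary data.

```python
# Simplified Lesk sketch over a toy sense inventory: choose the sense whose
# gloss has the largest content-word overlap with the context sentence.
senses = {
    "bank/finance": "a financial institution that accepts cash and deposits",
    "bank/river":   "the sloping land beside a river or other body of water",
}

STOPWORDS = {"the", "a", "an", "of", "on", "or", "and", "that", "will", "not", "be"}

def content_words(text):
    """Lowercase, split, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(context):
    """Return the sense key whose gloss best overlaps the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

print(simplified_lesk("The bank will not be accepting cash on Saturdays"))  # bank/finance
print(simplified_lesk("The river overflowed the bank"))                     # bank/river
```

NLTK ships a ready-made version of this idea in `nltk.wsd.lesk`, which uses real WordNet glosses.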
