
Department of Computer Science & Engineering (AI&ML)

BE SEM: VII    AY: 2024-25

Subject: Natural Language Processing Lab

Aim: Implementation of: (i) using Word2Vec to generate word vectors, and (ii) Word Sense
Disambiguation.
Theory:

Word2vec, developed by a team of researchers led by Google's Tomas Mikolov, is one of the
most popular models used to create word embeddings. Word2vec has two primary methods of
contextualizing words: the Continuous Bag-of-Words (CBOW) model and the Skip-Gram
model, both summarized below. The two models arrive at similar results, but take nearly
inverse paths to get there.

Continuous Bag-of-Words Model

CBOW, the less popular of the two models, uses source words to predict the target word. For
example, take the sentence "I want to learn python". In this instance, the target word is
python, while the source words are I want to learn. CBOW is primarily used on smaller
datasets, since it treats the entire context of the sentence as a single observation when
predicting the target word; in practice, this becomes very inefficient when working with a
large set of words.
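As a minimal sketch (assuming gensim 4.x and a tiny toy corpus invented here purely for
illustration), CBOW training is selected by passing sg=0 to gensim's Word2Vec:

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
# A real experiment would use a much larger corpus.
sentences = [
    ["i", "want", "to", "learn", "python"],
    ["i", "want", "to", "learn", "nlp"],
    ["python", "is", "useful", "for", "nlp"],
]

# sg=0 selects the CBOW architecture; window=2 means up to two
# context words on each side of the target word.
cbow_model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the word vectors
    window=2,
    min_count=1,      # keep every word, even singletons
    sg=0,             # 0 = CBOW, 1 = Skip-Gram
)

# Look up the learned vector for a word.
print(cbow_model.wv["python"].shape)  # (50,)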

Skip-Gram Model
The Skip-Gram model works in the opposite fashion to CBOW, using the target word to
predict the source, or context, words surrounding it. Consider the sentence "the quick brown
fox jumped over the lazy dog" and suppose we use a simple definition for the context of a
given word as the words immediately preceding and following it. The Skip-Gram model will
break the sentence into (context, target) pairs such as ([the, brown], quick),
([quick, fox], brown), ([brown, jumped], fox), and so on, which are then inverted into
(target, context) training pairs such as (quick, the), (quick, brown), (brown, quick),
(brown, fox).
Word Sense Disambiguation (WSD)

The task of selecting the correct sense for a word is called word sense disambiguation, or
WSD. A WSD system examines word tokens in context and specifies which sense of each
word is being used.

For example, deciding whether make means “create” or “cook” can be solved by word sense
disambiguation.

Disambiguating word senses has the potential to improve many natural language processing
tasks. Machine translation is one area where word sense ambiguities can cause severe problems;
others include question answering, information retrieval, and text classification. The way that
WSD is exploited in these and other applications varies widely based on the particular needs of
the application.

In their most basic form, WSD algorithms take as input a word in context along with a fixed
inventory of potential word senses, and return the correct word sense for that use.
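As a minimal sketch of this basic form, NLTK's implementation of the Lesk algorithm (a
classic knowledge-based WSD method) takes a tokenized context and an ambiguous word, and
returns the WordNet synset it judges correct. This assumes the punkt and wordnet NLTK data
packages have been downloaded:

import nltk
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

# One-time downloads (uncomment on first run):
# nltk.download("punkt")
# nltk.download("wordnet")

context = word_tokenize("I went to the bank to deposit my money")

# lesk() picks the WordNet sense of "bank" whose dictionary
# definition overlaps most with the surrounding context words.
sense = lesk(context, "bank")
print(sense)               # the chosen WordNet Synset
print(sense.definition())  # its gloss, e.g. a financial-institution sense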

Conclusion

Using supervised, semi-supervised, and unsupervised learning methods, we have implemented
word sense disambiguation, and using the Word2Vec model (gensim library) we have
generated word vectors.

