Word2Vec Implementation in Python

The document outlines two lab programs utilizing the Gensim library for natural language processing. The first program demonstrates the training of a Word2Vec model on a simple corpus, performing operations like vector addition, cosine similarity, and finding similar words. The second program focuses on a technology-themed corpus, including data preprocessing, training a Word2Vec model, visualizing word embeddings using PCA, and retrieving semantically similar words.

Uploaded by

Nikitha G R

Lab Program 1

!pip install gensim


corpus = ['king is a strong man', 'queen is a wise woman', 'boy is a young man',
          'girl is a young woman', 'prince is a young', 'prince will be strong',
          'princess is young', 'man is strong', 'woman is pretty',
          'prince is a boy', 'prince will be king', 'princess is a girl',
          'princess will be queen']
print(corpus)

# Tokenize each sentence into a list of words
statements_list = []
for cor in corpus:
    statements_list.append(cor.split())
print(statements_list)

# Remove common stop words using Gensim's built-in stop-word list
from gensim.parsing.preprocessing import STOPWORDS
documents = [[word for word in document if word not in STOPWORDS]
             for document in statements_list]
documents

import gensim
from gensim.models import Word2Vec

model = Word2Vec(documents, min_count=1, vector_size=3, window=3)
# The trained Word2Vec model is now stored in the 'model' variable

# 1. Addition and Subtraction:


vector1 = model.wv['king']
vector2 = model.wv['man']
sum_vector = vector1 + vector2
print("sum vector ", sum_vector)
diff_vector = vector1 - vector2
print("difference vector ", diff_vector)
# 2. Cosine Similarity:
similarity = model.wv.similarity('king', 'queen')
print(f"Cosine Similarity between 'king' and 'queen': {similarity}")

# 3. Finding Most Similar Words:


similar_words = model.wv.most_similar('king', topn=5)
print(f"Most Similar words to 'king': {similar_words}")

# 4. Analogy Example:
analogy_vector = model.wv['king'] - model.wv['man'] + model.wv['woman']
most_similar = model.wv.most_similar(positive=[analogy_vector], topn=1)
print(f"Analogy Result (king - man + woman): {most_similar}")
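As a sanity check on the similarity call above, cosine similarity can also be computed by hand: it is the dot product of two vectors divided by the product of their norms. A NumPy sketch follows; the vectors here are toy 3-dimensional examples (matching vector_size=3), not actual model output:

```python
import numpy as np

def cosine_similarity(v1, v2):
    # cosine = dot(v1, v2) / (|v1| * |v2|)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# toy 3-dimensional vectors standing in for word embeddings
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a
c = np.array([3.0, -1.5, 0.0])  # perpendicular to a
print(cosine_similarity(a, b))  # parallel vectors: similarity close to 1
print(cosine_similarity(a, c))  # orthogonal vectors: similarity close to 0
```

Replacing `a` and `b` with `model.wv['king']` and `model.wv['queen']` reproduces the value returned by `model.wv.similarity('king', 'queen')`.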

Lab Program 2

import gensim
from gensim.models import Word2Vec
import re
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Sample domain-specific corpus (Technology)


technology_corpus = [
    "Artificial intelligence is transforming various industries.",
    "Machine learning algorithms improve predictive analytics.",
    "Cloud computing enables scalable infrastructure for businesses.",
    "Cybersecurity is crucial for protecting sensitive data.",
    "Blockchain technology ensures secure and decentralized transactions.",
    "The Internet of Things connects smart devices seamlessly.",
    "Big data analytics helps organizations make data-driven decisions.",
    "Quantum computing has the potential to revolutionize cryptography.",
    "Edge computing brings computation closer to data sources.",
    "Natural language processing enhances human-computer interactions."
]

# Basic text preprocessing function (tokenization & lowercasing)


def simple_tokenize(text):
    return re.findall(r'\b\w+\b', text.lower())

# Preprocess corpus manually


preprocessed_corpus = [simple_tokenize(sentence) for sentence in technology_corpus]
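For instance, applying the tokenizer to the first sentence yields lowercase word tokens with punctuation dropped (a quick illustration; the function is re-defined here so the snippet runs on its own):

```python
import re

def simple_tokenize(text):
    # \b\w+\b matches runs of word characters, so punctuation is discarded
    return re.findall(r'\b\w+\b', text.lower())

print(simple_tokenize("Artificial intelligence is transforming various industries."))
# ['artificial', 'intelligence', 'is', 'transforming', 'various', 'industries']
```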

# Train Word2Vec model


model = Word2Vec(sentences=preprocessed_corpus, vector_size=50, window=5,
                 min_count=1, workers=4)

# Select 10 domain-specific words


selected_words = ["ai", "machine", "cloud", "cybersecurity", "blockchain", "iot",
                  "data", "quantum", "edge", "nlp"]
# Filter selected words to include only words present in the model's vocabulary
selected_words = [word for word in selected_words if word in model.wv.key_to_index]

# Extract word embeddings for selected words


word_vectors = [model.wv[word] for word in selected_words if word in model.wv.key_to_index]

# Reduce dimensionality using PCA


pca = PCA(n_components=2)
reduced_vectors = pca.fit_transform(word_vectors)

# Create DataFrame for visualization


df_embeddings = pd.DataFrame(reduced_vectors, columns=["x", "y"],
                             index=selected_words)

# Plot embeddings
plt.figure(figsize=(10, 6))
plt.scatter(df_embeddings["x"], df_embeddings["y"], marker='o')

for word, (x, y) in zip(df_embeddings.index, reduced_vectors):
    plt.text(x, y, word, fontsize=12)

plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.title("Word Embeddings Visualization (Technology Domain)")
plt.show()
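PCA keeps only the two directions of highest variance, so some structure in the 50-dimensional embeddings is necessarily lost; how much survives can be read off `explained_variance_ratio_`. A sketch on random stand-in vectors (the real values depend on the training run):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
vectors = rng.normal(size=(10, 50))  # stand-in for 10 word vectors of size 50

pca = PCA(n_components=2)
reduced = pca.fit_transform(vectors)
print(reduced.shape)                        # (10, 2): one 2-D point per word
print(pca.explained_variance_ratio_.sum())  # fraction of total variance retained
```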

# Function to get semantically similar words


def get_similar_words(word, top_n=5):
    if word in model.wv.key_to_index:
        return model.wv.most_similar(word, topn=top_n)
    else:
        return f"Word '{word}' not in vocabulary."
# Example usage
input_word = "technology"
similar_words = get_similar_words(input_word)
print(f"Top 5 words similar to '{input_word}':", similar_words)
