Dl Project

The document outlines the development of a movie recommendation system called MoviePal, which uses item-based collaborative filtering to suggest movies based on user preferences. It details the use of the MovieLens-small dataset, the implementation of K-Nearest Neighbors with cosine similarity, and the preprocessing steps necessary for effective recommendations. Key challenges discussed include handling sparse data, keeping recommendations relevant and personalized, and the cold start problem for users with little or no rating history.

Uploaded by

Ruchita Maaran



Title:
Movie Recommendation System - MoviePal
Objective:
The primary objective of this project is to build a movie
recommendation system that can suggest movies to users based on
their historical preferences. By leveraging item-based collaborative
filtering techniques, we aim to recommend movies that are similar
to those the user has already enjoyed.
The system will use ratings from users to estimate preferences and
suggest movies the user might like. It is built on the MovieLens-small
dataset, which is small enough to explore and process quickly on a
single machine. Given a movie as input, the model returns a list of
the 10 most similar movies, helping users discover new content
aligned with their tastes.
Define the Problem Statement:
The problem this project addresses is the challenge of
recommending movies to users based on their previous ratings and
the ratings of other users with similar tastes. Movie recommendation
systems are widely used by streaming services such as Netflix,
Amazon Prime, and Hulu. By applying collaborative filtering
techniques, we can predict which movies a user might like based on
the preferences of others who have similar viewing histories.
Key challenges in building the system include:
• Handling sparse data, as users tend to rate only a small subset
of available movies.
• Ensuring the recommendations are relevant and personalized.
• Addressing the cold start problem, where the system may
struggle to recommend movies for new users with no ratings
history.
The solution involves creating an item-based collaborative filtering
model that:
• Takes in a movie title.
• Finds similar movies based on user ratings and recommends
them.
• Uses cosine similarity to determine how similar two movies are
based on their ratings.
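As a minimal sketch of the similarity measure, assuming made-up ratings (the three vectors below are hypothetical and not taken from MovieLens), each movie is represented by the ratings it received from the same five users, with 0 meaning "not rated". The helper name cosine_sim is also illustrative.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two rating vectors (1.0 = same direction).

    Assumes neither vector is all zeros (that would divide by zero).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical ratings from the same five users (0 = not rated)
movie_a = [5, 4, 0, 0, 3]
movie_b = [4, 5, 0, 1, 3]
movie_c = [0, 0, 5, 4, 0]

print(round(cosine_sim(movie_a, movie_b), 3))
print(round(cosine_sim(movie_a, movie_c), 3))
```

Here movie_a and movie_b score about 0.97 because the same users rated both highly, while movie_a and movie_c score 0 because no user rated both.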

Gather or Create Datasets:

To build a recommendation system, we need datasets containing
movie ratings and user behavior. In this project, we will use the
MovieLens-small dataset, which is a publicly available dataset that
contains user ratings of movies.
The dataset includes two key files:
• movies.csv: Contains the movie details, including the unique
movieId and the movie title.
• ratings.csv: Contains user ratings for movies. It includes the
userId, movieId, and the rating given by the user.
The MovieLens dataset is ideal because it is already cleaned and
formatted for use in recommendation system projects.

Set up the Simple Tool:

For this project, we will use Python as the primary programming
language, along with the following libraries:
• Pandas: For data manipulation and handling CSV files.
• NumPy: For numerical operations.
• Scikit-learn: For building machine learning models and using
nearest neighbors algorithms.
• SciPy: For sparse matrix operations, which let us store the
mostly empty rating matrix efficiently.
• Matplotlib: For visualizations (optional, but useful for exploring
the data).
We will also use Jupyter Notebook for development, as it
provides a rich environment for writing and testing Python code
interactively.

Load & Explore the Datasets:
First, we load the datasets to inspect the data structure. The
movies.csv contains information about movies, and the ratings.csv
contains ratings provided by users. We can load and inspect these
datasets using Pandas.
import pandas as pd
# Load the datasets
movies = pd.read_csv("movies.csv")
ratings = pd.read_csv("ratings.csv")
# Inspect the first few rows of each dataset
print(movies.head())
print(ratings.head())
The movies.csv file has the following columns:
• movieId: The unique identifier for each movie.
• title: The name of the movie, usually followed by its release
year in parentheses.
• genres: The genres associated with the movie, separated by "|"
(e.g., Action|Comedy).
The ratings.csv file has the following columns:
• userId: A unique identifier for each user.
• movieId: The identifier for the movie that the user has rated.
• rating: The rating given by the user to the movie, on a 5-star
scale with half-star increments (0.5 to 5.0).
• timestamp: The time when the rating was given, in seconds since
the Unix epoch (January 1, 1970, UTC).
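A quick way to see how sparse the data is: the sketch below builds a tiny, hypothetical ratings table with the same four columns described above and computes the share of the user-movie grid that actually holds a rating. The values are made up; the summary lines can be run as-is on the real ratings DataFrame loaded earlier.

```python
import pandas as pd

# Hypothetical stand-in for ratings.csv, with the same four columns
ratings = pd.DataFrame({
    "userId":    [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "movieId":   [10, 20, 30, 10, 40, 20, 30, 50, 10],
    "rating":    [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 4.0, 0.5, 5.0],
    "timestamp": [1_000_000_000 + 3600 * i for i in range(9)],
})

# pivot() later needs each (userId, movieId) pair to appear only once
assert not ratings.duplicated(["userId", "movieId"]).any()

n_users = ratings["userId"].nunique()
n_movies = ratings["movieId"].nunique()
# Share of the user x movie grid that actually holds a rating
density = len(ratings) / (n_users * n_movies)
print(f"users={n_users}, movies={n_movies}, ratings={len(ratings)}, density={density:.1%}")

# Ratings should stay on the 0.5-5.0 half-star scale
print(ratings["rating"].describe())

# Timestamps are seconds since the Unix epoch; convert them for readability
ratings["rated_at"] = pd.to_datetime(ratings["timestamp"], unit="s")
print(ratings[["userId", "movieId", "rated_at"]].head())
```

On this toy table 9 of the 20 possible cells are filled (45%); on MovieLens-small the share is far lower, which is why the preprocessing below switches to a sparse matrix.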

Preprocess the Data:

Before building the recommendation system, it is necessary to
preprocess the data. This involves handling missing values and
structuring the dataset in a way that makes it easier for the
algorithm to process.
1. Pivot the Ratings Data: The first step in data preprocessing is
to pivot the ratings data so that each movie corresponds to a
row and each user corresponds to a column. The values are the
ratings, and any movie a user has not rated is filled with 0.
final_dataset = ratings.pivot(index='movieId', columns='userId', values='rating')
final_dataset.fillna(0, inplace=True)
2. Remove Noise: We should filter out movies that have been
rated by only a few users and users who have rated too few
movies. This can be achieved by setting thresholds:
• A movie must have been rated by more than 10 users.
• A user must have rated more than 50 movies.
# Keep only movies rated by more than 10 users
no_user_voted = ratings.groupby('movieId')['rating'].agg('count')
final_dataset = final_dataset.loc[no_user_voted[no_user_voted > 10].index, :]
# Keep only users who have rated more than 50 movies
no_movies_voted = ratings.groupby('userId')['rating'].agg('count')
final_dataset = final_dataset.loc[:, no_movies_voted[no_movies_voted > 50].index]
3. Convert to Sparse Matrix: Since the ratings matrix is sparse
(most entries are 0 because each user rates only a few movies),
we convert it into a sparse matrix using SciPy's csr_matrix
function. This stores only the non-zero ratings, which saves
memory and speeds up the similarity calculations.
from scipy.sparse import csr_matrix
csr_data = csr_matrix(final_dataset.values)
# Move movieId back into a column; row i of final_dataset now matches row i of csr_data
final_dataset.reset_index(inplace=True)
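The three preprocessing steps can be traced end to end on a toy example. This is a sketch under assumptions: the ratings are hypothetical, and the thresholds are scaled down from 10 and 50 to 1 so that something survives the filter; the constants MIN_MOVIE_VOTES and MIN_USER_VOTES are illustrative names.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy ratings in the ratings.csv layout
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3, 4],
    "movieId": [10, 20, 30, 10, 20, 40, 10, 30, 10],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 4.0, 1.0, 5.0],
})
MIN_MOVIE_VOTES = 1  # keep movies rated by MORE than this many users (10 in the project)
MIN_USER_VOTES = 1   # keep users who rated MORE than this many movies (50 in the project)

# 1. Pivot: one row per movie, one column per user, 0 for "not rated"
final_dataset = ratings.pivot(index="movieId", columns="userId", values="rating").fillna(0)

# 2. Remove noise with the same strict ">" comparison as the main code
no_user_voted = ratings.groupby("movieId")["rating"].agg("count")
no_movies_voted = ratings.groupby("userId")["rating"].agg("count")
final_dataset = final_dataset.loc[no_user_voted[no_user_voted > MIN_MOVIE_VOTES].index, :]
final_dataset = final_dataset.loc[:, no_movies_voted[no_movies_voted > MIN_USER_VOTES].index]

# 3. Keep only the non-zero ratings in compressed sparse row form
csr_data = csr_matrix(final_dataset.values)
final_dataset = final_dataset.reset_index()

print(final_dataset)
print("stored ratings:", csr_data.nnz, "of", csr_data.shape[0] * csr_data.shape[1], "cells")
```

Movie 40 and user 4 each have a single rating and are dropped, leaving a 3-by-3 matrix in which 7 of the 9 cells hold ratings.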

Split Datasets (Training & Test Data):
Typically, in machine learning, we split the dataset into training and
testing sets. However, this item-based KNN model does not predict
ratings; it only measures how similar movies are based on user
ratings, so we use the entire dataset to fit the model. Since the
recommendations are judged by inspecting the movies returned for a
given input, no explicit train/test split is required in this case.

Choose & Train a Model:

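We will use K-Nearest Neighbors (KNN) with cosine similarity as the
similarity metric to find similar movies. With algorithm='brute', the
model compares a movie's rating vector against every other movie's
vector and returns the k closest ones by cosine distance (1 minus
cosine similarity), which is fast enough for a dataset of this size.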
from sklearn.neighbors import NearestNeighbors
# KNN model with cosine similarity
knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
knn.fit(csr_data)
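To check what the distances returned by NearestNeighbors mean, this sketch uses a hypothetical 4-movie matrix and compares them against scikit-learn's cosine_similarity: the reported cosine distance is 1 minus the cosine similarity, so a distance near 0 means very similar rating patterns.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Hypothetical movie x user matrix (rows = movies, 0 = not rated)
csr_data = csr_matrix(np.array([
    [4.0, 2.0, 4.0, 0.0],
    [3.5, 4.5, 0.0, 1.0],
    [5.0, 0.0, 1.0, 0.0],
    [4.0, 2.5, 3.5, 0.0],
]))

knn = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=3)
knn.fit(csr_data)

# Query with movie 0; its first neighbour is movie 0 itself at distance ~0
distances, indices = knn.kneighbors(csr_data[0], n_neighbors=3)
print(indices, distances.round(4))

# scikit-learn's cosine distance equals 1 - cosine similarity
sims = cosine_similarity(csr_data[0], csr_data).ravel()
assert np.allclose(distances.ravel(), 1 - sims[indices.ravel()])
```

Because each movie is its own closest neighbour, the recommendation function requests one more neighbour than the number of movies it returns.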

Test & Evaluate the Model:

Once the model is trained, we evaluate it by inputting a movie name
and checking how well the model can recommend similar movies.
The evaluation will be based on the relevance of the
recommendations, i.e., whether the top 10 recommended movies
are similar to the input movie.
def get_movie_recommendation(movie_name, n_movies_to_recommend=10):
    # Literal, case-insensitive title match (regex=False so characters
    # such as "(" in titles are not treated as regex syntax)
    movie_list = movies[movies['title'].str.contains(movie_name, case=False, regex=False)]
    if movie_list.empty:
        return "No movies found. Please check your input."
    movie_id = movie_list.iloc[0]['movieId']
    # Row position of this movie in final_dataset / csr_data
    matches = final_dataset[final_dataset['movieId'] == movie_id].index
    if len(matches) == 0:
        return "This movie has too few ratings to make recommendations."
    row = matches[0]
    # Request one extra neighbour: the closest match is the movie itself
    distances, indices = knn.kneighbors(csr_data[row],
                                        n_neighbors=n_movies_to_recommend + 1)
    # Sort by cosine distance (most similar first) and drop the input movie
    neighbours = sorted(zip(indices.squeeze().tolist(), distances.squeeze().tolist()),
                        key=lambda x: x[1])[1:]
    recommend_frame = []
    for neighbour_row, distance in neighbours:
        rec_id = final_dataset.iloc[neighbour_row]['movieId']
        title = movies.loc[movies['movieId'] == rec_id, 'title'].values[0]
        recommend_frame.append({'Title': title, 'Distance': distance})
    return pd.DataFrame(recommend_frame, index=range(1, len(recommend_frame) + 1))

get_movie_recommendation('Iron Man')
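The evaluation above is a manual check. One hedged way to make it slightly more systematic, which is an assumption and not part of the original project, is a genre-overlap proxy: the share of recommended titles that have at least one genre in common with the input movie. The helper genre_hit_rate and the toy rows are hypothetical, and the sketch assumes titles are unique in movies.

```python
import pandas as pd

def genre_hit_rate(input_title, recommended_titles, movies):
    """Fraction of recommended titles sharing at least one genre with the input.

    Hypothetical relevance proxy; assumes titles are unique in `movies`.
    """
    genres = movies.set_index("title")["genres"].str.split("|")
    target = set(genres[input_title])
    hits = sum(1 for t in recommended_titles if target & set(genres[t]))
    return hits / len(recommended_titles) if recommended_titles else 0.0

# Hypothetical rows in the movies.csv layout (genres are pipe-separated)
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Film A (2008)", "Film B (2010)", "Film C (2012)", "Film D (1995)"],
    "genres": ["Action|Sci-Fi", "Action|Adventure", "Sci-Fi|Thriller", "Romance"],
})

print(genre_hit_rate("Film A (2008)", ["Film B (2010)", "Film C (2012)", "Film D (1995)"], movies))
```

With the real data, the Title column of the DataFrame returned by get_movie_recommendation could be passed as recommended_titles, together with the full title of the input movie. Genre overlap only approximates relevance, so it complements rather than replaces reading the list.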

Visualize Result:
Visualizing the results can provide insights into how well the model
is performing. In this case, we can use matplotlib to display the
number of users who rated each movie and other statistics, like the
number of votes by each user.
import matplotlib.pyplot as plt
# Visualize number of users who voted for each movie
f, ax = plt.subplots(1, 1, figsize=(16, 4))
plt.scatter(no_user_voted.index, no_user_voted, color='mediumseagreen')
plt.axhline(y=10, color='r') # Threshold for minimum 10 users per movie
plt.xlabel('MovieId')
plt.ylabel('No. of users voted')
plt.show()
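The paragraph above also mentions the number of votes by each user. A sketch of that second plot, mirroring the movie plot but grouping by userId and drawing the 50-movie threshold; the toy ratings are hypothetical, and in the notebook the full ratings DataFrame would be used instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

MIN_USER_VOTES = 50  # threshold from the preprocessing step

# Hypothetical ratings; in the notebook, use the full ratings DataFrame instead
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 3, 3, 4, 4, 4, 4],
    "movieId": [10, 20, 30, 10, 20, 30, 10, 20, 30, 40],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 4.0, 1.0, 5.0, 2.5],
})

# Number of movies each user has rated
no_movies_voted = ratings.groupby("userId")["rating"].agg("count")

fig, ax = plt.subplots(1, 1, figsize=(16, 4))
ax.scatter(no_movies_voted.index, no_movies_voted, color="mediumseagreen")
ax.axhline(y=MIN_USER_VOTES, color="r")  # users at or below this line are dropped
ax.set_xlabel("UserId")
ax.set_ylabel("No. of votes by user")
plt.show()
```

With the real ratings, points below the red line are the users removed during preprocessing.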
Improve the Model:
Improvements can be made to the model by experimenting with:
• Matrix Factorization techniques like SVD (Singular Value
Decomposition).
• Hybrid models that combine collaborative filtering with
content-based filtering.
• Optimization of KNN parameters like the number of neighbors
and the distance metric used.
Additionally, the minimum-rating thresholds used to remove noise
can be tuned, and extra features such as genres can be added to
each movie's representation, which may improve accuracy.
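As a sketch of the first improvement, assuming scikit-learn's TruncatedSVD as the factorization (a swapped-in choice, not something the project implemented), each movie's rating vector can be compressed into a few latent factors and the same cosine KNN run on those factors. The random matrix and n_components=5 are illustrative assumptions; with the real data, csr_data from the preprocessing step would be passed in and the number of components tuned.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

# Hypothetical movie x user rating matrix (rows = movies, 0 = not rated)
rng = np.random.default_rng(0)
dense = rng.integers(0, 6, size=(30, 40)).astype(float)
dense[rng.random(dense.shape) < 0.7] = 0.0  # blank out most cells, like real ratings
csr_data = csr_matrix(dense)

# Compress each movie's rating vector into 5 latent factors
svd = TruncatedSVD(n_components=5, random_state=0)
item_factors = svd.fit_transform(csr_data)  # shape: (n_movies, 5)

# Same cosine KNN as before, but on the dense latent vectors
knn_svd = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=6)
knn_svd.fit(item_factors)
distances, indices = knn_svd.kneighbors(item_factors[[0]])
print(indices.ravel()[1:])  # the five movies closest to movie 0 in latent space
```

Note that this factorizes the zero-filled matrix; SVD-style models that fit only the observed ratings are a different technique. Working in the latent space can relate movies whose raters overlap only indirectly, at the cost of one more hyperparameter to tune.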

Conclusion:
This project demonstrates a simple movie recommendation system
using item-based collaborative filtering. By using the MovieLens-small
dataset and applying the KNN algorithm with cosine similarity,
the system finds and recommends movies similar to the one the
user enters.
Key Takeaways:
• Collaborative filtering can be used for item-based
recommendations by identifying similar movies.
• Data preprocessing, such as removing noise and handling
sparsity, is crucial to the recommendation system's
accuracy and efficiency.
• KNN and cosine similarity are effective methods for finding
similarities in datasets with user ratings.
With this model, users can input a movie title and get
recommendations for similar films, improving the movie-watching
experience by offering suggestions drawn from the rating behavior
of past users.

BY: RUCHITA MAARAN & SHOBANA M (252310022 & 252310027)

