Team Renegades MMLA Report

The document outlines a project by Team Renegades focused on developing a multivariate machine learning algorithm for a recommender system using an IMDB dataset. It details the processes of data cleansing, pre-processing, modeling, and methodology, including the use of generalized linear models and collaborative filtering techniques. The project aims to create a system similar to Netflix's recommendation engine, evaluating model performance through metrics like RMSE.

Uploaded by

Kenny Alpha

Multivariate Machine Learning Algorithm

Project

Submitted by: Team Renegades

Mohit Verma FT201051
Sandeep Pallo FT201070
Shivang Srivastava FT201075
Rutwik Barik FT204071
Yogit Pant FT204101

Contents
Abstract
Objective
Dataset
Data Cleansing
Data Pre-processing
Data Modelling
Methodology
Data Exploration & Observation
Results & Observation
Model Performance
Conclusion
References
Appendix - Code

Abstract
Recommender systems are among the most widely used applications of machine learning today.
These systems use existing data to predict the rating or preference a user would give to an item. Netflix,
one of the early adopters, uses such a system to recommend movies and shows according to each
user's interests. In 2006 Netflix launched a competition to improve the accuracy of its
recommendation system, and in 2009 it awarded a team of developers $1 million for completing the
challenge. Using data obtained from Kaggle and machine learning techniques such as linear
regression and stochastic gradient descent, we have built a recommender system of our own.

Objective
Using open-source, public datasets and SAS, we aim to build a basic recommender system that
mimics, to some extent, the systems used by Netflix and others to reduce users' confusion over what
to choose. A recommender system is a subclass of information filtering system that seeks to predict
the “rating” or “preference” a user would give to an item.

Dataset
We are working on an IMDB dataset for movies containing the following columns:

title name of the movie

release_date release date of the movie
imdb_link imdb link of the movie
user_id user id of the user who gave the ratings
rating rating given by the user
timestamp time of the rating
Age age of the user
Sex gender of the user
Profession profession of the user
Post Code pin code of the user
Action genre of the movie
Adventure genre of the movie
Animation genre of the movie
Children genre of the movie
Comedy genre of the movie
Crime genre of the movie
Documentary genre of the movie
Drama genre of the movie
Fantasy genre of the movie
Film-Noir genre of the movie
Horror genre of the movie
Musical genre of the movie
Mystery genre of the movie
Romance genre of the movie
Sci-Fi genre of the movie
Thriller genre of the movie
War genre of the movie
Western genre of the movie

Data Cleansing
Data1 – u.genre
This file lists the genres of the movies.
Data2 – u.data
This file contains the user_id, movie_id, rating and timestamp of every rating.

Data3 – u.item

This file contains the movie-specific data: movie name, year of release, date of release, IMDb link
and genre data.
Code to convert the data into Excel format –

import pandas as pd

# Column names for the user file (the raw files have no header row)
user_columns = ['user_id', 'Age', 'Sex', 'Profession', 'Post Code']

# Read the '|'-separated user data and write it to Excel
users = pd.read_csv('u.user', sep='|', names=user_columns)
users.to_excel('Newdata.xlsx', sheet_name='Sheet1')

# Same steps for the tab-separated rating file
rating_columns = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv('u.data', sep='\t', names=rating_columns)
ratings.to_excel('ratings.xlsx', sheet_name='Sheet1')

# Assign column names to the movie file and convert it to Excel.
# Note: the standard MovieLens 100K u.item file has an extra video_release_date
# column after release_date; add it to this list if your copy contains it.
movie_columns = ['movie_id', 'title', 'release_date', 'imdb_link', 'unknown', 'Action', 'Adventure',
                 'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
                 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
                 'War', 'Western']
movies = pd.read_csv('u.item', sep='|', names=movie_columns, encoding="iso-8859-1")
movies.to_excel('movies.xlsx', sheet_name='Sheet1')

# Merge movies with ratings on movie_id, then with users on user_id
movie_ratings = pd.merge(movies, ratings, on='movie_id')
movie_ratings.to_excel('movie_ratings.xlsx', sheet_name='Sheet1')
movie_data = pd.merge(movie_ratings, users, on='user_id')
movie_data.to_excel('movie_data.xlsx', sheet_name='Sheet1')
# All the files are now merged into one file

Final data file in Excel format -

Data Pre-processing
Real-life data typically needs to be pre-processed (e.g. cleansed, filtered, transformed) before it can
be used by the machine learning techniques in the analysis step.

NA values appear in most of the columns, but in each column they make up less than 10 percent of the
records. We checked whether the NAs occur together across the corresponding columns and then
removed all the records containing NA values. The remaining data is not erroneous, so we did not
perform clamping or imputation on it.
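The NA check and row removal described above can be sketched in plain Python. This is only an illustration on hypothetical rows, not the team's actual code; `na_share` and `drop_na` are names made up for this example. With the pandas DataFrame from the previous section, `movie_data.isna().mean()` and `movie_data.dropna()` would play the same roles.

```python
# Hypothetical sample rows; None stands in for an NA cell.
rows = [
    {"user_id": 1, "Age": 24, "rating": 5},
    {"user_id": 2, "Age": None, "rating": 3},
    {"user_id": 3, "Age": 31, "rating": None},
    {"user_id": 4, "Age": 45, "rating": 4},
]


def na_share(rows):
    """Return the fraction of missing (None) values per column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def drop_na(rows):
    """Keep only the rows in which every column has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


shares = na_share(rows)   # proportion of NAs in each column
clean = drop_na(rows)     # records that survive the removal step
print(shares)
print([r["user_id"] for r in clean])
```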
Data Modelling

MODEL 1: Generalized linear model

“proc glm” is used to estimate the model. GLM stands for general linear model, and the procedure is
used to fit predictive models such as regression, ANOVA and MANOVA. In this case a linear regression
model was created to model the rating of a particular movie based on the chosen attributes.

The categorical variable genre was dummy coded manually, whereas the categorical variables Sex and
Profession were declared as categorical using the CLASS statement. The coefficients are interpreted
accordingly.
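To make the dummy coding concrete, here is a minimal sketch of how one observation could be turned into a row of regression inputs. It assumes reference-level coding with the last level as the reference, which matches the zero estimates reported for "Sex M" and "Profession retired" in the output; PROC GLM's internal parameterization is not reproduced here, and `dummy_code` and `design_row` are hypothetical helper names.

```python
def dummy_code(value, levels):
    """Indicator columns for every level except the last (reference) one."""
    return [1 if value == level else 0 for level in levels[:-1]]


SEX_LEVELS = ["F", "M"]  # "M" is the reference level (coefficient fixed at 0)


def design_row(age, sex, genre_flags):
    """Build [intercept, age, sex indicators..., genre flags...] for one rating."""
    # Genre flags are already 0/1 columns in the data, so they are used as-is.
    return [1, age] + dummy_code(sex, SEX_LEVELS) + list(genre_flags.values())


row_f = design_row(30, "F", {"Drama": 1, "War": 0})
row_m = design_row(45, "M", {"Drama": 0, "War": 1})
print(row_f)
print(row_m)
```

A regression fitted on such rows estimates one coefficient per column, and the coefficient of a level is read relative to the reference level.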

Model Estimation

Class Level Information

Class        Levels   Values
Sex          2        F M
Profession   21       administrator artist doctor educator engineer entertainment executive healthcare homemaker lawyer librarian marketing none other programmer salesman scientist student technician writer retired

Number of Observations Read   80000
Number of Observations Used   80000

Source            DF      Sum of Squares   Mean Square   F Value   Pr > F
Model             40      5973.1157        149.3279      125.70    <.0001
Error             79959   94991.0190       1.1880
Corrected Total   79999   100964.1347

R-Square   Coeff Var   Root MSE   rating Mean
0.059161   31.06550    1.089953   3.508563

Parameter                  Estimate          Standard Error   t Value   Pr > |t|
Intercept                   2.881279575 B    0.04366064        65.99    <.0001
Age                         0.007018156      0.00046486        15.10    <.0001
Sex F                       0.144114480 B    0.01013378        14.22    <.0001
Sex M                       0.000000000 B    .                 .        .
Profession administrator    0.259303229 B    0.03590540         7.22    <.0001
Profession artist           0.406662064 B    0.04722858         8.61    <.0001
Profession doctor           0.370030015 B    0.06173274         5.99    <.0001
Profession educator         0.292391378 B    0.03505251         8.34    <.0001
Profession engineer         0.237993693 B    0.03590781         6.63    <.0001
Profession entertainment    0.202892464 B    0.04252410         4.77    <.0001
Profession executive        0.006214289 B    0.03911381         0.16    0.8738
Profession healthcare      -0.760067399 B    0.04160455       -18.27    <.0001
Profession homemaker       -0.213950199 B    0.09493211        -2.25    0.0242
Profession lawyer           0.549049678 B    0.04644034        11.82    <.0001
Profession librarian        0.158942407 B    0.03788721         4.20    <.0001
Profession marketing        0.167404451 B    0.04423782         3.78    0.0002
Profession none             0.531568622 B    0.05198266        10.23    <.0001
Profession other            0.257386646 B    0.03612149         7.13    <.0001
Profession programmer       0.266529148 B    0.03642929         7.32    <.0001
Profession salesman         0.328136076 B    0.05596409         5.86    <.0001
Profession scientist        0.297812521 B    0.04554803         6.54    <.0001
Profession student          0.296919456 B    0.03690323         8.05    <.0001
Profession technician       0.233525682 B    0.03910714         5.97    <.0001
Profession writer          -0.005585268 B    0.03769126        -0.15    0.8822
Profession retired          0.000000000 B    .                 .        .
Action                     -0.075004458      0.01158987        -6.47    <.0001
Adventure                   0.079638506      0.01325933         6.01    <.0001
Animation                   0.370321053      0.02501688        14.80    <.0001
Children                   -0.205378600      0.01932902       -10.63    <.0001
Comedy                     -0.065362251      0.01089787        -6.00    <.0001
Crime                       0.112192155      0.01489682         7.53    <.0001
Documentary                 0.314057928      0.04558236         6.89    <.0001
Drama                       0.246236947      0.01070561        23.00    <.0001
Fantasy                    -0.235578221      0.03374738        -6.98    <.0001
Film_Noir                   0.383532254      0.03275029        11.71    <.0001
Horror                     -0.116849031      0.01794182        -6.51    <.0001
Musical                     0.078526508      0.02005572         3.92    <.0001
Mystery                     0.108358613      0.01967473         5.51    <.0001
Romance                     0.109921934      0.01026628        10.71    <.0001
Sci_Fi                      0.106909952      0.01287553         8.30    <.0001
Thriller                    0.039244664      0.01121071         3.50    0.0005
War                         0.272366723      0.01432135        19.02    <.0001
Western                     0.168631501      0.02812802         6.00    <.0001

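In our analysis most of the independent variables came out statistically significant; the exceptions are
the executive and writer levels of Profession (p > 0.8). The coefficients can be referred to when
suggesting movies to users of the same age, profession or gender, and a movie of a genre that is
popular among a particular group can be recommended to similar people. The R-Square of 0.059
shows, however, that the model explains only about 6 percent of the variance in ratings.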
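As a worked example of reading the estimates, the sketch below plugs a few coefficients copied from the parameter table above into the linear predictor. The chosen user (a 30-year-old female educator) and the function name `predict_rating` are illustrative assumptions; only a subset of the professions and genres is included.

```python
# Coefficients copied from the parameter estimates table above.
INTERCEPT = 2.881279575
AGE = 0.007018156
SEX = {"F": 0.144114480, "M": 0.0}
PROFESSION = {"educator": 0.292391378, "healthcare": -0.760067399, "retired": 0.0}
GENRE = {"Drama": 0.246236947, "Horror": -0.116849031, "War": 0.272366723}


def predict_rating(age, sex, profession, genres):
    """Linear predictor: intercept + age effect + class effects + genre flags."""
    return (INTERCEPT
            + AGE * age
            + SEX[sex]
            + PROFESSION[profession]
            + sum(GENRE[g] for g in genres))


drama_for_educator = predict_rating(30, "F", "educator", ["Drama"])
print(round(drama_for_educator, 4))  # about 3.7746
```

The same user is predicted to rate a Horror film lower than a Drama, because the Horror coefficient is negative while the Drama coefficient is positive.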
MODEL 2: Recommender Engine using Collaborative Filtering-Using PROC RECOMMEND

There are basically two methods for building a recommendation engine:

Content Filtering: This uses entity descriptions (metadata) or side information and goes beyond the
mere fact that you watched a film. It examines the features of both you and the film to determine
whether there is a match based on the larger categories represented by those features. For example,
if you are a woman who likes action films, the recommender will search for suggestions that lie at the
intersection of these two categories.
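A minimal content-filtering sketch, assuming that a user's taste is summarized as a set of liked genres and that similarity is measured by Jaccard overlap (one possible choice, not something the report specifies). The movie titles and the helper names are hypothetical.

```python
# Hypothetical catalogue: movie title -> set of genres it belongs to.
movies = {
    "Movie A": {"Action", "Thriller"},
    "Movie B": {"Romance", "Drama"},
    "Movie C": {"Action", "Sci-Fi"},
}


def content_scores(liked_genres, movies):
    """Jaccard overlap between the liked genres and each movie's genres."""
    return {
        title: len(liked_genres & genres) / len(liked_genres | genres)
        for title, genres in movies.items()
    }


def recommend(liked_genres, movies, top_n=2):
    """Titles ordered by descending overlap (ties broken alphabetically)."""
    scores = content_scores(liked_genres, movies)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


scores = content_scores({"Action", "Sci-Fi"}, movies)
top = recommend({"Action", "Sci-Fi"}, movies)
print(top)
```

Only item metadata is used here; no other user's ratings are needed, which is what distinguishes this approach from collaborative filtering.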

Collaborative Filtering: This filters a user's preferences by gathering (collaborating on) the
preferences of many other users. It calculates a similarity measure between the target items
(Euclidean distance, cosine distance or another metric, depending on the algorithm) and selects the
closest ones, i.e. those at minimum distance. Ratings are matched on the basis of the similarities
among the movies or products rated in the past, so you can get recommendations based on items
liked by people similar to you or on items similar to those you like.

Matrix factorization (SVD) and KNN, both under collaborative filtering, are effective algorithms for
creating a recommender system. Matrix factorization approximates the matrix of ratings
$R_{m \times n}$ by the product of two lower-dimensional matrices, $P_{k \times m}$ and $Q_{k \times n}$, such that

$$R \approx P^{T} Q$$

For example, if $p_u$ is the u-th column of P and $q_v$ is the v-th column of Q, then
the rating given by user u to item v is predicted as $p_u^{T} q_v$.

A typical solution for P and Q is given by the optimization problem below:

$$\min_{P,Q} \sum_{(u,v) \in R} \left[ f(p_u, q_v; r_{u,v}) + \mu_P \lVert p_u \rVert_1 + \mu_Q \lVert q_v \rVert_1 + \frac{\lambda_P}{2} \lVert p_u \rVert_2^2 + \frac{\lambda_Q}{2} \lVert q_v \rVert_2^2 \right]$$

where (u,v) are the locations of the observed entries in R, $r_{u,v}$ is the observed rating, f is the loss
function, and $\mu_P$, $\mu_Q$, $\lambda_P$, $\lambda_Q$ are the penalization parameters used by many algorithms to avoid
overfitting.

Model training is the process of solving for the matrices P and Q, and choosing the penalization
parameters is the tuning of hyperparameters. After obtaining P and Q, we can estimate the full
rating matrix:

$$\hat{R} = P^{T} Q$$
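The training step can be illustrated with a small stochastic gradient descent sketch. It assumes a squared-error loss and a single L2 penalty `lam` shared by both matrices (no L1 terms), which is a simplification of the general problem described above; it is not the LBFGS routine that PROC RECOMMEND uses in the appendix. The ratings, the hyperparameter values and the function names are made up for illustration.

```python
import random
from math import sqrt


def predict(P, Q, u, v):
    """Predicted rating of user u for item v: the dot product of p_u and q_v."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[v]))


def rmse(ratings, P, Q):
    """Root mean square error over the observed (user, item, rating) triples."""
    total = sum((r - predict(P, Q, u, v)) ** 2 for u, v, r in ratings)
    return sqrt(total / len(ratings))


def factorize(ratings, n_users, n_items, k=2, lam=0.05, lr=0.02, epochs=500, seed=0):
    """Fit factor vectors by SGD; P[u] and Q[v] hold the columns p_u and q_v."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    start = rmse(ratings, P, Q)
    for _ in range(epochs):
        for u, v, r in ratings:
            err = r - predict(P, Q, u, v)
            for f in range(k):
                pu, qv = P[u][f], Q[v][f]
                # Gradient step on the squared error plus the L2 penalty.
                P[u][f] += lr * (err * qv - lam * pu)
                Q[v][f] += lr * (err * pu - lam * qv)
    return P, Q, start


# Hypothetical observed entries of a 3 x 3 rating matrix.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q, start_rmse = factorize(observed, n_users=3, n_items=3)
final_rmse = rmse(observed, P, Q)
print(start_rmse, final_rmse)
```

Once P and Q are fitted, `predict(P, Q, u, v)` also gives an estimate for the user-item pairs that were never rated, which is how the unknown entries of the rating matrix are filled in.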

Note:
• Pearson's correlation (modelling based on correlation): The recommendation engine's purpose
is to help users find new movies to watch, given their preferences (as seen in the ratings). The
algorithm used is quite simple: it computes the Pearson correlation coefficient between the rating
vectors of each pair of films (i.e. it finds every user who rated both film A and film B, and the
associated ratings). The coefficient ranges from -1 to 1; the closer it is to 1, the higher the correlation.
• The underlying assumption of collaborative filtering is that if a person X shares the opinions of
a person Y, then recommendations for X should be based on the preferences of Y
(similarity).
• Matrix factorization algorithms work by decomposing the user-item interaction matrix into the
product of two lower-dimensional rectangular matrices.
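The Pearson-based similarity from the first note can be sketched as follows. The ratings dictionary is hypothetical, and returning `None` when fewer than two users rated both films (or when one film's ratings have no variance) is a design choice of this sketch.

```python
from math import sqrt

# Hypothetical ratings: user -> {film: rating}.
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 3, "B": 2},
    "u3": {"A": 1, "B": 1, "C": 5},
    "u4": {"A": 4},
}


def pearson(film_x, film_y, ratings):
    """Pearson correlation of two films over the users who rated both."""
    pairs = [(r[film_x], r[film_y]) for r in ratings.values()
             if film_x in r and film_y in r]
    n = len(pairs)
    if n < 2:
        return None  # not enough co-ratings to measure correlation
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for constant ratings
    return cov / sqrt(var_x * var_y)


sim_ab = pearson("A", "B", ratings)
sim_ac = pearson("A", "C", ratings)
print(sim_ab, sim_ac)
```

Films A and B are rated almost identically by their three common raters, so their correlation is close to 1; A and C share only one rater, so no correlation is reported.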
Methodology
Several algorithms and data transformations were applied in order to achieve the lowest RMSE:

• Generalized linear modelling

• Collaborative filtering using PROC RECOMMEND
o Memory-based algorithms
- Slope one (slope1)
- K nearest neighbours (knn)
o Model-based algorithms
- Matrix factorization (svd)
o Mixture of different methods
- Ensemble (ensemble)

Method Evaluation: The methods were evaluated by comparing their RMSE. Root mean square error
(RMSE) measures the amount by which a model's predicted values differ from the observed values
(typically outside the sample from which the model was estimated).

• The error in the predicted ratings is measured using:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{Predicted}_i - \mathrm{Actual}_i \right)^2}$$

o Here, Predicted is the rating predicted by the model, Actual is the original rating and N is the
number of ratings compared
o If a consumer has given a rating of 5 to a film and we have predicted a rating of 4, then the
RMSE for that single rating is 1
o The lower the RMSE value, the better the recommendations

The metric above tells us how accurate our predicted ratings are but does not consider the order of
the recommendations, i.e. it does not tell us which product to recommend first.
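The RMSE calculation, including the single-rating example from the text (actual rating 5, predicted rating 4), can be written as a short function. The function name and the second set of ratings are illustrative.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between two equally long lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


single = rmse([4], [5])                  # the example from the text
several = rmse([4, 3, 5], [5, 3, 3])     # errors of 1, 0 and 2
print(single, round(several, 4))
```

Because the errors are squared before averaging, one prediction that is off by 2 contributes four times as much as a prediction that is off by 1.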

PROC RECOMMEND
For collaborative filtering we use PROC RECOMMEND, which takes explicit user ratings for items as
input and outputs recommendations for individual users.

PROC RECOMMEND supports the following methods:

Memory-based algorithms
- Slope one (slope1)
- K nearest neighbors (KNN)
Model-based algorithms
- Matrix factorization (svd)
Market basket analysis
- Association rule (arm)
Mixture of different methods
- Clustering (cluster)
- Ensemble (ensemble)

Key Requirement: Generally, R and Python are the preferred languages for modelling a
recommendation engine. In SAS, we use PROC RECOMMEND. To use it, SAS Visual Analytics (along
with SAS Studio) must be installed so that we can run the LASR Analytic Server. SAS was used as per
the requirement of the project.

The recommender system's main task is to predict unknown entries in the rating matrix based on the
observed values.

Data Exploration & Observation

• The MovieLens rating data is stored as a sparse matrix, a format that compresses the data by not
storing the missing entries. When necessary, it can be converted into a standard dense matrix for
specific statistics.
• Ratings range from 1 to 5, and positive ratings are more common than negative ones. This often
happens with rating data: it has a certain imbalance in favor of positive ratings because users
tend to buy or watch what they think they will like. Negative ratings arise from disappointment,
when expectations are not met.
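The two observations above can be illustrated with a tiny sparse rating store. The sketch assumes a dictionary keyed by (user, movie) as the sparse representation and uses 0 to mark a missing rating in the dense form; the data and the function names are hypothetical.

```python
from collections import Counter

# Sparse storage: only the ratings that exist are kept, keyed by (user, movie).
ratings = {(1, 10): 5, (1, 20): 4, (2, 10): 3, (3, 30): 4}


def density(ratings):
    """Share of user-movie pairs that actually have a rating."""
    users = {u for u, _ in ratings}
    items = {i for _, i in ratings}
    return len(ratings) / (len(users) * len(items))


def to_dense(ratings, missing=0):
    """Expand to a full user x movie table, filling unrated cells with `missing`."""
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    return [[ratings.get((u, i), missing) for i in items] for u in users]


dense = to_dense(ratings)
histogram = Counter(ratings.values())  # how often each rating value occurs
print(density(ratings), dense, histogram)
```

On real rating data the density is far lower than in this toy example, which is why the sparse form is preferred until a dense matrix is actually needed.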

Using the IMSTAT procedure: Prior to invoking the RECOMMEND procedure, we can invoke the
IMSTAT procedure to print a part of the tables included in the recommender system. The code is
given in the Appendix.
The first FETCH statement prints a portion of the MovieRating table. The table contains rating
information made by users (customers) about items (movies).
Sample data Snapshot
Subsequent FETCH statements print a portion of the Movie & User Profile tables comprising of info
about each movie and user.

Movie Profile Table Snapshot

User Profile Table Snapshot

Results & Observation

a) Generalized linear model

The GLM model (RMSE 1.089953) was evaluated in the same way as matrix factorization with SVD,
i.e. by its RMSE. We therefore put more weight on the features with the largest standardized
coefficient magnitudes in its output.

However, the GLM RMSE of 1.089953 is not as low as that of the MF model.

b) Collaborative filtering using Proc Recommend

We used two different methods to generate recommendations from the explicit rating data. The
recommender system is initiated with the PROC RECOMMEND statement. Each METHOD statement
specifies a model to be built, and various options correspond to the different methods.
CODE Explanation (Full code in Appendix)
The K= option specifies the number of neighbours used by the KNN method. The option
SIMILARITY=PC specifies that the Pearson correlation is used as the similarity measure when
calculating the distance between two objects. The option FACTORS=20 sets the number of latent
factors for the matrix factorization.
In the ENSEMBLE and SVD methods, the WITHHOLD= option defines the proportion of users from
whom ratings are withheld, and the HOLD= option specifies the number of ratings withheld from each
of those users. Specifying WITHHOLD= and HOLD= thus holds out a certain number of ratings for
validation. Ten per cent of users are selected in this case.
In the PREDICT statement, the NUM= option specifies the number of recommendations to be
generated for each user listed in the USERS= option. The recommender system can contain several
different methods, each named with the LABEL= option of the METHOD statement. When you use the
PREDICT statement, you can specify the same name with the LABEL= option to use the
corresponding model.
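One way to picture the WITHHOLD= and HOLD= options is as a user-level hold-out split. The sketch below is an interpretation of the description above, not SAS's actual selection logic: it randomly picks a proportion `withhold` of the users and moves `hold` ratings from each of them into a validation set. All names and data are hypothetical.

```python
import random


def holdout_split(ratings, withhold=0.1, hold=1, seed=1234):
    """Split {user: [(item, rating), ...]} into training and validation parts."""
    rng = random.Random(seed)
    users = sorted(ratings)
    n_selected = max(1, round(withhold * len(users)))
    selected = set(rng.sample(users, n_selected))
    train, valid = {}, {}
    for user, items in ratings.items():
        if user in selected and len(items) > hold:
            held = rng.sample(items, hold)           # ratings kept for validation
            valid[user] = held
            train[user] = [x for x in items if x not in held]
        else:
            train[user] = list(items)                # user is left untouched
    return train, valid


# Hypothetical data: 20 users with three ratings each.
ratings = {f"u{i}": [(f"m{j}", 3 + (i + j) % 3) for j in range(3)] for i in range(20)}
train, valid = holdout_split(ratings, withhold=0.1, hold=1)
print(len(valid), sum(len(v) for v in train.values()))
```

A model is then fitted on `train` only, and its RMSE on `valid` corresponds to the hold-out error reported in the model performance tables.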

The ADD statement adds a recommender system, MovieLens, to the SAS LASR Analytic Server.

Three ADDTABLE statements add the MovieRating, MovieProfile and UserProfile tables to the
recommender system.

Each METHOD statement adds a method for calculating recommendations to the recommender system.
The methods KNN, SLOPE1, SVD and ENSEMBLE were added, with options specified for each. The PREDICT
statement generates five predictions for each specified user (1, 33, 478, and 2035).

Surprisingly, the lowest RMSE was achieved using the KNN method for memory-based collaborative
filtering (RMSE 0.7705).
An even lower RMSE could be achieved with more hyperparameter tuning, but probably not as low as
with the KNN model.
It seems that the most important features were:
- the number of users who rated the film and the age of the film (more ratings and a higher mean rating)
- the film Id and
- Drama (the genre with the most ratings)
The other features did not have a low p-value and did not improve the performance of the model but
overfitted it.

Model Performance
It is important to examine the model performance before using a model to generate
recommendations. PROC RECOMMEND provides ODS tables that display the performance of
the models and allow you to assess them using various criteria. For the KNN method, the
ENSEMBLE method is used to measure the numerical performance on the training and
validation data sets. The evaluation is carried out by specifying the option
METHODS=("KNN") together with the CONSTRAINT option. The numeric evaluation for KNN is
given below:

Table 1: Movie Lens - Numerical Evaluations using Method KNN

The performance of the SVD model is evaluated directly within the method. The example
code shows how to save the numeric performance table as a SAS table using the ODS
OUTPUT statement, so that it can be examined visually. Figure 1 shows the root mean
square error and the mean absolute error for the training and validation data sets as a
function of the SVD method iterations.

A model that does not perform well according to the selected numerical criteria might not be
used. In this example, the root mean square error of the KNN method (RMSE: 0.7705) is
lower than that of the SVD method (RMSE: 0.9151) on the holdout sample. Because of these
better results, the example code uses the KNN method with the PREDICT statement to score
and produce recommendations.

Figure 1(a): Root Mean Square Error for SVD

Figure 1(b): Mean Absolute Error for SVD

The PREDICT statement produces a recommendations table for each user. See the table of
recommendations using the KNN method below.

Output from the METHOD Statement Using the ENSEMBLE Option

Ensemble optimal coefficients for Recommender System

Table for recommendations using the KNN method.

Conclusion
Based on the results, collaborative filtering with the KNN method gives the lowest RMSE.

RMSE for collaborative filtering methods:

Method                       RMSE
KNN method                   0.77
Matrix Factorization (SVD)   0.91

Here, the KNN method has a lower root mean square error (RMSE: 0.7705) than the
SVD method (RMSE: 0.9151) on the holdout sample. Because of this, the KNN method has been
used for scoring and generating recommendations with the PREDICT statement.
Future Work Scope
With web scraping we could add more dimensions to our dataset, such as:

- budget of the movie
- critics' rating and
- duration of the movie
and compare the resulting RMSEs.
Besides these methods one can also look at:
• H2O AutoML model
• H2O GLM model
• H2O GBM model
• H2O Random Forest model
• H2O Ensemble (trees or clustering) model
According to the research conducted on the MovieLens datasets, the above methods have
more or less satisfactory prediction and recommendation accuracy with low RMSE.
Problem of cold start: A collaborative filtering-based recommendation engine cannot predict
ratings for new movies that have no rating data, although this can be done using content-based
filtering. Therefore, there is a need for a hybrid method.
A bigger MovieLens dataset with metadata could also be used.
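The hybrid idea raised for the cold-start problem can be sketched as a simple fallback rule. In this illustration a movie's own mean rating stands in for a collaborative-filtering prediction, and the content-based fallback averages the ratings of already rated movies that share a genre; both choices, the data and the function name are assumptions made for the example.

```python
def hybrid_predict(movie, item_ratings, genres, global_mean):
    """Use rating data when the movie has any, otherwise fall back to content."""
    own = item_ratings.get(movie, [])
    if own:
        # Stand-in for a collaborative-filtering prediction.
        return sum(own) / len(own)
    # Cold start: borrow the ratings of rated movies that share a genre.
    similar = [r for other, rs in item_ratings.items()
               if genres[other] & genres[movie]
               for r in rs]
    if similar:
        return sum(similar) / len(similar)
    return global_mean  # nothing comparable is known about the movie


# Hypothetical data: two rated movies and two new, unrated ones.
item_ratings = {"Old Drama": [4, 5], "Old Horror": [2]}
genres = {
    "Old Drama": {"Drama"},
    "Old Horror": {"Horror"},
    "New Drama": {"Drama", "War"},
    "New Western": {"Western"},
}

new_drama = hybrid_predict("New Drama", item_ratings, genres, global_mean=3.5)
new_western = hybrid_predict("New Western", item_ratings, genres, global_mean=3.5)
print(new_drama, new_western)
```

The new drama inherits the rating level of the existing drama, while the new western has no rated neighbour and falls back to the global mean.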
References:
• Mueller, John, and Luca Massaron. Machine Learning for Dummies. John Wiley & Sons, Inc., 2016.

• https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-recommendation-engine-python/

• http://rpubs.com/Papacosmas/harvard

• http://rpubs.com/xinmeiz/movielens2

• https://rpubs.com/mrmelchi/492432

• https://blog.dataiku.com/2012/09/10/a-simple-recommendation-engine-implemented-in-different-languages

• https://pages.dataiku.com/hubfs/Guidebooks/Recommendation%20Engines/recommendation_engine_guidebook.pdf

• http://support.sas.com/documentation/cdl/en/inmsref/67597/HTML/default/viewer.htm#p1ikis8zzd3iszn1t1hsxun24qe3.htm

• https://documentation.sas.com/?docsetId=inmsref&docsetTarget=n0em119do6om6en1euhs41lem60p.htm&docsetVersion=2.82&locale=en

• https://support.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2095-2018.pdf

• https://muffynomster.wordpress.com/2015/06/07/building-a-movie-recommendation-engine-with-r/

• https://www.r-bloggers.com/unleash-the-potential-of-recommender-systems/
APPENDIX - CODE:

For Method 2: Collaborative Filtering Using PROC RECOMMEND

Before invoking the RECOMMEND procedure, we can print a part of the tables that are included in the
recommender system by invoking the IMSTAT procedure.
proc imstat;
table MYlasr.movierating;
fetch/format;
run;

table MYlasr.movieprofile;
fetch/format;
run;

table MYlasr.userprofile;
fetch/format;
run;
quit;
The first FETCH statement prints a portion of the MovieRating table. The table contains rating information made by
users (customers) about items (movies).

/* Start a LASR Analytic Server */

options set=GRIDHOST="grid001.example.com";
options set=GRIDINSTALLLOC="/opt/TKGrid";
%Let portNumber = 10010;

libname hdfs sashdat path="/hps";

proc lasr create path="/tmp" port=&portNumber noclass;
performance nodes=all;
run;

/* Load rating table into memory */

proc lasr add port=&portNumber data = hdfs.ratings;
run;

/* Load movies table into memory */

proc lasr add port=&portNumber data = hdfs.movies;
run;

/* Assign a libref to access tables in the server */

libname lasr sasiola port = &portNumber tag="hps";

/* Invoke PROC RECOMMEND */

proc recommend port=&portNumber recom = rs.movie;
/* Add a new recommendation project */
add rs.movie / item = movieid user = userid rating = rating;
/* Add tables */
addtable lasr.ratings / recom = rs.movie type = rating vars = (movieid userid rating);
addtable lasr.movies / recom = rs.movie type = item;
run;

/* Method -- Develop KNN model with neighbors 20 */

method knn / label="knn"
k=20
positive
similarity=pc
seed=1234;
/* Method -- Use ensemble to evaluate KNN for training and validation*/
method ensemble /label="knn_eva"
methods=("knn")
withhold=0.1
hold=1
details
constraint
seed=1234
fconv=1e-3
gconv=1e-3
maxiter=100
;
run;
/* Method -- Develop and evaluate a svd LBFGS with 20 factors for training and validation */
method svd / label="svd"
factors=20
fconv=1e-3
gconv=1e-3
maxiter=100
seed=1234
MAXFEVAL=5000
function=L2
lamda=0.2
technique=lbfgs
withhold=0.1
hold=1
details
;
ods output RecommenderFuncEvalInfo = movie_funcEval_svd;
run;

/* Score and make recommendations with KNN */

predict /label = "knn"
method = knn
Num = 5
users = ("1","33");
run;

remove rs.movie;
run;

quit;

/* Plot the numeric results of svd */

proc sgplot data = movie_funcEval_svd;
title "Movie Lens - Matrix Factorization Model";
title2 "Root Mean Square Error";
series x=NumFunc y=RMSE /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="RMSE for Training Data";
series x=NumFunc y=RMSEhold /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="RMSE for Hold Out";
xaxis label = 'Number of Iterations' type = discrete grid values = (0 to 100 by 5);
yaxis label = 'Root Mean Square Error';
run;

proc sgplot data = movie_funcEval_svd;

title "Movie Lens - Matrix Factorization Model";
title2 "Mean Absolute Error";
series x=NumFunc y=MAE /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="MAE for Training Data";
series x=NumFunc y=MAEhold /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="MAE for Hold Out";
xaxis label = 'Number of Iterations' type = discrete grid values = (0 to 100 by 5) ;
yaxis label = 'Mean Absolute Error';
run;

/* Stop the LASR Analytic Server */

proc lasr term port = &portNumber;
run;

To view the rating history for users 1 and 33 that were used for generating recommendations,
you can use the following example code.
/* Continue to use the lasr libref from the previous code example */
/* Use SCHEMA to join the ratings and movies tables. */
proc imstat data=lasr.ratings;
schema movies (movieid=movieid);
run;

/* Find the number of ratings for users 1 and 33 and view them. */
table lasr.&_templast_;
where userid in (1 33);
numrows / save=numrowstab;
run;
store numrowstab(_last_, 1) = obscount;
run;
fetch / format orderby=(userid rating movies_genres)
descending=(rating) to=&obscount.;
run;
quit;

No ratings yet
Use Property-Based Testing To Bridge LLM Code Generation and Validation
13 pages
Readme (Edrw)
No ratings yet
Readme (Edrw)
2 pages
FPGA-Based Distributed Arithmetic
No ratings yet
FPGA-Based Distributed Arithmetic
13 pages
Chapter 02 Organization Strategy and Project Selection
100% (1)
Chapter 02 Organization Strategy and Project Selection
7 pages
TLT35D Section D - CONTROLS
100% (2)
TLT35D Section D - CONTROLS
34 pages
Law - Ethics - and Confidentiality in Nursing PDF
No ratings yet
Law - Ethics - and Confidentiality in Nursing PDF
20 pages
ISAR Image Generation of Ships by Inpainting Using SinGAN
No ratings yet
ISAR Image Generation of Ships by Inpainting Using SinGAN
9 pages
Tech Inventory for Retailers
No ratings yet
Tech Inventory for Retailers
30 pages
ANSUL AUTOMAN II-C Releasing Device Component Sheet
100% (1)
ANSUL AUTOMAN II-C Releasing Device Component Sheet
1 page
PROJECT PROPOSAL of Student Portal
100% (1)
PROJECT PROPOSAL of Student Portal
3 pages
Features: TI Cortex A8 Industrial Communication Gateway With 2 X LAN and 6 X COM Ports
No ratings yet
Features: TI Cortex A8 Industrial Communication Gateway With 2 X LAN and 6 X COM Ports
1 page
Security Primer Securing Login Credentials
No ratings yet
Security Primer Securing Login Credentials
1 page
Object-Oriented Game Design Basics
No ratings yet
Object-Oriented Game Design Basics
12 pages
Lattice Basis Reduction Improved Practical Algorit
No ratings yet
Lattice Basis Reduction Improved Practical Algorit
28 pages
Pc200-6 Custom Parts Book
100% (21)
Pc200-6 Custom Parts Book
261 pages
Coursera - Online Courses From Top Universities Quiz 2
0% (2)
Coursera - Online Courses From Top Universities Quiz 2
3 pages
English Basics Student Book
No ratings yet
English Basics Student Book
11 pages
Closed Loop Hall Current Sensor CYHCS-B8: Designed With A High Galvanic Isolation Between Primary and Secondary Circuits
No ratings yet
Closed Loop Hall Current Sensor CYHCS-B8: Designed With A High Galvanic Isolation Between Primary and Secondary Circuits
2 pages
Roadmap to Automation Testing
No ratings yet
Roadmap to Automation Testing
2 pages
Course5 System Design Flow On Zynq Zybo Lab
No ratings yet
Course5 System Design Flow On Zynq Zybo Lab
121 pages
Web Content Management System Overview
No ratings yet
Web Content Management System Overview
14 pages
LP 1 Math 5
No ratings yet
LP 1 Math 5
29 pages
Retable Home - Turn Complex Spreadsheets Into Smart Database
No ratings yet
Retable Home - Turn Complex Spreadsheets Into Smart Database
17 pages


title: name of the movie

users = pd.read_csv('u.user', sep='|', names=user_columns)

rating_columns = ['user_id', 'movie_id', 'rating', 'timestamp']

movie_columns = ['movie_id', 'title', 'release_date', 'imdb_link', 'unknown', 'Action' , 'Adventure',

movie_ratings = pd.merge(movies, ratings)
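
The merge step above can be illustrated with a minimal sketch. The two small frames below are hypothetical stand-ins for the movies and ratings tables (their values are invented for illustration and are not from the report's data); the only call taken from the report is `pd.merge(movies, ratings)`.

```python
import pandas as pd

# Tiny in-memory stand-ins for the movies and ratings tables
# (illustrative values only, not the report's data).
movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
})
ratings = pd.DataFrame({
    "user_id": [10, 10, 11, 12],
    "movie_id": [1, 2, 1, 3],
    "rating": [4, 3, 5, 2],
})

# With no `on=` argument, pd.merge joins on the columns the two frames
# share (here only movie_id) and performs an inner join by default.
movie_ratings = pd.merge(movies, ratings)
print(movie_ratings)
```

Because the join is an inner join, a movie with no ratings (or a rating whose movie_id is missing from the movies table) would be dropped from the merged result.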

Final data file in Excel format -

MODEL 1: Generalized linear model

Class Level Information

Number of Observations Read 8000

There are two basic methods for building a recommendation engine:

• Generalized linear modelling

• The error in the ratings predicted is measured using “t”:

PROC RECOMMEND supports various methods:

Data Exploration & Observation

Movie Profile Table Snapshot

Results & Observation

However, the RMSE of 1.089953 was not as low as that of the MF model.
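
For reference, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted ratings. The sketch below shows the standard calculation in Python; the function name `rmse` and the sample ratings are assumptions made for illustration and do not reproduce the report's SAS output.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one rating")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative ratings, not taken from the report's data.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
score = rmse(actual, predicted)
print(score)
```

A lower RMSE means the predicted ratings sit closer to the actual ones, which is why it is used here to compare models.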

b) Collaborative filtering using Proc Recommend

Table 1: Movie Lens - Numerical Evaluations using Method KNN

Figure: RMSE for SVD

Output from the METHOD Statement Using the ENSEMBLE Option

Ensemble optimal coefficients for Recommender System

KNN method 0.77
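
The idea behind a KNN recommender can be sketched in Python as follows. This is a generic user-based nearest-neighbour predictor with cosine similarity, written under the assumption that unrated items are stored as 0; it is not the PROC RECOMMEND implementation, and the function name `knn_predict` and the sample matrix are hypothetical.

```python
import numpy as np

def knn_predict(R, user, item, k=20):
    """Predict R[user, item] from the k most similar users who rated the item.

    R is a users x items array in which 0 means "not rated" (an assumption
    of this sketch).
    """
    rated = np.flatnonzero(R[:, item] > 0)
    rated = rated[rated != user]
    if rated.size == 0:
        return float(R[R > 0].mean())  # fall back to the global mean rating

    # Cosine similarity between the target user and each candidate neighbour.
    target = R[user]
    norms = np.linalg.norm(R[rated], axis=1) * np.linalg.norm(target)
    sims = (R[rated] @ target) / np.where(norms == 0, 1.0, norms)

    # Keep the k most similar neighbours; take a similarity-weighted average.
    top = np.argsort(sims)[::-1][:k]
    weights = sims[top]
    if weights.sum() <= 0:
        return float(R[rated, item].mean())
    return float(weights @ R[rated[top], item] / weights.sum())

# Illustrative 4 users x 3 movies rating matrix (not the report's data).
R = np.array([
    [5, 4, 0],
    [5, 4, 4],
    [1, 2, 5],
    [4, 5, 3],
], dtype=float)
pred = knn_predict(R, user=0, item=2, k=2)
print(pred)
```

With `k=2`, the prediction for user 0 on the third movie is driven by the two users whose rating patterns are closest to user 0, so the dissimilar third user does not influence the result.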

For Method 2: Collaborative Filtering Using PROC RECOMMEND

/* Start a LASR Analytic Server */

libname hdfs sashdat path="/hps";

/* Load rating table into memory */

/* Load movies table into memory */

/* Assign a libref to access tables in the server */

/* Invoke PROC RECOMMEND */

/* Method -- Develop KNN model with neighbors 20 */

/* Score and make recommendations with KNN */

/* Plot the numeric results of svd */

proc sgplot data = movie_funcEval_svd;

/* Stop the LASR Analytic Server */
