Team Renegades MMLA Report

The document outlines a project by Team Renegades focused on developing a multivariate machine learning algorithm for a recommender system using an IMDB dataset. It details the processes of data cleansing, pre-processing, modeling, and methodology, including the use of generalized linear models and collaborative filtering techniques. The project aims to create a system similar to Netflix's recommendation engine, evaluating model performance through metrics like RMSE.

Uploaded by

Kenny Alpha

Multivariate Machine Learning Algorithm

Project

Submitted by: Team Renegades


Mohit Verma FT201051
Sandeep Pallo FT201070
Shivang Srivastava FT201075
Rutwik Barik FT204071
Yogit Pant FT204101

Contents
Abstract
Objective
Dataset
Data Cleansing
Data Pre-processing
Data Modelling
Methodology
Data Exploration & Observation
Results & Observation
Model Performance
Conclusion
References
APPENDIX - CODE
Abstract
Recommender systems are among the most widely used applications of machine learning today.
These systems use existing data to predict the rating or preference a user would give to an item. Netflix,
one of the early adopters, uses such a system to recommend movies and shows according to each
user's interests. In 2006 Netflix launched a competition to improve the accuracy of its
recommendation system, and in 2009 it awarded a team of developers $1 million for completing the
challenge. Using data obtained from Kaggle and machine learning techniques such as linear
regression and stochastic gradient descent, we have built a recommender system of our own.

Objective
Using open-source, public datasets and SAS, we aim to build a basic recommender system that
mimics, to some extent, the systems used by Netflix and others to reduce users' confusion when
choosing what to watch. Recommender systems are a subclass of information filtering systems that
seek to predict the “rating” or “preference” a user would give to an item.

Dataset
We are working on an IMDB dataset for movies containing the following columns:

Column        Description
title         name of the movie
release_date  release date of the movie
imdb_link     IMDb link of the movie
user_id       id of the user who gave the rating
rating        rating given by the user
timestamp     time of the rating
Age           age of the user
Sex           gender of the user
Profession    profession of the user
Post Code     postal (PIN) code of the user
Action        genre of the movie
Adventure     genre of the movie
Animation     genre of the movie
Children      genre of the movie
Comedy        genre of the movie
Crime         genre of the movie
Documentary   genre of the movie
Drama         genre of the movie
Fantasy       genre of the movie
Film-Noir     genre of the movie
Horror        genre of the movie
Musical       genre of the movie
Mystery       genre of the movie
Romance       genre of the movie
Sci-Fi        genre of the movie
Thriller      genre of the movie
War           genre of the movie
Western       genre of the movie

Data Cleansing
Data1 – u.genre
This file lists the genres of the movies.
Data2 – u.data
This file contains the user_id, movie_id, rating and timestamp of every rating.

Data3 – u.item

This file contains movie-specific data: movie name, year of release, date of release, IMDb link and
genre data.
Code to convert the data into Excel format:

import pandas as pd

# Column names for the user file (the raw files have no header row)
user_columns = ['user_id', 'Age', 'Sex', 'Profession', 'Post Code']
# Read the '|'-separated user data and write it to Excel
users = pd.read_csv('u.user', sep='|', names=user_columns)
users.to_excel('Newdata.xlsx', sheet_name='Sheet1')

# Same steps for the tab-separated ratings file
rating_columns = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv('u.data', sep='\t', names=rating_columns)
ratings.to_excel('ratings.xlsx', sheet_name='Sheet1')

# Column names for the movie file.
# Assumption: u.item follows the MovieLens 100K layout, which has a (mostly empty)
# video release date field between release_date and imdb_link. It is named here so
# that the remaining columns line up; with one name too few, pandas would silently
# turn the first column into the index.
movie_columns = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_link',
                 'unknown', 'Action', 'Adventure', 'Animation', 'Children', 'Comedy',
                 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
                 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
movies = pd.read_csv('u.item', sep='|', names=movie_columns, encoding='iso-8859-1')
movies.to_excel('movies.xlsx', sheet_name='Sheet1')

# Merge all the files into one: movies with ratings on movie_id, then users on user_id
movie_ratings = pd.merge(movies, ratings, on='movie_id')
movie_ratings.to_excel('movie_ratings.xlsx', sheet_name='Sheet1')
movie_data = pd.merge(movie_ratings, users, on='user_id')
movie_data.to_excel('movie_data.xlsx', sheet_name='Sheet1')

Final data file in Excel format -

Data Pre-processing
Real-life data typically needs to be pre-processed (e.g. cleansed, filtered, transformed) before it can
be used by the machine learning techniques in the analysis step.

NA values appear in most of the columns. They make up less than 10 percent of the data, so we
checked whether all the corresponding columns contain NAs and then removed all the records
containing NA values. The remaining data is not erroneous, so we did not perform clamping or
imputation on it.
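
The NA check and removal described above can be sketched in pandas, the library already used for data cleansing. This is only an illustration under stated assumptions: the small `movie_data` frame and its values are made up, and the report does not show the code that was actually used for this step.

```python
import pandas as pd

# Hypothetical miniature of the merged table; the values are made up for illustration.
movie_data = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "rating": [5, 3, None, 4, 2],
    "Age": [24, None, 31, 45, 19],
    "Profession": ["student", "writer", "artist", "lawyer", "none"],
})

# Share of missing values per column (the report states it was below 10 percent)
na_share = movie_data.isna().mean()

# Remove every record that contains at least one NA value
clean_data = movie_data.dropna().reset_index(drop=True)

print(na_share)
print(len(movie_data), "records before,", len(clean_data), "records after")
```

Note that the profession level "none" is an ordinary string, so it is not treated as a missing value.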
Data Modelling

MODEL 1: Generalized linear model


“proc glm” is used to estimate the model. GLM stands for general linear model, and the procedure
is used to fit predictive models such as regression, ANOVA and MANOVA. In this case a linear
regression model was created to model the rating of a particular movie based on the chosen attributes.

The categorical variable genre was dummy coded manually, whereas the categorical variables Sex and
Profession were declared categorical using the CLASS statement. The estimates are interpreted
accordingly.

Model Estimation

Class Level Information

Class       Levels  Values
Sex         2       F M
Profession  21      administrator artist doctor educator engineer entertainment executive
                    healthcare homemaker lawyer librarian marketing none other programmer
                    salesman scientist student technician writer retired

Number of Observations Read  80000
Number of Observations Used  80000

Source           DF     Sum of Squares  Mean Square  F Value  Pr > F
Model            40     5973.1157       149.3279     125.70   <.0001
Error            79959  94991.0190      1.1880
Corrected Total  79999  100964.1347

R-Square  Coeff Var  Root MSE  rating Mean
0.059161  31.06550   1.089953  3.508563

Parameter                  Estimate        Standard Error  t Value  Pr > |t|
Intercept                   2.881279575 B  0.04366064       65.99   <.0001
Age                         0.007018156    0.00046486       15.10   <.0001
Sex F                       0.144114480 B  0.01013378       14.22   <.0001
Sex M                       0.000000000 B  .                .       .
Profession administrator    0.259303229 B  0.03590540        7.22   <.0001
Profession artist           0.406662064 B  0.04722858        8.61   <.0001
Profession doctor           0.370030015 B  0.06173274        5.99   <.0001
Profession educator         0.292391378 B  0.03505251        8.34   <.0001
Profession engineer         0.237993693 B  0.03590781        6.63   <.0001
Profession entertainment    0.202892464 B  0.04252410        4.77   <.0001
Profession executive        0.006214289 B  0.03911381        0.16   0.8738
Profession healthcare      -0.760067399 B  0.04160455      -18.27   <.0001
Profession homemaker       -0.213950199 B  0.09493211       -2.25   0.0242
Profession lawyer           0.549049678 B  0.04644034       11.82   <.0001
Profession librarian        0.158942407 B  0.03788721        4.20   <.0001
Profession marketing        0.167404451 B  0.04423782        3.78   0.0002
Profession none             0.531568622 B  0.05198266       10.23   <.0001
Profession other            0.257386646 B  0.03612149        7.13   <.0001
Profession programmer       0.266529148 B  0.03642929        7.32   <.0001
Profession salesman         0.328136076 B  0.05596409        5.86   <.0001
Profession scientist        0.297812521 B  0.04554803        6.54   <.0001
Profession student          0.296919456 B  0.03690323        8.05   <.0001
Profession technician       0.233525682 B  0.03910714        5.97   <.0001
Profession writer          -0.005585268 B  0.03769126       -0.15   0.8822
Profession retired          0.000000000 B  .                .       .
Action                     -0.075004458    0.01158987       -6.47   <.0001
Adventure                   0.079638506    0.01325933        6.01   <.0001
Animation                   0.370321053    0.02501688       14.80   <.0001
Children                   -0.205378600    0.01932902      -10.63   <.0001
Comedy                     -0.065362251    0.01089787       -6.00   <.0001
Crime                       0.112192155    0.01489682        7.53   <.0001
Documentary                 0.314057928    0.04558236        6.89   <.0001
Drama                       0.246236947    0.01070561       23.00   <.0001
Fantasy                    -0.235578221    0.03374738       -6.98   <.0001
Film_Noir                   0.383532254    0.03275029       11.71   <.0001
Horror                     -0.116849031    0.01794182       -6.51   <.0001
Musical                     0.078526508    0.02005572        3.92   <.0001
Mystery                     0.108358613    0.01967473        5.51   <.0001
Romance                     0.109921934    0.01026628       10.71   <.0001
Sci_Fi                      0.106909952    0.01287553        8.30   <.0001
Thriller                    0.039244664    0.01121071        3.50   0.0005
War                         0.272366723    0.01432135       19.02   <.0001
Western                     0.168631501    0.02812802        6.00   <.0001

After performing our analysis, most of the independent variables turned out to be significant (all except
Profession executive and Profession writer have p-values below 0.05), so they can be referred to when
suggesting movies to a user of the same age, profession or gender. A movie of a particular genre that is
popular among a similar group can also be recommended to similar people.
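
As a worked illustration of how the fitted model turns attributes into a predicted rating, the sketch below adds up the relevant estimates for one hypothetical viewer. The coefficients are copied from the parameter table of Model 1 (only a subset of professions and genres is included); the function name `predict_rating` and the example viewer are assumptions made for illustration.

```python
# Estimates copied from the PROC GLM parameter table.
# Reference levels (estimate 0): Sex M and Profession retired.
INTERCEPT = 2.881279575
AGE = 0.007018156
SEX = {"F": 0.144114480, "M": 0.0}
# Subset of professions and genres, enough for the example
PROFESSION = {"student": 0.296919456, "lawyer": 0.549049678, "retired": 0.0}
GENRE = {"Drama": 0.246236947, "Action": -0.075004458, "War": 0.272366723}


def predict_rating(age, sex, profession, genres):
    """Linear prediction: intercept plus the estimate of every active attribute."""
    genre_effect = sum(GENRE[g] for g in genres)
    return INTERCEPT + AGE * age + SEX[sex] + PROFESSION[profession] + genre_effect


# Hypothetical viewer: a 30-year-old female student watching a drama
example = predict_rating(30, "F", "student", ["Drama"])
print(round(example, 2))  # about 3.78
```

Because the model is linear, the prediction is not clipped to the 1 to 5 rating scale; a production system would normally bound it.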
MODEL 2: Recommender Engine using Collaborative Filtering-Using PROC RECOMMEND

There are basically two methods for building a recommendation engine:


Content Filtering: This uses entity descriptions (metadata) or side information. It goes beyond the
fact that you watched a film: it examines the features of both you and the film to determine whether
there is a match based on the larger categories represented by those features. For example, if you
are a woman who likes action films, the recommender will search for suggestions at the intersection
of these two categories.

Collaborative Filtering: This filters a user's preferences by gathering (collaborating on) preferences
from many other users. It calculates a similarity measure for the target items (Euclidean distance,
cosine distance or another metric, depending on the algorithm) and picks those at minimum distance.
Ratings are matched based on the similarity of the movies or products used in the past, so
recommendations can be based on items liked by people similar to you or on items similar to those
you like.

Matrix factorization (SVD) and KNN, both under collaborative filtering, are effective algorithms for
creating a recommender system. Matrix factorization approximates the matrix of ratings
$R_{m \times n}$ by the product of two lower-dimensional matrices, $P_{k \times m}$ and $Q_{k \times n}$, such that

$$R \approx P^{T} Q$$

For example, if $p_u$ is the u-th column of P and $q_v$ is the v-th column of Q, then
the rating given by user u to item v is predicted as $p_u^{T} q_v$.

P and Q are usually obtained by solving the following optimization problem:

$$\min_{P,Q} \sum_{(u,v) \in R} \left[ f(p_u, q_v; r_{u,v}) + \mu_P \lVert p_u \rVert_1 + \mu_Q \lVert q_v \rVert_1 + \frac{\lambda_P}{2} \lVert p_u \rVert_2^2 + \frac{\lambda_Q}{2} \lVert q_v \rVert_2^2 \right]$$

where the (u,v) are the locations of the observed entries in R, $r_{u,v}$ is the observed rating, f is the loss
function, and $\mu_P$, $\mu_Q$, $\lambda_P$, $\lambda_Q$ are the usual penalization parameters used by many algorithms to avoid
overfitting.

Model training is the process of solving for the matrices P and Q, and the choice of penalization
parameters is the tuning of hyperparameters. After obtaining P and Q, we can estimate any rating as:

$$\hat{R}_{u,v} = p_u^{T} q_v$$
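
The factorization described above can be sketched with plain stochastic gradient descent. This is a minimal illustration, not the algorithm PROC RECOMMEND runs: it assumes a squared-error loss for f, uses only the L2 penalties (one shared `lam`, with the L1 terms set to zero), and the tiny ratings list, the function names and all parameter values are made up. The factors are stored one row per user and per item, i.e. as the transposes of P and Q.

```python
import random
from math import sqrt


def factorize(ratings, n_users, n_items, k=2, lam=0.05, lr=0.02, epochs=2000, seed=0):
    """Fit user factors P and item factors Q by SGD on observed (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, v, r in ratings:
            err = r - sum(P[u][f] * Q[v][f] for f in range(k))
            for f in range(k):
                pu, qv = P[u][f], Q[v][f]
                # Gradient step on squared error plus the L2 penalty
                P[u][f] += lr * (err * qv - lam * pu)
                Q[v][f] += lr * (err * pu - lam * qv)
    return P, Q


def predict(P, Q, u, v):
    """Predicted rating: inner product of user factor p_u and item factor q_v."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[v]))


# Hypothetical observed entries of a 3 x 3 rating matrix (zero-based ids)
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)

train_rmse = sqrt(sum((r - predict(P, Q, u, v)) ** 2 for u, v, r in ratings) / len(ratings))
print("training RMSE:", round(train_rmse, 3))
print("predicted rating of user 0 for unseen item 2:", round(predict(P, Q, 0, 2), 2))
```

The last line fills in an entry that was never observed, which is exactly what the recommender needs from the factorization.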

Note:
• Pearson's correlation (modelling based on correlation): The recommendation engine's purpose
is to help users find new movies to watch, given their preferences (as seen in the ratings). The
algorithm used is quite simple: it computes the Pearson correlation coefficient between the
rating vectors of each pair of films (i.e., find every user who rated both film A and film B, and
correlate the associated ratings). The coefficient ranges from -1 to 1; the closer to 1, the
stronger the positive correlation.
• The underlying assumption of collaborative filtering is that if a person X shares the opinions
of a person Y, then the recommendations for X should be based on the preferences of person Y
(similarity).
• Matrix factorization algorithms work by decomposing the user-item interaction matrix into
the product of two lower-dimensional rectangular matrices.
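
The Pearson step in the first note can be sketched as follows: restrict both films to the users who rated both, then correlate the two rating vectors. The function name and the two small rating dictionaries are hypothetical.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two films over the users who rated both.

    Each argument maps user_id -> rating. Returns None when fewer than two
    users rated both films or when either rating vector has zero variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings: users 1, 2 and 3 rated both films
film_a = {1: 5, 2: 4, 3: 1, 4: 2}
film_b = {1: 4, 2: 5, 3: 2, 5: 3}

similarity = pearson(film_a, film_b)
print(round(similarity, 4))  # about 0.8386
```

Films with a coefficient close to 1 are rated alike by the same users, so one can be recommended to people who liked the other.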
Methodology
Several algorithms and data transformations were applied in order to achieve the lowest RMSE:

• Generalized linear modelling
• Collaborative filtering using PROC RECOMMEND
  o Memory-based algorithms
    - Slope one (slope1)
    - K nearest neighbours (knn)
  o Model-based algorithms
    - Matrix factorization (svd)
  o Mixture of different methods
    - Ensemble (ensemble)

Method Evaluation: This was done by comparing the RMSE of the methods. Root mean square
error (RMSE) measures the amount by which a model's predicted values differ from the observed
values (typically outside the sample from which the model was estimated).

• The error in the predicted ratings is measured as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{Predicted}_i - \mathrm{Actual}_i \right)^2}$$

o Here, Predicted is the rating predicted by the model, Actual is the original rating and N is the
number of ratings
o If a consumer has given a rating of 5 to a film and we have predicted a rating of 4, then the
RMSE for that single rating is 1
o The lower the RMSE value, the better the recommendations

This metric tells us how accurate our predicted ratings are but does not consider the order of the
recommendations, i.e. it does not indicate which product to rank higher.
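
The RMSE calculation can be written in a few lines. The function name `rmse` and the sample ratings are assumptions made for illustration; the first call reproduces the single-rating example above.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# One rating: predicted 4, actual 5
single = rmse([4], [5])
print(single)  # 1.0

# Three ratings, two of them off by one
several = rmse([4, 3, 5], [5, 3, 4])
print(round(several, 4))  # about 0.8165
```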

PROC RECOMMEND
For collaborative filtering we use PROC RECOMMEND. PROC RECOMMEND takes explicit user
ratings for items as input and outputs individual user recommendations.

PROC RECOMMEND supports the following methods:


Memory-based algorithms
- Slope one (slope1)
- K nearest neighbors (KNN)
Model-based algorithms
- Matrix factorization (svd)
Market basket analysis
- Association rule (arm)
Mixture of different methods
- Clustering (cluster)
- Ensemble (ensemble)

Key Requirement: Generally, R and Python are the preferred languages for modelling a
recommendation engine. In SAS, we use PROC RECOMMEND. To use it, we need SAS Visual
Analytics (along with SAS Studio) installed so that we can run the LASR Analytic Server. This was
done as per the requirement of the project.

The recommender system's main task is to predict unknown entries in the rating matrix based on
observed values.

Data Exploration & Observation


• MovieLens is stored as a sparse matrix, i.e. a matrix that saves space by not storing the
missing (zero) entries. When necessary, it can be converted into a standard dense matrix for
specific statistics.
• Ratings range from 1 to 5, and positive ratings outnumber negative ones. This often happens
with rating data: it is skewed in favour of positive values because users tend to buy or watch
what they expect to like. Negative ratings arise mainly from disappointment, when
expectations are not met.
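
The two observations above can be illustrated on a toy example: a sparse store that keeps only the observed ratings, its conversion to a dense matrix, and the distribution of rating values. The dictionary layout and the values are hypothetical and are not the MovieLens data.

```python
from collections import Counter

# Sparse storage: only observed (user_id, movie_id) -> rating entries are kept
ratings = {(1, 10): 5, (1, 20): 3, (2, 10): 4, (3, 30): 2}

users = sorted({user for user, _ in ratings})
movies = sorted({movie for _, movie in ratings})

# Share of user-movie pairs that actually carry a rating
density = len(ratings) / (len(users) * len(movies))

# Dense version: one row per user, one column per movie, 0 for "not rated"
dense = [[ratings.get((user, movie), 0) for movie in movies] for user in users]

# How often each rating value occurs
distribution = Counter(ratings.values())

print("density:", round(density, 3))
print(dense)
print(sorted(distribution.items()))
```

On real rating data the density is typically very low, which is why the sparse form is kept and the dense form is built only when a specific statistic needs it.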

Using the IMSTAT procedure: Prior to invoking the RECOMMEND procedure, we can print a part
of the tables included in the recommender system by invoking the IMSTAT procedure (code in the
Appendix).
The first FETCH statement prints a portion of the MovieRating table. The table contains the ratings
given by users (customers) to items (movies).
Sample data Snapshot
Subsequent FETCH statements print a portion of the MovieProfile and UserProfile tables, which
contain information about each movie and user.

Movie Profile Table Snapshot


User Profile Table Snapshot

Results & Observation

a) Generalized linear model

The GLM model (Root MSE 1.089953) is evaluated with the same metric as the matrix factorization
(SVD) model, so the two can be compared directly. Within the GLM, more weight is placed on the
features with the largest standardized coefficient magnitudes (output: Standardized Coefficient
Magnitudes).

However, its RMSE of 1.089953 is not as low as that of the matrix factorization model.

b) Collaborative filtering using Proc Recommend


We used two different methods of generating recommendations from explicit rating data. The
recommender system is initiated with the PROC RECOMMEND statement. Each METHOD statement
specifies a model to be built, and different options are available for the different methods.
CODE Explanation (Full code in Appendix)
The K= option specifies the number of neighbours used by the KNN method. The option
SIMILARITY=PC specifies that the Pearson correlation is used as the similarity measure between
two objects. The option FACTORS=20 sets the number of latent factors for matrix factorisation.
In the ENSEMBLE and SVD methods, a certain number of ratings are held out for validation by
specifying the options WITHHOLD= and HOLD=. The WITHHOLD= option defines the proportion of
users from whom the number of ratings specified by the HOLD= option is held out. Ten per cent of
users are selected in this case.
In the PREDICT statement, the NUM= option specifies the number of recommendations to be generated
for each user listed in the USERS= option. The recommender system can contain several different
methods, each given a name with the LABEL= option of the METHOD statement. In the PREDICT
statement, you specify the same name with the LABEL= option to use that model.
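
The idea behind WITHHOLD=0.1 and HOLD=1 can be sketched in Python: pick ten per cent of the users and move one rating from each of them into a validation set. This is only an illustration of the concept under stated assumptions; the function `withhold_split`, its parameter names and the toy ratings are made up, and the sketch does not claim to reproduce how SAS selects users or ratings internally.

```python
import random


def withhold_split(ratings, withhold=0.1, hold=1, seed=1234):
    """Hold out `hold` ratings from a `withhold` share of users for validation."""
    rng = random.Random(seed)
    by_user = {}
    for row in ratings:  # row = (user_id, movie_id, rating)
        by_user.setdefault(row[0], []).append(row)

    users = sorted(by_user)
    n_selected = max(1, round(withhold * len(users)))
    selected = set(rng.sample(users, n_selected))

    train, holdout = [], []
    for user in users:
        rows = by_user[user]
        # Only hold out ratings from users who keep at least one rating for training
        if user in selected and len(rows) > hold:
            held = rng.sample(rows, hold)
            holdout.extend(held)
            train.extend(row for row in rows if row not in held)
        else:
            train.extend(rows)
    return train, holdout


# Hypothetical data: 20 users, each rating movies 1 to 3
ratings = [(u, m, (u + m) % 5 + 1) for u in range(1, 21) for m in range(1, 4)]
train, holdout = withhold_split(ratings)
print(len(train), "training ratings,", len(holdout), "held-out ratings")
```

The model is then fitted on the training ratings and its RMSE is reported separately for the held-out ratings.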

The ADD statement adds a recommender system, MovieLens, to the SAS LASR Analytic Server.

Three ADDTABLE statements add the MovieRating, MovieProfile and UserProfile tables to the
recommender system.

Each METHOD statement adds a method for calculating recommendations to the recommender system. With
options specified for each method, the methods KNN, SLOPE1, SVD and ENSEMBLE were added. The PREDICT
statement generates five recommendations for each specified user (1, 33, 478 and 2035).

Surprisingly, the lowest RMSE (0.7705) was achieved by the KNN method, a memory-based
collaborative filtering method.
An even lower RMSE might be achieved with more hyperparameter tuning, although for KNN
models the gain is unlikely to be large.
It seems that the most important features were:
- the number of users who rated the film
- the age of the film (associated with more ratings and a higher mean rating)
- the film id, and
- Drama (the genre with the most ratings)
The other features did not have a low p-value and did not improve the performance of the model;
they only overfit it.

Model Performance
It is important to examine model performance before using a model to generate
recommendations. PROC RECOMMEND provides ODS tables that display the performance of
the models, so that performance can be judged on various criteria. For the KNN method, the
ENSEMBLE method is used to compute the numerical performance on the training and
validation data sets. This is done by specifying the option METHODS=("knn") together with
the CONSTRAINT option. The numeric evaluation for KNN is given below:

Table 1: Movie Lens - Numerical Evaluations using Method KNN


The performance of the SVD model is evaluated directly within the method. The example code
shows how to save the numeric performance table as a SAS table using the ODS OUTPUT
statement, which can then be used for visual examination. Figure 1 shows the root mean
square error and the mean absolute error on the training and validation data sets as a
function of the SVD iterations.

A model that does not perform well according to the selected numerical criteria might not be
used. In this example, the root mean square error of the KNN method (RMSE: 0.7705) is
lower than that of the SVD method (RMSE: 0.9151) on the holdout sample. Because of these
better results, the example code uses the KNN method with the PREDICT statement to score
and produce recommendations.

Figure 1(a): Root Mean Square Error for SVD

Figure 1(b): Mean Absolute Error for SVD

The PREDICT statement produces a recommendations table for each user.

Output from the METHOD Statement Using the ENSEMBLE Option

Ensemble optimal coefficients for Recommender System

Table for recommendations using the KNN method.

Conclusion
Based on the results, collaborative filtering with the KNN method performs best.

RMSE for collaborative filtering methods:

Method                      RMSE
KNN method                  0.77
Matrix Factorization (SVD)  0.91

Here, the KNN method has a lower root mean square error (RMSE: 0.7705) than the SVD
method (RMSE: 0.9151) on the holdout sample. Because of this, the KNN method has been
used for scoring and generating recommendations with the PREDICT statement.
Future Work Scope
With web scraping we could add more dimensions to our dataset, such as:

- budget of the movie
- critics' rating, and
- duration of the movie
and compare the resulting RMSEs.
Besides these methods one can also look at
• H2O AutoML model
• H2O GLM model
• H2O GBM model
• H2O Random Forest model
• H2O Ensemble (trees or clustering) model
As per the extensive research conducted on the MovieLens datasets, the above methods have
more or less satisfactory prediction and recommendation accuracy with low RMSE.
Problem of cold start: A recommendation engine based on collaborative filtering cannot
predict ratings for new movies, which have no rating data, whereas content-based filtering
can. Therefore, there is a need for a hybrid method.
Using the bigger MovieLens data with metadata is a further option.
References:
• Mueller, John, and Luca Massaron. Machine Learning for Dummies. John Wiley & Sons, Inc., 2016.

• https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-recommendation-engine-python/

• http://rpubs.com/Papacosmas/harvard

• http://rpubs.com/xinmeiz/movielens2

• https://rpubs.com/mrmelchi/492432

• https://blog.dataiku.com/2012/09/10/a-simple-recommendation-engine-implemented-in-different-languages

• https://pages.dataiku.com/hubfs/Guidebooks/Recommendation%20Engines/recommendation_engine_guidebook.pdf

• http://support.sas.com/documentation/cdl/en/inmsref/67597/HTML/default/viewer.htm#p1ikis8zzd3iszn1t1hsxun24qe3.htm

• https://documentation.sas.com/?docsetId=inmsref&docsetTarget=n0em119do6om6en1euhs41lem60p.htm&docsetVersion=2.82&locale=en

• https://support.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2095-2018.pdf

• https://muffynomster.wordpress.com/2015/06/07/building-a-movie-recommendation-engine-with-r/

• https://www.r-bloggers.com/unleash-the-potential-of-recommender-systems/
APPENDIX - CODE:

For Method 2: Collaborative Filtering Using PROC RECOMMEND

Before invoking the RECOMMEND procedure, we can print a part of the tables that are included in the
recommender system by invoking the IMSTAT procedure.
proc imstat;
table MYlasr.movierating;
fetch/format;
run;

table MYlasr.movieprofile;
fetch/format;
run;

table MYlasr.userprofile;
fetch/format;
run;
quit;
The first FETCH statement prints a portion of the MovieRating table. The table contains rating information made by
users (customers) about items (movies).

/* Start a LASR Analytic Server */


options set=GRIDHOST="grid001.example.com";
options set=GRIDINSTALLLOC="/opt/TKGrid";
%Let portNumber = 10010;

libname hdfs sashdat path="/hps";


proc lasr create path="/tmp" port=&portNumber noclass;
performance nodes=all;
run;

/* Load rating table into memory */


proc lasr add port=&portNumber data = hdfs.ratings;
run;

/* Load movies table into memory */


proc lasr add port=&portNumber data = hdfs.movies;
run;

/* Assign a libref to access tables in the server */


libname lasr sasiola port = &portNumber tag="hps";

/* Invoke PROC RECOMMEND */


proc recommend port=&portNumber recom = rs.movie;
/* Add a new recommendation project */
add rs.movie / item = movieid user = userid rating = rating;
/* Add tables */
addtable lasr.ratings / recom = rs.movie type = rating vars = (movieid userid rating);
addtable lasr.movies / recom = rs.movie type = item;
run;

/* Method -- Develop KNN model with neighbors 20 */


method knn / label="knn"
k=20
positive
similarity=pc
seed=1234;
/* Method -- Use ensemble to evaluate KNN for training and validation*/
method ensemble /label="knn_eva"
methods=("knn")
withhold=0.1
hold=1
details
constraint
seed=1234
fconv=1e-3
gconv=1e-3
maxiter=100
;
run;
/* Method -- Develop and evaluate a svd LBFGS with 20 factors for training and validation */
method svd / label="svd"
factors=20
fconv=1e-3
gconv=1e-3
maxiter=100
seed=1234
MAXFEVAL=5000
function=L2
lamda=0.2
technique=lbfgs
withhold=0.1
hold=1
details
;
ods output RecommenderFuncEvalInfo = movie_funcEval_svd;
run;

/* Score and make recommendations with KNN */


predict /label = "knn"
method = knn
Num = 5
users = ("1","33");
run;

remove rs.movie;
run;

quit;

/* Plot the numeric results of svd */


proc sgplot data = movie_funcEval_svd;
title "Movie Lens - Matrix Factorization Model";
title2 "Root Mean Square Error";
series x=NumFunc y=RMSE /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="RMSE for Training Data";
series x=NumFunc y=RMSEhold /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="RMSE for Hold Out";
xaxis label = 'Number of Iterations' type = discrete grid values = (0 to 100 by 5);
yaxis label = 'Root Mean Square Error';
run;

proc sgplot data = movie_funcEval_svd;


title "Movie Lens - Matrix Factorization Model";
title2 "Mean Absolute Error";
series x=NumFunc y=MAE /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="MAE for Training Data";
series x=NumFunc y=MAEhold /MARKERS LINEATTRS = (THICKNESS = 2) legendlabel ="MAE for Hold Out";
xaxis label = 'Number of Iterations' type = discrete grid values = (0 to 100 by 5) ;
yaxis label = 'Mean Absolute Error';
run;

/* Stop the LASR Analytic Server */


proc lasr term port = &portNumber;
run;

To view the rating history for users 1 and 33 that were used for generating recommendations,
you can use the following example code.
/* Continue to use the lasr libref from the previous code example */
/* Use SCHEMA to join the ratings and movies tables. */
proc imstat data=lasr.ratings;
schema movies (movieid=movieid);
run;

/* Find the number of ratings for users 1 and 33 and view them. */
table lasr.&_templast_;
where userid in (1 33);
numrows / save=numrowstab;
run;
store numrowstab(_last_, 1) = obscount;
run;
fetch / format orderby=(userid rating movies_genres)
descending=(rating) to=&obscount.;
run;
quit;
