Programming Questions
Recommendation System
Automatically adapts to the experience and knowledge level
of the user answering the questions, recommending new
questions based on how correctly past questions were answered
Used in
high-level mathematical functions for arrays
distributed computation framework
machine learning
data manipulation and analysis
Analytics Vidhya dataset for Recommendation Engines
problem_data.csv: Problem ID, Level, Points, Tags, …
user_data.csv: User ID, Submission Count, Problems Solved, Contribution, …
train_submissions.csv: Problem ID, User ID, Attempts
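One way to read and combine the three files is sketched below. It assumes comma-separated files with a header row; the column names (`problem_id`, `user_id`, `level`, `points`, `attempts`, and so on) and the helper names `read_rows` and `join_submissions` are hypothetical, and the rows are toy values used only to make the sketch runnable.

```python
import csv
import io

def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))

def join_submissions(submissions, problems, users):
    """Attach the problem and user attributes to every submission row."""
    problems_by_id = {row["problem_id"]: row for row in problems}
    users_by_id = {row["user_id"]: row for row in users}
    joined = []
    for sub in submissions:
        merged = dict(sub)
        merged.update(problems_by_id.get(sub["problem_id"], {}))
        merged.update(users_by_id.get(sub["user_id"], {}))
        joined.append(merged)
    return joined

# Toy rows (assumed layout); real files would be opened with open(path, newline="").
problems = read_rows(io.StringIO("problem_id,level,points\np1,A,500\np2,B,1000\n"))
users = read_rows(io.StringIO("user_id,submission_count,problems_solved\nu1,12,7\n"))
submissions = read_rows(io.StringIO("user_id,problem_id,attempts\nu1,p1,2\nu1,p2,1\n"))

table = join_submissions(submissions, problems, users)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns would need an explicit conversion before modeling.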
Workflow
Read the input files
Split the data into training data and test data
Train the models
Test the models
Evaluate the performance
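The split step of the workflow could look like the following sketch. The slides do not state the split ratio, so the `test_fraction` value, the fixed seed, and the helper name `split_rows` are assumptions; the rows are generated toy triples of (user, problem, attempts).

```python
import random

def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy (user, problem, attempts) triples for illustration only.
rows = [("u%d" % (i % 5), "p%d" % i, 1 + i % 3) for i in range(20)]
train, test = split_rows(rows, test_fraction=0.25)
```

Each model is then fitted on `train` only and scored on `test`, so the evaluation reflects submissions the model has not seen.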
Popularity Model
• Ranks an item according to its overall popularity
• Item is scored by the number of times it is seen in the training set
• Provides a reasonable baseline
• Simple
• Fast
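The scoring rule above (count how often each item appears in the training set) can be sketched in a few lines. This is an illustration of the idea, not the GraphLab Create implementation; `fit_popularity` and `recommend_popular` are hypothetical names and the training pairs are toy data.

```python
from collections import Counter

def fit_popularity(train):
    """Count how often each problem appears in the training submissions."""
    return Counter(problem for _user, problem in train)

def recommend_popular(counts, seen, k=3):
    """Return the k most frequent problems the user has not attempted yet."""
    ranked = [p for p, _n in counts.most_common() if p not in seen]
    return ranked[:k]

# Toy (user, problem) pairs for illustration only.
train = [("u1", "p1"), ("u2", "p1"), ("u3", "p1"),
         ("u1", "p2"), ("u2", "p2"), ("u3", "p3")]
counts = fit_popularity(train)
top = recommend_popular(counts, seen={"p1"}, k=2)
```

Because the ranking is the same for every user apart from the already-seen filter, the model needs a single pass over the data, which is why it serves as a fast baseline.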
Factorization Model
• Predicts a score for each possible combination of users and items
• Most powerful model in the GraphLab Create recommender toolkit
• Users and items are represented by weights (the user's bias towards an item) and factors (which model the connections between items; e.g. if a user likes horror movies, the model needs to know which movies are horror movies)
• Learning is based on stochastic gradient descent (SGD)
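To make the weights, factors, and SGD bullets concrete, here is a minimal sketch of a biased matrix factorization trained with SGD. It assumes the common form score = global mean + user weight + item weight + dot product of the factor vectors; the hyperparameters, the function name `train_factorization`, and the toy triples are all assumptions, and the GraphLab Create model has more options than shown here.

```python
import random

def train_factorization(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit mu + b_user + b_item + <p_user, q_item> to (user, item, value) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _i, _r in ratings})
    items = sorted({i for _u, i, _r in ratings})
    mu = sum(r for _u, _i, r in ratings) / len(ratings)
    b_user = {u: 0.0 for u in users}          # per-user weight (bias)
    b_item = {i: 0.0 for i in items}          # per-item weight (bias)
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        dot = sum(a * b for a, b in zip(p[u], q[i]))
        return mu + b_user[u] + b_item[i] + dot

    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(u, i)
            # Gradient step on the squared error with L2 regularization.
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return predict

# Toy (user, problem, attempts) triples for illustration only.
ratings = [("u1", "p1", 1), ("u1", "p2", 3), ("u2", "p1", 2),
           ("u2", "p2", 4), ("u3", "p1", 1), ("u3", "p3", 5)]
predict = train_factorization(ratings)
mse = sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)
```

The returned `predict` function can score any (user, item) pair seen during training, including pairs that never occurred together, which is what "a score for each possible combination" refers to.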
Item Similarity Model
• Computes the similarity between items using the observations of users who have interacted with both items (that is, the connections between items are modeled differently)
• Three similarity metrics are available: Jaccard, Cosine and Pearson
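The three metrics can be illustrated with textbook definitions, as in the sketch below. Each item is represented as a mapping from user to a value (here, attempts); the exact variants used inside GraphLab Create may differ in detail, and the toy items `p1` and `p2` are assumptions.

```python
import math

def jaccard(a, b):
    """Size of the intersection of the two user sets divided by the size of their union."""
    users_a, users_b = set(a), set(b)
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def cosine(a, b):
    """Cosine of the two items' user-value vectors (missing users count as 0)."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def pearson(a, b):
    """Pearson correlation of the values from users who interacted with both items."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[u] for u in common) / len(common)
    mean_b = sum(b[u] for u in common) / len(common)
    cov = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    var_a = sum((a[u] - mean_a) ** 2 for u in common)
    var_b = sum((b[u] - mean_b) ** 2 for u in common)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0

# Toy data: item -> {user: attempts}, for illustration only.
p1 = {"u1": 1, "u2": 2, "u3": 3}
p2 = {"u1": 2, "u2": 4, "u4": 1}
```

Jaccard ignores the values and only looks at which users overlap, while Cosine and Pearson use the values themselves, so the choice depends on whether the target column carries useful signal.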
Item Counter Model
• Computes the similarity between items using the content of each item
• The similarity is computed as a weighted average over the columns of the dataset
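One reading of the column-wise weighted average is: compute a similarity per column, then average those values with per-column weights. The sketch below follows that reading; the per-column similarity rules, the weights, the column names, and the function names are all assumptions made for illustration.

```python
def column_similarity(x, y):
    """Similarity of two values from one column, in [0, 1]."""
    if isinstance(x, (set, frozenset)):          # tag-like column: Jaccard overlap
        union = x | y
        return len(x & y) / len(union) if union else 1.0
    if isinstance(x, (int, float)):              # numeric column: scaled closeness
        biggest = max(abs(x), abs(y))
        return max(0.0, 1.0 - abs(x - y) / biggest) if biggest else 1.0
    return 1.0 if x == y else 0.0                # categorical column: exact match

def content_similarity(item_a, item_b, weights):
    """Weighted average of the per-column similarities of two items."""
    total = sum(weights.values())
    return sum(w * column_similarity(item_a[c], item_b[c]) for c, w in weights.items()) / total

# Toy problems with hypothetical columns and weights.
a = {"level": "A", "points": 500, "tags": {"math", "greedy"}}
b = {"level": "A", "points": 1000, "tags": {"math"}}
weights = {"level": 1.0, "points": 1.0, "tags": 2.0}
score = content_similarity(a, b, weights)
```

Unlike the Item Similarity Model, this score needs no user interactions at all, so it can rank problems that nobody has attempted yet.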
Results
• Popularity Model – RRMSE 20%
• Item Similarity Model – RRMSE < 40%
• Item Counter Model – RRMSE < 40%
• Popularity Model is the most accurate on the given dataset
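The slides do not define RRMSE. A common definition of relative RMSE divides the RMSE by the mean of the observed values and reports it as a percentage; the sketch below assumes that definition, which may not match the one used for the numbers above. The values are toy data.

```python
import math

def rrmse(actual, predicted):
    """RMSE divided by the mean of the actual values, as a percentage (assumed definition)."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return 100.0 * rmse / (sum(actual) / n)

# Toy attempts and predictions for illustration only.
actual = [1, 2, 3, 4, 5]
predicted = [1, 2, 3, 4, 3]
value = rrmse(actual, predicted)
```

Normalizing by the mean makes the error comparable across targets with different scales, with lower values indicating a better fit.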
Conclusions
• Deep learning approaches are worth researching
• Many aspects can still be improved
• We also implemented a minimal GUI in Jupyter Notebook