
Spotify Playlist Recommendation System

A Project Report

Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF ENGINEERING
IN

Computer Science Engineering

(Spl. in Big Data Analytics)

Submitted by:
Harsh Negi

University Roll Number:


20BCS3935

Under the Supervision of:


Mr. Manvinder Singh

CHANDIGARH UNIVERSITY, GHARUAN, MOHALI - 140413, PUNJAB

Nov 2022

Acknowledgement

The project work in this report is the outcome of continuous work over a period of time and drew intellectual support from various sources. I would like to express my profound gratitude and indebtedness to the people who helped me complete this project, and I take this opportunity to thank everyone who extended their wholehearted cooperation.

I am thankful to my supervisor Mr. Manvinder Singh and co-supervisor Mrs. Shivani for assisting me in making the project successful. I would also like to thank my fellow students for guiding and encouraging me throughout the duration of the project.

Harsh Negi
20BCS3935

List of Figures

• Fig 1 – Waterfall Model
• Fig 2 – Spotify Profile
• Fig 3 – Spotify Playlist
• Fig 4 – Spotify Developers Dashboard
• Fig 5 – Authentication with Spotify
• Fig 6 – Reading Playlist Data
• Fig 7 – Playlist Information
• Fig 8 – Visualization of the Components
• Fig 9 – Fitting Dataset for Analysis
• Fig 10 – Using K Neighbors & Random Forest Classifier
• Fig 11 – Using Decision Tree
• Fig 12 – Generating Songs from Spotify
• Fig 13 – Creating Songs List and Checking the Data
• Fig 14 – Creating Playlist on Spotify
• Fig 15 – Spotify Playlist
• Fig 16 – Inserting Songs into Playlist
• Fig 17 – Final Output

List of Abbreviations

• KNN – K Nearest Neighbors


• ML – Machine Learning
• PCA – Principal Component Analysis
• API – Application Programming Interface

ABSTRACT

The project “Spotify Playlist Recommendation System” is a recommendation program designed to create a new Spotify playlist from a user’s existing playlist, several attributes of whose track list are analysed using data-science algorithms.

The project uses techniques such as the Random Forest Classifier and Decision Trees to generate recommendations, combining content-based and collaborative filtering approaches. It enables users to generate a playlist based on a previous playlist they curated to their own liking.

The project is written in the Python programming language with libraries such as pandas and scikit-learn, and it uses the Spotify Web API through the Spotify for Developers dashboard.

CONTENTS

Title Page
Acknowledgement
List of Figures
List of Abbreviations
ABSTRACT
CONTENTS
CHAPTER 1: INTRODUCTION
1.1 Problem Definition
1.2 Project Overview
1.3 Project Specifications
1.4 Literature Review
CHAPTER 2: THEORY
2.1 Machine Learning
2.2 KNN
2.3 Random Forest Classifier
2.4 Decision Trees
2.5 Recommendation System
2.6 Music Recommendation System
2.7 Application Programming Interface
2.8 Principal Component Analysis
CHAPTER 3: METHODOLOGY ADOPTED
3.1 Software Development Life Cycle
3.2 Python
3.3 Python Modules
3.4 Jupyter Notebook
3.5 Version Control System
3.6 Documentation
CHAPTER 4: RESULTS AND DISCUSSION
CHAPTER 5: CONCLUSIONS AND FUTURE SCOPE OF STUDY
REFERENCES

CHAPTER 1: INTRODUCTION

1.1 Problem Definition

Many music listeners have turned to online music. Big data technology has made it possible for listeners to access music whenever they want, and subscription-based music streaming has become increasingly popular in the era of cloud computing. Advances in cloud technology give users access to a practically unlimited number of songs.
Streaming companies such as Spotify, Pandora, and YouTube offer their paid members access to vast catalogues of songs, and the playlist is a special function of these streaming apps. Many users find it difficult to build a playlist from such a long catalogue, so they tend to play the next song in shuffle mode or rely on recommendations.

1.2 Project Overview

The project uses techniques such as the Random Forest Classifier and Decision Trees to generate recommendations, combining content-based and collaborative filtering approaches. It enables users to generate a playlist based on a previous playlist they curated to their own liking.
The project takes Spotify API (Spotify for Developers) credentials created by the user to authenticate with Spotify and process the data locally. The user provides their username and a playlist together with their song ratings. After the analysis, the program generates a number of songs and adds them to a newly created Spotify playlist that is directly inspired by the given playlist.

1.3 Project Specifications

• Programming Language – Python
• Python Libraries – pandas, Spotipy, scikit-learn, NumPy, seaborn, matplotlib
• IDE – Jupyter Notebook
• Documentation – Microsoft Word
• Version Control – Git

1.4 Literature Review

Over the years, recommender systems have been studied widely and are
divided into different categories according to the approach being used. The
categories are collaborative filtering (CF), content based and context based.

Collaborative filtering
Collaborative filtering uses the numerical reviews given by users and is mainly based upon the historical data of the user available to the system. The historical data helps to build the user profile, and the data available about the item is used to build the item profile; both profiles are used to make recommendations. The Netflix Prize competition gave much popularity to collaborative filtering. It is considered the most basic and the easiest method for generating recommendations and making predictions about the sales of a product, but it has some disadvantages, which have led to the development of new methods and techniques.

Content Based Recommender System


Content-based systems focus on the features of the products and aim at creating a user profile from previous reviews, as well as a profile of the item according to the features it provides and the reviews it has received. It is observed that users’ reviews usually contain a feature of the product followed by their opinion about that feature. Content-based recommendation systems help overcome the sparsity problem faced by collaborative filtering-based recommendation systems.

Context Based Recommender System


Extending the user/item convention to the circumstances of the user, so as to incorporate contextual information, is what context-based recommender systems achieve.
Recommender systems are proving to be a useful tool for addressing part of the information-overload phenomenon on the internet. The first generation of recommender systems used conventional websites to gather information from the following sources:
(a) content-based data,
(b) demographic data, and
(c) memory-based data.
CHAPTER 2: THEORY

Machine Learning

A machine learning model is the output of the training process and is defined as the mathematical representation of a real-world process. Machine learning algorithms find patterns in the training dataset, which are used to approximate the target function responsible for mapping the inputs to the outputs in the available data. These methods depend upon the type of task and are classified as classification models, regression models, clustering, dimensionality reduction, Principal Component Analysis, and so on. Like any data-driven approach, machine learning needs a good flow of organized, varied data for a robust solution. In today’s online-first world, companies have access to a large amount of data about their customers, often millions of records. This data, which is large both in the number of data points and in the number of fields, is known as big data due to the sheer amount of information it holds.

KNN

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-


parametric supervised learning method first developed by Evelyn
Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is
used for classification and regression. In both cases, the input consists of
the k closest training examples in a data set. The output depends on
whether k-NN is used for classification or regression:

• In k-NN classification, the output is a class membership. An object is


classified by a plurality vote of its neighbors, with the object being
assigned to the class most common among its k nearest neighbors (k is
a positive integer, typically small). If k = 1, then the object is simply
assigned to the class of that single nearest neighbor.

• In k-NN regression, the output is the property value for the object. This
value is the average of the values of k nearest neighbors.

k-NN is a form of instance-based (lazy) learning: the function is only approximated locally and all computation is deferred until evaluation. Since the algorithm relies on distances for classification, normalizing the training data can improve its accuracy dramatically when the features represent different physical units or come in vastly different scales.
Both for classification and regression, a useful technique can be to assign
weights to the contributions of the neighbors, so that the nearer neighbors
contribute more to the average than the more distant ones. For example, a
common weighting scheme consists in giving each neighbor a weight of 1/d,
where d is the distance to the neighbor.
The neighbors are taken from a set of objects for which the class (for k-NN
classification) or the object property value (for k-NN regression) is known. This
can be thought of as the training set for the algorithm, though no explicit
training step is required.
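
To make this concrete, the following is a minimal, illustrative k-NN sketch in scikit-learn; the two audio features and the toy ratings are assumptions made purely for demonstration and are not the project's actual data.

    # Toy example: classify tracks as "liked" (1) or "not liked" (0) from two
    # illustrative audio features; the numbers below are made up.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler

    X = np.array([[0.80, 120], [0.75, 125], [0.30, 80], [0.20, 75], [0.85, 130]])  # [danceability, tempo]
    y = np.array([1, 1, 0, 0, 1])  # user rating

    # Scaling matters because danceability and tempo are on very different scales.
    scaler = StandardScaler().fit(X)
    knn = KNeighborsClassifier(n_neighbors=3, weights="distance")  # 1/d-style weighting
    knn.fit(scaler.transform(X), y)

    print(knn.predict(scaler.transform([[0.78, 118]])))  # -> [1]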

Random Forest Classifier

Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees; for regression tasks, the mean prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set. Random forests generally outperform individual decision trees, although their accuracy is typically lower than that of gradient-boosted trees, and data characteristics can affect their performance.

The Random Forest classifier creates a set of decision trees from randomly selected subsets of the training set and then collects the votes from the different trees to decide the final prediction.
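
As an illustration only, a random forest can be trained and evaluated in scikit-learn as sketched below; the synthetic dataset stands in for real playlist features.

    # Illustrative random forest on a synthetic dataset (not the project's data).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=200, n_features=6, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    # Each of the 100 trees sees a bootstrap sample and random feature subsets;
    # the forest's classification is the majority vote of the trees.
    forest = RandomForestClassifier(n_estimators=100, random_state=42)
    forest.fit(X_train, y_train)
    print(forest.score(X_test, y_test))  # accuracy on held-out data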

Decision Trees

The decision tree is a powerful and popular tool for classification and prediction. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label.

A tree can be “learned” by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is complete when all records in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions. The construction of a decision tree classifier does not require any domain knowledge or parameter setting and is therefore appropriate for exploratory knowledge discovery. Decision trees can handle high-dimensional data and, in general, achieve good accuracy. Decision tree induction is a typical inductive approach to learning classification knowledge.
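
A minimal sketch of a single decision tree is given below, using scikit-learn's bundled Iris dataset purely as stand-in data; export_text prints the learned node tests and leaf class labels described above.

    # Illustrative decision tree on the bundled Iris dataset; export_text shows
    # the attribute tests at internal nodes and the class labels at the leaves.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
    print(export_text(tree, feature_names=list(iris.feature_names)))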

Recommendation System

Recommender systems usually make use of either or both collaborative


filtering and content-based filtering (also known as the personality-based
approach), as well as other systems such as knowledge-based systems.
Collaborative filtering approaches build a model from a user's past behavior
(items previously purchased or selected and/or numerical ratings given to
those items) as well as similar decisions made by other users. This model is
then used to predict items (or ratings for items) that the user may have an
interest in. Content-based filtering approaches utilize a series of discrete, pre-
tagged characteristics of an item in order to recommend additional items with
similar properties.

We can demonstrate the differences between collaborative and content-based


filtering by comparing two early music recommender systems
– Last.fm and Pandora Radio.
• Last.fm creates a "station" of recommended songs by observing what
bands and individual tracks the user has listened to on a regular basis
and comparing those against the listening behavior of other users.
Last.fm will play tracks that do not appear in the user's library, but are
often played by other users with similar interests. As this approach
leverages the behavior of users, it is an example of a collaborative
filtering technique.
• Pandora uses the properties of a song or artist (a subset of the 400
attributes provided by the Music Genome Project) to seed a "station" that
plays music with similar properties. User feedback is used to refine the
station's results, deemphasizing certain attributes when a user "dislikes"
a particular song and emphasizing other attributes when a user "likes" a
song. This is an example of a content-based approach.
11
Each type of system has its strengths and weaknesses. In the above example,
Last.fm requires a large amount of information about a user to make accurate
recommendations. This is an example of the cold start problem, and is
common in collaborative filtering systems. Whereas Pandora needs very little
information to start, it is far more limited in scope (for example, it can only make
recommendations that are similar to the original seed).
Recommender systems are a useful alternative to search algorithms since
they help users discover items they might not have found otherwise. Of note,
recommender systems are often implemented using search engines indexing
non-traditional data. Recommender systems have been the focus of several
granted patents.

There are three main approaches to building a recommendation system:

1. Collaborative filtering - Collaborative filtering is based on the assumption


that people who agreed in the past will agree in the future, and that they
will like similar kinds of items as they liked in the past. The system
generates recommendations using only information about rating profiles
for different users or items. By locating peer users/items with a rating
history similar to the current user or item, they generate
recommendations using this neighborhood. Collaborative filtering
methods are classified as memory-based and model-based. A well-
known example of memory-based approaches is the user-based
algorithm, while that of model-based approaches is Matrix Factorization.

A key advantage of the collaborative filtering approach is that it does not


rely on machine analyzable content and therefore it is capable of
accurately recommending complex items such as movies without
requiring an "understanding" of the item itself. Many algorithms have been used to measure user similarity or item similarity in recommender systems; a minimal sketch of the user-based idea appears after this list.

2. Content-based Filtering - Content-based filtering methods are based on


a description of the item and a profile of the user's preferences. These
methods are best suited to situations where there is known data on an
item (name, location, description, etc.), but not on the user. Content-
based recommenders treat recommendation as a user-specific
classification problem and learn a classifier for the user's likes and
dislikes based on an item's features.
12
In this system, keywords are used to describe the items, and a user
profile is built to indicate the type of item this user likes. It does not rely
on a user sign-in mechanism to generate this often-temporary profile. In
particular, various candidate items are compared with items previously
rated by the user, and the best-matching items are recommended.

3. Hybrid Recommendation Approach - Hybrid approaches can be


implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model. Several studies that empirically compare the performance of hybrid methods with pure collaborative and content-based methods have demonstrated that hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommender systems, such as cold start and the sparsity problem, as well as the knowledge engineering bottleneck in knowledge-based approaches.
Netflix is a good example of the use of hybrid recommender systems.
The website makes recommendations by comparing the watching and
searching habits of similar users (i.e., collaborative filtering) as well as
by offering movies that share characteristics with films that a user has
rated highly (content-based filtering).
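
To make the collaborative-filtering idea above concrete, the following minimal sketch applies memory-based (user-based) filtering to a tiny, made-up ratings matrix; it is illustrative only and not the project's code.

    # Illustrative user-based collaborative filtering on a made-up ratings matrix;
    # rows are users, columns are songs, 0 means "not yet rated".
    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    ratings = np.array([
        [5, 4, 0, 1],
        [4, 5, 1, 0],
        [1, 0, 5, 4],
    ])

    sim = cosine_similarity(ratings)   # user-user similarity matrix
    target = 0                         # recommend for the first user
    weights = sim[target].copy()
    weights[target] = 0                # ignore self-similarity

    # Predicted scores: similarity-weighted average of the other users' ratings.
    pred = weights @ ratings / weights.sum()
    unrated = ratings[target] == 0
    print(np.where(unrated, pred, -np.inf).argmax())  # index of the best unrated song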

Music Recommendation System

A music recommendation system is a recommendation engine that suggests songs to the user based on their interests. The world of music is so big that a person cannot explore and listen to all the songs they might like, so we build a model that assists a person in identifying songs they are likely to enjoy. Music can be characterised by its notes, i.e., the pitch and duration of sounds, and these determine the distinctive features of a song. Songs with similar features can be grouped together, and whenever the user listens to a song S, songs with features similar to S are recommended. A history of the songs the user has listened to is also maintained by genre, and a playlist of songs matching their interests is created.

Application Programming Interface

An application programming interface (API) is a way for two or more computer


programs to communicate with each other. It is a type of software interface,
offering a service to other pieces of software. A document or standard that
describes how to build or use such a connection or interface is called an API
specification. A computer system that meets this standard is said
to implement or expose an API. The term API may refer either to the
specification or to the implementation.

In contrast to a user interface, which connects a computer to a person, an


application programming interface connects computers or pieces of software
to each other. It is not intended to be used directly by a person (the end user)
other than a computer programmer who is incorporating it into the software.
An API is often made up of different parts which act as tools or services that
are available to the programmer. A program or a programmer that uses one of
these parts is said to call that portion of the API. The calls that make up the
API are also known as subroutines, methods, requests, or endpoints. An API
specification defines these calls, meaning that it explains how to use or
implement them.

One purpose of APIs is to hide the internal details of how a system works,
exposing only those parts a programmer will find useful and keeping them
consistent even if the internal details later change. An API may be custom-built
for a particular pair of systems, or it may be a shared standard
allowing interoperability among many systems.

Principal Component Analysis

Principal Component Analysis is a popular unsupervised learning technique for reducing the dimensionality of data. It increases interpretability while at the same time minimizing information loss. It helps to find the most significant features in a dataset and makes the data easy to plot in 2D and 3D. PCA finds a sequence of linear combinations of the original variables.

Each principal component is a direction (a straight line with a direction and magnitude) that captures as much of the remaining variance in the data as possible. Principal components are orthogonal (perpendicular) projections of the data onto a lower-dimensional space.

Applications of PCA in Machine Learning

• PCA is used to visualize multidimensional data.


• It is used to reduce the number of dimensions in healthcare data.
• PCA can help resize an image.
• It can be used in finance to analyze stock data and forecast returns.
• PCA helps to find patterns in the high-dimensional datasets.
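
For illustration, assuming a made-up matrix of track features rather than the project's data, PCA can be applied with scikit-learn as follows.

    # Illustrative PCA: project made-up 8-dimensional track features onto 2 components.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))  # 100 tracks, 8 made-up features

    X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X_scaled)

    print(X_2d.shape)                     # (100, 2) - ready for a 2D plot
    print(pca.explained_variance_ratio_)  # variance captured by each component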

CHAPTER 3: METHODOLOGY ADOPTED

Software Development Life Cycle – Waterfall Model

The waterfall model is a breakdown of project activities into


linear sequential phases, where each phase depends on the deliverables of
the previous one and corresponds to a specialization of tasks. The approach
is typical for certain areas of engineering design. In software development, it
tends to be among the less iterative and flexible approaches, as progress
flows in largely one direction ("downwards" like a waterfall) through the phases
of conception, initiation, analysis, design, construction, testing, deployment
and maintenance.

The waterfall development model originated in the manufacturing and construction industries, where highly structured physical environments meant that design changes became prohibitively expensive much sooner in the development process. When the model was first adopted for software development, there were no recognized alternatives for knowledge-based creative work.

The following phases are followed in order:

1. System and software requirements: captured in a product


requirements document

2. Analysis: resulting in models, schema, and business rules

3. Design: resulting in the software architecture

4. Coding: the development, proving, and integration of software

5. Testing: the systematic discovery and debugging of defects

6. Operations: the installation, migration, support, and maintenance of


complete systems

The waterfall model was selected as the SDLC model due to the following
reasons:

• Requirements were very well documented, clear and fixed.


• Technology was adequately understood.
• Simple and easy to understand and use.
• There were no ambiguous requirements.
• Easy to manage due to the rigidity of the model. Each phase has specific
deliverables and a review process.
• Clearly defined stages.
• Well-understood milestones; tasks are easy to arrange.

Fig 1 – Waterfall Model

Programming Language

Python

Python is a high-level, interpreted, general-purpose programming language.


Its design philosophy emphasizes code readability with the use of significant
indentation.

Python is dynamically-typed and garbage-collected. It supports multiple


programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries
included" language due to its comprehensive standard library.

Python's large standard library, commonly cited as one of its greatest


strengths, provides tools suited to many tasks. For Internet-facing
applications, many standard formats and protocols such as MIME and HTTP
are supported. It includes modules for creating graphical user interfaces,
connecting to relational databases, generating pseudorandom numbers,
arithmetic with arbitrary precision decimals, manipulating regular expressions,
and unit testing.

Most Python implementations (including CPython) include a read–eval–print


loop (REPL), permitting them to function as a command line interpreter for
which users enter statements sequentially and receive results immediately.
Python also comes with an integrated development environment (IDE) called
IDLE, which is more beginner-oriented. Other shells, including IDLE and
IPython, add further abilities such as improved autocompletion, session state
retention and syntax highlighting.

Python can serve as a scripting language for web applications, e.g., via
mod_wsgi for the Apache web server. With Web Server Gateway Interface, a
standard API has evolved to facilitate these applications. Web frameworks like
Django, Pylons, Pyramid, Turbo Gears, web2py, Tornado, Flask, Bottle and
Zope support developers in the design and maintenance of complex
applications. Pyjs and IronPython can be used to develop the client side of
Ajax-based applications. SQLAlchemy can be used as a data mapper to a
relational database. Twisted is a framework to program communications
between computers, and is used (for example) by Dropbox.

Libraries such as NumPy, SciPy and Matplotlib allow the effective use of
Python in scientific computing, with specialized libraries such as Biopython
and Astropy providing domain-specific functionality. SageMath is a computer
algebra system with a notebook interface programmable in Python: its library
covers many aspects of mathematics, including algebra, combinatorics,
numerical mathematics, number theory, and calculus. OpenCV has Python
bindings with a rich set of features for computer vision and image processing.

Python is commonly used in artificial intelligence projects and machine


learning projects with the help of libraries like TensorFlow, Keras, Pytorch and
Scikitlearn. As a scripting language with modular architecture, simple syntax
and rich text processing tools, Python is often used for natural language
processing.

Python has been successfully embedded in many software products as a


scripting language, including in finite element method software such as
Abaqus, 3D parametric modeler like FreeCAD, the visual effects compositor
Nuke, 2D imaging programs like GIMP, Scribus and Paint Shop Pro, and
musical notation programs like scorewriter and capella. GNU Debugger uses
Python as a pretty printer to show complex structures such as C++ containers.
Esri promotes Python as the best choice for writing scripts in ArcGIS. It has also been used in several video games and has been adopted as the first of the three available programming languages in Google App Engine, the other two being Java and Go.

Python Modules

In Python, modules are simply files with the “.py” extension containing Python code that can be imported inside another Python program.

In simple terms, we can consider a module to be the same as a code library


or a file that contains a set of functions that you want to include in your
application. With the help of modules, we can organize related functions,
classes, or any code block in the same file.
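
As a small, self-contained illustration (playlist_utils.py is a hypothetical file name, not part of the project), the sketch below creates a module on disk and then imports it.

    # Illustration only: create a module file and import it. The name
    # playlist_utils.py is hypothetical, not part of the project.
    import pathlib
    import sys

    pathlib.Path("playlist_utils.py").write_text(
        "def average_tempo(tempos):\n"
        "    return sum(tempos) / len(tempos)\n"
    )

    sys.path.insert(0, ".")       # make sure the working directory is importable
    import playlist_utils         # the file name (minus .py) is the module name

    print(playlist_utils.average_tempo([118, 126, 97]))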

Some of the python modules included are:

Numpy

NumPy stands for Numerical Python and it is a core scientific computing library
in Python. It provides efficient multi-dimensional array objects and various
operations to work with these array objects.

NumPy is a library for the Python programming language, adding support for
large, multi-dimensional arrays and matrices, along with a large collection of
high-level mathematical functions to operate on these arrays.

NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written in pure Python often run much slower than compiled equivalents due to the absence of compiler optimization. NumPy addresses this slowness partly by providing multidimensional arrays and functions and operators that operate efficiently on them; using these requires rewriting some code, mostly inner loops, in terms of NumPy operations.
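
A brief, illustrative sketch of this vectorised style, using made-up values, is shown below.

    # Illustrative vectorised NumPy operations on made-up track attributes.
    import numpy as np

    tempos = np.array([118.0, 126.0, 97.0, 140.0])
    loudness = np.array([-6.1, -4.3, -9.8, -5.0])

    # One vectorised statement instead of an explicit per-element Python loop:
    normalised = (tempos - tempos.mean()) / tempos.std()
    combined = np.column_stack((normalised, loudness))  # 2-D array, one row per track
    print(combined.shape)  # (4, 2)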

Pandas

Pandas is a software library written for the Python programming language for
data manipulation and analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time series. It is free
software released under the three-clause BSD license.

Pandas is mainly used for data analysis and associated manipulation of


tabular data in DataFrames. Pandas allows importing data from various file
formats such as comma-separated values, JSON, Parquet, SQL Database
tables or queries and Microsoft Excel. Pandas allows various data
manipulation operations such as merging, reshaping, selecting, as well
as data cleaning and data wrangling. The pandas library is built upon NumPy, which is oriented toward efficient work on plain arrays rather than the labelled DataFrame structures that pandas provides.
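
For illustration only, the following sketch shows typical pandas operations on a small, made-up table of tracks.

    # Illustrative pandas operations on a small, made-up table of tracks.
    import pandas as pd

    tracks = pd.DataFrame({
        "name": ["Song A", "Song B", "Song C"],
        "danceability": [0.81, 0.34, 0.67],
        "energy": [0.72, 0.41, 0.90],
        "rating": [5, 2, 4],
    })

    liked = tracks[tracks["rating"] >= 4]   # row selection by condition
    print(liked[["name", "danceability"]])  # column selection
    print(tracks.describe())                # quick numerical summary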

Spotipy

Spotipy is a lightweight Python library for the Spotify Web API.


With Spotipy you get full access to all of the music data provided by the Spotify
platform. Spotipy supports all of the features of the Spotify Web API including
access to all end points, and support for user authorization. For details on the
capabilities, you are encouraged to review the Spotify Web
API documentation. All methods require user authorization. You will need to
register your app at My Dashboard to get the credentials necessary to make
authorized calls (a client id and client secret).
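
A minimal sketch of this authorisation flow is shown below; the client id, client secret, redirect URI and playlist id are placeholders that come from your own Spotify developer dashboard.

    # Illustrative Spotipy authorisation and playlist read; the credentials,
    # redirect URI and playlist id are placeholders.
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth

    sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        redirect_uri="http://localhost:8888/callback",
        scope="playlist-read-private playlist-modify-public",
    ))

    results = sp.playlist_tracks("YOUR_PLAYLIST_ID")
    for item in results["items"]:
        track = item["track"]
        print(track["name"], "-", track["artists"][0]["name"])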

Matplotlib
Matplotlib is one of the most popular Python packages used for data
visualization. It is a cross-platform library for making 2D plots from data in
arrays. Matplotlib is written in Python and makes use of NumPy, the numerical
mathematics extension of Python.

It provides an object-oriented API that helps in embedding plots in applications using Python GUI toolkits such as PyQt, wxPython or Tkinter. It can also be used in Python and IPython shells, Jupyter Notebook and web application servers.

Matplotlib has a procedural interface named Pylab, which is designed to resemble MATLAB, a proprietary programming language developed by MathWorks. Matplotlib, together with NumPy, can be considered an open-source equivalent of MATLAB.

Matplotlib was originally written by John D. Hunter in 2003 and has since been maintained by an active development community.
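
A brief, illustrative plotting sketch with random stand-in data is shown below.

    # Illustrative scatter plot of two made-up audio features.
    import numpy as np
    import matplotlib.pyplot as plt

    danceability = np.random.rand(50)
    energy = np.random.rand(50)

    fig, ax = plt.subplots()
    ax.scatter(danceability, energy)
    ax.set_xlabel("danceability")
    ax.set_ylabel("energy")
    ax.set_title("Tracks in feature space")
    plt.show()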

Sklearn

Scikit-learn (sklearn) is one of the most useful and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface. The library is largely written in Python and is built upon NumPy, SciPy and Matplotlib.

Rather than focusing on loading, manipulating and summarising data, the scikit-learn library focuses on modeling the data. Some of the most popular groups of models provided by sklearn are as follows:

• Supervised Learning algorithms − Almost all the popular supervised


learning algorithms, like Linear Regression, Support Vector Machine
(SVM), Decision Tree etc., are the part of scikit-learn.

• Unsupervised Learning algorithms − On the other hand, it also has all


the popular unsupervised learning algorithms from clustering, factor
analysis, PCA (Principal Component Analysis) to unsupervised neural
networks.

• Clustering − This model is used for grouping unlabeled data.

• Cross Validation − It is used to check the accuracy of supervised models on unseen data (a short sketch follows this list).

• Dimensionality Reduction − It is used for reducing the number of


attributes in data which can be further used for summarisation,
visualization and feature selection.

• Ensemble methods − As the name suggests, these are used for combining the predictions of multiple supervised models.

• Feature extraction − It is used to extract the features from data to define


the attributes in image and text data.

• Feature selection − It is used to identify useful attributes to create


supervised models.
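
As an illustration of the cross-validation utility mentioned above, applied to a synthetic dataset rather than the project's data:

    # Illustrative 5-fold cross-validation of a k-NN classifier on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=1)
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
    print(scores.mean(), scores.std())  # average accuracy across the 5 folds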

Spotify Web API

Based on simple REST principles, the Spotify Web API endpoints return JSON
metadata about music artists, albums, and tracks, directly from the Spotify
Data Catalogue.

Web API also provides access to user related data, like playlists and music
that the user saves in the Your Music library. Such access is enabled through
selective authorization, by the user.

The base address of the Web API is https://api.spotify.com. The API provides a set of endpoints, each with its own unique path. To access private data through the Web API, such as user profiles and playlists, an application must get the user’s permission to access the data. Authorization is via the Spotify Accounts service.

Web API responses normally include a JSON object. Browse the reference
documentation to find descriptions of common responses from each endpoint.
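
A minimal sketch of calling one such endpoint directly with the requests library is shown below; the access token and playlist id are placeholders.

    # Illustrative direct call to the Web API with the requests library;
    # the bearer token and playlist id are placeholders.
    import requests

    token = "YOUR_ACCESS_TOKEN"
    playlist_id = "YOUR_PLAYLIST_ID"

    response = requests.get(
        f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks",
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    for item in response.json()["items"]:
        print(item["track"]["name"])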

Jupyter Notebook

Jupyter Notebook (formerly IPython Notebook) is a web-based


interactive computational environment for creating notebook documents.
Jupyter Notebook is built using several open-source libraries,
including IPython, ZeroMQ, Tornado, jQuery, Bootstrap, and MathJax. A
Jupyter Notebook document is a browser-based REPL containing an ordered
list of input/output cells which can contain code, text (using Markdown),
mathematics, plots and rich media. Underneath the interface, a notebook is
a JSON document, following a versioned schema, usually ending with the “.ipynb” extension.

Jupyter Notebook is similar to the notebook interface of other programs such


as Maple, Mathematica, and SageMath, a computational interface style that
originated with Mathematica in the 1980s. Jupyter interest overtook the
popularity of the Mathematica notebook interface in early 2018.

JupyterLab is a newer user interface for Project Jupyter, offering a flexible user
interface and more features than the classic notebook UI. The first stable
release was announced on February 20, 2018. In 2015, a joint $6 million grant
from The Leona M. and Harry B. Helmsley Charitable Trust, The Gordon and
Betty Moore Foundation, and The Alfred P. Sloan Foundation funded work
that led to expanded capabilities of the core Jupyter tools, as well as to the
creation of JupyterLab.

JupyterHub is a multi-user server for Jupyter Notebooks. It is designed to


support many users by spawning, managing, and proxying many singular
Jupyter Notebook servers.

Version Control System - Git

Git is software for tracking changes in any set of files, usually used for
coordinating work among programmers collaboratively developing source
code during software development. Its goals include speed, data integrity, and
support for distributed, non-linear workflows (thousands of parallel branches
running on different systems).

Git was originally authored by Linus Torvalds in 2005 for development of the
Linux kernel, with other kernel developers contributing to its initial
development. Since 2005, Junio Hamano has been the core maintainer. As
with most other distributed version control systems, and unlike most client–
server systems, every Git directory on every computer is a full-fledged
repository with complete history and full version-tracking abilities, independent
of network access or a central server. Git is free and open-source software
distributed under the GPL-2.0-only license.

Some characteristics of Git:

• Strong support for non-linear development

• Distributed development

• Compatibility with existent systems and protocols

• Efficient handling of large projects

• Cryptographic authentication of history

• Toolkit-based design

• Pluggable merge strategies

Documentation – Microsoft Word

Microsoft Word is word-processing software developed by Microsoft. It was first released on October 25, 1983, under the name Multi-Tool Word for Xenix systems. Among its features, Word includes a built-in spell checker, a thesaurus, a dictionary, and utilities for manipulating and editing text.

The following are some aspects of its feature set:

1. Templates
2. Image Formats
3. WordArt
4. Macros
5. Layout
6. Bullets and Numbering
7. AutoSummarize

Word is used in a project when a project planning document is needed. For


example, project charters, project communication plans, risk analysis reports,
and other key project documents are often produced using Microsoft’s word
processing program.

CHAPTER 4: RESULTS AND DISCUSSION

Spotify Profile

Spotify Playlist

Spotify Developers Dashboard

Project Execution

1. Authentication with Spotify

2. Reading Playlist Data

3. Playlist information in Python Dataframe

4. Visualization of the components

5. Fitting Dataset for Analysis

6. Using K Neighbors Classifier & Random Forest Classifier

7. Using Decision Tree

8. Generating Songs from Spotify Based on the Above Analysis

9. Creating Songs List and checking the data

10. Creating Playlist on Spotify

11. Inserting Songs into Playlist

12. Final Output
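
Taken together, a hedged end-to-end sketch of these steps with Spotipy and scikit-learn might look like the following; the credentials, the placeholder rating column and the use of Spotify's recommendations endpoint as a candidate pool are illustrative assumptions, not the report's exact code.

    # Hedged end-to-end sketch of the steps above; identifiers, the rating column
    # and the candidate-song pool are illustrative, not the report's exact code.
    import pandas as pd
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth
    from sklearn.ensemble import RandomForestClassifier

    # 1. Authentication with Spotify
    sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
        client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET",
        redirect_uri="http://localhost:8888/callback",
        scope="playlist-read-private playlist-modify-public",
    ))

    # 2-3. Read the source playlist and build a DataFrame of audio features
    items = sp.playlist_tracks("SOURCE_PLAYLIST_ID")["items"]
    ids = [it["track"]["id"] for it in items if it["track"]]
    features = pd.DataFrame(sp.audio_features(ids))
    features["rating"] = 1  # placeholder: in the project the user supplies ratings

    # 5-7. Fit a classifier on the rated tracks
    cols = ["danceability", "energy", "valence", "tempo"]
    model = RandomForestClassifier().fit(features[cols], features["rating"])

    # 8-9. Score candidate tracks (here Spotify's own recommendations act as the pool)
    pool = sp.recommendations(seed_tracks=ids[:5], limit=50)["tracks"]
    pool_feats = pd.DataFrame(sp.audio_features([t["id"] for t in pool]))
    keep = [t["id"] for t, liked in zip(pool, model.predict(pool_feats[cols])) if liked]

    # 10-11. Create the new playlist and insert the songs (at most 100 per call)
    user_id = sp.current_user()["id"]
    new_playlist = sp.user_playlist_create(user_id, "Recommended Playlist")
    sp.playlist_add_items(new_playlist["id"], keep[:100])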

CHAPTER 5: CONCLUSIONS AND FUTURE SCOPE OF STUDY

Limitations

• The project uses a limited number of classification and prediction algorithms, which limits its accuracy.

• Sometimes the recommendations can exceed 100 songs, while the Spotify API allows inserting only 100 tracks per request; attempting to insert more than 100 songs generates an error.

• The ratings have to be provided by the user manually.

Learning Outcomes:

The following are the learning outcomes from the project:

• Appraise the fundamental concepts, principles, theories, and terminology used in the main branches of web development technology.

• Plan and develop an independent research project that utilises appropriate research methodologies of the discipline.

Result

The playlist was successfully created, with a number of songs generated based on the analysis of the user's playlist and the ratings provided by the user. The recommendations are well aligned with the songs in the input playlist, matching its genres and other attributes.

This project was executed successfully using Python and machine learning. We used various algorithms and functions to achieve the final output. The recommendation system works well at recommending a new playlist to the user based on their preferred categories and genres and on the basis of their supplied playlist. The system analyses the user's playlist, performs classification, and then predicts a new playlist, indicating the effectiveness of its hybrid structure in extracting music features.

Future Scope

• The system can be extended to a web-based application using Flask.

• More algorithms can be applied for further precision.

• New music features can be added to the system.

• The algorithms can be run on a distributed system, such as Hadoop or Condor, to parallelize the computation.

References

1. Shefali Garg and Fangyan Sun. Music Recommender System, Indian Institute of Technology, Kanpur, 2014.

2. Libo Zhang, Tiejian Luo, Fei Zhang and Anjum Wu. A Recommendation Model Based on Deep Neural Network, Chinese Academy of Sciences, Beijing, 2017.

3. Keita Nakamura and Takako Fujisawa. Music Recommendation System Using Lyric Network, 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE), 2017.

4. Yading Song, Simon Dixon and Marcus Pearce. A Survey of Music Recommendation Systems and Future Perspectives, Proceedings of the 9th International Symposium on Computer Music Modelling and Retrieval (CMMR), 2012.

5. Malte Ludewig, Iman Kamehkhosh, Nick Landia and Dietmar Jannach. Effective Nearest-Neighbor Music Recommendations, Proceedings of the ACM Recommender Systems Challenge 2018.

