Project Report PDF
A Project Report
BACHELOR OF ENGINEERING
IN
Submitted by:
Harsh Negi
Nov 2022
Acknowledgement
The project work in this report is the outcome of continuous work over a period
of time and drew intellectual support from various sources. I would like to
express my profound gratitude and indebtedness to those who helped me complete
the project. I take this opportunity to express my sincere thanks and deep
gratitude to all those who extended their wholehearted cooperation and helped
me complete this project successfully.
Harsh Negi
20BCS3935
List of Figures
List of Abbreviations
ABSTRACT
The project uses concepts such as the Random Forest Classifier and Decision
Trees to generate recommendations. It combines content-based and collaborative
recommendation approaches. The project enables users to create a new playlist
based on an earlier playlist that they built to their own liking.
CONTENTS
Title Page
Acknowledgement
List of Figures
List of Abbreviations
ABSTRACT
CONTENTS
CHAPTER 1: INTRODUCTION
1.1 Problem Definition
1.2 Project Overview
1.3 Project Specifications
1.4 Literature Survey
CHAPTER 2: THEORY
2.1 Machine Learning
2.2 KNN
2.3 Random Forest Classifier
2.4 Decision Trees
2.5 Recommendation System
2.6 Music Recommendation System
2.7 Application Programming Interface
2.8 Principal Component Analysis
CHAPTER 3: METHODOLOGY ADOPTED
3.1 Software Development Life Cycle
3.2 Python
3.3 Python Modules
3.4 Jupyter Notebook
3.5 Version Control System
3.6 Documentation
CHAPTER 4: RESULTS AND DISCUSSIONS
CHAPTER 5: CONCLUSIONS AND FUTURE SCOPE OF STUDY
REFERENCES
CHAPTER 1: INTRODUCTION
Many music listeners have turned to online music. Big data technology has made
it possible for listeners to access music whenever they want, and online music
subscription services have become increasingly popular in the era of cloud
computing. Advances in cloud technology make it easy for users to access an
effectively unlimited number of songs.
Streaming music companies such as Spotify, Pandora, and YouTube give their
paid members access to songs. The playlist is a special feature of these
streaming apps. Many users find it difficult to build a playlist from a long
list of songs. As a result, users tend to play the next song either in random
mode or by recommendation.
The project uses concepts such as the Random Forest Classifier and Decision
Trees to generate recommendations. It combines content-based and collaborative
recommendation approaches. The project enables users to create a new playlist
based on an earlier playlist that they built to their own liking.
The project takes Spotify API (Spotify Developers) credentials created by the
user and uses them to authenticate with Spotify so that the data can be
processed on the user's machine. The user provides a username and a playlist
together with ratings for its songs. After analyzing this data, the program
generates a random number of songs and adds them to a newly created Spotify
playlist that is directly inspired by the given playlist.
1.4 Literature Survey
Over the years, recommender systems have been studied widely and are
divided into different categories according to the approach being used. The
categories are collaborative filtering (CF), content based and context based.
Collaborative filtering
Collaborative filtering uses the numerical ratings given by users and is
mainly based on the historical data about each user that is available to the
system. This historical data is used to build the user profile, and the data
available about each item is used to build the item profile. Both the user
profile and the item profile are used to make recommendations. The Netflix
Prize competition brought collaborative filtering much of its popularity.
Collaborative filtering is considered one of the most basic and easiest
methods for finding recommendations and for making predictions regarding the
sales of a product. It does have some disadvantages, which have led to the
development of new methods and techniques.
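The idea of predicting a rating from the ratings of similar users can be
illustrated with a minimal sketch. The rating matrix, the use of cosine
similarity, and the function names below are assumptions made for
illustration; they are not taken from the project code.

```python
import numpy as np

# Hypothetical user-item rating matrix (rows: users, columns: songs).
# A value of 0 means "not rated yet".
ratings = np.array(
    [
        [5, 4, 0, 1],
        [4, 5, 1, 0],  # user 1 has not rated song 3
        [1, 0, 5, 4],
    ],
    dtype=float,
)


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_rating(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue  # skip the user themself and users who did not rate the item
        sim = cosine_similarity(ratings[user], ratings[other])
        weighted_sum += sim * ratings[other, item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else 0.0


predicted = predict_rating(ratings, user=1, item=3)
```

Because user 1's history is much closer to user 0's than to user 2's, the
predicted rating for song 3 lands nearer to user 0's low rating than to
user 2's high one.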
Machine Learning
A machine learning model is the output of the training process and is defined
as a mathematical representation of a real-world process. Machine learning
algorithms find patterns in the training dataset; these patterns are used to
approximate the target function, which maps the inputs to the outputs in the
available dataset. Machine learning methods depend on the type of task and
include classification models, regression models, clustering, and
dimensionality reduction techniques such as Principal Component Analysis.
A good flow of organized, varied data is required for a robust ML solution.
In today's online-first world, companies have access to a large amount of data
about their customers, often millions of records. This data, which is large
both in the number of data points and in the number of fields, is known as big
data due to the sheer amount of information it holds.
KNN
• In k-NN regression, the output is the property value for the object. This
value is the average of the values of the k nearest neighbors.
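The averaging step described above can be sketched in a few lines of NumPy.
The toy data, the Euclidean distance metric, and the function name
`knn_regress` are illustrative assumptions.

```python
import numpy as np


def knn_regress(X_train, y_train, x_query, k=3):
    """Predict a value as the mean target of the k nearest training points."""
    distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]  # indices of the k closest points
    return float(y_train[nearest].mean())


# Toy one-feature dataset (made up for illustration).
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([10.0, 20.0, 30.0, 100.0])

prediction = knn_regress(X_train, y_train, np.array([2.1]), k=3)
```

Here the three nearest neighbors of 2.1 are the points at 1, 2 and 3, so the
prediction is the mean of 10, 20 and 30; the distant point at 10 is ignored.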
Decision Trees
A decision tree is one of the most powerful and popular tools for
classification and prediction. A decision tree is a flowchart-like tree
structure, where each internal node denotes a test on an attribute, each
branch represents an outcome of the test, and each leaf node (terminal node)
holds a class label.
A tree can be "learned" by splitting the source set into subsets based on an
attribute value test. This process is repeated on each derived subset in a
recursive manner called recursive partitioning. The recursion is completed
when all items in the subset at a node have the same value of the target
variable, or when splitting no longer adds value to the predictions. The
construction of a decision tree classifier does not require any domain
knowledge or parameter setting, and is therefore appropriate for exploratory
knowledge discovery. Decision trees can handle high-dimensional data. In
general, decision tree classifiers have good accuracy. Decision tree induction
is a typical inductive approach to learning classification knowledge.
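A minimal sketch of learning such a tree with scikit-learn follows. The single
"energy" feature and the like/skip labels are made-up illustrative data, not
the project's dataset.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: one audio feature per song and a like/skip label (illustrative only).
X = [[0.1], [0.2], [0.8], [0.9]]
y = ["skip", "skip", "like", "like"]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)  # recursive partitioning on the attribute value

# Human-readable view of the learned tests and leaf labels.
rules = export_text(tree, feature_names=["energy"])

predictions = tree.predict([[0.15], [0.85]])
```

With this data a single test on the feature is enough to make every leaf pure,
so the recursion stops after one split.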
Recommendation System
Application Programming Interface
One purpose of APIs is to hide the internal details of how a system works,
exposing only those parts a programmer will find useful and keeping them
consistent even if the internal details later change. An API may be custom-built
for a particular pair of systems, or it may be a shared standard
allowing interoperability among many systems.
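The information-hiding role of an API can be illustrated with a small sketch.
The class `PlaylistStore` and its methods are hypothetical names chosen for
this example.

```python
class PlaylistStore:
    """Public API: add_track() and track_count(). Storage details stay private."""

    def __init__(self):
        # Internal detail: a dict today, possibly a database later.
        # Callers never touch this attribute directly.
        self._tracks = {}

    def add_track(self, track_id, title):
        """Add or update a track by its identifier."""
        self._tracks[track_id] = title

    def track_count(self):
        """Return the number of distinct tracks stored."""
        return len(self._tracks)


store = PlaylistStore()
store.add_track("t1", "Song A")
store.add_track("t2", "Song B")
count = store.track_count()
```

Code that only calls `add_track` and `track_count` keeps working even if the
internal storage is replaced, which is the consistency the text describes.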
Principal Component Analysis
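Each principal component is a direction (a straight line through the data)
along which the data varies the most; the first component captures the largest
share of the variance. Each component has a direction and a magnitude.
Principal components are mutually orthogonal (perpendicular), and projecting
the data onto the first few of them maps it to a lower-dimensional space.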
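The following is a minimal sketch of PCA computed through the singular value
decomposition in NumPy. The synthetic two-feature dataset and the function
name `pca` are assumptions made for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Return principal directions, their variances, and the projected data."""
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]  # rows are orthonormal directions
    variances = (singular_values**2) / (X.shape[0] - 1)
    projected = X_centered @ components.T  # coordinates in the new basis
    return components, variances[:n_components], projected


# Synthetic data lying close to the line y = 2x (illustrative only).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

components, variances, projected = pca(X, n_components=2)
```

The first component points along the line the data was generated around and
carries almost all of the variance, while the second is perpendicular to it.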
CHAPTER 3: METHODOLOGY ADOPTED
The waterfall model was selected as the SDLC model due to the following
reasons:
Figure 1
Programming Language
Python
Python can serve as a scripting language for web applications, e.g., via
mod_wsgi for the Apache web server. With Web Server Gateway Interface, a
standard API has evolved to facilitate these applications. Web frameworks like
Django, Pylons, Pyramid, TurboGears, web2py, Tornado, Flask, Bottle and
Zope support developers in the design and maintenance of complex
applications. Pyjs and IronPython can be used to develop the client side of
Ajax-based applications. SQLAlchemy can be used as a data mapper to a
relational database. Twisted is a framework to program communications
between computers, and is used (for example) by Dropbox.
Libraries such as NumPy, SciPy and Matplotlib allow the effective use of
Python in scientific computing, with specialized libraries such as Biopython
and Astropy providing domain-specific functionality. SageMath is a computer
algebra system with a notebook interface programmable in Python: its library
covers many aspects of mathematics, including algebra, combinatorics,
numerical mathematics, number theory, and calculus. OpenCV has Python
bindings with a rich set of features for computer vision and image processing.
Python Modules
In Python, modules are simply files with the ".py" extension that contain
Python code and can be imported into another Python program.
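As a small illustration, the sketch below writes a module file and then
imports it. The module name `playlist_utils` and its function are hypothetical,
and the temporary directory is used only to keep the example self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a ".py" file in a temporary directory (stands in for a project folder).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "playlist_utils.py").write_text(
    "def average_rating(ratings):\n"
    "    return sum(ratings) / len(ratings)\n"
)

# Make the directory importable, then import the module by its file name.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
playlist_utils = importlib.import_module("playlist_utils")

result = playlist_utils.average_rating([4, 5, 3])
```

In an ordinary project the file would simply sit next to the importing script
and be loaded with `import playlist_utils`.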
NumPy
NumPy stands for Numerical Python and it is a core scientific computing library
in Python. It provides efficient multi-dimensional array objects and various
operations to work with these array objects.
NumPy is a library for the Python programming language, adding support for
large, multi-dimensional arrays and matrices, along with a large collection of
high-level mathematical functions to operate on these arrays.
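A short sketch of the array operations mentioned above follows; the feature
values are made up for illustration.

```python
import numpy as np

# Rows are songs, columns are two hypothetical audio features.
features = np.array(
    [
        [0.8, 0.6],
        [0.4, 0.9],
        [0.5, 0.3],
    ]
)

column_means = features.mean(axis=0)  # one mean per feature
centered = features - column_means  # broadcasting subtracts the mean from each row
```

The subtraction works on the whole array at once, without an explicit loop
over rows.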
Pandas
Pandas is a software library written for the Python programming language for
data manipulation and analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time series. It is free
software released under the three-clause BSD license.
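The tabular data structure described above can be sketched with a small
DataFrame. The column names and values are illustrative assumptions, not the
project's actual playlist data.

```python
import pandas as pd

# Hypothetical playlist table: one row per track.
playlist = pd.DataFrame(
    {
        "track": ["Song A", "Song B", "Song C", "Song D"],
        "energy": [0.8, 0.4, 0.6, 0.9],
        "rating": [5, 2, 4, 5],
    }
)

# Boolean filtering: keep tracks rated 4 or higher.
liked = playlist[playlist["rating"] >= 4]

# Grouping: average energy for each rating value.
mean_energy_by_rating = playlist.groupby("rating")["energy"].mean()
```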
Spotipy
Matplotlib
Matplotlib is one of the most popular Python packages used for data
visualization. It is a cross-platform library for making 2D plots from data in
arrays. Matplotlib is written in Python and makes use of NumPy, the numerical
mathematics extension of Python.
Matplotlib was originally written by John D. Hunter and first released in
2003. Version 2.2.0 was released in 2018, and the library has since moved on
to the 3.x series.
Sklearn
Scikit-learn (Sklearn) is one of the most useful and robust libraries for
machine learning in Python. It provides a selection of efficient tools for
machine learning and statistical modeling, including classification,
regression, clustering and dimensionality reduction, via a consistent
interface in Python. This library, which is largely written in Python, is
built upon NumPy, SciPy and Matplotlib.
Spotify Web API
Based on simple REST principles, the Spotify Web API endpoints return JSON
metadata about music artists, albums, and tracks, directly from the Spotify
Data Catalogue.
The Web API also provides access to user-related data, such as playlists and
music that the user saves in the Your Music library. Such access is enabled
through selective authorization by the user.
Web API responses normally include a JSON object. Browse the reference
documentation to find descriptions of common responses from each endpoint.
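A minimal sketch of how the Spotipy client can be used to read a playlist and
write a new one follows. The helper function names, the chosen scopes, and the
decision to read only the first page of results are assumptions for
illustration; the credentials must come from the user's own Spotify Developers
dashboard.

```python
def build_client(client_id, client_secret, redirect_uri):
    """Create an authenticated Spotipy client (requires the spotipy package)."""
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth

    auth = SpotifyOAuth(
        client_id=client_id,
        client_secret=client_secret,
        redirect_uri=redirect_uri,
        scope="playlist-read-private playlist-modify-private",
    )
    return spotipy.Spotify(auth_manager=auth)


def read_playlist_tracks(sp, playlist_id):
    """Return (track_id, track_name) pairs from the first page of a playlist."""
    page = sp.playlist_items(playlist_id)  # JSON response parsed into a dict
    tracks = []
    for item in page["items"]:
        track = item.get("track")
        if track and track.get("id"):  # skip removed or local-only entries
            tracks.append((track["id"], track["name"]))
    return tracks


def create_playlist_with_tracks(sp, username, name, track_ids):
    """Create a private playlist for `username` and add the given tracks to it."""
    playlist = sp.user_playlist_create(username, name, public=False)
    sp.playlist_add_items(playlist["id"], track_ids)
    return playlist["id"]
```

The functions take the client as a parameter, so they can be exercised with a
stand-in object during testing and with a real `build_client(...)` result when
run against Spotify. Longer playlists are returned in pages, which this sketch
does not follow.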
Jupyter Notebook
JupyterLab is a newer user interface for Project Jupyter, offering a flexible user
interface and more features than the classic notebook UI. The first stable
release was announced on February 20, 2018. In 2015, a joint $6 million grant
from The Leona M. and Harry B. Helmsley Charitable Trust, The Gordon and
Betty Moore Foundation, and The Alfred P. Sloan Foundation funded work
that led to expanded capabilities of the core Jupyter tools, as well as to the
creation of JupyterLab.
Version Control System - Git
Git is software for tracking changes in any set of files, usually used for
coordinating work among programmers collaboratively developing source
code during software development. Its goals include speed, data integrity, and
support for distributed, non-linear workflows (thousands of parallel branches
running on different systems).
Git was originally authored by Linus Torvalds in 2005 for development of the
Linux kernel, with other kernel developers contributing to its initial
development. Since 2005, Junio Hamano has been the core maintainer. As
with most other distributed version control systems, and unlike most client–
server systems, every Git directory on every computer is a full-fledged
repository with complete history and full version-tracking abilities, independent
of network access or a central server. Git is free and open-source software
distributed under the GPL-2.0-only license.
• Distributed development
• Toolkit-based design
Documentation – Microsoft Word
1. Templates
2. Image Formats
3. WordArt
4. Macros
5. Layout
6. Bullets and Numbering
7. AutoSummarize
CHAPTER 4: RESULTS AND DISCUSSIONS
Spotify Profile
Spotify Playlist
Spotify Developers Dashboard
Project Execution
2. Reading Playlist Data
4. Visualization of the components
6. Using K Neighbors Classifier & Random Forest Classifier
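A step of this kind might look like the following sketch, which trains both
classifiers with scikit-learn and compares their accuracy. The synthetic
features, the liked/disliked labels, and all parameter values are assumptions
for illustration and do not reproduce the project's notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for playlist features: liked songs cluster around 0.7,
# disliked songs around 0.3, in three hypothetical audio features.
rng = np.random.default_rng(42)
liked = rng.normal(loc=0.7, scale=0.1, size=(60, 3))
disliked = rng.normal(loc=0.3, scale=0.1, size=(60, 3))
X = np.vstack([liked, disliked])
y = np.array([1] * 60 + [0] * 60)  # 1 = liked, 0 = disliked

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out songs.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
```

Holding out a test split gives a comparable accuracy figure for each model
before either is used to rate candidate songs.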
8. Generating Songs from Spotify from above Analysis
11. Inserting Songs into Playlist
Limitations
Learning Outcomes:
Result
Future Scope
References