16BDA71011
ASSIGNMENT 1
INTRODUCTION TO PYTHON
Aim: Installation of Python and its packages.
Software Used: Python 3.5
Introduction:
Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together.
Steps for Installing Python
1. Download the latest Python build from www.python.org as shown in Figure 1 below.
2. Save the downloaded Python installer file. The file will usually be in the Downloads folder
as shown in Figure 2 below.
Department of Big Data Analytics
1
3. Double click the downloaded Python installer file to start the installation. You will be
presented with the options as shown in Figure 3 below. Make sure the Add Python 3.5 to
PATH checkbox is ticked.
4. Click the Install Now and Python 3.5.0 install will proceed.
5. Once the installation has successfully completed Python 3.5.0 will be installed and
available from the Windows Start Menu. Run Python 3.5 (32-bit) from the start menu. It
can be found in the Python folder as shown in Figure 4 below.
Running the Python application from the Start Menu provides an interactive console that
gives a way to immediately write Python code and see the results in real time. If you can do
this then the base Python install has been successful. Type the following commands into
the interactive Python console window that appears when you run Python from the Start
Menu. You should get similar output to that shown in Figure 5 below, if Python is installed
correctly.
Type import sys and then hit return. You won't see any visible output from this command
apart from the Console Window prompt showing >>> on a new line.
Type the following command as one line:
sys.stdout.write("Hello from Python %s\n" % (sys.version,))
And hit return. The output should be similar to Figure 5 below.
6. Following the successful Python installation, we need to check that the path to the
Python installation has been correctly installed. This is to ensure that the Add Python 3.5
to PATH tick box that was selected in step 3 has configured the environment variables
correctly.
From the Windows start prompt, type cmd then hit the Return key to launch the Windows
command line window. Once this has successfully launched, type echo %path% then hit
Enter. Check that the path is there as shown in Figure 6 below.
Note: The path has been underlined in Figure 6 for clarity.
If the path has not been correctly installed, please refer to the Setting the Correct PATH
Environment Variable section below for details on fixing the path.
Advantages:
• Python is simple.
• Its easy-to-learn syntax emphasizes readability and therefore reduces the cost of
program maintenance.
• Python supports modules and packages, which encourages program modularity and
code reuse.
• The Python interpreter and the extensive standard library are available in source or
binary form without charge for all major platforms, and it can be freely distributed.
• Debugging Python programs is easy: a bug or bad input will never cause a
segmentation fault.
Packages:
Packages are namespaces that can themselves contain further packages and modules. They
are simply directories, but with a twist: each package in Python is a directory that MUST
contain a special file called __init__.py
Requirements for installing packages.
Installing pip, setuptools and wheel
If you have Python 2 >= 2.7.9 or Python 3 >= 3.4 installed from python.org, you will already
have pip and setuptools, but will need to upgrade to the latest version:
On Windows: python -m pip install -U pip setuptools
1.NumPy is the fundamental package for scientific computing with Python. It contains
among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
2.Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding plots into
applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is
also a procedural "pylab" interface based on a state machine (like OpenGL), designed to
closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of
matplotlib.
3.SciPy is a collection of mathematical algorithms and convenience functions built on the
Numpy extension of Python. It adds significant power to the interactive Python session by
providing the user with high-level commands and classes for manipulating and visualizing
data. With SciPy an interactive Python session becomes a data-processing and system-
prototyping environment rivaling systems such as MATLAB, IDL, Octave, R-Lab, and SciLab.
The additional benefit of basing SciPy on Python is that this also makes a powerful
programming language available for use in developing sophisticated programs and
specialized applications. Scientific applications using SciPy benefit from the development of
additional modules in numerous niches of the software landscape by developers across the
world. Everything from parallel programming to web and data-base subroutines and classes
have been made available to the Python programmer. All this power is available in addition
to the mathematical libraries in SciPy.
4.Pandas builds on many other modules and adds unique features of its own, making it a
very powerful module. Pandas is great for data manipulation, data analysis, and data
visualization.
The Pandas module uses objects to allow for data analysis at a high performance rate in
comparison to typical Python procedures. With it, we can easily read and write from and to
CSV files, or even databases. From there, we can manipulate the data by columns, create
new columns, and even base the new columns on other column data. Next, we can progress
into data visualization using Matplotlib. Matplotlib is a great module even without the
teamwork of Pandas, but Pandas comes in and makes intuitive graphing with Matplotlib a
breeze.
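As a small sketch of this workflow (the column names and values here are invented for illustration, and the DataFrame is built in memory rather than read from a file):

```python
import pandas as pd

# Hypothetical data; in practice this would come from pd.read_csv("file.csv")
# or a database query
df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"],
                   "score": [82, 75, 91]})

# Manipulate the data by columns: base a new column on another column's data
df["passed"] = df["score"] >= 80

print(df)
```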
Result:
Hence Python and its supporting packages were installed.
ASSIGNMENT 2
HOW TO IMPORT FILES
Aim: To import files into Python.
Software used: Python 3.5
Theory:
If you quit from the Python interpreter and enter it again, the definitions you have made
(functions and variables) are lost. Therefore, if you want to write a somewhat longer
program, you are better off using a text editor to prepare the input for the interpreter and
running it with that file as input instead. This is known as creating a script. As your program
gets longer, you may want to split it into several files for easier maintenance. You may also
want to use a handy function that you have written in several programs without copying its
definition into each program.
To support this, Python has a way to put definitions in a file and use them in a script or in an
interactive instance of the interpreter. Such a file is called a module; definitions from a
module can be imported into other modules or into the main module (the collection of
variables that you have access to in a script executed at the top level and in calculator mode).
A module is a file containing Python definitions and statements. The file name is the module
name with the suffix .py appended. Within a module, the module’s name (as a string) is
available as the value of the global variable __name__.
For example, use a text editor to create a file called fibo.py containing a few Fibonacci
functions. Now enter the Python interpreter and import this module with the following command:
>>> import fibo
This does not enter the names of the functions defined in fibo directly in the current symbol
table; it only enters the module name fibo there. Using the module name, you can access
the functions:
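Following the standard Python tutorial example, fibo.py might contain:

```python
# fibo.py -- Fibonacci module (the example file used in the Python tutorial)

def fib(n):
    """Print the Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a + b
    print()

def fib2(n):
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
```

After import fibo, the functions are reached through the module name, e.g. fibo.fib(1000) and fibo.fib2(100).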
Procedure:
1) Import pandas

import pandas as pd
df = pd.read_csv("file name", header=0)

2) Import scikit-learn

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

3) Import matplotlib

import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()
Result:
Hence, files were imported into Python for further calculations.
ASSIGNMENT 3
HOW TO USE SCIKIT-LEARN AND BASIC PANDAS COMMANDS
Aim: To use scikit-learn and pandas.
Software Used: Python 3.5
Introduction:
Scikit-learn:
Scikit-learn is a Python module for machine learning built on top of SciPy and distributed
under the 3-Clause BSD license.
David Cournapeau started the project in 2007 as a Google Summer of Code project, and since
then many volunteers have contributed.
Installation:
Scikit-learn requires:
• Python (>= 2.7 or >= 3.3)
• numpy (>= 1.6.1)
• scipy (>= 0.9)
If you already have a working installation of numpy and scipy, the easiest way to install
scikit-learn is using pip: pip install -U scikit-learn
$ python
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
A dataset is a dictionary-like object that holds all the data and some metadata about the
data. This data is stored in the .data member, which is an n_samples x n_features array. In
the case of a supervised problem, one or more response variables are stored in the .target
member. More details on the different datasets can be found in the dedicated section.
For instance, in the case of the digits dataset, digits.data gives access to the features that
can be used to classify the digits samples:
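A short sketch of this (assuming scikit-learn is installed):

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each 8x8 image is flattened into a row of 64 pixel features
print(digits.data.shape)
print(digits.target[:10])  # the digit labels of the first ten samples
```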
Pandas: Pandas is an open source, BSD-licensed library providing high-performance, easy-
to-use data structures and data analysis tools for the Python programming language.
Pandas is a NumFOCUS sponsored project. This will help ensure the success of development
of pandas as an excellent open-source project.
Importing Libraries

# Import all libraries needed for the tutorial

# General syntax to import specific functions in a library:
# from (library) import (specific library function)
from pandas import DataFrame, read_csv

# General syntax to import a library but no functions:
# import (library) as (give the library a nickname/alias)
import matplotlib.pyplot as plt
import pandas as pd  # this is how I usually import pandas
import sys  # only needed to determine Python version number
import matplotlib  # only needed to determine Matplotlib version number

# Enable inline plotting
%matplotlib inline
Result:
Hence, Python was used for scikit-learn and pandas.
ASSIGNMENT 4
USE OF SCIKIT-LEARN ON PIXEL DATA
Aim: To use scikit-learn on pixel data.
Software Used: Python 3.5
Theory:
Scikit-learn is a Python module for machine learning built on top of SciPy and distributed
under the 3-Clause BSD license.
David Cournapeau started the project in 2007 as a Google Summer of Code project, and since
then many volunteers have contributed.
Installation:
Scikit-learn requires:
• Python (>= 2.7 or >= 3.3)
• numpy (>= 1.6.1)
• scipy (>= 0.9)
If you already have a working installation of numpy and scipy, the easiest way to install
scikit-learn is using pip: pip install -U scikit-learn
Input:
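The input program appears as a screenshot in the original report; a minimal sketch of the same idea, an SVM classifier trained on the digits pixel data in the style of the scikit-learn tutorial, is:

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Learn from every image except the last; each sample is 64 pixel values
clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(digits.data[:-1], digits.target[:-1])

# Predict the digit for the held-out last image
print(clf.predict(digits.data[-1:]))
```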
Output:
Result: Hence scikit-learn was used for image processing.
ASSIGNMENT 5
REGRESSION USING PYTHON
Aim: Regression using Python.
Software Used: Python 3.5
Theory:
In statistical modeling, regression analysis is a statistical process for estimating the
relationships among variables. It includes many techniques for modeling and analyzing
several variables, when the focus is on the relationship between a dependent variable and
one or more independent variables (or 'predictors'). More specifically, regression analysis
helps one understand how the typical value of the dependent variable (or 'criterion variable')
changes when any one of the independent variables is varied, while the other independent
variables are held fixed.
Input:
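The input program appears as a screenshot in the original report; a minimal linear-regression sketch with invented data is:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y grows roughly linearly with x
X = np.array([[1], [2], [3], [4], [5]], dtype=float)
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0])

model = LinearRegression()
model.fit(X, y)

# Slope and intercept of the fitted line, and a prediction for x = 6
print(model.coef_[0], model.intercept_)
print(model.predict([[6.0]])[0])
```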
Output:
Result: Hence Regression analysis was implemented using Python.
ASSIGNMENT 6
LEAST MEAN SQUARES
Aim: Implementing least mean squares using Python.
Software Used: Python 3.5
Introduction:
Least mean squares (LMS) algorithms are a class of adaptive filter used to mimic a desired
filter by finding the filter coefficients that relate to producing the least mean square of the
error signal (difference between the desired and the actual signal).
Input:
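The input program appears as a screenshot in the original report; a minimal LMS sketch in NumPy is given below. The "unknown" filter coefficients, signal length, and step size are illustrative assumptions.

```python
import numpy as np

# LMS sketch: adapt weights w so that w . x[n] tracks a desired signal d[n]
rng = np.random.RandomState(0)
h = np.array([0.5, -0.3])          # the filter the LMS should learn
x = rng.randn(500)                 # input signal
d = np.convolve(x, h)[:len(x)]     # desired output of the unknown filter

mu = 0.05                          # step size (learning rate)
w = np.zeros(2)                    # adaptive filter coefficients
for n in range(1, len(x)):
    xv = np.array([x[n], x[n - 1]])   # current input vector
    e = d[n] - w.dot(xv)              # error = desired - actual signal
    w = w + 2 * mu * e * xv           # LMS weight update

print(w)   # should end up close to h
```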
Output:
Result:
Hence LMS was implemented using Python.
ASSIGNMENT 7
LOGISTIC REGRESSION
Aim: Implementing logistic Regression using Python.
Software Used: Python 3.5
Introduction:
Logistic regression is a statistical method for analyzing a dataset in which there are one or
more independent variables that determine an outcome. The outcome is measured with a
dichotomous variable (in which there are only two possible outcomes).
Input:
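The input program appears as a screenshot in the original report; a minimal sketch with invented hours-studied versus pass/fail data is:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied vs. a dichotomous outcome (0 = fail, 1 = pass)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[1.0], [3.5]]))   # predicted class labels
print(clf.predict_proba([[2.25]]))   # probability of each outcome
```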
Output:
Result:
Hence Logistic Regression was performed using Python.
ASSIGNMENT 8
KERNELS SVM
Aim: Implementation of kernel SVM in Python.
Software Used: Python 3.5
Introduction:
In machine learning, kernel methods are a class of algorithms for pattern analysis, whose
best known member is the support vector machine (SVM). The general task of pattern
analysis is to find and study general types of relations (for example clusters, rankings,
principal components, correlations, classifications) in datasets. For many algorithms that
solve these tasks, the data in raw representation have to be explicitly transformed into
feature vector representations via a user-specified feature map; in contrast, kernel methods
require only a user-specified kernel, i.e., a similarity function over pairs of data points in raw
representation.
Input:
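The input program appears as a screenshot in the original report. The classic XOR problem illustrates why a kernel is needed: the points are not linearly separable, but an RBF kernel maps them into a space where they are. The kernel parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn import svm

# XOR-like data: no straight line can separate the two classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# An RBF kernel (a similarity function over pairs of points) makes them separable
clf = svm.SVC(kernel='rbf', gamma=2.0, C=10.0)
clf.fit(X, y)

print(clf.predict(X))
```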
Output:
Result:
Hence kernel SVM was studied using Python.
ASSIGNMENT 9
PRINCIPAL COMPONENT ANALYSIS
Aim: Implementing principal component analysis using Python.
Software Used: Python 3.6
Introduction:
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal
transformation to convert a set of observations of possibly correlated variables into a set of
values of linearly uncorrelated variables called principal components. The number of
principal components is less than or equal to the smaller of the number of original variables
and the number of observations. This transformation is defined in such a way that the first principal
component has the largest possible variance (that is, accounts for as much of the variability
in the data as possible), and each succeeding component in turn has the highest variance
possible under the constraint that it is orthogonal to the preceding components. The
resulting vectors are an uncorrelated orthogonal basis set. PCA is sensitive to the relative
scaling of the original variables.
Input:
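The input program appears as a screenshot in the original report; a minimal PCA sketch on synthetic correlated data is given below. The data-generating choices are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated 2-D data: the second column is roughly twice the first
rng = np.random.RandomState(1)
x = rng.randn(200)
X = np.column_stack([x, 2 * x + 0.1 * rng.randn(200)])

pca = PCA(n_components=2)
pca.fit(X)

# The first principal component should capture almost all of the variance
print(pca.explained_variance_ratio_)
```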
Output:
Result:
Hence PCA was studied using Python.
ASSIGNMENT 10
K-MEANS CLUSTERING
Aim: Implementing k-means clustering using Python.
Software Used: Python 3.6
Introduction:
K-means clustering is a method of vector quantization, originally from signal processing, that
is popular for cluster analysis in data mining. K-means clustering aims to partition n
observations into k clusters in which each observation belongs to the cluster with the
nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data
space into Voronoi cells.
Input:
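The input program appears as a screenshot in the original report; a minimal k-means sketch on two well-separated synthetic clusters is:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious clusters of points around (0, 0) and (10, 10)
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 10])

# Partition the observations into k = 2 clusters by nearest mean
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)

print(km.cluster_centers_)   # one centre near each group's mean
```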
Output:
Result:
Hence k-means clustering was studied using Python.
ASSIGNMENT 11
ARTIFICIAL NEURAL NETWORK-BASICS
Aim: Implementing a basic artificial neural network using Python.
Software Used: Python 3.6
Introduction:
Neural networks or connectionist systems are a computational approach used in computer
science and other research disciplines, which is based on a large collection of neural units
(artificial neurons), loosely mimicking the way a biological brain solves problems with large
clusters of biological neurons connected by axons. Each neural unit is connected with many
others, and links can be enforcing or inhibitory in their effect on the activation state of
connected neural units. Each individual neural unit may have a summation function which
combines the values of all its inputs together.
Input:
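The input program appears as a screenshot in the original report; a minimal sketch of a single neural unit, with its summation function and a sigmoid activation, is given below. The weight and bias values are arbitrary illustration values.

```python
import numpy as np

def sigmoid(z):
    """Activation function: squashes the summed input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A single neural unit: weighted sum of all its inputs plus a bias, then activation
weights = np.array([0.4, -0.2, 0.6])
bias = 0.1

inputs = np.array([1.0, 0.5, 2.0])
summed = weights.dot(inputs) + bias    # the summation function
output = sigmoid(summed)               # the activation state of the unit

print(summed, output)
```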
Output:
Result:
Hence an artificial neural network was studied using Python.