Data Science
Practical-01
Aim:-Introduction to Jupyter Notebook.
Installation
You can use pip, a handy tool that comes with Python, to install
Jupyter Notebook like this:
$ pip install jupyter
Once it is installed, start the Notebook server with:
$ jupyter notebook
This will start up Jupyter, and your default browser should open (or open a
new tab) at the following URL: http://localhost:8888/tree
Right now you are not actually running a Notebook; you are
just running the Notebook server.
IU2041230140 DS-B2
Creating a Notebook
Click on the New button (upper right) and choose Python 3.
Naming
You will notice that the word Untitled appears at the top of the page. Let's
change it!
print('Hello Jupyter!')
Practical-02
Aim:-To Implement Python Basic Programs.
Output:
Enter first number: 10
Enter second number: 20
The sum of 10 and 20 is 30.0
The subtraction of 10 and 20 is -10.0
The multiplication of 10 and 20 is 200.0
The division of 10 and 20 is 0.5
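The listing that produced this output is not shown, so the following is only a minimal sketch of a program that would print the same four lines. The function name describe_arithmetic is hypothetical, and the two inputs are passed as strings in place of the input() prompts so the sketch runs without a keyboard.

```python
def describe_arithmetic(first: str, second: str) -> list[str]:
    """Return one sentence per arithmetic operation on the two inputs."""
    # Convert to float, which is why the results print as 30.0, -10.0, ...
    a, b = float(first), float(second)
    results = [
        ("sum", a + b),
        ("subtraction", a - b),
        ("multiplication", a * b),
        ("division", a / b),
    ]
    return [
        f"The {name} of {first} and {second} is {value}"
        for name, value in results
    ]


# The strings stand in for what input('Enter first number: ') would return.
for line in describe_arithmetic("10", "20"):
    print(line)
```

In a notebook, the two arguments could instead come from input('Enter first number: ') and input('Enter second number: ').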
Output:
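❖ Python program to solve a quadratic equation
import cmath  # complex math, so a negative discriminant still works
a = float(input('Enter a: '))
b = float(input('Enter b: '))
c = float(input('Enter c: '))
# discriminant
d = (b**2) - (4*a*c)
# the two roots from the quadratic formula
sol1 = (-b-cmath.sqrt(d))/(2*a)
sol2 = (-b+cmath.sqrt(d))/(2*a)
print('The solution are {0} and {1}'.format(sol1,sol2))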
Output:
Enter a: 8
Enter b: 5
Enter c: 9
The solution are (-0.3125-1.0135796712641785j) and (-0.3125+1.01357967126
Output:
Please enter value for P: 13
import random
# random float in the range [0.0, 1.0)
n = random.random()
print(n)
Output:
0.7632870997556201
If we run the code again, we will get a different output, as follows.
0.8053503984689108
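The output changes between runs because the generator is seeded differently each time. As a minimal sketch (the helper name draw and the seed values are hypothetical), fixing the seed makes the sequence repeatable:

```python
import random


def draw(seed: int, count: int = 3) -> list[float]:
    """Return `count` random floats from a generator created with `seed`."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    return [rng.random() for _ in range(count)]


first_run = draw(42)
second_run = draw(42)
print(first_run == second_run)  # same seed, same sequence
```

Calling random.seed(42) before random.random() has the same effect on the module-level generator.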
import random
# random integer between 0 and 50, both inclusive
n = random.randint(0,50)
print(n)
Output:
40
❖ Python program to display calendar
import calendar
yy = int(input("Enter year: "))
mm = int(input("Enter month: "))
# print the calendar of the given month
print(calendar.month(yy,mm))
Output:
Practical-03
Aim:-Study of various Machine Learning libraries.
1. NumPy: NumPy is a very popular Python library for processing large multi-dimensional
arrays and matrices, with the help of a large collection of high-level mathematical
functions. It is very useful for fundamental scientific computations in Machine
Learning. It is particularly useful for linear algebra, Fourier transforms, and random
number generation. High-end libraries like TensorFlow use NumPy internally for
manipulation of tensors.
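import numpy as np
x = np.array([[1, 2], [3, 4]])  # assumed operands, consistent with the output below
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])
print(np.dot(v, w))  # inner product of two vectors
print(np.dot(x, v))  # matrix-vector product
print(np.dot(x, y))  # matrix-matrix product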
Output:
219
[29 67]
[[19 22]
[43 50]]
2. Pandas: Pandas is a popular Python library for data analysis. It is not directly related
to Machine Learning, but a dataset must be prepared before training, and this is where
Pandas comes in handy, as it was developed specifically for data extraction and
preparation. It provides many inbuilt methods for grouping, combining and filtering data.
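To make grouping, combining and filtering concrete, here is a minimal sketch. The two small tables (scores and sections) and all their values are invented for illustration and are not part of the practical's dataset.

```python
import pandas as pd

# Hypothetical data, invented for this sketch
scores = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "section": ["X", "X", "Y", "Y"],
    "marks": [70, 90, 60, 80],
})
sections = pd.DataFrame({"section": ["X", "Y"], "teacher": ["T1", "T2"]})

# Filtering: keep only the rows that satisfy a condition
passed = scores[scores["marks"] >= 70]

# Grouping: average marks per section
mean_by_section = scores.groupby("section")["marks"].mean()

# Combining: attach each section's teacher to every student row
combined = scores.merge(sections, on="section", how="left")

print(passed)
print(mean_by_section)
print(combined)
```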
import pandas as pd
data_table = pd.DataFrame(data)
print(data_table)
Output:
3. Matplotlib: Matplotlib is a very popular Python library for data visualization. Like
Pandas, it is not directly related to Machine Learning. It comes in particularly handy
when a programmer wants to visualize the patterns in the data. It is a 2D plotting library
used for creating 2D graphs and plots. Its pyplot module makes it easy to format axes
and control other plot properties. It provides various kinds of graphs and plots for data
visualization, viz., histograms, error charts, bar charts, etc.
plt.legend()
plt.show()
Output:
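4. TensorFlow:
import tensorflow as tf
# TensorFlow 1.x session API; on TensorFlow 2.x use tf.compat.v1.Session()
# after calling tf.compat.v1.disable_eager_execution()
x1 = tf.constant([1, 2, 3, 4])
x2 = tf.constant([5, 6, 7, 8])
# element-wise product of the two tensors
result = tf.multiply(x1, x2)
sess = tf.Session()
print(sess.run(result))
sess.close()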
Output:
[ 5 12 21 32]
5. Keras: Keras is a very popular Machine Learning library for Python. It is a high-level
neural networks API capable of running on top of TensorFlow, CNTK, or Theano. It can run
seamlessly on both CPU and GPU. Keras makes it really easy for ML beginners to build and
design a neural network. One of the best things about Keras is that it allows for easy
and fast prototyping.
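6. PyTorch:
import torch
dtype = torch.float
device = torch.device("cpu")
N, D_in, H, D_out = 64, 1000, 100, 10  # batch, input, hidden, output sizes
x = torch.randn(N, D_in, device=device, dtype=dtype)  # assumed: random inputs
y = torch.randn(N, D_out, device=device, dtype=dtype)  # assumed: random targets
w1 = torch.randn(D_in, H, device=device, dtype=dtype)  # assumed: random weights
w2 = torch.randn(H, D_out, device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(500):
    h = x.mm(w1)  # forward pass
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    print(t, (y_pred - y).pow(2).sum().item())  # squared-error loss
    # backward pass: gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)
    w1 -= learning_rate * grad_w1  # gradient descent update
    w2 -= learning_rate * grad_w2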
Output:
0 47168344.0
1 46385584.0
2 43153576.0
...
...
...
497 3.987660602433607e-05
498 3.945609932998195e-05
499 3.897604619851336e-05
If from scipy.misc import imread, imsave, imresize does not work on your system (these
functions have been removed from recent SciPy releases), use the code below instead to
proceed with the code above:
!pip install imageio
import imageio
from imageio import imread, imsave
Original image:
Tinted image:
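8. Scikit-learn:
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier
# load the iris dataset
dataset = datasets.load_iris()
# fit a decision tree classifier to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)
# predict on the training data and summarize the fit
expected = dataset.target
predicted = model.predict(dataset.data)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))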
Output:
min_weight_fraction_leaf=0.0, presort=False,
random_state=None, splitter='best')
[[50 0 0]
[ 0 50 0]
[ 0 0 50]]
9. Theano: Machine Learning is built largely on mathematics and statistics.
Theano is a popular Python library that is used to define, evaluate and optimize
mathematical expressions involving multi-dimensional arrays in an efficient manner. It
achieves this by optimizing the utilization of the CPU and GPU. It also includes extensive
unit-testing and self-verification to detect and diagnose different types of errors. Theano
is a very powerful library that has been used in large-scale, computationally intensive
scientific projects for a long time, but it is simple and approachable enough to be used by
individuals for their own projects.
import theano
import theano.tensor as T
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)
logistic([[0, 1], [-1, -2]])
Output:
array([[0.5, 0.73105858],
[0.26894142, 0.11920292]])
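The expression compiled above is the logistic function s(x) = 1 / (1 + exp(-x)). As a cross-check that does not require Theano, the same values can be sketched with NumPy (this is a plain NumPy re-computation, not Theano's own API):

```python
import numpy as np


def logistic(x):
    """Element-wise logistic function 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))


values = logistic([[0, 1], [-1, -2]])
print(values)
```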
Practical-04
Aim:-Introduction to GitHub Repository.
Git is built around distributed software development, in which more than
one developer has access to the source code of an application and
can make changes to it that other developers can see.
Git allows a team of people to work together, all using the same files, and it
helps the team cope with the confusion that tends to happen when multiple
people are editing the same files.
A Git repository is a key-value object store where all objects are indexed by their
SHA-1 hash value.
All commits, files, tags, and filesystem tree nodes are different types of objects
living in this repository.
A Git repository is a large hash table with no provision made for hash collisions.
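The key-value idea can be sketched in a few lines. For a file (a "blob"), Git hashes the header "blob <size>" followed by a NUL byte and then the file content. The dictionary object_store and the helper names below are hypothetical stand-ins for the repository's object database; real Git also compresses each object before writing it to disk.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 key that Git assigns to a blob with this content."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical in-memory stand-in for the object store
object_store: dict[str, bytes] = {}


def store_blob(content: bytes) -> str:
    """Store the content under its SHA-1 key and return the key."""
    key = git_blob_id(content)
    object_store[key] = content
    return key


key = store_blob(b"hello world\n")
print(key, object_store[key])
```

Storing the same content twice yields the same key, so identical files are kept only once.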
Step 1: Go to github.com, enter the required user credentials, and then click
on the Sign up for GitHub button.
Step 2: Choose the plan that best suits you. The available plans are
shown below:
The account has now been created, and you are automatically redirected to your
Dashboard.
E. The repository (ITE-304 in this case) is now created. The
created repository looks like this:
Practical-05
Aim:-Download the data set and perform the analysis.
CODE:-
import pandas as pd
df = pd.read_csv('StudentsPerformance.csv')
df.head()
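df.info()
# numeric_only=True skips the text columns, which cannot be averaged
print(f"Count : {df.count()}")
print(f"Mean : {df.mean(numeric_only=True)}")
print(f"SD : {df.std(numeric_only=True)}")
print(f"Max : {df.max()}")
print(f"Min : {df.min()}")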
df.count()
df['math score'].idxmax()
149
df['math score'].idxmin()
59
df.round()
df['math score']
import numpy as np
df['Language Score'] = np.random.randint(100,size = (1000))
df
Practical-06
Aim:-Write a program to implement Linear Regression.
CODE:-
import numpy as np
import matplotlib.pyplot as plt
# putting labels
plt.xlabel('x')
plt.ylabel('y')
def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} "
          "\nb_1 = {}".format(b[0], b[1]))

if __name__ == "__main__":
    main()
OUTPUT:-
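The body of estimate_coef does not appear in the listing above. A minimal sketch, assuming it computes the ordinary least-squares intercept b_0 and slope b_1 of the line y = b_0 + b_1 * x, could look like this (the implementation is an assumption, not the original code):

```python
import numpy as np


def estimate_coef(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Least-squares intercept b_0 and slope b_1 for y = b_0 + b_1 * x."""
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    # cross-deviation of x and y, and deviation of x about its mean
    ss_xy = np.sum(y * x) - n * m_y * m_x
    ss_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = ss_xy / ss_xx
    b_0 = m_y - b_1 * m_x
    return float(b_0), float(b_1)


# Same observations as in main() above
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
b_0, b_1 = estimate_coef(x, y)
print(b_0, b_1)
```

For these observations the slope works out to 96.5 / 82.5, roughly 1.17, and the intercept to roughly 1.24.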
Practical-07
Aim:-Write a program to implement K-Nearest Neighbors.
CODE:
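import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
irisData = load_iris()
# Hold out part of the data for testing (the split parameters are assumed)
X_train, X_test, y_train, y_test = train_test_split(
    irisData.data, irisData.target, test_size=0.2, random_state=42)
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
# Fit one classifier per value of k and record both accuracies
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)
# Generate plot
plt.plot(neighbors, test_accuracy, label = 'Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label = 'Training dataset Accuracy')
plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()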
OUTPUT:
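To show what a K-Nearest Neighbors classifier does internally, here is a minimal sketch using only the standard library. The function name knn_predict and the six 2-D points are hypothetical; a library implementation such as scikit-learn's is far more efficient.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D points forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.2)))
```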
Practical-08
Aim:-Write a program for Automatic grouping of similar objects
into sets.
CODE:-
res = list(grouped.values())
print(res)
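Only the last two lines of the listing are visible, so the way grouped is built is not known. A minimal sketch, assuming grouped is a dictionary that maps each key to the list of items sharing it, is given below; the function name group_similar and the sample inputs are hypothetical.

```python
from collections import defaultdict


def group_similar(items, key=lambda item: item):
    """Group items that share the same key; groups keep first-seen order."""
    grouped = defaultdict(list)
    for item in items:
        grouped[key(item)].append(item)
    return list(grouped.values())


# Equal numbers end up in the same group
print(group_similar([4, 6, 6, 4, 2, 2, 4, 8, 5, 8]))
# Words grouped by their first letter
print(group_similar(["apple", "avocado", "banana", "blueberry", "cherry"],
                    key=lambda word: word[0]))
```

Passing a different key function changes what counts as "similar" without changing the grouping logic.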