Python - Vectorized - Tute - Jupyter Notebook
25005.565200480993
0.03804349899291992
In [26]: # Method-2: list comprehension implementation
import time
t20 = time.time()
print(listcomprehension(x, w))
t21 = time.time()
time_listComp = t21 - t20
print(time_listComp)
25005.565200480993
0.02210259437561035
25005.565200480753
0.0009987354278564453
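The three timings above compare a Python for loop, a list comprehension, and a vectorized NumPy call on the same weighted sum. The sketch below shows one plausible shape for the three functions, assuming `x` and `w` are 1-D arrays of equal length. Only the name `listcomprehension` appears in this part of the notebook; the names `forloop` and `vectorized`, all three function bodies, and the random test data are assumptions made for illustration.

```python
import numpy as np

def forloop(x, w):
    """Weighted sum with an explicit Python loop (hypothetical Method-1)."""
    z = 0.0
    for i in range(len(x)):
        z += x[i] * w[i]
    return z

def listcomprehension(x, w):
    """Weighted sum with a list comprehension and the built-in sum (Method-2)."""
    return sum([x[i] * w[i] for i in range(len(x))])

def vectorized(x, w):
    """Weighted sum with a single NumPy call (hypothetical Method-3)."""
    return np.dot(x, w)

# Synthetic data: an assumption, not the notebook's own x and w.
rng = np.random.default_rng(0)
x = rng.random(100_000)
w = rng.random(100_000)

results = [f(x, w) for f in (forloop, listcomprehension, vectorized)]
print(results)
```

The three results should agree up to floating-point rounding. A difference in the last few digits, like the one between the printed values above, is what a different order of additions would typically produce.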
In [28]: import matplotlib.pyplot as plt
%matplotlib inline
# plt.style.use('ggplot')

# Use a separate name for the labels so the input vector `x` is not overwritten.
methods = ['Method-1', 'Method-2', 'Method-3']
Processing_Times = [time_forloop, time_listComp, time_vectorized]

x_pos = list(range(len(methods)))

plt.bar(x_pos, Processing_Times, color='green')
plt.xlabel("Method")
plt.ylabel("Time (s)")
plt.title("Time vs Method")

plt.xticks(x_pos, methods)

plt.show()
Example 2: Predicting house sale prices with the Boston housing dataset
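In [29]: import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release.
from sklearn.datasets import load_boston
boston_data = load_boston()
print(boston_data['DESCR'])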
.. _boston_dataset:
:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
The Boston house-price data has been used in many machine learning papers that address regression
problems.
.. topic:: References
- Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity',
Wiley, 1980. 244-261.
- Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
In [30]: # Take the Boston data
data = boston_data['data']
# We will only work with two of the features: INDUS and RM
# x_input = data[:, [2]]        # single input feature (INDUS)
x_input = data[:, [2, 5]]       # two input features (INDUS and RM)
# x_input = data[:, [2, 5, 7]]  # three input features (INDUS, RM, and DIS)
# x_input = data                # all input features
y_target = boston_data['target']
# print(x_input.shape[1])
# print(x_input)
# print(y_target.shape[0])
# print(y_target)

# Individual plots for the two features:
plt.title('Industrialness vs Med House Price')
plt.scatter(x_input[:, 0], y_target)
plt.xlabel('Industrialness')
plt.ylabel('Med House Price')
plt.show()

plt.title('Avg Num Rooms vs Med House Price')
plt.scatter(x_input[:, 1], y_target)
plt.xlabel('Avg Num Rooms')
plt.ylabel('Med House Price')
plt.show()

# plt.title('Avg weighted distances vs Med House Price')
# plt.scatter(x_input[:, 2], y_target)
# plt.xlabel('Avg weighted distances')
# plt.ylabel('Med House Price')
# plt.show()
Define cost function: Non-vectorized form
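$$\mathrm{cost}(y, t) = \frac{1}{N} \sum_{i=1}^{N} \left(y^{(i)} - t^{(i)}\right)^2$$

$$\mathrm{cost}(y, t) = \frac{1}{N} \sum_{i=1}^{N} \left(w_1 x_1^{(i)} + w_2 x_2^{(i)} + b - t^{(i)}\right)^2$$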
In [31]: # Non-vectorized implementation
def cost(w1, w2, b, X, t):
    '''
    Evaluate the cost function in a non-vectorized manner for
    inputs `X` and targets `t`, at weights `w1`, `w2` and `b`.
    '''
    costs = 0
    for i in range(len(t)):
        # y_i = w1 * X[i, 0] + b  # for a single feature of input data
        y_i = w1 * X[i, 0] + w2 * X[i, 1] + b  # for two features of input data
        # y_i = np.dot(w, X[i]) + b  # for all features, given a weight vector w
        t_i = t[i]
        costs += (y_i - t_i) ** 2
    return costs / len(t)
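A quick way to check the loop-based cost is to run it on data small enough to verify by hand. The sketch below repeats the function so that it runs on its own; the two-row arrays `X_toy` and `t_toy` are made-up values for illustration, not Boston data.

```python
import numpy as np

def cost(w1, w2, b, X, t):
    """Mean squared error, computed one example at a time."""
    costs = 0
    for i in range(len(t)):
        y_i = w1 * X[i, 0] + w2 * X[i, 1] + b
        costs += (y_i - t[i]) ** 2
    return costs / len(t)

# Two made-up examples (hypothetical data).
X_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
t_toy = np.array([3.0, 7.0])

# With w1 = w2 = 1 and b = 0 the predictions are 3 and 7, a perfect fit.
perfect = cost(1, 1, 0, X_toy, t_toy)

# With b = 1 every prediction is off by exactly 1, so the mean squared error is 1.
shifted = cost(1, 1, 1, X_toy, t_toy)

print(perfect, shifted)
```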
In [32]: cost(3, 5, 1, x_input, y_target)
Out[32]: 2475.821173270752
In [33]: cost(3, 5, 0, x_input, y_target)
Out[33]: 2390.2197701086957
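The same cost can be computed without a Python loop by forming all predictions at once as a matrix-vector product, y = Xw + b, and averaging the squared residuals. The sketch below is a minimal version of that idea. The name `cost_vectorized`, its signature (a weight vector `w` instead of separate `w1`, `w2`), and the random test data are assumptions, not the notebook's own code.

```python
import numpy as np

def cost_loop(w1, w2, b, X, t):
    """Loop-based mean squared error, used here only as a reference."""
    costs = 0
    for i in range(len(t)):
        y_i = w1 * X[i, 0] + w2 * X[i, 1] + b
        costs += (y_i - t[i]) ** 2
    return costs / len(t)

def cost_vectorized(w, b, X, t):
    """Mean squared error using array operations (hypothetical helper).

    w has shape (D,), X has shape (N, D), and t has shape (N,).
    """
    y = X @ w + b                 # all N predictions in one product
    return np.mean((y - t) ** 2)  # average of the squared residuals

# Random stand-in data (an assumption; any (N, 2) input array works).
rng = np.random.default_rng(0)
X = rng.random((100, 2))
t = rng.random(100)

loop_value = cost_loop(3, 5, 1, X, t)
vec_value = cost_vectorized(np.array([3.0, 5.0]), 1, X, t)
print(loop_value, vec_value)
```

The two values should match up to floating-point rounding, because both functions evaluate the same formula and differ only in the order of the arithmetic.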
Out[36]: 2475.821173270751
In [37]: cost(3, 5, 0, x_input, y_target)
Out[37]: 2390.2197701086957
2475.821173270752
0.0009961128234863281
2475.821173270751
0.0009663105010986328
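The two timings above each come from a single run of about one millisecond, so timer resolution and run-to-run noise can easily hide the real difference. One option is to repeat each call many times with the standard-library `timeit` module and keep the best average. The sketch below assumes random data with 506 rows (the size of the Boston dataset) and two features; the helper names `cost_loop`, `cost_vec`, and `best_time` are hypothetical.

```python
import timeit
import numpy as np

# Random stand-in data: 506 rows and two features (values are not Boston data).
rng = np.random.default_rng(0)
X = rng.random((506, 2))
t = rng.random(506)
w = np.array([3.0, 5.0])
b = 1.0

def cost_loop(w, b, X, t):
    """Loop-based mean squared error."""
    costs = 0.0
    for i in range(len(t)):
        y_i = w[0] * X[i, 0] + w[1] * X[i, 1] + b
        costs += (y_i - t[i]) ** 2
    return costs / len(t)

def cost_vec(w, b, X, t):
    """Vectorized mean squared error."""
    return np.mean((X @ w + b - t) ** 2)

def best_time(func, repeat=5, number=20):
    """Smallest average time per call, in seconds, over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

time_loop = best_time(lambda: cost_loop(w, b, X, t))
time_vec = best_time(lambda: cost_vec(w, b, X, t))
print(f"loop: {time_loop:.6f} s, vectorized: {time_vec:.6f} s")
```

Taking the minimum over repeats is a common choice because slower runs usually reflect interference from other processes rather than the code being measured.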
In [40]: import matplotlib.pyplot as plt
%matplotlib inline
# plt.style.use('ggplot')

# Use a separate name for the labels so the input vector `x` is not overwritten.
methods = ['Cost_NonVectorized', 'Cost_Vectorized']
Processing_Times = [time_CostNonvect, time_CostVect]

x_pos = list(range(len(methods)))

plt.bar(x_pos, Processing_Times, color='green')
plt.xlabel("Method")
plt.ylabel("Time (s)")
plt.title("Time vs Method")

plt.xticks(x_pos, methods)

plt.show()