Python - Vectorized - Tute - Jupyter Notebook

Uploaded by Anvitha
Copyright © All Rights Reserved

Python Tutorial: Vectorized Implementations

Example 1: Illustrative Example of a Neural Network Implementation via Vectorizing the Function
In [22]:
import numpy as np

# Small number of data samples
# (x0 = 1. is the constant input that multiplies the bias weight)
# x0, x1, x2 = 1., 2., 3.
# bias, w1, w2 = 0.1, 0.3, 0.5

# x = [x0, x1, x2]
# w = [bias, w1, w2]
# x_vec, w_vec = np.array(x), np.array(w)

# Large number of data samples
x, w = np.random.rand(100000), np.random.rand(100000)
# np.random.rand already returns NumPy arrays; these names are used by the vectorized method
x_vec, w_vec = x, w
In [23]:
# Python code to demonstrate the working of zip()

# # initializing lists
# name = [ "Manjeet", "Nikhil", "Shambhavi", "Astha" ]
# marks = [ 40, 50, 60, 70 ]

# # using zip() to map values
# mapped = zip(name, marks)
# # converting values to print as set
# mapped = set(mapped)

# # printing resultant values
# print ("The zipped result is : ",end="")
# print (mapped)

# # Unzipping the values using zip()
# c, v, = zip(*mapped)
# print('c =', c)
# print('v =',v)
In [24]:
# Neural network output with a for loop
def forloop(x, w):
    z = 0.
    for i in range(len(x)):
        z += x[i] * w[i]
    return z

# Neural network output with a generator expression inside sum()
def listcomprehension(x, w):
    z = sum(x_i * w_i for x_i, w_i in zip(x, w))
    return z

# Neural network output in vectorized form: one dot product of the arguments
def vectorized(x, w):
    z = np.dot(x, w)
    # equivalently: z = x @ w
    return z
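With the small inputs left commented out in In [22], the three forms can be checked by hand: z = 1(0.1) + 2(0.3) + 3(0.5) = 0.1 + 0.6 + 1.5 = 2.2. Because x0 is fixed at 1, the first weight acts as the bias. A minimal sketch of that check, written out inline rather than calling the functions above so it runs on its own:

```python
import numpy as np

# Small example from the commented-out lines of In [22]:
# x0 = 1 is a constant input, so w0 (the bias) is added unchanged
x = [1., 2., 3.]
w = [0.1, 0.3, 0.5]

# Form 1: explicit for loop
z_loop = 0.
for i in range(len(x)):
    z_loop += x[i] * w[i]

# Form 2: generator expression with zip()
z_gen = sum(x_i * w_i for x_i, w_i in zip(x, w))

# Form 3: vectorized dot product
z_vec = np.dot(np.array(x), np.array(w))

# Each value is 2.2 up to floating-point rounding
print(z_loop, z_gen, z_vec)
```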

Comparison of Processing Speed of the Three Implementations Above
In [25]:
# Method-1: for loop
import time
t10 = time.time()
print(forloop(x, w))
t11 = time.time()
time_forloop = t11 - t10
print(time_forloop)

25005.565200480993
0.03804349899291992
In [26]:
# Method-2: list comprehension (generator expression) implementation
t20 = time.time()
print(listcomprehension(x, w))
t21 = time.time()
time_listComp = t21 - t20
print(time_listComp)

25005.565200480993
0.02210259437561035

In [27]:
# Method-3: vectorized implementation
t30 = time.time()
print(vectorized(x_vec, w_vec))
t31 = time.time()
time_vectorized = t31 - t30
print(time_vectorized)

25005.565200480753
0.0009987354278564453
In [28]:
import matplotlib.pyplot as plt
%matplotlib inline
# plt.style.use('ggplot')

# A separate name keeps the data array `x` from In [22] from being overwritten
methods = ['Method-1', 'Method-2', 'Method-3']
Processing_Times = [time_forloop, time_listComp, time_vectorized]

x_pos = list(range(len(methods)))

plt.bar(x_pos, Processing_Times, color='green')
plt.xlabel("Method")
plt.ylabel("Time (s)")
plt.title("Time vs Method")

plt.xticks(x_pos, methods)

plt.show()
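A single `time.time()` measurement is noisy, especially for a call as fast as the vectorized one. Below is a sketch of a sturdier comparison using the standard-library `timeit.repeat` and keeping the best of several trials. It assumes the same setup as In [22] (two random vectors of length 100,000), regenerated here with `np.random.default_rng` so the block runs on its own; the exact timings will vary by machine.

```python
import timeit
import numpy as np

# Assumed setup mirroring In [22]: two random vectors of length 100,000
rng = np.random.default_rng(0)
x = rng.random(100_000)
w = rng.random(100_000)

def forloop(x, w):
    z = 0.0
    for i in range(len(x)):
        z += x[i] * w[i]
    return z

def vectorized(x, w):
    return np.dot(x, w)

# timeit.repeat returns one total time per trial (each trial runs the
# callable `number` times); the minimum trial is the least noisy estimate
loop_best = min(timeit.repeat(lambda: forloop(x, w), number=3, repeat=3)) / 3
vec_best = min(timeit.repeat(lambda: vectorized(x, w), number=3, repeat=3)) / 3

print(f"for loop:   {loop_best:.6f} s per call")
print(f"vectorized: {vec_best:.6f} s per call")
print(f"speedup:    {loop_best / vec_best:.0f}x")
```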
Example 2: Illustrative Example for Predicting House Sale Prices Using the Boston House Dataset
In [29]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2
from sklearn.datasets import load_boston
boston_data = load_boston()
print(boston_data['DESCR'])
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**

:Number of Instances: 506

:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

:Attribute Information (in order):

- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of black people by town
- LSTAT % lower status of the population
- MEDV Median value of owner-occupied homes in $1000's

:Missing Attribute Values: None

:Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.


https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980. N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.

.. topic:: References

- Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
- Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
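Because `load_boston` no longer exists in current scikit-learn, here is one way to rebuild the same `data` and `target` arrays from the original StatLib text file. This is a sketch, not part of the original notebook. It assumes the raw-file layout used by the workaround in scikit-learn's own deprecation message: a 22-line text header, then each sample split over two lines, with 11 values on the first and 3 values (B, LSTAT, MEDV) on the second. The function name `parse_boston_statlib` is hypothetical. With this column order, `data[:, [2, 5]]` still selects INDUS and RM. The offline check below uses two made-up samples.

```python
import numpy as np

def parse_boston_statlib(text, skiprows=22):
    """Hypothetical parser for the assumed StatLib 'boston' layout:
    each sample spans two lines, 11 values on the first line and
    3 values (B, LSTAT, MEDV) on the second."""
    rows = [line.split() for line in text.splitlines()[skiprows:] if line.strip()]
    first = np.array(rows[0::2], dtype=float)   # CRIM ... PTRATIO (11 columns)
    second = np.array(rows[1::2], dtype=float)  # B, LSTAT, MEDV
    data = np.hstack([first, second[:, :2]])    # 13 predictive features
    target = second[:, 2]                       # MEDV, the usual target
    return data, target

# Offline check with two made-up samples (no header, so skiprows=0)
sample = """0.1 18.0 2.31 0 0.5 6.5 65.2 4.09 1 296.0 15.3
396.9 4.98 24.0
0.2 0.0 7.07 0 0.4 6.4 78.9 4.96 2 242.0 17.8
396.9 9.14 21.6"""
data, target = parse_boston_statlib(sample, skiprows=0)
print(data.shape)  # (2, 13)
print(target)

# With network access, the full file could be read like this (assumption):
# from urllib.request import urlopen
# text = urlopen("http://lib.stat.cmu.edu/datasets/boston").read().decode()
# data, target = parse_boston_statlib(text)
```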
In [30]:
# take the boston data
data = boston_data['data']
# we will only work with two of the features: INDUS and RM
# x_input = data[:, [2,]] # for single feature of input data (INDUS)
x_input = data[:, [2,5]] # for two features of input data (INDUS and RM)
# x_input = data[:, [2,5,7]] # for three features of input data (INDUS, RM, and DIS)
# x_input = data[:, ] # All features of input data
y_target = boston_data['target']
# print(x_input.shape[1])
# print(x_input)
# print(y_target.shape[0])
# print(y_target)

# Individual plots for the two features:
plt.title('Industrialness vs Med House Price')
plt.scatter(x_input[:, 0], y_target)
plt.xlabel('Industrialness')
plt.ylabel('Med House Price')
plt.show()

plt.title('Avg Num Rooms vs Med House Price')
plt.scatter(x_input[:, 1], y_target)
plt.xlabel('Avg Num Rooms')
plt.ylabel('Med House Price')
plt.show()

# plt.title('Avg weighted distances vs Med House Price')
# plt.scatter(x_input[:, 2], y_target)
# plt.xlabel('Avg weighted distances ')
# plt.ylabel('Med House Price')
# plt.show()
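Define the cost function: non-vectorized form

$$\mathcal{E}(y,t) = \frac{1}{N}\sum_{i=1}^{N}\left(y^{(i)} - t^{(i)}\right)^2$$

$$\mathcal{E}(y,t) = \frac{1}{N}\sum_{i=1}^{N}\left(w_1 x^{(i)}_1 + w_2 x^{(i)}_2 + b - t^{(i)}\right)^2$$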
In [31]:
# Non-vectorized implementation
def cost(w1, w2, b, X, t):
    '''
    Evaluate the cost function in a non-vectorized manner for
    inputs `X` and targets `t`, at weights `w1`, `w2` and `b`.
    '''
    costs = 0
    for i in range(len(t)):
        # y_i = w1 * X[i, 0] + b  # for a single input feature (w2 unused)
        y_i = w1 * X[i, 0] + w2 * X[i, 1] + b  # for two input features
        # For all 13 features, a weight vector w replaces w1 and w2:
        # y_i = np.dot(X[i], w) + b
        t_i = t[i]
        costs += (y_i - t_i) ** 2
    return costs / len(t)
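To see the non-vectorized formula on numbers small enough to check by hand, take two made-up samples. With w1 = 3, w2 = 5, b = 1, the predictions are 3(1) + 5(2) + 1 = 14 and 3(3) + 5(4) + 1 = 30. Against targets 10 and 20, the errors are 4 and 10, so the cost is (16 + 100) / 2 = 58. The loop below mirrors In [31] on this toy data:

```python
import numpy as np

# Toy data (made up): N = 2 samples, 2 features each
X = np.array([[1., 2.],
              [3., 4.]])
t = np.array([10., 20.])
w1, w2, b = 3., 5., 1.

# Non-vectorized: one prediction y^(i) = w1*x1 + w2*x2 + b per iteration
costs = 0.
for i in range(len(t)):
    y_i = w1 * X[i, 0] + w2 * X[i, 1] + b
    costs += (y_i - t[i]) ** 2
print(costs / len(t))  # 58.0
```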

In [32]:
cost(3, 5, 1, x_input, y_target)

Out[32]: 2475.821173270752

In [33]:
cost(3, 5, 0, x_input, y_target)

Out[33]: 2390.2197701086957

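Vectorizing the cost function:

$$\mathcal{E}(\mathbf{y},\mathbf{t}) = \frac{1}{N}\left\lVert \mathbf{X}\mathbf{w} + \mathbf{b} - \mathbf{t}\right\rVert^2$$

where $\mathbf{b}$ is the length-$N$ vector whose entries all equal $b$.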
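Expanding the squared norm row by row shows that the vectorized expression is the same sum as the non-vectorized cost. Here $\mathbf{x}^{(i)}$ denotes the $i$-th row of $\mathbf{X}$ and $\mathbf{1}$ a vector of $N$ ones, so the bias vector is $b\mathbf{1}$:

```latex
\begin{aligned}
\frac{1}{N}\left\lVert \mathbf{X}\mathbf{w} + b\mathbf{1} - \mathbf{t} \right\rVert^2
  &= \frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{x}^{(i)}\cdot\mathbf{w} + b - t^{(i)}\right)^2 \\
  &= \frac{1}{N}\sum_{i=1}^{N}\left(w_1 x^{(i)}_1 + w_2 x^{(i)}_2 + b - t^{(i)}\right)^2
\end{aligned}
```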
In [35]:
def cost_vectorized(w1, w2, b, X, t):
    '''
    Evaluate the cost function in a vectorized manner for
    inputs `X` and targets `t`, at weights `w1`, `w2` and `b`.
    '''
    N = len(t)  # number of samples, taken from the targets argument
    w = np.array([w1, w2])
    # X (N x 2) times w (length 2) gives all N predictions at once;
    # b * np.ones(N) is the bias vector (a plain `+ b` would broadcast the same way)
    y = np.dot(X, w) + b * np.ones(N)
    cost_vect = np.sum((y - t)**2) / N
    return cost_vect
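The two-weight signature above does not extend to all 13 Boston features. Here is a sketch of a general vectorized cost that takes a weight vector instead, assuming `X` has shape (N, D) and `w` has shape (D,). The name `cost_vectorized_general` is hypothetical. NumPy broadcasting adds the scalar `b` to every prediction, and `np.mean` divides by N:

```python
import numpy as np

def cost_vectorized_general(w, b, X, t):
    """Hypothetical mean-squared-error cost for any number of features.

    w: weights of shape (D,); X: inputs of shape (N, D); t: targets of shape (N,).
    The scalar bias b is broadcast over all N predictions.
    """
    y = X @ w + b
    return np.mean((y - t) ** 2)

# Toy check (made-up numbers): 2 samples, 2 features
X = np.array([[1., 2.], [3., 4.]])
t = np.array([10., 20.])
print(cost_vectorized_general(np.array([3., 5.]), 1., X, t))  # 58.0
```

With all features one could call, for example, `cost_vectorized_general(np.zeros(13), 0., data, y_target)`. That call returns the mean of the squared targets, because every prediction is zero.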

In [36]:
cost_vectorized(3, 5, 1, x_input, y_target)

Out[36]: 2475.821173270751

In [37]:
cost(3, 5, 0, x_input, y_target)

Out[37]: 2390.2197701086957

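Comparing Processing Speed of the Vectorized vs Non-vectorized Code

With only 506 samples, both single-run timings below come out at about 1 ms. At that scale, timer granularity and run-to-run noise dominate, so this measurement alone does not show a clear speedup. The advantage of vectorization is much clearer on large inputs, as in Example 1: about 0.038 s for the for loop versus 0.001 s for the dot product. Hopefully this will convince you to vectorize your code whenever possible.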
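To check the speed claim at a size where it is measurable, here is a sketch on synthetic data with 100,000 samples. The data is an assumption and is not the Boston set. The block re-declares loop and vectorized cost functions equivalent to In [31] and In [35] so it runs standalone. It uses `timeit.repeat` and keeps the best of three runs, and absolute timings will vary by machine:

```python
import timeit
import numpy as np

# Synthetic data (assumption): far more samples than the 506 Boston rows
rng = np.random.default_rng(0)
N = 100_000
X = rng.random((N, 2))
t = rng.random(N)
w1, w2, b = 3., 5., 1.

def cost(w1, w2, b, X, t):
    # Non-vectorized: one sample per loop iteration
    costs = 0.
    for i in range(len(t)):
        y_i = w1 * X[i, 0] + w2 * X[i, 1] + b
        costs += (y_i - t[i]) ** 2
    return costs / len(t)

def cost_vectorized(w1, w2, b, X, t):
    # Vectorized: all predictions from one matrix-vector product
    y = X @ np.array([w1, w2]) + b
    return np.sum((y - t) ** 2) / len(t)

# Best of 3 single runs for each version
t_loop = min(timeit.repeat(lambda: cost(w1, w2, b, X, t), number=1, repeat=3))
t_vec = min(timeit.repeat(lambda: cost_vectorized(w1, w2, b, X, t), number=1, repeat=3))
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.6f} s")
```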
In [38]:
import time
t40 = time.time()
print(cost(3, 5, 1, x_input, y_target))
t41 = time.time()
time_CostNonvect = t41 - t40
print(time_CostNonvect)

2475.821173270752
0.0009961128234863281
In [39]:
import time
t50 = time.time()
print(cost_vectorized(3, 5, 1, x_input, y_target))
t51 = time.time()
time_CostVect = t51 - t50
print(time_CostVect)

2475.821173270751
0.0009663105010986328
In [40]:
import matplotlib.pyplot as plt
%matplotlib inline
# plt.style.use('ggplot')

# A separate name avoids overwriting any data variable called `x`
methods = ['Cost_NonVectorized', 'Cost_Vectorized']
Processing_Times = [time_CostNonvect, time_CostVect]

x_pos = list(range(len(methods)))

plt.bar(x_pos, Processing_Times, color='green')
plt.xlabel("Method")
plt.ylabel("Time (s)")
plt.title("Time vs Method")

plt.xticks(x_pos, methods)

plt.show()