Introduction To Numpy CH-6
Downloaded from https:// www.dkgoelsolutions.com
Chapter 6
Introduction to NumPy
In this chapter
» Introduction
» Array
» NumPy Array
» Indexing and Slicing
» Concatenating Arrays
» Splitting Arrays
» Statistical Operations on Arrays
Files
» Saving NumPy Arrays in Files on Disk
“The goal is to turn data into information,
6.1 Introduction
NumPy stands for ‘Numerical Python’. It is a
package for data analysis and scientific computing
with Python. NumPy provides a multidimensional
array object, along with functions and tools
for working with these arrays. The powerful
n-dimensional array in NumPy speeds up data
processing. NumPy can be easily interfaced with
other Python packages and provides tools for
integrating with other programming languages
such as C and C++.
2020-21
Downloaded from https:// www.studiestoday.com
Downloaded from https:// www.dkgoelsolutions.com 19-Jul-19 3:43:32 PM
Installing NumPy
NumPy can be installed by typing the following command:
pip install numpy
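Once the package is installed, a quick check like the one below can confirm that Python is able to find it. This is only a minimal sketch; the variable name numpy_version is chosen here for illustration.

```python
import numpy as np  # "np" is the conventional short alias for the package

# The installed version is available as a string attribute.
numpy_version = np.__version__
print("NumPy version:", numpy_version)
```

If the import fails with ModuleNotFoundError, the package is most likely not installed for the Python interpreter being used.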
6.2 Array
We have learnt about various data types like list, tuple,
and dictionary. In this chapter we will discuss another
datatype, ‘Array’. An array is a data type used to store
multiple values using a single identifier (variable name).
An array contains an ordered collection of data elements
where each element is of the same type and can be
referenced by its index (position).
The important characteristics of an array are:
• Each element of the array is of the same data
type, though the values stored in them may be
different.
• The entire array is stored contiguously in
memory. This makes operations on arrays fast.
• Each element of the array is identified or
referred using the name of the Array along with
index value [0] associated with it; the 2nd value in the

Contiguous memory allocation:
The memory space must be divided into fixed-size
positions, and each position is allocated to a
single data item only.
Non-contiguous memory allocation:
The data is divided into several blocks, which are
placed in different parts of the memory according to
the availability of memory space.
List: Lists do not support element-wise operations,
for example, addition, multiplication, etc.,
because elements may not be of the same type.
Array: Arrays support element-wise operations. For example,
if A1 is an array, it is possible to say A1/3 to divide
each element of the array by 3.

List: Lists can contain objects of different
datatypes, so Python must store the type
information for every element along with its
element value. Thus lists take more space
in memory and are less efficient.
Array: A NumPy array takes up less space in memory as
compared to a list because arrays do not need to
store the datatype of each element separately.

List: List is a part of core Python.
Array: Array (ndarray) is a part of the NumPy library.
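The first difference can be seen in a short sketch. The values in marks_list are hypothetical and chosen only so that the division is easy to follow.

```python
import numpy as np

marks_list = [30, 60, 90]        # hypothetical values, stored in a plain list
A1 = np.array(marks_list)        # the same values stored in a NumPy array

# Dividing the array divides every element by 3.
divided = A1 / 3
print(divided)

# A list does not support this operation directly and raises TypeError.
try:
    marks_list / 3
    list_supports_division = True
except TypeError:
    list_supports_division = False

# With a list, each element has to be handled explicitly.
list_divided = [value / 3 for value in marks_list]
```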
np.array([1,2,3,4])
>>> array3 = np.array([[2.4,3],
                       [4.91,7],[0,-1]])
>>> array3
array([[ 2.4 ,  3.  ],
       [ 4.91,  7.  ],
       [ 0.  , -1.  ]])
Observe that the integers 3, 7, 0 and -1 have been
promoted to floats.
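The promotion can be checked through the dtype of the resulting array. The sketch below reuses array3 from the example above; the names all_ints and mixed are introduced here only for comparison.

```python
import numpy as np

# Mixing floats and integers: every element is stored as a float.
array3 = np.array([[2.4, 3], [4.91, 7], [0, -1]])
print(array3.dtype)

# Only integers: the array keeps an integer type
# (the exact integer width depends on the platform).
all_ints = np.array([1, 2, 3, 4])

# A single float among integers is enough to promote the whole array.
mixed = np.array([1, 2.5])
print(mixed)
```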
6.3.3 Attributes of NumPy Array
Example 6.2
>>> array1.ndim
1
>>> array3.ndim
2
ii) ndarray.shape: It gives the sequence of integers
indicating the size of the array for each dimension.
Example 6.3
# array1 is 1D-array, there is nothing
# after , in sequence
>>> array1.shape
(3,)
>>> array2.shape
(4,)
>>> array3.shape
(3, 2)
same data type. Common data types are int32,
int64, float32, float64, U32, etc.
Example 6.5
>>> array1.dtype
dtype('int32')
>>> array2.dtype
dtype('<U32')
>>> array3.dtype
dtype('float64')
Example 6.6
>>> array1.itemsize
4 # memory allocated to integer
>>> array2.itemsize
128 # memory allocated to string
>>> array3.itemsize
8 # memory allocated to float type
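The attributes discussed in this section can be inspected together, as in the following sketch. The array int_array is an assumption: it uses hypothetical values and an explicitly requested int32 type so that the result does not depend on the platform. The array float_array holds the same values as array3 above.

```python
import numpy as np

# Hypothetical 1-D array; dtype is given explicitly so that itemsize is 4.
int_array = np.array([10, 20, 30], dtype=np.int32)
# Same values as array3 in the text.
float_array = np.array([[2.4, 3], [4.91, 7], [0, -1]])

for name, arr in (("int_array", int_array), ("float_array", float_array)):
    print(name)
    print("  ndim    :", arr.ndim)      # number of dimensions
    print("  shape   :", arr.shape)     # size along each dimension
    print("  size    :", arr.size)      # total number of elements
    print("  dtype   :", arr.dtype)     # data type of the elements
    print("  itemsize:", arr.itemsize)  # bytes used by one element
```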
[0., 0., 0., 0.]])
3. We can create an array with all elements initialised
to 1 using the function ones(). By default, the
data type of the array created by ones() is float.
The following code will create an array with 3 rows
and 2 columns.
>>> array6 = np.ones((3,2))
>>> array6
array([[1., 1.],
       [1., 1.],
       [1., 1.]])
4. We can create an array with numbers in a given
of Python.
array([0, 1, 2, 3, 4, 5])
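The creation functions can be tried side by side. This is a sketch under two assumptions: the 3 x 4 shape passed to zeros() is chosen for illustration, and the sequence 0 to 5 is produced with np.arange(), whose name does not appear in the surviving text.

```python
import numpy as np

zeros_array = np.zeros((3, 4))        # 3 rows, 4 columns, all 0.0 (float by default)
ones_array = np.ones((3, 2))          # 3 rows, 2 columns, all 1.0 (float by default)
int_ones = np.ones((3, 2), dtype=int) # same shape, but stored as integers
sequence = np.arange(6)               # integers 0, 1, ..., 5

print(zeros_array)
print(ones_array)
print(int_ones)
print(sequence)
```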
Vedika 76 75 47
Harun 84 59 60
Prasad 67 72 54
for four students given in this table. As there are 4
students (i.e. 4 rows) and 3 subjects (i.e. 3 columns),
the array will be called marks[4][3]. This array can
store 4*3 = 12 elements.
Here, marks[i,j] refers to the element at (i+1)th row
and (j+1)th column because the index values start at 0.
Thus marks[3,1] is the element in 4th row and second
column which is 72 (marks of Prasad in English).
>>> marks[0,2]
56
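The indexing rule can be tried on a small array built from the rows that appear above. This is only a sketch: the first two values of the first row are not available here, so -1 is used as a placeholder for them; only the third value of that row (56) is taken from the example.

```python
import numpy as np

marks = np.array([
    [-1, -1, 56],   # first student: -1 marks a value not available here
    [76, 75, 47],   # Vedika
    [84, 59, 60],   # Harun
    [67, 72, 54],   # Prasad
])

print(marks.shape)    # 4 rows (students) and 3 columns (subjects)
print(marks[0, 2])    # 1st row, 3rd column
print(marks[3, 1])    # 4th row, 2nd column: Prasad's second subject
print(marks[1])       # a single index returns the whole 2nd row
```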
6.4.2 Slicing
Now let us see how slicing is done for 2-D arrays.
For this, let us create a 2-D array called array9 having
3 rows and 4 columns.
>>> array9 = np.array([[ -7,  0, 10,  20],
                       [ -5,  1, 40, 200],
                       [ -1,  1,  4,  30]])
# access elements of 2nd and 3rd row from 1st
# and 2nd column
>>> array9[1:3,0:2]
array([[-5,  1],
       [-1,  1]])
If row indices are not specified, it means all the rows
are to be considered. Likewise, if column indices are
not specified, all the columns are to be considered.
>>> array9[:,2]
array([10, 40,  4])
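The slices above can be reproduced as a script. The sketch below uses the same array9; the names block, third_column and last_two_rows are introduced here for illustration.

```python
import numpy as np

array9 = np.array([[-7, 0, 10, 20],
                   [-5, 1, 40, 200],
                   [-1, 1, 4, 30]])

# Rows with index 1 and 2, columns with index 0 and 1.
block = array9[1:3, 0:2]

# All rows, only the column with index 2 (the result is 1-D).
third_column = array9[:, 2]

# Rows with index 1 and 2, all columns.
last_two_rows = array9[1:3, :]

print(block)
print(third_column)
print(last_two_rows)
```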
#Subtraction
>>> array1 - array2
array([[ -7, -14],
       [-11, -10]])
#Multiplication
>>> array1 * array2
array([[ 30, 120],
       [ 60,  24]])
#Matrix Multiplication
>>> array1 @ array2
array([[120, 132],
       [ 70, 104]])
#Exponentiation
>>> array1 ** 3
array([[ 27, 216],
       [ 64,   8]], dtype=int32)
#Division
>>> array2 / array1
array([[3.33333333, 3.33333333],
       [3.75      , 6.        ]])
#(Modulo)
>>> array2 % array1
array([[1, 2],
       [3, 0]], dtype=int32)
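The operations above can be run as one script. The definitions of array1 and array2 are not shown in this part of the chapter, so the values below are an assumption: they are the values implied by the outputs above (for instance, array1 ** 3 gives 27, 216, 64 and 8).

```python
import numpy as np

# Values inferred from the outputs shown in the text (an assumption).
array1 = np.array([[3, 6], [4, 2]])
array2 = np.array([[10, 20], [15, 12]])

difference = array1 - array2     # element-wise subtraction
product = array1 * array2        # element-wise multiplication
matrix_product = array1 @ array2 # matrix multiplication (rows by columns)
cubes = array1 ** 3              # each element raised to the power 3
quotient = array2 / array1       # element-wise division, always gives floats
remainder = array2 % array1      # element-wise remainder

print(matrix_product)
print(quotient)
```

Note that * multiplies matching elements, while @ follows the rules of matrix multiplication, so the two results differ.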
6.5.3 Sorting
Sorting arranges the elements of an array in
order, either ascending or descending. By
default, NumPy sorts in ascending order.
>>> array4 = np.array([1,0,2,-3,6,8,4,7])
>>> array4.sort()
>>> array4
array([-3,  0,  1,  2,  4,  6,  7,  8])
In a 2-D array, sorting can be done along either of the
axes, i.e., row-wise or column-wise. By default, sorting
is done row-wise (i.e., on axis = 1). It means to arrange
elements in each row in ascending order. When axis=0,
sorting is done column-wise, which means each column
is sorted in ascending order.
>>> array4 = np.array([[10,-7,0, 20],
                       [-5,1,200,40],[30,1,-1,4]])
>>> array4
>>> array4.sort()
>>> array4
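The effect of the axis can be compared on the 2-D array above. The sketch below uses np.sort(), which returns a sorted copy, instead of the sort() method used in the text, which sorts the array in place; this choice is made only so that the original array stays available for both cases.

```python
import numpy as np

array4 = np.array([[10, -7, 0, 20],
                   [-5, 1, 200, 40],
                   [30, 1, -1, 4]])

# Default: each row is sorted (last axis, i.e. axis=1 for a 2-D array).
row_sorted = np.sort(array4)

# axis=0: each column is sorted.
column_sorted = np.sort(array4, axis=0)

# Descending order for a 1-D array: sort ascending, then reverse.
values = np.array([1, 0, 2, -3, 6, 8, 4, 7])
descending = np.sort(values)[::-1]

print(row_sorted)
print(column_sorted)
print(descending)
```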
>>> array1
array([[ 10,  20],
       [-30,  40]])
>>> array2
array([[0, 0, 0],
       [0, 0, 0]])
>>> array1.shape
(2, 2)
>>> array2.shape
(2, 3)
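These two arrays have the same number of rows but a different number of columns, so they can be joined side by side but not one below the other. The sketch below assumes that the joining is done with np.concatenate(), whose name does not appear in the surviving text.

```python
import numpy as np

array1 = np.array([[10, 20], [-30, 40]])
array2 = np.zeros((2, 3), dtype=int)

# axis=1 joins column-wise: the number of rows (2) must match.
joined = np.concatenate((array1, array2), axis=1)
print(joined)
print(joined.shape)

# axis=0 would need the same number of columns, so it fails here.
try:
    np.concatenate((array1, array2), axis=0)
    row_wise_possible = True
except ValueError:
    row_wise_possible = False
```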
>>> array3.reshape(2,6)
array([[10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21]])
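The output above can be reproduced with the sketch below. The definition of array3 is not shown here, so it is assumed to hold the twelve integers from 10 to 21, which matches the output.

```python
import numpy as np

# Assumption: array3 holds the integers 10 to 21.
array3 = np.arange(10, 22)

two_by_six = array3.reshape(2, 6)
three_by_four = array3.reshape(3, 4)

print(two_by_six)
print(three_by_four)

# The new shape must account for exactly the same number of elements.
try:
    array3.reshape(5, 2)
    mismatch_allowed = True
except ValueError:
    mismatch_allowed = False
```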
array is to be split; or we can specify an integer N, that
indicates the number of equal parts in which the array
is to be split, as parameter(s) to the numpy.split()
function. By default, numpy.split() splits along axis =
0. Consider the array given below:
>>> array4
array([[ 10,  -7,   0,  20],
       [ -5,   1, 200,  40],
       [ 30,   1,  -1,   4],
       [  1,   2,   0,   4],
       [  0,   1,   0,   2]])
[1, 3])
>>> secondc
array([[-7],
       [ 1],
       [ 1],
       [ 2],
       [ 1]])
>>> thirdc
array([[  0,  20],
       [200,  40],
       [ -1,   4],
       [  0,   4],
       [  0,   2]])
# column axis
>>> firsthalf, secondhalf =np.split(array4,2,
axis=1)
>>> firsthalf
array([[10, -7],
       [-5,  1],
       [30,  1],
       [ 1,  2],
       [ 0,  1]])
>>> secondhalf
array([[  0,  20],
       [200,  40],
       [ -1,   4],
       [  0,   4],
       [  0,   2]])
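The different ways of splitting can be collected in one sketch. The array4 below is the 5 x 4 array implied by the outputs above. The index list [1, 2] used for the column split and the names first, second and third are assumptions made for illustration.

```python
import numpy as np

array4 = np.array([[10, -7, 0, 20],
                   [-5, 1, 200, 40],
                   [30, 1, -1, 4],
                   [1, 2, 0, 4],
                   [0, 1, 0, 2]])

# Split along rows (axis=0, the default) at row indices 1 and 3.
first, second, third = np.split(array4, [1, 3])

# Split along columns at column indices 1 and 2.
firstc, secondc, thirdc = np.split(array4, [1, 2], axis=1)

# Split into 2 equal parts along the column axis.
firsthalf, secondhalf = np.split(array4, 2, axis=1)

print(first.shape, second.shape, third.shape)
print(secondc)
print(firsthalf)
```

When an integer is given, the axis length must be divisible by it; splitting the 5 rows of this array into 2 equal parts would raise an error.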
array([4, 6])
2. The min() function finds the minimum element
from an array.
>>> arrayA.min()
-3
>>> arrayB.min()
2
>>> arrayB.min(axis=0)
array([3, 2])
>>> arrayA.sum()
25
>>> arrayB.sum()
15
>>> arrayB.sum(axis=1)
array([9, 6])
>>> arrayB.std(axis=0)
array([0.5, 2. ])
>>> arrayB.std(axis=1)
array([1.5, 1. ])
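The statistical functions can be checked together in one sketch. The definitions of arrayA and arrayB are not shown in this part of the chapter, so the values below are assumptions chosen to be consistent with the outputs above (for example, arrayB.sum(axis=1) gives 9 and 6).

```python
import numpy as np

# Assumed values, consistent with the outputs shown in the text.
arrayA = np.array([1, 0, 2, -3, 6, 8, 4, 7])
arrayB = np.array([[3, 6], [4, 2]])

print(arrayA.min(), arrayA.sum())

# Without an axis, the whole array is reduced to one value.
print(arrayB.max(), arrayB.min(), arrayB.sum(), arrayB.mean())

# axis=0 works down each column; axis=1 works across each row.
column_max = arrayB.max(axis=0)
column_min = arrayB.min(axis=0)
row_sum = arrayB.sum(axis=1)
column_std = arrayB.std(axis=0)
row_std = arrayB.std(axis=1)

print(column_max, column_min, row_sum, column_std, row_std)
```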
functions that can be used to load data from text files.
The most commonly used file type to handle large amounts
of data is CSV (Comma Separated Values).
Each row in the text file must have the same number
of values in order to load data from a text file into a
NumPy array. Let us say we have the following data in a
text file named data.txt stored in the folder C:/NCERT.
RollNo, Marks1, Marks2, Marks3
1, 36, 18, 57
2, 22, 23, 45
3, 43, 51, 37
4, 41, 40, 60
5, 13, 18, 27
>>> studentdata
array([[ 1, 36, 18, 57],
       [ 2, 22, 23, 45],
       [ 3, 43, 51, 37],
       [ 4, 41, 40, 60],
       [ 5, 13, 18, 27]])
In the above statement, first we specify the name
and path of the text file containing the data. Let us
understand some of the parameters that we pass in the
np.loadtxt() function:
A CSV file stores
separated by commas.
We can load each row or column of the data file into
different NumPy arrays using the unpack parameter.
By default, unpack=False, which means we can extract each
row of data as a separate array. When unpack=True, the
returned array is transposed, which means we can extract the
columns as separate arrays.
# To import data into multiple NumPy arrays
# row wise. Values related to student1 in
# array stud1, student2 in array stud2 etc.
array([ 2, 22, 23, 45]) # and so on
>>> mks1
array([36, 22, 43, 41, 13])
>>> mks2
array([18, 23, 51, 40, 18])
>>> mks3
array([57, 45, 37, 60, 27])
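The loading steps can be tried without creating a file on disk. In the sketch below, the contents of data.txt are placed in a string and wrapped in io.StringIO, which np.loadtxt() accepts in place of a file name; this substitution, and the removal of the spaces after the commas, are simplifications made here.

```python
from io import StringIO

import numpy as np

# Stand-in for the contents of C:/NCERT/data.txt.
csv_text = """RollNo,Marks1,Marks2,Marks3
1,36,18,57
2,22,23,45
3,43,51,37
4,41,40,60
5,13,18,27
"""

# skiprows=1 skips the header line; delimiter=',' splits values at commas.
studentdata = np.loadtxt(StringIO(csv_text), skiprows=1,
                         delimiter=',', dtype=int)

# Row wise: unpacking the 2-D array gives one array per student.
stud1, stud2, stud3, stud4, stud5 = studentdata

# Column wise: unpack=True transposes the data, giving one array per column.
rollno, mks1, mks2, mks3 = np.loadtxt(StringIO(csv_text), skiprows=1,
                                      delimiter=',', dtype=int, unpack=True)

print(stud2)
print(mks1)
```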
5, 13, 18, 27
>>> dataarray = np.genfromtxt('C:/NCERT/dataMissing.txt',
                skip_header=1, delimiter = ',')
>>> dataarray
array([[ 1., 36., 18., 57.],
       [ 2., nan, 23., 45.],
       [ 3., 43., 51., nan],
values.
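The handling of missing values can be tried on a small in-memory example. This is a sketch under stated assumptions: the string below stands in for dataMissing.txt and contains only the rows visible above, and the replacement value -999 passed through the filling_values parameter is chosen here for illustration.

```python
from io import StringIO

import numpy as np

# Stand-in for the file contents; empty fields are the missing values.
csv_text = """RollNo,Marks1,Marks2,Marks3
1,36,18,57
2,,23,45
3,43,51,
5,13,18,27
"""

# Missing fields are read as nan, so the whole array becomes float.
dataarray = np.genfromtxt(StringIO(csv_text), skip_header=1, delimiter=',')
print(dataarray)

# filling_values replaces each missing field with the given value instead.
filled = np.genfromtxt(StringIO(csv_text), skip_header=1, delimiter=',',
                       filling_values=-999)
print(filled)
```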
Summary
• Array is a data type that holds objects of the same
datatype (numeric, textual, etc.). The elements of
an array are stored contiguously in memory. Each
element of an array has an index or position value.
• NumPy is a Python library for scientific computing
which stores data in a powerful n-dimensional
ndarray object for faster calculations.
• Each element of an array is referenced by the array
of type numpy.ndarray.
Exercise
1. What is NumPy? How is it installed?
2. What is an array and how is it different from a list? What
is the name of the built-in array class in NumPy?
3. What do you understand by rank of an ndarray?
4. Create the following NumPy arrays:
a) A 1-D array called zeros having 10 elements and
all the elements are set to zero.
b) A 1-D array called vowels having the elements ‘a’,
‘e’, ‘i’, ‘o’ and ‘u’.
c) A 2-D array called ones having 2 rows and 5
columns and all the elements are set to 1 and
dtype as int.
d) Use nested Python lists to create a 2-D array called
myarray1 having 3 rows and 3 columns and store
the following data:
2.7, -2, -19
0, 3.4, 99.9
10.6, 0, 13
in a single row.
c) Display the 2nd and 3rd element of the array vowels.
d) Display all elements in the 2nd and 3rd row of the
array myarray1.
e) Display the elements in the 1st and 2nd column of
the array myarray1.
f) Display the elements in the 1st column of the 2nd
and 3rd row of the array myarray1.
g) Reverse the array of vowels.
6. Using the arrays created in Question 4 above, write
NumPy commands for the following:
and divide the resulting array by 2. The result
should be rounded to two places of decimals.
7. Using the arrays created in Question 4 above, write
NumPy commands for the following:
a) Find the transpose of ones and myarray2.
b) Sort the array vowels in reverse.
c) Sort the array myarray1 such that it brings the
lowest value of the column in the first row and so
on.
etc. A data set lists values for each of the variables, such as
height and weight of a student, for each row (item) of the data
set. Open data refers to information released in a publicly
accessible repository.
The Iris flower data set is an example of open data.
It is also called Fisher's Iris data set as this data set was
introduced by the British statistician and biologist Ronald
Fisher in 1936. The Iris data set consists of 50 samples from
each of the three species of the flower Iris (Iris setosa, Iris
species from each other. The full data set is freely available
uci.edu/ml/datasets/iris.
We shall use the following smaller section of this data set
6.9 3.1 5.4 2.1 Iris-virginica 3
6.7 3.1 5.6 2.4 Iris-virginica 3
6.9 3.1 5.1 2.3 Iris-virginica 3
5.8 2.7 5.1 1.9 Iris-virginica 3
6.8 3.2 5.9 2.3 Iris-virginica 3
6.7 3.3 5.7 2.5 Iris-virginica 3
6.7 3 5.2 2.3 Iris-virginica 3
6.3 2.5 5 1.9 Iris-virginica 3
You may type this using any text editor (Notepad, gEdit
file with a name called Iris.txt. (In case you wish to work
with the entire dataset you could download a .csv file for the
5.7, 2.8, 4.1, 1.3, Iris-versicolor, 2
6.9, 3.1, 5.4, 2.1, Iris-virginica, 3
6.7, 3.1, 5.6, 2.4, Iris-virginica, 3
6.9, 3.1, 5.1, 2.3, Iris-virginica, 3
5.8, 2.7, 5.1, 1.9, Iris-virginica, 3
6.8, 3.2, 5.9, 2.3, Iris-virginica, 3
6.7, 3.3, 5.7, 2.5, Iris-virginica, 3
iris.
4. Split iris into three 2-D arrays, each array for a different
species. Call them iris1, iris2, iris3.
5. Print the three arrays iris1, iris2, iris3
6. Create a 1-D array header having elements "sepal
length", "sepal width", "petal length", "petal width",
"Species No" in that order.
7. Display the array header.
8. Find the max, min, mean and standard deviation for the
columns of the iris and store the results in the arrays
iris_max, iris_min, iris_avg, iris_std, iris_var
respectively. The results must be rounded to not
more than two decimal places.
9. Similarly find the max, min, mean and standard deviation
for the columns of the iris1, iris2 and iris3 and
store the results in the arrays with appropriate names.
10. Check the minimum value for sepal length, sepal width,
petal length and petal width of the three species in
comparison to the minimum value of sepal length, sepal
width, petal length and petal width for the data set as a
whole and fill the table below with True if the species value
is greater than the dataset value and False otherwise.
sepal length
sepal width
petal length
petal width
virginica.
hard disk.
Solutions to Case Study Based Exercises
>>> import numpy as np
# Solution to Q1
>>> iris = np.genfromtxt('C:/NCERT/Iris.txt',
          skip_header=1, delimiter=',', dtype = float)
# Solution to Q2
>>> iris = iris[0:30,[0,1,2,3,5]] # drop column 4
# Solution to Q3
>>> iris.shape
(30, 5)
>>> iris.ndim
2
>>> iris.size
150
# Solution to Q4
# Split into three arrays, each array for a different
# species
>>> iris1, iris2, iris3 = np.split(iris, [10,20],
axis=0)
# Solution to Q5
# Print the three arrays
>>> iris1
array([[5.1, 3.5, 1.4, 0.2, 1. ],
       [4.9, 3. , 1.4, 0.2, 1. ],
       [4.7, 3.2, 1.3, 0.2, 1. ],
       [4.6, 3.1, 1.5, 0.2, 1. ],
       [5. , 3.6, 1.4, 0.2, 1. ],
       [5.4, 3.9, 1.7, 0.4, 1. ],
       [4.6, 3.4, 1.4, 0.3, 1. ],
       [5. , 3.4, 1.5, 0.2, 1. ],
       [4.4, 2.9, 1.4, 0.2, 1. ],
>>> iris2
array([[5.5, 2.6, 4.4, 1.2, 2. ],
>>> iris3
array([[6.9, 3.1, 5.4, 2.1, 3. ],
[6.7, 3.1, 5.6, 2.4, 3. ],
[6.9, 3.1, 5.1, 2.3, 3. ],
[5.8, 2.7, 5.1, 1.9, 3. ],
[6.8, 3.2, 5.9, 2.3, 3. ],
[6.7, 3.3, 5.7, 2.5, 3. ],
[6.7, 3. , 5.2, 2.3, 3. ],
[6.3, 2.5, 5. , 1.9, 3. ],
[6.5, 3. , 5.2, 2. , 3. ],
[6.2, 3.4, 5.4, 2.3, 3. ]])
# Solution to Q6
>>> header =np.array(["sepal length", "sepal width",
             "petal length", "petal width",
             "Species No"])
# Solution to Q7
>>> print(header)
['sepal length' 'sepal width' 'petal length' 'petal
width' 'Species No']
# Solution to Q8
# Stats for array iris
# Finds the max of the data for sepal length, sepal
# width, petal length, petal width, Species No
>>> iris_max = iris.max(axis=0)
>>> iris_max
array([6.9, 3.9, 5.9, 2.5, 3. ])
# Finds the min of the data for sepal length, sepal
# width, petal length, petal width, Species No
>>> iris_min = iris.min(axis=0)
>>> iris_min
# Species No
# Solution to Q9
>>> iris1_max = iris1.max(axis=0)
>>> iris1_max
array([5.4, 3.9, 1.7, 0.4, 1. ])
array([5.8, 2.5, 5. , 1.9, 3. ])
>>> iris1_avg = iris1.mean(axis=0)
>>> iris1_avg
array([4.86, 3.31, 1.45, 0.22, 1. ])
>>> iris3_avg
>>> iris1_std
array([0.28, 0.29, 0.1 , 0.07, 0. ])
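The same column-wise statistics can be computed for the third species. The sketch below enters the ten rows of iris3 shown earlier directly, rather than reading them from Iris.txt, and rounds the results with np.round(); the choice of np.round() for the rounding step is an assumption.

```python
import numpy as np

# The ten Iris-virginica rows (iris3) shown earlier in the solutions.
iris3 = np.array([[6.9, 3.1, 5.4, 2.1, 3.0],
                  [6.7, 3.1, 5.6, 2.4, 3.0],
                  [6.9, 3.1, 5.1, 2.3, 3.0],
                  [5.8, 2.7, 5.1, 1.9, 3.0],
                  [6.8, 3.2, 5.9, 2.3, 3.0],
                  [6.7, 3.3, 5.7, 2.5, 3.0],
                  [6.7, 3.0, 5.2, 2.3, 3.0],
                  [6.3, 2.5, 5.0, 1.9, 3.0],
                  [6.5, 3.0, 5.2, 2.0, 3.0],
                  [6.2, 3.4, 5.4, 2.3, 3.0]])

# axis=0 computes one value per column.
iris3_max = iris3.max(axis=0)
iris3_min = iris3.min(axis=0)
iris3_avg = np.round(iris3.mean(axis=0), 2)
iris3_std = np.round(iris3.std(axis=0), 2)

print(iris3_max)
print(iris3_min)
print(iris3_avg)
print(iris3_std)
```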
# Solution to Q11
#Compare Iris setosa and Iris virginica
>>> iris1_avg[1] > iris2_avg[1] #sepal width
True
# Solution to Q12
>>> iris1_avg[2] > iris2_avg[2] #petal length
False
# Solution to Q13
>>> iris1_avg[3] > iris2_avg[3] #petal width
False
# Solution to Q14
>>> np.savetxt('C:/NCERT/IrisMeanValues.txt',
               iris_avg, delimiter = ',')
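Saving can be checked by reading the file back. The sketch below writes to a temporary folder instead of C:/NCERT so that it runs on any system, and the values in iris_avg are hypothetical stand-ins for the means computed in the exercise.

```python
import os
import tempfile

import numpy as np

# Hypothetical mean values, used only to demonstrate saving and loading.
iris_avg = np.array([5.68, 3.02, 3.6, 1.13, 2.0])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "IrisMeanValues.txt")

    # A 1-D array is written with one value per line.
    np.savetxt(path, iris_avg, delimiter=',')

    # Read the file back to confirm that the values were stored.
    reloaded = np.loadtxt(path, delimiter=',')

print(reloaded)
```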
# Solution to Q15