Data Science With Python - Lesson 06 - Scientific Computing With Python (Scipy) - Ebook

SciPy is a Python-based ecosystem of open-source software for science and engineering. It contains modules for optimization, integration, linear algebra, Fourier transforms, and more. Key characteristics of SciPy include built-in mathematical libraries and functions, high-level commands for data manipulation and visualization, and efficient and fast data processing. SciPy has several sub-packages that handle different scientific domains like integration, linear algebra, optimization, statistics, and image processing. Common sub-packages provide functions for integration techniques, optimization algorithms, solving linear systems, and performing linear algebra operations.

Uploaded by

Samir Awol

Data Science with Python

Scientific Computing with Python (SciPy)


Learning Objectives

By the end of this lesson, you will be able to:

Explain the importance of SciPy

List the characteristics of SciPy

Explain sub-packages of SciPy

Discuss SciPy sub-packages, such as optimization, integration, linear algebra, statistics, weave, and IO
SciPy and Its Characteristics
Multiple Scientific Domains

How to handle multiple scientific domains? The solution is SciPy.

Scientific domains:
• Statistics
• Space science
• Optimization
• Image science
• Signal processing
• Platform integration
• Mathematical equations
SciPy

SciPy has built-in packages that help in handling the scientific domains:

• Mathematics integration
• Statistics (normal distribution)
• Linear algebra
• Multidimensional image processing
• Mathematics constants
• Language integration
SciPy and Its Characteristics

1. Built-in mathematical libraries and functions
2. High-level commands for data manipulation and visualization
3. Efficient and fast data processing
4. Integrates well with multiple systems and environments
5. Large collection of sub-packages for different scientific domains
6. Simplifies scientific application development
SciPy Packages

Some widely used packages are:

• Integration
• Linear Algebra
• Statistics
• IO
• Optimize
• Weave
Introduction of SciPy Sub-Package
SciPy Sub-Package

SciPy has multiple sub-packages which handle different scientific domains:

• cluster: clustering algorithms
• constants: physical and mathematical constants
• fftpack: Fast Fourier Transform routines
• integrate: integration and ordinary differential equation solvers
• interpolate: interpolation and smoothing splines
• io: input and output
• linalg: linear algebra
• ndimage: N-dimensional image processing
• odr: orthogonal distance regression
• optimize: optimization and root-finding routines
• signal: signal processing
• sparse: sparse matrices and associated routines
• spatial: spatial data structures and algorithms
• special: special functions
• stats: statistical distributions and functions
• weave: C/C++ integration
SciPy Sub-Package: Integration

SciPy provides integration techniques that solve mathematical sequences and series, or
perform function approximation.

General integration (quad):
• integrate.quad(f, a, b)

General multiple integration (dblquad, tplquad, nquad):
• integrate.dblquad()
• integrate.tplquad()
• integrate.nquad()

The limits of all inner integrals need to be defined as functions.
SciPy Sub-Package: Integration

This example shows how to perform quad integration:

1. Import quad from the integrate sub-package.
2. Define the function of x to integrate.
3. Perform quad integration of the function of x over the limits 0 to 1.
4. Define a function for ax + b.
5. Declare the values of a and b.
6. Perform quad integration, passing the function and its arguments.
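The slide's code is not reproduced in this text, so the following is only a minimal sketch of the steps listed for quad integration. The integrand f(x) = x and the values a = 3 and b = 2 are assumptions chosen for illustration.

```python
from scipy.integrate import quad

# Integrate f(x) = x over [0, 1] (assumed integrand).
# quad returns the integral value and an estimate of the absolute error.
def f(x):
    return x

value, abs_error = quad(f, 0, 1)
print(value)  # close to 0.5

# Integrate a*x + b over [0, 1], passing a and b through the args tuple.
def linear(x, a, b):
    return a * x + b

a, b = 3, 2  # illustrative values, not taken from the lesson
value_ab, abs_error_ab = quad(linear, 0, 1, args=(a, b))
print(value_ab)  # close to 3.5
```

Extra parameters after x are supplied through args, so the same function can be integrated for different values of a and b without redefining it.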
SciPy Sub-Package: Integration

This example shows you how to perform multiple integration:

1. Import the integrate sub-package.
2. Define the function for x + y.
3. Perform multiple integration, using the lambda built-in function to define the inner limits.
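A possible version of this multiple integration is sketched below with integrate.dblquad. The integration limits (x from 0 to 1, y from 0 to 2) are assumptions, since the original slide code is not available here.

```python
from scipy import integrate

# dblquad calls the integrand as func(y, x): the inner variable comes first.
def f(y, x):
    return x + y

# The outer limits for x are plain numbers (0 to 1). The inner limits for y
# are passed as functions of x, written here as constant lambdas (0 to 2).
value, abs_error = integrate.dblquad(f, 0, 1, lambda x: 0, lambda x: 2)
print(value)  # close to 3.0
```

The lambdas are what the slide means by defining the limits of inner integrals as functions: they could just as well depend on x, for example lambda x: 2 * x.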
SciPy Sub-Package: Optimization
Optimization is the process of improving the performance of a system mathematically by fine-tuning the process parameters.

SciPy provides several optimization algorithms, such as BFGS, Nelder-Mead simplex, Newton Conjugate Gradient, COBYLA, and SLSQP.

Minimization functions (find the lower limit in a given range):
• optimize.minimize(f, x0, method='BFGS')

Root finding:
• optimize.root(f, x0, method='hybr')

Curve fitting:
• optimize.curve_fit(f, xdata, ydata)
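The lesson gives no curve-fitting example, so here is a small sketch of optimize.curve_fit. The straight-line model, the synthetic data, and the bounds are all assumptions made for illustration.

```python
import numpy as np
from scipy import optimize

# Hypothetical model: a straight line with parameters a and b.
def model(x, a, b):
    return a * x + b

# Synthetic data generated from a = 2.5, b = 1.0 plus a little noise.
rng = np.random.default_rng(0)
xdata = np.linspace(0, 10, 50)
ydata = model(xdata, 2.5, 1.0) + rng.normal(scale=0.1, size=xdata.size)

# bounds is a (lower, upper) pair; each side holds one limit per parameter.
params, covariance = optimize.curve_fit(
    model, xdata, ydata, bounds=([0, 0], [10, 10])
)
print(params)  # estimates of a and b
```

The bounds argument is optional. When it is given, both the lower and the upper limits are passed together as one pair.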
SciPy Sub-Package: Optimization
1. Import numpy and optimize from SciPy.
2. Define the function x^2 + 5 sin x.
3. Perform optimize.minimize using the BFGS method and options.
4. Perform optimize.minimize using the BFGS method without options.
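These minimization steps might look like the sketch below. The starting point x0 = 0 and the particular options are assumptions, as the slide's own values are not shown in this text.

```python
import numpy as np
from scipy import optimize

# Function to minimize: x^2 + 5 sin x.
def f(x):
    return x**2 + 5 * np.sin(x)

# BFGS with an options dictionary (iteration cap, no convergence printout).
result_with_options = optimize.minimize(
    f, x0=0.0, method="BFGS", options={"maxiter": 50, "disp": False}
)

# BFGS with default options.
result_default = optimize.minimize(f, x0=0.0, method="BFGS")

print(result_default.x)    # location of the minimum, near -1.11
print(result_default.fun)  # function value at the minimum
```

The result object also carries success and message fields, which are worth checking before trusting x.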
SciPy Sub-Package: Optimization

1. Define the function x + 3.5 cos x.
2. Pass the starting x value as the argument to root.
3. The result contains the function value and the array of solution values.
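A sketch of this root-finding example follows. The starting value x0 = -1 is an assumption; the function has several roots, and a different starting value can lead to a different one.

```python
import numpy as np
from scipy import optimize

# Function whose root we want: x + 3.5 cos x.
def g(x):
    return x + 3.5 * np.cos(x)

# 'hybr' is the default method of optimize.root.
solution = optimize.root(g, x0=-1.0, method="hybr")

print(solution.x)    # array holding the root that was found
print(solution.fun)  # function value at that root, close to 0
```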
SciPy Sub-Package: Linear Algebra

SciPy provides rapid linear algebra capabilities and contains advanced algebraic functions:

• Inverse of a matrix
• Finding the determinant
• Solving linear systems
• Singular Value Decomposition (SVD)

The inv function is used to compute the inverse of the given matrix. Let's look at the inverse matrix operation:

1. Import linalg and numpy.
2. Define a numpy matrix or array.
3. View the type.
4. Use the inv function to invert the matrix.
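As a minimal sketch of the inverse operation, assuming a small 2x2 matrix chosen only for illustration:

```python
import numpy as np
from scipy import linalg

# Illustrative 2x2 matrix (not from the lesson).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(type(A))  # <class 'numpy.ndarray'>

# inv raises LinAlgError if the matrix is singular.
A_inv = linalg.inv(A)
print(A_inv)

# Multiplying a matrix by its inverse gives the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))
```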
SciPy Sub-Package: Linear Algebra

With the det function you can compute the value of the determinant for the given matrix:

1. Import linalg and numpy.
2. Define a numpy matrix or array.
3. Use the det function to find the determinant value of the matrix.
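The determinant steps can be sketched as follows, again with an assumed 2x2 matrix:

```python
import numpy as np
from scipy import linalg

# Illustrative 2x2 matrix (not from the lesson).
A = np.array([[1.0, 2.0], [3.0, 4.0]])

# For a 2x2 matrix the determinant is 1*4 - 2*3.
determinant = linalg.det(A)
print(determinant)  # close to -2.0
```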
SciPy Sub-Package: Linear Algebra

Linear equations:

2x + 3y + z = 21
-x + 5y + 4z = 9
3x + 2y + 9z = 6

1. Import linalg.
2. Use the solve method to solve the system.
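The system above can be written as A·v = b and handed to linalg.solve. The sketch below uses the coefficients from the slide; the variable names are my own.

```python
import numpy as np
from scipy import linalg

# Coefficient matrix: one row per equation, columns for x, y, z.
A = np.array([[2.0, 3.0, 1.0],
              [-1.0, 5.0, 4.0],
              [3.0, 2.0, 9.0]])

# Right-hand side values.
b = np.array([21.0, 9.0, 6.0])

solution = linalg.solve(A, b)
print(solution)  # approximately [4.95, 4.35, -1.95]

# Substituting the solution back reproduces the right-hand side.
print(np.allclose(A @ solution, b))
```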
SciPy Sub-Package: Linear Algebra

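1. Import linalg.
2. Define the matrix.
3. Find the shape of the ndarray, which is a 2x3 matrix.
4. Use the svd function.

The svd function returns:
• U: a unitary matrix
• Sigma: the singular values, which are the square roots of the eigenvalues of A^H A
• VH: the values collected into a unitary matrix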
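A short sketch of the SVD steps is given below. The 2x3 matrix entries are assumptions, and linalg.diagsvd is used only to rebuild the rectangular Sigma matrix so that the factorization can be checked.

```python
import numpy as np
from scipy import linalg

# Illustrative 2x3 matrix (entries are not from the lesson).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(A.shape)  # (2, 3)

# svd returns U, the singular values as a 1-D array, and VH.
U, s, Vh = linalg.svd(A)
print(U.shape, s.shape, Vh.shape)  # (2, 2) (2,) (3, 3)

# Place the singular values on the diagonal of a 2x3 matrix and
# verify that U @ Sigma @ VH reproduces A.
Sigma = linalg.diagsvd(s, 2, 3)
print(np.allclose(U @ Sigma @ Vh, A))
```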
Calculate Eigenvalues and Eigenvectors

Problem Statement: Demonstrate how to calculate eigenvalues and eigenvectors

Access: Click on the Practice Labs tab on the left side panel of the LMS. Copy or note the
username and password that is generated. Click on the Launch Lab button. On the page that
appears, enter the username and password in the respective fields, and click Login.
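Outside the practice lab, eigenvalues and eigenvectors can be computed with linalg.eig. This is a sketch on an assumed symmetric 2x2 matrix, not the lab's own solution.

```python
import numpy as np
from scipy import linalg

# Illustrative symmetric matrix (not from the lab).
A = np.array([[2.0, 1.0], [1.0, 2.0]])

# eig returns the eigenvalues (as complex numbers) and a matrix whose
# column i is the eigenvector belonging to eigenvalue i.
eigenvalues, eigenvectors = linalg.eig(A)
print(eigenvalues)
print(eigenvectors)

# Each pair satisfies A v = lambda v.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(A @ v, eigenvalues[i] * v))
```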
SciPy Sub-Package: Statistics

SciPy provides a very rich set of statistical functions:

• The stats package contains distributions from which random variables are generated.
• The package enables the addition of new routines and distributions. It also offers convenience methods such as pdf() and cdf().
• The following statistical functions operate on a set of data:
o linear regression: linregress()
o describing data: describe(), normaltest()
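The three data functions named above can be tried on synthetic data, as in this sketch. The data, the random seed, and the sample sizes are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, nearly linear data: y = 2x + 1 plus small noise.
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Linear regression: slope, intercept, correlation coefficient and more.
fit = stats.linregress(x, y)
print(fit.slope, fit.intercept, fit.rvalue)

# Descriptive statistics: count, min/max, mean, variance, skewness, kurtosis.
summary = stats.describe(y)
print(summary.nobs, summary.minmax, summary.mean, summary.variance)

# Normality test on a sample drawn from a normal distribution.
sample = rng.normal(size=200)
statistic, p_value = stats.normaltest(sample)
print(statistic, p_value)
```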
SciPy Sub-Package: Statistics

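CDF, or Cumulative Distribution Function, provides the cumulative probability associated with a function: F(x) = P(X ≤ x), accumulated from negative infinity up to x.

Age Range    Frequency    Cumulative Frequency
0-10         19           19
11-20        55           74
21-30        23           97
31-40        36           133
41-50        10           143
51-60        17           160

Each cumulative frequency is the total number of persons up to and including that age range.

[Figure: normal curve with the x-axis running from -3 to 3 standard deviations. 68% of the data lies within one standard deviation of the mean, 95% within two, and 99.7% within three.]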
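Both ideas on this slide can be checked numerically. The sketch below accumulates the frequencies from the age table with numpy.cumsum, then uses norm.cdf to recover the 68%, 95% and 99.7% figures for a standard normal distribution.

```python
import numpy as np
from scipy.stats import norm

# Frequencies from the age table; cumsum gives the running total.
frequency = np.array([19, 55, 23, 36, 10, 17])
cumulative = np.cumsum(frequency)
print(cumulative)  # [ 19  74  97 133 143 160]

# Probability of falling within k standard deviations of the mean:
# F(k) - F(-k) for the standard normal CDF.
for k in (1, 2, 3):
    within = norm.cdf(k) - norm.cdf(-k)
    print(k, round(within, 4))  # 0.6827, 0.9545, 0.9973
```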
SciPy Sub-Package: Statistics

The Probability Density Function (PDF) of a continuous random variable is the derivative of its Cumulative Distribution Function (CDF): f(x) = dF(x)/dx.
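The derivative relationship can be illustrated numerically. This sketch compares a central finite difference of norm.cdf with norm.pdf; the grid of x values and the step size h are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 13)
h = 1e-5  # small step for the central difference

# Approximate dF/dx with (F(x + h) - F(x - h)) / (2h).
numerical_derivative = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

# The finite difference matches the PDF to within numerical error.
print(np.allclose(numerical_derivative, norm.pdf(x), atol=1e-6))
```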
SciPy Sub-Package: Statistics

Shown here are the functions used to work with the normal distribution:

• Import norm for the normal distribution.
• rvs generates random variables.
• cdf evaluates the Cumulative Distribution Function.
• pdf evaluates the Probability Density Function for the random distribution.

loc and scale are used to adjust the location and scale of the data distribution.
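A brief sketch of these norm functions follows. The sample size, the seed, and the loc and scale values are assumptions picked for illustration.

```python
from scipy.stats import norm

# Draw 10 random variables from the standard normal distribution.
samples = norm.rvs(loc=0, scale=1, size=10, random_state=42)
print(samples)

# CDF and PDF of the standard normal at x = 0.
print(norm.cdf(0.0))  # 0.5
print(norm.pdf(0.0))  # about 0.3989

# loc shifts the mean and scale sets the standard deviation, so the
# CDF at the shifted mean is again 0.5.
print(norm.cdf(5.0, loc=5.0, scale=2.0))  # 0.5
```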
SciPy Sub-Package: Weave and IO
SciPy Sub-Package: Weave

The weave package provides ways to modify and extend any supported extension libraries. Note that weave runs only on Python 2 and has been removed from recent SciPy releases; it is available as a separate package.

Features of the weave package:

• Includes C/C++ code within Python code
• Speedups of 1.5x to 30x compared to algorithms written in pure Python

Two main functions of weave:

• inline() compiles and executes C/C++ code on the fly
• blitz() compiles NumPy Python expressions for fast execution
SciPy Sub-Package: IO

The IO package provides a set of functions to deal with several kinds of file formats, including:

• MATLAB files
• IDL files
• Matrix Market files
• WAV sound files
• ARFF files
• NetCDF files

NumPy provides additional methods for reading and writing files, such as:
• numpy.loadtxt()/numpy.savetxt()
• numpy.genfromtxt()/numpy.recfromcsv()
• numpy.save()/numpy.load()
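As a small sketch of these file functions, the example below writes an array to a MATLAB file and a text file in a temporary directory and reads both back. The file names and the array are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy import io

data = np.arange(6.0).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # MATLAB format: save a dictionary of named variables, then load it.
    mat_path = os.path.join(tmp, "example.mat")  # hypothetical file name
    io.savemat(mat_path, {"data": data})
    loaded = io.loadmat(mat_path)["data"]
    contents = io.whosmat(mat_path)  # list of (name, shape, type) tuples

    # Plain text format through NumPy.
    txt_path = os.path.join(tmp, "example.txt")  # hypothetical file name
    np.savetxt(txt_path, data)
    from_text = np.loadtxt(txt_path)

print(contents)
print(np.array_equal(loaded, data), np.array_equal(from_text, data))
```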
Using SciPy to Solve a Linear Algebra Problem

Problem Statement:
There is a test with 30 questions worth 150 marks. The test has two types of questions:
1. True or false – carries 4 marks each
2. Multiple choice – carries 9 marks each
Find the number of true or false and multiple-choice questions.

Common instructions:
• If you are new to Python, download the “Anaconda Installation Instructions” document from the “Resources” tab to view the steps for installing Anaconda and Jupyter Notebook.
• Download the “Assignment 01” notebook and upload it to Jupyter Notebook to access it.
• Follow the cues provided to complete the assignment.
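One way to set up this problem, offered as a sketch rather than the assignment's official solution: let t be the number of true or false questions and m the number of multiple-choice questions, so that t + m = 30 and 4t + 9m = 150. The variable names are my own.

```python
import numpy as np
from scipy import linalg

# Row 1: t + m = 30 (question count). Row 2: 4t + 9m = 150 (total marks).
coefficients = np.array([[1.0, 1.0],
                         [4.0, 9.0]])
totals = np.array([30.0, 150.0])

true_false, multiple_choice = linalg.solve(coefficients, totals)
print(true_false, multiple_choice)  # approximately 24.0 and 6.0
```

The result can be checked by hand: 24 × 4 + 6 × 9 = 96 + 54 = 150 marks across 30 questions.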
Using SciPy to Declare Random Values

Problem Statement:
Use SciPy to generate 20 random values and perform the following:
1. CDF – Cumulative Distribution Function for 10 random variables.
2. PDF – Probability Density Function for 14 random variables.

Common instructions:
• If you are new to Python, download the “Anaconda Installation Instructions” document from the “Resources” tab to view the steps for installing Anaconda and Jupyter Notebook.
• Download the “Assignment 02” notebook and upload it to Jupyter Notebook to access it.
• Follow the cues provided to complete the assignment.
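A possible starting point is sketched below. It assumes the values come from a standard normal distribution and that the first 10 and first 14 values are the ones to evaluate; the assignment notebook may intend something different.

```python
from scipy.stats import norm

# Generate 20 random values (standard normal and the seed are assumptions).
values = norm.rvs(size=20, random_state=0)

# CDF for the first 10 values and PDF for the first 14 values.
cdf_values = norm.cdf(values[:10])
pdf_values = norm.pdf(values[:14])

print(cdf_values)
print(pdf_values)
```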
Key Takeaways

You are now able to:

Explain the importance of SciPy

List the characteristics of SciPy

Explain sub-packages of SciPy

Discuss SciPy sub-packages, such as optimization, integration, linear algebra, statistics, weave, and IO
Knowledge Check
Knowledge Check 1

What are the specification limits provided for the curve fitting function (optimize.curve_fit) during the optimization process?

a. Upper limit value
b. Lower limit value
c. Upper and lower limit values
d. Only the optimization method

The correct answer is c

Both the upper and lower limit values should be specified for the optimize.curve_fit function.
Knowledge Check 2

Which of the following sub-packages is used for inverting a matrix?

a. scipy.special
b. scipy.linalg
c. scipy.signal
d. scipy.stats

The correct answer is b

scipy.linalg is used to invert a matrix.

Knowledge Check 3

Which of the following is performed using SciPy?

a. Website
b. Plot data
c. Scientific calculations
d. System administration

The correct answer is c

SciPy has been specially made to perform scientific calculations. More generally, Python is a programming language that has libraries for all of the listed activities.
Knowledge Check 4

Which of the following functions is used to calculate minima?

a. optimize.minimize()
b. integrate.quad()
c. stats.linregress()
d. linalg.solve()

The correct answer is a

The function optimize.minimize() is used to calculate minima. integrate.quad() is used for integral calculation, stats.linregress() is used for linear regression, and linalg.solve() is used to solve a linear system.
Knowledge Check 5

Which of the following syntaxes is used to generate 100 random variables from a t-distribution with df = 10?

a. stats.t.pmf(df=10, size=100)
b. stats.t.pdf(df=10, size=100)
c. stats.t.rvs(df=10, size=100)
d. stats.t.rand(df=10, size=100)

The correct answer is c

The stats.t.rvs() function is used to generate random variables. A pmf() method evaluates a probability mass function and is defined only for discrete distributions, and stats.t.pdf() evaluates the probability density function. Note that stats.t.rand() does not exist.
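The correct option can be run as written. In this sketch the random_state seed is an added assumption, used only to make the output repeatable.

```python
from scipy import stats

# 100 random variables from a t-distribution with 10 degrees of freedom.
t_samples = stats.t.rvs(df=10, size=100, random_state=0)
print(t_samples.shape)  # (100,)
```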
Knowledge Check 6

Which of the following functions is used to run C or C++ code in SciPy?

a. io.loadmat()
b. weave.inline()
c. weave.blitz()
d. io.whosmat()

The correct answer is b

The inline() function accepts C code as a string and compiles it for later use. loadmat() loads variables from a .mat file. whosmat() lists the variables inside a .mat file. blitz() compiles NumPy expressions for faster running, but it can't accept C code.
Thank You