Programming Basics and AI Lecture
Lectures on YouTube:
https://www.youtube.com/@mathtalent
Seongjai Kim
• mathematical analysis,
• generating computational algorithms,
• profiling algorithms’ accuracy and cost, and
• the implementation of algorithms in selected programming languages
(commonly referred to as coding).
The source code of a program can be written in one or more programming languages.
The manuscript is conceived as an introduction to the thriving field of information engineering, particularly for early-year college students who are interested in mathematics, engineering, and other sciences, without an already strong background in computational methods. It will also be suitable for talented high school students. All examples to be treated in this manuscript are implemented in Matlab and Python, and occasionally in Maple.
Level of Lectures
• The target audience is undergraduate students.
• However, talented high school students would be able to follow
the lectures.
• Persons with no programming experience will understand
most of the lectures.
Goals of Lectures
Let the students understand
• Mathematical Basics: Calculus & Linear Algebra
• Programming, with Matlab and Python
• Artificial Intelligence (AI)
• Basics of Machine Learning (ML)
• An ML Software: Scikit-Learn
Title ii
Prologue iv
Table of Contents ix
1 Programming Basics 1
1.1. What is Programming or Coding? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1. Programming: Some Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2. Functions: Generalization and Reusability . . . . . . . . . . . . . . . . . . . . . 5
1.1.3. Becoming a Good Programmer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2. Matlab: A Powerful Computer Language . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1. Introduction to Matlab/Octave . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2. Repetition: Iteration Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.3. Anonymous Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.2.4. Open Source Alternatives to Matlab . . . . . . . . . . . . . . . . . . . . . . . . . 25
Exercises for Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2 Programming Examples 29
2.1. Area Estimation of the Region Defined by a Closed Curve . . . . . . . . . . . . . . . . . 30
2.2. Visualization of Complex-Valued Solutions . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3. Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.3.1. Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.2. Short-Time Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4. Computational Algorithms and Their Convergence . . . . . . . . . . . . . . . . . . . . . 47
2.4.1. Computational Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4.2. Big O and little o notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5. Inverse Functions: Exponentials and Logarithms . . . . . . . . . . . . . . . . . . . . . 53
2.5.1. Inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.5.2. Logarithmic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
A Appendices 333
A.1. Optimization: Primal and Dual Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 334
A.1.1. The Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
A.1.2. Lagrange Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
A.2. Weak Duality, Strong Duality, and Complementary Slackness . . . . . . . . . . . . . . 338
A.2.1. Weak Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
A.2.2. Strong Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
A.2.3. Complementary Slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
A.3. Geometric Interpretation of Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
A.4. Rank-One Matrices and Structure Tensors . . . . . . . . . . . . . . . . . . . . . . . . . 349
A.5. Boundary-Effects in Convolution Functions in Matlab and Python SciPy . . . . . . . . 353
A.6. From Python, Call C, C++, and Fortran . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
P Projects 365
P.1. Project: Canny Edge Detection Algorithm for Color Images . . . . . . . . . . . . . . . . 366
P.1.1. Noise Reduction: Image Blur . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
P.1.2. Gradient Calculation: Sobel Gradient . . . . . . . . . . . . . . . . . . . . . . . . 372
P.1.3. Edge Thinning: Non-maximum Suppression . . . . . . . . . . . . . . . . . . . . 375
P.1.4. Double Threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
P.1.5. Edge Tracking by Hysteresis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
P.2. Project: Text Extraction from Images, PDF Files, and Speech Data . . . . . . . . . . . 380
Bibliography 383
Index 385
Chapter 1
Programming Basics
Contents of Chapter 1
1.1. What is Programming or Coding? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2. Matlab: A Powerful Computer Language . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Exercises for Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Solution. You may start with 2; add 3, add 4, and finally add 5; the answer
is 14. This simple procedure is the result of programming in your brain.
Programming is thinking.
Example 1.3. Let's try to get √5. Your calculator must have a function key √ . When you input 5 and push Enter, your calculator displays the answer on the spot. How can the calculator get the answer?
Solution. Calculators or computers cannot keep a table to look the answer
up. They compute the answer on the spot as follows.
Let Q = 5.
1. initialization: p ← 1
2. for i = 1, 2, · · · , itmax
p ← (p + Q/p)/2;
3. end for
squareroot_Q.m
1 Q=5;
2
3 p = 1;
4 for i=1:8
5 p = (p+Q/p)/2;
6 fprintf("%3d %.20f\n",i,p)
7 end
Output
1 1 3.00000000000000000000
2 2 2.33333333333333348136
3 3 2.23809523809523813753
4 4 2.23606889564336341891
5 5 2.23606797749997809888
6 6 2.23606797749978980505
7 7 2.23606797749978980505
8 8 2.23606797749978980505
Solution.
• This example asks to evaluate the quantity:
    1^2 + 2^2 + · · · + 10^2 = ∑_{i=1}^{10} i^2.    (1.2)
7 sqsum = 0;
8 for i=m:n
9 sqsum = sqsum + i^2;
10 end
• Lines 2–5 of squaresum.m, beginning with the percent sign (%), are for
a convenient user interface. A built–in function help can be utilized
whenever we want to see what the programmer has commented for the
function. For example,
help
1 >> help squaresum
2 function sqsum = squaresum(m,n)
3 Evaluates the square sum of consecutive integers: m to n.
4 input: m,n
5 output: sqsum
• The last four lines of squaresum.m include the required operations for
the given task.
1 >> squaresum(1,10)
2 ans = 385
5 %% initial setting
6 S = R;
7
11 %% begin sorting
12 for j=n:-1:2 %index for the largest among remained
13 for i=1:j-1
14 if S(i) > S(i+1)
15 tmp = S(i);
16 S(i) = S(i+1);
17 S(i+1) = tmp;
18 end
19 end
20 end
SortArray.m
1 % User parameter
2 n=10;
3
8 % Call "mysort"
9 S = mysort(R)
Output
1 >> SortArray
2 R =
3 33 88 75 17 91 94 79 36 2 72
4 S =
5 2 17 33 36 72 75 79 88 91 94
Note: You may have to run “SortArray.m” a few times, to make sure that
“mysort” works correctly.
• The symbols (,) and (;) can be used to combine more than one command
in the same line.
• If we use semicolon (;), Matlab sets the variable but does not print the
output.
3 %% a curve
4 X1=linspace(0,2*pi,11); % n=11
5 Y1=cos(X1);
6
7 %% another curve
8 X2=linspace(0,2*pi,51);
9 Y2=sin(X2);
10
11 %% plot together
12 plot(X1,Y1,'-or','linewidth',2, X2,Y2,'-b','linewidth',2)
13 legend({'y=cos(x)','y=sin(x)'})
14 axis tight
15 print -dpng 'fig_cos_sin.png'
The command doc opens the Help browser. If the Help browser is already
open, but not visible, then doc brings it to the foreground and opens a
new tab. Try doc surf, followed by doc contour.
Note: Repetition
• In scientific computing, one of the most frequently occurring events is
repetition.
• Each repetition of the process is also called an iteration.
• It is the act of repeating a process, to generate a (possibly un-
bounded) sequence of outcomes, with the aim of approaching a de-
sired goal, target or result. Thus,
(a) Iteration must start with an initialization (starting point), and
(b) Perform a step-by-step marching in which the results of one it-
eration are used as the starting point for the next iteration.
While loop
The while loop repeatedly executes statements while a specified condi-
tion is true. The syntax of a while loop in Matlab is as follows.
while <expression>
<statements>
end
An expression is true when the result is nonempty and contains all
nonzero elements, logical or real numeric; otherwise the expression is
false.
a=10; b=15;
fprintf('while loop execution: a=%d, b=%d\n',a,b);
while a<=b
fprintf(' The value of a=%d\n',a);
a = a+1;
end
When the code above is executed, the result will be:
while loop execution: a=10, b=15
The value of a=10
The value of a=11
The value of a=12
The value of a=13
The value of a=14
The value of a=15
For loop
A for loop is a repetition control structure that allows you to efficiently
write a loop that needs to execute a specific number of times. The syntax
of a for loop in Matlab is as follows:
for index = values
<program statements>
end
Here values can be any list of numbers. For example:
• initval:endval – increments the index variable from initval to
endval by 1, and repeats execution of program statements while in-
dex is not greater than endval.
• initval:step:endval – increments index by the value step on each
iteration, or decrements when step is negative.
Example 1.16. The code in Example 1.15 can be rewritten as a for loop.
%% for loop
a=10; b=15;
fprintf('for loop execution: a=%d, b=%d\n',a,b);
for i=a:b
fprintf(' The value of i=%d\n',i);
end
When the code above is executed, the result will be:
for loop execution: a=10, b=15
The value of i=10
The value of i=11
The value of i=12
The value of i=13
The value of i=14
The value of i=15
Nested loops
Matlab also allows you to use one loop inside another loop. The syntax for a
nested loop in Matlab is as follows:
for n = n0:n1
for m = m0:m1
<statements>;
end
end
The syntax for a nested while loop statement in Matlab is as follows:
while <expression1>
while <expression2>
<statements>;
end
end
For a nested loop, you can combine
• a for loop and a while loop
• more than two loops
as in the sketch below.
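For instance, the following small sketch (not taken from the text; the file name is hypothetical) nests a for loop inside a while loop to print a short multiplication table.
nested_loop_example.m (sketch)
% A for loop nested inside a while loop (illustrative)
n = 3;
while n <= 4
    for m = 1:3
        fprintf('%d x %d = %d\n', n, m, n*m);
    end
    n = n+1;
end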
Example 1.17. Let’s modify the code in Example 1.15 to involve a break
statement.
%% "break" statement with while loop
a=10; b=15; c=12.5;
fprintf('while loop execution: a=%d, b=%d, c=%g\n',a,b,c);
while a<=b
fprintf(' The value of a=%d\n',a);
if a>c, break; end
a = a+1;
end
When the code above is executed, the result is:
while loop execution: a=10, b=15, c=12.5
The value of a=10
The value of a=11
The value of a=12
The value of a=13
When the condition a>c is satisfied, break is invoked, which exits the while loop.
Continue Statement
continue passes control to the next iteration of a for or while loop.
• It skips any remaining statements in the body of the loop for the
current iteration; the program continues execution from the next
iteration.
• continue applies only to the body of the loop where it is called.
• In nested loops, continue skips remaining statements only in the
body of the loop in which it occurs.
a=10; b=15;
fprintf('for loop execution: a=%d, b=%d\n',a,b);
for i=a:b
if mod(i,2), continue; end % skip odd i; print even integers only
disp([' The value of i=' num2str(i)]);
end
When the code above is executed, the result is:
for loop execution: a=10, b=15
The value of i=10
The value of i=12
The value of i=14
9 %% Calculus
10 q = integral(f,1,3)
Output
1 >> anonymous_function
2 f1 =
3 -2
4 fX =
5 -2 4 22 58 118 208
6 q =
7 12
• 1:20
• 1:1:20
• 1:2:20
• 1:3:20;
• isprime(12)
• isprime(13)
• for i=3:3:30, fprintf('[i,i^2]=[%d, %d]\n',i,i^2), end
1.2. Write a function that computes the sum of all prime numbers not larger than a positive integer n.
1.3. Modify the function you made in Exercise 1.2 to also count the number of prime numbers and return the count along with the sum. For multiple outputs, the function may start with
function [sum, number] = <function_name>(inputs)
and
    T_n = ∑_{k=1}^{n} S_k.
n    f_n    r_n    t_n
for n ≤ K = 20.
You may start with
Fibonacci_sequence.m
1 K = 20;
2 F = zeros(K,1);
3 F(1)=1; F(2)=F(1);
4
5 for n=3:K
6 F(n) = F(n-1)+F(n-2);
7 rn = F(n)/F(n-1);
8 fprintf("n =%3d; F = %7d; rn = %.12f\n",n,F(n),rn);
9 end
(b) Find n such that r_n agrees with the golden ratio φ to 12 decimal digits.
Ans: (b) n = 32
Chapter 2
Programming Examples
Contents of Chapter 2
2.1. Area Estimation of the Region Defined by a Closed Curve . . . . . . . . . . . . . . . . . 30
2.2. Visualization of Complex-Valued Solutions . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3. Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4. Computational Algorithms and Their Convergence . . . . . . . . . . . . . . . . . . . . . 47
2.5. Inverse Functions: Exponentials and Logarithms . . . . . . . . . . . . . . . . . . . . . 53
Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
b · (d − c) − a · (d − c) = b · (d − c) + a · (c − d)
where the sum is carried out over the line segments L_i and x_i^* denotes the mid-value of x on L_i.
(b) Find the area of a triangle.
Solution. We know the area = (1/2)(b − a) · (d − c).
Now, let's try to find the area using the formula (2.2):
    Area = ∑_i x_i^* · Δy_i
         = 2·3 − 2.5·2 + 3.5·2 − 4·3 + 6·6 − 3.5·2 + 2.5·2
         = 6 − 5 + 7 − 12 + 36 − 7 + 5
         = 30
• Let L_i be the i-th line segment connecting (x_{i−1}, y_{i−1}) and (x_i, y_i), i = 1, 2, · · · , n. Then the area of R can be computed using the formula
    Area(R) = ∑_{i=1}^{n} x_i^* · Δy_i,    (2.4)
where
    x_i^* = (x_{i−1} + x_i)/2,    Δy_i = y_i − y_{i−1}.
Note: The formula (2.4) is a result of Green’s Theorem for the line
integral and numerical approximation.
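As an illustration, a minimal sketch of how formula (2.4) can be evaluated for a sampled closed curve is given below; the script name and variable names are illustrative, and the data are the unit-circle samples of part (a) of the following example. (The function area_closed_curve in the chapter exercises follows the same idea.)
area_sketch.m (sketch)
% Area of a closed curve sampled as (X,Y), with (X(1),Y(1)) = (X(end),Y(end)),
% using formula (2.4): Area = sum of x_i^* * Delta y_i.
theta = linspace(0,2*pi,11)';        % example data: unit circle with n = 10
X = cos(theta);  Y = sin(theta);
area = 0;
for i = 2:numel(X)
    xstar = (X(i-1) + X(i))/2;       % x_i^* = (x_{i-1} + x_i)/2
    dy    = Y(i) - Y(i-1);           % Delta y_i = y_i - y_{i-1}
    area  = area + xstar*dy;
end
fprintf('area = %.12f\n', area)      % about 2.938926261462 for n = 10 (cf. the accuracy table below)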
(a) Generate a dataset that represents the circle of radius 1 centered at the origin. For example, for i = 0, 1, 2, · · · , n,
    (x_i, y_i) = (cos θ_i, sin θ_i),    θ_i = i · 2π/n.    (2.5)
Note that (x_n, y_n) = (x_0, y_0).
(b) Analyze accuracy improvement of the area as n grows. The larger n you
choose, the more accurately the data would represent the region.
Solution.
circle.m
1 n = 10;
2 %%---- Data generation -----------------
3 theta = linspace(0,2*pi,n+1)'; % a column vector
4 data = [cos(theta),sin(theta)];
5
20 %%======================================
21 %%---- Read the data -------------------
22 %%======================================
23 DATA = load(filename);
24 X = DATA(:,1);
25 Y = DATA(:,2);
26
27 figure,
28 plot(X,Y,'b--','linewidth',2);
29 daspect([1 1 1]); axis tight
30 xlim([-1 1]), ylim([-1 1]);
31 title(['Circle: n=' int2str(n)])
32 yticks(-1:0.5:1)
33 saveas(gcf,'circle-dashed.png');
34
Accuracy Improvement
1 n = 10; area = 2.938926261462, misfit = 0.202666392127
2 n = 20; area = 3.090169943749, misfit = 0.051422709840
3 n = 40; area = 3.128689300805, misfit = 0.012903352785
4 n = 80; area = 3.138363829114, misfit = 0.003228824476
5 n = 160; area = 3.140785260725, misfit = 0.000807392864
f(x) = x^2 − x + 1 = 0,    (2.6)
we can easily find that the equation has no real solutions. However, by using the quadratic formula, the complex-valued solutions are
    x = (1 ± √3 i)/2.
Here we have questions:
What do the complex-valued solutions mean?
Can we visualize them?
C = {x + yi | x, y ∈ R},
where i = √−1, called the imaginary unit.
• Seeking a real-valued solution of f(x) = 0 is the same as finding a solution of f(z) = 0, z = x + yi, restricted to the x-axis (y = 0).
• If
f (z) = A(x, y) + B(x, y) i, (2.7)
then the complex-valued solutions are the points x + yi such that
A(x, y) = B(x, y) = 0.
Ans: f(z) = (x^2 − x − y^2 + 1) + (2x − 1)y i
4 syms x y real
5
6 %% z^2 -z +1 = 0
7 A = @(x,y) x.^2-x-y.^2+1;
8 B = @(x,y) (2*x-1).*y;
9 T = 'z^2-z+1=0';
10
24 figure,
25 np=101; X=linspace(-5,5,np); Y=linspace(-5,5,np);
26 contour(X,Y,A(X,Y'), [0 0],'r','linewidth',2), hold on
27 contour(X,Y,B(X,Y'), [0 0],'b--','linewidth',2)
28 plot(double(xs),double(ys),'r.','MarkerSize',30) % the solutions
29 grid on
30 %ax=gca; ax.GridAlpha=0.5; ax.FontSize=13;
31 legend("A=0","B=0")
32 xlabel('x'), ylabel('yi'), title(['Complex solutions of ' T])
33 hold off
34 print -dpng 'complex-solutions-A-B=0.png'
Figure 2.3: The two solutions are 1/2 − (√3/2) i and 1/2 + (√3/2) i.
Remark 2.10. You can easily find the real part and the imaginary
part of polynomials of z = x + iy as follows.
Real and Imaginary Parts
1 syms x y real
2 z = x + 1i*y;
3
4 g = z^2 -z +1;
5 simplify(real(g))
6 simplify(imag(g))
Here "1i" (the number 1 followed by the letter i), which appears in Line 2, means the imaginary unit i = √−1.
Output
1 ans = x^2 - x - y^2 + 1
2 ans = y*(2*x - 1)
The identity
    e^{iθ} = cos θ + i sin θ,    θ ∈ R,    (2.10)
is called Euler's identity; see Exercise 3.3 on p.109.
where
• N = the total number of discrete data points taken
• T = the total sampling time (second)
• ∆t = time between data points, ∆t = T /N
• ∆f = the frequency increment (frequency resolution)
∆f = 1/T (Hz)
• fs = the sampling frequency (per second), fs = 1/∆t = N/T .
Note: (k∆f )(n∆t) = kn/N
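The listing of the forward transform discrete_Fourier is not reproduced here. A minimal sketch consistent with the definitions above (with the 1/N factor placed in the inverse, as in discrete_Fourier_inverse.m below) might look like the following; the exact form of the original implementation is an assumption.
discrete_Fourier.m (sketch)
function X = discrete_Fourier(x)
% Sketch: forward DFT, X(k+1) = sum_{n=0}^{N-1} x(n+1)*exp(-1i*2*pi*k*n/N), k = 0,...,N-1
N = length(x);
X = zeros(size(x));
for k = 0:N-1
    for n = 0:N-1
        X(k+1) = X(k+1) + x(n+1)*exp(-1i*2*pi*k*n/N);
    end
end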
discrete_Fourier_inverse.m
1 function X = discrete_Fourier_inverse(x)
2 % function X = discrete_Fourier_inverse(x)
3 % Calculate the inverse DFT
4
15 X = X/N;
3 T = 4; fs = 100;
4 t = 0:1/fs:T-1/fs; % Time vector
5 x = sin(2*pi*10*t) + 2*sin(2*pi*20*t); % Signal
6
7 figure, plot(t,x,'-k')
8 print -dpng 'dft-data-signal.png'
9
10 %-------------------------------------
11 X = discrete_Fourier(x);
12
22 %-------------------------------------
23 x_restored = discrete_Fourier_inverse(X);
24 misfit = max(abs(x-x_restored))
25
Output
1 misfit = 2.6083e-13
4 t = linspace(0,2*pi,M);
5 g = 0.55-0.45*cos(t);
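Only these two lines of the window function are shown. A plausible reconstruction of the whole file is given below; the signature g = win_cos(M) is inferred from the call in short_time_DFT.m, so treat this as a sketch rather than the original listing.
win_cos.m (sketch)
function g = win_cos(M)
% A raised-cosine window of length M, built around the two lines quoted above
t = linspace(0,2*pi,M);
g = 0.55 - 0.45*cos(t);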
short_time_DFT.m
1 close all; clear all
2
8 %-------------------------------------------------
9 x = [x0,x0,x0]; %t = [t0,t0+T0,t0+2*T0];
10
11 %-------------------------------------------------
12 M = 150; % window length
13 R = 60; % sliding length
14
15 g = win_cos(M);
16 F = stft2(x,g,R);
17 S = abs(F).^2; % spectrogram
18
24 figure,plot(1:M,g,'-b','linewidth',1.5);
25 ylim([0,1]); title('win\_cos, a window function')
26 print -dpng 'stft-window-g.png'
27
28 Df = fs/M; Dt = R/fs;
29 figure, imagesc((0:size(S,2)-1)*Dt,(0:M-1)*Df,S)
30 xlabel('Time (second)'); ylabel('Frequency (Hz)'); title('Spectrogram');
31 colormap('pink'); colorbar; set(gca,'YDir','normal')
32 print -dpng 'stft-spectrogram.png'
stft2.m
1 function F = stft2(x,g,R)
2 % function F = stft2(x,g,R)
3 % x: the signal
4 % g: the window function of length M
5 % R: sliding length
6 % Output: F = the short-time DFT
7
8 Ns = length(x); M = length(g);
9 L = M-R; % overlap
10
11 Col = floor((Ns-L)/(M-L));
12 F = zeros(M,Col); c = 1;
13
14 while 1
15 if c==1, n0=1; else, n0=n0+R; end
16 n1=n0+M-1;
17 if n1>Ns, break; end
18 signal = x(n0:n1).*g;
19 F(:,c) = discrete_Fourier(signal)'; c=c+1;
20 end
Figure 2.6: The first 1000 samples of the chirp signal and the spectrogram from the STFT.
Algorithms consist of various steps for inputs, outputs, and functional oper-
ations, which can be described effectively by a so-called pseudocode.
Solution.
sequence_sqrt2.m
1 x = 2;
2 for n=1:5
3 x = x/2 + 1/x;
4 fprintf('n=%d: xn = %.10f\n',n,x)
5 end
Output
1 n=1: xn = 1.5000000000
2 n=2: xn = 1.4166666667
3 n=3: xn = 1.4142156863
4 n=4: xn = 1.4142135624
5 n=5: xn = 1.4142135624
• A sequence {α_n}_{n=1}^∞ is said to be in O (big Oh) of {β_n}_{n=1}^∞ if a positive number K exists for which
    |α_n| ≤ K |β_n| for large n    (or equivalently, |α_n|/|β_n| ≤ K).    (2.20)
In this case, we say "α_n is in O(β_n)" and denote α_n ∈ O(β_n) or α_n = O(β_n).
• A sequence {α_n} is said to be in o (little oh) of {β_n} if there exists a sequence ε_n tending to 0 such that
    |α_n| ≤ ε_n |β_n| for large n    (or equivalently, lim_{n→∞} |α_n|/|β_n| = 0).    (2.21)
Example 2.32. Let f(h) = (1/h)(1 + h − e^h). What are the limit and the rate of convergence of f(h) as h → 0?
Solution.
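A quick numerical check (a sketch, not part of the original text) suggests the behavior: f(h) → 0 and f(h)/h → −1/2, i.e., f(h) = O(h) as h → 0.
check_rate.m (sketch)
% f(h) = (1 + h - exp(h))/h -> 0, with |f(h)| proportional to h
for k = 1:6
    h  = 10^(-k);
    fh = (1 + h - exp(h))/h;
    fprintf('h = %.0e;  f(h) = %12.8f;  f(h)/h = %.6f\n', h, fh, fh/h);
end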
a. e^x − 1 = O(x^2), as x → 0
b. x = O(tan^{−1} x), as x → 0
c. sin x cos x = o(x), as x → 0
Solution.
a. f(x) = x^2    b. g(x) = x^2, x ≥ 0    c. h(x) = x^3
Solution.
Solution. Write y = x^3 + 2.
Exponential Functions
Definition 2.40. A function of the form
Example 2.41. Table 2.1 shows data for the population of the world
in the 20th century. Figure 2.8 shows the corresponding scatter plot.
• The pattern of the data points suggests an exponential growth.
• Use an exponential regression algorithm to find a model of the form
P(t) = a · b^t,    (2.31)
One can find the parameters (α, β) which best fit the following:
    α +   0·β = ln 1650
    α +  10·β = ln 1750
    α +  20·β = ln 1860        ⇒   A [α, β]^T = r    (2.33)
         ⋮
    α + 110·β = ln 6870
13 plot(Data(:,1),Data(:,2),'k.','MarkerSize',20)
14 xlabel('Years since 1900');
15 ylabel('Millions'); hold on
16 print -dpng 'population-data.png'
17 t = Data(:,1);
18 plot(t,a*b.^t,'r-','LineWidth',2)
19 print -dpng 'population-regression.png'
20 hold off
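The first lines of the script, which load the data and solve the least-squares problem (2.33) for (α, β), are not shown above. A self-contained sketch of that step follows; only the four population values quoted in (2.33) are used here, so the numbers are illustrative rather than the full Table 2.1.
exp_regression_sketch.m
% Fit ln P = alpha + beta*t in the least-squares sense, then a = e^alpha, b = e^beta
Data = [0 1650; 10 1750; 20 1860; 110 6870];   % (years since 1900, millions); partial data
t = Data(:,1);  P = Data(:,2);
coef  = polyfit(t, log(P), 1);                 % coef = [beta, alpha]
alpha = coef(2);  beta = coef(1);
a = exp(alpha),  b = exp(beta)                 % model P(t) = a*b^t, as in (2.31)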
The Number e
Of all possible bases for an exponential function, there is one that is
most convenient for the purposes of calculus. The choice of a base a is
influenced by the way the graph of y = ax crosses the y-axis.
• Some of the formulas of calculus will be greatly simplified, if we
choose the base a so that the slope of the tangent line to y = ax
at x = 0 is exactly 1.
• In fact, there is such a number and it is denoted by the letter e.
(This notation was chosen by the Swiss mathematician Leonhard
Euler in 1727, probably standing for exponential.)
• It turns out that the number e lies between 2 and 3:
e ≈ 2.718282 (2.35)
e_limit.m
1 % An increasing sequence
2
3 for n=1:8
4 x=1/10^n;
5 en = (1+x)^(1/x);
6 fprintf('e_%d = %.10f\n',n,en)
7 end
Output
1 e_1 = 2.5937424601
2 e_2 = 2.7048138294
3 e_3 = 2.7169239322
4 e_4 = 2.7181459268
5 e_5 = 2.7182682372
6 e_6 = 2.7182804691
7 e_7 = 2.7182816941
8 e_8 = 2.7182817983
y = log_a x  ⇔  a^y = x.    (2.37)
Solution.
1. Solve y = 2^x for x:
       x = log_2 y
2. Exchange x and y:
       y = log_2 x
Note:
• Equation (2.37) represents the action of "solving for x".
• The domain of y = log_a x must be the range of y = a^x, which is (0, ∞).
• The logarithm with base e is called the natural logarithm and has
a special notation:
log_e x = ln x    (2.38)
Remark 2.47.
• On your calculator, you can find the buttons LN and LOG, which represent ln = log_e and log = log_10, respectively.
• When you implement code on a computer, the functions ln and log can be called by "log" and "log10", respectively.
Properties of Logarithms
(a) e^{5−3x} = 3
(b) log_3 x + log_3(x − 2) = 1
(c) ln(ln x) = 0
Solution.
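A symbolic check (a sketch, not in the original text) can be carried out as follows; for (b), the equation is first rewritten as x(x − 2) = 3, and only the root in the domain x > 2 is admissible.
solve_log_equations.m (sketch)
syms x real
sol_a = solve(exp(5-3*x) == 3, x)     % x = 5/3 - log(3)/3
sol_b = solve(x*(x-2) == 3, x)        % x = -1 or 3; only x = 3 satisfies x > 2
sol_c = solve(log(log(x)) == 0, x)    % x = exp(1)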
Claim 2.49.
(a) Every exponential function is a power of the natural exponential
function.
a^x = e^{x ln a}.    (2.42)
Note: You will work on a project, Canny Edge Detection Algorithm For
Color Images, while you are studying the next chapter.
Hint : You may use the following. You should finish the function area_closed_curve.
Note that the index in Matlab arrays begins with 1, not 0.
heart.m
1 DATA = load('heart-data.txt');
2
3 X = DATA(:,1); Y = DATA(:,2);
4 figure, plot(X,Y,'r-','linewidth',2);
5
6 [m,n] = size(DATA);
7 area = area_closed_curve(DATA);
8
area_closed_curve.m
1 function area = area_closed_curve(data)
2 % compute the area of a region of closed curve
3
4 [m,n] = size(data);
5 area = 0;
6
7 for i=2:m
8 %FILL HERE APPROPRIATELY
9 end
4 gong = audioplayer(x,fs);
5 play(gong)
2.5. The population of Starkville, Mississippi, was 2,689 in the year 1900 and 25,495 in
2020. Assume that the population in Starkville grows exponentially with the model
P_n = P_0 · (1 + r)^n,    (2.44)
where n is the elapsed year and r denotes the growth rate per year.
Hint: Applying the natural log to (2.44) gives log(P_n/P_0) = n log(1 + r). Dividing by n and applying the natural exponential function gives 1 + r = exp(log(P_n/P_0)/n), where P_n = 25495, P_0 = 2689, and n = 120.
Ans: (a) r = 0.018921(= 1.8921%). (c) 2056.
Chapter 3
Programming with Calculus
Contents of Chapter 3
3.1. Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.2. Basis Functions and Taylor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.3. Polynomial Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.4. Numerical Differentiation: Finite Difference Formulas . . . . . . . . . . . . . . . . . . 93
3.5. Newton’s Method for the Solution of Nonlinear Equations . . . . . . . . . . . . . . . . 99
3.6. Zeros of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Exercises for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.1. Differentiation
3.1.1. The Slope of the Tangent Line
Example 3.2. If y denotes the distance fallen in feet after t seconds, then Galileo's law of free fall is
    y = 16t^2.
Let t_0 = 1.
Solution.
free_fall.m
1 syms f(t) Q(h) %also, views t and h as symbols
2
Difference Quotient at t0 = 1
1 h= -0.10000; Q(h) = 30.40000000
2 h= -0.01000; Q(h) = 31.84000000
3 h= -0.00100; Q(h) = 31.98400000
4 h= -0.00010; Q(h) = 31.99840000
5 h= -0.00001; Q(h) = 31.99984000
6 h= 0.10000; Q(h) = 33.60000000
7 h= 0.01000; Q(h) = 32.16000000
8 h= 0.00100; Q(h) = 32.01600000
9 h= 0.00010; Q(h) = 32.00160000
10 h= 0.00001; Q(h) = 32.00016000
• As h gets closer and closer to 0, the average speed has the limiting
value 32 ft/sec when t0 = 1 sec.
• Thus, the slope of the tangent line is 32.
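Only the first two lines of free_fall.m are reproduced above. The remaining lines that produce the table are reconstructed below as a sketch; the exact loop in the original file is an assumption.
free_fall.m (continuation, reconstructed)
f(t) = 16*t^2;  t0 = 1;          % Galileo's law y = 16 t^2; evaluation point t0
Q(h) = (f(t0+h) - f(t0))/h;      % difference quotient at t0
for h0 = [-0.1 -0.01 -0.001 -0.0001 -0.00001  0.1 0.01 0.001 0.0001 0.00001]
    fprintf('h=%9.5f; Q(h) = %.8f\n', h0, double(Q(h0)))
end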
Solution. Let’s first try to find the slope, as the limit of the difference
quotient.
Definition 3.4.
The slope of the curve y = f(x) at the point P(x_0, f(x_0)) is the number
    lim_{h→0} [f(x_0 + h) − f(x_0)]/h =: f'(x_0),  provided the limit exists.    (3.5)
The tangent line to the curve at P is the line through P with this slope:
    y − f(x_0) = f'(x_0)(x − x_0).    (3.6)
secant_lines_abs_x2_minus_1.m
1 syms f(x) Q(h) %also, views x and h as symbols
2
3 f(x)=abs(x.^2-1); x0=1;
4 figure, fplot(f(x),[x0-3,x0+1.5], 'k-','LineWidth',3)
5 hold on
6
7 Q(h) = (f(x0+h)-f(x0))/h;
8 S(x,h) = Q(h)*(x-x0)+f(x0); % Secant line
9 %%---- Secant Lines, with Various h ------
10 for h0 = [-0.5 -0.25 -0.1 0.1 0.25 0.5]
11 fplot(S(x,h0),[x0-1,x0+1], 'b--','LineWidth',2)
12 plot([x0+h0],[f(x0+h0)],'b.','markersize',25)
13 end
14 plot([x0],[f(x0)],'r.','markersize',35)
15 daspect([1 2 1])
16 axis tight, grid on
17 ax=gca; ax.FontSize=15; ax.GridAlpha=0.5;
18 hold off
19 print -dpng 'secant-y=abs-x2-1.png'
Solution.
f(x) = x    ⇒  f'(x) = 1
f(x) = x^2  ⇒  f'(x) = 2x
f(x) = x^3  ⇒  f'(x) = 3x^2
    ⋮
f(x) = x^n  ⇒  f'(x) = n x^{n−1}
Solution.
Example 3.11. Use the product rule (3.9) to find the derivative of the
function
f(x) = x^6 = x^2 · x^4.
Solution.
Example 3.12. Does the curve y = x^4 − 2x^2 + 2 have any horizontal tangent lines? Use the information you just found to sketch the graph.
Solution.
Ans: (b) (3x − 1)^6 (6x + 5)/x^6
p = c_0 + c_1 x + c_2 x^2 + · · · + c_n x^n.    (3.15)
Example 3.22. Taking all the coefficients to be 1 in (3.18) gives the geometric series
    ∑_{n=0}^∞ x^n = 1 + x + x^2 + · · · + x^n + · · · ,
which converges to 1/(1 − x) only if |x| < 1. That is,
    1/(1 − x) = 1 + x + x^2 + · · · + x^n + · · · ,    |x| < 1.    (3.20)
Theorem 3.24. The Ratio Test:
Let ∑ a_n be any series and suppose that
    lim_{n→∞} |a_{n+1}/a_n| = ρ.    (3.21)
(a) If ρ < 1, then the series converges absolutely (∑ |a_n| converges).
(b) If ρ > 1, then the series diverges.
(c) If ρ = 1, then the test is inconclusive.
Example 3.25. For what values of x do the following power series converge?
(a) ∑_{n=1}^∞ (−1)^{n−1} x^n/n = x − x^2/2 + x^3/3 − · · ·
(b) ∑_{n=0}^∞ x^n/n! = 1 + x + x^2/2! + x^3/3! + · · ·
Solution.
f(x) = ∑_{n=0}^∞ c_n (x − a)^n  on  |x − a| < R.    (3.22)
This function f has derivatives of all orders inside the interval, and we obtain the derivatives by differentiating the original series term by term:
    f'(x)  = ∑_{n=1}^∞ n c_n (x − a)^{n−1},
    f''(x) = ∑_{n=2}^∞ n(n − 1) c_n (x − a)^{n−2},    (3.23)
and so on. Each of these derived series converges at every point of the interval a − R < x < a + R.
Series Representations
• Thus, when x = a, the coefficients in (3.22) are determined by the derivatives of f: c_n = f^{(n)}(a)/n!.
Example 3.31. Find the Taylor series and Taylor polynomials generated
by f (x) = cos x at x = 0.
Solution. The cosine and its derivatives are
    f(x) = cos x             f'(x) = −sin x
    f''(x) = −cos x          f^{(3)}(x) = sin x
        ⋮                        ⋮
    f^{(2n)}(x) = (−1)^n cos x    f^{(2n+1)}(x) = (−1)^{n+1} sin x.
At x = 0, the cosines are 1 and the sines are 0, so
    f^{(2n)}(0) = (−1)^n,    f^{(2n+1)}(0) = 0.    (3.30)
The Taylor series generated by cos x at x = 0 is
    1 + 0·x − (1/2!)x^2 + (0/3!)x^3 + (1/4!)x^4 + · · · = 1 − x^2/2! + x^4/4! − x^6/6! + · · ·    (3.31)
Note: The interval of convergence can be verified using e.g. the ratio
test, presented in Theorem 3.24, p. 79.
Self-study 3.32. Plot the sinc function f(x) = sin x / x and its Taylor polynomials of order 4, 6, and 8, about x = 0.
Solution. Hint: Use e.g., syms x; T4 = taylor(sin(x)/x,x,0,'Order',5). Here 'Order' means the leading order of the truncated terms.
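A short sketch for this self-study (following the hint; the script name and plotting range are illustrative):
sinc_taylor.m (sketch)
syms x
f  = sin(x)/x;
T4 = taylor(f, x, 0, 'Order', 5);    % Taylor polynomial of order 4
T6 = taylor(f, x, 0, 'Order', 7);    % order 6
T8 = taylor(f, x, 0, 'Order', 9);    % order 8
fplot(f, [-2*pi 2*pi], 'k-', 'LineWidth', 2), hold on
fplot(T4, [-2*pi 2*pi], '--'), fplot(T6, [-2*pi 2*pi], '--'), fplot(T8, [-2*pi 2*pi], '--')
legend('sin(x)/x', 'T_4', 'T_6', 'T_8'), hold off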
Taylor Polynomials
Definition 3.33. Let f be a function with derivatives of order k = 1, 2, · · · , N in some interval containing a as an interior point. Then for any integer n from 0 through N, the Taylor polynomial of order n generated by f at x = a is the polynomial
    P_n(x) = f(a) + f'(a)(x − a) + (f''(a)/2!)(x − a)^2 + · · · + (f^{(n)}(a)/n!)(x − a)^n.
Example 3.35. Let f (x) = cos(x) and x0 = 0. Determine the second and
third Taylor polynomials for f about x0 .
Maple-code
1 f := x -> cos(x):
2 fp := x -> -sin(x):
3 fpp := x -> -cos(x):
4 fp3 := x -> sin(x):
5 fp4 := x -> cos(x):
6
18 # On the other hand, you can find the Taylor polynomials easily
19 # using built-in functions in Maple:
20 s3 := taylor(f(x), x = 0, 4);
21 = 1 - 1/2 x^2 + O(x^4)
22 convert(s3, polynom);
23 = 1 - 1/2 x^2
Figure 3.4: f (x) = cos x and its third Taylor polynomial P3 (x).
equivalently,
    f'(c) = [f(b) − f(a)]/(b − a),  for some c ∈ (a, b),    (3.36)
which is the Mean Value Theorem.
Example 3.38. Find the interpolating polynomial p_2 passing through (−2, 3), (0, −1), and (1, 0).
Solution.
where Ln,k (x) are basis polynomials that depend on the nodes
x0 , x1 , · · · , xn , but not on the ordinates y0 , y1 , · · · , yn .
See Definition 3.19, p.77, for basis.
For example, for {(x0 , y0 ), (x1 , y1 ), (x2 , y2 )}, the Lagrange form of interpolat-
ing polynomial reads
• On the other hand, the polynomial p_n interpolating the data must satisfy p_n(x_j) = δ_{ij}, where δ_{ij} is the Kronecker delta
    δ_{ij} = 1 if i = j,    δ_{ij} = 0 if i ≠ j,
and therefore
    c = 1/[(x_0 − x_1)(x_0 − x_2) · · · (x_0 − x_n)].    (3.45)
Hence, we have
    L_{n,0}(x) = [(x − x_1)(x − x_2) · · · (x − x_n)] / [(x_0 − x_1)(x_0 − x_2) · · · (x_0 − x_n)] = ∏_{j=1}^{n} (x − x_j)/(x_0 − x_j).    (3.46)
Ans: p_2 = (1/12)(x − 4)(x − 5) − (1/8)(x − 2)(x − 5) + (1/15)(x − 2)(x − 4)
Lagrange_interpol.py
1 import sympy
2
3 def Lagrange(Lx,Ly):
4 X=sympy.symbols('X')
5 if len(Lx)!= len(Ly):
6 print("ERROR"); return 1
7 p=0
8 for k in range(len(Lx)):
9 t=1
10 for j in range(len(Lx)):
11 if j != k:
12 t *= ( (X-Lx[j])/(Lx[k]-Lx[j]) )
13 p += t*Ly[k]
14 return p
15
16 if __name__ == "__main__":
17 Lx=[2,4,5]; Ly=[1/2,1/4,1/5]
18 p2 = Lagrange(Lx,Ly)
19 print(p2); print(sympy.simplify(p2))
Output
1 [Tue Aug.29] python Lagrange_interpol.py
2 0.5*(5/3 - X/3)*(2 - X/2) + 0.25*(5 - X)*(X/2 - 1) + 0.2*(X/3 - 2/3)*(X - 4)
3 0.025*X**2 - 0.275*X + 0.95
where
    M = max_{ξ∈[a,b]} |f^{(n+1)}(ξ)|.
Start by picking an x. We can assume that x is not one of the nodes, because otherwise the product in question is zero. Let x ∈ (x_j, x_{j+1}), for some j. Then we have
    |x − x_j| · |x − x_{j+1}| ≤ h^2/4.    (3.51)
Now note that
    |x − x_i| ≤ (j + 1 − i)h  for i < j,    |x − x_i| ≤ (i − j)h  for j + 1 < i.    (3.52)
Thus
    ∏_{i=0}^{n} |x − x_i| ≤ (h^2/4) [(j + 1)! h^j] [(n − j)! h^{n−j−1}].    (3.53)
Since (j + 1)!(n − j)! ≤ n!, we reach the following bound
    ∏_{i=0}^{n} |x − x_i| ≤ (1/4) h^{n+1} n!.    (3.54)
1 from sympy import *   # lines 1-3 reconstructed (imports; not in the original excerpt)
2 x = Symbol('x')
3
4 a,b=-1,1; n=5
5
6 f = sin(x)
7 print( diff(f,x,n+1) )
8
9 h = (b-a)/n
10 M = abs(-sin(1.));
11
12 err_bound = h**(n+1)/(4*(n+1)) *M
13 print(err_bound)
Output
1 -sin(x)
2 0.000143611048073881
f'(x_i) ≈ D_x^+ f(x_i) = [f(x_i + h) − f(x_i)]/h,    (forward difference)
f'(x_i) ≈ D_x^− f(x_i) = [f(x_i) − f(x_i − h)]/h.    (backward difference)    (3.60)
3 h := 0.1:
4 (f(x0 + h) - f(x0))/h
5 3.310000000
6 h := 0.05:
7 (f(x0 + h) - f(x0))/h
8 3.152500000
9 h := 0.025:
10 (f(x0 + h) - f(x0))/h
11 3.075625000
Hence,
    f'(x_i) = ∑_{k=0}^{n} f(x_k) L'_{n,k}(x_i) + [f^{(n+1)}(ξ)/(n + 1)!] ∏_{k=0, k≠i}^{n} (x_i − x_k),    (3.63)
    f(x) = f(x_0)L_{2,0}(x) + f(x_1)L_{2,1}(x) + f(x_2)L_{2,2}(x) + [f^{(3)}(ξ)/3!] ∏_{k=0}^{2} (x − x_k),    (3.64)
Replace x by x_0 + h:
Example 3.56. Use the Taylor series to derive the midpoint formula
    f''(x_0) = (f_{−1} − 2f_0 + f_1)/h^2
               − (h^2/12) f^{(4)}(x_0) − (h^4/360) f^{(6)}(x_0) − (h^6/20160) f^{(8)}(x_0) − · · ·    (3.71)
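A quick numerical check (a sketch, not part of the original text) confirms that the leading error term is O(h^2): halving h reduces the error by about a factor of 4. The test function f(x) = e^x is an arbitrary choice.
midpoint_f2_check.m (sketch)
f = @(x) exp(x);  x0 = 0.5;  exact = exp(x0);   % f''(x) = e^x
for h = [0.1 0.05 0.025 0.0125]
    approx = (f(x0-h) - 2*f(x0) + f(x0+h))/h^2;
    fprintf('h = %7.4f;  error = %.3e\n', h, abs(approx - exact));
end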
f (p) = 0. (3.74)
Graphical interpretation
• Let p0 be the initial approximation close to p. Then, the tangent line
at (p0 , f (p0 )) reads
L(x) = f'(p_0)(x − p_0) + f(p_0).    (3.79)
Remark 3.60.
• Newton's method may diverge, unless the initialization is sufficiently accurate.
• It cannot be continued if f'(p_{n−1}) = 0 for some n. As a matter of fact, Newton's method is most effective when f'(x) is bounded away from zero near p.
Since p = 0, e_n = p_n and
    |e_n| ≤ 0.67 |e_{n−1}|^3,    (3.85)
which is an occasional super-convergence.
Theorem 3.63. Newton’s Method for a Convex Function
Let f ∈ C^2(R) be increasing, convex, and have a zero p. Then, the zero p is unique and the Newton iteration converges to p from any starting point.
Example 3.64. Use Newton's method to find the square root of a positive number Q.
Solution. Let x = √Q. Then x is a root of x^2 − Q = 0. Define f(x) = x^2 − Q; then f'(x) = 2x. Newton's method reads
    p_n = p_{n−1} − f(p_{n−1})/f'(p_{n−1}) = p_{n−1} − (p_{n−1}^2 − Q)/(2p_{n−1}) = (1/2)(p_{n−1} + Q/p_{n−1}).    (3.86)
mysqrt.m
1 function x = mysqrt(q)
2 %function x = mysqrt(q)
3
4 x = (q+1)/2;
5 for n=1:10
6 x = (x+q/x)/2;
7 fprintf('x_%02d = %.16f\n',n,x);
8 end
Results
1 >> mysqrt(16);
2 x_01 = 5.1911764705882355
3 x_02 = 4.1366647225462421
4 x_03 = 4.0022575247985221
5 x_04 = 4.0000006366929393
6 x_05 = 4.0000000000000506
7 x_06 = 4.0000000000000000
8 x_07 = 4.0000000000000000
9 x_08 = 4.0000000000000000
10 x_09 = 4.0000000000000000
11 x_10 = 4.0000000000000000

1 >> mysqrt(0.1);
2 x_01 = 0.3659090909090910
3 x_02 = 0.3196005081874647
4 x_03 = 0.3162455622803890
5 x_04 = 0.3162277665175675
6 x_05 = 0.3162277660168379
7 x_06 = 0.3162277660168379
8 x_07 = 0.3162277660168379
9 x_08 = 0.3162277660168379
10 x_09 = 0.3162277660168379
11 x_10 = 0.3162277660168379
• Localization of Roots:
All roots of the polynomial P lie in the open disk centered at the origin with radius
    ρ = 1 + (1/|a_n|) max_{0≤i<n} |a_i|.    (3.89)
• Uniqueness of Polynomials:
Let P (x) and Q(x) be polynomials of degree n. If x1 , x2 , · · · , xr , with
r > n, are distinct numbers with P (xi ) = Q(xi ), for i = 1, 2, · · · , r, then
P (x) = Q(x) for all x.
– For example, two polynomials of degree n are the same if they
agree at (n + 1) points.
• Substituting the above into (3.90), utilizing (3.87), and setting equal
the coefficients of like powers of x on the two sides of the resulting
equation, we have
    b_n = a_n
    b_{n−1} = a_{n−1} + x_0 b_n
        ⋮                          (3.92)
    b_1 = a_1 + x_0 b_2
    P(x_0) = a_0 + x_0 b_1
• Introducing b_0 = P(x_0), the above can be rewritten as
reads
    P'(x) = Q(x) + (x − x_0)Q'(x).    (3.96)
Thus
    P'(x_0) = Q(x_0).    (3.97)
That is, the evaluation of Q at x_0 gives the desired quantity P'(x_0).
Example 3.71. Evaluate P'(3) for P(x) considered in Example 3.68, the previous example.
Solution. As in the previous example, we arrange the calculation and carry
out the synthetic division one more time:
5 n = size(A(:),1);
6 p = A(n); d=0;
7
8 for i = n-1:-1:1
9 d = p + x0*d;
10 p = A(i) +x0*p;
11 end
Call_horner.m
1 a = [-2 -5 7 -4 1];
2 x0=3;
3 [p,d] = horner(a,x0);
4 fprintf(" P(%g)=%g; P'(%g)=%g\n",x0,p,x0,d)
5 Result: P(3)=19; P'(3)=37
5 x = x0;
6 for it=1:itmax
7 [p,d] = horner(A,x);
8 h = -p/d;
9 x = x + h;
10 if(abs(h)<tol), break; end
11 end
Call_newton_horner.m
1 a = [-2 -5 7 -4 1];
2 x0=3;
3 tol = 10^-12; itmax=1000;
4 [x,it] = newton_horner(a,x0,tol,itmax);
5 fprintf(" newton_horner: x0=%g; x=%g, in %d iterations\n",x0,x,it)
6 Result: newton_horner: x0=3; x=2, in 7 iterations
Figure 3.7: Polynomial P (x) = x4 − 4x3 + 7x2 − 5x − 2. Its two zeros are −0.275682 and 2.
3.1. In Example 3.5, we considered the curve y = |x^2 − 1|. Find the left-hand and right-hand limits of the difference quotient at x_0 = 1.
Ans: −2 and 2.
3.2. The number e is determined so that the slope of the graph of y = e^x at x = 0 is exactly 1. Let h be a point near 0. Then
    Q(h) := (e^h − e^0)/(h − 0) = (e^h − 1)/h
represents the average slope of the graph between the two points (0, 1) and (h, e^h).
Evaluate Q(h), for h = 0.1, 0.01, 0.001, 0.0001. What can you say about the results?
Ans: For example, Q(0.01) = 1.0050.
3.3. Recall the Taylor series for e^x, cos x and sin x in (3.32). Let x = iθ, where i = √−1. Then
    e^{iθ} = 1 + iθ + (i^2 θ^2)/2! + (i^3 θ^3)/3! + (i^4 θ^4)/4! + (i^5 θ^5)/5! + (i^6 θ^6)/6! + · · ·    (3.98)
(a) Prove that e^{iθ} = cos θ + i sin θ, which is called Euler's identity.
(b) Prove that e^{iπ} + 1 = 0.
• Use fimplicit
• Visualize, with ylim([-2*pi 4*pi]), yticks(-pi:pi:3*pi)
3.8. Let f(x) = cos x + sin x be defined on the interval [−1, 1].
(a) How many equally spaced nodes are required to interpolate f to within 10^{−8} on the interval?
(b) Evaluate the interpolating polynomial at the midpoint of a subinterval and verify that the error is not larger than 10^{−8}.
Hint: (a) Recall the formula |f(x) − P_n(x)| ≤ [h^{n+1}/(4(n + 1))] M. Then solve, for n,
    [(2/n)^{n+1}/(4(n + 1))] √2 ≤ 10^{−8}.
3.9. Use the most accurate three-point formulas to determine the missing entries.
x f (x) f 0 (x) f 00 (x)
1.0 2.0000 6.00
1.2 1.7536
1.4 1.9616
1.6 2.8736
1.8 4.7776
2.0 8.0000
2.2 12.9056 52.08
Chapter 4
Linear Algebra Basics
Contents of Chapter 4
4.1. Solutions of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.2. Row Reduction and the General Solution of Linear Systems . . . . . . . . . . . . . . . 119
4.3. Linear Independence and Span of Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4. Invertible Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Exercises for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
a1 x1 + a2 x2 + · · · + an xn = b, (4.2)
• Solution Set: The set of all possible solutions is called the solution
set of the linear system.
• Equivalent System: Two linear systems are called equivalent if
they have the same solution set.
For example, Example 4.2 (a) is equivalent to
    2x_1 − 4x_2 = −2        (R1 ← R1 − R2)
    2x_1 + 3x_2 = 5
Solving (4.3):
R1 ↔ R2 (interchange):
    x_1 + 2x_2 = 4              [  1  2 |  4 ]
    −2x_1 + 3x_2 = −1           [ −2  3 | −1 ]
R2 ← R2 + 2·R1 (replacement):
    x_1 + 2x_2 = 4              [  1  2 |  4 ]
    7x_2 = 7                    [  0  7 |  7 ]
R2 ← R2/7 (scaling):
    x_1 + 2x_2 = 4              [  1  2 |  4 ]
    x_2 = 1                     [  0  1 |  1 ]
R1 ← R1 − 2·R2 (replacement):
    x_1 = 2                     [  1  0 |  2 ]
    x_2 = 1                     [  0  1 |  1 ]
Example 4.8. x = [x1 , x2 ]T = [−3, 2]T is the solution of the linear system.
which, in turn, has the same solution set as the system with augmented
matrix
[a1 a2 · · · an : b]. (4.9)
Example 4.10. Determine the values of h such that the given system is a
consistent linear system
x + h y = −5
2x − 8y = 6
Solution.
Ans: h ≠ −4
x2 − 2x3 = 0
x1 − 2x2 + 2x3 = 3
4x1 − 8x2 + 6x3 = 14
Solution.
4 Ab = [A b];
5 rref(Ab)
Result
1 ans =
2 1 0 0 1
3 0 1 0 -2
4 0 0 1 -1
Example 4.13. Verify whether the following matrices are in echelon form or in reduced echelon form.
(a)  1 0 2 0 1        (b)  2 0 0 5
     0 1 3 0 4             0 0 0 9
     0 0 0 0 0             0 1 0 6
(c)  1 1 0            (d)  1 1 2 2 3
     0 0 1                 0 0 1 1 1
     0 0 0                 0 0 0 0 4
(e)  1 0 0 5          (f)  0 1 0 5
     0 1 0 6               0 0 0 6
     0 0 0 1               0 0 1 2
Solution.
Terminologies
1) A pivot position is a location in A that corresponds to a leading 1
in the reduced echelon form of A.
2) A pivot column is a column of A that contains a pivot position.
Example 4.14. The matrix A is given with its reduced echelon form. Find
the pivot positions and pivot columns of A.
        1 1 0 2 0                     1 1 0 2 0
    A = 1 1 1 3 0    --(R.E.F)-->     0 0 1 1 0
        1 1 0 2 4                     0 0 0 0 1
Solution.
Terminologies
3) Basic variables: In the system Ax = b, the variables that corre-
spond to pivot columns (in [A : b]) are basic variables.
4) Free variables: In the system Ax = b, the variables that correspond
to non-pivotal columns are free variables.
Example 4.15. For the system of linear equations, identify its basic vari-
ables and free variables.
    −x_1 − 2x_2 = −3
    2x_3 = 4
    3x_3 = 6
Solution. Hint : You may start with its augmented matrix, and apply row operations.
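A quick check with rref (a sketch, not in the original text; the book uses the same approach in other examples):
basic_free_check.m (sketch)
Ab = [-1 -2 0 -3; 0 0 2 4; 0 0 3 6];   % augmented matrix of the system above
rref(Ab)
% ans = [1 2 0 3; 0 0 1 2; 0 0 0 0]
% Pivot columns 1 and 3  =>  x1 and x3 are basic variables; x2 is free.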
where {x_1, x_2} are basic variables (∵ pivots); you are free to choose any value for x_3. (That is why it is called a "free variable".)
3) Rewrite (4.11) as
    x_1 = 1 + 5x_3
    x_2 = 4 − x_3        (4.12)
    x_3 is free
Example 4.19. Find the general solution of the system whose augmented
matrix is
            1 0 −5  0 −8  3
    [A|b] = 0 1  4 −1  0  6
            0 0  0  0  1  0
            0 0  0  0  0  0
Solution. Hint : You should first row reduce it for the reduced echelon form.
    1  0  0  1   7
    0  1  3  0  −1
    2 −1 −3  2  15
    1  0 −1  0   4
Solution.
linear_equations_rref.m
1 Ab = [1 0 0 1 7; 0 1 3 0 -1; 2 -1 -3 2 15; 1 0 -1 0 4];
2 rref(Ab)
Result
1 ans =
2 1 0 0 1 7
3 0 1 0 -3 -10
4 0 0 1 1 3
5 0 0 0 0 0
True-or-False 4.22.
a. The row reduction algorithm applies only to augmented matrices for a linear system.
b. If one row in an echelon form of an augmented matrix is [0 0 0 0 2 0],
then the associated linear system is inconsistent.
c. The pivot positions in a matrix depend on whether or not row inter-
changes are used in the row reduction process.
d. Reducing a matrix to an echelon form is called the forward phase of
the row reduction process.
Solution.
Ans: F,F,F,T
x1 v1 + x2 v2 + · · · + xp vp = 0 (4.15)
c1 v1 + c2 v2 + · · · + cp vp = 0. (4.16)
Span{v1 , v2 , · · · , vp } = {y | y = c1 v1 + c2 v2 + · · · + cp vp } (4.17)
True-or-False 4.30.
a. The columns of any 3 × 4 matrix are linearly dependent.
b. If u and v are linearly independent, and if {u, v, w} is linearly depen-
dent, then w ∈ Span{u, v}.
c. Two vectors are linearly dependent if and only if they lie on a line
through the origin.
d. The columns of a matrix A are linearly independent, if the equation
Ax = 0 has the trivial solution.
Ans: T,T,T,F
Self-study 4.36. Use pencil-and-paper to find the inverse of
    A = [ 0  1 0
          1  0 3
          4 −3 8 ],
if it exists.
Solution.
When it is implemented:
inverse_matrix.m
1 A = [0 1 0
2 1 0 3
3 4 -3 8];
4 I = eye(3);
5
6 AI = [A I];
7 rref(AI)
Result
1 ans =
2 1.0000 0 0 2.2500 -2.0000 0.7500
3 0 1.0000 0 1.0000 0 0
4 0 0 1.0000 -0.7500 1.0000 -0.2500
Example 4.38. If
    A = [ 1  4  8 1
          0 −2 −1 3
          9  0  0 5 ],
then
    A^T = [ 1  0 9
            4 −2 0
            8 −1 0
            1  3 5 ].
4.1. Find the general solutions of the systems (in parametric vector form) whose augmented matrices are given as
    (a)  [  1 −7  0  6  5          (b)  [ 1 2 −5 −6 0 −5
            0  0  1 −2 −3                 0 1 −6 −3 0  2
           −1  7 −4  2  7 ]               0 0  0  0 1  0
                                          0 0  0  0 0  0 ]
Ans: (a) x = [5, 0, −3, 0]^T + x_2 [7, 1, 0, 0]^T + x_4 [−6, 0, 2, 1]^T;
     (b) x = [−9, 2, 0, 0, 0]^T + x_3 [−7, 6, 1, 0, 0]^T + x_4 [0, 3, 0, 1, 0]^T.
4.2. In the following, we use the notation for matrices in echelon form: the leading entries are marked with ■, and any values (including zero) with ∗. Suppose each matrix represents the
augmented matrix for a system of linear equations. In each case, determine if the
system is consistent. If the system is consistent, determine if the solution is unique.
∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗
0
(a) 0 ∗ ∗ (b) 0 0 ∗ ∗ (c) 0 0 ∗ ∗
0 0 ∗ 0 0 0 0 0 0 0 ∗
4.3. Suppose the coefficient matrix of a system of linear equations has a pivot position in
every row. Explain why the system is consistent.
4.4. (a) For what values of h is v3 in Span{v1 , v2 }, and (b) for what values of h is {v1 , v2 , v3 }
linearly dependent? Justify each answer.
    v_1 = [1, −3, 2]^T,    v_2 = [−3, 9, −6]^T,    v_3 = [5, −7, h]^T.
Ans: (a) No h; (b) All h
4.5. Find the inverses of the matrices, if they exist:
    A = [ 3 −4        B = [  1 −2  1
          7 −8 ]             4 −7  3
                            −2  6 −4 ]
Ans: B is not invertible.
4.6. Describe the possible echelon forms of the matrix. Use the notation of Exercise 4.2 above.
(a) A is a 3 × 3 matrix with linearly independent columns.
(b) A is a 2 × 2 matrix with linearly dependent columns.
(c) A is a 4 × 2 matrix, A = [a1 , a2 ] and a2 is not a multiple of a1 .
Chapter 5
Programming with Linear Algebra
Contents of Chapter 5
5.1. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.2. Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.3. Dot Product, Length, and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.4. Vector Norms, Matrix Norms, and Condition Numbers . . . . . . . . . . . . . . . . . . 151
5.5. Power Method and Inverse Power Method for Eigenvalues . . . . . . . . . . . . . . . . 155
Exercises for Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.1. Determinants
Definition 5.1. Let A be an n × n square matrix. Then the determi-
nant of A is a scalar value, denoted by det A or |A|.
1) Let A = [a] ∈ R^{1×1}. Then det A = a.
2) Let A = [a b; c d] ∈ R^{2×2}. Then det A = ad − bc.
Example 5.2. Let A = [2 1; 0 3]. Consider the linear transformation T : R^2 → R^2 defined by T(x) = Ax.
Ans: (c) 12
Note: The determinant can be viewed as a volume scaling factor.
Ans: −2
determinant.m
1 A = [1 -2 5 2; 0 -6 -7 5; 0 0 3 0; 0 0 0 4];
2 det(A)
Result
1 ans =
2 -72
Remark 5.7. The matrix A in Example 5.6 has a pivot position in each column ⇒ it is invertible.
Properties of Determinants
Example 5.9. Compute det A, where
    A = [  1 −4  2
          −2  8 −9
          −1  7  0 ],
after applying some elementary row operations.
Solution.
Ans: 15
c) If A is invertible, then det A^{−1} = 1/det A.  (∵ det I_n = 1.)
Ans: −30
Example 5.15. Let A = [1 6; 5 2]. Show that 7 is an eigenvalue of the matrix A, and find the corresponding eigenvectors.
Solution. Hint : Start with Ax = 7x. Then (A − 7I)x = 0.
4 polyA = charpoly(A,x)
5 eigenA = solve(polyA)
6 [P,D] = eig(A) % A*P = P*D
7 P*D*inv(P)
Results
1 polyA =
2 12 - 4*x - 3*x^2 + x^3
3
4 eigenA =
5 -2
6 2
7 3
8
9 P =
10 0.4472 -0.3162 -0.6155
11 0.8944 0.9487 -0.6155
12 0 0 0.4924
13 D =
14 3 0 0
15 0 -2 0
16 0 0 2
17
18 ans =
19 1.0000 1.0000 -0.0000
20 6.0000 0.0000 5.0000
21 0 0 2.0000
A = PBP^{−1}, or equivalently, P^{−1}AP = B.
The next theorem illustrates one use of the characteristic polynomial, and
it provides the foundation for several iterative methods that approximate
eigenvalues.
Theorem 5.22. If n × n matrices A and B are similar, then they have
the same characteristic polynomial and hence the same eigenval-
ues (with the same multiplicities).
Proof. B = P^{−1}AP. Then,
    B − λI = P^{−1}AP − λI = P^{−1}AP − λP^{−1}P = P^{−1}(A − λI)P,
Diagonalization
Definition 5.23. An n × n matrix A is said to be diagonalizable if
there exists an invertible matrix P and a diagonal matrix D such that
A = PDP^{−1}  (or P^{−1}AP = D)    (5.6)
    A^2 = (PDP^{−1})(PDP^{−1}) = PD^2P^{−1}
    A^k = PD^kP^{−1}                                   (5.7)
    A^{−1} = PD^{−1}P^{−1}   (when A is invertible)
    det A = det D
Ans: A^k = [ 2·5^k − 3^k       5^k − 3^k
             2·3^k − 2·5^k    2·3^k − 5^k ]
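The matrix of this example is not reproduced above; setting k = 1 in the stated answer gives A = [7 2; −4 1], which is used in the following sketch (not part of the original text) to verify A^k = P D^k P^{−1}.
Ak_check.m (sketch)
A = [7 2; -4 1];                 % inferred from the answer with k = 1
[P,D] = eig(A);                  % A*P = P*D; eigenvalues 3 and 5
k = 5;
Ak_diag = P*D^k/P;               % P*D^k*inv(P)
Ak_formula = [2*5^k-3^k, 5^k-3^k; 2*3^k-2*5^k, 2*3^k-5^k];
disp(Ak_diag), disp(Ak_formula)  % the two results agree (up to roundoff)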
    P = [v_1 v_2 · · · v_n],
    D = diag(λ_1, λ_2, · · · , λ_n) = [ λ_1 0 · · · 0;  0 λ_2 · · · 0;  ⋮ ⋮ ⋱ ⋮;  0 0 · · · λ_n ],    (5.8)
where A v_k = λ_k v_k, k = 1, 2, · · · , n.
    A P = [A v_1  A v_2 · · · A v_n],    (5.9)
while
    P D = [v_1 v_2 · · · v_n] diag(λ_1, λ_2, · · · , λ_n) = [λ_1 v_1  λ_2 v_2 · · · λ_n v_n].    (5.10)
(⇒) Now suppose A is diagonalizable and A = PDP^{−1}. Then we have AP = PD; it follows from (5.9) and (5.10) that
    A v_k = λ_k v_k,    k = 1, 2, · · · , n.    (5.11)
Solution.
1. Find the eigenvalues of A.
2. Find three linearly independent eigenvectors of A.
3. Construct P from the vectors in step 2.
4. Construct D from the corresponding eigenvalues.
Check: AP = P D?
Ans: λ = 1, −2, −2.   v_1 = [1, −1, 1]^T,   v_2 = [−1, 1, 0]^T,   v_3 = [−1, 0, 1]^T
diagonalization.m
1 A = [1 3 3; -3 -5 -3; 3 3 1];
2 [P,D] = eig(A) % A*P = P*D
3 P*D*inv(P)
Results
1 P =
2 -0.5774 -0.7876 0.4206
3 0.5774 0.2074 -0.8164
4 -0.5774 0.5802 0.3957
5 D =
6 1.0000 0 0
7 0 -2.0000 0
8 0 0 -2.0000
9
10 ans =
11 1.0000 3.0000 3.0000
12 -3.0000 -5.0000 -3.0000
13 3.0000 3.0000 1.0000
Distance in R^n
Definition 5.33. For u, v ∈ R^n, the distance between u and v is
    dist(u, v) = ∥u − v∥,
the length of the vector u − v.
Example 5.34. Compute the distance between the vectors u = (7, 1) and
v = (3, 2).
Solution.
Orthogonal Vectors
Definition 5.35. Two vectors u and v in Rn are orthogonal if u•v = 0.
Solution.
Example 5.40. Let x = [4, 2, −2, −4, 3]^T. Find ∥x∥_p, for p = 1, 2, ∞.
Solution.
Note: In general, ∥x∥_∞ ≤ ∥x∥_2 ≤ ∥x∥_1 for all x ∈ R^n; see Exercise 5.5.
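A one-line check of each norm (a sketch, not in the original text) also illustrates the ordering stated in the Note:
vector_norms.m (sketch)
x = [4 2 -2 -4 3]';
[norm(x,Inf), norm(x,2), norm(x,1)]   % [4 7 15]: ||x||_inf <= ||x||_2 <= ||x||_1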
Matrix Norms
Definition 5.41. A matrix norm on m × n matrices is a vector norm
on the mn-dimensional space, satisfying
    ∥A∥ ≥ 0, and ∥A∥ = 0 ⇔ A = 0    (positive definiteness)
    ∥λA∥ = |λ| ∥A∥                  (homogeneity)          (5.22)
    ∥A + B∥ ≤ ∥A∥ + ∥B∥             (triangle inequality)
Example 5.42. ∥A∥_F ≡ (∑_{i,j} |a_{ij}|^2)^{1/2} = √(tr(AA^T)) is called the Frobenius norm. Here "tr(B)" is the trace of a square matrix B, the sum of the elements on the main diagonal.
Definition 5.43. Once a vector norm || · || has been specified, the in-
duced matrix norm is defined by
    ∥A∥ = max_{x≠0} ∥Ax∥/∥x∥ = max_{∥x∥=1} ∥Ax∥.    (5.23)
Theorem 5.44.
(a) For all operator norms and the Frobenius norm,
Solution.
    ∥A∥_2 = max_{x≠0} ∥Ax∥_2/∥x∥_2 = max_{∥x∥_2=1} ∥Ax∥_2 = max_{∥x∥_2=∥y∥_2=1} |y^T Ax|.    (5.26)
Proof.
(a) The claim follows from the fact that y^T Ax is a scalar and therefore (y^T Ax)^T = x^T A^T y and |y^T Ax| = |x^T A^T y|.
(b) Using the Cauchy-Schwarz inequality,
The power method approximates the largest eigenvalue λ1 and its asso-
ciated eigenvector v1 .
• In general,
    A^k x = ∑_{j=1}^{n} β_j λ_j^k v_j,    k = 1, 2, · · · ,    (5.33)
which gives
    A^k x = λ_1^k · ∑_{j=1}^{n} β_j (λ_j/λ_1)^k v_j = λ_1^k · [β_1 v_1 + β_2 (λ_2/λ_1)^k v_2 + · · · + β_n (λ_n/λ_1)^k v_n].    (5.34)
• For j = 2, 3, · · · , n, since |λ_j/λ_1| < 1, we have lim_{k→∞} |λ_j/λ_1|^k = 0, and
    initialization: x_0 = x/∥x∥_∞
    for k = 1, 2, · · ·
        y_k = A x_{k−1};   μ_k = ∥y_k∥_∞         (5.36)
        x_k = y_k/μ_k
    end for
8 x = [1 0 0]';
9 fmt = ['k=%2d: x=[',repmat('%.5f, ',1,numel(x)-1),'%.5f], ',...
10 'mu=%.5f (error=%.7f)\n'];
11
12 for k=1:10
13 y = A*x;
14 [~,ind] = max(abs(y)); mu = y(ind);
15 x =y/mu;
16 fprintf(fmt,k,x,mu,abs(evalues(1)-mu))
17 end
power_iteration.py
1 import numpy as np;
2 np.set_printoptions(suppress=True)
3
4 A = np.array([[5,-2,2],[-2,3,-4],[2,-4,3]])
5 evalues, EVectors = np.linalg.eig(A)
6
12 print('evalues=',evalues)
13 print('EVectors=\n',EVectors)
14
15 x = np.array([1,0,0]).T
16 for k in range(10):
17 y = A.dot(x)
18 ind = np.argmax(np.abs(y)); mu = y[ind]
19 x = y/mu
20 print('k=%2d; x=[%.5f, %.5f, %.5f]; mu=%.5f (error=%.7f)'
21 %(k,*x,mu,np.abs(evalues[0]-mu)) );
The results are the same; here is the output from the Matlab code.
Output from power_iteration.m
1 evalues =
2 9.0000 3.0000 -1.0000
3
4 V =
5 1.0000e+00 1.0000e+00 3.9252e-17
6 -1.0000e+00 5.0000e-01 1.0000e+00
7 1.0000e+00 -5.0000e-01 1.0000e+00
8
Notice that |9 − μ_k| ≈ (1/3)|9 − μ_{k−1}|, for which |λ_2/λ_1| = 1/3.
    A v_i = λ_i v_i,    i = 1, 2, · · · , n.    (5.39)
Thus, we obtain
    (A − qI)^{−1} v_i = [1/(λ_i − q)] v_i.    (5.41)
• That is, when q ∉ {λ_1, λ_2, · · · , λ_n}, the eigenvalues of (A − qI)^{−1} are
    1/(λ_1 − q),  1/(λ_2 − q),  · · · ,  1/(λ_n − q),    (5.42)
with the same eigenvectors {v_1, v_2, · · · , v_n} of A.
8 x = [1 0 0]';
9 fmt = ['k=%2d: x = [',repmat('%.5f, ',1,numel(x)-1),'%.5f], ',...
10 'lambda=%.7f (error = %.7f)\n'];
11
12 q = 4; B = inv(A-q*eye(3));
13 for k=1:10
14 y = B*x;
15 [~,ind] = max(abs(y)); mu = y(ind);
16 x =y/mu;
17 lambda = 1/mu + q;
18 fprintf(fmt,k,x,lambda,abs(evalues(2)-lambda))
19 end
inverse_power.py
1 import numpy as np;
2 np.set_printoptions(suppress=True)
3
4 A = np.array([[5,-2,2],[-2,3,-4],[2,-4,3]])
5 evalues, EVectors = np.linalg.eig(A)
6
12 print('evalues=',evalues)
13 print('EVectors=\n',EVectors)
14
15 q = 4; x = np.array([1,0,0]).T
16 B = np.linalg.inv(A-q*np.identity(3))
17 for k in range(10):
18 y = B.dot(x)
19 ind = np.argmax(np.abs(y)); mu = y[ind]
20 x = y/mu
21 Lambda = 1/mu + q
22 print('k=%2d; x=[%.5f, %.5f, %.5f]; Lambda=%.7f (error=%.7f)'
23 %(k,*x,Lambda,np.abs(evalues[1]-Lambda)) );
Chapter 6
Multivariable Calculus
Contents of Chapter 6
6.1. Multi-Variable Functions and Their Partial Derivatives . . . . . . . . . . . . . . . . . . 164
6.2. Directional Derivatives and the Gradient Vector . . . . . . . . . . . . . . . . . . . . . . 168
6.3. Optimization: Method of Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . 173
6.4. The Gradient Descent Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Exercises for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Ans: f(3, 2) = √6/2;  D = {(x, y) : x + y + 1 ≥ 0, x ≠ 1}
Example 6.4. Find the domain and the range of
    f(x, y) = √(9 − x^2 − y^2).
Solution.
Figure 6.1: Ordinary derivative f'(a) and partial derivatives f_x(a, b) and f_y(a, b).
f_x = ∂f/∂x
Let f be a function of two variables (x, y). Suppose we let only x vary while keeping y fixed, say y = b. Then g(x) := f(x, b) is a function of a single variable. If g is differentiable at a, then we call its derivative the partial derivative of f with respect to x at (a, b) and denote it by f_x(a, b):
    g'(a) = lim_{h→0} [g(a + h) − g(a)]/h = lim_{h→0} [f(a + h, b) − f(a, b)]/h =: f_x(a, b).    (6.1)
f_y = ∂f/∂y
Similarly, the partial derivative of f with respect to y at (a, b), denoted by f_y(a, b), is obtained by keeping x fixed, say x = a, and finding the ordinary derivative at b of G(y) := f(a, y):
    G'(b) = lim_{h→0} [G(b + h) − G(b)]/h = lim_{h→0} [f(a, b + h) − f(a, b)]/h =: f_y(a, b).    (6.2)
Example 6.5. Find f_x(0, 0), when f(x, y) = (x^3 + y^3)^{1/3}.
Solution. Using the definition,
    f_x(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h
Ans: 1
Definition 6.6. If f is a function of two variables, its partial derivatives are the functions f_x = ∂f/∂x and f_y = ∂f/∂y defined by:
    f_x(x, y) = ∂f/∂x (x, y) = lim_{h→0} [f(x + h, y) − f(x, y)]/h    and
    f_y(x, y) = ∂f/∂y (x, y) = lim_{h→0} [f(x, y + h) − f(x, y)]/h.    (6.3)
Example 6.8. If f(x, y) = x³ + x²y³ − 2y², find f_x(2, 1), f_y(2, 1), and f_xy(2, 1).
Solution.
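A symbolic check of this example (an added sketch using SymPy; not part of the original solution):

from sympy import symbols, diff

x, y = symbols('x y')
f = x**3 + x**2*y**3 - 2*y**2

fx  = diff(f, x)          # 3*x**2 + 2*x*y**3
fy  = diff(f, y)          # 3*x**2*y**2 - 4*y
fxy = diff(fx, y)         # 6*x*y**2

print(fx.subs({x: 2, y: 1}),    # 16
      fy.subs({x: 2, y: 1}),    # 8
      fxy.subs({x: 2, y: 1}))   # 12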
Note that
f (x0 + ha, y0 + hb) − f (x0 , y0 ) = f (x0 + ha, y0 + hb) − f (x0 , y0 + hb)
+ f (x0 , y0 + hb) − f (x0 , y0 )
Thus
    [f(x_0 + ha, y_0 + hb) − f(x_0, y_0)]/h
        = a·[f(x_0 + ha, y_0 + hb) − f(x_0, y_0 + hb)]/(ha) + b·[f(x_0, y_0 + hb) − f(x_0, y_0)]/(hb),
which converges to a f_x(x_0, y_0) + b f_y(x_0, y_0) as h → 0.
Figure 6.3
Ans: 65√2
Ans: (1 + √3)/2
Example 6.14. (Functions of Three Variables).
If f(x, y, z) = x² − 2y² + z⁴, find the directional derivative of f at (1, 3, 1) in the direction of v = ⟨2, −2, −1⟩.
Solution.
Ans: 8
Example 6.16. If f(x, y) = sin(x) + e^{xy}, find ∇f(x, y) and ∇f(0, 1).
Solution.
Ans: ⟨2, 0⟩
Remark 6.17. With this notation of the gradient vector, we can rewrite
    D_u f(x, y) = ∇f(x, y) · u = f_x(x, y)a + f_y(x, y)b,  where u = ⟨a, b⟩.    (6.7)
Ans: 4
Ans: (a) 0; (b) √2
Remark 6.21. Let u = ∇f(x)/|∇f(x)|, the unit vector in the gradient direction. Then
    D_u f(x) = ∇f(x) · u = ∇f(x) · ∇f(x)/|∇f(x)| = |∇f(x)|.    (6.9)
This implies that the directional derivative is maximized in the gradient
direction.
Claim 6.22. The gradient direction is the direction where the func-
tion changes fastest, more precisely, increases fastest!
Level Curves
Example 6.23. Consider the unit circle, the circle of radius 1:
    F(x, y) = x² + y² = 1.    (6.10)
• Note that ⟨− sin t, cos t⟩ = r′(t) is the tangential direction to the unit circle. Thus ∇F must be normal to the curve.
• Indeed,
    ∇F = ⟨2x, 2y⟩,    (6.13)
which is normal to the curve and the fastest increasing direction.
Claim 6.24. Given a level curve F(x) = k, the gradient vector ∇F(x) is normal to the curve and points in the direction of fastest increase.
(b) Evaluate f at all these points, to find the maximum and mini-
mum.
Ans: 4 (x = y = 2z = 2)
Example 6.27. Find the extreme values of f(x, y) = x² + 2y² on the circle x² + y² = 1.
Solution. ∇f = λ∇g  ⟹  [2x; 4y] = λ[2x; 2y]. Therefore,
    (1) 2x = 2xλ,   (2) 4y = 2yλ,   (3) x² + y² = 1.
From (1), x = 0 or λ = 1.
Indeed,
    ∇_x L(x, λ) = ∇f(x) − λ∇g(x),    ∂L(x, λ)/∂λ = g(x) − c.    (6.19)
By equating the right-hand sides with zero, we obtain (6.17).
The function L(x, λ) is called the Lagrangian for the problem (6.16).
Now, consider
    min_x max_α L(x, α)   subj.to α ≥ 0.    (6.22)

Figure 6.5: min_x x² subj.to x ≥ 1.

③ Let x < 1. ⇒ max_{α≥0} {−α(x − 1)} = ∞. However, min_x won't make this happen! (min_x is fighting max_α.) That is, when x < 1, the objective L(x, α) becomes huge as α grows; then, min_x will push x ↗ 1 or increase it to become x ≥ 1. In other words, min_x forces max_α to behave, so constraints will be satisfied.
The above analysis implies that the (original) minimization problem (6.23)
is equivalent to the minimax problem.
where
L(x, α) = x2 − α (x − 1).
In the maximin problem, the term minx L(x, α) is called the Lagrange
dual function and the Lagrange multiplier α is also called the dual
variable.
How to solve it . For the Lagrange dual function minx L(x, α), the mini-
mum occurs where the gradient is equal to zero.
    (d/dx) L(x, α) = 2x − α = 0   ⇒   x = α/2.    (6.28)
Plugging this into L(x, α), we have
    L(x, α) = (α/2)² − α(α/2 − 1) = α − α²/4.
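For completeness (this step is not spelled out at this point in the text): maximizing the dual function over α ≥ 0 gives d/dα (α − α²/4) = 1 − α/2 = 0, i.e., α = 2, with dual value 2 − 1 = 1. This agrees with the primal problem min_x x² subj.to x ≥ 1, whose minimum value 1 is attained at x = α/2 = 1.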
Multiple Constraints
Consider the problem of the form
    max_x f(x)   subj.to g(x) = c and h(x) = d.    (6.31)
Thus (6.31) can be solved by finding all values of (x, y, z) and (λ, µ) such
that
∇f (x, y, z) = λ∇g(x, y, z) + µ∇h(x, y, z)
g(x, y, z) = c (6.33)
h(x, y, z) = d
Ans: 2
xk+1 = xk + γk pk , k = 0, 1, · · · , (6.35)
    f(x_{k+1}) = f(x_k + γ_k p_k) = f(x_k) + γ_k f′(x_k)·p_k + (γ_k²/2) p_k·f″(ξ)p_k.    (6.36)
• Assume that f″ is bounded. Then
    p_k = −f′(x_k),    (6.39)
then
    f′(x_k)·p_k = −‖f′(x_k)‖² < 0,    (6.40)
which satisfies (6.38) and therefore (6.37).
• Summary: In the GD method, the search direction is the negative
gradient, the steepest descent direction.
Picking the step length γ: Assume that the step length was chosen to be independent of n, although one can play with other choices as well. The question is how to select γ in order to make the best gain of the method. To turn the right-hand side of (6.42) into a more manageable form, we invoke Taylor's Theorem:¹

    f(x + t) = f(x) + t f′(x) + ∫_x^{x+t} (x + t − s) f″(s) ds.    (6.43)

Assuming that |f″(s)| ≤ L, we have

    f(x + t) ≤ f(x) + t f′(x) + (t²/2) L.

Now, letting x = x_k and t = −γ f′(x_k) reads

    f(x_{k+1}) = f(x_k − γ f′(x_k))
               ≤ f(x_k) − γ f′(x_k) f′(x_k) + (1/2) L [γ f′(x_k)]²    (6.44)
               = f(x_k) − [f′(x_k)]² (γ − (L/2) γ²).

The gain (learning) from the method occurs when

    γ − (L/2) γ² > 0   ⇒   0 < γ < 2/L,    (6.45)

and it will be best when γ − (L/2) γ² is maximal. This happens at the point

    γ = 1/L.    (6.46)

¹ Taylor's Theorem with integral remainder: Suppose f ∈ C^{n+1}[a, b] and x_0 ∈ [a, b]. Then, for every x ∈ [a, b], f(x) = Σ_{k=0}^{n} [f^{(k)}(x_0)/k!] (x − x_0)^k + R_n(x), where R_n(x) = (1/n!) ∫_{x_0}^{x} (x − s)^n f^{(n+1)}(s) ds.
    f′(x̂) = lim_{k→∞} f′(x_k) = 0,    (6.51)
Use the GD method to find the minimizer, starting with x0 = (−1, 2).
rosenbrock_2D_GD.py
1 import numpy as np; import time
2
6 def rosen(x):
7 return (1.-x[0])**2+100*(x[1]-x[0]**2)**2
8
9 def rosen_grad(x):
10 h = 1.e-5;
11 g1 = ( rosen([x[0]+h,x[1]]) - rosen([x[0]-h,x[1]]) )/(2*h)
12 g2 = ( rosen([x[0],x[1]+h]) - rosen([x[0],x[1]-h]) )/(2*h)
² The Rosenbrock function in 3D is given as f(x, y, z) = [(1 − x)² + 100(y − x²)²] + [(1 − y)² + 100(z − y²)²], which has exactly one minimum at (1, 1, 1). Similarly, one can define the Rosenbrock function in general N-dimensional spaces, for N ≥ 4, by adding one more component for each enlarged dimension. That is, f(x) = Σ_{i=1}^{N−1} [(1 − x_i)² + 100(x_{i+1} − x_i²)²], where x = [x_1, x_2, ⋯, x_N] ∈ R^N. See Wikipedia (https://en.wikipedia.org/wiki/Rosenbrock_function) for details.
13 return np.array([g1,g2])
14
Output
1 GD Method: it = 7687; E-time = 0.0521
2 [0.99994416 0.99988809]
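The descent loop of rosenbrock_2D_GD.py is not reproduced above. Below is a minimal sketch of such a loop; the step size, stopping rule, and variable names are assumptions, so the iteration count need not match the output exactly.

import numpy as np, time

def rosen(x):
    return (1. - x[0])**2 + 100*(x[1] - x[0]**2)**2

def rosen_grad(x):
    h = 1.e-5
    g1 = (rosen([x[0]+h, x[1]]) - rosen([x[0]-h, x[1]]))/(2*h)
    g2 = (rosen([x[0], x[1]+h]) - rosen([x[0], x[1]-h]))/(2*h)
    return np.array([g1, g2])

x = np.array([-1., 2.])            # starting point from the example
gamma, tol = 1.e-3, 1.e-4          # assumed step size and tolerance
t0 = time.time()
for it in range(1, 200001):
    g = rosen_grad(x)
    if np.linalg.norm(g) < tol:
        break
    x = x - gamma*g
print('GD Method: it = %d; E-time = %.4f' % (it, time.time() - t0))
print(x)                           # approaches [1, 1], the minimizer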
The gradient descent algorithm with backtracking line search then becomes
Algorithm 6.39. (The Gradient Descent Algorithm, with Back-
tracking Line Search).
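The algorithm box itself is not reproduced here. The following is a minimal sketch of one common form of gradient descent with (Armijo) backtracking line search; the parameter names and default values are assumptions, not the author's exact formulation.

import numpy as np

def gd_backtracking(f, grad, x0, gamma0=1.0, rho=0.5, c=1.e-4,
                    tol=1.e-8, itmax=10000):
    """Gradient descent; the step length is found by Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for it in range(itmax):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        gamma = gamma0
        # shrink the step until the sufficient-decrease condition holds
        while f(x - gamma*g) > f(x) - c*gamma*np.dot(g, g):
            gamma *= rho
        x = x - gamma*g
    return x, it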
Note: The gradient descent method with partial updates is called the
stochastic gradient descent (SGD) method.
For the algebraic system (6.56), Krylov subspace methods update the
iterates as follows.
Given an initial guess x0 ∈ Rn , find successive approximations xk ∈ Rn of
the form
xk+1 = xk + αk pk , k = 0, 1, · · · , (6.58)
where pk is the search direction and αk > 0 is the step length.
• Different methods differ in the choice of the search direction and the
step length.
• In this subsection, we focus on the gradient descent method.
• For other Krylov subspace methods, see e.g. [1, 7].
    f(x_{k+1}) = f(x_k + α_k p_k)
               = f(x_k) + α_k f′(x_k)·p_k + (α_k²/2) p_k·f″(ξ)p_k    (6.60)
               = f(x_k) + α_k f′(x_k)·p_k + (α_k²/2) p_k·A p_k.
• Since A is bounded,
That is, the search direction is the negative gradient, the steepest
descent direction.
If α_k is optimal, then
    0 = (d/dα) f(x_k + α p_k)|_{α=α_k} = f′(x_k + α_k p_k) · p_k
      = (A(x_k + α_k p_k) − b) · p_k    (6.65)
      = (A x_k − b) · p_k + α_k p_k · A p_k.
So,
    α_k = (r_k · p_k)/(p_k · A p_k).    (6.66)
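A minimal sketch of the resulting steepest-descent iteration (6.58), taking p_k = r_k and the optimal step length (6.66); the small test system is an assumption added for illustration.

import numpy as np

def steepest_descent(A, b, x0, tol=1.e-10, itmax=10000):
    x = x0.astype(float)
    for it in range(itmax):
        r = b - A @ x                       # residual; also the search direction p_k
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))     # step length (6.66) with p_k = r_k
        x = x + alpha*r
    return x, it

A = np.array([[4., 1.], [1., 3.]]); b = np.array([1., 2.])
x, it = steepest_descent(A, b, np.zeros(2))
print(x, it)                                # x is approximately A^{-1} b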
Least-Squares and Regression Analysis
Contents of Chapter 7
7.1. The Least-Squares Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.2. Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.3. Scene Analysis with Noisy Data: Weighted Least-Squares and RANSAC . . . . . . . . 204
Exercises for Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
where x̂ is called a least-squares solution of Ax = b.
AT Ax = AT b. (7.2)
Method of Calculus
Let J(x) = ‖Ax − b‖² = (Ax − b)ᵀ(Ax − b) and x̂ a minimizer of J(x).
• Then we must have
    ∇_x J(x̂) = ∂J(x)/∂x |_{x=x̂} = 0.    (7.3)
    x̂ = (AᵀA)⁻¹ Aᵀ b.    (7.5)
least_squares.m
1 A = [1 1 0; 0 1 0; 0 0 1; 1 0 1];
2 b = [1; 3; 8; 2];
3 x = (A'*A)\(A'*b)
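For comparison, the same least-squares solution can be computed in Python/NumPy (an added sketch; the matrix and right-hand side are those of least_squares.m):

import numpy as np

A = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=float)
b = np.array([1, 3, 8, 2], dtype=float)

x_ne = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations (7.2)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)    # library least-squares solver
print(x_ne, x_ls)                               # the two solutions agree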
such that the graph is close to a line. We (may and must) determine a
line
y = β0 + β1 x (7.6)
that is as close as possible to the data points. Then this line is called
the least-squares line; it is also called the regression line of y on x
and β0 , β1 are called regression coefficients.
where
    X = [1 x_1; 1 x_2; ⋯; 1 x_m]  (the m×2 matrix whose i-th row is [1, x_i]),   β = [β_0; β_1],   y = [y_1; y_2; ⋯; y_m].
Here we call X the design matrix, β the parameter vector, and y
the observation vector.
• Thus the LS solution can be determined by solving the normal equa-
tions:
X T Xβ = X T y, (7.9)
provided that X T X is invertible.
• The normal equations for the regression line read
    [ m     Σx_i ;  Σx_i   Σx_i² ] β = [ Σy_i ; Σx_i y_i ].    (7.10)
    y = β_0 + β_1 x + β_2 x²,
where
    X = [1 x_1 x_1²; 1 x_2 x_2²; ⋯; 1 x_m x_m²],   β = [β_0; β_1; β_2],   y = [y_1; y_2; ⋯; y_m].
Now, it can be solved through the normal equations:
    XᵀXβ = [ Σ1    Σx_i   Σx_i² ;  Σx_i   Σx_i²  Σx_i³ ;  Σx_i²  Σx_i³  Σx_i⁴ ] β = [ Σy_i ; Σx_i y_i ; Σx_i² y_i ] = Xᵀy.    (7.14)

Ans: y = 1 + 0.5x²
Example 7.15. Find the best-fitting curve of the form y = c·e^{dx} for the data
    x     y
    0.1   1.9940
    0.2   2.0087
    0.3   1.8770
    0.4   3.5783
    0.5   3.9203
    0.6   4.7617
    0.7   6.7246
    0.8   7.1491
    0.9   9.5777
    1.0   11.5625
ln y = ln c + dx. (7.16)
Y = ln y, a0 = ln c, a1 = d, X = x,
8 # The linear LS
9 L := CurveFitting[LeastSquares](xlny, x, curve = b*x + a);
10 0.295704647799999 + 2.1530740654363654 x
11
Xβ = y. (7.19)
X T W Xβ = X T W y. (7.21)
Example 7.18. Given data, find the LS line with and without a weight.
When a weight is applied, weigh the first and the last data point by 1/4.
    xy := [ 1.    2.    3.    4.    5.    6.    7.    8.    9.    10. ;
            5.89  1.92  2.59  4.41  4.49  6.22  7.74  7.07  9.05  5.7 ]ᵀ
Solution.
Weighted-LS
1 LS := CurveFitting[LeastSquares](xy, x);
2 2.7639999999999967 + 0.49890909090909125 x
3 WLS := CurveFitting[LeastSquares](xy, x,
4 weight = [1/4,1,1,1,1,1,1,1,1,1/4]);
5 1.0466694879390623 + 0.8019424460431653 x
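The same fits can be reproduced in Python/NumPy (an added sketch based on (7.9) and (7.21); the data and weights are those of Example 7.18):

import numpy as np

x = np.arange(1, 11, dtype=float)
y = np.array([5.89, 1.92, 2.59, 4.41, 4.49, 6.22, 7.74, 7.07, 9.05, 5.7])
X = np.column_stack([np.ones_like(x), x])

W = np.diag([0.25] + [1.0]*8 + [0.25])                 # first/last points weighted by 1/4
beta_ls  = np.linalg.solve(X.T @ X, X.T @ y)           # (7.9)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # (7.21)
print(beta_ls)    # approx [2.764, 0.4989], as in the Maple output
print(beta_wls)   # approx [1.047, 0.8019]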
3. Consensus set C:
    C = { (x_i, y_i) ∈ X  :  d = |a + b x_i − y_i| / √(b² + 1) ≤ τ_e }    (7.22)
Table 7.1: The RANSAC: model fitting y = a_0 + a_1 x. The algorithm runs 1000 times for each dataset to find the standard deviation of the error: σ(a_0 − â_0) and σ(a_1 − â_1).

    Data   σ(a_0 − â_0)   σ(a_1 − â_1)   E-time (sec)
    1      0.1156         0.0421         0.0156
    2      0.1101         0.0391         0.0348
(a) Implement the method of normal equations for the least-squares regression to
find the best-fitting line.
(b) The RANSAC, Algorithm 7.19 is implemented for you below. Use the code to
analyze the performance of the RANSAC.
• Set τe = 1, γ = η|X| = 8, and N = 100.
• Run ransac2 100 times to get the minimum, maximum, and average number
of iterations for the RANSAC to find an acceptable hypothesis consensus set.
(c) Plot the best-fitting lines found from (a) and (b), superposed over the data.
get_hypothesis_WLS.m
1 function p = get_hypothesis_WLS(X,C)
2 % Get hypothesis p, with C being used as weights
3 % Output: p = [a,b], where y= a+b*x
4
5 m = size(X,1);
6
7 A = [ones(m,1) X(:,1)];
8 A = A.*C; %A = bsxfun(@times,A,C);
9 r = X(:,2).*C;
10
11 p = ((A'*A)\(A'*r))';
inlier.m
1 function C = inlier(X,p,tau_e)
2 % Input: p=[a,b] s.t. a+b*x-y=0
3
4 m = size(X,1);
5 C = zeros(m,1);
6
7 a = p(1); b=p(2);
8 factor = 1./sqrt(b^2+1);
9 for i=1:m
10 xi = X(i,1); yi = X(i,2);
11 dist = abs(a+b*xi-yi)*factor; %distance from point to line
12 if dist<=tau_e, C(i)=1; end
13 end
ransac2.m
1 function [p,C,iter] = ransac2(X,tau_e,gamma,N)
2 % Input: X = {(x_i,y_i)}
3 % tau_e: the error tolerance
4 % gamma = eta*|X|
5 % N: the maximum number of iterations
6 % Output: p = [a,b], where y= a+b*x
7
8 %%-----------
9 [m,n] = size(X);
10 if n>m, X=X'; [m,n] = size(X); end
11
Python Basics
Contents of Chapter 8
8.1. Why Python? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.2. Python Essentials in 30 Minutes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8.3. Zeros of a Polynomial in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8.4. Python Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Exercises for Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
Advantages of Python
Python has the following characteristics.
• Easy to learn and use
• Flexible and reliable
• Extensively used in Data Science
• Handy for Web Development purposes
• Vast library support
• Among the fastest-growing programming languages in the tech
industry
Disadvantage of Python
Python is an interpreted and dynamically-typed language. The line-by-line execution of code, combined with its high flexibility, typically leads to slow execution. Python scripts can be quite slow!
3 LIB_F90='lib_f90'
4 LIB_GCC='lib_gcc'
5 LIB_GPP='lib_gpp'
6
3 import numpy as np
4 import ctypes, time
5 from lib_py3 import *
6 from lib_f90 import *
7 lib_gcc = ctypes.CDLL("./lib_gcc.so")
8 lib_gpp = ctypes.CDLL("./lib_gpp.so")
9
19 lib_gcc.CFUNCTION.argtypes = IN_ddii
20 lib_gcc.CFUNCTION.restype = OUT_d
21
22 result = lib_gcc.CFUNCTION(x,y,n,m)
∼/.python_startup.py
1 #.bashrc: export PYTHONSTARTUP=~/.python_startup.py
2 #.cshrc: setenv PYTHONSTARTUP ~/.python_startup.py
3 #---------------------------------------------------
4 print("\t^[[1;33m~/.python_startup.py")
5
13 import random
14 from sympy import *
15 x,y,z,t = symbols('x,y,z,t');
16 print("\tfrom sympy import *; x,y,z,t = symbols('x,y,z,t')")
17
Programming Features
• Python has no support for pointers.
• Python codes are stored with .py extension.
• Indentation: Python uses indentation to define a block of code.
– A code block (body of a function, loop, etc.) starts with indenta-
tion and ends with the first unindented line.
– The amount of indentation is up to the user, but it must be consis-
tent throughout that block.
• Comments:
– The hash (#) symbol is used to start writing a comment.
– Multi-line comments: Python uses triple quotes, either ''' or """.
Python Essentials
• Sequence datatypes: list, tuple, string
– [list]: defined using square brackets (and commas)
>>> li = ["abc", 14, 4.34, 23]
– (tuple): defined using parentheses (and commas)
>>> tu = (23, (4,5), 'a', 4.1, -7)
– "string": defined using quotes (", ', or """)
>>> st = 'Hello World'
>>> st = "Hello World"
>>> st = """This is a multi-line string
. . . that uses triple quotes."""
• Retrieving elements
>>> li[0]
'abc'
>>> tu[1],tu[2],tu[-2]
((4, 5), 'a', 4.1)
>>> st[25:36]
'ng\nthat use'
• Slicing
>>> tu[1:4] # be aware
((4, 5), 'a', 4.1)
• The + and ∗ operators
>>> [1, 2, 3]+[4, 5, 6,7]
[1, 2, 3, 4, 5, 6, 7]
>>> "Hello" + " " + 'World'
Hello World
>>> (1,2,3)*3
(1, 2, 3, 1, 2, 3, 1, 2, 3)
• Reference semantics
>>> a = [1, 2, 3]
>>> b = a
>>> a.append(4)
>>> b
[1, 2, 3, 4]
Be careful when copying lists and numpy arrays!
• numpy, range, and iteration
>>> list(range(8))   # in Python 3, range(8) itself displays as range(0, 8)
[0, 1, 2, 3, 4, 5, 6, 7]
>>> import numpy as np
>>> for k in range(np.size(li)):
... li[k]
. . . <Enter>
'abc'
14
4.34
23
• numpy array and deepcopy
>>> from copy import deepcopy
>>> A = np.array([1,2,3])
>>> B = A
>>> C = deepcopy(A)
>>> A *= 4
>>> B
array([ 4, 8, 12])
>>> C
array([1, 2, 3])
12 ## Docstrings in Python
13 def double(num):
14 """Function to double the value"""
15 return 2*num
16 print(double.__doc__)
17 # Output: Function to double the value
18
35
36 ## Python Dictionary
37 d = {'key1':'value1', 'Seth':22, 'Alex':21}
38 print(d['key1'],d['Alex'],d['Seth'])
39 # Output: value1 21 22
40
41 ## Output Formatting
42 x = 5.1; y = 10
43 print('x = %d and y = %d' %(x,y))
44 print('x = %f and y = %d' %(x,y))
45 print('x = {} and y = {}'.format(x,y))
46 print('x = {1} and y = {0}'.format(x,y))
47 # Output: x = 5 and y = 10
48 # x = 5.100000 and y = 10
49 # x = 5.1 and y = 10
50 # x = 10 and y = 5.1
51
52 print("x=",x,"y=",y, sep="#",end="&\n")
53 # Output: x=#5.1#y=#10&
54
8 if __name__ == '__main__':
9 num = input('Enter a natural number: ')
10 cubes = get_cubes(int(num))
11 print(cubes)
3 cubes = get_cubes(8)
4 print(cubes)
Execution
1 [Sun Nov.05] python call_get_cubes.py
2 [1, 8, 27, 64, 125, 216, 343, 512]
7 print(P)
8 print(Pder)
9 print(np.roots(P))
10 print(P(3), Pder(3))
Output
1 4 3 2
2 1 x - 4 x + 7 x - 5 x - 2
3 3 2
4 4 x - 12 x + 14 x - 5
5 [ 2. +0.j 1.1378411+1.52731225j 1.1378411-1.52731225j -0.2756822+0.j ]
6 19 37
7 for i in range(1,n):
8 d = p + x0*d
9 p = A[i] +x0*p
10 return p,d
11
12 def newton_horner(A,x0,tol,itmax):
13 """ input: A = [a_n,...,a_1,a_0]
14 output: x: P(x)=0 """
15 x=x0
16 for it in range(1,itmax+1):
17 p,d = horner(A,x)
18 h = -p/d;
19 x = x + h;
20 if(abs(h)<tol): break
21 return x,it
22
23 if __name__ == '__main__':
24 coeff = [1, -4, 7, -5, -2]; x0 = 3
25 tol = 10**(-12); itmax = 1000
26 x,it =newton_horner(coeff,x0,tol,itmax)
27 print("newton_horner: x0=%g; x=%g, in %d iterations" %(x0,x,it))
Execution
1 [Sat Jul.23] python Zeros-Polynomials-Newton-Horner.py
2 newton_horner: x0=3; x=2, in 7 iterations
Note: The above Python code must be compared with the Matlab code
in §3.6:
horner.m
1 function [p,d] = horner(A,x0)
2 % input: A = [a_0,a_1,...,a_n]
3 % output: p=P(x0), d=P'(x0)
4
5 n = size(A(:),1);
6 p = A(n); d=0;
7
8 for i = n-1:-1:1
9 d = p + x0*d;
10 p = A(i) +x0*p;
11 end
newton_horner.m
1 function [x,it] = newton_horner(A,x0,tol,itmax)
2 % input: A = [a_0,a_1,...,a_n]; x0: initial for P(x)=0
3 % output: x: P(x)=0
4
5 x = x0;
6 for it=1:itmax
7 [p,d] = horner(A,x);
8 h = -p/d;
9 x = x + h;
10 if(abs(h)<tol), break; end
11 end
Call_newton_horner.m
1 a = [-2 -5 7 -4 1];
2 x0=3;
3 tol = 10^-12; itmax=1000;
4 [x,it] = newton_horner(a,x0,tol,itmax);
5 fprintf(" newton_horner: x0=%g; x=%g, in %d iterations\n",x0,x,it)
6 Result: newton_horner: x0=3; x=2, in 7 iterations
In the following, we will build a simple class, as Dr. Xu did in [14, Appendix B.5]; you will learn how to initiate, refine, and use classes.
Initiation of a Class
Polynomial_01.py
1 class Polynomial():
2 """A class of polynomials"""
3
4 def __init__(self,coefficient):
5 """Initialize coefficient attribute of a polynomial."""
6 self.coeff = coefficient
7
8 def degree(self):
9 """Find the degree of a polynomial"""
10 return len(self.coeff)-1
11
12 if __name__ == '__main__':
13 p2 = Polynomial([1,2,3])
14 print(p2.coeff) # a variable; output: [1, 2, 3]
15 print(p2.degree()) # a method; output: 2
4 count = 0 #Polynomial.count
5
6 def __init__(self):
7 """Initialize coefficient attribute of a polynomial."""
8 self.coeff = [1]
9 Polynomial.count += 1
10
11 def __del__(self):
12 """Delete a polynomial object"""
13 Polynomial.count -= 1
14
15 def degree(self):
16 """Find the degree of a polynomial"""
17 return len(self.coeff)-1
18
19 def evaluate(self,x):
20 """Evaluate a polynomial."""
21 n = self.degree(); eval = []
22 for xi in x:
23 p = self.coeff[0] #Horner's method
24 for k in range(1,n+1): p = self.coeff[k]+ xi*p
25 eval.append(p)
26 return eval
27
28 if __name__ == '__main__':
29 poly1 = Polynomial()
30 print('poly1, default coefficients:', poly1.coeff)
31 poly1.coeff = [1,2,-3]
32 print('poly1, coefficients after reset:', poly1.coeff)
33 print('poly1, degree:', poly1.degree())
34
Output
1 poly1, default coefficients: [1]
2 poly1, coefficients after reset: [1, 2, -3]
3 poly1, degree: 2
4 poly2, coefficients after reset: [1, 2, 3, 4, -5]
5 poly2, degree: 4
6 number of created polynomials: 2
7 number of polynomials after a deletion: 1
8 poly2.evaluate([-1,0,1,2]): [-7, -5, 5, 47]
Inheritance
Note: If we want to write a class that is just a specialized version of
another class, we do not need to write the class from scratch.
• We call the specialized class a child class and the other general
class a parent class.
• The child class can inherit all the attributes and methods from the parent class.
– It can also define its own special attributes and methods, or even override methods of the parent class.
Classes can import functions implemented earlier, to define methods.
Classes.py
1 from util_Poly import *
2
3 class Polynomial():
4 """A class of polynomials"""
5
6 def __init__(self,coefficient):
7 """Initialize coefficient attribute of a polynomial."""
8 self.coeff = coefficient
9
10 def degree(self):
11 """Find the degree of a polynomial"""
12 return len(self.coeff)-1
13
14 class Quadratic(Polynomial):
15 """A class of quadratic polynomial"""
16
17 def __init__(self,coefficient):
18 """Initialize the coefficient attributes ."""
19 super().__init__(coefficient)
20 self.power_decrease = 1
21
22 def roots(self):
23 return roots_Quad(self.coeff,self.power_decrease)
24
25 def degree(self):
26 return 2
util_Poly.py
1 def roots_Quad(coeff,power_decrease):
2 a,b,c = coeff
3 if power_decrease != 1:
4 a,c = c,a
5 discriminant = b**2-4*a*c
6 r1 = (-b+discriminant**0.5)/(2*a)
7 r2 = (-b-discriminant**0.5)/(2*a)
8 return [r1,r2]
call_Quadratic.py
1 from Classes import *
2
3 quad1 = Quadratic([2,-3,1])
4 print('quad1, roots:',quad1.roots())
5 quad1.power_decrease = 0
6 print('roots when power_decrease = 0:',quad1.roots())
Output
1 quad1, roots: [1.0, 0.5]
2 roots when power_decrease = 0: [2.0, 1.0]
Note: A while loop has not been considered in the lecture. However, you can figure it out
easily by yourself.
8.3. Write a function that takes as input a list of values and returns the largest value. Do
this without using the Python max() function; you should combine a for loop and an
if statement.
8.4. Let P4 (x) = 2x4 − 5x3 − 11x2 + 20x + 10. Solve the following.
Hint : For plotting, you may import: “import matplotlib.pyplot as plt” then use
plt.plot(). You will see the Python plotting is quite similar to Matlab plotting.
Vector Spaces and Orthogonality
Contents of Chapter 9
9.1. Subspaces of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
9.2. Orthogonal Sets and Orthogonal Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
9.3. Orthogonal Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
9.4. The Gram-Schmidt Process and QR Factorization . . . . . . . . . . . . . . . . . . . . . 248
9.5. QR Iteration for Finding Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Exercises for Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
9.1. Subspaces of Rn
Definition 9.1. A subspace of Rn is any set H in Rn that has three
properties:
a) The zero vector is in H.
b) For each u and v in H, the sum u + v is in H.
c) For each u in H and each scalar c, the vector cu is in H.
That is, H is closed under linear combinations.
Example 9.3.
Col A = {u | u = c1 a1 + c2 a2 + · · · + cn an }, (9.1)
Proof.
Remark 9.9.
1. { [1, 0]ᵀ, [1, 2]ᵀ } is a basis for R².
2. Let e_1 = [1, 0, 0, ⋯, 0]ᵀ, e_2 = [0, 1, 0, ⋯, 0]ᵀ, ⋯, e_n = [0, ⋯, 0, 1]ᵀ. Then {e_1, e_2, ⋯, e_n} is called the standard basis for Rⁿ.
Example 9.10. Find a basis for the column space of the matrix
    B = [1 0 −3 5 0; 0 1 2 −1 0; 0 0 0 0 1; 0 0 0 0 0].
Solution. Observation: b3 = −3b1 + 2b2 and b4 = 5b1 − b2 .
Example 9.12. Find bases for the column space and the null space of the matrix
    A = [−3 6 −1 1; 1 −2 2 3; 2 −4 5 8].
Solution. A ∼ [1 −2 0 −1; 0 0 1 2; 0 0 0 0]
Theorem 9.13. A basis for Nul A can be obtained from the parametric
vector form of solutions of Ax = 0. That is, suppose that the solutions of
Ax = 0 reads
x = x1 u1 + x2 u2 + · · · + xk uk ,
where x1 , x2 , · · · , xk correspond to free variables. Then, a basis for Nul A
is {u1 , u2 , · · · , uk }.
    y = c_1 u_1 + c_2 u_2 + ⋯ + c_p u_p    (9.2)
are given by
    c_j = (y•u_j)/(u_j•u_j)   (j = 1, 2, ⋯, p).    (9.3)
An Orthogonal Projection
Note: Given a nonzero vector u in Rⁿ, consider the problem of decomposing a vector y ∈ Rⁿ into the sum of two vectors, one a multiple of u and the other orthogonal to u. Let
    y = ŷ + z,   ŷ // u and z ⊥ u.
Let ŷ = αu. Then
    0 = z•u = (y − αu)•u = y•u − α u•u.
Thus α = y•u/u•u.
    y = ŷ + z,   ŷ // u and z ⊥ u.    (9.4)
Then
    ŷ = αu = [(y•u)/(u•u)] u,   z = y − ŷ.    (9.5)
The vector ŷ is called the orthogonal projection of y onto u, and z is called the component of y orthogonal to u. Let L = Span{u}. Then we denote
    ŷ = [(y•u)/(u•u)] u = proj_L y,    (9.6)
which is called the orthogonal projection of y onto L.
Example 9.20. Let y = [7, 6]ᵀ and u = [4, 2]ᵀ.
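A quick numerical illustration of (9.5)–(9.6) for these vectors (an added check):

import numpy as np

y = np.array([7., 6.]); u = np.array([4., 2.])
yhat = (y @ u)/(u @ u) * u       # orthogonal projection of y onto u, (9.5)
z = y - yhat                     # component of y orthogonal to u
print(yhat, z, z @ u)            # [8. 4.]  [-1.  2.]  0.0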
Orthonormal Sets
Definition 9.21. A set {u1 , u2 , · · · , up } is an orthonormal set, if it is
an orthogonal set of unit vectors. If W is the subspace spanned by such a
set, then {u1 , u2 , · · · , up } is an orthonormal basis for W , since the set
is automatically linearly independent.
Example 9.22. In Example 9.18, p. 238, we know v_1 = [1, −2, 1]ᵀ, v_2 = [0, 1, 2]ᵀ, and v_3 = [−5, −2, 1]ᵀ form an orthogonal basis for R³. Find the corresponding orthonormal basis.
Solution.
Proof.
Theorems 9.23 and 9.24 are particularly useful when applied to square ma-
trices.
Definition 9.25. An orthogonal matrix is a square matrix U such
that U T = U −1 , i.e.,
orthogonal_matrix.m
1 n = 4;
2
3 [Q,~] = qr(rand(n));
4 U = Q;
5
6 disp("U ="); disp(U)
7 disp("U'*U ="); disp(U'*U)
8
9 x = rand([n,1]);
10 fprintf("\nx' ="); disp(x')
11 fprintf("||x||_2 =");disp(norm(x,2))
12 fprintf("||U*x||_2=");disp(norm(U*x,2))
Output
1 U =
2 -0.5332 0.4892 0.6519 0.2267
3 -0.5928 -0.7162 0.1668 -0.3284
4 -0.0831 0.4507 -0.0991 -0.8833
5 -0.5978 0.2112 -0.7331 0.2462
6 U'*U =
7 1.0000 -0.0000 0 -0.0000
8 -0.0000 1.0000 0.0000 0.0000
9 0 0.0000 1.0000 -0.0000
10 -0.0000 0.0000 -0.0000 1.0000
11 x' = 0.4218 0.9157 0.7922 0.9595
12 ||x||_2 = 1.6015
13 ||U*x||_2= 1.6015
W ⊥ = {z | z•w = 0, ∀ w ∈ W }. (9.8)
    ‖y − v‖² = ‖y − ŷ‖² + ‖ŷ − v‖²,
If U = [u1 u2 · · · up ], then
Ans: (a) UUᵀ = (1/10)[1 −3; −3 9];  (b) [−2, 6]ᵀ
    v_1 = x_1
    v_2 = x_2 − [(x_2•v_1)/(v_1•v_1)] v_1
    v_3 = x_3 − [(x_3•v_1)/(v_1•v_1)] v_1 − [(x_3•v_2)/(v_2•v_2)] v_2    (9.16)
      ⋮
    v_p = x_p − [(x_p•v_1)/(v_1•v_1)] v_1 − [(x_p•v_2)/(v_2•v_2)] v_2 − ⋯ − [(x_p•v_{p−1})/(v_{p−1}•v_{p−1})] v_{p−1}
Then {v_1, v_2, ⋯, v_p} is an orthogonal basis for W. In addition,
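A minimal NumPy sketch of the process (9.16) (added; the example vectors are an assumption for illustration):

import numpy as np

def gram_schmidt(X):
    """Columns of X -> columns of V, an orthogonal (not normalized) basis."""
    X = X.astype(float)
    V = np.zeros_like(X)
    for k in range(X.shape[1]):
        v = X[:, k].copy()
        for j in range(k):
            v -= (X[:, k] @ V[:, j]) / (V[:, j] @ V[:, j]) * V[:, j]
        V[:, k] = v
    return V

X = np.array([[1., 0.], [1., 1.], [1., 2.]])   # columns x1, x2 (assumed example)
V = gram_schmidt(X)
print(V)                       # v1 = (1,1,1)^T, v2 = (-1,0,1)^T
print(np.round(V.T @ V, 12))   # off-diagonal zeros: the columns are orthogonal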
QR Factorization of Matrices
A = QR, (9.19)
where
• Q is an m × n matrix whose columns are orthonormal.
• R is an n × n upper triangular invertible matrix
with positive entries on its diagonal.
We may assume that r_kk > 0. (If r_kk < 0, multiply both r_kk and u_k by −1.)
3. Let r_k = [r_{1k}, r_{2k}, ⋯, r_{kk}, 0, ⋯, 0]ᵀ. Then
    x_k = Q r_k    (9.22)
4. Define
    R := [r_1 r_2 ⋯ r_n].    (9.23)
• Thus
    A = [x_1 x_2 ⋯ x_n] = QR    (9.25)
implies that
    Q = [u_1 u_2 ⋯ u_n],
    R = [ u_1•x_1  u_1•x_2  u_1•x_3  ⋯  u_1•x_n ;
          0        u_2•x_2  u_2•x_3  ⋯  u_2•x_n ;
          0        0        u_3•x_3  ⋯  u_3•x_n ;
          ⋮        ⋮        ⋮        ⋱  ⋮       ;
          0        0        0        ⋯  u_n•x_n ] = QᵀA.    (9.26)
Ans: Q = [0.8 −0.6; 0.6 0.8],  R = [5 0.4; 0 2.2]
    AᵀA x = Aᵀb   ⇒   x̂ = (AᵀA)⁻¹ Aᵀ b,
Solution.
Remark 9.45. It follows from (a) and (b) of Algorithm 9.44 that
and therefore
    A_k = R_k Q_k = Q_kᵀ A_{k−1} Q_k = Q_kᵀ Q_{k−1}ᵀ A_{k−2} Q_{k−1} Q_k = ⋯
        = Q_kᵀ Q_{k−1}ᵀ ⋯ Q_1ᵀ A_0 (Q_1 Q_2 ⋯ Q_k),   where the product Q_1 Q_2 ⋯ Q_k is denoted U_k.    (9.32)
Claim 9.46.
• Algorithm 9.44 produces an upper triangular matrix T , with its
diagonals being eigenvalues of A, and an orthogonal matrix U such
that
A = UT UT, (9.34)
which is called the Schur decomposition of A.
• If A is symmetric, then T becomes a diagonal matrix of eigenvalues
of A and U is the collection of corresponding eigenvectors.
Example 9.47. Let A = [3 1 3; 1 6 4; 6 7 8] and B = [4 −1 1; −1 3 −2; 1 −2 3]. Apply the QR algorithm, Algorithm 9.44, to find their Schur decompositions.
Solution. You will solve this example once more implementing the QR
iteration algorithm in Python; see Exercise 9.7.
qr_iteration.m
1 function [T,U,iter] = qr_iteration(A)
2 % It produces the Schur decomposition: A = U*T*U^T
3 % T: upper triangular, with diagonals being eigenvalues of A
4 % U: orthogonal
5 % Once A is symmetric,
6 % T becomes diagonal && U contains eigenvectors of A
7
8 T = A; U = eye(size(A));
9
10 % for stopping
11 D0 = diag(T); change = 1;
12 tol = 10^-15; iter=0;
13
14 %%-----------------
15 while change>tol
16 [Q,R] = qr(T);
17 T = R*Q;
18 U = U*Q;
19
20 % for stopping
21 iter= iter+1;
22 D=diag(T); change=norm(D-D0); D0=D;
23 %if iter<=8, fprintf('A_%d =\n',iter); disp(T); end
24 end
We may call it as
call_qr_iteration.m
1 A =[3 1 3; 1 6 4; 6 7 8];
2 [T1,U1,iter1] = qr_iteration(A)
3 U1*T1*U1'
4 [V1,D1] = eig(A)
5
Output for A
1 T1 =
2 13.8343 1.0429 -4.0732
3 0.0000 3.3996 0.5668
4 0.0000 -0.0000 -0.2339
5 U1 =
6 0.2759 -0.5783 -0.7677
7 0.4648 0.7794 -0.4201
8 0.8414 -0.2409 0.4838
9 iter1 =
10 26
11 ans =
12 3.0000 1.0000 3.0000
13 1.0000 6.0000 4.0000
14 6.0000 7.0000 8.0000
Output for B
1 T2 =
2 6.0000 -0.0000 0.0000
3 -0.0000 3.0000 -0.0000
4 0.0000 -0.0000 1.0000
5 U2 =
6 0.5774 0.8165 -0.0000
7 -0.5774 0.4082 0.7071
8 0.5774 -0.4082 0.7071
9 iter2 =
10 28
11 ans =
12 4.0000 -1.0000 1.0000
13 -1.0000 3.0000 -2.0000
14 1.0000 -2.0000 3.0000
Convergence Check
9.1. Suppose y is orthogonal to u and v. Prove that y is orthogonal to every w in Span{u, v}.
9.2. Let u_1 = [3, −3, 0]ᵀ, u_2 = [2, 2, −1]ᵀ, u_3 = [1, 1, 4]ᵀ, and x = [5, −3, 1]ᵀ.
(a) Use the Gram-Schmidt process to produce an orthogonal basis for the column
space of A.
(b) Use Algorithm 9.40 to produce a QR factorization of A.
(c) Apply the QR iteration to find eigenvalues of A(1:4,1:4).
Ans: (a) v4 = (0, 5, 0, 0, −5)
9.7. Solve Example 9.47 by implementing the QR iteration algorithm in Python; you may
use qr_iteration.m, p.254.
Introduction to Machine Learning
Contents of Chapter 10
10.1.What is Machine Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.2.Binary Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
10.3.Popular Machine Learning Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
10.4.Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
10.5.Scikit-Learn: A Python Machine Learning Library . . . . . . . . . . . . . . . . . . . . . 292
A Machine Learning Modelcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
Exercises for Chapter 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
259
260 Chapter 10. Introduction to Machine Learning
Supervised Learning
Assumption. Given a data set {(xi , yi )}, where yi are labels,
there exists a relation f : X → Y .
Supervised learning:
    Given: a training dataset {(x_i, y_i) | i = 1, ⋯, N}
    Find:  f̂ : X → Y, a good approximation to f    (10.3)
Unsupervised Learning
Note:
• In supervised learning, we know the right answer beforehand
when we train our model, and in reinforcement learning, we de-
fine a measure of reward for particular actions by the agent.
• In unsupervised learning, however, we are dealing with unla-
beled data or data of unknown structure. Using unsupervised learn-
ing techniques, we are able to explore the structure of our data
to extract meaningful information, without the guidance of a known
outcome variable or reward function.
• Clustering is an exploratory data analysis technique that allows
us to organize a pile of information into meaningful subgroups
(clusters) without having any prior knowledge of their group mem-
berships.
4. Interpretability:
Although ML has come very far, researchers still don’t know exactly
how some algorithms (e.g., deep nets) work.
• If we don’t know how training nets actually work, how do we make
any real progress?
5. One-Shot Learning:
We still haven't been able to achieve one-shot learning. Traditional gradient-based networks need a huge amount of data and are often trained through extensive iterative procedures.
• Instead, we should find a way to enable neural networks to
learn, using just a few examples.
Definition 10.3. Let {(x^{(i)}, y^{(i)})} be labeled data, with x^{(i)} ∈ Rᵈ and y^{(i)} ∈ {0, 1}. A binary classifier finds a hyperplane in Rᵈ that separates the data points X = {x^{(i)}} into two classes:
where θ is a threshold.
For simplicity, we can bring the threshold θ in (10.6) to the left side of the equation; define a weight-zero as w_0 = −θ and reformulate as
    φ(z) = { 1 if z ≥ 0,  −1 otherwise },   z = wᵀx = w_0 + w_1 x_1 + ⋯ + w_d x_d.    (10.7)
The update of the weight vector w can be more formally written as:
where η is the learning rate, 0 < η < 1, y (i) is the true class label of the
i-th training sample, and yb(i) denotes the predicted class label.
3 class Perceptron():
4 def __init__(self, xdim, epoch=10, learning_rate=0.01):
5 self.epoch = epoch
6 self.learning_rate = learning_rate
7 self.weights = np.zeros(xdim + 1)
8
32 #-----------------------------------------------------
33 def fit_and_fig(self, Xtrain, ytrain):
34 wgts_all = []
35 for k in range(self.epoch):
36 for x, y in zip(Xtrain, ytrain):
37 yhat = self.activate(x)
38 self.weights[1:] += self.learning_rate*(y-yhat)*x
39 self.weights[0] += self.learning_rate*(y-yhat)
40 if k==0: wgts_all.append(list(self.weights))
41 return np.array(wgts_all)
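The listing above omits the prediction method used by fit_and_fig (lines 9–31 of perceptron.py are not reproduced). Below is a minimal self-contained sketch of how such a perceptron typically activates and fits; it is an assumption, not the author's exact code.

import numpy as np

class PerceptronSketch:
    """Minimal sketch of the parts not shown above (an assumption)."""
    def __init__(self, xdim, epoch=10, learning_rate=0.01):
        self.epoch = epoch
        self.learning_rate = learning_rate
        self.weights = np.zeros(xdim + 1)

    def activate(self, x):
        z = self.weights[1:] @ x + self.weights[0]   # w^T x + w_0
        return np.where(z >= 0.0, 1, -1)             # thresholding as in (10.7)

    def fit(self, Xtrain, ytrain):
        for _ in range(self.epoch):
            for x, y in zip(Xtrain, ytrain):
                update = self.learning_rate * (y - self.activate(x))
                self.weights[1:] += update * x
                self.weights[0]  += update
        return self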
Iris_perceptron.py
1 import numpy as np; import matplotlib.pyplot as plt
2 from sklearn.model_selection import train_test_split
3 from sklearn import datasets; #print(dir(datasets))
4 np.set_printoptions(suppress=True)
5 from perceptron import Perceptron
6
7 #-----------------------------------------------------------
8 data_read = datasets.load_iris(); #print(data_read.keys())
9 X = data_read.data;
10 y = data_read.target
11 targets = data_read.target_names; features = data_read.feature_names
12
Figure 10.6: A part of Iris data (left) and the convergence of Perceptron iteration (right).
    φ(wᵀx) = wᵀx.
The dominant algorithm for the minimization of the cost function is the Gradient Descent Method.
Algorithm 10.9. The Gradient Descent Method uses −∇J for the
search direction (update direction):
Thus, with φ = I,
    Δw = −η ∇_w J(w, b) = η Σ_i (y^{(i)} − φ(z^{(i)})) x^{(i)},
    Δb = −η ∇_b J(w, b) = η Σ_i (y^{(i)} − φ(z^{(i)})).    (10.12)
Hyperparameters
Definition 10.10. In ML, a hyperparameter is a parameter whose
value is set before the learning process begins. Thus it is an algorithmic
parameter. Examples are
• The learning rate (η)
• The number of maximum epochs/iterations (n_iter)
Note: There are effective searching schemes to set the learning rate η
automatically.
Multi-class Classification
• − vs {◦, +} ⇒ weights w−
• + vs {◦, −} ⇒ weights w+
• ◦ vs {+, −} ⇒ weights w◦
Figure 10.11: Popular activation functions: (left) The standard logistic sigmoid func-
tion and (right) the rectifier and softplus function.
    Δb = −η ∇_b J(w, b) = η Σ_i (y^{(i)} − φ(z^{(i)})).    (10.26)
Note: The above gradient descent rule for Logistic Regression is of the same form as that of Adaline; see (10.12) on p. 272. The only difference is the activation function φ.
To find an optimal hyperplane that maximizes the margin, let us begin by considering the positive and negative hyperplanes that are parallel to the decision boundary:
    w_0 + wᵀx_+ = 1,
    w_0 + wᵀx_− = −1,    (10.27)
where w = [w_1, w_2, ⋯, w_d]ᵀ. If we subtract those two linear equations from each other, then we have
    w · (x_+ − x_−) = 2
and therefore
    (w/‖w‖) · (x_+ − x_−) = 2/‖w‖.    (10.28)
Figure 10.14: Illustration for how a new data point (?) is assigned the triangle class label,
based on majority voting, when k = 5.
Figure 10.17: Segmentation.
MNIST dataset :
A modified subset of two datasets collected by NIST (US National Insti-
tute of Standards and Technology):
• Its first part contains 60,000 images (for training)
• The second part is 10,000 images (for test), each of which is in 28 × 28
grayscale pixels
– Thus, if all four of these hidden neurons are firing, then we can
conclude that the digit is a 0.
where W denotes the collection of all weights in the network, B all the
biases, and a(x(i) ) is the vector of outputs from the network when x(i)
is input.
• Gradient descent method
    [W; B] ← [W; B] + [ΔW; ΔB],    (10.33)
    x̃^{(1)}, x̃^{(2)}, ⋯, x̃^{(m)},
13 class Network(object):
14 def __init__(self, sizes):
15 """The list ``sizes`` contains the number of neurons in the
16 respective layers of the network. For example, if the list
17 was [2, 3, 1] then it would be a three-layer network, with the
18 first layer containing 2 neurons, the second layer 3 neurons,
19 and the third layer 1 neuron. """
20
21 self.num_layers = len(sizes)
22 self.sizes = sizes
23 self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
24 self.weights = [np.random.randn(y, x)
25 for x, y in zip(sizes[:-1], sizes[1:])]
26
94 z = zs[-l]
95 sp = sigmoid_prime(z)
96 delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
97 nabla_b[-l] = delta
98 nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
99 return (nabla_b, nabla_w)
100
4 import network
5 n_neurons = 20
6 net = network.Network([784 , n_neurons, 10])
7
Validation Accuracy
1 Epoch 0: 9006 / 10000
2 Epoch 1: 9128 / 10000
3 Epoch 2: 9202 / 10000
4 Epoch 3: 9188 / 10000
5 Epoch 4: 9249 / 10000
6 ...
7 Epoch 25: 9356 / 10000
8 Epoch 26: 9388 / 10000
9 Epoch 27: 9407 / 10000
10 Epoch 28: 9410 / 10000
11 Epoch 29: 9428 / 10000
Accuracy Comparisons
• scikit-learn’s SVM classifier using the default settings: 9435/10000
• A well-tuned SVM: ≈98.5%
• Well-designed Convolutional NN (CNN):
9979/10000 (only 21 missed!)
1. Selection of features
2. Choosing a performance metric
3. Choosing a classifier and optimization algorithm
4. Evaluating the performance of the model
5. Tuning the algorithm
In practice :
• Each algorithm has its own characteristics and is based on certain
assumptions.
• No Free Lunch Theorem: No single classifier works best across all
possible scenarios.
• Best Model: It is always recommended that you compare the perfor-
mance of at least a handful of different learning algorithms to
select the best model for the particular problem.
Why Scikit-Learn?
• Nice documentation and usability
• The library covers most machine-learning tasks:
– Preprocessing modules
– Algorithms
– Analysis tools
• Robust Model: Given a dataset, you may
(a) Compare algorithms
(b) Build an ensemble model
• Scikit-learn scales to most data problems
7 iris = datasets.load_iris()
8
9 feature_names = iris.feature_names
10 target_names = iris.target_names
11 print("## feature names:", feature_names)
12 print("## target names :", target_names)
13 print("## set(iris.target):", set(iris.target))
14
15 #------------------------------------------------------
16 # Create "model instances"
17 #------------------------------------------------------
18 from sklearn.linear_model import LogisticRegression
19 from sklearn.neighbors import KNeighborsClassifier
20
21 LR = LogisticRegression(max_iter = 1000)
22 KNN = KNeighborsClassifier(n_neighbors = 5)
23
24 #------------------------------------------------------
25 # Split, Train, and Test
26 #------------------------------------------------------
27 import numpy as np
28 from sklearn.model_selection import train_test_split
29
30 X = iris.data; y = iris.target
31 iter = 100; Acc = np.zeros([iter,2])
32
33 for i in range(iter):
34 X_train, X_test, y_train, y_test = train_test_split(
35 X, y, test_size=0.3, random_state=i, stratify=y)
36 LR.fit(X_train, y_train); Acc[i,0] = LR.score(X_test, y_test)
37 KNN.fit(X_train, y_train); Acc[i,1] = KNN.score(X_test, y_test)
38
39 acc_mean = np.mean(Acc,axis=0)
40 acc_std = np.std(Acc,axis=0)
41 print('## iris.Accuracy.LR : %.4f +- %.4f' %(acc_mean[0],acc_std[0]))
42 print('## iris.Accuracy.KNN: %.4f +- %.4f' %(acc_mean[1],acc_std[1]))
43
44 #------------------------------------------------------
45 # New Sample ---> Predict
46 #------------------------------------------------------
47 sample = [[5, 3, 2, 4],[4, 3, 3, 6]];
48 print('## New sample =',sample)
49
Output
1 ## feature names: ['sepal length (cm)', 'sepal width (cm)',
2 'petal length (cm)', 'petal width (cm)']
3 ## target names : ['setosa' 'versicolor' 'virginica']
4 ## set(iris.target): {0, 1, 2}
5 ## iris.Accuracy.LR : 0.9631 +- 0.0240
6 ## iris.Accuracy.KNN: 0.9658 +- 0.0202
7 ## New sample = [[5, 3, 2, 4], [4, 3, 3, 6]]
8 ## sample.LR.predict : ['setosa' 'virginica']
9 ## sample.KNN.predict: ['versicolor' 'virginica']
7 #=====================================================================
8 # Upload a Dataset: print(dir(datasets))
9 # load_iris, load_wine, load_breast_cancer, ...
10 #=====================================================================
11 data_read = datasets.load_iris(); #print(data_read.keys())
12
13 X = data_read.data
14 y = data_read.target
15 dataname = data_read.filename
16 targets = data_read.target_names
17 features = data_read.feature_names
18
19 #---------------------------------------------------------------------
20 # SETTING
21 #---------------------------------------------------------------------
22 N,d = X.shape; nclass=len(set(y));
23 print('DATA: N, d, nclass =',N,d,nclass)
24 rtrain = 0.7e0; run = 50; CompEnsm = 2;
25
26 def multi_run(clf,X,y,rtrain,run):
27 t0 = time.time(); acc = np.zeros([run,1])
28 for it in range(run):
29 Xtrain, Xtest, ytrain, ytest = train_test_split(
30 X, y, train_size=rtrain, random_state=it, stratify = y)
31 clf.fit(Xtrain, ytrain);
32 acc[it] = clf.score(Xtest, ytest)
33 etime = time.time()-t0
34 return np.mean(acc)*100, np.std(acc)*100, etime # accmean,acc_std,etime
35
36 #=====================================================================
37 # My Classifier
38 #=====================================================================
39 from myclf import * # My Classifier = MyCLF()
40 if 'MyCLF' in locals():
41 accmean, acc_std, etime = multi_run(MyCLF(mode=1),X,y,rtrain,run)
42
46 #=====================================================================
47 # Scikit-learn Classifiers, for Comparisions && Ensembling
48 #=====================================================================
49 if CompEnsm >= 1:
50 exec(open("sklearn_classifiers.py").read())
myclf.py
1 import numpy as np
2 from sklearn.base import BaseEstimator, ClassifierMixin
3 from sklearn.tree import DecisionTreeClassifier
4
sklearn_classifiers.py
1 #=====================================================================
2 # Required: X, y, multi_run [dataname, rtrain, run, CompEnsm]
3 #=====================================================================
4 from sklearn.preprocessing import StandardScaler
5 from sklearn.datasets import make_moons, make_circles, make_classification
6 from sklearn.neural_network import MLPClassifier
7 from sklearn.neighbors import KNeighborsClassifier
8 from sklearn.linear_model import LogisticRegression
9 from sklearn.svm import SVC
10 from sklearn.gaussian_process import GaussianProcessClassifier
11 from sklearn.gaussian_process.kernels import RBF
12 from sklearn.tree import DecisionTreeClassifier
13 from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
14 from sklearn.naive_bayes import GaussianNB
15 from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
16 from sklearn.ensemble import VotingClassifier
17
18 #-----------------------------------------------
19 classifiers = [
20 LogisticRegression(max_iter = 1000),
21 KNeighborsClassifier(5),
22 SVC(kernel="linear", C=0.5),
23 SVC(gamma=2, C=1),
24 RandomForestClassifier(max_depth=5, n_estimators=50, max_features=1),
25 MLPClassifier(hidden_layer_sizes=[100], activation='logistic',
26 alpha=0.5, max_iter=1000),
27 AdaBoostClassifier(),
28 GaussianNB(),
29 QuadraticDiscriminantAnalysis(),
30 GaussianProcessClassifier(),
31 ]
32 names = [
33 "Logistic-Regr",
34 "KNeighbors-5 ",
35 "SVC-Linear ",
36 "SVC-RBF ",
37 "Random-Forest",
38 "MLPClassifier",
39 "AdaBoost ",
40 "Naive-Bayes ",
41 "QDA ",
42 "Gaussian-Proc",
43 ]
44 #-----------------------------------------------
45 if dataname is None: dataname = 'No-dataname';
46 if run is None: run = 50;
47 if rtrain is None: rtrain = 0.7e0;
48 if CompEnsm is None: CompEnsm = 2;
49
50 #=====================================================================
51 print('====== Comparision: Scikit-learn Classifiers =================')
52 #=====================================================================
53 import os;
54 acc_max=0; Acc_CLF = np.zeros([len(classifiers),1]);
55
59 Acc_CLF[k] = accmean
60 if accmean>acc_max: acc_max,algname = accmean,name
61 print('%s: %s: Acc.(mean,std) = (%.2f,%.2f)%%; E-time= %.5f'
62 %(os.path.basename(dataname),name,accmean,acc_std,etime/run))
63 print('--------------------------------------------------------------')
64 print('sklearn classifiers Acc: (mean,max) = (%.2f,%.2f)%%; Best = %s'
65 %(np.mean(Acc_CLF),acc_max,algname))
66
Output
1 DATA: N, d, nclass = 150 4 3
2 MyCLF() = DecisionTreeClassifier(max_depth=5)
3 iris.csv: MyCLF() : Acc.(mean,std) = (94.53,3.12)%; E-time= 0.00074
4 ====== Comparision: Scikit-learn Classifiers =================
5 iris.csv: Logistic-Regr: Acc.(mean,std) = (96.13,2.62)%; E-time= 0.01035
6 iris.csv: KNeighbors-5 : Acc.(mean,std) = (96.49,1.99)%; E-time= 0.00176
7 iris.csv: SVC-Linear : Acc.(mean,std) = (97.60,2.26)%; E-time= 0.00085
8 iris.csv: SVC-RBF : Acc.(mean,std) = (96.62,2.10)%; E-time= 0.00101
9 iris.csv: Random-Forest: Acc.(mean,std) = (94.84,3.16)%; E-time= 0.03647
10 iris.csv: MLPClassifier: Acc.(mean,std) = (98.58,1.32)%; E-time= 0.20549
11 iris.csv: AdaBoost : Acc.(mean,std) = (94.40,2.64)%; E-time= 0.04119
12 iris.csv: Naive-Bayes : Acc.(mean,std) = (95.11,3.20)%; E-time= 0.00090
13 iris.csv: QDA : Acc.(mean,std) = (97.64,2.06)%; E-time= 0.00085
14 iris.csv: Gaussian-Proc: Acc.(mean,std) = (95.64,2.63)%; E-time= 0.16151
15 --------------------------------------------------------------
16 sklearn classifiers Acc: (mean,max) = (96.31,98.58)%; Best = MLPClassifier
17 ====== Ensembling: SKlearn Classifiers =======================
18 EnCLF = ['KNeighbors-5', 'SVC-Linear', 'SVC-RBF', 'MLPClassifier', 'QDA']
19 iris.csv: Ensemble CLFs: Acc.(mean,std) = (97.60,1.98)%; E-time= 0.22272
Ensembling:
You may stack the best classifier together with comparable ones from the other options.
• For a given training dataset, Adaline converges to a unique set of weights, while the Perceptron does not.
• Note that the correction terms are accumulated from all data points in each iteration. As a consequence, the learning rate η may be chosen smaller as the number of points increases.
Implementation: In order to overcome the problem, you may scale the correction terms by the number of data points.
– Redefine the cost function (10.9):
    J(w, b) = (1/2N) Σ_{i=1}^{N} (y^{(i)} − φ(z^{(i)}))².    (10.36)
Principal Component Analysis
Contents of Chapter 11
11.1.Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
11.2.Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
11.3.Applications of the SVD to LS Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Exercises for Chapter 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
303
304 Chapter 11. Principal Component Analysis
3 # Generate data
4 def generate_data(n):
5 # Normally distributed around the origin
6 x = np.random.normal(0,1, n)
7 y = np.random.normal(0,1, n)
8 S = np.vstack((x, y)).T
9 # Transform
10 sx, sy = 1, 3;
11 Scale = np.array([[sx, 0], [0, sy]])
12 theta = 0.25*np.pi; c,s = np.cos(theta), np.sin(theta)
13 Rot = np.array([[c, -s], [s, c]]).T #T, due to right multiplication
14
17 # Covariance
18 def cov(x, y):
19 xbar, ybar = x.mean(), y.mean()
20 return np.sum((x - xbar)*(y - ybar))/len(x)
21
22 # Covariance matrix
23 def cov_matrix(X):
24 return np.array([[cov(X[:,0], X[:,0]), cov(X[:,0], X[:,1])], \
25 [cov(X[:,1], X[:,0]), cov(X[:,1], X[:,1])]])
Covariance.py
1 import numpy as np
2 import matplotlib.pyplot as plt
3 from util_Covariance import *
4
5 # Generate data
6 n = 200
7 X = generate_data(n)
8 print('Generated data: X.shape =', X.shape)
9
10 # Covariance matrix
11 C = cov_matrix(X)
12 print('C:\n',C)
13
14 # Principal directions
15 eVal, eVec = np.linalg.eig(C)
16 xbar,ybar = np.mean(X,0)
17 print('eVal:\n',eVal); print('eVec:\n',eVec)
18 print('np.mean(X, 0) =',xbar,ybar)
19
20 # Plotting
21 plt.style.use('ggplot')
22 plt.scatter(X[:, 0],X[:, 1],c='#00a0c0',s=10)
23 plt.axis('equal');
24 plt.title('Generated Data')
25 plt.savefig('py-data-generated.png')
26
Output
1 Generated data: X.shape = (200, 2)
2 C:
3 [[ 5.10038723 -4.15289232]
4 [-4.15289232 4.986776 ]]
5 eVal:
6 [9.19686242 0.89030081]
7 eVec:
8 [[ 0.71192601 0.70225448]
9 [-0.70225448 0.71192601]]
10 np.mean(X, 0) = 4.986291809096116 2.1696690114181947
C = U DU −1 , (11.5)
Z = X W, (11.6)
and then ② finding the weight vector which extracts the maximum variance from this new data matrix:
    w_k = arg max_{‖w‖=1} ‖X̂_k w‖²,    (11.10)
where
U : n × d orthogonal (the left singular vectors of X.)
Σ : d × d diagonal (the singular values of X.)
V : d × d orthogonal (the right singular vectors of X.)
where σ1 ≥ σ2 ≥ · · · ≥ σd ≥ 0.
• In terms of this factorization, the matrix X T X reads
X T X = (U ΣV T )T U ΣV T = V ΣU T U ΣV T = V Σ2 V T . (11.13)
X = U ΣV T . (11.14)
2. Set
W = V. (11.15)
Then the score matrix, the set of principal components, is
    Z = XW = XV = U Σ VᵀV = U Σ = [σ_1 u_1 | σ_2 u_2 | ⋯ | σ_d u_d].    (11.16)
kX − Xk k2 = kU ΣV T − U Σk V T k2
= kU (Σ − Σk )V T k2 (11.20)
= kΣ − Σk k2 = σk+1 ,
Image Compression
• Dyadic Decomposition: The data matrix X ∈ Rm×n is expressed as
a sum of rank-1 matrices:
    X = U Σ Vᵀ = Σ_{i=1}^{n} σ_i u_i v_iᵀ,    (11.21)
where
V = [v1 , · · · , vn ], U = [u1 , · · · , un ].
(Figure: rank-k approximations with k = 20, k = 50, and k = 100.)
σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.
where
U : m × n orthogonal (the left singular vectors of A.)
Σ : n × n diagonal (the singular values of A.)
V : n × n orthogonal (the right singular vectors of A.)
Proof. (of Theorem 11.14) Use induction on m and n: we assume that the SVD exists for (m − 1) × (n − 1) matrices, and prove it for m × n. We assume A ≠ 0; otherwise we can take Σ = 0 and let U and V be arbitrary orthogonal matrices.
• Let u = Av/‖Av‖_2, which is a unit vector. Choose Ũ, Ṽ such that U = [u Ũ] and V = [v Ṽ] are orthogonal.
• Now, we write
    UᵀAV = [u Ũ]ᵀ · A · [v Ṽ] = [ uᵀAv   uᵀAṼ ;  ŨᵀAv   ŨᵀAṼ ].
Since
    uᵀAv = (Av)ᵀ(Av)/‖Av‖_2 = ‖Av‖_2²/‖Av‖_2 = ‖Av‖_2 = ‖A‖_2 ≡ σ,
    ŨᵀAv = Ũᵀu ‖Av‖_2 = 0,
we have
    UᵀAV = [ σ  0 ;  0  U_1 Σ_1 V_1ᵀ ] = [ 1 0 ; 0 U_1 ] [ σ 0 ; 0 Σ_1 ] [ 1 0 ; 0 V_1 ]ᵀ,
or equivalently
    A = ( U [ 1 0 ; 0 U_1 ] ) [ σ 0 ; 0 Σ_1 ] ( V [ 1 0 ; 0 V_1 ] )ᵀ.    (11.27)
U = [u1 u2 · · · un ],
Σ = diag(σ1 , σ2 , · · · , σn ),
V = [v1 v2 · · · vn ],
A = U Σ V T ⇐⇒ AV = U ΣV T V = U Σ,
we have
    AV = A[v_1 v_2 ⋯ v_n] = [Av_1 Av_2 ⋯ Av_n]
       = [u_1 ⋯ u_r ⋯ u_n] · diag(σ_1, ⋯, σ_r, 0, ⋯, 0)    (11.28)
       = [σ_1 u_1 ⋯ σ_r u_r 0 ⋯ 0].
Therefore,
    A = U Σ Vᵀ  ⇔  Av_j = σ_j u_j (j = 1, 2, ⋯, r)  and  Av_j = 0 (j = r + 1, ⋯, n).    (11.29)
• Equation (11.31) gives how to find the singular values {σj } and the
right singular vectors V , while (11.29) shows a way to compute the
left singular vectors U .
• (Dyadic decomposition) The matrix A ∈ Rm×n can be expressed as
    A = Σ_{j=1}^{n} σ_j u_j v_jᵀ.    (11.33)
This property has been utilized for various approximations and ap-
plications, e.g., by dropping singular vectors corresponding to small
singular values.
AT A = V ΛV T ,
Lemma 11.17. Let A ∈ Rn×n be symmetric. Then (a) all the eigenvalues
of A are real and (b) eigenvectors corresponding to distinct eigenvalues
are orthogonal.
Example 11.18. Find the SVD for A = [1 2; −2 1; 3 2].
Solution.
1. AᵀA = [14 6; 6 9].
2. Its eigenvalues and corresponding unit eigenvectors:
    λ_1 = 18 and λ_2 = 5,   v_1 = (1/√13)[3; 2],   v_2 = (1/√13)[−2; 3].
3. σ_1 = √λ_1 = √18 = 3√2, σ_2 = √λ_2 = √5. So
    Σ = [√18 0; 0 √5].
4. u_1 = (1/σ_1) A v_1 = (1/√18)·(1/√13)·A[3; 2] = (1/√234)[7; −4; 13],
   u_2 = (1/σ_2) A v_2 = (1/√5)·(1/√13)·A[−2; 3] = (1/√65)[4; 7; 0].
5. A = U Σ Vᵀ = [7/√234 4/√65; −4/√234 7/√65; 13/√234 0] · [√18 0; 0 √5] · [3/√13 2/√13; −2/√13 3/√13].
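A numerical check of this example with NumPy (added; not part of the original solution):

import numpy as np

A = np.array([[1., 2.], [-2., 1.], [3., 2.]])
U, s, VT = np.linalg.svd(A, full_matrices=False)
print(s)                                     # approx [4.2426, 2.2361] = [3*sqrt(2), sqrt(5)]
print(np.allclose(U @ np.diag(s) @ VT, A))   # True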
where x̂ is called a least-squares solution of Ax = b.
    x̂ = (AᵀA)⁻¹ Aᵀ b.    (11.36)
A = U ΣV T .
    x̂ = V z.    (11.41)
    z = Σ_k⁺ c = Σ_k⁺ Uᵀ b,    (11.44)
where
    Σ_k⁺ = [1/σ_1, 1/σ_2, ⋯, 1/σ_k, 0, ⋯, 0]ᵀ.    (11.45)
Thus the corresponding LS solution reads
    x̂ = V z = V Σ_k⁺ Uᵀ b.    (11.46)
Note that x̂ involves no components of the null space of A; x̂ is unique in this sense.
Remark 11.21.
• When rank(A) = k = n: It is easy to see that
    V Σ_k⁺ Uᵀ = V Σ⁻¹ Uᵀ,    (11.47)
    A_k⁺ := V Σ_k⁺ Uᵀ    (11.48)
plays the role of the pseudoinverse of A. Thus we will call it the k-th pseudoinverse of A.
Note: For some LS applications, although rank(A) = n, the k-th pseudoinverse A_k⁺, with a small k < n, may give more reliable solutions.
4 %% Standardization
5 %%---------------------------------------------
6 S_mean = mean(A); S_std = std(A);
7 if S_std(1)==0, S_std(1)=1/S_mean(1); S_mean(1)=0; end
8 AS = (A-S_mean)./S_std;
9
18 sol_PCA = V*C*U'*b;
19 end
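Lines 9–17 of pca_regression.m are not reproduced above. The following Python sketch shows the idea they presumably implement, namely an SVD of the standardized matrix followed by a rank-npc pseudoinverse; the details are assumptions, not the author's exact code.

import numpy as np

def pca_regression(A, b, npc):
    # Standardization, mirroring lines 6-8 of the Matlab code
    S_mean, S_std = A.mean(axis=0), A.std(axis=0)    # note: np.std uses ddof=0
    if S_std[0] == 0:                                # intercept column of ones
        S_std[0] = 1.0/S_mean[0]; S_mean[0] = 0.0
    AS = (A - S_mean)/S_std
    # SVD and truncated pseudoinverse (assumed content of the omitted lines)
    U, s, VT = np.linalg.svd(AS, full_matrices=False)
    c = np.zeros_like(s); c[:npc] = 1.0/s[:npc]
    sol_PCA = VT.T @ np.diag(c) @ U.T @ b            # V*C*U'*b, cf. line 18
    return sol_PCA, S_mean, S_std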
Regression_Analysis.m
1 clear all; close all;
2
3 %%-----------------------------------------------------
4 %% Setting
5 %%-----------------------------------------------------
6 regen_data = 0; %==1, regenerate the synthetic data
7 poly_n = 9;
8 npt=300; bx=5.0; sigma=0.50; %for synthetic data
9 datafile = 'synthetic-data.txt';
10
11 %%-----------------------------------------------------
12 %% Data: Generation and Read
13 %%-----------------------------------------------------
14 if regen_data || ~isfile(datafile)
15 DATA = util.get_data(npt,bx,sigma);
16 writematrix(DATA, datafile);
17 fprintf('%s: re-generated.\n',datafile)
18 end
19 DATA = readmatrix(datafile,"Delimiter",",");
20
21 %%-----------------------------------------------------
22 %% The system: A x = b
23 %%-----------------------------------------------------
24 A = util.get_A(DATA(:,1),poly_n+1);
25 b = DATA(:,2);
26
27 %%-----------------------------------------------------
28 %% Method of Noral Equations
29 %%-----------------------------------------------------
30 sol_NE = (A'*A)\(A'*b);
31 figure,
32 plot(DATA(:,1),DATA(:,2),'k.','MarkerSize',8);
33 axis tight; hold on
34 yticks(1:5); ax = gca; ax.FontSize=13; %ax.GridAlpha=0.25
35 title(sprintf('Synthetic Data: npt = %d',npt),'fontsize',13)
36 util.mysave(gcf,'data-synthetic.png');
37 x=linspace(min(DATA(:,1)),max(DATA(:,1)),51);
38 plot(x,util.predict_Y(x,sol_NE),'r-','linewidth',2);
39 Pn = ['P_',int2str(poly_n)];
40 legend('data',Pn, 'location','best','fontsize',13)
41 TITLE0=sprintf('Method of NE: npt = %d',npt);
42 title(TITLE0,'fontsize',13)
43 hold off
44 util.mysave(gcf,'data-synthetic-sol-NE.png');
45
46 %%-----------------------------------------------------
47 %% PCA Regression
48 %%-----------------------------------------------------
49 for npc=1:size(A,2);
50 [sol_PCA,S_mean,S_std] = pca_regression(A,b,npc);
51 figure,
52 plot(DATA(:,1),DATA(:,2),'k.','MarkerSize',8);
53 axis tight; hold on
54 yticks(1:5); ax = gca; ax.FontSize=13; %ax.GridAlpha=0.25
55 x=linspace(min(DATA(:,1)),max(DATA(:,1)),51);
56 plot(x,util.predict_Y(x,sol_PCA,S_mean,S_std),'r-','linewidth',2);
57 Pn = ['P_',int2str(poly_n)];
58 legend('data',Pn, 'location','best','fontsize',13)
59 TITLE0=sprintf('Method of PC: npc = %d',npc);
60 title(TITLE0,'fontsize',13)
61 hold off
62 savefile = sprintf('data-sol-PCA-npc-%02d.png',npc);
63 util.mysave(gcf,savefile);
64 end
Figure 11.3: The synthetic data and the LS solution P_9(x), overfitted.
Figure 11.4: PCA regression of the data, with various numbers of principal components.
The best regression is achieved when npc = 3.
(a) Add lines to the code given, to verify (11.20), p.312. For example, set k = 5.
Wine_data.py
1 import numpy as np
2 from numpy import diag,dot
3 from scipy.linalg import svd,norm
4 import matplotlib.pyplot as plt
5
9 #-----------------------------------------------
10 # Standardization
11 #-----------------------------------------------
12 X_mean, X_std = np.mean(X,axis=0), np.std(X,axis=0)
13 XS = (X - X_mean)/X_std
14
15 #-----------------------------------------------
16 # SVD
17 #-----------------------------------------------
18 U, s, VT = svd(XS)
19 if U.shape[0]==U.shape[1]:
20 U = U[:,:len(s)] # cut the unnecessary part
21 Sigma = diag(s) # transform to a matrix
22 print('U:',U.shape, 'Sigma:',Sigma.shape, 'VT:',VT.shape)
Note:
• Line 12: np.mean and np.std are applied, with the option axis=0, to get
the quantities column-by-column vertically. Thus X_mean and X_std are row
vectors.
• Line 18: In Python, svd produces [U, s, VT], where VT = V T . If you would
like to get V , then V = VT.T.
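For instance, one may append lines such as the following (a hedged continuation that reuses U, Sigma, VT, s, and XS from the listing above) to recover V and confirm the factorization:

V = VT.T                                  # recover V from VT, as noted above
print(np.allclose(U @ Sigma @ VT, XS))    # True: the SVD reproduces XS exactly
var_ratio = s**2 / np.sum(s**2)           # variance carried by each principal direction
print(var_ratio)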
Clue: The major reason that a class is used in the Matlab code in Example 11.22 is
to combine multiple functions into a single file. In Python, you do not have to use
a class to save multiple functions in a file. You may start with the following.
util.py
1 import numpy as np
2 import matplotlib.pyplot as plt
3
4 def get_data(npt,bx,sigma):
5 data = np.zeros([npt,2]);
6 data[:,0] = np.random.uniform(0,1,npt)*bx;
7 data[:,1] = np.maximum(bx/3,2*data[:,0]-bx);
8 r = np.random.normal(0,1,npt)*sigma;
9 theta = np.random.normal(0,1,npt)*np.pi;
10 noise = np.column_stack((r*np.cos(theta),r*np.sin(theta)));
11 data += noise;
12 return data
13
14 def mysave(filename):
15 plt.savefig(filename,bbox_inches='tight')
16 print('saved:',filename)
17
Regression_Analysis.py
1 import numpy as np
2 import numpy.linalg as la
3 import matplotlib.pyplot as plt
4 from os.path import exists
5 import util
6
7 ##-----------------------------------------------------
8 ## Setting
9 ##-----------------------------------------------------
10 regen_data = 1; #==1, regenerate the synthetic data
11 poly_n = 9;
12 npt=300; bx=5.0; sigma=0.50; #for synthetic data
13 datafile = 'synthetic-data.txt';
14 plt.style.use('ggplot')
15
16 ##-----------------------------------------------------
17 ## Data: Generation and Read
18 ##-----------------------------------------------------
19 if regen_data or not exists(datafile):
20 DATA = util.get_data(npt,bx,sigma);
21 np.savetxt(datafile,DATA,delimiter=',');
32 ##-----------------------------------------------------
33 ## The system: A x = b
34 ##-----------------------------------------------------
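The remainder of the listing is not reproduced here. A minimal sketch of how it might continue (reusing DATA, poly_n, npt, la, plt, and util from the lines above, with np.vander and np.polyval as stand-ins for the get_A and predict_Y helpers of the Matlab version):

A = np.vander(DATA[:, 0], poly_n + 1, increasing=True)   # columns 1, x, ..., x^n
b = DATA[:, 1]

## Method of Normal Equations
sol_NE = la.solve(A.T @ A, A.T @ b)

x = np.linspace(DATA[:, 0].min(), DATA[:, 0].max(), 51)
plt.plot(DATA[:, 0], DATA[:, 1], 'k.', markersize=4)
plt.plot(x, np.polyval(sol_NE[::-1], x), 'r-', lw=2, label='P_%d' % poly_n)
plt.legend(); plt.title('Method of NE: npt = %d' % npt)
util.mysave('data-synthetic-sol-NE.png')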
Note: The semicolons (;) are neither necessary nor harmful in Python; they are left
over from copy-and-pasting the Matlab lines. The ggplot style emulates “ggplot”,
a popular plotting package for R. When Regression_Analysis.py is executed, you
will have a saved image:
Appendices
Contents of Chapter A
A.1. Optimization: Primal and Dual Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 334
A.2. Weak Duality, Strong Duality, and Complementary Slackness . . . . . . . . . . . . . . 338
A.3. Geometric Interpretation of Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
A.4. Rank-One Matrices and Structure Tensors . . . . . . . . . . . . . . . . . . . . . . . . . 349
A.5. Boundary-Effects in Convolution Functions in Matlab and Python SciPy . . . . . . . . 353
A.6. From Python, Call C, C++, and Fortran . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
where
L(x, α, β) = f (x) + α · h(x) + β · q(x).
Here the minimum does not require x in the feasible set C.
Remark A.9. It is clear from the definition that the optimal value of the dual problem, denoted g*, satisfies
    min_x  f(x)
    subj.to  h_i(x) ≤ 0,  i = 1, · · · , m        (Primal)  (A.2.1)
             q_j(x) = 0,  j = 1, · · · , p
Theorem A.12. The dual problem yields a lower bound for the primal problem. That is, the minimax f* is greater than or equal to the maximin g*:
Notice that the left side depends on x, while the right side is a function of (α, β). The inequality holds true for all x, α ≥ 0, β.
⇒ We may take min_x on the left side and max_{α≥0, β} on the right side to conclude (A.2.6).
Definition A.15. Given primal feasible x and dual feasible (α, β), the quantity
    f(x) − g(α, β) = f(x) − min_x L(x, α, β)        (A.2.9)
is called the duality gap.
    f(x) − g(α, β) ≥ f* − g* ≥ 0
Proposition A.16. With x, (α, β), the duality gap equals 0 if and only if
(a) x is the primal optimal solution,
(b) (α, β) is the dual optimal solution, and
(c) the strong duality holds.
The duality gap equals 0 if and only if the three inequalities become equalities.
Assume that strong duality holds, x* is the primal optimal, and (α*, β*) is the dual optimal. Then
    f(x*) = g(α*, β*) = min_x L(x, α*, β*)        (by the definition of g)
          = min_x { f(x) + Σ_{i=1}^m α_i* h_i(x) + Σ_{j=1}^p β_j* q_j(x) }
          ≤ f(x*) + Σ_{i=1}^m α_i* h_i(x*) + Σ_{j=1}^p β_j* q_j(x*)        (A.2.10)
          ≤ f(x*),
Theorem A.20. The strong duality holds, iff there exists a nonvertical
supporting hyperplane of A passing through (0, 0, f ∗ ).
    min_x  x² + 1
    subj.to  x ≥ 1.        (A.3.9)
    L(x, α) = x² + 1 + α(−x + 1) = x² − αx + α + 1
            = (x − α/2)² − α²/4 + α + 1,        (A.3.10)
and therefore the dual function reads (when x = α/2)
    g(α) = min_x L(x, α) = −α²/4 + α + 1.        (A.3.11)
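A quick numeric check of this dual function (illustrative only, not part of the text) confirms that its maximum over α ≥ 0 equals the primal optimum:

import numpy as np

f = lambda x: x**2 + 1                   # primal objective
g = lambda a: -a**2/4 + a + 1            # dual function from (A.3.11)

alpha = np.linspace(0, 4, 401)
i = np.argmax(g(alpha))
print(alpha[i], g(alpha)[i])             # ~2.0 and 2.0, so g* = 2
print(f(1.0))                            # f* = 2, attained at x = 1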
    −x + 1 = r,   x² + 1 = t        (A.3.14)
See Figure A.1, where the shaded region is the epigraph of the problem.
Figure A.1: The epigraph of the convex problem (A.3.9), the shaded region, and strong
duality.
−x + 1 ≤ 0 ⇒ r = −x + 1 ≤ 0.
Thus the left side of the t-axis in A corresponds to the feasible set; it
follows from (A.3.15) that
    t = −αr − α²/4 + α + 1,        (A.3.18)
which is a line in the (r, t)-coordinates for a fixed α. Figure A.1 depicts
two of the lines: α = 0 and α = 2.
f ∗ = g ∗ = 2; (A.3.20)
duality_convex.py
1 import numpy as np
2 from matplotlib import pyplot as plt
3
13 plt.fill_between(r,t,maxt,color='cyan',alpha=0.25)
14 plt.plot(r,t,color='cyan')
15 plt.xlabel(r'$r$',fontsize=15); plt.ylabel(r'$t$',fontsize=15)
16 plt.text(-1,12,r'$\cal A$',fontsize=16)
17 plt.plot([0,0],[mint-3,maxt],color='black',ls='-') # t-axis
18 plt.yticks(np.arange(-2,maxt,2)); plt.tight_layout()
19
34 plt.savefig('png-duality-example.png',bbox_inches='tight')
35 plt.show()
Figure A.2: The nonconvex problem: (left) The graph of y = f (x) and (right) the epigraph
and weak duality.
Structure Tensor
Definition A.28. The structure tensor is a matrix derived from the gradient of a function f(x), x ∈ R^n; it is defined as S = (∇f)(∇f)^T. In two dimensions its eigenpairs are
    λ1 = ||∇f||²_2, v1 = ∇f;    λ2 = 0, v2 = [−f_y, f_x]^T.        (A.4.6)
Proof.
(a) The matrix S is clearly symmetric. Let v ∈ R². Then
    v^T S v = v^T (∇f)(∇f)^T v = |(∇f)^T v|² ≥ 0,        (A.4.7)
which proves that S is positive semidefinite.
(b) It follows from (A.4.2). Since S is symmetric, ||S||_2 must be the maximum eigenvalue of S. (See Theorem 5.44 (f).)
(c) det S = f_x² f_y² − (f_x f_y)² = 0 ⇒ S is not invertible ⇒ an eigenvalue of S must be 0.
(d) It is not difficult to check that S v_i = λ_i v_i, i = 1, 2.
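A small NumPy check of these eigenpairs at a single sample gradient (the numbers are illustrative) may look as follows:

import numpy as np

fx, fy = 3.0, -1.0                   # a sample gradient (f_x, f_y)
grad = np.array([fx, fy])
S = np.outer(grad, grad)             # S = (grad f)(grad f)^T

lam, _ = np.linalg.eigh(S)
print(lam)                           # [0, 10] = [0, ||grad f||^2]
print(S @ np.array([-fy, fx]))       # [0, 0]: [-f_y, f_x]^T is the null eigenvector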
sobel_derivatives.m
1 function [ux,uy] = sobel_derivatives(u)
2 % Usage: [ux,uy] = sobel_derivatives(u);
3 % It produces Sobel derivatives, using conv2(u,C,'valid').
4
matlab_conv2_boundary.m
1 %%--- Initial setting --------------------
2 n = 21; h = 1/(n-1);
3 x = linspace(0,1,n); [X,Y] = meshgrid(x);
4
scipy_convolve_boundary.py
1 import numpy as np; import scipy
2 import matplotlib.pyplot as plt
3 from matplotlib import cm; from math import pi
4
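For reference, a minimal sketch of the SciPy side (assuming scipy.signal.convolve2d with a toy array and stencil; the original listings are only excerpted above) shows how the boundary option changes only the values near the border:

import numpy as np
from scipy.signal import convolve2d

U = np.arange(16, dtype=float).reshape(4, 4)
K = np.array([[0.,  1., 0.],
              [1., -4., 1.],
              [0.,  1., 0.]])            # 5-point Laplacian stencil

for bc in ('fill', 'symm', 'wrap'):      # zero-padding, mirroring, periodic
    V = convolve2d(U, K, mode='same', boundary=bc)
    print(bc, V[0, 0], V[1, 1])          # interior values agree; corners differ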
Python
• Advantages
– Easy to learn and use
– Flexible and reliable
– Extensively used in Data Science
– Handy for Web Development purposes
– Vast library support
– Among the fastest-growing programming languages
in the tech industry, machine learning, and AI
• Disadvantage
– Slow!!
Python Extension
f90
test_f90.f90
1 subroutine test_f90_v(x,y,m,dotp)
2 implicit none
3 real(kind=8), intent(in) :: x(:), y(:)
4 real(kind=8), intent(out) :: dotp
5 integer :: m,j
6
7 do j=1,m
8 dotp = dot_product(x,y)
9 enddo
10 end
11
12 subroutine test_f90_s(x,y,m,dotp)
13 implicit none
14 real(kind=8), intent(in) :: x(:), y(:)
15 real(kind=8), intent(out) :: dotp
16 integer :: n,m,i,j
17
18 n =size(x)
19 do j=1,m
20 dotp=0
21 do i=1,n
22 dotp = dotp+x(i)*y(i)
23 enddo
24 enddo
25 end
C++
The <numeric> header is included for the inner_product used in the vector operations.
test_gpp.cpp
1 #include <iostream>
2 #include <vector>
3 #include <numeric>
4 using namespace std;
5 typedef double VTYPE;
6
13 for(j=0;j<m;j++){
14 dotp = inner_product(x, x+n, y, 0.0);
15 }
16 return dotp;
17 }
18
25 for(j=0;j<m;j++){dotp=0.;
26 for(i=0;i<n;i++){
27 dotp += x[i]*y[i];
28 }
29 }
30 return dotp;
31 }
Python
test_py3.py
1 import numpy as np
2
3 def test_py3_v(x,y,m):
4 for j in range(m):
5 dotp = np.dot(x,y)
6 return dotp
7
8 def test_py3_s(x,y,m):
9 n = len(x)
10 for j in range(m):
11 dotp = 0;
12 for i in range(n):
13 dotp +=x[i]*y[i]
14 return dotp
Compiling
Modules in f90, C, and C++ are compiled by executing the shell script.
Compile-f90-c-cpp
1 #!/usr/bin/bash
2
3 LIB_F90='lib_f90'
4 LIB_GCC='lib_gcc'
5 LIB_GPP='lib_gpp'
6
Python Wrap-up
An executable Python wrap-up is implemented as follows.
Python_calls_F90_GCC.py
1 #!/usr/bin/python3
2
3 import numpy as np
4 import ctypes, time
5 from test_py3 import *
6 from lib_f90 import *
7 lib_gcc = ctypes.CDLL("./lib_gcc.so")
8 lib_gpp = ctypes.CDLL("./lib_gpp.so")
9
10 n=100; m=1000000
11 #n=1000; m=1000000
12
13 x = np.arange(0.,n,1); y = x+0.1;
14
15 print('--------------------------------------------------')
16 print('Speed test: (dot-product: n=%d), m=%d times' %(n,m))
17 print('--------------------------------------------------')
18
33 ### C #####################################
34 lib_gcc.test_gcc_s.argtypes = [np.ctypeslib.ndpointer(dtype=np.double),
35 np.ctypeslib.ndpointer(dtype=np.double),
36 ctypes.c_int,ctypes.c_int] #input type
37 lib_gcc.test_gcc_s.restype = ctypes.c_double #output type
38
51 lib_gpp.test_gpp_s.argtypes = [np.ctypeslib.ndpointer(dtype=np.double),
52 np.ctypeslib.ndpointer(dtype=np.double),
53 ctypes.c_int,ctypes.c_int] #input type
54 lib_gpp.test_gpp_s.restype = ctypes.c_double #output type
55
Performance Comparison
A Linux OS is used with an Intel Core i7-10750H CPU @ 2.60GHz.
n=100, m=1000000 ⇒ 200M flops
1 --------------------------------------------------
2 Speed test: (dot-product: n=100), m=1000000 times
3 --------------------------------------------------
4 test_py3_v: e-time = 0.7672; result = 328845.00
5 test_py3_s: e-time = 18.2175; result = 328845.00
6
Projects
Contents of Chapter P
P.1. Project: Canny Edge Detection Algorithm for Color Images . . . . . . . . . . . . . . . . 366
P.2. Project: Text Extraction from Images, PDF Files, and Speech Data . . . . . . . . . . . 380
• In practice, u : Ω ⊂ N² → N³_[0,255], due to sampling & quantization.
• u = u(m, n, d), for d = 1 or 3.
• rgb2gray formula: 0.299 ∗ R + 0.587 ∗ G + 0.114 ∗ B (see the short sketch below).
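A minimal NumPy sketch of this formula (the array is illustrative) is:

import numpy as np

def rgb2gray(u):
    """Convert an (m, n, 3) RGB image to an (m, n) grayscale image."""
    return 0.299*u[..., 0] + 0.587*u[..., 1] + 0.114*u[..., 2]

u = np.random.randint(0, 256, (8, 8, 3)).astype(float)   # a toy RGB image
print(rgb2gray(u).shape)                                  # (8, 8)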
24 %----------------------------------------------
25 %--- New Trial: Color Edge Detection ----------
26 %----------------------------------------------
27 ES = color_edge(v0,Name);
Figure P.1: Edge detection, using the built-in function “edge”, which is not perfect but
somewhat acceptable.
Figure P.2: Canny edge detection for color images: A synthetic image produced by
transform_to_gray.m, its grayscale, and the result of the built-in function “edge”.
8 [m,n,d] = size(v0);
9 vs = zeros(m,n,d); grad = zeros(m,n,d); theta = zeros(m,n,d);
10 TH = zeros(m,n); ES = zeros(m,n);
11
Numerical Discretization
For the time-stepping procedure, we simply employ the explicit method,
the forward Euler method:
    (u^{n+1} − u^n)/Δt − ∇·( ∇u^n / |∇u^n| ) = β(v0 − u^n),   u^0 = v0,        (P.1.2)
which equivalently reads
    u^{n+1} = u^n + Δt [ β(v0 − u^n) − A u^n ],   u^0 = v0,        (P.1.3)
where
    A u^n ≈ −∇·( ∇u^n / |∇u^n| ) = −( u_x^n / |∇u^n| )_x − ( u_y^n / |∇u^n| )_y.
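A simplified NumPy stand-in for one explicit step (P.1.3) might read as follows (dt, beta, and the gradient regularization eps are illustrative choices, not taken from tv_denoising.m):

import numpy as np

def tv_step(u, v0, dt=0.1, beta=1.0, eps=1e-6):
    """One explicit step of u_t = div(grad u / |grad u|) + beta*(v0 - u)."""
    ux, uy = np.gradient(u)
    mag = np.sqrt(ux**2 + uy**2) + eps              # regularized |grad u|
    div = np.gradient(ux/mag, axis=0) + np.gradient(uy/mag, axis=1)
    return u + dt*(div + beta*(v0 - u))             # note: -A u^n = div(grad u / |grad u|)

v0 = np.random.rand(64, 64)                         # a toy noisy image
u = v0.copy()
for n in range(50):
    u = tv_step(u, v0)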
test_denoising.m
1 close all; clear all
2
13 ug = zeros(m,n,d); ut = zeros(m,n,d);
14 for k=1:d
15 ut(:,:,k) = tv_denoising(v0(:,:,k));
16 ug(:,:,k) = imgaussfilt(v0(:,:,k),2); % sigma=2
17 end
18
19 imwrite(ut,'Lena256_test-TV_denoised.png')
20 imwrite(ug,'Lena256_test-Gaussian-filter.png')
Figure P.3: Step 1: Image denoising or image blur. The original Lena (left), the TV-
denoised image (middle), and the Gaussian-filtered image (right).
That is,
ux = conv(u, Kx ), uy = conv(u, Ky ). (P.1.5)
• Then the magnitude G and the slope θ of the gradient are calculated as follows:
    G = √(u_x² + u_y²),   θ = arctan(u_y / u_x) = atan2(u_y, u_x).        (P.1.6)
See sobel_grad.m on the following page.
sobel_grad.m
1 function [grad,theta] = sobel_grad(u)
2 % [grad,theta] = sobel_grad(u)
3 % It computes the Sobel gradient magnitude, |grad(u)|,
4 % and edge normal angle, theta.
5
6 [m,n,d]=size(u);
7 grad = zeros(m,n); theta = zeros(m,n);
8
9 %%--------------------------------------------------
10 for q=1:n
11 qm=max(q-1,1); qp=min(q+1,n);
12 for p=1:m
13 pm=max(p-1,1); pp=min(p+1,m);
14 ux = u(pp,qm)-u(pm,qm) +2.*(u(pp,q)-u(pm,q)) +u(pp,qp)-u(pm,qp);
15 uy = u(pm,qp)-u(pm,qm) +2.*(u(p,qp)-u(p,qm)) +u(pp,qp)-u(pp,qm);
16 grad(p,q) = sqrt(ux^2 + uy^2);
17 theta(p,q) = atan2(uy,ux);
18 end
19 end
Figure P.5: Step 2: The maximum of (R,G,B) gradients, for the Lena image.
Figure P.6: The color checkerboard image in Figure P.2 (left) and the maximum of its
(R,G,B)-gradients (right).
4 [m,n] = size(E0);
5 Z = zeros(m,n);
6
7 TH(TH<0) = TH(TH<0)+pi;
8 R = mod(floor((TH+pi/8)/(pi/4)),4); % region=0,1,2,3
9
Figure P.7: Step 3: Non-maximum suppression. The Sobel gradient in Figure P.5 (left) and
the non-maximum suppressed (right).
double_threshold.m
1 function [strong,weak] = double_threshold(E1,highRatio,lowRatio)
2
3 highThreshold = highRatio*max(E1(:));
4 lowThreshold = lowRatio *highThreshold;
5
6 strong = (E1>=highThreshold);
7 weak = (E1<highThreshold).*(E1>=lowThreshold);
Figure P.8: Step 4: Double threshold. The strong pixels (left), the weak pixels (middle),
and the combined (right).
6. Extra Credit:
(a) Analysis: For noise reduction, you can employ either the TV-denoising model or the built-in function imgaussfilt(Image, σ) with various choices of σ. Analyze the effects of different choices of parameters and functions on edge detection.
(b) New Idea for the Gradient Intensity. We chose the maximum
of (R,G,B)-gradients; see Line 21 of color_edge.m. Do you have
any idea better than that?
There are some powerful text-extraction software packages (with 98+% accuracy).
• However, most of them are not freely/conveniently available.
• We will develop two text extraction algorithms, one for image data and
the other for speech data.
An Example
pdfim2text
1 #!/usr/bin/python
2
3 import pytesseract
4 from pdf2image import convert_from_path
5 from PIL import Image
6 from gtts import gTTS
7 from playsound import playsound
8 import os, pathlib, glob
9 from termcolor import colored
10
11 def takeInput():
12 pmode = 0;
13 IN = input("Enter a pdf or an image: ")
14 if os.path.isfile(IN):
15 path_stem = pathlib.Path(IN).stem
16 path_ext = pathlib.Path(IN).suffix
17 if path_ext.lower() == '.pdf': pmode=1
18 else:
19 exit()
20 return IN, path_stem, pmode
21
22 def pdf2txt(IN):
23 # you have to complete the function appropriately
24 return 'Aha, it is a pdf file.\
25 For pdf2txt, you may save the text here without return.'
26
27 def im2txt(IN):
28 # you have to complete the function appropriately
29 return 'Now, it is an image.\
30 For im2txt, try to return the text to play'
31
32 if __name__ == '__main__':
33 IN, path_stem, pmode = takeInput() #pmode=0:image; pmode=1:pdf
34 if pmode:
35 txt = pdf2txt(IN)
36 else:
37 txt = im2txt(IN)
38
What to Do
First download https://skim.math.msstate.edu/LectureNotes/data/Image-
Speech-Text-Processing.PY.tar. Untar it to see the file pdfim2text and
example codes in a subdirectory example-code.
1. Complete pdfim2text appropriately.
• You may find clues from example-code/pdf2txt.py
2. Implement speech2text from scratch.
• You may get hints from speech_mic2wave.py and image2text.py
in the directory example-code.
Try to put all functions into a single file for each command, which en-
hances portability of the commands.
Report
• Work in a directory, of which the name begins with your last name.
• Use the three-page project document as a data file for pdfim2text.
• zip or tar your work directory and submit via email.
• Write a report to explain what you have done, including images and
wave files; upload it to Canvas.