Machine Learning (机器学习)

Lecture 1.5: Supplementary Prerequisite Mathematics (前置数学知识补遗)
盛律 / School of Software
2024 Fall/Winter Semester

Outline

- Linear Algebra
- Probability
- Information Theory

Please refer to the appendices of Pattern Recognition and Machine Learning (C. M. Bishop) and
Machine Learning (Zhihua Zhou) for more details.

Linear Algebra

Matrix

A = [a_{ij}]_{m \times n} =
    \begin{bmatrix}
      a_{11} & a_{12} & \cdots & a_{1n} \\
      a_{21} & a_{22} & \cdots & a_{2n} \\
      \vdots & \vdots  & \ddots & \vdots \\
      a_{m1} & a_{m2} & \cdots & a_{mn}
    \end{bmatrix}
  = [a_1, a_2, \cdots, a_n]

- Diagonal matrix:
    \mathrm{diag}(a_{11}, a_{22}, \cdots, a_{nn}) =
    \begin{bmatrix}
      a_{11} & 0      & \cdots & 0 \\
      0      & a_{22} & \cdots & 0 \\
      \vdots & \vdots & \ddots & \vdots \\
      0      & 0      & \cdots & a_{nn}
    \end{bmatrix}
- Identity matrix: I = \mathrm{diag}(1, 1, \cdots, 1)
- Trace: \mathrm{tr}(A) = \sum_{j=1}^{n} a_{jj}
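A minimal NumPy sketch (added here for illustration; not part of the original slides) showing the diagonal matrix, identity matrix, and trace defined above:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

D = np.diag([1.0, 5.0, 9.0])   # diag(a11, a22, a33)
I = np.eye(3)                  # identity matrix I = diag(1, 1, 1)
print(np.trace(A))             # tr(A) = 1 + 5 + 9 = 15
```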
Matrix Addition/Subtraction

- If C = A \pm B, then [c_{ij}] = [a_{ij}] \pm [b_{ij}]
- Commutative: A + B = B + A
- Associative: (A + B) + C = A + (B + C)

Multiply a Vector by a Matrix

Ax = y

    \begin{bmatrix}
      a_{11} & a_{12} & \cdots & a_{1n} \\
      a_{21} & a_{22} & \cdots & a_{2n} \\
      \vdots & \vdots  & \ddots & \vdots \\
      a_{m1} & a_{m2} & \cdots & a_{mn}
    \end{bmatrix}
    \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
    =
    \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
    \quad \text{and} \quad y_i = \sum_{j=1}^{n} a_{ij} x_j

- Write A = [a_1, a_2, \dots, a_n]; then y = \sum_{j=1}^{n} x_j a_j
  - y can be written as a weighted sum of A's column vectors
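A short NumPy check (an illustrative addition, not from the original slides) that the product Ax equals the weighted sum of A's columns:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # m x n matrix with columns a_1, a_2
x = np.array([10.0, -1.0])

y = A @ x                                   # y_i = sum_j a_ij x_j
y_cols = x[0] * A[:, 0] + x[1] * A[:, 1]    # weighted sum of A's columns
print(np.allclose(y, y_cols))               # True
```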

Matrix Multiplication

- If C_{m \times n} = A_{m \times p} B_{p \times n}, then [c_{ij}] = \sum_{k=1}^{p} a_{ik} b_{kj}
- In general, non-commutative: AB \neq BA
- Associative: (AB)C = A(BC)
- Distributive: (A + B)C = AC + BC

Transpose

- If A^\top = B, then b_{ij} = a_{ji}
- (A^\top)^\top = A, (AB)^\top = B^\top A^\top, (A + B)^\top = A^\top + B^\top
- Symmetric matrix: a_{ij} = a_{ji}, or A = A^\top
- Matrix A is orthogonal if A^\top A is diagonal
- Matrix A is orthonormal if A^\top = A^{-1}
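A quick numerical sanity check (illustrative sketch, not part of the original slides) of the transpose rule and of non-commutativity:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
print(np.allclose((A @ B).T, B.T @ A.T))   # (AB)^T = B^T A^T

C = rng.standard_normal((4, 4))
D = rng.standard_normal((4, 4))
print(np.allclose(C @ D, D @ C))           # generally False: AB != BA
```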
Determinant

- If A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, then |A| = a_{11} a_{22} - a_{21} a_{12}
- In general, |A| = \sum_{j=1}^{n} a_{ij} \, \mathrm{cof}(a_{ij}), where \mathrm{cof}(a_{ij}) is the cofactor of element a_{ij}
- Properties:
  - Determinant is a scalar quantity
  - If |A| = 0, then A is singular, otherwise non-singular
  - |A^\top| = |A|
  - |AB| = |BA| = |A||B|

Inverse

    A^{-1} A = I \quad \text{and} \quad A^{-1} = \frac{[\mathrm{cof}(A)]^\top}{|A|}

- (A^{-1})^{-1} = A
- (AB)^{-1} = B^{-1} A^{-1}
- (A^\top)^{-1} = (A^{-1})^\top
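The following NumPy sketch (illustration added during editing, with arbitrary example matrices) exercises the determinant and inverse properties listed above:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.det(A))                    # |A| = 4*6 - 2*7 = 10
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # A A^{-1} = I

print(np.allclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B)))   # |AB| = |A||B|
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # (AB)^{-1} = B^{-1} A^{-1}
```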

Inner Product, Outer Product

- The inner product of two vectors x, y \in R^n is a scalar:
    \langle x, y \rangle = x^\top y = y^\top x = \sum_{i=1}^{n} x_i y_i
  - If \langle x, y \rangle = 0, then x and y are orthogonal
- The outer product of two vectors x \in R^m and y \in R^n is a matrix:
    x \otimes y = x y^\top =
    \begin{bmatrix}
      x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\
      x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\
      \vdots  & \vdots  & \ddots & \vdots  \\
      x_m y_1 & x_m y_2 & \cdots & x_m y_n
    \end{bmatrix}

Gradient Vector (the notation here is not fully rigorous)

- Given: f(x) is a real-valued function
    \nabla_x f(x) = \frac{\partial f}{\partial x} = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right]
  - First order derivatives
- Extension: how about f(x) is a vector?
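A small NumPy illustration (not from the original slides) of the inner and outer products:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(np.dot(x, y))        # inner product <x, y> = 32, a scalar
print(np.outer(x, y))      # outer product x y^T, a 3x3 matrix

z = np.array([2.0, -1.0, 0.0])
print(np.dot(x, z))        # 0: x and z are orthogonal
```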
Gradient Vector: Properties

- \nabla_x (x^\top y) = \nabla_x (y^\top x) = y
- \nabla_x (x^\top x) = 2x
- \nabla_x (x^\top A y) = A y
- \nabla_x (y^\top A x) = A^\top y
- \nabla_x (x^\top A x) = A x + A^\top x, which equals 2Ax if A is symmetric

Hessian Matrix

- Second order derivatives:
    H(x) = \frac{\partial^2 f(x)}{\partial x \partial x^\top} = \left[ \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \right]
    =
    \begin{bmatrix}
      \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
      \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
      \vdots & \vdots & \ddots & \vdots \\
      \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2}
    \end{bmatrix}
- Always symmetric!!
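A hedged numerical sketch (added for illustration; the finite-difference check is an editorial choice, not from the slides) verifying the quadratic-form gradient identity above:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))          # not necessarily symmetric
x = rng.standard_normal(3)

f = lambda v: v @ A @ v                  # f(x) = x^T A x

# Central finite-difference gradient of f at x
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

grad_analytic = A @ x + A.T @ x          # = 2 A x when A is symmetric
print(np.allclose(grad_fd, grad_analytic, atol=1e-5))   # True

# The Hessian of x^T A x is A + A^T, which is symmetric
H = A + A.T
print(np.allclose(H, H.T))               # True
```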

Eigenvalue and Eigenvector

    A v = \lambda v
    (A - \lambda I) v = 0
    |A - \lambda I| = 0   (characteristic equation)

- Solutions \lambda of the characteristic equation are called eigenvalues, and the corresponding v are eigenvectors
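A small NumPy example (illustrative, not from the original slides) checking Av = \lambda v for each eigenpair:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)       # columns of eigvecs are eigenvectors
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))    # A v = lambda v for each pair
print(eigvals)                            # eigenvalues 3 and 1 for this matrix
```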

Probability
Motivation

- We have 25 PhD and 15 MPhil students. If a student is randomly picked from these 2 groups, which group will you guess (s)he is from?
  - 2 classes: \omega_1 = PhD, \omega_2 = MPhil
- The state of nature is unpredictable
- Use probability!!

Axioms for Probabilities

- All probabilities are between 0 and 1: 0 \le P(A) \le 1
- The certain event has probability 1
- The impossible event has probability 0
- If A and B are any two events, P(A \cup B) = P(A) + P(B) - P(A \cap B)

Mutually Exclusive Events

- Two events are mutually exclusive if they cannot occur at the same time
- A single card is chosen at random from a standard deck of 52 playing cards
  - E1: the card chosen is a five; E2: the card chosen is a king
  - Mutually exclusive?

Conditional Probability

- Let A and B be two events such that P(A) > 0, P(B) > 0
- P(B|A): probability of B given that A has occurred

    P(B|A) = \frac{P(A \cap B)}{P(A)}

    P(A \cap B) = P(A) P(B|A)
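A brief Python sketch (an editorial illustration; the "face card" conditional is an added example, not from the slides) that checks mutual exclusivity and computes a conditional probability by counting:

```python
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))        # 52 equally likely cards

E1 = {c for c in deck if c[0] == '5'}     # card is a five
E2 = {c for c in deck if c[0] == 'K'}     # card is a king
print(len(E1 & E2) == 0)                  # True: mutually exclusive

A = {c for c in deck if c[0] in ('J', 'Q', 'K')}   # face card
print(len(E2 & A) / len(A))               # P(king | face card) = 4/12 = 1/3
```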
Conditional Probability

- For any three events A_1, A_2, A_3:
    P(A_1 \cap A_2 \cap A_3) = P(A_1) P(A_2 | A_1) P(A_3 | A_1 \cap A_2)
- If A_1, \dots, A_n are mutually exclusive with \sum_{i=1}^{n} P(A_i) = 1, then
    P(B) = P(A_1) P(B|A_1) + \cdots + P(A_n) P(B|A_n) = \sum_{i=1}^{n} P(B|A_i) P(A_i)

Independence

- Two events A and B are independent if P(B|A) = P(B) or P(A|B) = P(A)
  - The probability of B occurring is not affected by the occurrence or non-occurrence of A
  - Knowledge about A contains no information about B
  - This is also equivalent to P(A \cap B) = P(A) P(B)
- If n Boolean variables (A_1, A_2, \dots, A_n) are independent:
    P(A_1 \cap A_2 \cap \cdots \cap A_n) = \prod_{i=1}^{n} P(A_i)
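A tiny worked example of the total-probability formula above (the urn numbers are hypothetical, chosen only for illustration):

```python
# Two urns A1, A2 chosen with probabilities 0.6 and 0.4;
# B = "draw a red ball", with P(B|A1) = 0.3 and P(B|A2) = 0.8.
P_A = [0.6, 0.4]
P_B_given_A = [0.3, 0.8]

P_B = sum(pa * pb for pa, pb in zip(P_A, P_B_given_A))
print(P_B)   # 0.6*0.3 + 0.4*0.8 = 0.5
```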

Bayes Theorem or Rule

    P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A) P(A)}{P(B)}

Bayes Theorem or Rule

    P(\omega_i | x) = \frac{P(x | \omega_i) P(\omega_i)}{P(x)}

- P(\omega_i): prior probability of \omega_i
  - Initial probability for \omega_i, before observing the training data
- P(\omega_i | x): posterior probability for \omega_i after observing x
- P(x | \omega_i): likelihood of observing x given class \omega_i
- P(x): probability that training data x will be observed
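A small sketch (added for illustration) applying Bayes rule to the earlier PhD/MPhil motivation; the likelihood values are made up solely for this example:

```python
# Priors from the motivation slide: 25 PhD and 15 MPhil students.
P_omega = {'PhD': 25 / 40, 'MPhil': 15 / 40}

# Hypothetical likelihoods P(x | omega_i) of some observed feature x.
P_x_given_omega = {'PhD': 0.2, 'MPhil': 0.6}

# Evidence P(x) via the law of total probability.
P_x = sum(P_x_given_omega[w] * P_omega[w] for w in P_omega)

# Posterior P(omega_i | x) via Bayes rule.
posterior = {w: P_x_given_omega[w] * P_omega[w] / P_x for w in P_omega}
print(posterior)   # {'PhD': ~0.357, 'MPhil': ~0.643}
```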
Discrete Probability Distributions

- X: discrete random variable
- Probability function or probability distribution: P(X = x)
- Cumulative distribution function (or distribution function): F(x) = P(X \le x)

Example: Uniform Distribution

- Outcome of throwing a fair die:
    P(X = 1) = P(X = 2) = \cdots = P(X = 6) = \frac{1}{6}

Example: Binomial Distribution

- Given: the probability of getting a head is p; X = number of heads when the biased coin is tossed n times

    P(X = x) = \mathrm{Bi}(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}

Continuous Probability Distributions

- X: continuous random variable
- The probability that X takes on any one value is in general zero
- The probability that X lies between two different values is more meaningful:

    P(a < X < b) = \int_a^b p(x) \, dx    where p(x) is the probability density function (PDF)

- Cumulative distribution function (or distribution function):

    F(x) = P(X \le x) = \int_{-\infty}^{x} p(u) \, du, \qquad \frac{dF(x)}{dx} = p(x)
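A minimal sketch of the binomial probability function above (illustrative numbers; the helper name binom_pmf is an editorial choice):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 3 heads in 10 tosses with P(head) = 0.4
print(binom_pmf(3, 10, 0.4))                          # ~0.215
print(sum(binom_pmf(x, 10, 0.4) for x in range(11)))  # pmf sums to 1
```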
Example: Uniform Distribution

    p(x) =
    \begin{cases}
      \frac{1}{b - a} & \text{if } a \le x \le b \\
      0 & \text{otherwise}
    \end{cases}
    \qquad
    F(x) =
    \begin{cases}
      0 & \text{if } x < a \\
      \frac{x - a}{b - a} & \text{if } a \le x \le b \\
      1 & \text{if } x > b
    \end{cases}

Example: Normal (Gaussian) Distribution

    p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]

Joint Distribution: Discrete

- If X and Y are two discrete random variables, we define the joint probability function of X and Y by
    P(X = x, Y = y) = p(x, y)
  where p(x, y) \ge 0 and \sum_x \sum_y p(x, y) = 1
- Marginal probability function: P(X = x) = \sum_j p(x, y_j)
- Joint distribution function: F(x, y) = P(X \le x, Y \le y) = \sum_{u \le x} \sum_{v \le y} p(u, v)

Joint Distribution: Continuous

- If X and Y are continuous random variables with joint density function p(x, y):
    P(a < X < b, c < Y < d) = \int_{x=a}^{b} \int_{y=c}^{d} p(x, y) \, dx \, dy
  where p(x, y) \ge 0 and \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \, dx \, dy = 1
- Marginal density function: p(x) = \int_{-\infty}^{\infty} p(x, v) \, dv
Joint Distribution: Continuous

- Joint distribution function:
    F(x, y) = P(X \le x, Y \le y) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{y} p(u, v) \, du \, dv
    \qquad \frac{\partial^2 F(x, y)}{\partial x \partial y} = p(x, y)
- Marginal distribution function (the distribution function of X):
    P(X \le x) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{\infty} p(u, v) \, dv \, du

Example

- Random vector: X = [X_1, X_2, \dots, X_n]^\top
- Multivariate Gaussian: X \sim N(\mu, \Sigma)

    p(X) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (X - \mu)^\top \Sigma^{-1} (X - \mu) \right]
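A minimal NumPy sketch (editorial illustration; the function name mvn_pdf and the example parameters are assumptions) evaluating the multivariate Gaussian density above:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, following the formula above."""
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(mvn_pdf(np.array([0.2, -0.3]), mu, Sigma))
```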

Mathematical Expectation

- Expected value, expectation, or mean of a random variable X
- Discrete:
    E(X) = \sum_{j=1}^{n} x_j P(X = x_j)
- Continuous:
    E(X) = \int_{-\infty}^{\infty} x \, p(x) \, dx

Moments

- rth moment: E(X^r); the mean \mu = E(X) is the 1st moment
- rth central moment: \mu_r = E[(X - \mu)^r]; \mu_0 = 1, \mu_1 = 0, \mu_2 = variance
- For a multivariate random vector X:
  - 2nd central moment: covariance matrix
      \Sigma = \mathrm{cov}(X) = E[(X - \mu)(X - \mu)^\top]
Covariance Matrix

- For a 2-D vector X = [X_1, X_2]^\top:

    \Sigma = E\left[ \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix} \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix}^\top \right]
           = E\begin{bmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 \end{bmatrix}
           = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}
           = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}
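A short NumPy sketch (added for illustration, with arbitrary example parameters) estimating the covariance matrix from samples via E[(X - \mu)(X - \mu)^\top]:

```python
import numpy as np

rng = np.random.default_rng(2)
# 1000 samples of a 2-D random vector X = [X1, X2]^T
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[1.0, 0.6], [0.6, 2.0]],
                            size=1000)

mu = X.mean(axis=0)
Sigma_hat = (X - mu).T @ (X - mu) / (len(X) - 1)   # sample covariance matrix
print(Sigma_hat)
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False)))   # matches np.cov
```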
Information Theory

Entropy: Intuitive Notion

- Measures the impurity, uncertainty, irregularity, surprise
- Suppose we have two discrete classes
  - S: a set of training examples
  - p+: proportion of positive examples in S
  - p-: proportion of negative examples in S
- Optimal purity (impurity/uncertainty = 0): p+ = 1, p- = 0 or p+ = 0, p- = 1
- Least pure (maximum impurity/uncertainty): p+ = 0.5, p- = 0.5

Entropy: Formal Definition

- X: discrete random variable with alphabet \mathcal{X} = \{x_1, x_2, \dots, x_n\} and probability mass function p(x) = P(X = x), x \in \mathcal{X}

    Entropy: H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_b p(x)

- X: continuous random variable; the corresponding quantity is the differential entropy

    Entropy: h(X) = -\int_{x \in \mathcal{X}} p(x) \log_b p(x) \, dx
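A tiny sketch (illustrative addition; the helper name binary_entropy is an editorial choice) connecting the intuition above to the formal definition for two classes:

```python
import numpy as np

def binary_entropy(p_pos):
    """Entropy in bits of a two-class set with positive proportion p_pos."""
    probs = np.array([p_pos, 1.0 - p_pos])
    probs = probs[probs > 0]          # 0 log 0 is taken to be 0
    return -np.sum(probs * np.log2(probs))

print(binary_entropy(1.0))   # 0: optimal purity (may print as -0.0)
print(binary_entropy(0.5))   # 1.0: maximum impurity/uncertainty
```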
Entropy: Formal Definition

- The unit of entropy depends on the base b of the log operation
  - b = e: nats; b = 2: bits (usually adopted)
- Entropy can be changed from one base to another: H_b(X) = (\log_b a) H_a(X)
- In general, when X can take n values:
  - H(X) \ge 0, with H(X) = 0 if there is an x_k with p(x_k) = 1
  - H(X) \le \log n, with H(X) = \log n if p(x) = 1/n

Example: Coding

    x      a     b     c      d
    P(X)   0.5   0.25  0.125  0.125

- Code 1:
    Code   00    01    10     11
  Expected length to encode one symbol from X: 2 bits
- Code 2:
    Code   0     10    110    111
  Expected length: 0.5x1 + 0.25x2 + 0.125x3 + 0.125x3 = 1.75 bits
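A minimal check (editorial illustration) that the entropy of this distribution matches the expected length of Code 2:

```python
import numpy as np

P = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
code2 = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

H = -sum(p * np.log2(p) for p in P.values())            # entropy in bits
expected_len = sum(P[s] * len(code2[s]) for s in P)     # expected code length
print(H, expected_len)   # both 1.75 bits: Code 2 matches the entropy
```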

Relationship with Coding

- An optimal-length code assigns -\log_2 p bits to a message having probability p
  - Expected number of bits to encode + or - of a random member of S:
      p_+ (-\log_2 p_+) + p_- (-\log_2 p_-) = H(S)
  - Entropy = expected number of bits needed to encode the class of a randomly drawn member of S under the optimal, shortest-length code

Joint Entropy

- Joint entropy of a pair of discrete random variables X and Y with a joint distribution p(x, y):

    H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)

- If X and Y are independent, then H(X, Y) = H(X) + H(Y)
Conditional Entropy

    H(Y|X) = \sum_{x \in \mathcal{X}} p(x) H(Y|X = x)
           = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x)
           = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x)

- Uncertainty about Y, given that we know X

Conditional Entropy

- In general, H(Y|X) \neq H(X|Y)
- Chain rule: H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
- When X and Y are independent, H(Y|X) = H(Y)
- For multiple variables:
    H(X_1, X_2, \dots, X_n) = H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X_1, \dots, X_{n-1})
  - When X_1, \dots, X_n are i.i.d., H(X_1, X_2, \dots, X_n) = n H(X_1)
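A short NumPy sketch (illustrative; the joint table is a made-up example) verifying the chain rule H(X, Y) = H(X) + H(Y|X):

```python
import numpy as np

# A small joint distribution p(x, y): X indexes rows, Y indexes columns
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_XY = H(p_xy.ravel())                 # joint entropy H(X, Y)
H_X = H(p_xy.sum(axis=1))              # marginal entropy H(X)
p_y_given_x = p_xy / p_xy.sum(axis=1, keepdims=True)
H_Y_given_X = -np.sum(p_xy * np.log2(p_y_given_x))   # conditional entropy H(Y|X)

print(np.isclose(H_XY, H_X + H_Y_given_X))   # True: chain rule holds
```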

Kullback-Leibler Divergence

- Motivation
  - Suppose there is a r.v. with true distribution p
  - However, we do not know p; instead we assume that the distribution of the r.v. is q
  - The code would need more bits to represent the r.v., and the difference in the number of bits is denoted as KL(p||q)
- KL-divergence from p(x) to q(x), also known as relative entropy:

    KL(p||q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} = -\sum_{x \in \mathcal{X}} p(x) \log q(x) + \sum_{x \in \mathcal{X}} p(x) \log p(x)

Kullback-Leibler Divergence

- Information inequality: KL(p||q) \ge 0, with equality if and only if p(x) = q(x), \forall x
- KL-divergence is often used as a "distance" measure between distributions, but:
  - Not symmetric: KL(p||q) \neq KL(q||p)
  - Does not satisfy the triangle inequality: KL(p||q) \le KL(p||r) + KL(r||q) does not hold in general
  - So it is not a true distance between distributions
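A small sketch (editorial illustration; the distributions and the helper name kl are assumptions) showing non-negativity and asymmetry of the KL divergence:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) = sum_x p(x) log(p(x)/q(x)), in bits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with p(x) = 0 contribute 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = [0.7, 0.2, 0.1]
q = [1/3, 1/3, 1/3]
print(kl(p, q), kl(q, p))   # ~0.43 and ~0.47: non-negative, but not symmetric
print(kl(p, p))             # 0.0: equality iff p = q
```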
Mutual Information

- How much information does one random variable X tell about another one Y?
- Given: two random variables X and Y with a joint probability mass function p(x, y) and marginal probability mass functions p(x) and p(y)
- Mutual information I(X; Y):

    I(X; Y) = KL(p(x, y) \,||\, p(x) p(y)) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}

  - The KL-divergence between the joint distribution and the product distribution

Mutual Information

    I(X; Y) = \sum_{x, y} p(x, y) \log \left( \frac{1}{p(x)} \cdot \frac{p(x, y)}{p(y)} \right)
            = \sum_{x, y} p(x, y) \log \frac{p(x|y)}{p(x)}
            = -\sum_{x, y} p(x, y) \log p(x) + \sum_{x, y} p(x, y) \log p(x|y)
            = H(X) - H(X|Y)

- MI is the reduction in the uncertainty of X due to the knowledge of Y

Mutual Information

- I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)   (symmetric)
- I(X; Y) = H(X) + H(Y) - H(X, Y)
  - Information that X tells about Y = uncertainty in X + uncertainty in Y - uncertainty in both X and Y
- I(X; X) = H(X) - H(X|X) = H(X), the entropy itself
- I(X; Y) \ge 0, with equality if and only if X and Y are independent, i.e., H(X, Y) \le H(X) + H(Y)
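A closing NumPy sketch (editorial illustration with a made-up joint table) computing mutual information both as a KL divergence and via the entropy identity above:

```python
import numpy as np

p_xy = np.array([[0.3, 0.1],     # joint distribution p(x, y)
                 [0.2, 0.4]])
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

# I(X; Y) = KL( p(x, y) || p(x) p(y) )
I = np.sum(p_xy * np.log2(p_xy / (p_x * p_y)))

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Same value via I(X; Y) = H(X) + H(Y) - H(X, Y)
I_alt = H(p_x.ravel()) + H(p_y.ravel()) - H(p_xy.ravel())
print(np.isclose(I, I_alt), I >= 0)   # True True
```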
