
Linear Algebra: How is it used in AI?

Understand how Linear Algebra is applied in AI.

How Linear Algebra's mathematical objects are used in Artificial Intelligence.

Sub-fields in AI

Artificial Intelligence is not a single subject; it has sub-fields
such as Learning (Machine Learning and Deep Learning), Communication
using NLP, Knowledge Representation and Reasoning, Problem Solving,
and Uncertain Knowledge and Reasoning.

This article explains how these objects and their properties are used
in the algorithms of AI's sub-fields such as ML, DL, and NLP.

It describes the sub-field concepts where LA objects can be applied.

For each sub-field, we briefly explain the relevant topic and how
Linear Algebra is applied to it. The following diagram shows the areas
of AI where we apply Linear Algebra.

LA Objects applied in these areas of AI

Note: Data representation and data processing are not sub-areas of
AI; they are techniques used within the ML, DL, and NLP areas.
In the other sub-areas shown in the diagram, such as Problem Solving
and Knowledge Representation and Reasoning, LA objects are also used,
but not as heavily as in Learning (ML/DL) and NLP.

Describing LA objects and properties in these sub-fields

The Linear Algebra (mathematical) objects are vectors, matrices, and
tensors. Depending on the dimensions of your data, you have to choose
the right object to store and process it; the title diagram describes
this.

Before looking at how these mathematical objects are used in AI, it
is worth refreshing your Linear Algebra.

Data representation: Explained in terms of the mathematical objects
vector, matrix, and tensor.

Data set: A collection of examples (also called data points or
objects). Each example is a collection of features: each example is a
row, and each feature is a column.

Design Matrix: A data set can be described through a design matrix,
a matrix containing a different example in each row. For example:

Design Matrix representation

If the data is not in a uniform format, i.e., the columns are not the
same for each example/row, then we describe the data set as a set
containing m elements, each of which may be a vector of a different
size.

In supervised learning, the data set contains a label or target as
well as the collection of features.
Design Matrix for Supervised Learning
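As a minimal NumPy sketch (the numbers here are made up for
illustration), a design matrix and its label vector look like this:

import numpy as np

# Design matrix X: m = 4 examples (rows), n = 3 features (columns).
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.2, 2.9, 4.3],
              [5.9, 3.0, 5.1]])

# In supervised learning, each row also gets a label/target.
y = np.array([0, 0, 1, 1])

print(X.shape)  # (4, 3)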

Data Processing: Before we use data sets in our ML algorithms, or in
any sub-field of AI, the data set must be made ready (cleansed and
filtered).

There are three common forms of data processing: mean subtraction,
normalization, and PCA & whitening. These forms are described briefly
in the diagram below.
Data processing operations explained through NumPy
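Since the diagram itself is not reproduced here, below is a minimal
NumPy sketch of the first two forms, mean subtraction and
normalization, on randomly generated toy data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(100, 5))  # toy data: 100 examples, 5 features

# Mean subtraction: center every feature (column) at zero.
X_centered = X - X.mean(axis=0)

# Normalization: scale each feature to unit standard deviation.
X_normalized = X_centered / X_centered.std(axis=0)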

All three forms take the data matrix as input and produce the desired
output. The third form, PCA, is used for dimensionality reduction and
works entirely in pure linear algebra; the following algorithm
describes it.

PCA Algorithm for Training Data
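The algorithm in the diagram is not reproduced here; the sketch below
shows one common way to implement PCA and whitening with NumPy (the
smoothing constant eps is an assumption to avoid division by zero):

import numpy as np

def pca_whiten(X, k, eps=1e-5):
    # Center the data.
    X = X - X.mean(axis=0)
    # Covariance matrix of the features.
    cov = X.T @ X / X.shape[0]
    # SVD of the covariance; the columns of U are the principal directions.
    U, S, _ = np.linalg.svd(cov)
    # Project onto the top-k principal components (dimensionality reduction).
    X_reduced = X @ U[:, :k]
    # Whitening: rescale each component to unit variance.
    X_white = X_reduced / np.sqrt(S[:k] + eps)
    return X_reduced, X_white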

argmin and argmax are operations used in data selection, feature
engineering, data cleansing, and other data-processing steps. They
work on matrices and vectors, selecting the index of the minimum or
maximum value along an axis.
Here the axis can be column or row: axis 0 (zero) means column-wise,
axis 1 (one) means row-wise.

argmin: Returns the indices of the minimum values along an axis.

argmax: Returns the indices of the maximum values along an axis.
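For example, with NumPy:

import numpy as np

A = np.array([[3, 7, 1],
              [4, 2, 9]])

print(np.argmax(A, axis=0))  # [1 0 1]: row index of the max in each column
print(np.argmin(A, axis=1))  # [2 1]:   column index of the min in each row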

Machine Learning (ML): ML is an algorithmic approach that learns from
training data and makes decisions on unseen data. Many algorithms
exist in ML for supervised and unsupervised learning.

How LA concepts are applied in the ML regression algorithm: This
section describes how Linear Algebra applies to regression analysis,
explaining the concepts through the multiple linear regression
algorithm. The following diagram describes LA concepts in ML and DL.
LA Objects, properties and usages in ML and DL

Regression analysis explained in terms of vectors, matrices, and
their properties.

What is regression? It is a statistical technique for estimating the
relationship between a dependent variable and independent variables.

The most common form of regression analysis is linear regression.
The following equations describe simple and multiple linear
regression.

Simple & Multiple Regression with examples
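The equations from the diagram are not reproduced here; in standard
notation they are:

y = \beta_0 + \beta_1 x + \varepsilon    (simple linear regression)

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon    (multiple linear regression)

\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}    (matrix form over all m examples)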

This technique predicts continuous responses, for example forecasting
stock prices, house rents, etc.

Residual: In machine learning/statistical terminology, the residual
is the difference between the observed value and the estimated value
of the target variable.

Notation is given below:

Notation for observed & estimated value of target variable


Residual in Multiple regression

Sum of Squares of Residuals: Let us denote the residual by r.
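In standard notation, with \hat{y}_i the estimated value of the
target for example i:

r_i = y_i - \hat{y}_i, \qquad S = \sum_{i=1}^{m} r_i^2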

Least squares method: The least squares method is the standard
approach; it minimizes the sum of squares of residuals S.

Ordinary Least Squares (OLS), or linear least squares, estimates the
parameters of a regression model by minimizing the sum of the squares
of the residuals. It draws the line through the data points that
minimizes the SSE between the observed and predicted (or fitted or
estimated) values.
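A minimal NumPy sketch of OLS via the normal equation on synthetic
data (the coefficients and noise level below are made up):

import numpy as np

rng = np.random.default_rng(1)
m = 50
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])  # column of 1s for the intercept
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=m)

# Normal equation: solve (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
S = residuals @ residuals  # sum of squared residuals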

The most important application is data fitting.

Data fitting: The process of constructing a curve, or mathematical
function, that best fits a set of data points.

Curve fitting can be linear or non-linear. The following describes
the linear case.

Linear Curves:

Linear Curve

After this introduction to regression analysis, let us define its
loss and cost function.

Loss function: The loss function of linear regression is defined as
follows:

The loss function of Regression
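One common formulation (the 1/2m scaling factor is an assumption;
conventions vary):

L(\boldsymbol{\beta}) = \frac{1}{2m} \sum_{i=1}^{m} \left( y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta} \right)^2 = \frac{1}{2m} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2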

The parameters are found by differentiating the loss with respect to
the parameters:

Finding the weights or parameters by applying the gradient of the
loss function
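Setting the gradient of this loss to zero yields the closed-form
(normal equation) solution:

\nabla_{\boldsymbol{\beta}} L = \frac{1}{m} X^{\top}\left( X\boldsymbol{\beta} - \mathbf{y} \right) = \mathbf{0} \;\Longrightarrow\; \boldsymbol{\beta} = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}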

What is regularization? To avoid the over-fitting problem, the
regularization technique is used to shrink the magnitude of the
parameters. This is achieved by adding a penalty (a function of the
parameters) to the cost function. L1, L2, dropout, and max-norm
constraints are used in DL, whereas L1, L2, and L1+L2 are used in ML.

If you are using neural networks for your ML algorithms, you can
apply all four of the above regularization techniques.

L2 regularization: The most common form of regularization. It is
implemented by penalizing the squared magnitude of all parameters
directly in the objective.
L1 regularization: For each weight w we add the term param * |w| to
the objective function. Both L1 and L2 are defined as follows:

Generalized Regularization
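In standard notation, with regularization strength \lambda:

L_{\text{reg}} = L(\boldsymbol{\beta}) + \lambda \sum_j \beta_j^2    (L2 / ridge)

L_{\text{reg}} = L(\boldsymbol{\beta}) + \lambda \sum_j \lvert \beta_j \rvert    (L1 / lasso)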

Use of vector norms in machine learning for regularization:

Vector norms in regularization to avoid the over-fitting problem

Deep Learning (DL): DL is a branch of ML that deeply learns from
text, images, or videos. Unstructured data such as images or videos
can be processed using DL. There are many applications of DL, such as
image processing (computer vision using CNNs), video processing
(computer vision using RNNs), and text processing (NLP using RNNs and
LSTMs); it can also be combined with Reinforcement Learning (Deep
RL).

DL is inspired by neurons: each neuron is connected to multiple
neurons, and an activation function is applied at each neuron.

Vectors, matrices, and tensors are the objects used in the DL area.
The following diagram shows a sample neural network and describes the
input, neurons, layers, feed-forward propagation, back-propagation,
etc.

Many mathematical subjects are involved in Deep Learning; in this
article only Linear Algebra is considered. We describe how the
mathematical objects are used at each stage.

Common Neural Network Architecture

Input: The input to the neural network takes the form of vectors,
matrices, or tensors; ultimately each data object/sample is a vector.
Here the input is a vector of n dimensions: one example, or data
point, from the data set.

Neurons or nodes: Here we apply an activation function to the input
from the previous layer combined with the weights of the connections.
A neural network is an interconnected group of natural or artificial
neurons that uses a mathematical or computational model for
information processing, based on a connectionist approach to
computation.

Connections: The connections of the biological neuron are modeled as
weights.

Each neuron is connected to the neurons in the next layer.

Layer: Each layer contains a set of neurons, as the following picture
depicts.

A layer contains neurons and is operated on at the vector level.


Feedforward propagation: These networks are called deep feedforward
networks, feedforward neural networks, or multilayer perceptrons
(MLPs). They are called feedforward because information flows through
the function being evaluated from x, through the intermediate
computations used to define f, and finally to the output y.

Feedforward neural networks are called networks because they are
typically represented by composing together many different functions.

Let us say, for example, our network has three functions connected in
a chain, to form:
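In the notation of the Deep Learning book:

f(\mathbf{x}) = f^{(3)}\left( f^{(2)}\left( f^{(1)}(\mathbf{x}) \right) \right)

where f^{(1)} is the first layer, f^{(2)} the second, and f^{(3)} the
third (output) layer.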

These chain structures are the most commonly used structures of
neural networks.

Let us see how vectors and matrices are applied in feedforward
networks.

1. Vectorizing inputs, weights, and bias: x is the input vector of n
dimensions; W is the weight matrix with n rows and m columns, one
column per neuron in the next layer; b is the bias vector for the m
neurons in the next layer.

Overall calculation of input, weights, and bias into an intermediate
variable Z

From this it is concluded that:

Generalized Approach
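With the shapes above, the generalized computation is (assuming the
column-vector convention):

\mathbf{z} = W^{\top}\mathbf{x} + \mathbf{b}, \qquad \mathbf{a} = g(\mathbf{z})

where g is the activation function applied element-wise and
\mathbf{a} is the output passed to the next layer.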

2. Apply the intermediate variable Z to the activation function.

Feedforward into the next layer

3. The above steps are repeated, and the results are fed to the next
layer in the forward direction.

At each neuron, the intermediate calculation and activation function
are as follows:

Consider an example of a neural network with a 2-feature input, four
hidden layers, and one output layer, with 3, 5, 4, 2, and 1 units
respectively.

Neural network with 1 input layer, 4 hidden layers, and 1 output
layer

Let us apply vector and matrix operations for forward propagation.

Forward propagation for 4 layers

Know your matrix dimensions:

Dimensions of the matrices and vectors in feedforward propagation
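A minimal NumPy sketch of forward propagation for this architecture
(random weights and a sigmoid activation are illustrative
assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 2 input features; hidden layers of 3, 5, 4, 2 units; 1 output unit.
sizes = [2, 3, 5, 4, 2, 1]
rng = np.random.default_rng(0)

# W[l] has shape (sizes[l+1], sizes[l]); b[l] has shape (sizes[l+1], 1).
W = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros((m, 1)) for m in sizes[1:]]

def forward(x):
    a = x.reshape(-1, 1)      # input as a column vector
    for Wl, bl in zip(W, b):
        z = Wl @ a + bl       # matrix-vector product plus bias
        a = sigmoid(z)        # element-wise activation
    return a

print(forward(np.array([0.5, -1.2])))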

Feedforward propagation = matrix-vector products and matrix addition,
along with activation functions.

Backpropagation = matrix calculus + linear algebra product rules;
this will be covered in the next article.

Natural Language Processing (NLP): NLP is concerned with the
interactions between humans and computers, in particular how to
program computers to process and analyze large amounts of natural
language data.

Here we describe the Word2Vec (W2V) technique for NLP. Word2Vec
represents each distinct word with a particular list of numbers
called a vector. Based on W2V, we can apply vector properties to
check the similarity and semantic similarity between vectors.

In NLP we use vectors and matrices as follows:

Vectors and matrices used in W2V algorithms

W2V is used in many NLP tasks and is the basis of capturing a word as
a vector. Natural language text = a sequence of discrete symbols.

We produce a dense vector representation based on the context/use of
words.

What are target and context words? Consider a text instance with a
context window of size 2. The following describes the context and
target/current words.

How do we build a one-hot representation?

Vocabulary: The set of words encoded into the feature vector is
called the vocabulary, so the dimension of the vector equals the size
of the vocabulary. In short, |V| = size of the vocabulary.

Let us say our text data set contains the following lines:

1. "And the cute kitten purred and then …"

2. "Cute furry cat purred and miaowed…"

3. "That the small kitten miaowed and she …"

4. "The loud furry dog ran and bit…"

From these 4 sentences the basis vocabulary is {bit, cute, furry,
loud, miaowed, purred, ran, small}, so the vocabulary length is 8.
Let us define the target and context words.

Target word: kitten; context words: {cute, purred, small, miaowed}

Target word: cat; context words: {cute, furry, miaowed}

Target word: dog; context words: {loud, furry, ran, bit}

Now we represent each word as a vector of vocabulary length 8.

Words as Vectors

We define the vectors by putting a '1' at a dimension when the
corresponding context word appears, and a '0' otherwise.
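A minimal NumPy sketch of these vectors (the vocabulary order follows
the set above):

import numpy as np

vocab = ["bit", "cute", "furry", "loud", "miaowed", "purred", "ran", "small"]
index = {w: i for i, w in enumerate(vocab)}

def context_vector(context_words):
    # Put a 1 at each dimension whose context word appears, 0 elsewhere.
    v = np.zeros(len(vocab))
    for w in context_words:
        v[index[w]] = 1.0
    return v

kitten = context_vector(["cute", "purred", "small", "miaowed"])
cat    = context_vector(["cute", "furry", "miaowed"])
dog    = context_vector(["loud", "furry", "ran", "bit"])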

Checking the similarity between vectors: To check the similarity we
can use the inner product (or cosine) as the similarity kernel.

Sim(Kitten, Cat) = Cosine(Kitten, Cat) ~ 0.58; Sim(Kitten, Dog) =
Cosine(Kitten, Dog) ~ 0.00; Sim(Cat, Dog) = Cosine(Cat, Dog) ~ 0.29

Cosine, Dot and Cross product between vectors
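Continuing the sketch above, a small cosine function reproduces these
similarity values:

def cosine(u, v):
    # Cosine similarity: inner product divided by the product of the norms.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cosine(kitten, cat), 2))  # 0.58
print(round(cosine(kitten, dog), 2))  # 0.0
print(round(cosine(cat, dog), 2))     # 0.29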

Embedding Matrix: An embedding matrix can be defined with rows
corresponding to target words and columns corresponding to context
words.
Embedding Matrix Dimension

The rows are word vectors, so we can retrieve them with one-hot
vectors.

Word representation using one-hot vectors

Embedding matrix with each row as a target word and its context words
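Continuing the sketch, a one-hot vector retrieves a row of the matrix
via a vector-matrix product:

# Stack the word vectors into an embedding-style matrix E
# (rows = target words: kitten, cat, dog).
E = np.vstack([kitten, cat, dog])

# A one-hot vector selects a row of E via a vector-matrix product.
one_hot_cat = np.array([0.0, 1.0, 0.0])
print(one_hot_cat @ E)  # recovers the "cat" row of E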

Algorithm for constructing Embedding Matrix:

Steps to construct Embedding Matrix

A vector that captures the meaning of a word is also known as
Word2Vec or a word embedding. The following are the algorithms:

1. Skip-gram (SG): Predicts the context words given the target word.

2. Continuous bag of words (CBOW): Predicts the target word given the
context words.

3. GloVe: Makes use of global co-occurrence statistics. GloVe
consists of a weighted least squares model that trains on global
word-word co-occurrence counts.

The above three algorithms are explained below in terms of Linear
Algebra.

Step 1, Skip-gram (SG): The objective of the skip-gram (SG) model is
to maximize the average log probability.
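In the standard notation of the word2vec paper, for a training
sequence w_1, ..., w_T and context window size c:

\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \neq 0} \log p\left( w_{t+j} \mid w_t \right), \qquad p(o \mid c) = \frac{\exp\left( \mathbf{u}_o^{\top} \mathbf{v}_c \right)}{\sum_{w \in V} \exp\left( \mathbf{u}_w^{\top} \mathbf{v}_c \right)}

where \mathbf{v} and \mathbf{u} are the input and output vectors of a
word, and the softmax projects scores onto the vocabulary.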

Describing context and target words

Step 2: Project into the vocabulary with softmax.

Step 3: Learn to estimate the likelihood of context words.

SKIP-GRAM

Continuous bag of words (CBOW): It predicts the target (current) word
based on its context words. Its probability distribution is:
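In the same notation, with \bar{\mathbf{v}} the sum (or average) of
the embedded context-word vectors:

p\left( w_t \mid w_{t-c}, \dots, w_{t+c} \right) = \frac{\exp\left( \mathbf{u}_{w_t}^{\top} \bar{\mathbf{v}} \right)}{\sum_{w \in V} \exp\left( \mathbf{u}_{w}^{\top} \bar{\mathbf{v}} \right)}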

- Embed the context words and add them.

- Project back to vocabulary size with softmax.

Expressing the current word as a softmax of vector-matrix products in
LA

CBOW
GloVe: Like word2vec, GloVe is a set of vectors that capture the
semantic information (i.e., meaning) of words. It consists of a
weighted least squares model that trains on global word-word
co-occurrence counts.

GloVe makes use of global co-occurrence statistics.

Co-occurrence matrix: We define this matrix using the following
corpus:

I like deep learning. I like NLP. I enjoy flying.

Co-occurrence Matrix
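A minimal NumPy sketch of building this co-occurrence matrix (a
symmetric window of size 1 is assumed):

import numpy as np

corpus = ["I like deep learning", "I like NLP", "I enjoy flying"]
window = 1

# Build the vocabulary and count co-occurrences within the window.
vocab = sorted({w for line in corpus for w in line.split()})
idx = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(vocab)), dtype=int)

for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                X[idx[w], idx[words[j]]] += 1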

Let X be the word-word co-occurrence count matrix.

As in word2vec, each word has two vectors: an input vector (v) and an
output vector (u).

Cost function of the GloVe model
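In the notation of the GloVe paper, with f a weighting function and
b_i, \tilde{b}_j bias terms:

J = \sum_{i,j=1}^{|V|} f\left( X_{ij} \right) \left( \mathbf{u}_j^{\top} \mathbf{v}_i + b_i + \tilde{b}_j - \log X_{ij} \right)^2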


Conclusion: We have described how Linear Algebra is applied in
various fields of AI; it is better to be well-grounded in Linear
Algebra before moving on to ML, DL, or NLP. I tried to cover how to
apply Linear Algebra from an algorithmic perspective, and I hope it
encourages you to get more involved with Linear Algebra.

Linear Algebra leads on to other subjects such as matrix calculus,
which is heavily used in backpropagation in DL.

Thanks for reading this article. Please drop a note if there are any
mistakes; your feedback is appreciated.

References:

1. Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern
Approach

2. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning

3. https://en.wikipedia.org/wiki/Regression_analysis

4. http://web.stanford.edu/class/cs224n/

5. Efficient Estimation of Word Representations in Vector Space

6. https://nlp.stanford.edu/projects/glove/
