A Concise Introduction to Machine Learning
1 Introduction
In this paper I will try to give a concise and comprehensive introduction to the theory of Artificial
Neural Networks.
2 Machine Learning
In recent years Machine Learning has become one of the most promising and rapidly developing
fields in Computer Science. It tackles problems that classical programming, and sometimes
also humans, cannot handle. In this section I will give a short introduction to the field of Machine
Learning.
In his book Information Theory, Inference, and Learning Algorithms [3] David MacKay writes:
Machine learning allows us to tackle tasks that are too difficult to solve with fixed
programs written and designed by human beings. From a scientific and philosophical
point of view, machine learning is interesting because developing our understand-
ing of machine learning entails developing our understanding of the principles that
underlie intelligence.
Definition An intuitive definition of machine learning was given by Arthur Samuel in 1959:
Machine Learning is a field of study that gives computers the ability to learn without
being explicitly programmed.
This definition is nice and easy to understand. However, to work with machine learning as a
scientific field (indeed, it is a field of computer science, closely related to mathematical
fields such as computational statistics and mathematical optimization), we need a more for-
mal definition. One can be found in Tom Mitchell's book Machine Learning (1997) [4]. This
definition is widely known and often referred to as the well-posed learning problem:
A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P , if its performance at tasks in T , as measured
by P , improves with experience E.
Machine learning problems are commonly divided into three types:
• Supervised Learning - the agent receives a set of examples with labels ("right an-
swers") to learn from.
• Unsupervised Learning - no explicit feedback is provided; the agent must learn
patterns in an unlabeled dataset.
• Reinforcement Learning - the agent receives a series of reinforcements - rewards or pun-
ishments (for example, winning or losing a game of chess).
Here are the formal definitions for the problems of supervised and unsupervised machine learning.
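These definitions can be sketched as follows; this is one common formalization (the notation D, h, and m is my own shorthand, not necessarily the one the original text uses):

```latex
\textbf{Supervised learning.} Given a training set
$D = \{(x_1, y_1), \dots, (x_m, y_m)\}$ with inputs $x_i \in X$ and
labels $y_i \in Y$, find a hypothesis $h : X \to Y$ such that
$h(x_i) \approx y_i$ and $h$ generalizes well to unseen inputs.

\textbf{Unsupervised learning.} Given only unlabeled inputs
$D = \{x_1, \dots, x_m\}$, $x_i \in X$, find structure in the data
(e.g.\ clusters or a density estimate) without any target values.
```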
3 Neural Networks
Artificial neural networks are among the most developed and widely used algorithms of machine
learning. A neural network is a mathematical model of the brain's activity that is able to tackle both
classification and regression problems. It can function as a model of supervised, unsupervised,
or reinforcement learning.
Since their invention in the 1950s, neural networks have been used to model the human brain and to
approach the goal of creating human-like artificial intelligence. Nowadays it is more common to think of
neural networks as statistical models that perform well on some extremely complicated
tasks. For example, Hastie et al. [7] view neural networks as nonlinear statistical models, namely
two-stage regression or classification models. David MacKay [3] sees them as parallel distributed
computational systems consisting of many interacting simple elements, and Goodfellow et al.
[2] take a similar view in their book Deep Learning.
4 Model of a Neuron
A biological neural network (the brain) consists of cells called neurons. The human brain is composed
of about 10 billion neurons, each connected to about 10,000 other neurons. The same applies
to artificial neural networks - they consist of many artificial neurons, mathematical models of
biological ones. I will start this section by describing the structure of a biological neuron. Then
I will provide a formal description of an artificial neuron as a mathematical model.
y = g_w(x)    (1)

where x ∈ X^n, y ∈ Y, and w ∈ R^n.

The function g_w is a composition of two functions, s_w : X^n → R and f : R → Y:

g_w(x) = f(s_w(x))

where s_w is defined as follows:

s_w(x) = Σ_{i=0}^{n} w_i x_i

f is a nonlinear activation function in the case of classification, and the identity function
f(x) = x for all x in the case of regression.
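The model above can be sketched directly in code. The following is a minimal illustration (function names and the choice of sigmoid as the nonlinear activation are my own; the convention that the bias input x_0 = 1 is prepended follows the bias unit described later in the text):

```python
import math

def neuron_output(weights, inputs, activation):
    """Compute y = f(s_w(x)), where s_w(x) = sum_i w_i * x_i.

    By convention here, inputs[0] is the bias input fixed to 1,
    so weights[0] plays the role of the bias term w_0.
    """
    s = sum(w * x for w, x in zip(weights, inputs))
    return activation(s)

def sigmoid(s):
    """A common choice of nonlinear activation for classification."""
    return 1.0 / (1.0 + math.exp(-s))

def identity(s):
    """Identity activation, used for regression."""
    return s

# Example: 2-dimensional input with a bias unit prepended.
x = [1.0, 0.5, -0.5]   # x_0 = 1 is the bias input
w = [0.1, 0.8, 0.3]    # w_0 is the bias weight

y_reg = neuron_output(w, x, identity)  # regression output: s_w(x) itself
y_cls = neuron_output(w, x, sigmoid)   # classification output in (0, 1)
```

Here s_w(x) = 0.1 + 0.8 · 0.5 + 0.3 · (−0.5) = 0.35, so the regression output is 0.35 and the classification output is sigmoid(0.35).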
5 Structure and Representation
Figure 3: (1) Recurrent neural network represented by a directed graph. (2) 3-layered feedfor-
ward neural network represented by a 3-partite directed graph. (3) 3-layered fully-connected
feedforward neural network represented by a complete 3-partite directed graph.
Network topologies As already mentioned, neural networks consist of neurons. These
neurons are connected by directed links (synaptic connections) with numeric weights that
determine the strength and sign of each connection.
Neurons are grouped into layers. The first layer is called the input layer, and the last one the
output layer. All the layers between the input and output layers are called hidden layers. The
number of hidden layers is one of the tunable hyperparameters that define the architecture of a
neural network. According to [7], the number of hidden units is typically somewhere in the
range of 5 to 100.
To preserve the linear model of regression, each layer except the output one has an additional
bias unit b = 1.
For a specific example of neural network architecture see Figure 4.
By the type of connections, a neural network is either feedforward or recurrent:
• Feedforward network - has connections in one direction only (outputs of neurons from
layer k can be connected only to neurons of layers k + c, where c > 0). The network
diagram of a feedforward network forms a directed acyclic graph.
• Recurrent network - feeds its outputs back into its own inputs. Such a network has at least one
cycle (at least one connection from a neuron in layer k to a neuron in layer k − c, where c ≥ 0).
If in a neural network with N layers every neuron in layer k, for all k with 0 ≤ k ≤ N − 1, is
connected to all the neurons of layer k + 1, the network is called fully-connected.
Figure 4 depicts the schematic of a feedforward neural network with one hidden layer.
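A forward pass through such a network can be sketched as follows. This toy implementation mirrors the architecture of Figure 4 (3 input units including the bias, 4 hidden units, 1 output unit); the weight values themselves are made up purely for illustration:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(weights, inputs, activation):
    """Apply every neuron of a layer to the same inputs.

    weights is a list of per-neuron weight vectors; inputs already
    includes the bias unit as its first element.
    """
    return [activation(sum(w * x for w, x in zip(ws, inputs)))
            for ws in weights]

def forward(x, hidden_w, output_w):
    """Forward pass for a fully-connected feedforward network with
    one hidden layer. A bias unit fixed to 1 is prepended to the
    inputs of each layer, as described in the text."""
    hidden = layer_forward(hidden_w, [1.0] + x, sigmoid)         # 4 hidden units
    out = layer_forward(output_w, [1.0] + hidden, lambda s: s)   # identity output
    return out[0]

# Illustrative (made-up) weights: 4 hidden neurons x 3 inputs each,
# 1 output neuron x 5 inputs (bias + 4 hidden activations).
hidden_w = [[0.1, 0.4, -0.2],
            [-0.3, 0.2, 0.5],
            [0.05, -0.1, 0.3],
            [0.2, 0.2, 0.2]]
output_w = [[0.1, 0.6, -0.4, 0.3, 0.2]]

y = forward([0.5, -1.0], hidden_w, output_w)
```

Since every hidden activation lies strictly between 0 and 1, the output is bounded by the output-layer weights regardless of the input.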
Figure 4: Feedforward artificial neural network with one hidden layer. Number of neurons in
each layer: 3 in the input layer (2-dimensional input + bias unit), 4 in the hidden layer, 1 in
the output layer (1-dimensional output).
6 Network Training
I will describe the training of a neural network with backpropagation of errors.
As the number of hidden layers grows, a problem arises: we can compute the error directly only
for the output layer, by finding the deviation of the hypothesis h_w from the desired output y.
The fitness function for regression is the sum of squared errors

J(w) = Σ_{k=1}^{K} Σ_{i=1}^{n} (y_i − f_k(x_i))^2    (2)
The task of learning is to minimize the fitness function J(w). There are different learning algo-
rithms for doing that. In this paper I will explain only the idea of the classical backpropagation
algorithm.
Backpropagation is an abbreviation for "backward propagation of errors". According to [7],
backpropagation is a two-pass procedure used to compute the gradients for the updates in
the gradient descent algorithm:
• Forward pass - the current weights are fixed and the predicted values are computed.
• Backward pass - the errors δ_ki are computed and then backpropagated to give the errors
s_ij. Both sets of errors are then used to compute the gradients for the updates.
The main advantage of backpropagation is its local nature, thanks to which it can be efficiently
implemented on a parallel architecture computer.
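The two-pass procedure above can be sketched on the smallest possible network: one input, one sigmoid hidden unit, one identity output unit, trained with the squared error of equation (2). The function names and the finite-difference check are my own additions for illustration:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, w1, w2):
    """Forward pass: the weights are fixed, the prediction is computed."""
    z = sigmoid(w1 * x)   # hidden unit activation
    return z, w2 * z      # identity output unit

def gradients(x, y, w1, w2):
    """Backward pass for J = (y - f(x))^2 on a 1-1-1 network.

    The output error is computed first, then backpropagated through
    the hidden unit; both errors yield the gradients for the updates.
    """
    z, y_hat = forward(x, w1, w2)
    delta_out = -2.0 * (y - y_hat)               # dJ/dy_hat
    g_w2 = delta_out * z                         # gradient for output weight
    delta_hidden = delta_out * w2 * z * (1 - z)  # backpropagated error
    g_w1 = delta_hidden * x                      # gradient for hidden weight
    return g_w1, g_w2

def loss(x, y, w1, w2):
    return (y - forward(x, w1, w2)[1]) ** 2

# Sanity check: compare against a numerical gradient (finite differences).
x, y, w1, w2 = 0.8, 1.0, 0.5, -0.3
g1, g2 = gradients(x, y, w1, w2)
eps = 1e-6
num_g1 = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
num_g2 = (loss(x, y, w1, w2 + eps) - loss(x, y, w1, w2 - eps)) / (2 * eps)
```

The analytic gradients from the backward pass should agree with the numerical ones to within the finite-difference error, which is the standard way to check a backpropagation implementation.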
References
[1] Simon Haykin. Neural Networks. A Comprehensive Foundation. Prentice Hall, second edi-
tion, 2005.
[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. Book in preparation
for MIT Press, 2016.
[3] David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge
University Press, 2003.