PPT Lecture 1 & 2
Definition: A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
1. The first design choice we face is to choose the type of training experience from which our system
will learn.
One key attribute is whether the training experience provides direct or indirect feedback regarding
the choices made by the performance system.
With indirect feedback, the learner faces an additional problem of credit assignment: determining the degree
to which each move in the sequence deserves credit or blame for the final outcome.
2. A second important attribute of the training experience is the degree to which the learner controls the
sequence of training examples. The learner might rely on the teacher to select informative board
states and to provide the correct move for each. Alternatively, the learner might itself propose board
states that it finds particularly confusing and ask the teacher for the correct move. Or the learner
may have complete control over both the board states and (indirect) training classifications, as it does
when it learns by playing against itself with no teacher.
3. A third important attribute of the training experience is how well it represents the distribution of
examples over which the final system performance P must be measured. In general, learning is most
reliable when the training examples follow a distribution similar to that of future test examples.
In our checkers learning scenario, the performance metric P is the percent of games the system wins
in the world tournament. If its training experience E consists only of games played against itself,
there is an obvious danger that this training experience might not be fully representative of the
distribution of situations over which it will later be tested.
In order to complete the design of the learning system, we must now choose the target function to be learned.
Let us call this target function V and use the notation V : B → R to denote that V maps
any legal board state from the set B to some real value (we use R to denote the set
of real numbers).
What exactly should be the value of the target function V for any given board state? Of course any
evaluation function that assigns higher scores to better board states will do.
Let us therefore define the target value V(b) for an arbitrary board state b in B, as follows:
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = -100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be
achieved starting from b and playing optimally until the end of the game (assuming the opponent also plays optimally).
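The ideal target function V in Mitchell's checkers example scores won final states +100, lost final states -100, and draws 0, and gives a non-final state the value of the best final state reachable under optimal play by both sides. This can be sketched as minimax over final-state values; the helpers `is_final`, `outcome`, and `moves` below are hypothetical stand-ins, not part of the original text:

```python
def V(board, to_move, is_final, outcome, moves):
    """Ideal target value V(b); to_move is +1 for black, -1 for red."""
    if is_final(board):
        # Final states are scored directly from the game outcome.
        return {"win": 100, "loss": -100, "draw": 0}[outcome(board)]
    values = [V(nxt, -to_move, is_final, outcome, moves)
              for nxt in moves(board, to_move)]
    # Black (the learner) maximizes; the opponent minimizes.
    return max(values) if to_move == 1 else min(values)

# Toy two-move game: from "root", black can reach final state "a" (a win)
# or "b" (a draw), so the optimal value of "root" is 100.
is_final = lambda b: b in ("a", "b")
outcome  = lambda b: "win" if b == "a" else "draw"
moves    = lambda b, player: ["a", "b"]
print(V("root", 1, is_final, outcome, moves))  # 100
```

Note that this definition is nonoperational: evaluating it exactly requires searching to the end of the game, which is precisely why the program learns an approximation instead.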
In the current discussion we will use the symbol V̂ (read "V hat") to refer to the function that is actually learned by
our program, to distinguish it from the ideal target function V.
Now that we have specified the ideal target function V, we must choose a representation that the learning
program will use to describe the function V̂ that it will learn. As with earlier design choices, we
again have many options. We could, for example, allow the program to represent V̂ using a large table with
a distinct entry specifying the value for each distinct board state. Or we could allow it to represent V̂ using a
collection of rules that match against features of the board state, or a quadratic polynomial function of
predefined board features, or an artificial neural network. In general, this choice of representation
involves a crucial tradeoff. On one hand, we wish to pick a very expressive representation to allow
representing as close an approximation as possible to the ideal target function V.
On the other hand, the more expressive the representation, the more training data the program will require
in order to choose among the alternative hypotheses it can represent. To keep the discussion brief, let us
choose a simple representation:
for any given board state, the function V̂ will be calculated as a linear combination
of the following board features:
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x6: the number of red pieces threatened by black
Thus, our learning program will represent V̂(b) as a linear function of the form

V̂(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6

where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm. Learned values for the weights
w1 through w6 will determine the relative importance of the various board features in determining the
value of the board, whereas the weight w0 will provide an additive constant to the board value.
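The linear evaluation function above is straightforward to compute once the board features are known. A minimal sketch, where the weight and feature values are made-up illustration numbers (a real system would extract x1..x6 from an actual board state and learn the weights):

```python
def v_hat(weights, features):
    """Linear combination V̂(b) = w0 + w1*x1 + ... + w6*x6.

    weights[0] is the additive constant w0; weights[1:] pair with the
    board features x1..x6 in order.
    """
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(ws, features))

# Arbitrary example values, NOT learned weights or a real board:
weights  = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]   # w0..w6
features = [12, 12, 0, 0, 1, 1]                      # x1..x6

print(v_hat(weights, features))  # 0.5 + 12 - 12 + 0 - 0 - 0.5 + 0.5 = 0.5
```

During learning, only the seven numbers in `weights` change; the form of V̂ stays fixed, which is what makes this a very restricted (but easily trainable) hypothesis space.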