AI Answers
Answer: Perceptron P gets the first instance wrong, with an error of |(2*1)+(1*2)-4.5| = 0.5, and the second
instance wrong, with an error of |(2*2)+(1*1)-4.5| = 0.5. The total error is therefore 1.0.
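The arithmetic above can be checked with a short sketch. The weights (2, 1), threshold 4.5, and instances (1, 2) and (2, 1) are read off the worked answer; the original problem statement is not reproduced here.

```python
# Error of perceptron P on each instance: |w . x - T|,
# using the weights, threshold, and instances from the answer above.
weights = (2, 1)
threshold = 4.5
instances = [(1, 2), (2, 1)]

total_error = 0.0
for x in instances:
    activation = sum(w * xi for w, xi in zip(weights, x))
    error = abs(activation - threshold)  # distance from the threshold
    total_error += error
    print(f"instance {x}: activation {activation}, error {error}")

print(f"total error: {total_error}")  # 1.0, as in the answer
```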
B. Find a set of weights and a threshold that categorizes all this data correctly. (Hint: Sketch a graph).
Answer: wA=0, wB=1, T=1.5 will do fine.
Problem 6 (10 points)
The first choice in designing a learning algorithm is the choice of hypothesis space. What are the advantages
and disadvantages of a large hypothesis space versus a small one?
Answer: The advantage of a large hypothesis set is that it is more likely to contain the true theory of the
classification, or something close to the true theory. The disadvantages of a large hypothesis set are, first, that
it is harder to search; second, that it is more susceptible to overfitting.
Problem 7 (10 points)
You have developed a wonderful new theory of classification learning, called "the NYU algorithm". You input
data, it outputs a formula. You decide to test your theory on the problem of predicting in January who will win
a political election in November given attributes like office, region, party, incumbency, success in raising
funds to date, current poll numbers, etc. You have a database with all this data for all American elections over
the last twenty years. You ask a research assistant to test the NYU algorithm on this data. He comes back with
the good news that when he ran the NYU algorithm over your data set, it output a formula that gives the
right answer 80% of the time on the data.
A. What further tests should you run before you publish a paper in a political science journal, claiming that you
have found a formula that solves the problem of predicting political elections?
B. What further tests should you run before you publish a paper in a machine learning journal, claiming that
you have a promising new technique for machine learning?
Answer: The test that has been run, using the same data for training and testing, is pretty much worthless, as it
can be passed at 100% by a learning algorithm that just memorizes the data. Since the data set is very large,
there should be no problems running tests that separate out the test set from the training set.
There are also two other types of tests that should be run. First is a test to check whether the NYU formula is
actually more accurate than simpler formulas that could be found by other learning algorithms, such as 1R. It
could well be, for example, that the single rule "Predict that the incumbent will win, if he is running; else guess
randomly" will do just as well. You would want to check this before publishing either the formula or the
algorithm.
Second are tests to see how well this algorithm runs on other data sets, or on small subsets of this data set.
These are of no importance to the political science journal, but would be important in the machine learning
journal.
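The evaluation discipline described above can be sketched in a few lines: hold out a test set before any evaluation, then compare against a 1R-style single rule. The data below is synthetic and the single attribute (the incumbent's poll lead) is illustrative; it is not drawn from the election database in the problem.

```python
import random

# Synthetic election records: (poll_lead, winner_is_incumbent).
# The labels follow the poll lead plus noise, so a simple rule does well.
random.seed(0)
examples = []
for _ in range(1000):
    poll_lead = random.gauss(0, 5)   # incumbent's lead in January polls
    noise = random.gauss(0, 3)       # everything the polls miss
    examples.append((poll_lead, (poll_lead + noise) > 0))

# 1. Separate the test set from the training set before evaluating anything.
random.shuffle(examples)
split = int(0.8 * len(examples))
train, test = examples[:split], examples[split:]

def accuracy(rule, data):
    return sum(rule(x) == y for x, y in data) / len(data)

# 2. Check a simple 1R-style baseline before claiming the learned
#    formula is an advance: here, "predict whoever leads the polls".
baseline = lambda poll_lead: poll_lead > 0
baseline_acc = accuracy(baseline, test)
print(f"baseline accuracy on held-out data: {baseline_acc:.2f}")
```

If the learned formula's held-out accuracy does not beat this kind of baseline, neither the political-science claim nor the machine-learning claim is supported.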
Problem 8 (10 points)
Consider the following inference: Given the rules and facts,
R1: If X is a close relative of Y and Y is a close relative of Z then X is acquainted with Z.
R2: If X is a parent of Y, then X is a close relative of Y.
R3: If X is married to Y, then X is a close relative of Y.
F1: Sam is a parent of Mike.
F2: Mike is married to Alice.
A. Express rules R1-R3 and facts F1,F2 in the Datalog representation using the following primitives:
c(X,Y) --- X is a close relative of Y.
p(X,Y) --- X is a parent of Y.
m(X,Y) --- X is married to Y.
a(X,Y) --- X is acquainted with Y.
sam, mike, alice : Constants.
Answer:
R1: a(X,Z) :- c(X,Y),c(Y,Z).
R2: c(X,Y) :- p(X,Y).
R3: c(X,Y) :- m(X,Y).
F1: p(sam,mike).
F2: m(mike,alice).
B. Explain how Datalog can carry out the inference. You may use either backward or forward chaining.
Answer: In backward chaining you start with the goal G0 ?- a(sam,alice), and proceed as follows:
G0 ?- a(sam,alice) matches R1 giving G1
G1 ?- c(sam,Y1), c(Y1,alice); its first conjunct generates G2
G2 ?- c(sam,Y1) matches R2 giving G3
G3 ?- p(sam,Y1) matches F1 with Y1=mike. G3 succeeds.
G2 succeeds
G1 generates G4:
G4 ?- c(mike,alice) matches R2 giving G5
G5 ?- p(mike,alice) fails. Return to G4
G4 matches R3 giving G6
G6 ?- m(mike,alice) matches F2. G6 succeeds.
G4 succeeds
G1 succeeds
G0 succeeds.
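The trace above can be reproduced mechanically with a small backward-chaining interpreter. The sketch below is illustrative Python, not part of the original answer: terms are tuples whose first element is the predicate, variables are capitalized strings, and rule variables are renamed on each use to keep goals standardized apart.

```python
import itertools

# The program from part A: (head, body) pairs; facts have empty bodies.
RULES = [
    (('a', 'X', 'Z'), [('c', 'X', 'Y'), ('c', 'Y', 'Z')]),  # R1
    (('c', 'X', 'Y'), [('p', 'X', 'Y')]),                   # R2
    (('c', 'X', 'Y'), [('m', 'X', 'Y')]),                   # R3
    (('p', 'sam', 'mike'), []),                             # F1
    (('m', 'mike', 'alice'), []),                           # F2
]

counter = itertools.count()

def is_var(t):
    return t[:1].isupper()

def rename(rule, n):
    # Standardize apart: give this use of the rule fresh variable names.
    head, body = rule
    f = lambda term: tuple(t + str(n) if is_var(t) else t for t in term)
    return f(head), [f(b) for b in body]

def walk(t, env):
    while is_var(t) and t in env:
        t = env[t]
    return t

def unify(a, b, env):
    env = dict(env)
    for x, y in zip(a, b):
        x, y = walk(x, env), walk(y, env)
        if x == y:
            continue
        if is_var(x):
            env[x] = y
        elif is_var(y):
            env[y] = x
        else:
            return None  # constant clash
    return env

def solve(goals, env):
    # Backward chaining: prove the first goal, then the rest.
    if not goals:
        yield env
        return
    goal, rest = goals[0], goals[1:]
    for rule in RULES:
        head, body = rename(rule, next(counter))
        if head[0] != goal[0] or len(head) != len(goal):
            continue
        env2 = unify(goal, head, env)
        if env2 is not None:
            yield from solve(body + rest, env2)

# The query from part B: does a(sam,alice) follow?
print(any(solve([('a', 'sam', 'alice')], {})))  # True
```

Running the query explores the same subgoals G1 through G6 as the hand trace: c(sam,Y) succeeds via R2 and F1 with Y = mike, and c(mike,alice) fails via R2 but succeeds via R3 and F2.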