Learning Disjunctive Sets of Rules
Learning Disjunctive Sets of Rules
Machine Learning 1
Sequential Covering Algorithm
Machine Learning 2
Sequential Covering Algorithm
• The sequential covering algorithm for learning a disjunctive set of rules.
• The covering algorithm learns rules until it can no longer learn a rule
whose performance is above the given Threshold.
Machine Learning 3
LEARN-ONE-RULE
• The search for rule preconditions as LEARN-ONE-RULE proceeds
from general to specific.
• At each step, the preconditions of the best rule are specialized in all
possible ways.
• Rule postconditions are determined by the examples found to satisfy the
preconditions.
Machine Learning 4
LEARN-ONE-RULE
Machine Learning 5
LEARN-ONE-RULE
Pos positive Examples
Neg negative Examples
while Pos is not empty do
Learn a NewRule
– NewRule most general rule possible
– NewRuleNeg Neg
– while NewRuleNeg is not empty do
Add a new literal to specialize NewRule
– Candidate literals generate candidates
– Best literal argmax LCandidate literals Performance(SpecializeRule(NewRule, L))
– add Best literal to NewRule preconditions
– NewRuleNeg subset of NewRuleNeg that satisfies NewRule preconditions
– Learned rules Learned rules + NewRule
– Pos Pos – { members of Pos covered by NewRule }
Return Learned rules
Machine Learning 6
Performance in LEARN-ONE-RULE
• Relative frequency
– Let n denote the number of examples the rule matches and let nc denote the
number of these that it classifies correctly.
– The relative frequency estimate of rule performance is nc / n
• Entropy
– Let S be the set of examples that match the rule preconditions.
– Entropy measures the uniformity of the target function values for this set of
examples.
– We take the negative of the entropy so that better rules will have higher scores.
– where c is the number of distinct values the target function may take on, pi is the
proportion of examples from S for which the target function takes on the ith value.
Machine Learning 7
Learning First Order Rules
• The problem is that propositional representations offer no general way
to describe the essential relations among the values of the attributes.
• In contrast, a program using first-order representations could learn the
following general rule:
Machine Learning 8