
Learning Disjunctive Sets of Rules

The document discusses two methods for learning disjunctive sets of rules from data: 1) learn a decision tree and convert it to rules, and 2) use a sequential covering algorithm. The sequential covering algorithm learns one rule at a time to cover positive examples, removes those covered examples, and repeats until no further rules can be learned above a threshold. It describes the LEARN-ONE-RULE subroutine that searches for the best rule at each step by starting general and specializing the rule preconditions until no negative examples remain. Performance of the rules can be evaluated using relative frequency or entropy measures. First order logic is also discussed for representing more general relations between attributes.

Uploaded by

Ganesh Kumar
Copyright © All Rights Reserved

Learning Disjunctive Sets of Rules

• Method 1. Learn decision tree, convert to rules

• Method 2. Sequential covering algorithm


i) Learn one rule with high accuracy, any coverage
ii) Remove positive examples covered by this rule
iii) Repeat

Machine Learning 1
Sequential Covering Algorithm

• The sequential covering algorithm learns a disjunctive set of rules, one rule at a time.

• LEARN-ONE-RULE returns a single rule that covers at least some of the Examples.

• PERFORMANCE is a user-provided subroutine to evaluate rule quality.

• The covering algorithm learns rules until it can no longer learn a rule
whose performance is above the given Threshold.
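The loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm as given on the original slide: the rule representation (a dict of attribute tests plus a predicted value), the helper names `covers`, `relative_frequency` and `learn_single_test_rule`, and the toy weather data are all hypothetical. The stand-in for LEARN-ONE-RULE only considers rules with a single attribute test.

```python
from typing import Dict, List, Optional

Example = Dict[str, object]  # attribute -> value, target attribute included
Rule = Dict[str, object]     # {"pre": {...}, "target": name, "post": value}


def covers(rule: Rule, example: Example) -> bool:
    """True when the example satisfies every precondition of the rule."""
    return all(example.get(a) == v for a, v in rule["pre"].items())


def relative_frequency(rule: Rule, examples: List[Example]) -> float:
    """Assumed PERFORMANCE: fraction of matched examples classified correctly."""
    matched = [e for e in examples if covers(rule, e)]
    if not matched:
        return 0.0
    correct = sum(e[rule["target"]] == rule["post"] for e in matched)
    return correct / len(matched)


def learn_single_test_rule(target: str, examples: List[Example]) -> Optional[Rule]:
    """Toy stand-in for LEARN-ONE-RULE: best rule with one attribute test."""
    best, best_score = None, -1.0
    for e in examples:
        for attr, value in e.items():
            if attr == target:
                continue
            rule = {"pre": {attr: value}, "target": target, "post": True}
            score = relative_frequency(rule, examples)
            if score > best_score:
                best, best_score = rule, score
    return best  # None when there are no examples left


def sequential_covering(target, examples, learn_one_rule, performance, threshold):
    """Learn rules until no rule scores above the threshold."""
    learned = []
    remaining = list(examples)
    rule = learn_one_rule(target, remaining)
    while rule is not None and performance(rule, remaining) > threshold:
        learned.append(rule)
        # Remove the examples that the new rule classifies correctly.
        remaining = [e for e in remaining
                     if not (covers(rule, e) and e[target] == rule["post"])]
        rule = learn_one_rule(target, remaining)
    # Order the rules by their performance over the full example set.
    learned.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned


data = [
    {"sky": "sunny", "wind": "weak", "play": True},
    {"sky": "sunny", "wind": "strong", "play": True},
    {"sky": "rainy", "wind": "weak", "play": True},
    {"sky": "rainy", "wind": "strong", "play": False},
]
rules = sequential_covering("play", data, learn_single_test_rule,
                            relative_frequency, threshold=0.9)
for r in rules:
    print("IF", r["pre"], "THEN play =", r["post"])
```

Passing `learn_one_rule` and `performance` as arguments mirrors the slide's point that PERFORMANCE is user-provided. The sketch assumes a non-negative threshold, so every accepted rule removes at least one example and the loop terminates.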

LEARN-ONE-RULE
• LEARN-ONE-RULE searches for rule preconditions from general to specific.
• At each step, the preconditions of the best rule are specialized in all
possible ways.
• Rule postconditions are determined by the examples found to satisfy the
preconditions.
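One specialization step can be illustrated as follows. This is a sketch under assumptions: preconditions are modelled as a dict of attribute=value tests, "all possible ways" is taken to mean adding one test on an attribute the rule does not constrain yet, and the postcondition is taken to be the most common target value among matching examples. The names `specializations`, `postcondition` and the toy data are hypothetical.

```python
from collections import Counter


def specializations(preconditions, attribute_values):
    """All rules obtained by adding one test on a not-yet-constrained attribute."""
    return [
        {**preconditions, attr: value}
        for attr, values in attribute_values.items()
        if attr not in preconditions
        for value in values
    ]


def postcondition(preconditions, examples, target):
    """Most common target value among the examples matching the preconditions."""
    matched = [e[target] for e in examples
               if all(e[a] == v for a, v in preconditions.items())]
    return Counter(matched).most_common(1)[0][0] if matched else None


attribute_values = {"sky": ["sunny", "rainy"], "wind": ["weak", "strong"]}
examples = [
    {"sky": "sunny", "wind": "weak", "play": True},
    {"sky": "sunny", "wind": "strong", "play": True},
    {"sky": "rainy", "wind": "weak", "play": True},
    {"sky": "rainy", "wind": "strong", "play": False},
]

# The most general rule has no preconditions; one step yields four candidates.
print(specializations({}, attribute_values))
# Specializing {"sky": "sunny"} can only add a test on "wind".
print(specializations({"sky": "sunny"}, attribute_values))
print(postcondition({"sky": "rainy", "wind": "strong"}, examples, "play"))
```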

LEARN-ONE-RULE
Pos  positive Examples
Neg  negative Examples
while Pos is not empty do
Learn a NewRule
– NewRule  most general rule possible
– NewRuleNeg  Neg
– while NewRuleNeg is not empty do
Add a new literal to specialize NewRule
– Candidate literals  generate candidates
– Best literal  argmax LCandidate literals Performance(SpecializeRule(NewRule, L))
– add Best literal to NewRule preconditions
– NewRuleNeg  subset of NewRuleNeg that satisfies NewRule preconditions
– Learned rules  Learned rules + NewRule
– Pos  Pos – { members of Pos covered by NewRule }
Return Learned rules
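The pseudocode above can be rendered in Python roughly as follows. This is a sketch under assumptions rather than a definitive implementation: literals are attribute=value tests, candidate literals are drawn from the positive examples the rule still covers, and Performance is the relative frequency of positives among covered examples. The two `break` guards are additions that do not appear in the pseudocode; they stop the loops when no further literal can be added. All names and the toy data are hypothetical.

```python
def rule_covers(preconditions, example):
    """True when the example satisfies every attribute=value test."""
    return all(example[a] == v for a, v in preconditions.items())


def performance(preconditions, pos, neg):
    """Relative frequency of positives among the covered examples."""
    p = sum(rule_covers(preconditions, e) for e in pos)
    n = sum(rule_covers(preconditions, e) for e in neg)
    return p / (p + n) if p + n else 0.0


def learn_rules(pos, neg, attributes):
    learned_rules = []
    pos = list(pos)
    while pos:
        new_rule = {}              # most general rule: no preconditions
        new_rule_neg = list(neg)
        while new_rule_neg:
            # Candidate literals: unused attributes with values seen in the
            # positives that the current rule still covers (order-preserving).
            candidates = list(dict.fromkeys(
                (a, e[a]) for e in pos if rule_covers(new_rule, e)
                for a in attributes if a not in new_rule))
            if not candidates:     # guard: nothing left to specialize on
                break
            attr, value = max(
                candidates,
                key=lambda lit: performance({**new_rule, lit[0]: lit[1]}, pos, neg))
            new_rule[attr] = value
            # Keep only the negatives that the specialized rule still covers.
            new_rule_neg = [e for e in new_rule_neg if rule_covers(new_rule, e)]
        if not any(rule_covers(new_rule, e) for e in pos):
            break                  # guard: the rule covers no remaining positive
        learned_rules.append(new_rule)
        pos = [e for e in pos if not rule_covers(new_rule, e)]
    return learned_rules


positives = [
    {"sky": "sunny", "wind": "weak"},
    {"sky": "sunny", "wind": "strong"},
    {"sky": "rainy", "wind": "weak"},
]
negatives = [{"sky": "rainy", "wind": "strong"}]
learned = learn_rules(positives, negatives, ["sky", "wind"])
print(learned)
```

On this toy data the first rule tests `sky` and covers two positives. The second rule is learned from the one positive that remains.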
Performance in LEARN-ONE-RULE
• Relative frequency
– Let n denote the number of examples the rule matches and let n_c denote the
number of these that it classifies correctly.
– The relative frequency estimate of rule performance is n_c / n
• Entropy
– Let S be the set of examples that match the rule preconditions.
– Entropy measures the uniformity of the target function values for this set of
examples.
– We take the negative of the entropy so that better rules will have higher scores:

$$-\mathrm{Entropy}(S) = \sum_{i=1}^{c} p_i \log_2 p_i$$

– where c is the number of distinct values the target function may take on, and p_i is the
proportion of examples from S for which the target function takes on the i-th value.
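Both measures are short to compute. The sketch below assumes the negative entropy is the sum over target values of p_i · log2(p_i), which is the standard entropy with its sign flipped. Returning 0.0 for an empty set of matched examples is an assumption made here for convenience. The function names are hypothetical.

```python
import math
from collections import Counter


def relative_frequency(n_correct, n_matched):
    """n_c / n: share of matched examples that the rule classifies correctly."""
    return n_correct / n_matched if n_matched else 0.0


def negative_entropy(target_values):
    """Sum of p_i * log2(p_i) over the distinct target values in S.

    The result is at most 0; a value closer to 0 means the matched examples
    agree more on the target value, so higher scores indicate better rules.
    """
    total = len(target_values)
    if total == 0:
        return 0.0  # assumption: score an empty match set as 0
    counts = Counter(target_values)
    return sum((k / total) * math.log2(k / total) for k in counts.values())


print(relative_frequency(3, 4))
print(negative_entropy(["+", "+", "+", "+"]))  # all matched examples agree
print(negative_entropy(["+", "+", "-", "-"]))  # evenly split between two values
```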
Learning First Order Rules
• Propositional representations offer no general way to describe the
essential relations among the values of the attributes.
• In contrast, a program using first-order representations could learn the
following general rule:

IF Father(y, x) and Female(y), THEN Daughter(x, y)

• where x and y are variables that can be bound to any person.
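To show how the variables bind, the rule can be applied to a small set of facts. The sketch assumes the reading that Father(y, x) means "the father of y is x" and Daughter(x, y) means "the daughter of x is y". This argument order is an assumption that the slide does not state. The facts and the function name `infer_daughters` are made up for illustration.

```python
# Hypothetical facts. Each pair (y, x) in `father` states Father(y, x),
# read here as "the father of y is x" (an assumed argument order).
father = {("Sharon", "Bob"), ("Tom", "Bob"), ("Bob", "Victor")}
female = {"Sharon"}


def infer_daughters(father, female):
    """Apply IF Father(y, x) and Female(y) THEN Daughter(x, y) to all bindings."""
    # Daughter(x, y) is read as "the daughter of x is y" (assumed).
    return {(x, y) for (y, x) in father if y in female}


daughters = infer_daughters(father, female)
print(daughters)
```

Because x and y are variables, the same rule applies to any pair of people in the facts. A propositional rule would need a separate test for each named individual.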
