Data Mining Presentation
Mohammadi Zanjireh
◦ Slice
◦ Dice
◦ OLAP
◦ OLTP
Which Technologies Are Used?
Statistics
Observations:

Id    Name    Grade
1001  Ali     17.12
1002  Reza    13.89
1003  Maryam  16.02
1004  Hasan   15.45
Attribute types:
◦ Nominal: Subject, Occupation.
◦ Numeric
Interval_scaled.
Ratio_scaled.
Discrete vs. Continuous attributes.
Mean: x̄ = (1/n) Σ xᵢ
Median: the middle value of the sorted data (the mean of the two middle values when n is even).
Mode: the value that occurs most frequently in the data.
Variance and Standard Deviation: σ² = (1/n) Σ (xᵢ − x̄)² (the sample variance divides by n − 1), σ = √σ²
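As a quick illustration (not part of the original slides), the following Python sketch computes these statistics; the grade values come from the observation table above and the binning data reused for the mode is illustrative.

```python
from collections import Counter
from statistics import mean, median, pstdev, pvariance

grades = [17.12, 13.89, 16.02, 15.45]          # Grade column of the observation table above

print("Mean:", mean(grades))                   # arithmetic average
print("Median:", median(grades))               # middle value (mean of the two middle values here)
print("Variance:", pvariance(grades))          # population variance (1/n); use variance() for 1/(n-1)
print("Std dev:", pstdev(grades))              # square root of the variance

values = [4, 8, 15, 21, 21, 24, 25, 28, 34]    # data reused later in the binning example
print("Mode:", Counter(values).most_common(1)[0][0])   # most frequent value (21)
```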
Data Matrix:
Student _1
Student_i
Student_n
Dissimilarity Matrix:
Similarity Matrix:
Sim(i,j)=1-d(i,j)
Proximity measures for Nominal attributes
Example:
Id Subject Birth_City Living_City Eye_Colour
1 Computer Teh Teh Black
2 Electricity Teh Kar Brown
3 Mechanic Qaz Qaz Brown
4 Computer Kar Qaz Green
Dissimilarity Matrix =
0.00
0.75  0.00
1.00  0.75  0.00
0.75  1.00  0.75  0.00
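A minimal Python sketch (an assumption, not the slides' own code) of the simple-matching dissimilarity d(i, j) = (p − m)/p that yields the matrix above, where p is the number of attributes and m the number of matches:

```python
# Records of the nominal example: (Subject, Birth_City, Living_City, Eye_Colour)
records = [
    ("Computer",    "Teh", "Teh", "Black"),
    ("Electricity", "Teh", "Kar", "Brown"),
    ("Mechanic",    "Qaz", "Qaz", "Brown"),
    ("Computer",    "Kar", "Qaz", "Green"),
]

def nominal_dissimilarity(a, b):
    p = len(a)
    m = sum(1 for x, y in zip(a, b) if x == y)   # number of matching attributes
    return (p - m) / p

for i in range(len(records)):
    print([round(nominal_dissimilarity(records[i], records[j]), 2) for j in range(i + 1)])
# Lower triangle: [0.0], [0.75, 0.0], [1.0, 0.75, 0.0], [0.75, 1.0, 0.75, 0.0]
```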
Manhattan Distance: d(i, j) = Σ_k |x_ik − x_jk|
Euclidean Distance: d(i, j) = √( Σ_k (x_ik − x_jk)² )
Supremum Distance: d(i, j) = max_k |x_ik − x_jk|
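A small Python sketch (not from the slides) of the three distances above, applied to two illustrative points:

```python
import math

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def supremum(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, 3.5), (3.0, 2.0)      # illustrative 2-D points
print(manhattan(x, y))             # 3.5
print(euclidean(x, y))             # 2.5
print(supremum(x, y))              # 2.0
```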
Proximity measures for Ordinal attributes
Small → 1, Medium → 2, Large → 3
Example:
Degree={Diploma, Undergraduate, Master, PhD}
Drink_size={Small, Medium, Large}
Id Degree Drink_size
1 Undergraduate Medium
2 PhD Small
3 Diploma Medium
4 Undergraduate Large
Example: each rank r is mapped to (r − 1) / (M − 1), where M is the number of ordinal states:
Degree={Diploma, Undergraduate, Master, PhD}
Drink_size={Small, Medium, Large}
Id Degree Drink_size
1 0.33 0.50
2 1.00 0.00
3 0.00 0.50
4 0.33 1.00
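A short Python sketch (not from the slides) reproducing the normalised ranks above via (rank − 1)/(M − 1):

```python
degree_order = ["Diploma", "Undergraduate", "Master", "PhD"]
drink_order = ["Small", "Medium", "Large"]

def ordinal_to_unit(value, order):
    rank = order.index(value) + 1                  # ranks start at 1
    return (rank - 1) / (len(order) - 1)

rows = [("Undergraduate", "Medium"), ("PhD", "Small"),
        ("Diploma", "Medium"), ("Undergraduate", "Large")]
for degree, size in rows:
    print(round(ordinal_to_unit(degree, degree_order), 2),
          round(ordinal_to_unit(size, drink_order), 2))
# 0.33 0.5 / 1.0 0.0 / 0.0 0.5 / 0.33 1.0
```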
Normalising:
Id Grade
1 30
2 52
3 84
4 45
5 25
6 20
7 91
8 65
9 42
10 32
Normalising:
Id Grade Grade-min
1 30 10
2 52 32
3 84 64
4 45 25
5 25 5
6 20 0
7 91 71
8 65 45
9 42 22
10 32 12
Normalising:
Id  Grade  Grade-min  (Grade-min)/(max-min)
1 30 10 0.14
2 52 32 0.45
3 84 64 0.90
4 45 25 0.35
5 25 5 0.07
6 20 0 0.00
7 91 71 1.00
8 65 45 0.63
9 42 22 0.31
10 32 12 0.17
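A short Python sketch (not from the slides) of min-max normalisation reproducing the Grade example above (min = 20, max = 91):

```python
grades = [30, 52, 84, 45, 25, 20, 91, 65, 42, 32]
lo, hi = min(grades), max(grades)

for g in grades:
    print(g, g - lo, round((g - lo) / (hi - lo), 2))
# e.g. 30 -> 10 -> 0.14, 91 -> 71 -> 1.00
```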
Proximity measures for mixed types:
Examples:
Id Test_1(nominal) Test_2(ordinal) Test_3(numeric)
1 Code A Excellent 45
2 Code B Fair 22
3 Code C Good 64
4 Code A Excellent 28
Examples:
Id Test_1(nominal) Test_2(ordinal) Test_3(numeric)
1 Code A 1 0.55
2 Code B 0 0.00
3 Code C 0.5 1.00
4 Code A 1 0.14
Dissimilarity Matrix =
0.00
0.85  0.00
0.65  0.83  0.00
0.13  0.71  0.79  0.00
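A Python sketch (an assumption about the exact procedure) that averages the per-attribute dissimilarities of the mixed-type example; it matches the matrix above up to rounding (0.14 vs 0.13 for d(1,4)):

```python
# (Test_1 nominal code, Test_2 ordinal rank in [0,1], Test_3 min-max normalised value)
rows = [
    ("Code A", 1.0, 0.55),
    ("Code B", 0.0, 0.00),
    ("Code C", 0.5, 1.00),
    ("Code A", 1.0, 0.14),
]

def mixed_dissimilarity(a, b):
    d_nominal = 0.0 if a[0] == b[0] else 1.0     # simple matching
    d_ordinal = abs(a[1] - b[1])                 # ranks already in [0, 1]
    d_numeric = abs(a[2] - b[2])                 # min-max normalised values
    return (d_nominal + d_ordinal + d_numeric) / 3

for i in range(len(rows)):
    print([round(mixed_dissimilarity(rows[i], rows[j]), 2) for j in range(i + 1)])
# [0.0], [0.85, 0.0], [0.65, 0.83, 0.0], [0.14, 0.71, 0.79, 0.0]
```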
Exercise:
Id Test_1(nominal) Test_2(ordinal) Test_3(numeric)

Similarity Matrix =
1.00
0.94  1.00
0.17  0.12  1.00
0.07  0.17  0.23  1.00
3.1-Data Cleaning.
3.2-Data Integration.
3.3-Data Reduction.
3.4-Data Transformation.
3.1- Data Cleaning
3.1.1-Missing values
o Ignore the tuple.
o Fill in the missing value manually.
o Use a global constant.
o Use the attribute mean or median.
o Use the class mean or median.
3.1.2-Noise
o Binning
o Smoothing by bin means.
o Smoothing by bin boundaries.
o Outlier analysis.
Example:
4, 8, 15, 21, 21, 24, 25, 28, 34
Partition into equal-depth bins: bin1: 4, 8, 15; bin2: 21, 21, 24; bin3: 25, 28, 34
Smoothing by bin means: bin1: 9, 9, 9; bin2: 22, 22, 22; bin3: 29, 29, 29
Smoothing by bin boundaries: bin1: 4, 4, 15; bin2: 21, 21, 24; bin3: 25, 25, 34
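A Python sketch (not from the slides) of equal-depth binning with both smoothing variants for the data above:

```python
data = [4, 8, 15, 21, 21, 24, 25, 28, 34]
depth = 3
bins = [data[i:i + depth] for i in range(0, len(data), depth)]

# replace each value by its bin mean
by_means = [[round(sum(b) / len(b)) for _ in b] for b in bins]
# replace each value by the closest bin boundary (min or max of its bin)
by_boundaries = [[min(b) if v - min(b) <= max(b) - v else max(b) for v in b] for b in bins]

print(bins)            # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(by_means)        # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(by_boundaries)   # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```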
3.3.2-Numerosity reduction
o Sampling.
o Clustering.
3.4- Data Transformation
Normalisation:
Min-max normalisation: v′ = (v − min) / (max − min)
Z-score normalisation: v′ = (v − x̄) / σ
Example:
3, 4, 4, 5, 9
x̄ = 5.00, σ = 2.34
Z-scores: −0.85, −0.43, −0.43, 0.00, 1.71
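A Python sketch (not from the slides) of z-score normalisation for this example; the sample standard deviation (n − 1) matches the quoted σ ≈ 2.34:

```python
from statistics import mean, stdev

values = [3, 4, 4, 5, 9]
mu, sigma = mean(values), stdev(values)                  # sample std dev (n - 1)
print(round(mu, 2), round(sigma, 2))                     # 5.0 2.35 (the slide rounds σ to 2.34)
print([round((v - mu) / sigma, 2) for v in values])      # [-0.85, -0.43, -0.43, 0.0, 1.71]
```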
• Support
Support(A=>B)=P(A∪B)
• Confidence
Confidence(A => B) = P(B|A) = Support(A ∪ B) / Support(A) = Support_Count(A ∪ B) / Support_Count(A)
Mining Association Rules
1. Find all frequent itemsets.
2. Generate strong association rules from the frequent itemsets.
• Example:
D = {T1, T2, T3, T4, T5, T6, T7, T8}
T1 = {tomato, cucumber, olive, onion, parsley}
T2 = {tomato, cucumber, parsley}
T3 = {tomato, cucumber, salt, bread, onion, parsley}
T4 = {tomato, cucumber, bread, onion}
T5 = {tomato, salt, onion}
T6 = {bread, cheese}
T7 = {tomato, onion, cucumber}
T8 = {bread, butter}
Itemset          Sup_Count
{I1, I2, I3}     2
{I1, I2, I5}     2

6. L3:
Itemset          Sup_Count
{I1, I2, I3}     2
{I1, I2, I5}     2

7. C4 = L3 * L3, Prune => L4 = ∅.
End of Step 1.
Code of Apriori Algorithm:
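The original code is not reproduced here; the following Python sketch is one possible implementation of the Apriori idea (join, count, prune), run on the nine-transaction example used in the following slides:

```python
from itertools import combinations

def apriori(transactions, min_sup_count):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}        # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # count support of the current candidates
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_sup_count}
        frequent.update(level)
        # join step (L_k * L_k), then prune candidates having an infrequent subset
        k += 1
        current = set()
        for a, b in combinations(list(level), 2):
            cand = a | b
            if len(cand) == k and all(frozenset(s) in level for s in combinations(cand, k - 1)):
                current.add(cand)
    return frequent

D = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"}, {"I1","I3"},
     {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"}, {"I1","I2","I3"}]
for itemset, count in sorted(apriori(D, 2).items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)     # ends with ['I1','I2','I3'] 2 and ['I1','I2','I5'] 2
```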
Confidence: X = {I1, I2, I5}
I1 => {I2, I5}: Confidence = 2/6 = 33%
I2 => {I1, I5}: Confidence = 2/7 = 29%
I5 => {I1, I2}: Confidence = 2/2 = 100%
{I1, I2} => {I5}: Confidence = 2/4 = 50%
{I1, I5} => {I2}: Confidence = 2/2 = 100%
{I2, I5} => {I1}: Confidence = 2/2 = 100%
Confidence: X = {I1, I2, I5} (rules below the minimum confidence threshold are marked ×):
I1 => {I2, I5}: Confidence = 2/6 = 33%  ×
I2 => {I1, I5}: Confidence = 2/7 = 29%  ×
I5 => {I1, I2}: Confidence = 2/2 = 100%
{I1, I2} => {I5}: Confidence = 2/4 = 50%  ×
{I1, I5} => {I2}: Confidence = 2/2 = 100%
{I2, I5} => {I1}: Confidence = 2/2 = 100%
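A Python sketch (an assumption, with min_conf = 0.70 chosen to match the × marks above) that generates and checks these rules:

```python
from itertools import combinations

D = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"}, {"I1","I3"},
     {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"}, {"I1","I2","I3"}]

def support_count(itemset):
    return sum(1 for t in D if itemset <= t)

X = frozenset({"I1", "I2", "I5"})
min_conf = 0.70                                   # assumed threshold
for r in range(1, len(X)):
    for antecedent in combinations(sorted(X), r):
        a = frozenset(antecedent)
        conf = support_count(X) / support_count(a)
        mark = "" if conf >= min_conf else "x"
        print(f"{sorted(a)} => {sorted(X - a)}: {conf:.0%} {mark}")
```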
FP-tree construction (the items of each transaction are inserted in L order: I2, I1, I3, I4, I5):

2- T100={I1, I2, I5}: insert the path null → I2:1 → I1:1 → I5:1.
3- T200={I2, I4}: I2 becomes 2; a new child I4:1 is added under I2.
4- T300={I2, I3}: I2 becomes 3; a new child I3:1 is added under I2.
5- T400={I1, I2, I4}: I2 becomes 4 and I1 becomes 2; a new child I4:1 is added under that I1.
6- T500={I1, I3}: a second branch null → I1:1 → I3:1 is created.
7- T600={I2, I3}: I2 becomes 5; its child I3 becomes 2.
8- T700={I1, I3}: on the second branch, I1 becomes 2 and its child I3 becomes 2.
9- T800={I1, I2, I3, I5}: I2 becomes 6 and the I1 under it becomes 3; a new child I3:1 is added under that I1, with its own child I5:1.
10- T900={I1, I2, I3}: I2 becomes 7, the I1 under it becomes 4, and the I3 under that I1 becomes 2.

Final FP-tree: null → I2:7 with children I1:4, I3:2 and I4:1; under I2 → I1:4 the children are I5:1, I4:1 and I3:2 (the latter with child I5:1); plus the second branch null → I1:2 → I3:2.
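A Python sketch (an assumption, not the slides' code) of FP-tree construction that reproduces the tree described above:

```python
from collections import Counter

transactions = [{"I1","I2","I5"}, {"I2","I4"}, {"I2","I3"}, {"I1","I2","I4"},
                {"I1","I3"}, {"I2","I3"}, {"I1","I3"}, {"I1","I2","I3","I5"},
                {"I1","I2","I3"}]
min_sup = 2

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

support = Counter(i for t in transactions for i in t)
# L: frequent items in descending support order (ties broken by name): I2, I1, I3, I4, I5
L = sorted((i for i in support if support[i] >= min_sup),
           key=lambda i: (-support[i], i))

root = Node(None)
for t in transactions:
    node = root
    for item in (i for i in L if i in t):        # items of the transaction in L order
        child = node.children.setdefault(item, Node(item))
        child.count += 1                         # shared prefixes just increment counts
        node = child

def show(node, indent=0):
    for child in node.children.values():
        print(" " * indent + f"{child.item}:{child.count}")
        show(child, indent + 2)

show(root)   # prints the final FP-tree built step by step above
```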
11- FP-tree is mined: We start from the last item of L.
Itemset        List of TIDs
{I3, I4}       {}
{I3, I5}       {T800}
{I4, I5}       {}

Itemset        List of TIDs
{I1, I2, I3}   {T800, T900}
{I1, I2, I4}   {T400}
{I1, I2, I5}   {T100, T800}
• Classification
• Supervised learning vs. unsupervised learning.
Training data:

Age  Salary  Class
30   65      G
23   15      B
40   75      G
55   40      B
55   100     G
45   60      G

(Decision tree: the root splits on Age ≤ 35; the left branch then splits on Salary ≤ 40 and the right branch on Salary ≤ 50, each leading to leaves B and G.)
Classification Rules:
Class B: {(Age<=35) and (Salary<=40)} or {(Age>35) and (Salary<=50)}
Class G: {(Age<=35) and (Salary>40)} or {(Age>35) and (Salary>50)}
Test data: (Age=25 and Salary=50) => Class G
• Generating Decision tree:
1. Information Gain.
2. Gain Ratio.
3. Gini Index.
• Information Gain (Entropy)
Info(D) = − Σ_{i=1..m} p_i · log2(p_i),  where p_i = |C_{i,D}| / |D|
Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) · Info(D_j)
Gain(A) = Info(D) − Info_A(D)
(Training data table: RID, age, income, student, credit-rating, Class: buy-computer.)
For a continuous-valued attribute A, the midpoint (a_i + a_{i+1}) / 2 of each pair of adjacent sorted values is a candidate split point:
D1: A ≤ split_point
D2: A > split_point
GainRatio(A) = Gain(A) / SplitInfo_A(D)
SplitInfo_A(D) = − Σ_{j=1..v} (|D_j| / |D|) · log2(|D_j| / |D|)
• Example:
GainRatio(age) = 0.25 / 1.57 = 0.16
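A Python sketch (an assumption) that computes GainRatio from per-value class counts; the counts below are the usual age distribution of the 14-tuple buy-computer example, which matches the 0.25 / 1.57 above:

```python
import math

def info(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(partitions):
    # partitions: one [yes_count, no_count] pair per attribute value
    totals = [sum(p) for p in partitions]
    n = sum(totals)
    info_d = info([sum(p[0] for p in partitions), sum(p[1] for p in partitions)])
    info_a = sum(t / n * info(p) for t, p in zip(totals, partitions))
    split_info = info(totals)                    # same formula applied to partition sizes
    return (info_d - info_a) / split_info

age_partitions = [[2, 3], [4, 0], [3, 2]]        # youth, middle_aged, senior (yes, no)
print(round(gain_ratio(age_partitions), 2))      # 0.16
```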
• Exercise:
GainRatio(income) = ?
GainRatio(credit_rating) = ?
GainRatio(student) = ?
Gini_A(D) = (|D1| / |D|) · Gini(D1) + (|D2| / |D|) · Gini(D2)
Gini(D) = 1 − Σ_{i=1..m} p_i²,  where p_i = |C_{i,D}| / |D|
• Gini Index: considers only binary splits of each attribute.
• Example:
Gini_Index_{age ∈ {youth, senior}}(D) = Gini_Index_{age ∈ {middle_aged}}(D) = 0.36
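A Python sketch (an assumption) that reproduces the 0.36 value for the binary age split, using the class counts of the buy-computer example:

```python
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

d1 = (5, 5)     # age in {youth, senior}: (yes, no) counts over 10 tuples
d2 = (4, 0)     # age in {middle_aged}: 4 tuples, all yes
n = sum(d1) + sum(d2)
gini_split = sum(d1) / n * gini(d1) + sum(d2) / n * gini(d2)
print(round(gini_split, 2))   # 0.36
```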
• Exercise: Calculate the Gini Index for all attributes and all combinations.
• Tree pruning
• Bayes' Theorem
P(A|B) = P(B|A) · P(A) / P(B)
P(X|Ci) = Π_{k=1..n} P(x_k|Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci)
For a continuous-valued attribute: P(x_k|Ci) = (1 / (√(2π) · σ_Ci)) · exp(−(x_k − μ_Ci)² / (2σ_Ci²))
(Training data table: RID, age, income, student, credit-rating, Class: buy-computer.)
P(C1) · P(X|C1) = 0.028
P(C2) · P(X|C2) = 0.007
Since 0.028 > 0.007, the classifier predicts buy_computer = yes for X.
• Laplacian correction:
C1: buy_computer = yes : 1000 records
income = low: 0 records => P(income=low|C1) = 0/1000 = 0
income = medium: 990 records => P(income=medium|C1) = 990/1000 = 0.990
income = high: 10 records => P(income=high|C1) = 10/1000 = 0.010
• With the Laplacian correction (add 1 to each income count, so the denominator becomes 1003):
income = low: 0 records => P(income=low|C1) = 1/1003 = 0.001
income = medium: 990 records => P(income=medium|C1) = 991/1003 = 0.988
income = high: 10 records => P(income=high|C1) = 11/1003 = 0.011
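A Python sketch (not from the slides) of the Laplacian correction applied to these income counts:

```python
counts = {"low": 0, "medium": 990, "high": 10}   # income counts within C1

def conditional_probs(counts, laplace=False):
    k = 1 if laplace else 0                      # add 1 to each count when smoothing
    total = sum(counts.values()) + k * len(counts)
    return {v: (c + k) / total for v, c in counts.items()}

print(conditional_probs(counts))                 # {'low': 0.0, 'medium': 0.99, 'high': 0.01}
print(conditional_probs(counts, laplace=True))   # low ≈ 0.001, medium ≈ 0.988, high ≈ 0.011
```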
• Using IF-THEN Rules for Classification:
A rule has the form: IF antecedent THEN consequent.
Example:
R1: IF age = youth AND student = yes THEN buy_computer = yes
• Accuracy(R) = n_correct / n_covers
• Coverage(R) = n_covers / |D|
• Extracting classification rules from a decision tree.
P + N = P′ + N′
• Accuracy = (TP + TN) / (P + N)
• Precision = TP / (TP + FP) = TP / P′
• Recall = TP / (TP + FN) = TP / P
• Example:

              Predicted yes   Predicted no   Total
Actual yes    90              210            300
Actual no     140             9560           9700
Total         230             9770           10000
• Precision = 90 / 230 = 39.13%
• Recall = 90 / 300 = 30.00%
• F = (2 · precision · recall) / (precision + recall)
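A Python sketch (not from the slides) computing these measures from the example confusion matrix:

```python
TP, FN = 90, 210
FP, TN = 140, 9560

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f_measure = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%} F={f_measure:.2%}")
# accuracy=96.50% precision=39.13% recall=30.00% F=33.96%
```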
• In addition to accuracy-based measures, classifiers can also be compared with respect to the following additional aspects:
Speed
Robustness
Scalability
Interpretability
• Introduction: Cohesion and Separation.
• Example 6-1:
Step_1: c1 = 3, c2 = 6
C1 = {X3, X5, X6, X9}, C2 = {X1, X2, X4, X7, X8, X10}
Step_2: c1 = (3 + 4 + 2 + 3) / 4 = 3.0, c2 = (5 + 9 + 6 + 8 + 5 + 7) / 6 = 6.67
C1 = {X3, X5, X6, X9}, C2 = {X1, X2, X4, X7, X8, X10}  (no change, so the algorithm stops)
• Exercise 1: Solve the previous example with 𝒄𝟏 = 2.0 and 𝒄𝟐 = 7.0
• Exercise 2: Solve the previous example with 𝒄𝟏 = 2.5, 𝒄𝟐 = 5.0 , and 𝒄𝟑 = 7.5
• Example 6-2: initial centroids c1 = (1.0, 1.1), c2 = (5.0, 7.0)

i   X1    X2
1   1.0   1.0
2   1.5   2.0
3   3.0   4.0
4   5.0   7.0
5   3.5   5.0
6   4.5   5.0
7   3.5   4.5
Step_1: c1 = (1.0, 1.1), c2 = (5.0, 7.0)
C1 = {X1, X2, X3}, C2 = {X4, X5, X6, X7}
Step_2: c1 = ((1 + 1.5 + 3) / 3, (1 + 2 + 4) / 3) = (1.83, 2.33)
        c2 = ((5 + 3.5 + 4.5 + 3.5) / 4, (7 + 5 + 5 + 4.5) / 4) = (4.125, 5.375)
C1 = {X1, X2}, C2 = {X3, X4, X5, X6, X7}
Step_3: c1 = ((1 + 1.5) / 2, (1 + 2) / 2) = (1.25, 1.5)
        c2 = ((3 + 5 + 3.5 + 4.5 + 3.5) / 5, (4 + 7 + 5 + 5 + 4.5) / 5) = (3.9, 5.1)
C1 = {X1, X2}, C2 = {X3, X4, X5, X6, X7}  (assignments are unchanged, so the algorithm stops)
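A Python sketch (not from the slides) of k-means for Example 6-2, starting from the given initial centroids:

```python
import math

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centroids = [(1.0, 1.1), (5.0, 7.0)]

for step in range(10):
    # assign each point to its nearest centroid (assumes no cluster becomes empty)
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    # recompute each centroid as the mean of its assigned points
    new_centroids = [tuple(sum(c) / len(c) for c in zip(*cl)) for cl in clusters]
    if new_centroids == centroids:
        break
    centroids = new_centroids

print(centroids)   # final centroids (1.25, 1.5) and (3.9, 5.1)
print(clusters)    # C1 = {X1, X2}, C2 = {X3, ..., X7}
```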
• Exercise 3: Solve the previous example with 𝒄𝟏 = (3.0, 3.0) and 𝒄𝟐 = (4.0, 4.0)
• Complexity of k-means: O(nkt)
• n: number of objects.
• k: number of clusters.
• t: number of iterations.
• k << n
• t << n
• Drawbacks of k-means:
• Evaluation of Clusters (sum of squared errors):
E = Σ_{i=1..k} Σ_{p∈Ci} dist(p, ci)²
• Hierarchical Clustering:
• Bottom-Up: AGglomerative NESting (AGNES).
• Single Link
• Complete Link
• Average Link
• Centroid
• Ward’s method
• Single Linkage - Nearest Neighbour
(Figure: two clusters r and s containing points a–g; the single-link distance is the distance between the closest pair of points, one from each cluster.)
dist(r,s) = dist(b,e)
• Farthest Neighbour – Complete Linkage
(Figure: the same clusters r and s; the complete-link distance is the distance between the farthest pair of points, one from each cluster.)
dist(r,s) = dist(a,g)
• Average Linkage
(Figure: the same clusters r and s; the average-link distance is the average of all pairwise distances between points of r and points of s.)
• Centroid method
(Figure: clusters r and s with centroids m and n.)
dist(r,s) = dist(m,n)
• Ward’s method: Minimizes the total within-cluster variance.
(Figure: clusters a and b merged into cluster c, with within-cluster errors e1, e2, e3.)
• Example 6-3: Bottom-Up approach, Single Linkage; stop merging when the smallest inter-cluster distance exceeds 1.8.
i   X1    X2
A 1.0 1.0
B 1.0 3.5
C 3.0 1.0
D 2.0 4.0
E 1.0 3.0
F 3.0 2.0
• Example 6-3:

Distance matrix:
     A      B      C      D      E      F
A    0      2.5    2.0    3.16   2.0    2.24
B    -      0      3.20   1.12   0.5    2.5
C    -      -      0      3.16   2.83   1.0
D    -      -      -      0      1.41   2.24
E    -      -      -      -      0      2.24
F    -      -      -      -      -      0

The smallest distance is d(B, E) = 0.5, so B and E are merged.
• Example 6-3 (after merging B and E):

i    X1    X2
A    1.0   1.0
BE   1.0   3.25
C    3.0   1.0
D    2.0   4.0
F    3.0   2.0

Distance matrix:
     A      BE     C      D      F
A    0      2.0    2.0    3.16   2.24
BE   -      0      2.83   1.12   2.24
C    -      -      0      3.16   1.0
D    -      -      -      0      2.24
F    -      -      -      -      0

The smallest distance is d(C, F) = 1.0, so C and F are merged.
• Example 6-3 (after merging C and F):

i    X1    X2
A    1.0   1.0
BE   1.0   3.25
CF   3.0   1.5
D    2.0   4.0

Distance matrix:
     A      BE     CF     D
A    0      2.0    2.0    3.16
BE   -      0      2.24   1.12
CF   -      -      0      2.24
D    -      -      -      0

The smallest distance is d(BE, D) = 1.12, so BE and D are merged.
• Example 6-3 (after merging BE and D):

i     X1    X2
A     1.0   1.0
BDE   1.0   3.25
CF    3.0   1.5

Distance matrix:
      A      BDE    CF
A     0      2.0    2.0
BDE   -      0      2.24
CF    -      -      0

All remaining inter-cluster distances exceed 1.8, so merging stops.
Final clusters: {A}, {B, D, E}, {C, F}.
• Example 6-3: Dendrogram - AGglomerative NESting (AGNES)
(Dendrogram with leaves ordered A, B, E, D, C, F and a distance axis from 0.5 to 2.0: B and E merge at 0.5, C and F at 1.0, and BE with D at 1.12.)
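A Python sketch (not from the slides) of single-link agglomerative clustering that reproduces the merging sequence of Example 6-3:

```python
import math

points = {"A": (1.0, 1.0), "B": (1.0, 3.5), "C": (3.0, 1.0),
          "D": (2.0, 4.0), "E": (1.0, 3.0), "F": (3.0, 2.0)}

def single_link(c1, c2):
    # smallest distance between any point of c1 and any point of c2
    return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

clusters = [{name} for name in points]
while len(clusters) > 1:
    pairs = [(single_link(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    d, i, j = min(pairs)
    if d > 1.8:                 # stopping threshold from the example
        break
    print(f"merge {sorted(clusters[i])} and {sorted(clusters[j])} at distance {d:.2f}")
    clusters[i] |= clusters[j]
    del clusters[j]

print(clusters)   # three clusters remain: {A}, {B, D, E}, {C, F}
```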
• Exercise: Solve the previous example with Complete-Link, Average-Link, Centroid, and Ward's method.