CSCI446/946 Big Data Analytics
Week 5 – Lecture: Classification
School of Computing and Information Technology
University of Wollongong Australia
Spring 2024
Content
• Brief Recap
– Clustering Analysis
• K-means, DBSCAN, SOM
• Classification
– Overview
– K-Nearest Neighbor (KNN)
– Multi-Layer Perceptron (MLP)
– Decision Tree (DT)
– Naïve Bayesian Classifier
• Diagnostics and Performance Indicators
K-means Clustering
K-means Clustering
• Application to image processing
K-means Clustering
• Application to image processing
[Figure: the original image segmented with K = 2, K = 3 and K = 10 clusters]
DBScan
• Given a density threshold (MinPts) and a radius (Eps), the points in a dataset are classified into three types: core point, border point, and noise point.
– Core points: points whose density >= MinPts
– Core points are in the interior of a density-based cluster.
– Example: if MinPts = 6, then A is a core point because its density = 7 (7 >= 6).
DBScan Example
[Figure: original points; with Eps = 10 and MinPts = 4, core, border and noise points are marked, then connected core points are marked]
Self-Organizing Maps
• Self-organizing maps have two layers:
– an input layer, and
– an output layer called the feature map.
• The feature map consists of neurons
– organized on a regular grid.
– Unlike other ANN types, the neurons in a SOM don’t have an activation function.
• Each neuron in a SOM is assigned a weight vector with the same dimensionality as the input space.
Self-Organizing Maps
• SOMs are an excellent choice for data visualization
– They act as a dimension-reduction technique
• Why use Self-Organizing Maps (SOMs) in BDA?
– Topology preservation (unlike PCA)
– Able to deal with new data & missing values (unlike t-SNE)
• When not to use SOMs in BDA:
– When the data is very sparse
– When cardinality (limited resolution) of the map is a
problem.
Content
• Brief Recap
– Clustering Analysis
• K-means, DBSCAN, SOM
• Classification
– Overview
– K-Nearest Neighbor (KNN)
– Multi-Layer Perceptron (MLP)
– Decision Tree (DT)
– Naïve Bayesian Classifier
• Diagnostics and Performance Indicators
Overview of Classification
• Classification is a fundamental learning
method that appears in applications related to
data mining
• The primary task performed by classifiers is to
assign class labels to new observations
– Sets: training, (validation), testing
• Classification methods are supervised
– Start with a training set of labelled observations
– Predict the outcome for new observations
Overview of Classification
• Examples of classifiers:
– K-nearest neighbour (KNN): model-free classifier
– Neural Networks (NN): massively parallel, non-linear parametric methods
– Decision Trees and Random Forests: make explainable if-then decisions
– Naïve Bayes (NB) classifier: probabilistic method
– Logistic Regression (LR): linear method
– Support Vector Machines (SVM): non-parametric classifiers
– …
Nearest-Neighbor Classifiers
• Requires three things:
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve
• To classify an unknown record:
– Compute its distance to the training records
– Identify the k nearest neighbors
– Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
Definition of Nearest Neighbor
[Figure: the neighborhood of a record x for (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]
• The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
Nearest Neighbor Classification
• Compute the distance between two points:
– Euclidean distance: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )
• Determine the class from the nearest-neighbor list
– Take the majority vote of class labels among the k nearest neighbors
– Weigh the vote according to distance, e.g. with a weight factor w = 1/d²
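A minimal sketch of majority-vote KNN in R, using the knn function from the class package; the data frame dat, its feature columns x1 and x2, and its label column are hypothetical placeholders, not the lecture's dataset:

library(class)

set.seed(1)
idx      <- sample(nrow(dat), floor(0.8 * nrow(dat)))   # 80% training, 20% test split
features <- scale(dat[, c("x1", "x2")])                 # normalise the numeric inputs
pred <- knn(train = features[idx, ],
            test  = features[-idx, ],
            cl    = dat$label[idx],                     # class labels of the stored records
            k     = 5)                                  # number of nearest neighbours to retrieve
table(predicted = pred, actual = dat$label[-idx])       # simple confusion matrix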
Nearest Neighbor Classification…
• Choosing the value of k:
– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from
other classes
– Computational cost often increases when k increases
Neural Networks - MLP
[Figure: multi-layer perceptron architecture; bias inputs are not shown]
Neural Networks - MLP
• Weights are initially unknown
– Initialized with small values
– Are updated by a learning algorithm.
• NN can produce a non-linear mapping of the
input to the output.
– Coding of attribute values is non-critical as long as the inputs are numeric.
– Inputs for NN are often normalized. Why?
– A few exceptions exist, e.g. SOM.
Neural Networks - MLP
• Main challenges:
– Network design:
• How to organize the neurons?
• How many layers, how many neurons in each layer?
• Which activation function?
– Learning algorithm:
• How to update the weights?
• How to update the weights effectively?
Neural Networks - MLP
• It has been proven:
– Three layers are enough (if neurons are linear)
– Two layers are enough (if neurons are non-linear).
• Common activation functions:
Neural Networks - MLP
• Weight updates
– Compute the network error E = Σᵢ (oᵢ − tᵢ)², where oᵢ is the i-th network output, and tᵢ the desired network output (the target).
• When updating the weights, the aim is to minimize E for all inputs.
• Many algorithms are based on gradient descent methods.
• Update weights: Δwᵢⱼ = −α ∂E/∂wᵢⱼ, where α ∈ (0, 1) is a learning rate.
• Repeat for a number of epochs:
– Select a training sample
– Compute the output, then compute the error.
– Compute the gradient then update the weights.
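A minimal sketch of this loop in R for a single sigmoid neuron (a toy stand-in for the full MLP; the logical-AND data below is illustrative, not from the lecture):

sigmoid <- function(z) 1 / (1 + exp(-z))

X <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)   # toy inputs
t <- c(0, 0, 0, 1)                                           # targets (logical AND)
w <- runif(2, -0.1, 0.1); b <- 0; alpha <- 0.5               # small random initial weights

for (epoch in 1:1000) {
  for (i in sample(nrow(X))) {             # select a training sample
    o    <- sigmoid(sum(w * X[i, ]) + b)   # compute the output
    grad <- 2 * (o - t[i]) * o * (1 - o)   # gradient of E = (o - t)^2 w.r.t. the net input
    w <- w - alpha * grad * X[i, ]         # weight update: -alpha * dE/dw
    b <- b - alpha * grad
  }
}
round(sigmoid(X %*% w + b))                # outputs should approximate the targets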
Neural Networks – MLP vs DNN
• Formally it has been proven:
– Three layers are enough (if neurons are linear)
– Two layers are enough (if neurons are non-linear)
• A surprise: Deep Neural Networks
– For complex problems it was found that deep NN are
much better.
• Many layers (possibly hundreds)
• CNN (1995)
• Breakthrough after 2000 (massive parallel GPUs)
Neural Networks in R
• Example: A training dataset
• Can a neural network predict placement given knowledge
score and communication score of a student?
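A minimal sketch of this step, assuming a data frame students with numeric columns knowledge and communication and a 0/1 column placed (hypothetical names standing in for the slide's dataset):

library(neuralnet)

set.seed(2)                                          # outputs depend on random weight initialization
nn <- neuralnet(placed ~ knowledge + communication,
                data = students,
                hidden = 3,                          # one hidden layer with 3 neurons
                linear.output = FALSE)               # sigmoid output, suited to 0/1 targets
plot(nn)                                             # draw the trained network and its weights
pred <- compute(nn, students[, c("knowledge", "communication")])$net.result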
Neural Networks in R
[R code and results shown on the slides]
Neural Networks in R
• Your results may vary. Why?
• We expected results such as 0s and 1s. What to do?
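Two brief answers, continuing the hypothetical sketch above: results vary between runs because the weights are initialized with small random values before training, and the network outputs are continuous values between 0 and 1 rather than exact 0s and 1s, so they are usually thresholded:

pred_class <- ifelse(pred > 0.5, 1, 0)   # cut the continuous outputs at 0.5 to obtain 0/1 labels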
Neural Networks in R
Reasons to choose
• Neural Nets are massively parallel systems
– Can be implemented efficiently on multi-core (e.g. GPU) systems.
– Trained models are computationally very efficient when processing new inputs.
• Neural Nets can solve a wide range of problems, and can classify samples into an arbitrary number of classes.
– NN perform better than humans on a growing number of tasks (e.g. playing chess, Go, lip reading, ...)
• Limited data pre-processing required.
• Insensitive to noise
• Often a tool of choice in Big Data Analytics.
Caution
• Most supervised Neural Networks are “black box” classifiers.
– They are unable to show or explain how a result came to be.
– e.g., what in the input caused the network to respond in a certain way?
• They have problems with unbalanced learning problems.
– i.e. when there are many more samples in one class than in
another class.
• The model is prone to overfit the training data when choosing too many neurons and/or layers.
– Performance may be sub-optimal when choosing too few neurons or layers.
– Finding the best number of neurons and layers is an art.
• Training can be time consuming.
– They tend to require a lot of training samples to perform well.
Decision Tree
• A decision tree uses a tree structure to specify
sequences of decisions and consequences
• Given input variable X = {x1,x2,…,xn}, the goal is
to predict an output variable Y
Decision Tree
• Each node tests a particular input variable
• Each branch represents the decision made
• Classifying a new observation is to traverse
this decision tree.
Decision Tree
• The depth of a node is the minimum number
of steps required to reach the node from root
• Leaf nodes are at the end of the last branches
on the tree, representing class labels
The General Algorithm of DT
• The objective of a decision tree algorithm
– Construct a tree T from a training set S
• The algorithm picks the most informative
attribute to branch the tree and does this
recursively for each of the sub-trees.
• The most informative attribute is identified by
– Information gain, calculated based on Entropy
The General Algorithm of DT
• Entropy
Question:
In a bank marketing dataset, there are 2000
customers in total. Among them, 1789
subscribed term deposit. What is the entropy of
the output variable “subscribed” (Hsubscribed)?
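The entropy formula itself appears as an image on the slide; in the usual notation it is H(Y) = −Σy P(y) log₂ P(y). A minimal R check for the question above, with 1789 of 2000 customers subscribed:

p <- 1789 / 2000                           # P(subscribed = yes)
H <- -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy of "subscribed"
H                                          # about 0.49 bits: the variable is highly skewed towards "yes"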
The General Algorithm of DT
• Conditional entropy
The General Algorithm of DT
• Information gain
• It compares
– The degree of purity of the parent node before a split
– The degree of purity of the child node after a split
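The formulas appear as images on the slides; in the usual notation, the conditional entropy is H(Y|X) = Σx P(X = x) · H(Y|X = x) and the information gain of splitting on X is Gain = H(Y) − H(Y|X). A minimal R sketch of this comparison (generic helpers, not the lecture's code):

entropy <- function(y) {                   # entropy of a vector of class labels
  p <- table(y) / length(y)
  -sum(p * log2(p))
}

info_gain <- function(x, y) {              # x: categorical attribute, y: class labels
  h_parent <- entropy(y)                   # purity of the parent node before the split
  weights  <- table(x) / length(x)         # P(X = x) for each attribute value
  h_child  <- sum(weights * sapply(split(y, x), entropy))  # weighted child-node entropies
  h_parent - h_child                       # information gain of splitting on x
}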
The General Algorithm of DT
• The algorithm constructs sub-trees recursively
until one of the following criteria is met
– All the leaf nodes in the tree satisfy the minimum
purity threshold (i.e., are pure enough)
– No sufficient information gain can be obtained by splitting on more attributes (i.e., further splitting is not worthwhile)
– Any other stopping criterion is satisfied (such as
the maximum depth of the tree)
Decision Tree
• An example: A bank markets its term deposit product, so it needs to predict which clients would subscribe to a term deposit
– The bank collects a dataset of 2000 previous clients with known outcomes (“subscribed or not”).
– Input variables to describe each client are
• Job, marital status, education level, credit default,
housing loan, personal loan, contact type, previous
campaign contact
Decision Tree
[Table: the training dataset of the bank example]
Decision Tree
• From your point of view, what is the most important issue in building a decision tree?
[Figure: a decision tree built over the bank marketing training dataset]
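A minimal sketch of building such a tree with the rpart package; the data frame bank and the column names below are assumptions based on the input variables listed earlier, not the lecture's actual code:

library(rpart)
library(rpart.plot)

fit <- rpart(subscribed ~ job + marital + education + default +
               housing + loan + contact + poutcome,
             data = bank, method = "class",                           # classification tree
             control = rpart.control(minbucket = 100, maxdepth = 4))  # simple stopping criteria
rpart.plot(fit)                                                       # draw the tree
pred <- predict(fit, newdata = bank, type = "class")                  # class labels for new observations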
The General Algorithm of DT
• Assume the attribute X is “contact”
– Its value x takes one value in {cellular,
telephone, unknown}
• The outcome Y is “subscribed”
– Its value y takes one value in {no, yes}
The General Algorithm of DT
The General Algorithm of DT
• The algorithm splits on the attribute with the
largest information gain at each round
The General Algorithm of DT
• Information gain
• It compares
– The degree of purity of the parent node before a split
– The degree of purity of the child node after a split
Properties of Decision Tree
• Computationally inexpensive; classifying new observations is fast
• Classification rules are easy to understand
• Handles both numerical and categorical inputs
• Handles variables that have a nonlinear effect on the outcome better than linear models do
• Not a good choice if there are many irrelevant input variables
– Feature selection will be needed
Caution
• Decision trees use greedy algorithms
– It always chooses the option that seems the best
available at that moment
– However, the option may not be the best overall
and this could cause overfitting
– An ensemble technique can address this issue by
combining multiple decision trees that use
random splitting
Evaluating a Decision Tree
• Evaluate a decision tree
– Evaluate whether the splits of the tree make sense
and whether the decision rules are sound (say,
with domain experts)
– Having too many layers and obtaining nodes with
few members might be signs of overfitting
– Use standard diagnostic tools for classifiers
Naïve Bayes Classifier
• A probabilistic classification method based on
Bayes’ theorem
• A naïve Bayes classifier assumes that the
presence or absence of a particular feature of
a class is unrelated to the presence or absence
of other features (conditional independence assumption)
• Output includes a class label and its
corresponding probability score
Naïve Bayes Classifier
• Based on Bayes’ Theorem
[Portrait: Thomas Bayes, 1702–1761]
Bayes’ Theorem
• A more practical form of Bayes’ theorem
• Given A, how to calculate P(ci|A)?
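The formula is shown as an image on the slide; in standard notation, for a class label ci and an observed set of attribute values A = (a1, a2, …, am), Bayes’ theorem gives

P(ci | A) = P(A | ci) · P(ci) / P(A)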
Naïve Bayes Classifier
• With two simplifications, Bayes’ theorem
induces a Naïve Bayes classifier
• First, Conditional independence assumption
– Each attribute is conditionally independent of
every other attribute given a class label ci
– This simplifies the computation of P(A|ci)
Naïve Bayes Classifier
• Second, ignore the denominator P(A)
– Removing the denominator has no impact on the
relative probability scores
• In this way, the classifier becomes
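In standard notation (the slide's formula is shown as an image): conditional independence gives P(A | ci) = P(a1 | ci) · P(a2 | ci) · … · P(am | ci), and after dropping P(A) the classifier assigns the label

c* = argmax over ci of P(ci) · Πⱼ P(aj | ci)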
Caution
• An issue with rare events
– What if one of the attribute values does NOT appear in a class ci in the training dataset?
– P(aj|ci) for this attribute value will equal zero!
– P(ci|A) will then simply become zero!
• Smoothing technique
– It assigns a small nonzero probability to rare
events not included in a training dataset
Naïve Bayes Classifier
• Laplace smoothing (add-one smoothing)
– It pretends to see every outcome once more than
it actually appears
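In the usual add-one form (my notation, not copied from the slide):

P(aj | ci) = (count(aj, ci) + 1) / (count(ci) + number of distinct values of attribute j)

so an attribute value never seen with class ci still receives a small nonzero probability.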
Naïve Bayes Classifier
• Advantages
– Simple to implement, commonly used for text
classification
– Handle high-dimensional data efficiently
– Robust to overfitting with smoothing technique
• Disadvantages
– Sensitive to correlated variables (Why?)
– Not reliable for probability estimation
Naïve Bayes Classifier
• An example
– With the bank marketing dataset, use Naïve Bayes
Classifier to predict if a client would subscribe to a term
deposit
• Building a Naïve Bayes classifier requires calculating some statistics from the training dataset
– P(ci) for each class i = 1, 2, …, n
– P(aj|ci) for each attribute j = 1, 2, …, m in each class
Naïve Bayes Classifier
• P(ci) for each class
• P(aj|ci) for each attribute in each class
Naïve Bayes Classifier
• Testing a Naïve Bayes classifier on a new data
Naïve Bayes in R
• Two methods
– Build the classifier from scratch
– Call the naiveBayes function from the e1071 package
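A minimal sketch of the second method; the data frame bank and its class column subscribed are hypothetical placeholders for the bank marketing training data:

library(e1071)

nb <- naiveBayes(subscribed ~ ., data = bank, laplace = 1)   # laplace = 1 applies add-one smoothing
predict(nb, newdata = bank[1:5, ], type = "class")           # predicted class labels
predict(nb, newdata = bank[1:5, ], type = "raw")             # corresponding probability scores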
Naïve Bayes in R
Content
• Brief Recap
– Clustering Analysis
• K-means, DBSCAN, SOM
• Classification
– Overview
– K-Nearest Neighbor (KNN)
– Multi-Layer Perceptron (MLP)
– Decision Tree (DT)
– Naïve Bayesian Classifier
• Diagnostics and Performance Indicators
Diagnostics
• Holdout method
– The given data is randomly partitioned into two independent sets
• Training set (e.g., 80%) & Test set (e.g., 20%)
– Random sampling: a variation of holdout
• Repeat holdout k times, report average + std of accuracy
• Cross-validation (k-fold, where k = 10 is most popular)
– Randomly partition the data into k mutually exclusive subsets Di, each of approximately equal size
– At the i-th iteration, use Di as the test set and the others as the training set
– Leave-one-out: k folds where k = # of tuples, for small-sized data
– Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
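A minimal sketch of k-fold cross-validation in base R; bank, train_model and accuracy are hypothetical placeholders for whatever dataset, classifier and metric are being evaluated:

k     <- 10
folds <- sample(rep(1:k, length.out = nrow(bank)))   # random fold assignment, roughly equal sizes
acc   <- numeric(k)
for (i in 1:k) {
  train <- bank[folds != i, ]                        # k-1 folds form the training set
  test  <- bank[folds == i, ]                        # fold i is held out as the test set
  model  <- train_model(train)                       # fit any classifier here
  acc[i] <- accuracy(model, test)                    # evaluate it on the held-out fold
}
mean(acc); sd(acc)                                   # average and std of accuracy across folds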
Performance Indicators
Confusion Matrix:

Actual class \ Predicted class | C1                  | ¬C1
C1                             | True Positives (TP) | False Negatives (FN)
¬C1                            | False Positives (FP)| True Negatives (TN)

Example of Confusion Matrix:

Actual class \ Predicted class | buy_computer = yes | buy_computer = no | Total
buy_computer = yes             | 6954               | 46                | 7000
buy_computer = no              | 412                | 2588              | 3000
Total                          | 7366               | 2634              | 10000
Performance Indicators

Actual \ Predicted | C  | ¬C |
C                  | TP | FN | P
¬C                 | FP | TN | N
Total              | P' | N' | All

• Classifier Accuracy, or recognition rate: percentage of test set tuples that are correctly classified
– Accuracy = (TP + TN)/All
• Error rate: 1 − accuracy, or Error rate = (FP + FN)/All
• Class Imbalance Problem:
– One class may be rare, e.g. fraud or HIV-positive
– Significant majority of the negative class and minority of the positive class
– Sensitivity: True Positive recognition rate; Sensitivity = TP/P
– Specificity: True Negative recognition rate; Specificity = TN/N
Performance Indicators
• Precision: exactness – what % of the records the classifier labeled as positive are actually positive
• Recall: completeness – what % of the positives did the classifier label as positive? (equals sensitivity)
– A perfect score is 1.0
– In practice, there is an inverse relationship between precision and recall
• F measure (F1 or F-score): harmonic mean of precision and recall (see below)
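The formula itself appears as an image on the slide; the standard harmonic mean is

F1 = 2 · Precision · Recall / (Precision + Recall)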
Performance Indicators
– Precision = 90/230 = 39.13%
– Recall = 90/300 = 30.00% = Sensitivity

Actual class \ Predicted class | cancer = yes | cancer = no | Total | Recognition (%)
cancer = yes                   | 90           | 210         | 300   | 30.00 (sensitivity)
cancer = no                    | 140          | 9560        | 9700  | 98.56 (specificity)
Total                          | 230          | 9770        | 10000 | 96.50 (accuracy)
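As a check of the figures above, a minimal R computation from the confusion matrix entries:

TP <- 90; FN <- 210; FP <- 140; TN <- 9560
precision   <- TP / (TP + FP)                          # 90/230     = 39.13%
recall      <- TP / (TP + FN)                          # 90/300     = 30.00% (= sensitivity)
specificity <- TN / (TN + FP)                          # 9560/9700  = 98.56%
accuracy    <- (TP + TN) / (TP + TN + FP + FN)         # 9650/10000 = 96.50%
f1 <- 2 * precision * recall / (precision + recall)    # about 0.34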
Summary
• Supervised methods:
– Model observed data to predict future outcomes.
• Care must be taken in performing and
interpreting classification results
– How to determine the best input variables and their relationship to outcome variables.
– Understand and validate underlying assumptions.
– Transform variables when necessary.
– If in doubt, use a non-linear classification method
• Examples: Neural Nets, Naïve Bayes, SVM, …
Additional Classification Models
• Support Vector Machines
– Max-margin linear classifier, kernel trick.
• Supervised Neural Networks
– RNNs, Convolutional Networks, GNNs, …
• Bagging
– Bootstrap technique, ensemble method.
– N x weak learners -> vote on results (e.g. random forest)
• Boosting
– Weighted combination, ensemble method.
– N x weak learners in series, each tasked to improve on the
previous.
Q&A
Images courtesy of Google Images