CS685: Data Mining: Arnab Bhattacharya
Introduction
Arnab Bhattacharya
[email protected]
Exams: 20-30%
Project: 40-45%
Results: 20%
Presentation and/or Demonstration: 10%
Report: 10%
Assignments and Quizzes: 20-30%
Paper presentation and discussion: 10%
Depends on class strength
Things may be changed by mutual consent after discussion in class
Slides
Classwork
Book: no textbook
Reference books
Many
References mentioned in slides
Conference proceedings and journal articles
Conferences: KDD, ICDM, SDM, PKDD, PAKDD, etc.
Journals: TKDE, TKDD, DMKD, etc.
Scalability
High dimensionality
Heterogeneous and complex data
Web
Unstructured text
Graph
Distributed data
Data ownership and privacy
How to access knowledge without violating privacy
Classification
Predicting the class of a data object
Clustering
Finding groups in data
Association
Finding co-occurring and related itemsets
Visualization
Facilitating human discovery of patterns
Summarization
Succinctly describing a group
Anomaly detection
Identifying abnormal behavior
Estimation
Predicting values of a data object
Link analysis
Finding relationships among data objects
Arnab Bhattacharya ([email protected]) CS685: Introduction 2018-19 10 / 16
Extra-sensory perception (ESP)
A lady claimed that she could tell whether the tea or the milk had been added to the cup later
Fisher tested her with 8 cups, 4 of which had the tea added later
Only 1 chance of being correct out of $\binom{8}{4} = 70$ possibilities
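The count of 70 can be checked directly. The sketch below assumes the lady knows that exactly 4 of the 8 cups had tea added later and must name those 4, so a pure guess is one of the equally likely ways of choosing 4 cups out of 8. The function name `chance_of_perfect_guess` is illustrative and not from the slides.

```python
from fractions import Fraction
from math import comb


def chance_of_perfect_guess(cups: int, tea_later: int) -> Fraction:
    """Probability of naming all 'tea later' cups correctly by pure guessing.

    Assumes every way of choosing `tea_later` cups out of `cups` is
    equally likely, so exactly one choice is fully correct.
    """
    arrangements = comb(cups, tea_later)  # number of possible selections
    return Fraction(1, arrangements)


if __name__ == "__main__":
    p = chance_of_perfect_guess(8, 4)
    print(comb(8, 4))  # number of possible selections: 70
    print(p)           # exact probability of a perfect guess: 1/70
    print(float(p))    # roughly 0.014
```

A perfect score therefore happens by chance only about 1.4% of the time, which is why a single fully correct run is already fairly strong evidence in this design.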
Rhine paradox
ESP story (extra-sensory perception)
Moral: Knowing what data mining is and is not will help you look
smarter (than others not taking this course)
Just doing it once may not prove or disprove anything
Tea taster story
Moral: Multiple random tests are needed
Bonferroni’s principle: if you look in more places for interesting
patterns than your amount of data supports, you are bound to “find”
something “interesting” (most likely spurious)
Terrorism story
Moral: When checking a particular rule or property across many
possibilities, some of them will satisfy it purely by chance
Obvious rules may not always make sense
Ice-cream story
Moral: When deducing rules, look at the correct attributes, i.e., those
that explain the phenomenon
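Bonferroni's principle can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions, not something taken from the slides: it assumes `num_tests` independent checks, each with a false-positive rate `alpha` when nothing real is present. It also shows the related Bonferroni correction (testing each pattern at `alpha / num_tests`), which is a standard remedy rather than part of the principle as stated here. All function names are hypothetical.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of spurious 'discoveries' among num_tests checks."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests: int, alpha: float) -> float:
    """Chance that at least one check fires by accident.

    Assumes the checks are independent and each fires with
    probability alpha when there is no real pattern.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float) -> float:
    """Per-check threshold that keeps the overall error rate at most alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    m, alpha = 100, 0.05
    print(expected_false_positives(m, alpha))          # about 5 spurious hits
    print(prob_at_least_one_false_positive(m, alpha))  # above 0.99
    corrected = bonferroni_threshold(m, alpha)
    print(prob_at_least_one_false_positive(m, corrected))  # below 0.05
```

With 100 places to look and a 5% per-check error rate, finding something "interesting" is almost guaranteed even in pure noise; tightening the per-check threshold brings the overall chance of a spurious find back under 5%.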