Data Mining: Concepts and Techniques
Chapter 11
Applications and Trends in Data Mining
Additional Theme: Visual Data Mining
Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
[Link]/~hanj
© 2006 Jiawei Han and Micheline Kamber. All rights reserved.
March 16, 20 Data Mining: Concept 1
Visual Data Mining: An Overview
What is Visual Data Mining?
Survey of techniques
Data Visualization
Visualizing Data Mining Results
Visual Data Mining
What Is Visual Data Mining?
Visual data mining discovers implicit and
useful knowledge from large data sets
using data and/or knowledge visualization
techniques
Data visualization + data mining techniques
Why Visual Data Mining?
Advantages of human visual system
Highly parallel processor
Sophisticated reasoning engine
Large knowledge base
Can be used to comprehend data distributions,
patterns, clusters, and outliers
Figure: data mining algorithms, visualization, and the user combined (actionable, evaluation, flexibility)
Why Not Only Visual Data Mining?
Disadvantages of human visual system
Needs training
Not automated
Intrinsic bias
Limit of about 10^6 or 10^7 observations (Wegman 1995)
Hence the power of integrating visual and analytical methods
Scope of Visual Data Mining
Visualization: Use of computer graphics to create
visual images which aid in the understanding of
complex, often massive representations of data
Visual Data Mining: The process of discovering
implicit but useful knowledge from large data sets
using visualization techniques
Figure: fields related to visual data mining: Computer Graphics, Multimedia Systems, Human Computer Interface, High Performance Computing, Pattern Recognition
Purpose of Visualization
Gain insight into an information space by
mapping data onto graphical primitives
Provide qualitative overview of large data sets
Search for patterns, trends, structure,
irregularities, relationships among data
Help find interesting regions and suitable
parameters for further quantitative analysis
Provide a visual proof of computer-derived representations
Visual Data Mining & Data Visualization
Integration of visualization and data mining
data visualization
data mining result visualization
data mining process visualization
interactive visual data mining
Data visualization
Data in a database or data warehouse can be viewed
at different levels of abstraction
as different combinations of attributes or dimensions
Data can be presented in various visual forms
Abilities of Humans and Computers
Figure: abilities ordered from those of the computer to those of humans: data storage, numerical computation, searching, planning, diagnosis, logic, prediction, perception, creativity, general knowledge
Visual Mining vs. Scientific Visualization & Graphics
Scientific Visualization
Often visualizes a physical model of low dimensionality
Graphics
More concerned with how to render (draw) than with what to render
Data Visualization
View data in database or data warehouse
User may control
Different levels of details
Subset of attributes
Drawn using boxplots, histograms, polylines, etc.
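Of the forms listed above, the boxplot is the most compact summary of one attribute. A minimal Python sketch of the numbers a boxplot draws, assuming linearly interpolated quartiles and Tukey's convention of whiskers ending at the most extreme values within 1.5 * IQR of the quartiles (tools differ on both choices); the function name boxplot_summary is illustrative.

```python
import statistics

def boxplot_summary(values):
    """Return the quantities a boxplot draws for one numeric attribute.

    Assumes Tukey-style whiskers: points farther than 1.5 * IQR from the
    quartiles are reported separately as outliers.
    """
    data = sorted(values)
    # 'inclusive' interpolates linearly between the sorted values
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Here the value 100 lies beyond the upper fence, so it is drawn as an outlier point and the upper whisker stops at 9.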
Historical Overview of Exploratory Data Visualization Techniques (cf. [WB 95])
Pioneering works of Tufte [Tuf 83, Tuf 90] and Bertin [Ber 81] focus on
Visualization of data with inherent 2D-/3D-semantics
General rules for layout, color composition, attribute
mapping, etc.
Development of visualization techniques for different
types of data with an underlying physical model
Geographic data, CAD data, flow data, image data,
voxel data, etc.
Development of visualization techniques for arbitrary multidimensional data (without an underlying physical model)
Applicable to databases and other information
resources
Dimensions of Exploratory Data Visualization
Data Visualization Techniques: Geometric, Icon-based, Pixel-oriented, Hierarchical, Graph-based
Distortion Techniques: Simple, Complex
Interaction Techniques: Mapping, Projection, Filtering, Link & Brush, Zooming
Classification of Data Visualization Techniques
Geometric Techniques:
Scatterplots, Landscapes, Projection Pursuit, Prosection Views,
Hyperslice, Parallel Coordinates,...
Icon-based Techniques:
Chernoff Faces, Stick Figures, Shape-Coding, Color Icons,
TileBars,...
Pixel-oriented Techniques:
Recursive Pattern Technique, Circle Segments Technique, Spiral- &
Axes-Techniques,...
Hierarchical Techniques:
Dimensional Stacking, Worlds-within-Worlds, Treemap, Cone Trees,
InfoCube,...
Graph-Based Techniques:
Basic Graphs (Straight-Line, Polyline, Curved-Line,...)
Specific Graphs (e.g., DAG, Symmetric, Cluster,...)
Systems (e.g., Tom Sawyer, Hy+, SeeNet, Narcissus,...)
Hybrid Techniques: arbitrary combinations from above
Distortion & Dynamic/Interaction Techniques
Distortion Techniques
Simple Distortion (e.g. Perspective Wall, Bifocal Lenses,
TableLens, Graphical Fisheye Views,...)
Complex Distortion (e.g. Hyperbolic Repr., Hyperbox,...)
Dynamic/Interaction Techniques
Data-to-Visualization Mapping (e.g. AutoVisual, S-PLUS, XGobi, IVEE,...)
Projections (e.g. Grand Tour, S-PLUS, XGobi,...)
Filtering (Selection, Querying) (e.g. MagicLens, Filter/Flow Queries, InfoCrystal,...)
Linking & Brushing (e.g. XmdvTool, XGobi, DataDesk,...)
Zooming (e.g. PAD++, IVEE, DataSpace,...)
Detail on Demand (e.g. IVEE, TableLens, MagicLens, VisDB,...)
Visual Survey
Data visualization techniques
Scatterplot Matrices, Landscapes, Parallel Coordinates
Icon-based, Dimensional Stacking, Treemaps
Direct Visualization
Ribbons with Twists Based on Vorticity
Geometric Techniques
Basic Idea
Visualization of geometric transformations and
projections of the data
Methods
Landscapes [Wis95]
Projection Pursuit Techniques [Hub85] (techniques for finding meaningful projections of multidimensional data)
Scatterplot-Matrices [And72, Cle93]
Prosection Views [FB94, STDS95]
Hyperslice [WL93]
Parallel Coordinates [Ins85, ID90]
Scatterplot-Matrices [Cleveland 93]
Used by permission of M. Ward, Worcester Polytechnic Institute
Matrix of scatterplots (x-y diagrams) of the k-dimensional data [a total of k(k-1)/2 distinct scatterplots, or k^2 - k if both orientations of each attribute pair are drawn]
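The count of scatterplots follows from pairing attributes: a k x k matrix has k^2 cells, k of them on the diagonal, and each remaining pair appears twice (once above and once below the diagonal). The short Python sketch below enumerates the distinct pairs; the attribute names are hypothetical.

```python
from itertools import combinations

def scatterplot_pairs(attributes):
    """List the distinct (x, y) attribute pairs of a scatterplot matrix.

    A k x k matrix has k*k cells; dropping the k diagonal cells and the
    mirrored copy of each pair leaves k*(k-1)/2 distinct scatterplots.
    """
    return list(combinations(attributes, 2))

attrs = ["age", "income", "education", "hours"]  # hypothetical attributes
pairs = scatterplot_pairs(attrs)
print(len(pairs), pairs)
```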
Landscapes [Wis 95]
Used by permission of B. Wright, Visible Decisions Inc.
News articles visualized as a landscape
Visualization of the data as a perspective landscape
The data needs to be transformed into a (possibly artificial) 2D
spatial representation which preserves the characteristics of the data
Parallel Coordinates [Ins 85, ID 90]
k equidistant axes that are parallel to one of the screen axes and correspond to the attributes
the axes are scaled to the [minimum, maximum] range of the corresponding attribute
every data item corresponds to a polygonal line that intersects each axis at the point corresponding to the item's value for that attribute
Figure: parallel axes Attr. 1, Attr. 2, Attr. 3, ..., Attr. k
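A minimal Python sketch of the parallel-coordinates mapping, computing the polyline vertices for one data item with k attributes. It assumes the axes are spread over a drawing area of the given width and height, and places a constant attribute (minimum equal to maximum) in the middle of its axis; both choices are illustrative, not part of the technique's definition.

```python
def parallel_coordinates_polyline(record, mins, maxs, width=1.0, height=1.0):
    """Map one data item to the vertices of its parallel-coordinates polyline.

    Axis i sits at x = i * width / (k - 1) (k equidistant axes); the value is
    scaled to the attribute's [minimum, maximum] range along that axis.
    """
    k = len(record)
    vertices = []
    for i, (value, lo, hi) in enumerate(zip(record, mins, maxs)):
        x = i * width / (k - 1) if k > 1 else 0.0
        # Constant attributes have no range; place them mid-axis.
        t = (value - lo) / (hi - lo) if hi > lo else 0.5
        vertices.append((x, t * height))
    return vertices

data = [(1.0, 20.0, 3.0), (3.0, 10.0, 5.0), (2.0, 30.0, 4.0)]
mins = [min(col) for col in zip(*data)]
maxs = [max(col) for col in zip(*data)]
for row in data:
    print(parallel_coordinates_polyline(row, mins, maxs))
```

Drawing straight segments between consecutive vertices gives the polygonal line; items with similar values produce bundles of nearly parallel segments.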
Parallel Coordinates
Icon-Based Techniques
Basic Idea
Visualization of the data values as features of icons
Overview
Chernoff-Faces [Che73, Tuf83]
Stick Figures [Pic70, PG88]
Shape Coding [Bed90]
Color Icons [Lev91, KK94]
TileBars [Hea95]
(use of small icons representing the relevance
feature vectors in document retrieval)
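The basic idea of icon-based techniques, data values rendered as icon features, can be sketched as a mapping from normalized attribute values to drawing parameters. The feature names and ranges below are hypothetical stand-ins for, e.g., parts of a Chernoff face; real systems choose their own features, and which attribute drives which feature strongly affects what the viewer notices.

```python
# Hypothetical icon features and the drawing range each one may take.
FEATURE_RANGES = {
    "face_width": (0.6, 1.0),
    "eye_size": (0.05, 0.2),
    "mouth_curve": (-1.0, 1.0),  # frown .. smile
}

def record_to_icon(record, mins, maxs, features=FEATURE_RANGES):
    """Assign the i-th attribute of a record to the i-th icon feature.

    Each value is min-max normalized over its attribute, then rescaled into
    the feature's drawing range; the attribute-to-feature order is a user choice.
    """
    icon = {}
    for (name, (lo_f, hi_f)), value, lo, hi in zip(features.items(), record, mins, maxs):
        t = (value - lo) / (hi - lo) if hi > lo else 0.5
        icon[name] = lo_f + t * (hi_f - lo_f)
    return icon

# Hypothetical record: (age, income, years of education)
print(record_to_icon((35, 52000, 12), mins=(20, 10000, 8), maxs=(70, 150000, 20)))
```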
Stick Figures
Used by permission of G. Grinstein, University of Massachusetts at Lowell
Census data showing age, income, sex, education, etc.
Hierarchical Techniques
Basic Idea: Visualization of the data using
a hierarchical partitioning into subspaces.
Overview
Dimensional Stacking [LWW90]
Worlds-within-Worlds [FB90a/b]
Treemap [Shn92, Joh93]
Cone Trees [RMC91]
InfoCube [RG93]
Dimensional Stacking [LWW90]
Figure: attribute 1 through attribute 4 mapped to nested outer and inner axes
partitioning of the n-dimensional attribute space into 2-dimensional subspaces that are stacked into each other
partitioning of the attribute value ranges into classes; the important attributes should be used on the outer levels
adequate especially for data with ordinal attributes of low cardinality
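Assuming the attribute values have already been partitioned into classes (integer indices), the screen cell of a data item in dimensional stacking behaves like a mixed-radix number: the outer attribute picks a block and each inner attribute picks a sub-block within it. A Python sketch with two attributes per screen axis; the function names and the cardinalities are illustrative.

```python
def stacked_cell(record, cardinalities, x_attrs, y_attrs):
    """Return the (column, row) grid cell of a record in dimensional stacking.

    record:          tuple of class indices (0 .. cardinality - 1)
    x_attrs/y_attrs: attribute indices for the horizontal / vertical axes,
                     listed from the outermost level to the innermost level.
    """
    def position(attrs):
        pos = 0
        for a in attrs:
            # Each inner level subdivides one cell of the level enclosing it.
            pos = pos * cardinalities[a] + record[a]
        return pos
    return position(x_attrs), position(y_attrs)

def grid_size(cardinalities, x_attrs, y_attrs):
    """Total number of columns and rows of the stacked grid."""
    width = height = 1
    for a in x_attrs:
        width *= cardinalities[a]
    for a in y_attrs:
        height *= cardinalities[a]
    return width, height

# Attributes 0 and 1 on the outer x/y axes, 2 and 3 on the inner x/y axes.
card = (3, 2, 4, 5)
print(stacked_cell((2, 1, 3, 4), card, x_attrs=[0, 2], y_attrs=[1, 3]))  # (11, 9)
print(grid_size(card, [0, 2], [1, 3]))  # (12, 10)
```

The grid size grows with the product of the cardinalities, which is why the technique suits ordinal attributes of low cardinality and becomes hard to read beyond a handful of dimensions.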
Dimensional Stacking
Visualization of oil mining data with longitude and latitude mapped to the outer x-, y-axes and ore grade and depth mapped to the inner x-, y-axes
Used by permission of M. Ward, Worcester Polytechnic Institute
Dimensional Stacking
Disadvantages:
Difficult to display more than nine dimensions
Important to map dimensions appropriately
May be difficult to understand visualizations at first
Treemap [JS 91, Shn92, Joh93]
Screen-filling method which uses a hierarchical
partitioning of the screen into regions depending on the
attribute values
The x- and y-dimension of the screen are partitioned
alternately according to the attribute values (classes)
Figure: MSR Netscan image
Treemap of a File System (Shneiderman)
Treemaps
The attributes used for the partitioning and
their ordering are user-defined (the most
important attributes should be used first)
The color of the regions may correspond to
an additional attribute
Suitable for getting an overview of large amounts of hierarchical data (e.g., a file system) and of data with multiple ordinal attributes (e.g., census data)
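Alternately partitioning the x- and y-dimensions of the screen corresponds to the slice-and-dice treemap layout. A recursive Python sketch, assuming each node is a (name, size, children) tuple whose size equals the sum of its children's sizes; the small file-system tree is hypothetical, and color coding of an additional attribute is left out.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a tree as nested rectangles (slice-and-dice treemap).

    Even depths split the rectangle along x (children side by side), odd
    depths along y (children stacked), so the screen axes alternate as the
    hierarchy deepens. Returns a list of (name, x, y, width, height).
    """
    if out is None:
        out = []
    name, size, children = node
    out.append((name, x, y, w, h))
    offset = 0.0
    for child in children:
        frac = child[1] / size  # child's share of the parent's area
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out

tree = ("root", 10, [
    ("docs", 6, [("a.txt", 4, []), ("b.txt", 2, [])]),
    ("src", 4, []),
])
for rect in slice_and_dice(tree, 0.0, 0.0, 100.0, 100.0):
    print(rect)
```

Every rectangle's area is proportional to its node's size, so a.txt (size 4 of 10) gets 40% of the screen.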
Data Mining Result Visualization
Presentation of the results or knowledge obtained
from data mining in visual forms
Examples
Scatter plots and boxplots (obtained from
descriptive data mining)
Decision trees
Association rules
Clusters
Outliers
Generalized rules
Text mining
Boxplots from StatSoft: Multiple Variable Combinations
Visualization of Data Mining Results in SAS Enterprise Miner: Scatter Plots
Visualization of Association Rules in SGI/MineSet 3.0
Visualization of Decision Tree in SGI/MineSet 3.0
Visualization of Decision Trees
Visualization of Cluster Grouping: IBM Intelligent Miner
Association Rules (MineSet)
LHS and RHS items are mapped to the x- and y-axes
Confidence and support correspond to the height of the bar and of the disc, respectively
Interestingness is mapped to color
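The quantities that MineSet maps to bar height and disc height are the standard rule measures. The Python sketch below computes them from a list of transactions and packs them into a glyph description following that mapping. The slide does not say which interestingness measure is used, so lift serves as an assumed stand-in; the transactions and function names are hypothetical.

```python
def rule_metrics(transactions, lhs, rhs):
    """Support, confidence and lift of the rule lhs => rhs.

    transactions: list of item sets; lhs, rhs: iterables of items.
    """
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    # Lift as a stand-in interestingness measure (an assumption).
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return support, confidence, lift

def rule_glyph(lhs, rhs, support, confidence, interestingness, max_height=10.0):
    """LHS/RHS select the grid position, confidence sets the bar height,
    support the disc height, and interestingness the color value."""
    return {
        "x": tuple(sorted(lhs)),
        "y": tuple(sorted(rhs)),
        "bar_height": confidence * max_height,
        "disc_height": support * max_height,
        "color_value": interestingness,
    }

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]
s, c, l = rule_metrics(transactions, ["milk"], ["bread"])
print(rule_glyph(["milk"], ["bread"], s, c, l))
```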
MineSet: Association Rules
Association Ball Graph (DBMiner)
Items are visualized as balls
Arrows indicate rule implication
Size represents support
Classification (SAS EM [SAS 01])
Tree Viewer
Color corresponds to the relative frequency of a class in a node
Branch line thickness is proportional to the square root of the number of objects
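A Python sketch of the per-node quantities such a tree viewer draws: the relative frequency of each class among the objects reaching a node (for the color) and a branch thickness proportional to the square root of the number of objects. The function name and the scale parameter are illustrative assumptions.

```python
import math
from collections import Counter

def node_display(labels, classes, scale=1.0):
    """Per-node drawing quantities for a decision-tree viewer.

    labels:  class labels of the training objects that reach this node
    classes: all class names, fixing the order used for coloring
    Returns the relative frequency of each class and the branch thickness,
    taken as scale * sqrt(number of objects).
    """
    counts = Counter(labels)
    n = len(labels)
    freqs = {c: (counts[c] / n if n else 0.0) for c in classes}
    thickness = scale * math.sqrt(n)
    return freqs, thickness

root_labels = ["yes"] * 36 + ["no"] * 64
freqs, thickness = node_display(root_labels, classes=["yes", "no"])
print(freqs, thickness)  # {'yes': 0.36, 'no': 0.64} 10.0
```

The square root compresses widths: a node with 25 objects gets a branch half as thick as one with 100, not a quarter, so small branches stay visible.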
Cluster Analysis (H-BLOB: Hierarchical BLOB)
[SBG 00]
Figure: cluster, form ellipsoids, form blobs (implicit surfaces)
H-BLOB
Text Mining (ThemeRiver [WCF+ 00])
Visualization of thematic changes in documents
Vertical distance indicates the collective strength of the themes
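ThemeRiver draws each theme as a band whose thickness at a time step is that theme's strength, with the bands stacked around a horizontal center line so the total vertical extent equals the collective strength. A Python sketch of the band boundaries, assuming a baseline of minus half the total at each time step; the published system also smooths the bands into curves, which is omitted here, and the theme names are hypothetical.

```python
def theme_river_layers(strengths):
    """Stack theme strengths into a ThemeRiver-style river.

    strengths: dict theme -> list of non-negative strengths, one per time step.
    The river is centered on the horizontal axis (baseline = -total / 2), so
    the vertical extent at each time step equals the collective strength of
    all themes. Returns theme -> list of (bottom, top) pairs.
    """
    themes = list(strengths)
    steps = len(strengths[themes[0]])
    layers = {t: [] for t in themes}
    for i in range(steps):
        total = sum(strengths[t][i] for t in themes)
        bottom = -total / 2.0
        for t in themes:
            top = bottom + strengths[t][i]
            layers[t].append((bottom, top))
            bottom = top
    return layers

river = theme_river_layers({"sports": [2, 4, 1], "politics": [2, 0, 3]})
print(river)
```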
Data Mining Process Visualization
Presentation of the various processes of data mining in visual forms so that users can see the flow of data cleaning, integration, preprocessing, and mining
Data extraction process
Where the data is extracted from
How the data is cleaned, integrated, preprocessed, and mined
Method selected for data mining
Where the results are stored
How they may be viewed
Visualization of Data Mining Processes by Clementine
See your solution discovery process clearly
Understand variations with visualized data
Interactive Visual Data Mining
Using visualization tools in the data mining process
to help users make smart data mining decisions
Example
Display the data distribution in a set of attributes using colored sectors or columns (depending on whether the whole space is represented by a circle or by a set of columns)
Use the display to decide which sector should be selected first for classification and where a good split point for this sector may be
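In the interactive approach the user picks the split point by eye. For comparison, an automatic criterion can score candidate split points on one attribute; the Python sketch below uses weighted Gini impurity, which is a swapped-in choice for illustration and not necessarily what any tool on these slides computes.

```python
from collections import Counter

def gini(counts):
    """Gini impurity of a class-count table (0 for a pure node)."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_split(values, labels):
    """Best binary split of one numeric attribute by weighted Gini impurity.

    Returns (threshold, impurity); the threshold is the midpoint between two
    adjacent distinct sorted values, and items with value <= threshold go left.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left, right = Counter(), Counter(label for _, label in pairs)
    best = (None, gini(right))  # no split at all
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        if value == pairs[i + 1][0]:
            continue  # cannot split between equal values
        impurity = ((i + 1) * gini(left) + (n - i - 1) * gini(right)) / n
        if impurity < best[1]:
            best = ((value + pairs[i + 1][0]) / 2, impurity)
    return best

print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (6.5, 0.0)
```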
Visual data mining
Projection Pursuits
(Class) Tours [Dhillon et al. 98]
Visual Classification [Ankerst et al. KDD 99]
Projection Pursuits
Exploratory projection pursuit:
Goal: reduce dimensionality
Define an interestingness index for each possible projection of a data set
Maximize this index and project linearly
Not always possible or useful
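A crude exploratory projection pursuit in Python, reducing the data to one dimension. It assumes |excess kurtosis| as the interestingness index (one common measure of departure from a Gaussian; Gaussian projections score near 0) and maximizes it by random search over unit directions rather than by the optimizers used in practice. The synthetic data is hypothetical: two clusters along axis 0 and pure noise along axis 1.

```python
import numpy as np

def excess_kurtosis(z):
    """Sample excess kurtosis of a 1D array (about 0 for Gaussian data)."""
    z = (z - z.mean()) / z.std()
    return float(np.mean(z ** 4) - 3.0)

def projection_pursuit_1d(X, n_candidates=2000, seed=0):
    """Random-search projection pursuit with |excess kurtosis| as the index.

    Returns the best unit direction found and its index value.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_dir, best_score = None, -np.inf
    for _ in range(n_candidates):
        w = rng.normal(size=d)
        w /= np.linalg.norm(w)  # candidate projection direction
        score = abs(excess_kurtosis(X @ w))
        if score > best_score:
            best_dir, best_score = w, score
    return best_dir, best_score

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([
    rng.choice([-5.0, 5.0], size=n) + rng.normal(scale=0.1, size=n),
    rng.normal(size=n),
])
direction, score = projection_pursuit_1d(X)
print(direction, score)  # expected to point, up to sign, along axis 0
```

Projecting onto the found direction reveals the two clusters, which a projection onto axis 1 would hide completely.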
Class Tours
Visualizing Class Structure of Multidimensional Data by Dhillon et al. 1998
Problem: Visualize multidimensional data categorized into classes
Solution: Project data into 2D while preserving distances between class means
Class-Preserving Projection:
Preserves the distances between class means in the projection
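For exactly three classes, the class means lie in a 2D plane, so orthogonally projecting the data onto that plane keeps every distance between class means unchanged. The Python sketch below implements this three-class case with NumPy; the general case with more classes, and the animated class tours built on such projections, are not covered. The synthetic data and the function name are hypothetical.

```python
import numpy as np

def class_preserving_projection(X, y):
    """Project data onto the plane through the means of three classes.

    Returns the 2D coordinates of every point and the projected class means.
    """
    classes = np.unique(y)
    if len(classes) != 3:
        raise ValueError("this sketch handles exactly three classes")
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Orthonormal basis (d x 2) of the plane spanned by the mean differences.
    basis, _ = np.linalg.qr((means[1:] - means[0]).T)
    coords = (X - means[0]) @ basis  # 2D coordinates of every point
    projected_means = (means - means[0]) @ basis
    return coords, projected_means

rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0],
                    [4, 1, 0, 2, 0],
                    [1, 5, 3, 0, 1]], dtype=float)
y = np.repeat([0, 1, 2], 50)
X = centers[y] + rng.normal(size=(150, 5))
coords, projected_means = class_preserving_projection(X, y)
print(coords.shape, projected_means)
```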
Tours
Tours are animated and interpolated sequences of 2D projections [Asimov 1985]
Class tours: sequences of class-preserving 2-dimensional projections
Capture the inter-class structure of complex, multidimensional data
Perception-Based Classification (PBC)
Visual Classification
Visual Classification: An Interactive Approach to Decision Tree Construction by Ankerst et al. KDD 99
Exploits experts' domain knowledge and human visual processing
Visual Classification
Visual Classification Results
Comparable classification accuracy
Can produce more understandable decision trees
Expert domain knowledge can be exploited
Audio Data Mining
Uses audio signals to indicate the patterns of data or the features of data mining results
An interesting alternative to visual mining
The inverse of mining audio (such as music) databases, whose task is to find patterns in audio data
Visual data mining may disclose interesting patterns using graphical displays, but requires users to concentrate on watching patterns
Instead, transform patterns into sound and music and listen to pitches, rhythms, tune, and melody in order to identify anything interesting or unusual
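A minimal sonification sketch in Python: each value is min-max scaled onto a range of MIDI note numbers and converted to a frequency with the equal-temperament rule f = 440 * 2^((note - 69) / 12). Mapping values to pitch alone is an assumption for illustration; rhythm, tune, and melody are not modeled here, and the note range is an arbitrary choice.

```python
def sonify(values, low_note=57, high_note=81):
    """Map data values to pitches in Hz.

    Values are min-max scaled onto MIDI note numbers between low_note and
    high_note (57 = A3, 81 = A5), rounded to the nearest semitone, then
    converted with the equal-temperament rule f = 440 * 2 ** ((note - 69) / 12).
    """
    lo, hi = min(values), max(values)
    freqs = []
    for v in values:
        t = (v - lo) / (hi - lo) if hi > lo else 0.5
        note = round(low_note + t * (high_note - low_note))
        freqs.append(440.0 * 2 ** ((note - 69) / 12))
    return freqs

print(sonify([0, 5, 10]))  # [220.0, 440.0, 880.0]
```

An unusually high value then stands out as a sudden jump in pitch, which a listener can notice without watching a display.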
Summary
Many visualization methods available
How to evaluate and compare methods?
Need for:
Integrated visualization/exploration systems
Studies of interaction techniques for mining
Practical case studies
Acknowledgments
Many slides and images from Mihael Ankerst,
Boeing, Daniel A. Keim, AT&T, Tutorial at
PKDD'2001
Some pictures from Information Visualization in
Data Mining and Knowledge Discovery, edited by
Usama Fayyad, Georges Grinstein and Andreas
Wierse
A good set of slides was prepared by Andrew Wu (Spring 2004)