Charts and Chart Crimes
Saravanan Thirumuruganathan
Common Analysis Tasks
Juice Analytics - Graph Chooser | Extreme Presentations.com
Keys and Values
• Key
• independent attribute
• used as unique index to look up items
• typically each table has a single key but complex tables can have many
• Value
• dependent attribute, value of cell
Keys and Values
• Visual encoding choices: how many keys and how many values does the table have?
• no keys and two values: scatter plots
• one key and one value: bar charts
• two keys and one value: heatmaps
• many keys and many values: scatter plot matrices, parallel coordinates, etc.
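The key/value counts above can be read as a small lookup table. The sketch below is only an illustration: the function name `suggest_chart` and the use of the string "many" are assumptions, and it encodes only the cases named on the slides.

```python
def suggest_chart(num_keys, num_values):
    """Return the chart idioms listed for a table shape.

    Counts are integers or the string "many". Hypothetical helper: it only
    knows the cases from the slides and returns [] for anything else.
    """
    table = {
        (0, 2): ["scatter plot"],
        (1, 1): ["bar chart"],
        (2, 1): ["heatmap"],  # 2 keys and 1 value, as on the Heatmap slide
        ("many", "many"): ["scatter plot matrix", "parallel coordinates"],
    }
    return table.get((num_keys, num_values), [])
```

For example, `suggest_chart(1, 1)` returns the bar chart entry, while an unlisted shape returns an empty list.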
Scatterplot
• No keys only values
• Analysis
• Attributes : both are quantitative
• Marks: point
• Channels:
• horizontal and vertical position
• Task:
• find trends, outliers, distribution
• correlation, clusters
• Scalability
• hundreds of items
Scatterplot for Multidimensional Data
Krzywinski and Savig "Points of view: Multidimensional data" Nature Methods 10, 595 (2013) doi:10.1038/nmeth.2531
Scatterplot Tasks: Correlation
https://www.mathsisfun.com/data/scatter-xy-plots.html
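The correlation that a scatterplot shows visually can also be quantified. A minimal sketch of the Pearson correlation coefficient follows; the function name `pearson_r` is an assumption, and the code assumes both attributes vary (non-zero variance).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    +1 means points lie on a rising line, -1 on a falling line,
    and values near 0 mean no linear trend.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```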
Scatterplot Tasks: Clusters, Groups and Classes
https://www.cs.ubc.ca/labs/imager/tr/2014/DRVisTasks/
Scatterplot: Transformations
Connected Scatterplots: Encoding Time
Moritz Stefaner: http://truth-and-beauty.net/projects/remixing-rosling/
Scatterplot for Bivariate Data
• Hexagonal Binning
• Subdivide space into discrete cells
• Color code based on class and density
• Splatter Plots
• Visualize dense part as contours
• Easy to find outliers
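The binning idea can be sketched as follows. Note the simplification: this uses square cells rather than hexagons, which keeps the arithmetic short while showing the same "subdivide, then count" step. The name `bin_counts` and the `cell_size` parameter are assumptions.

```python
from collections import Counter


def bin_counts(points, cell_size):
    """Count points per square cell.

    Keys are (column, row) cell indices; a cell's count is what would be
    mapped to color when drawing a density plot.
    """
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts
```

Cells with high counts correspond to the dense regions; points in cells with a count of 1 are candidates for outliers.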
Pairwise comparison: Scatterplot Matrix (SPLOM)
• Pros
• Scalable
• Good overview of dataset
• Easy to understand and decode
• Cons
• Hard to read with too many dimensions
• Might need pan and zoom
• Use diagonal for labels or other plots
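The "too many dimensions" problem follows from how fast the number of panels grows. A small sketch, with the hypothetical helper `splom_panels`, lists the distinct attribute pairs a SPLOM has to show:

```python
from itertools import combinations


def splom_panels(attributes):
    """Return every unordered pair of attributes, one per off-diagonal panel
    in the lower (or upper) triangle of a scatterplot matrix."""
    return list(combinations(attributes, 2))
```

With n attributes there are n(n-1)/2 pairs, so 4 attributes give 6 panels and 10 attributes already give 45.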
Bar Charts
• One key and one value
• Data: 1 categorical attribute and 1 quantitative attribute
• Mark: lines
• Channel: length
• Task: comparison
• Scalability:
• Dozens for key (i.e. many bars)
• Hundreds for values
Stacked Bar Charts
• One more key
• Data: 2 categorical attributes and 1 quantitative attribute
• Mark: lines
• Channel: length, color hue
• Task: part-to-whole relationship
• Scalability:
• stacked key attribute: 10-12 levels [segments]
• main key attribute: dozens to hundreds of levels [bars]
Variants of Bar Charts
Dot/Line Charts
• ? key and ? value
• Data: ? quantitative attribute
• Mark: ?
• Channel:
• ?
• Task: ?
• Scalability:
• ? key levels
• ? value levels
Dot/Line Charts
• One key and One value
• Data: 2 quantitative attributes
• Mark: points AND lines connecting them
• Channel:
• Length
• separated and ordered by key attribute into horizontal regions
• Task: find trend
• Scalability:
• hundreds of key levels
• hundreds of value levels
Line Charts with multiple attributes
Line charts can be extended to show a second categorical attribute using color and size channels.
Choosing bar vs line charts
• depends on type of key attribute
• bar charts if ?
• line charts if ?
Choosing bar vs line charts
• depends on type of key attribute
• bar charts if categorical
• line charts if ordered
• Bar charts encourage discrete comparisons whereas line charts encourage trend assessments
• do not use line charts for categorical key attributes
• violates expressiveness principle
• implication of trend (between categorical attributes) so strong that it overrides semantics!
Choosing bar vs line charts
Bars and Lines: A Study of Graphic Communication. Zacks and Tversky. Memory and Cognition 27:6 (1999), 1073–1079
Choosing bar vs line charts
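• Categorical key (gender)
• Bar chart, ok: men are taller than women (on average)
• Line chart, WTF: the more male a person is, the taller he/she is
• Ordered key (age)
• Bar chart, ok: 12-year-olds are taller than 10-year-olds (on average)
• Line chart, ok: height increases with age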
Bars and Lines: A Study of Graphic Communication. Zacks and Tversky. Memory and Cognition 27:6 (1999), 1073–1079
Pie Charts
• Pie charts encode a ? attribute
• Marks: ?
• Channel: ?
• Tasks: ?
Pie Charts
• Useful for showing proportions
• Instead of raw counts
• Pie charts encode a single attribute
• Marks: area
• Channel: angle
• Tasks: relative contribution of parts to a whole
Pie Charts vs Bar Charts
Which is better?
Pie Charts vs Bar Charts
• We are a lot better at judging line lengths than angles and areas
• Normalized stacked bar chart is typically a better alternative
• Avoid pie charts when possible
• If used, the parts must sum to 100%
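Both a pie chart and a normalized stacked bar chart start from the same derived data: each part expressed as a share of the whole. A minimal sketch, where the name `normalize_to_percent` is an assumption:

```python
def normalize_to_percent(parts):
    """Convert raw values per category into percentages of the total.

    The returned shares sum to 100 and can be used as segment lengths
    of a single normalized stacked bar.
    """
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}
```

Drawing the shares as segment lengths lets the reader compare lengths rather than angles.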
Histogram
• static item aggregation
• task: find distribution
• derived data
• new table: keys are bins, values are counts
• bin size crucial
• pattern can change dramatically depending on discretization
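The derived table (bins as keys, counts as values) can be sketched in a few lines. The function name `histogram` and its parameters are assumptions; running it with two bin widths on the same values shows how much the pattern depends on the discretization.

```python
import math


def histogram(values, bin_width, start=0.0):
    """Return {bin_index: count}.

    Bin i covers the half-open interval
    [start + i * bin_width, start + (i + 1) * bin_width).
    """
    counts = {}
    for v in values:
        i = math.floor((v - start) / bin_width)
        counts[i] = counts.get(i, 0) + 1
    return dict(sorted(counts.items()))


# Made-up values, used only to illustrate the effect of bin width.
values = [1, 2, 2, 3, 8, 9, 9, 10]
coarse = histogram(values, bin_width=5)  # few wide bins
fine = histogram(values, bin_width=2)    # more, narrower bins
```

With the wide bins the data looks like a steady decline; with the narrow bins two separate groups with a gap between them become visible.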
Histogram
Heatmap
• 2 keys and 1 value
• Data: 2 categorical attributes and 1 quantitative attribute
• Mark: points
• Channels: color by the quantitative attribute
• Task: find clusters and outliers
• Scalability: 1M items, ~100 category levels
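The two-keys-one-value structure maps directly onto a matrix: one key indexes rows, the other indexes columns, and each cell holds the value to be colored. A sketch under the assumption that the data arrives as (row key, column key, value) records; `to_matrix` is a hypothetical helper.

```python
def to_matrix(records, row_keys, col_keys, missing=None):
    """Arrange (row_key, col_key, value) records into a 2D list.

    Row and column order follow row_keys and col_keys; cells with no
    record are filled with `missing`.
    """
    lookup = {(r, c): v for r, c, v in records}
    return [[lookup.get((r, c), missing) for c in col_keys] for r in row_keys]
```

The order of `row_keys` and `col_keys` matters for the cluster-finding task, since similar rows are only visible as blocks when they are placed next to each other.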
Heatmap
http://graphics.wsj.com/infectious-diseases-and-vaccines/
Heatmap Tasks
• Compare the values of a parameter for different observations (row)
• Compare the values for a single observation (column)
• Compare the patterns for different rows or columns
• Find similar observations (areas with the same color intensity)
• Find which variables define similarity for a group
• Find correlated variables (similar pattern within a column)
SPLOM vs Parallel Coordinates Plot
Parallel Coordinates Plot (PCP)
• Create one vertical line for every variable
• Each axis runs from that variable's minimum to its maximum, so axes can have different scales
• Plot every entity across each variable
• Line connects values for a single entity
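The construction steps above can be sketched as a data transformation: scale each variable to its own axis, then emit one polyline per entity. The name `pcp_polylines` is an assumption, and drawing is left out.

```python
def pcp_polylines(rows):
    """Compute parallel-coordinates polylines.

    rows: list of equal-length numeric tuples, one tuple per entity.
    Returns, per entity, a list of (axis_index, position) pairs where
    position is in [0, 1] along that variable's own min-max scale.
    """
    columns = list(zip(*rows))
    ranges = [(min(col), max(col)) for col in columns]
    lines = []
    for row in rows:
        line = []
        for axis, (value, (lo, hi)) in enumerate(zip(row, ranges)):
            # A constant variable has no range; place it mid-axis.
            pos = 0.5 if hi == lo else (value - lo) / (hi - lo)
            line.append((axis, pos))
        lines.append(line)
    return lines
```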
Parallel Coordinates Plot (PCP)
• Positive correlation between two adjacent variables: almost all segments are parallel to each other
• Negative correlation: the line segments mostly cross over each other at a single spot between axes
• Clusters in some variable space: several trace lines that are near each other and have a similar pattern
• Outliers: trace lines that have an unusual pattern and/or fall outside the common plot area
Nathan Yau, “Visualize This” (2011) and “Data Points” (2013)
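The reading rule for correlation between adjacent axes can be checked numerically: two segments cross exactly when the two entities are ordered one way on the left axis and the other way on the right axis. The sketch below counts such crossings; `crossing_fraction` is a hypothetical name and the code assumes at least two entities.

```python
from itertools import combinations


def crossing_fraction(left, right):
    """Fraction of segment pairs that cross between two adjacent axes.

    left[i] and right[i] are entity i's values on the two axes.
    Near 0 suggests positive correlation, near 1 negative correlation.
    """
    pairs = list(combinations(range(len(left)), 2))
    crossings = sum(
        1
        for i, j in pairs
        if (left[i] - left[j]) * (right[i] - right[j]) < 0
    )
    return crossings / len(pairs)
```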
Parallel Coordinates Plot (PCP)
https://visualizationcheatsheets.github.io/pcp.html
Star Plots
• Add additional axes in a radial fashion
• Lends itself to circular relationships like time
Choropleth Map
• Use when the central task is understanding spatial relationships
• Data
• geographic geometry
• 1 quant attribute per region
• Encoding
• Position (geographic boundaries)
• Color
Choropleth Map
• Spurious correlations: most attributes just show where people live
• Consider when to normalize by population density
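One common form of normalization is to color regions by a rate rather than a raw count. The sketch below normalizes by population (a per-100,000 rate); dividing by area instead would follow the same pattern. The name `rate_per_100k` is an assumption.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 inhabitants,
    so that populous regions do not dominate the map by size alone."""
    return 100_000 * count / population
```

Two regions with the same raw count can end up with very different rates, which is usually the quantity of interest.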
https://extremepresentation.typepad.com/files/chart-chooser-2020.pdf
Chart Crimes
What is the issue?
• Truncate y-axis only if
• Zero value does not make sense
• It is the norm (e.g. stocks)
https://qz.com/418083/its-ok-not-to-start-your-y-axis-at-zero/
What is the issue?
Solution 1: Clip the outliers
Solution 2: Log Scale
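Assuming the issue here is that a few extreme values squash the rest of the data against the axis, the two solutions can be sketched as simple transformations. The function names and the choice of a fixed upper bound are assumptions.

```python
import math


def clip(values, upper):
    """Solution 1: cap every value at `upper` so outliers no longer
    stretch the axis (the cap should be stated on the chart)."""
    return [min(v, upper) for v in values]


def log10_scale(values):
    """Solution 2: plot log10 of each value; requires all values > 0.
    Equal distances on the axis then mean equal ratios."""
    return [math.log10(v) for v in values]
```

Clipping hides how extreme the outliers are, whereas the log scale keeps them visible but changes how distances should be read.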
What is the issue?
https://blog.datawrapper.de/dualaxis/
Solution: Separate Charts
https://blog.datawrapper.de/dualaxis/
What is the issue?
Solution: Avoid Spurious Correlations
What is the issue?
• There is a positive relationship between cigarette consumption and life expectancy at a country-by-country level!
[Alberto Cairo. How Charts Lie, 2019]
What is the issue?
• Now the relationship is not so clear
Solution: Visualize the effect directly
Rules of Thumb for Good Visualizations
Is Visualization even necessary?
How do you feel about doing science?
Charting Advice
• Function first, form next
• dangerous to start with aesthetics
• start with focus on functionality
• possible to improve aesthetics later on, as refinement
Charting Advice: Labelling
• Make visualizations as self-documenting as possible
• Meaningful & useful title, labels, legends
• axes and panes/subwindows should have labels and tick marks
• everything that’s plotted should have a legend
• use reasonable numerical format
• e.g., thousands separators differ between the US, EU, India, etc.
• avoid scientific notation in most cases
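The point about separators can be illustrated with three ways of writing the same non-negative integer. This is a sketch with hypothetical helper names; the "EU" variant shows the period-separated style used in several European countries (others use spaces), and the Indian variant groups the last three digits and then pairs.

```python
def format_us(n):
    """1234567 -> '1,234,567' (groups of three, comma separated)."""
    return f"{n:,}"


def format_eu(n):
    """1234567 -> '1.234.567' (groups of three, period separated)."""
    return f"{n:,}".replace(",", ".")


def format_indian(n):
    """1234567 -> '12,34,567' (last three digits, then groups of two)."""
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])
```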
Chart Axes
• Labelled axes are critical
• Avoid cropping y-axis
• include 0 at bottom left
• otherwise slope misleads
• Some exceptions
• e.g., when a small change matters
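How much a cropped axis exaggerates a difference can be quantified for two bars: compare the ratio of their drawn heights with the ratio of the underlying values. The helper name `drawn_height_ratio` and the numbers in the comment are illustrative assumptions.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of bar b's drawn height to bar a's when the y-axis
    starts at `baseline` instead of necessarily at zero."""
    return (b - baseline) / (a - baseline)


# Illustrative values 50 and 52: with a zero baseline the second bar is
# 4% taller; with the axis cropped at 49 it is drawn three times as tall.
honest = drawn_height_ratio(50, 52)
cropped = drawn_height_ratio(50, 52, baseline=49)
```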
Avoid Dual y-axes if possible
• The scales and origins of the two y-axes are not the same
• Changing them can result in completely different interpretations
• Our visual system usually interprets the meeting point of two lines as significant even though it is not here
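That the meeting point is an artifact of the axis choice can be shown with a small sketch: the same two series either cross or never meet depending only on the range chosen for the second axis. Function names and the sample ranges are assumptions.

```python
def axis_position(value, axis_min, axis_max):
    """Vertical position of a value as a fraction of the axis height."""
    return (value - axis_min) / (axis_max - axis_min)


def lines_cross(series_a, a_range, series_b, b_range):
    """True if line A is drawn below line B at one x position and
    above it at another, i.e. the two lines visibly cross."""
    diffs = [
        axis_position(a, *a_range) - axis_position(b, *b_range)
        for a, b in zip(series_a, series_b)
    ]
    return min(diffs) < 0 < max(diffs)
```

For instance, series [10, 20, 30] on a left axis of (0, 40) crosses series [3, 2, 1] when the right axis is (0, 4), but not when the right axis is (0, 40), although the data is unchanged.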
Solution 1: Use Multiple Plots
Solution 2: Redraw with common metric
• Same data plotted with single y-axis
• Tells a more interesting story
https://www.storytellingwithdata.com/blog/2016/2/1/be-gone-dual-y-axis
Avoid Dual y-axes if possible
• Only use if both axes measure the same thing
Tick Placement
• Ticks help the reader interpret the data, but too many may hinder
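One common heuristic for keeping the number of ticks small and readable is to pick a step of 1, 2, or 5 times a power of ten. The sketch below is one such heuristic, not a method from the slides; the names `nice_step`, `ticks`, and `max_intervals` are assumptions, and it requires `hi > lo`.

```python
import math


def nice_step(lo, hi, max_intervals=6):
    """Smallest step of the form 1, 2, 5 or 10 times a power of ten
    that splits [lo, hi] into at most `max_intervals` intervals."""
    raw = (hi - lo) / max_intervals
    magnitude = 10 ** math.floor(math.log10(raw))
    for m in (1, 2, 5, 10):
        if m * magnitude >= raw:
            return m * magnitude


def ticks(lo, hi, max_intervals=6):
    """Tick values at multiples of the nice step that lie inside [lo, hi]."""
    step = nice_step(lo, hi, max_intervals)
    first = math.ceil(lo / step)
    last = math.floor(hi / step)
    return [i * step for i in range(first, last + 1)]
```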
Aspect Ratio
https://eagereyes.org/blog/2013/banking-45-degrees
http://vis.stanford.edu/papers/banking
Aspect Ratio
• Aspect Ratio: the ratio of width to height of the entire plot
• Our ability to judge angles is more accurate at exact diagonals (i.e., 45 degrees) than at arbitrary directions.
• The banking to 45 degrees idiom computes the best aspect ratio for a chart in order to maximize the number of line segments that fall close to the diagonal.
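A simple way to approximate banking is the median-absolute-slope heuristic: choose the width-to-height ratio so that the median segment is drawn at 45 degrees. This is a simplification of the idiom described above, not the full optimization; the name `bank_to_45` is an assumption, and the data must span a non-zero range in both x and y.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Width/height aspect ratio at which the median absolute slope of
    the line segments appears as 1 (i.e. 45 degrees) on screen."""
    range_x = max(xs) - min(xs)
    range_y = max(ys) - min(ys)
    points = list(zip(xs, ys))
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0
    ]
    # On-screen slope = data slope * (range_x / range_y) * (height / width);
    # setting the median on-screen slope to 1 and solving for width / height:
    return median(slopes) * range_x / range_y
```

A straight line running from one corner of the data range to the other yields a ratio of 1, that is, a square plot.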
Teaching Yourself
Useful Resources
• https://www.cs.ubc.ca/~tmm/talks.html#vadallslides
• Everything from Munzner is great, and the VADA slides are excellent
• Other good courses
• 6.859 from MIT
• CSE 442 from University of Washington
Useful Resources
• https://informationisbeautiful.net/
• https://winners.webbyawards.com/winners/websites-and-mobile-sites/features-design/best-data-visualization?years=0&sort=0
• https://www.reddit.com/r/ChartCrimes/
• https://viz.wtf/