Visualizing Many Distributions at Once

The document discusses various visualization methods for comparing multiple distributions simultaneously, such as boxplots, violin plots, strip charts, and ridgeline plots. It highlights the advantages and limitations of each method, emphasizing the importance of selecting the right visualization based on data characteristics and sample sizes. The document also illustrates how these methods can reveal patterns and relationships in data, using examples like temperature distributions and voting patterns.

Uploaded by

rohankumar809280
Visualizing Many Distributions at Once

To reveal patterns, variability, and relationships when dealing with multiple groups of data
Compare multiple distributions simultaneously

• Earlier methods such as histograms or single-distribution plots don’t scale well to many groups.
• Example: weather data → showing temperature distributions for each of the 12 months in a year.
• For that, we use specialized visualization methods such as:
Boxplots
Violin plots
Ridgeline plots
Strip charts
Sina plots

Key Terms
Response variable: The variable whose distribution is being studied (e.g., temperature).
Grouping variable: Defines subsets of data for comparison (e.g., months).

General approach: One axis shows the response variable, the other the grouping variable.
Visualizing Distributions Along the Vertical Axis

Mean/Median with Error Bars
Simplest approach: plot a point (mean/median) with error bars.
Problems:
Oversimplification → hides distribution details.
Ambiguity → unclear whether the point is the mean or the median.
Error bars’ meaning is inconsistent (standard deviation? standard error? confidence interval?).
Misleading for skewed data.
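
To make the ambiguity concrete, here is a minimal sketch using only the Python standard library. The sample values are hypothetical (not taken from the slides), and the 95% interval uses a simple normal approximation, which is an assumption. It shows how far apart the mean and median sit for skewed data, and how different the "error bar" becomes depending on its definition.

```python
import math
import statistics

# Hypothetical right-skewed sample (illustrative values, not from the slides).
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 35]

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)

sd = statistics.stdev(values)   # sample standard deviation
sem = sd / math.sqrt(n)         # standard error of the mean
ci95 = 1.96 * sem               # half-width of a normal-approximation 95% CI

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"bar half-width: SD = {sd:.2f}, SEM = {sem:.2f}, 95% CI = {ci95:.2f}")

# For this skewed sample, a mean-minus-SD bar reaches below the smallest value.
print("mean - SD below the minimum:", mean - sd < min(values))
```

The same point could be drawn with three visibly different bars, which is why a figure should state which statistic and which kind of error bar it shows.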

Boxplots

Components
Median: central line.
Interquartile range (IQR): the box, spanning the 25th to 75th percentile.
Whiskers: extend to the most extreme data point within 1.5×IQR of the box.
Outliers: individual dots beyond the whiskers.
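
The components above can be computed directly. The following is a sketch under stated assumptions: the function name `boxplot_stats` and the sample values are hypothetical, and the quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, which is only one of several conventions (plotting libraries may differ slightly).

```python
import statistics

def boxplot_stats(values, whisker_factor=1.5):
    """Compute the pieces of a boxplot for one group (hypothetical helper)."""
    data = sorted(values)
    # Quartile convention is an assumption; other tools may interpolate differently.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical daily temperatures for one month, with one unusually cold day.
temps = [12, 25, 28, 30, 31, 33, 34, 36, 38, 40, 41]
summary = boxplot_stats(temps)
for name, value in summary.items():
    print(name, value)
```

Note that the lower whisker ends at an actual observation rather than at the fence itself; the cold day falls outside the fence and is drawn as a separate dot.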

Advantages:
Compact → works well when comparing many groups.
Shows skewness and spread.
Standardized → widely understood.

• Boxplots work well when plotted next to each other to visualize many distributions at once.
• They reveal that December temperatures are skewed (long tail toward extremely cold days).
• Limitation: they cannot reveal multimodality (two or more peaks).

Violin Plots
Violin plots extend boxplots by showing distribution shape through kernel density estimation (KDE).
Width represents density at that value.
Symmetric shape → mirrored KDE.
Can reveal multiple peaks (multimodality).
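
A rough sketch of the idea behind the violin outline, using only the standard library. Everything here is an assumption for illustration: the helper names, the fixed bandwidth of 1.5, and the synthetic two-cluster sample centered at 35 and 50 (chosen to echo the November example, not the real Lincoln data). It estimates a Gaussian KDE, counts its peaks on a grid, and compares that with the quartiles a boxplot would show.

```python
import math
import statistics
from statistics import NormalDist

def gaussian_kde(data, bandwidth):
    """Return a density function built as an average of Gaussian bumps."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)
    return density

def cluster(center, spread, size):
    """Deterministic, evenly spaced quantiles of a normal distribution."""
    dist = NormalDist(center, spread)
    return [dist.inv_cdf((i + 0.5) / size) for i in range(size)]

# Synthetic bimodal sample (assumption): two clusters around 35 and 50.
temps = cluster(35, 3, 100) + cluster(50, 3, 100)

density = gaussian_kde(temps, bandwidth=1.5)   # bandwidth chosen by hand
grid = [20 + 0.5 * i for i in range(91)]       # 20.0, 20.5, ..., 65.0
curve = [density(x) for x in grid]             # half-width of the violin at each value

# Local maxima of the density curve are the peaks a violin plot would show.
peaks = [grid[i] for i in range(1, len(grid) - 1)
         if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]

q1, median, q3 = statistics.quantiles(temps, n=4)
print("density peaks at:", peaks)
print(f"quartiles: {q1:.1f}, {median:.1f}, {q3:.1f}")
```

In this synthetic case the median falls in the low-density gap between the two clusters, so a boxplot's central line would sit where the data are sparsest, while the density curve shows both peaks.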

When to use:
Large sample sizes.
Detecting subtle distribution shapes.
Limitations:
Requires enough data → small samples make density misleading.
May suggest smooth patterns that don’t exist.

Lincoln temperatures as violin plots.
Reveals bimodality in November (two peaks at ~35°F and ~50°F).
Boxplot cannot reveal this.
More detailed than boxplots.
Needs large sample sizes, otherwise smoothing creates misleading shapes.

Strip Charts
Definition: Plot all data points individually (raw visualization).
Problem: Overplotting (points overlap, hiding density).

Jittering
Add random horizontal noise so points don’t overlap.
Reveals frequency without density smoothing.
Best for small to medium samples.
Not suitable for large datasets (becomes unreadable).
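
Jittering is simple enough to sketch in a few lines. The function name `jitter`, the width of 0.3, and the readings are hypothetical choices for illustration; the only point is that the horizontal position gets uniform random noise while the measured value is left untouched.

```python
import random

def jitter(group_index, values, width=0.3, seed=0):
    """Return (x, y) points: x is the group position plus random noise, y is unchanged."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    return [(group_index + rng.uniform(-width, width), v) for v in values]

# Hypothetical readings with many ties, which would overplot in a plain strip chart.
readings = [31, 31, 31, 32, 32, 35, 35, 35, 35, 40]
points = jitter(group_index=1, values=readings)

for x, y in points:
    print(f"x = {x:.3f}, y = {y}")
```

The width should stay well below the spacing between groups, otherwise points from neighboring groups start to mix.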

Sina Plots
Sina plot of Lincoln temperatures.
Hybrid of violin plot + jittered points.
Points spread horizontally in proportion to density at that value.
Shows both individual observations and overall distribution.
Best for medium datasets.
Combines advantages of violins (shape) and strip charts (raw points).
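
One way to read "spread proportional to density" is sketched here. This is an illustrative construction, not the exact algorithm of any particular plotting package: the function name `sina_positions`, the bandwidth, the maximum width, and the sample are all assumptions. Each point receives jitter whose range is scaled by a Gaussian density estimate at its own value.

```python
import math
import random

def sina_positions(group_index, values, bandwidth, max_width=0.4, seed=0):
    """Jitter each point horizontally within a range proportional to local density."""
    rng = random.Random(seed)

    def density(x):
        # Unnormalized Gaussian KDE; only ratios between densities matter here.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    densities = [density(v) for v in values]
    peak = max(densities)
    points = []
    for v, d in zip(values, densities):
        half_width = max_width * d / peak   # wide where data are dense, narrow in the tails
        points.append((group_index + rng.uniform(-half_width, half_width), v))
    return points

# Hypothetical sample: a dense cluster near 35 and one isolated value at 50.
temps = [30, 34, 35, 35, 36, 36, 37, 50]
points = sina_positions(group_index=1, values=temps, bandwidth=1.5)

for x, y in points:
    print(f"x = {x:.3f}, y = {y}")
```

The isolated value stays close to the group's center line, while points in the dense cluster fan out, so the cloud of points traces roughly the same outline a violin would.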
Visualizing Distributions Along the Horizontal Axis

Ridgeline Plots

Each month’s distribution stacked vertically.
November’s two clusters (35°F, 50°F) clearly visible.
Looks like mountain ridges → hence the name.
Excellent for time series comparisons.
Very intuitive for showing gradual changes.
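
The stacking can be sketched as a layout computation: each group gets its own baseline, and its density curve is drawn upward from that baseline, scaled so that tall curves may overlap the row above. The names `kde_curve` and `ridgeline_layout`, the `overlap` factor, the bandwidth, and the two tiny samples are hypothetical; a real figure would pass these curves to a plotting library.

```python
import math

def kde_curve(values, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid position."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
            for x in grid]

def ridgeline_layout(groups, grid, bandwidth, overlap=1.5):
    """Map each group name to (baseline, y-values of its ridge on the grid)."""
    curves = {name: kde_curve(vals, grid, bandwidth) for name, vals in groups.items()}
    tallest = max(max(curve) for curve in curves.values())
    layout = {}
    for row, (name, curve) in enumerate(curves.items()):
        baseline = float(row)   # rows are one unit apart
        # With overlap > 1, the tallest ridge reaches into the next row.
        layout[name] = (baseline, [baseline + overlap * d / tallest for d in curve])
    return layout

# Hypothetical temperatures (°F) for two months.
groups = {"Jan": [20, 25, 30], "Jul": [70, 75, 80]}
grid = list(range(0, 101, 5))
layout = ridgeline_layout(groups, grid, bandwidth=5.0)

for name, (baseline, ys) in layout.items():
    print(name, "baseline", baseline, "ridge top", round(max(ys), 2))
```

Because the response variable runs along the horizontal axis and every row shares it, shifts from one group to the next show up as ridges drifting left or right.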

Scaling to Large Datasets


Movie lengths (1913–2005) as ridgeline plot.
Nearly 100 distributions shown.
Early decades = wide variety of lengths.
Since 1960s = standardization at ~90 minutes.
Ridgeline plots scale well to large numbers of groups.

Comparing Two Groups Over Time
Voting patterns in the U.S. House of Representatives (1963–2013).
Uses DW-NOMINATE scores to represent ideology.
Two ridgeline distributions at each Congress session: Democrats (blue), Republicans (red).
Shows increasing polarization over time.
Well suited to showing divergence between two categories across time.
Thank You!

Questions and Queries are Invited
