Unit III Notes (First Half) 1

Effective data visualization is crucial in data science for exploratory data analysis, error detection, and communication of insights. It involves using various tools and techniques to analyze and present data clearly, ensuring that visualizations are aesthetically pleasing and informative. Key principles include maximizing data-ink ratio, minimizing misleading elements, and selecting appropriate chart types based on the data's purpose.

Uploaded by

ragulrr.22msc

Visualizing Data

Effective data visualization is an important aspect of data science, for at least
three distinct reasons:
 Exploratory data analysis: What does your data really look like? Plots and
visualizations are the best way I know of to find out.
 Error detection: Feeding unvisualized data to any machine learning
algorithm is asking for trouble. Problems with outlier points, insufficient
cleaning, and erroneous assumptions reveal themselves immediately when
properly visualizing your data.
 Communication: Can you present what you have learned effectively to
others? Meaningful results become actionable only after they are shared.

1. Exploratory Data Analysis

Exploratory Data Analysis (EDA) is the process in data science of analyzing and
visualizing data to understand its main characteristics, discover patterns and trends
(finding how data behaves, repeats, or changes over time), detect outliers or errors,
test assumptions, and generate insights before applying formal modeling or
machine learning techniques.
i. Face a New Data Set: What should you do when encountering a new data set?
Answer the basic questions:
 Who constructed this data set, when, and why? Understanding how your
data was obtained provides clues as to how relevant it is likely to be, and
whether we should trust it.
 How big is it? How rich is the data set in terms of the number of fields or
columns? How large is it as measured by the number of records or rows?
 What do the fields mean? Walk through each of the columns in your data
set, and be sure you understand what they are. Which fields are numerical
or categorical?

Look for Familiar or Interpretable Records:

 Get to know a few data records closely (persons, places, or things).
 This helps you understand whether the data makes sense and spot errors.
 If familiar records don’t exist, focus on special cases like maximum or
minimum values.
Summary Statistics
 Check basic summary statistics for each column to understand the center
and spread of the data.
 For numerical data, use minimum, maximum, median, and quartiles.
 For categorical data, count unique categories and find the most frequent
ones.
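The summary statistics listed above can be sketched with the Python standard library alone. The column names and values (`ages`, `cities`) are made-up illustrations, not data from these notes.

```python
from collections import Counter
from statistics import median, quantiles

def numeric_summary(values):
    """Minimum, quartiles, median, and maximum of a numeric column."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }

def categorical_summary(values):
    """Number of unique categories and the most frequent one."""
    counts = Counter(values)
    top_value, top_count = counts.most_common(1)[0]
    return {"unique": len(counts), "most_frequent": top_value, "count": top_count}

# Hypothetical columns, for illustration only.
ages = [23, 35, 31, 52, 46, 29, 40]
cities = ["Chennai", "Madurai", "Chennai", "Salem", "Chennai", "Madurai", "Salem"]

age_stats = numeric_summary(ages)
city_stats = categorical_summary(cities)
print(age_stats)
print(city_stats)
```

A maximum far from the third quartile, or a category that appears only once, is often the first hint of an outlier or a data-entry error.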
Pairwise Correlations
 Analyze correlations between variables to see how they are related.
 Good features should correlate strongly with the target but not too much
with each other.
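As a rough illustration of the two points above, the Pearson correlation coefficient can be computed by hand. The feature names and values below are hypothetical; `hours_in_minutes` is deliberately a rescaled copy of `hours` to show a redundant feature.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: study hours, the same hours in minutes, and a test score.
hours = [1, 2, 3, 4, 5]
hours_in_minutes = [60, 120, 180, 240, 300]
score = [52, 58, 61, 70, 74]

feature_vs_target = pearson(hours, score)               # strong: a useful feature
feature_vs_feature = pearson(hours, hours_in_minutes)   # about 1: redundant pair
print(round(feature_vs_target, 3), round(feature_vs_feature, 3))
```

Here both features correlate strongly with the target, but because they also correlate perfectly with each other, keeping both adds no new information.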
Class Breakdowns
 Break data by major categories (like gender or location).
 Compare distributions to see if meaningful differences exist between groups.
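A class breakdown can be sketched as a simple group-by. The record layout (`location`, `income`) and the numbers are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

def breakdown(records, group_key, value_key):
    """Group records by a category and summarize one numeric field per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: {"count": len(values), "mean": mean(values)}
        for group, values in groups.items()
    }

# Hypothetical records, for illustration only.
records = [
    {"location": "urban", "income": 50},
    {"location": "urban", "income": 70},
    {"location": "rural", "income": 30},
    {"location": "rural", "income": 40},
    {"location": "rural", "income": 35},
]

by_location = breakdown(records, "location", "income")
print(by_location)
```

If the group means (or the full distributions) differ noticeably, the category is probably worth keeping as a feature or plotting separately.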
Plots of Distributions
 Use graphs to visually inspect data distributions.
 Look for patterns, trends, and outliers, and decide if data cleaning or
transformation is needed.
EDA helps you understand your data using records, statistics, correlations,
categories, and visualizations before modeling.
2. Visualization Tools
Visualization tools help us understand data using graphs and charts. The choice
of tool depends on the purpose of visualization. Visualization tasks are mainly
grouped into three categories:

1. Exploratory Data Analysis (EDA)

 Used for quick and interactive exploration of data

 Helps to find patterns, trends, and errors
 Tools: Excel, Python (iPython/Jupyter), R, Mathematica
 These tools hide complexity and provide reasonable default plots, which
can be customized if needed

2. Publication / Presentation-Quality Charts

 Used to create high-quality, clear, and attractive graphs

 Focuses on accurate and informative presentation

 Tools: Matplotlib, Gnuplot, R visualization libraries

 Allows full control over design, style, and layout

3. Interactive Visualization for External Applications

 Used to build interactive dashboards for users

 Helps non-technical users explore data easily

 Tools: Python dashboards, Tableau

 Supports interaction like filtering, zooming, and linked views

Visualization tools help explore data, present insights clearly, and build interactive
systems depending on the goal.
3. Developing a Visualization Aesthetic

Developing a visualization aesthetic means choosing and designing charts in a
clean, clear, and visually pleasing way so that people can easily understand the data.

📌 In simple words:
👉 Make your graphs look neat, meaningful, and easy to read.

This includes:

 Using the right type of chart

 Choosing clear colors and labels

 Avoiding clutter and confusion

 Highlighting important patterns or trends

The visual aesthetic and vocabulary are largely derived from the books of Edward
Tufte [Tuf83, Tuf90, Tuf97]. He is an artist who has thought long and hard about
what makes a chart or graph informative and beautiful, basing a design aesthetic on
the following principles:

 Maximize data-ink ratio: Your visualization is supposed to show off your data.
So why is so much of what you see in charts the background grids, shading, and
tick marks?
 Minimize the lie factor: As a scientist, your data should reveal the truth,
ideally the truth you want to see revealed.
 Minimize chart junk: Modern visualization software often adds cool visual
effects that have little to do with your data set. Is your graphic interesting because
of your data, or in spite of it? For example, 3D effects, shadows, and animation
are usually unnecessary.
 Use proper scales and clear labeling: Accurate interpretation of data depends
upon non-data elements like scale and labeling.
 Make effective use of color: The human eye has the power to discriminate
between small gradations in hue and color saturation. Are you using color to
highlight important properties of your data, or just to make an artistic statement?
 Exploit the power of repetition: Arrays of similar graphics with different but
related data elements provide a concise and powerful way to enable visual
comparisons.
i. Maximizing Data-Ink Ratio: In any graphic, some of the ink is used to
represent the actual underlying data, while the rest is employed on graphic effects.

Data-Ink Ratio = (Ink used to show data) / (Total ink used in the graphic)

Goal: 👉 Maximize the data-ink ratio (remove unnecessary lines, 3D effects,
and decorations).

ii. Minimizing the Lie Factor: A visualization seeks to tell a true story about what
the data is saying. Changing the data itself is an obvious lie, but even correct data can
be shown in a misleading way using bad charts. Tufte measures this with the lie factor:

Lie Factor = (Size of effect shown in the graphic) / (Size of effect in the data)

A good chart keeps this value as close to 1 as possible.

Example: A company’s sales increased from 100 units to 110 units.

🔹 Actual effect in the data

Actual increase = 110 − 100 = 10 units

Percentage increase = 10%

Now suppose a bar chart shows the sales bar doubling in height, making it look
like sales increased by 100% (wrong).

Lie Factor = 100% / 10% = 10

An honest chart, in which the bar grows by 10%, gives Lie Factor = 10% / 10% = 1.
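The lie factor compares the size of the effect drawn in the graphic with the size of the effect in the data. A minimal sketch using the sales example above; the bar heights in pixels are hypothetical values chosen for illustration.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (0.10 means +10%)."""
    return (after - before) / before

def lie_factor(shown_before, shown_after, data_before, data_after):
    """Effect shown in the graphic divided by the effect in the data."""
    return relative_change(shown_before, shown_after) / relative_change(
        data_before, data_after
    )

# Sales go from 100 to 110 units (+10%).
# Misleading chart: the bar grows from 100 px to 200 px (+100%).
exaggerated = lie_factor(100, 200, 100, 110)
# Honest chart: the bar grows from 100 px to 110 px (+10%).
honest = lie_factor(100, 110, 100, 110)

print(exaggerated, honest)
```

A value well above 1 means the graphic overstates the change; a value below 1 means it understates it.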

Bad practices include:

 Presenting means without variance: The data values {100; 100; 100; 100;
100} and {200; 0; 100; 200; 0} tell different stories, even though both means
are 100.
 Presenting interpolations without the actual data: Regression lines and fitted
curves are effective at communicating trends and simplifying large data sets,
but without the underlying data points the reader cannot judge the quality of the fit.
 Distortions of scale: This happens when the shape or size of a chart (its
width vs. height, called the aspect ratio) changes how we see the data.
Even if the numbers are the same, stretching or squashing the chart can
make trends look bigger, smaller, faster, or slower than they really are.
 Eliminating tick labels from numerical axes: Tick labels are the numbers
along the axes (like 0, 10, 20…). If you hide these numbers, people
cannot tell the exact values of the data from the chart.
 Hiding the origin (zero point): Readers usually assume the y-axis starts at 0.
If it starts at some higher value instead, the largest value looks much bigger
relative to the smallest than it really is.
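Two of the bad practices above are easy to check numerically. The sketch below uses the two data sets from the first bullet, and a hypothetical pair of bars (100 and 110) to show what a truncated y-axis does; `drawn_height_ratio` is an illustrative helper, not a library function.

```python
from statistics import mean, pstdev

# Same mean, very different spread.
steady = [100, 100, 100, 100, 100]
volatile = [200, 0, 100, 200, 0]
print(mean(steady), pstdev(steady))      # mean 100, no spread
print(mean(volatile), pstdev(volatile))  # mean 100, large spread

def drawn_height_ratio(small, large, axis_start=0):
    """How many times taller the larger bar is drawn when the y-axis
    starts at `axis_start` instead of zero."""
    return (large - axis_start) / (small - axis_start)

ratio_from_zero = drawn_height_ratio(100, 110)                 # about 1.1
ratio_truncated = drawn_height_ratio(100, 110, axis_start=90)  # 2.0
print(ratio_from_zero, ratio_truncated)
```

With the axis starting at 90, a 10% difference is drawn as one bar being twice as tall as the other.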
Chart Types

Choosing a chart type branches on three main purposes:

1. Distribution → Showing how values are spread.

2. Relationship → Showing how two or more variables relate.

3. Comparison → Showing differences among groups or over time.

Distribution

 Goal: See how data values are spread.

 Chart options:

1. Histogram – Shows frequency of values.

2. Boxplot – Shows the median, quartiles, and outliers.

3. Density plot / Line plot – Smooth curve showing distribution.

4. Dot plot / Strip chart – Each value is a dot.

Relationship

 Goal: See how two variables are connected.

 Chart options:

1. Scatter plot – Shows correlation between variables.

2. Line plot – Shows trend over a continuous variable (like time).

Comparison

 Goal: Show differences across categories or time.

 Chart options:
o For time series / ordered data:

 Line chart

o For categorical comparison:

 Bar chart (vertical or horizontal)

o For parts of a whole:

 Pie chart

To choose a chart:

 Step 1: Decide your goal (distribution, relationship, comparison).

 Step 2: Look at your data type (categorical, numerical, time series).

 Step 3: Pick the chart that best communicates your message.
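The three steps can be written down as a small lookup. This is only a sketch of the decision list in these notes; the function name `suggest_charts` and the dictionary layout are assumptions for illustration.

```python
# Chart options per goal, copied from the lists above.
CHART_OPTIONS = {
    "distribution": ["histogram", "boxplot", "density plot", "dot plot / strip chart"],
    "relationship": ["scatter plot", "line plot"],
    "comparison": {
        "time series": ["line chart"],
        "categorical": ["bar chart"],
        "parts of a whole": ["pie chart"],
    },
}

def suggest_charts(goal, data_kind=None):
    """Step 1: pick the goal. Step 2: narrow by data kind where the notes do.
    Step 3: return the candidate charts to choose from."""
    options = CHART_OPTIONS[goal]
    if isinstance(options, dict):
        if data_kind is None:
            # No data kind given: return every comparison chart.
            return [chart for charts in options.values() for chart in charts]
        return options[data_kind]
    return options

print(suggest_charts("distribution"))
print(suggest_charts("comparison", "categorical"))
```

The final choice among the candidates still depends on the message you want the chart to carry.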

Chart Types - Primary types of data visualizations.

6.3.1 Tabular Data - Tables of numbers can be beautiful things,
and are very effective ways to present data.

Although they may appear to lack the visual appeal of graphic
presentations, tables have several advantages over other
representations, including:

1. Representation of precision: The resolution of a number tells
you something about the process of how it was obtained:
 An average salary of $79,815 says something different than
$80,000.
 Such subtleties are generally lost on plots, but plainly clear
in numerical tables.

2. Representation of scale: When numbers are written in columns:

 Longer numbers mean bigger values

 Shorter numbers mean smaller values

If numbers are right-aligned, your eyes can easily compare:

 thousands vs lakhs

 hundreds vs thousands

3. Multivariate data (many variables)

When data has more than two variables, graphs become confusing.

 2 variables → easy graph

 3+ variables → hard to imagine

👉 Tables don’t have this problem.

They can handle many columns easily.

4. Different types of data together (heterogeneous data)

Tables are best when data includes:

 Numbers (marks, salary)

 Text (names, cities)
 Categories (grade, role)
 Symbols or emojis (✔️)

Graphs struggle with mixed data, but tables handle all of it neatly.
5. Compactness: Tables are particularly useful for representing small numbers of points.
Two points in two dimensions can be drawn as a line, but why bother? A small table is
generally better than a sparse visual.

Best practices include:

1. Order rows to invite comparison: You have the freedom to order the rows in a table any
way you want, so take advantage of it. Sorting the rows according to the values of an
important column is generally a good idea.
2. Order columns to highlight importance, or pairwise relationships: Eyes darting from
left to right across the page cannot make effective visual comparisons, but neighboring fields
are easy to contrast.

3. Right-justify uniform-precision numbers.

4. Use emphasis, font, or color to highlight important entries.
5. Avoid excessive-length column descriptors.
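A minimal sketch of practices 1 and 3: rows are sorted by the most important column and numbers are right-justified so that longer numbers visibly mean bigger values. The names, cities, and salaries are invented for the example.

```python
def format_table(rows, headers=("Name", "City", "Salary")):
    """Return text lines for a table sorted by salary (largest first),
    with text left-aligned and numbers right-aligned."""
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)
    salaries = [f"{row[2]:,}" for row in ordered]
    w_name = max([len(headers[0])] + [len(row[0]) for row in ordered])
    w_city = max([len(headers[1])] + [len(row[1]) for row in ordered])
    w_sal = max([len(headers[2])] + [len(s) for s in salaries])
    lines = [f"{headers[0]:<{w_name}}  {headers[1]:<{w_city}}  {headers[2]:>{w_sal}}"]
    for row, salary in zip(ordered, salaries):
        lines.append(f"{row[0]:<{w_name}}  {row[1]:<{w_city}}  {salary:>{w_sal}}")
    return lines

# Hypothetical rows, for illustration only.
rows = [
    ("Asha", "Chennai", 79815),
    ("Ravi", "Salem", 8000),
    ("Meena", "Madurai", 125000),
]

table_lines = format_table(rows)
print("\n".join(table_lines))
```

Because the salary column is right-aligned, the digits line up by place value and the eye can compare magnitudes without reading each number.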

6.3.2 Dot and Line Plots

Dot Chart (Dot Plot)
 A dot chart shows only dots for data values.
 Each dot represents one data point.
 No lines are drawn between points.
When to use:
 When data is separate or categorical
 When values exist only at specific points (like marks of students, income of states)
Example:
Marks of students in a test:
 Each student’s mark = one dot
👉 Dot charts show actual data clearly and avoid confusion.
Line Chart (Line Plot)
 A line chart shows dots connected by lines.
 It shows how data changes continuously.
When to use:
 When data changes over time (days, months, years)
 When values in between also make sense
Example:
Temperature over a week:
 Points are connected to show rising or falling trend
👉 Line charts help us see trends and patterns.

Advantages of Line Charts

1. Shows trend clearly

Line charts clearly show whether values are increasing, decreasing, or stable over time.

2. Good for time-based data

Best used when data changes with time (days, months, years).

3. Helps in interpolation and prediction

Lines help estimate values between known points and predict future behavior.

4. Easy comparison
Multiple lines can be drawn to compare different groups (e.g., sales of two products).

5. Highlights patterns
Seasonal effects, cycles, and sudden changes are easy to spot.
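Advantage 3 above (interpolation) is what the connecting line implies: a straight segment between two observations estimates the values in between. A small sketch of linear interpolation; the day and temperature readings are hypothetical.

```python
def interpolate(points, x):
    """Linearly interpolate y at x from (x, y) points sorted by x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the observed range")

# Hypothetical readings: (day, temperature in °C) on days 1, 3, and 5.
temps = [(1, 30.0), (3, 34.0), (5, 31.0)]

day2 = interpolate(temps, 2)  # halfway between 30.0 and 34.0
day4 = interpolate(temps, 4)  # halfway between 34.0 and 31.0
print(day2, day4)
```

This only makes sense when in-between values are meaningful, which is why lines should not connect categorical data.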

Best Practices for Line Charts

 Use line charts only for continuous data

Do not connect points for categorical data (like states or names).

 Always show actual data points

Show dots along with the line so viewers can see real observations.

 Keep the chart simple

Avoid too many lines in one chart (2–4 lines maximum).

 Choose proper axis ranges

 Start from zero when it makes sense

 Avoid misleading truncation of axes

 Use consistent scale and labels
Clearly label axes and units to avoid confusion.

 Use colors carefully

Use different colors or line styles to distinguish lines, but don’t overuse them.
6.3.3 Scatter Plots

Why large data is hard to show

 When a dataset has thousands of points, graphs can become messy.

 Too many dots overlap and form a dark blob (often called a “black ball”).
 This hides useful patterns.

👉 But if done correctly, scatter plots can clearly show even very large datasets.

What is a scatter plot?

 A scatter plot shows every data point as a dot.

 Each dot represents an (x, y) value.

Example:

 Height on x-axis

 Weight on y-axis

 Each person = one dot

Best practices for scatter plots (simple)

✔ Use the right dot size

 Biggest mistake: dots are too large

 Large dots overlap and hide data

✔ Handle overlapping points properly

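 When many points have the same values (especially integers), the dots land exactly
on top of each other and hide how many observations there are. Add a small random
offset (jitter) or use color to make the overlap visible.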
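One common remedy for coincident integer points is jittering: adding a small random offset to each value before plotting. The notes do not spell out a method here, so treat this as an assumed illustration; the `jitter` helper, its `amount`, and the ratings data are hypothetical.

```python
import random

def jitter(values, amount=0.2, seed=0):
    """Return a copy of `values` with uniform noise in [-amount, amount] added.
    A fixed seed keeps the plot reproducible."""
    rng = random.Random(seed)
    return [value + rng.uniform(-amount, amount) for value in values]

# Hypothetical integer ratings: several points share the same value.
ratings = [3, 3, 3, 4, 4, 5]
jittered = jitter(ratings)
print(jittered)
```

The offsets are for display only; statistics should still be computed on the original values.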

✔ Showing many variables (more than 2)

Problem:
 Humans can’t easily visualize 4 or more dimensions

Solution 1: Reduce dimensions

 Convert many variables into two new axes

 Techniques like PCA do this

Solution 2: Pairwise scatter plots (better)

 Draw many small scatter plots

 Each plot compares two variables
 Helps find:
o Relationships
o Correlations
6.3.4 Bar Plots and Pie Charts

What bar charts and pie charts show

Both bar charts and pie charts show how data is divided into groups.

Use a Bar Chart when:

 You want to compare values accurately.

 You want to see which is largest or smallest.

 You want to track changes across categories or time.

Use a Pie Chart when:

 You want to show parts of a whole (percentages).

 You only have a few categories.

6.3.5 Histograms - understand distribution of data

 A histogram shows how data is distributed.

 Data is grouped into ranges (bins).

 Each bar shows how many values fall in that range.
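The binning step can be sketched without any plotting library. The marks, the bin width of 10, and the starting value of 30 are assumptions chosen for the example.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall in each bin [low, low + bin_width)."""
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        low = start + index * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical test marks.
marks = [35, 42, 47, 51, 58, 58, 63, 67, 72, 88]
bins = histogram(marks, bin_width=10, start=30)

# A text version of the bars: one '#' per value in the bin.
for low, count in bins.items():
    print(f"{low}-{low + 9}: {'#' * count}")
```

Changing the bin width changes the shape of the histogram, so it is worth trying more than one width before drawing conclusions.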

6.3.6 Data Maps - quickly see where data is high or low

 A heatmap uses colors to show data values.

 Darker or brighter colors = higher values
 Lighter colors = lower values

👉 Instead of numbers, color intensity shows meaning.

Example:

 Student performance:
o Green → good
o Yellow → average
o Red → poor

Key points:

 Best for large datasets

 Makes patterns and clusters easy to see
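The student-performance example can be sketched as a mapping from values to colors. The thresholds (75 and 50) and the score grid are hypothetical; the notes only say green is good, yellow is average, and red is poor.

```python
def performance_color(score, good=75, average=50):
    """Map a score to a color using assumed cut-offs."""
    if score >= good:
        return "green"
    if score >= average:
        return "yellow"
    return "red"

def intensity(value, low, high):
    """Scale a value to the range 0..1 for a continuous color ramp."""
    return (value - low) / (high - low)

# Hypothetical grid of scores (rows = students, columns = subjects).
grid = [
    [82, 47],
    [64, 91],
]
colors = [[performance_color(score) for score in row] for row in grid]
print(colors)
```

For a continuous heatmap, `intensity` would feed a color ramp instead of three fixed categories.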

6.4 Great Visualizations - Developing your own visualization aesthetic gives you
a language to talk about what you like and what you don't like.

1. Marey's Train Schedule

 Time of day → x-axis

 Stations from Paris to Lyon → y-axis

 Each line = one train
Instead of a table of times, Marey drew lines showing where each train is at every moment.

 Steep line → fast train

 Flat line → slow train

 Horizontal line → train stopped at a station.

 Where two lines cross = trains pass each other
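The reading rules above follow from the slope of each line: with time on the x-axis and distance on the y-axis, slope is speed. A small sketch with a made-up train path; the times and distances are not from Marey's actual schedule.

```python
def segment_speeds(path):
    """Speed (km per hour) on each segment of a train's line,
    where `path` is a list of (hour, km from origin) points."""
    return [
        (d1 - d0) / (t1 - t0)
        for (t0, d0), (t1, d1) in zip(path, path[1:])
    ]

# Hypothetical train: departs at 8:00, stops from 9:00 to 9:30, arrives at 11:30.
train = [(8.0, 0), (9.0, 100), (9.5, 100), (11.5, 250)]

speeds = segment_speeds(train)
print(speeds)
```

The middle segment has zero slope, which is the horizontal line of a train standing at a station.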

2. Snow's Cholera Map (Cholera Disease)

3. New York's Weather Year
