703 (A) Data Visualization Unit-1 Notes

Uploaded by

coc7987515756
Chameli Devi Group of Institutions

Department of Artificial Intelligence & Data Science

Subject Name: Data Visualization Subject Code: 703 (A)


Subject Notes
Syllabus: Introduction to Data Visualization

Overview of data visualization, Definition, Significance in AI and Data Science, Principles of Data Visualization,
Methodology, Applications, Data pre-processing for visualization: Extraction, Cleaning, Transformation, Aggregation,
Data Integration, Data Reduction.
__________________________________________________________________________________________
Course Outcome (CO1): Understand the representation of complex and voluminous data.
Unit-I
Overview of data visualization-

Data visualization is the graphical representation of quantitative information and data using visual elements
such as graphs, charts, and maps. It converts data sets, large or small, into visuals that are easy for humans to
understand and process. Data visualization tools provide accessible ways to see outliers,
patterns, and trends in the data. In the world of Big Data, data visualization tools and technologies are
required to analyze vast amounts of information.

Data visualizations are common in everyday life, where they most often appear in the form of graphs and charts.
A combination of multiple visualizations and bits of information is referred to as an infographic.

Data visualizations are used to discover unknown facts and trends. You can see visualizations in the form of
line charts to display change over time. Bar and column charts are useful for observing relationships and
making comparisons. A pie chart is a great way to show parts-of-a-whole. And maps are the best way to share
geographical data visually.

Today's data visualization tools go beyond the standard charts and graphs of Microsoft Excel spreadsheets,
displaying data in more sophisticated ways such as dials and gauges, geographic maps, heat maps, pie
charts, and fever charts.

Effective data visualizations are created where communication, data science, and design meet. Done right, data
visualizations turn the key insights in complicated data sets into something meaningful and natural.

American statistician and Yale professor Edward Tufte believes useful data visualizations consist of "complex
ideas communicated with clarity, precision, and efficiency."

Figure 1.1 Overview of data visualization

Why Use Data Visualization

1. To make data easier to understand and remember.
2. To discover unknown facts, outliers, and trends.
3. To visualize relationships and patterns quickly.
4. To ask better questions and make better decisions.
5. To perform competitive analysis.
6. To improve insights.

Principles of Data Visualization

1. Clarity

The visualization should be clear and easily understood by the intended audience.

2. Simplicity

Keep the visualization simple and avoid unnecessary complexity.

3. Purposeful

Understand what message or insight you want to communicate and design for that purpose.

4. Consistency

Maintain consistency in the design elements throughout the visualization.

5. Contextualization

Provide context for the data being presented.

6. Accuracy

Ensure the visualization accurately represents the underlying data.

7. Visual Encoding

Choose appropriate visual encodings for the data types you are visualizing.

8. Intuitiveness

Design the visualization to be intuitive and easy to comprehend.

9. Interactivity

Consider adding interactive elements to the visualization, such as tooltips, zooming, filtering, or highlighting.

10. Aesthetics

Although aesthetics are subjective, a visually appealing design can engage viewers and increase their interest in
the data.

11. Accessibility

Accessibility is key; if users can’t read the data, it’s useless.

12. Hierarchy

Work out the hierarchy of information early on, and keep reminding yourself of the purpose of representing the
data.

Applications of Data Visualization –

1. Business Intelligence
2. Finance Industries
3. E-commerce
4. Education
5. Data Science
6. Military
7. Healthcare Industries
8. Marketing
9. Real Estate Business
10. Food Delivery Apps

Data pre-processing for visualization

1. Data Cleaning

• Handling Missing Values: Identify and handle missing data by either removing rows/columns with
missing values, imputing them with statistical measures (mean, median, mode), or using more
advanced techniques like K-nearest neighbors.
• Removing Duplicates: Detect and remove duplicate records to ensure that each data point is unique.
• Outlier Detection and Treatment: Identify outliers using statistical methods or visual tools (e.g., box
plots) and decide whether to remove, cap, or transform them.
• Data Type Conversion: Ensure that all data is in the correct format for analysis (e.g., converting strings
to dates, or categorical values to numerical codes).
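
The cleaning steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the records, the field names (id, date, sales), and the choice of median imputation with the 1.5 * IQR outlier rule are hypothetical, not prescribed by these notes.

```python
from datetime import date
from statistics import median, quantiles

# Hypothetical raw records: one duplicate, one missing value, one extreme value.
raw = [
    {"id": 1, "date": "2024-01-05", "sales": "120"},
    {"id": 2, "date": "2024-01-06", "sales": None},
    {"id": 2, "date": "2024-01-06", "sales": None},    # duplicate of id 2
    {"id": 3, "date": "2024-01-07", "sales": "135"},
    {"id": 4, "date": "2024-01-08", "sales": "128"},
    {"id": 5, "date": "2024-01-09", "sales": "5000"},  # suspiciously large
]

# 1. Remove duplicates, keeping the first record seen for each id.
seen, rows = set(), []
for record in raw:
    if record["id"] not in seen:
        seen.add(record["id"])
        rows.append(dict(record))

# 2. Convert data types: ISO date strings to dates, numeric strings to floats.
for row in rows:
    row["date"] = date.fromisoformat(row["date"])
    row["sales"] = None if row["sales"] is None else float(row["sales"])

# 3. Impute missing sales with the median of the observed values.
observed = [row["sales"] for row in rows if row["sales"] is not None]
fill_value = median(observed)
for row in rows:
    if row["sales"] is None:
        row["sales"] = fill_value

# 4. Flag outliers with the 1.5 * IQR rule (the rule behind box-plot whiskers).
values = [row["sales"] for row in rows]
q1, _, q3 = quantiles(values, n=4, method="inclusive")
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
outlier_ids = [row["id"] for row in rows if not low <= row["sales"] <= high]
```

Whether a flagged record is removed, capped, or kept is a judgment call that depends on what the visualization is meant to show.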

2. Data Transformation

• Normalization and Scaling: Transform numerical features to a common scale, often between 0 and 1
(normalization), or rescale them to zero mean and unit variance (standardization), to ensure comparability.
• Encoding Categorical Variables: Convert categorical data into numerical format using methods such as
one-hot encoding or label encoding.
• Feature Engineering: Create new features from existing data to enhance model performance and
insights. This can include binning continuous variables, extracting date parts, or combining features.
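
As a minimal sketch of these transformations, the helper functions below (min_max, standardize, one_hot, label_encode) are hypothetical names written for illustration, and the sample values are made up.

```python
from statistics import mean, pstdev

values = [10.0, 20.0, 30.0, 40.0, 50.0]   # hypothetical numeric feature
colors = ["red", "green", "red", "blue"]  # hypothetical categorical feature

def min_max(xs):
    """Normalize to the range [0, 1]. Assumes xs is not constant."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Rescale to zero mean and unit variance. Assumes xs is not constant."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def one_hot(labels):
    """One 0/1 column per category, with categories in sorted order."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

def label_encode(labels):
    """One integer code per category, assigned in sorted order."""
    codes = {c: i for i, c in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]

normalized = min_max(values)       # [0.0, 0.25, 0.5, 0.75, 1.0]
z_scores = standardize(values)
encoded = one_hot(colors)          # columns: blue, green, red
codes = label_encode(colors)       # [2, 1, 2, 0]
```

Label encoding imposes an artificial order on the categories, so one-hot encoding is usually the safer choice when the categories have no natural ranking.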

3. Data Integration

• Combining Data Sources: Merge or join multiple datasets that share common keys or attributes.
Ensure that the combined data is consistent and meaningful.
• Aggregation: Summarize or aggregate data to reduce its volume and focus on key insights, using
operations like sum, mean, median, or count.
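
A join followed by an aggregation might look like the sketch below. The two tables (orders and customers) and their field names are hypothetical, and an inner join is assumed, so orders with no matching customer are dropped.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical datasets that share the key "customer_id".
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 50.0},
    {"order_id": 2, "customer_id": "c2", "amount": 30.0},
    {"order_id": 3, "customer_id": "c1", "amount": 70.0},
    {"order_id": 4, "customer_id": "c3", "amount": 20.0},  # no matching customer
]
customers = [
    {"customer_id": "c1", "region": "North"},
    {"customer_id": "c2", "region": "South"},
]

# Inner join: attach each order's region, keeping only orders with a match.
region_of = {c["customer_id"]: c["region"] for c in customers}
joined = [
    dict(order, region=region_of[order["customer_id"]])
    for order in orders
    if order["customer_id"] in region_of
]

# Aggregate: count, sum, and mean of order amounts per region.
amounts_by_region = defaultdict(list)
for row in joined:
    amounts_by_region[row["region"]].append(row["amount"])

summary = {
    region: {"count": len(a), "sum": sum(a), "mean": mean(a)}
    for region, a in amounts_by_region.items()
}
```

Comparing the row counts before and after the join (four orders in, three out here) is a quick way to notice records that were silently dropped.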

4. Data Reduction

• Dimensionality Reduction: Reduce the number of features while retaining most of the data's structure.
Techniques include Principal Component Analysis (PCA), which retains most of the variance, and
t-Distributed Stochastic Neighbor Embedding (t-SNE), which preserves local neighborhoods.
• Sampling: Reduce the size of large datasets to make processing and visualization more manageable,
ensuring the sample remains representative of the whole dataset.
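
One way to keep a sample representative is stratified sampling, which draws the same fraction from every group. The sketch below assumes a hypothetical dataset with a "group" field. The function name stratified_sample and the fixed seed are illustrative choices.

```python
import random
from collections import Counter, defaultdict

# Hypothetical dataset: 1000 points, 800 in group "A" and 200 in group "B".
data = [{"id": i, "group": "A" if i % 5 else "B"} for i in range(1000)]

def stratified_sample(rows, key, fraction, rng):
    """Sample the same fraction from each group so proportions are preserved."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one row per group
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(42)  # fixed seed so the sample is reproducible
sample = stratified_sample(data, "group", 0.1, rng)
counts = Counter(row["group"] for row in sample)  # 80 from "A", 20 from "B"
```

A plain random sample of the same size would usually land close to these proportions but does not guarantee them, which matters most for small groups.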

5. Data Formatting

• Structuring Data for Visualization Tools: Ensure data is in the right format (e.g., long vs. wide format)
for the specific visualization tool or library (e.g., Tableau, Matplotlib, Seaborn).
• Annotating Data: Add labels, annotations, and metadata that may enhance the interpretability of
visualizations.
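
To illustrate the long vs. wide distinction, the sketch below reshapes a small wide table (one column per quarter) into long format (one row per observation). The table and the helper name wide_to_long are hypothetical.

```python
# Hypothetical wide table: one row per region, one column per quarter.
wide = [
    {"region": "North", "Q1": 100, "Q2": 120},
    {"region": "South", "Q1": 80, "Q2": 95},
]

def wide_to_long(rows, id_col, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != id_col:
                long_rows.append(
                    {id_col: row[id_col], var_name: column, value_name: value}
                )
    return long_rows

long_table = wide_to_long(wide, "region", "quarter", "sales")
# First row: {"region": "North", "quarter": "Q1", "sales": 100}
```

In long format each row holds a single measurement, so a variable such as quarter can be mapped directly to an axis or a color.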

6. Data Validation and Verification

• Consistency Checks: Ensure data consistency across different sources or time periods.
• Validation: Use validation techniques to ensure data integrity, such as cross-referencing against
trusted sources or using summary statistics to detect anomalies.
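
Checks of this kind can be automated before plotting. The sketch below is one possible shape for such a check. The function name validate, the field names, the 1% tolerance, and the reference total are all assumptions made for illustration.

```python
def validate(rows, reference_total, tolerance=0.01):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Consistency: every id should appear exactly once.
    ids = [row["id"] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate ids")

    # Integrity: amounts are assumed to be non-negative.
    if any(row["amount"] < 0 for row in rows):
        problems.append("negative amount")

    # Cross-reference: the total should match a trusted figure within tolerance.
    total = sum(row["amount"] for row in rows)
    if abs(total - reference_total) > tolerance * abs(reference_total):
        problems.append("total does not match reference")

    return problems

clean = [{"id": 1, "amount": 40.0}, {"id": 2, "amount": 60.0}]
dirty = [{"id": 1, "amount": 40.0}, {"id": 1, "amount": -5.0}]

clean_report = validate(clean, reference_total=100.0)  # no problems
dirty_report = validate(dirty, reference_total=100.0)  # three problems
```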

7. Preparation for Specific Visualizations

• Filtering Data: Focus on the subset of data relevant to the visualization goals (e.g., filtering by time
range, region, or category).
• Grouping Data: Aggregate data into meaningful groups or categories to provide clearer insights (e.g.,
grouping sales data by quarter, year, or region).
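
Filtering and grouping can be sketched as follows, using hypothetical sales records. The example keeps one region and one year, then totals the amounts by quarter, which is the shape a bar chart of quarterly sales would need.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records.
sales = [
    {"date": date(2024, 1, 15), "region": "North", "amount": 100.0},
    {"date": date(2024, 2, 20), "region": "North", "amount": 150.0},
    {"date": date(2024, 4, 5), "region": "North", "amount": 200.0},
    {"date": date(2024, 5, 9), "region": "South", "amount": 90.0},
    {"date": date(2023, 12, 30), "region": "North", "amount": 60.0},
]

# Filter: keep only the North region in 2024.
subset = [s for s in sales if s["region"] == "North" and s["date"].year == 2024]

def quarter(d):
    """Label a date with its calendar quarter, e.g. 2024-Q1."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

# Group: total sales amount per quarter.
totals = defaultdict(float)
for s in subset:
    totals[quarter(s["date"])] += s["amount"]

quarterly = dict(totals)  # {"2024-Q1": 250.0, "2024-Q2": 200.0}
```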

8. Visual Aesthetics Preparation

• Color Coding and Mapping: Prepare data attributes that can be represented by colors (e.g., categorical
data or ranges of values).
• Label Preparation: Ensure labels, titles, and legends are accurate and descriptive to make the
visualization self-explanatory.
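
Preparing color attributes usually means building a lookup from data values to colors ahead of plotting. In the sketch below, the palette, the category names, and the thresholds (50 and 100) are hypothetical choices for illustration.

```python
from bisect import bisect_right

# Categorical data: assign one color per category, in a fixed (sorted) order
# so the same category always gets the same color across charts.
palette = ["#1b9e77", "#d95f02", "#7570b3"]  # hypothetical hex colors
categories = sorted({"North", "South", "East"})
color_of = {c: palette[i % len(palette)] for i, c in enumerate(categories)}

# Ranges of values: bucket numbers into labeled ranges that can each be
# given a color and shown in the legend.
edges = [50, 100]                        # thresholds between the ranges
range_labels = ["low", "medium", "high"]

def bucket(value):
    """Return the label of the range that value falls into."""
    return range_labels[bisect_right(edges, value)]

labels = [bucket(v) for v in [30, 50, 75, 100, 240]]
# ["low", "medium", "medium", "high", "high"]
```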

By carefully pre-processing your data, you ensure that the visualizations produced are both accurate and
insightful, effectively communicating the story your data is telling.
