Data visualization
Data visualization is a powerful tool for enhancing
understanding and communication of complex data. It
involves representing data in a graphical or pictorial form,
making it easier to understand and interpret.
This unit explains why data visualization matters for
communicating and analyzing data effectively, and surveys the
various types of data visualization tools and techniques
available.
Data Visualization Tools : Data visualization tools can be
broadly classified into three categories: spreadsheets, data
visualization software, and programming libraries.
Spreadsheets - Spreadsheets, such as Microsoft Excel and
Google Sheets, are among the most common data visualization
tools. They provide basic data
visualization capabilities, such as bar charts, line graphs, and
scatter plots.
Data Visualization Software - Data visualization software is
a specialized tool designed for data visualization and analysis.
Examples of data visualization software include Tableau,
QlikView, and Power BI. These tools provide advanced data
visualization capabilities, including interactive dashboards,
heat maps, and network diagrams.
Programming Libraries - Programming libraries, such as
Matplotlib, ggplot2, and D3.js, are a type of data visualization
tool that can be used to create custom data visualizations.
Foundations of Data Visualization
Visualization Process Details:
Data preprocessing and transformation: The starting
point is to process the raw data into something usable by
the visualization system. The first part is to make sure
that the data are mapped to fundamental data types for
computer ingestion.
The second step entails dealing with specific application
data issues such as missing values, errors in input, and
data too large for processing. The data may be simulated
or sampled.
Mapping for visualizations : Once the data are clean,
we can decide on a specific visual representation. This
requires representation mappings.
Rendering transformations : The final stage involves
mapping from geometry data to the image. This includes
interfacing with a computer graphics Application
Programming Interface (API). We need to select the
viewing parameters, the shading technique if 3D, and device
transformations (for displays, printers, and so on). This stage of
the pipeline is very dependent on the underlying graphics
library.
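The three stages above can be sketched in a few lines of Python. This is only an illustrative outline, not a real visualization system: the records, the function names, and the use of text characters in place of a graphics API are all assumptions made for the example.

```python
import random

# Hypothetical raw records: (category, value-as-text); "" and None mark missing input.
raw = [("A", "12"), ("B", ""), ("C", "7"), ("D", None), ("E", "20")]

def preprocess(records, max_items=None, seed=0):
    """Stage 1: map text to numeric types, drop missing values, optionally sample."""
    clean = [(name, float(v)) for name, v in records if v not in (None, "")]
    if max_items is not None and len(clean) > max_items:
        clean = random.Random(seed).sample(clean, max_items)
    return clean

def map_to_geometry(clean, max_width=20):
    """Stage 2: map each value to a bar length (the chosen visual representation)."""
    largest = max(v for _, v in clean)
    return [(name, round(v / largest * max_width)) for name, v in clean]

def render(bars):
    """Stage 3: turn geometry into an image; plain text stands in for a graphics API."""
    return "\n".join(f"{name} | {'#' * length}" for name, length in bars)

bars = map_to_geometry(preprocess(raw))
print(render(bars))
```

Only the last function would change if the output device changed, which mirrors the point that the rendering stage depends most heavily on the underlying graphics library.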
Expressiveness : An expressive visualization presents all
the information, and only the information. Expressiveness
thus measures the concentration of information: it compares
the information that we actually display to the user with the
information we intend to present.
Effectiveness : A visualization is effective when it can be
interpreted accurately and quickly and when it can be
rendered in a cost-effective manner. Effectiveness thus
measures a specific cost of information perception.
Data visualization techniques : Data visualization
techniques are used to represent data in a graphical or
pictorial form, making it easier to understand and
interpret. There are several types of data visualization
techniques, including:
Bar charts are useful for comparing data across
categories, where the length of each rectangular bar
represents the magnitude of a particular data point.
Line graphs are used to visualize trends over time, with a
series of points connected by lines where each point
represents a data point for a particular period.
Scatter plots are used to visualize the
relationship between two variables. Each point
represents a pair of values, one for each variable.
Heat maps are helpful in visualizing the distribution of
values in a matrix or table. The colour of each cell in a
coloured grid represents the magnitude of a particular
data point.
Network diagrams are useful in visualizing relationships
between entities, consisting of nodes representing entities
and edges representing relationships, where the size and
colour of nodes and edges can represent data attributes.
Different types of visualizations
General Types of Visualizations:
Chart: Information presented in a tabular, graphical form
with data displayed along two axes. Can be in the form of a
graph, diagram, or map.
Table: A set of figures displayed in rows and columns.
Graph: A diagram of points, lines, segments, curves, or areas
that represents certain variables in comparison to each other,
usually along two axes at a right angle.
Geospatial: A visualization that shows data in map form
using different shapes and colors to show the relationship
between pieces of data and specific locations.
Infographic: A combination of visuals and words that
represent data. Usually uses charts or diagrams.
Dashboards: A collection of visualizations and data displayed
in one place to help with analyzing and presenting data.
More specific examples
• Area Map: A form of geospatial visualization, area maps
are used to show specific values set over a map of a country,
state, county, or any other geographic location. Two common
types of area maps are choropleths and isopleths.
• Bar Chart: Bar charts represent numerical values
compared to each other. The length of the bar represents the
value of each variable.
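• Box-and-whisker Plots: These summarize the distribution
of a measure. The box spans the middle range of the values
(between the quartiles, with a line at the median), and the
whiskers extend toward the lowest and highest values.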
• Bullet Graph: A bar marked against a background to
show progress or performance against a goal, denoted by a
line on the graph.
• Gantt Chart: Typically used in project management,
Gantt charts are a bar chart depiction of timelines and tasks.
• Heat Map: A type of geospatial visualization in map
form which displays specific data values as different colors
(this doesn’t need to be temperatures, but that is a common
use).
• Highlight Table: A form of table that uses color to
categorize similar data, allowing the viewer to read it more
easily and intuitively.
• Histogram: A type of bar chart that splits a continuous
measure into different bins to help analyze the distribution.
• Pie Chart: A circular chart with wedge-shaped segments that
shows data as a percentage of a whole.
• Treemap: A type of chart that shows different, related
values in the form of rectangles nested together.
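As an illustration of the histogram entry above, the binning step can be sketched as follows. The function name and the sample values are made up for the example, and equal-width bins are assumed.

```python
def histogram_counts(values, bin_count):
    """Split the range of `values` into equal-width bins and count items per bin.
    Assumes the values are not all identical (otherwise the bin width is zero)."""
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts

# Made-up measurements for illustration.
ages = [20, 22, 25, 31, 34, 35, 38, 41, 47, 60]
edges, counts = histogram_counts(ages, bin_count=4)
print(edges)
print(counts)
```

Each count would become the height of one bar, which is what distinguishes a histogram from a bar chart over categories.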
Importance of Data Visualization
Data visualization is essential for understanding and
communicating information effectively. Here are some key
reasons why it's important:
1. Simplifies Complex Data: It turns large and complicated
data into visual formats like charts and graphs, making the
information easier to understand.
2. Reveals Patterns and Trends: It helps identify trends,
relationships, and patterns that are not easily seen in raw
data or tables.
3. Saves Time: Visuals allow quicker interpretation of data,
helping users spot key information at a glance instead of
manually scanning through numbers.
4. Improves Communication: It makes it easier to explain
data insights to others, especially those who may not be
familiar with the technical details.
5. Tells a Clear Story: Data visuals guide the audience
through the information step-by-step, making it easier to
reach conclusions and make informed decisions.
6. Faster Decision Making: When the data communicates well,
decision-makers can act quickly on new insights, accelerating
both decision-making and business growth.
7. Making Sense of Complicated Data: Data visualization
allows business users to gain insight into their vast amounts
of data. It helps them recognize new patterns and errors in
the data. Making sense of these patterns helps users pay
attention to areas that indicate red flags or progress.
Visualization Foundation
1. Defining Clear Objectives
2. Selecting appropriate Visual Representation.
3. Keep it simple
4. Comprehensive Labelling
5. Careful Colour Usage.
6. Comparison
7. Creating a Narrative
8. Including Interactive Elements
9. Iterative Refinement
10. Audience – Centric Approach
11. Ethical Consideration
12. Continuous Learning
13. Selecting Right Visualization Tools
14. Accessibility
Benefits of Data Visualization
Resolves data inefficiencies by letting users absorb vast
amounts of data presented in visual formats.
Identifies errors and inaccuracies in data quickly.
Promotes storytelling and conveys the right message to
the audience.
Helps you stay ahead by revealing the latest trends.
Increases the speed of decision-making.
Provides access to real-time information and assists in
management functions.
Surfaces business insights that steer business goals in
the right direction.
Optimizes and instantly retrieves data via tailor-made
reports.
Power BI
Power BI is one of the most popular data
visualization and business intelligence tools, developed
by Microsoft. It is a collection of
apps, data connectors, and software services that are
used to get data from different data sources,
transform it, and produce useful reports.
The Power BI service is delivered as SaaS, and Power BI
mobile apps are available for different
platforms. Business users use this set of services
to consume data and to build Power BI
reports.
Features of Power BI
Data connection and preparation
Data connectivity: Connect to numerous data sources,
including databases, cloud services, and spreadsheets.
Power Query: A powerful tool for importing, transforming, and
shaping data before analysis.
Dataflows: Create reusable data preparation processes that can
be used across different reports and dashboards.
Reporting and visualization
Interactive dashboards: Create and share dynamic reports with
rich visualizations like charts, graphs, and maps.
Custom visuals: Build custom visualizations using languages
like R and Python.
AI-powered visuals: Generate reports and insights
automatically using AI.
Mobile access: View and interact with reports on the go through
the Power BI mobile app.
Collaboration and sharing
Workspaces: Organize and collaborate on projects with teams
in a shared workspace.
Publishing: Distribute reports and dashboards as apps for end-
users to consume.
Real-time collaboration: Work with others on dashboards in
real-time to accelerate decision-making.
Advanced analytics and features
Natural language Q&A: Ask questions about your data in
conversational language to get instant answers and create
reports.
DAX(Data Analysis Expressions): Write complex calculations
using the DAX formula language.
Data modelling: Build relationships between different data
tables to create a unified data model.
Automatic refresh: Schedule automatic data refreshes to keep
reports and dashboards up-to-date.
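To make the data modelling and calculation items above more concrete, here is a small Python sketch of a relationship between two tables followed by an aggregation. It is an analogy only: it is not DAX and does not use any Power BI API, and the table contents are made up.

```python
from collections import defaultdict

# Hypothetical tables: a lookup table keyed by product id and a table of sales rows.
products = {
    1: {"name": "Pen", "category": "Stationery"},
    2: {"name": "Notebook", "category": "Stationery"},
    3: {"name": "Mug", "category": "Kitchen"},
}
sales = [
    {"product_id": 1, "amount": 5.0},
    {"product_id": 3, "amount": 12.0},
    {"product_id": 2, "amount": 8.0},
    {"product_id": 1, "amount": 5.0},
]

def total_by_category(sales_rows, product_table):
    """Follow the relationship sales.product_id -> products key,
    then sum the amounts per category."""
    totals = defaultdict(float)
    for row in sales_rows:
        category = product_table[row["product_id"]]["category"]
        totals[category] += row["amount"]
    return dict(totals)

print(total_by_category(sales, products))
```

The relationship lets a value stored in one table (amount) be grouped by an attribute stored in another (category), which is the purpose of a unified data model.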
Security and integration
Security: Implement row-level security, sensitivity labels, and
audit logs to control access and protect data.
Microsoft integration: Integrate seamlessly with other
Microsoft products like Excel, Teams, and Dynamics 365.
Power BI Advantages
1. Secure Report Publishing: You can automate data refresh
setup and publish reports, allowing all users to access
the latest information.
2. No Memory and Speed Constraints: Shifting an
existing BI system into a powerful cloud environment with
Power BI Embedded eliminates memory and speed constraints,
ensuring that data can be retrieved and analyzed quickly.
3. No Specialized Technical Support Required:
Power BI provides quick inquiry and analysis without the
need for specialized technical support. It also supports a
powerful natural language interface and intuitive
graphical designer tools.
4. Simple to Use: Power BI is simple to use. Users can
become productive after a short learning curve.
5. Constant Innovation: The Power BI product is updated
every month with new functions and features.
6. Rich, personalized dashboard: The crowning feature of
Power BI is the information dashboards that can be
customized to meet the exact need of any enterprise. You can
easily embed the dashboards, and BI reports in the
applications to provide a unified user experience.
Variants of Power BI
Power BI, a business analytics service by Microsoft, offers
several scalable options to meet the needs of organizations of
various sizes:
Power BI Desktop: This is a free, standalone application
for designing and creating Power BI reports and
dashboards on a local machine. It's suitable for individual
users and small-scale projects.
Power BI Service: This is a cloud-based service that
allows you to publish, share, and collaborate on Power
BI reports and dashboards.
Power BI Pro: This is a subscription-based license that
allows individual users to create and share reports and
dashboards. It is suitable for small to medium-sized
businesses.
Power BI Premium: This is designed for larger
enterprises with more extensive data needs. It offers
dedicated cloud capacity and enhanced performance.
Power BI Components
1. Power Query: It is used to access, search, and transform
public and internal data sources.
2. Power Pivot: Power Pivot is used in data modelling for
in-memory analytics.
3. Power View: By using Power View, you can analyse,
visualize, and display the data as an interactive data
visualization.
4. Power Map: It brings the data to life with interactive
geographical visualisation.
5. Power BI Service: You can share workbooks and data
views that are refreshed from on-premises and cloud-based
data sources.
6. Power BI Q&A: You can ask questions and get an
immediate response with a natural language query.
7. Data Management Gateway: With it you get periodic data
refreshes, and you can expose tables and view data feeds.
8. Data Catalog: By using the data catalog, you can
quickly discover and reuse queries.
Power BI Architecture
Power BI architecture is an Azure-based solution. The system
can connect to a variety of data sources. You can use Power
BI Desktop to produce reports and data visualizations based
on a dataset. To obtain continuous data for reporting and
analytics, the Power BI gateway is linked to on-premises data
sources.
Cloud services are used to publish Power BI reports
and data visualizations. Users can connect to their data from
anywhere by using the Power BI mobile apps, which
are available for Windows, iOS, and Android.
1. Data Sourcing and Integration: The data is extracted
from different sources such as Excel spreadsheets, CSV
files, and databases, which may reside on
different servers. Data from these sources can come in
different types and formats.
If you import a file into Power BI, it compresses the
data set up to 1 GB, and it uses a direct query if the
compressed data set exceeds 1 GB. The data
is then integrated into a standard format and stored in a place
called a staging area. There are two choices for big data
sets:
Azure Analysis Services
Power BI Premium
2. Data Transforming:
Integrated data is not yet ready to visualize; it must
first be transformed. Before transformation, the data is
cleaned or pre-processed; for example, redundant or
missing values are removed from the data sets. After the data is
cleaned, business rules are applied to
transform it. The processed data is then loaded
into the data warehouse.
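The cleaning and business-rule steps described above might look like the following Python sketch. The rows, the function names, and the "large order" rule are hypothetical; in Power BI this work is normally done in Power Query rather than in Python.

```python
# Hypothetical integrated rows; None marks a missing value.
rows = [
    {"order_id": 1, "amount": 250.0},
    {"order_id": 2, "amount": None},     # missing value
    {"order_id": 1, "amount": 250.0},    # redundant duplicate
    {"order_id": 3, "amount": 1200.0},
]

def clean(records):
    """Drop rows with missing values and exact duplicates, keeping first occurrences."""
    seen, result = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if None in r.values() or key in seen:
            continue
        seen.add(key)
        result.append(r)
    return result

def apply_business_rule(records, threshold=1000.0):
    """Example rule (an assumption): label orders at or above the threshold as large."""
    return [{**r, "size": "large" if r["amount"] >= threshold else "standard"}
            for r in records]

# The list below stands in for the table that would be loaded into the warehouse.
warehouse = apply_business_rule(clean(rows))
print(warehouse)
```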
3. Reporting & Publishing:
After sourcing and cleaning the data, you can create
reports. Reports are visualizations of the data in the form
of slicers, graphs, and charts. Power BI offers many
custom visualizations for creating reports. After creating
reports, you can publish them to the Power BI service or
to an on-premises Power BI Report Server.
4. Creating Dashboards:
You can create dashboards after publishing reports to the Power
BI service by pinning individual elements. When an element is
pinned from a saved report, the visual retains the filters
that were applied in the report. Pinning the live report page
allows the dashboard users to interact with the visual by
selecting slicers and filters.
Note: The Power BI service is the Software as a Service (SaaS)
part of Power BI. It is also known as Power BI Online. To
access the Power BI service, you need to sign in with your
account.
Power BI Tools
1. Power BI Desktop: It is the primary authoring and
publishing tool. Power BI users and developers use it to create
brand new models and reports.
The Power BI Desktop tool is available free of cost.
2. Power BI Service: Power BI data models,
dashboards, and reports are hosted in the online software as a
service (SaaS). Sharing, administration, and collaboration
happen in the cloud.
The Power BI service is available with the Pro license, for
which the user has to pay $10 per month.
3. Power BI Data Gateway: It works as the bridge between
the Power BI service and on-premises data sources, for
connection modes such as Import, DirectQuery, and live connection.
4. Power BI Report Server: It hosts paginated reports,
mobile reports, KPIs, and Power BI Desktop reports. It
is updated every four months and is managed by
the IT team.
5. Power BI Mobile Apps: These are available for Android,
iOS, and Windows, and can be managed with Microsoft Intune.
You can use them to view reports and dashboards hosted on
the Power BI service or on Power BI Report Server.
Dashboard: A dashboard is a collection that contains zero or
more tiles and widgets. Each tile displays a single
visualization that was created from a dataset and pinned to the
dashboard. Each dashboard represents a customized
view of some subset of the underlying dataset.
Reports: A Power BI report is one or more pages of
visualizations (charts, graphs, and images). Reports can be
created from scratch, imported along with a dashboard, or
created from datasets.
Datasets: A dataset is something that you import or connect
to. Datasets can be refreshed, renamed, explored, and
removed. Each listed dataset represents a single source of
data.
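The relationship between datasets, reports, and dashboards defined above can be pictured as a small data structure. The class and field names below are illustrative assumptions and do not correspond to any Power BI API.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    rows: list  # the data from a single source

@dataclass
class Visualization:
    title: str
    dataset: Dataset  # every visualization is built from a dataset

@dataclass
class Report:
    name: str
    pages: list = field(default_factory=list)  # each page is a list of visualizations

@dataclass
class Dashboard:
    name: str
    tiles: list = field(default_factory=list)  # zero or more pinned visualizations

    def pin(self, visual: Visualization) -> None:
        """Pin one visualization to the dashboard as a tile."""
        self.tiles.append(visual)

sales = Dataset("Sales", rows=[{"region": "North", "amount": 10}])
chart = Visualization("Sales by region", sales)
report = Report("Monthly sales", pages=[[chart]])
board = Dashboard("Overview")
board.pin(report.pages[0][0])
print(board)
```

A dashboard holds references to visualizations rather than copies of the data, so each tile can be traced back to the dataset it was created from.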
Power BI Data Access
In Power BI, data access refers to the ability to connect to and
retrieve data from various sources for the purpose of creating
reports and dashboards. Power BI provides a wide range of
options for data access to accommodate different types of data
sources.
1. Data Sources
Files
Excel (.xlsx, .xlsm) - A workbook can have data entered
manually or data that is queried and loaded from external
data sources. Data can be in simple worksheets or loaded into
a data model.
Comma Separated Value (.csv) - These are simple text files
with rows of data. Each row can contain one or more values,
each separated by a comma.
Power BI Desktop (.pbix) - You can use Power BI Desktop to
query and load data from external data sources, extend your
data model with measures and relationships, and create
reports. You can import your Power BI Desktop file into your
Power BI site.
Databases
Databases in the Cloud - From the Power BI service,
you can connect live to Azure SQL Database, Azure SQL
Data Warehouse, etc. Connections from Power BI to
these databases are live.
Databases on-premises - From the Power BI service,
you can connect directly to SQL Server Analysis
Services Tabular model databases. A Power BI Enterprise
Gateway is required.
Online services
These include web services, APIs, and online platforms like
SharePoint, Dynamics 365, Salesforce, and more.
2. Data Connections
Power BI offers different methods to connect to data sources,
such as:
Direct Query: This allows Power BI to send queries
directly to the data source in real-time, without importing
the data into Power BI.
Import: Data can be imported into Power BI, and reports
are created based on the imported data.
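The difference between the two connection methods can be illustrated with a small Python sketch. An in-memory SQLite database stands in for the data source here; this shows the idea only and is not how Power BI is implemented.

```python
import sqlite3

# A stand-in data source; in practice this would be a remote database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("North", 10.0), ("South", 5.0)])

def total(rows):
    """Sum the amount column of locally held rows."""
    return sum(amount for _, amount in rows)

# Import: copy the data once; reports then read the local snapshot.
imported = source.execute("SELECT region, amount FROM sales").fetchall()

def direct_query_total():
    """Direct query: ask the source each time, so results reflect its current state."""
    return source.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# The source changes after the import was taken.
source.execute("INSERT INTO sales VALUES ('East', 7.0)")

print(total(imported))       # snapshot value, unchanged until the next refresh
print(direct_query_total())  # includes the newly added row
```

The imported copy answers quickly but goes stale until it is refreshed, while the direct query is always current but depends on the source for every request.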
3. Power Query Editor
Power Query is a powerful data transformation tool
within Power BI that enables users to connect to various
data sources, transform and shape the data, and then load
it into the Power BI model.
It supports cleaning, transforming, and merging data
from different sources before loading it into the Power BI
model.
4. Gateways
Power BI Gateway is a tool that allows for secure and
efficient communication between Power BI and on-premises
data sources.
5. Data Refresh
For datasets that are regularly updated,
Power BI provides data refresh options to keep the data in
reports up-to-date. The refresh can be scheduled for automatic
updates.
Spatial Data: Spatial data refers to data that includes
information about specific locations on the Earth's surface.
A pair of latitude and longitude coordinates defines a specific
location on Earth.
Ex: Locations of various cities around the world.
Spatial data are of two types according to the storing
technique, namely, raster data and vector data.
Raster data are composed of grid cells identified by row
and column. The whole geographic area is divided into
groups of individual cells, which represent an image.
Satellite images, photographs, scanned images, etc., are
examples of raster data.
Vector data are composed of points, polylines, and
polygons. Wells, houses, etc., are represented by points.
Roads, rivers, streams, etc., are represented by polylines.
Villages and towns are represented by polygons.
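The two storage techniques can be sketched in Python as follows. The grid values and coordinates are made up, and the geometry helpers assume flat (planar) coordinates rather than latitude and longitude.

```python
import math

# Raster: a grid of cells addressed by row and column (values are made up).
raster = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
cell_value = raster[1][2]  # the cell at row 1, column 2

# Vector: geometry stored as coordinates.
well = (2.0, 3.0)                                            # point
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                 # polyline
village = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # polygon

def polyline_length(points):
    """Sum of the straight-line lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def polygon_area(vertices):
    """Shoelace formula for a simple (non self-intersecting) polygon."""
    pairs = zip(vertices, vertices[1:] + vertices[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2

print(cell_value, polyline_length(road), polygon_area(village))
```

Raster data answers "what is the value at this cell", while vector data supports geometric questions such as length and area directly from the stored coordinates.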
Geospatial Data
Geospatial data, or geodata, is data that includes information
related to locations on the Earth's surface. You can map
objects, events, and other real-world phenomena to a specific
geographical area identified by latitude and longitude
coordinates.
Geospatial data is a subset of spatial data and is specifically
related to the Earth's surface and its features. It involves the
use of geographic information systems (GIS) to analyze and
interpret spatial data.
Examples of geospatial data include weather maps, traffic and
accident data. This information has a geographic component
that can tie it to an address or relative location.
EX: Additional information such as the population, area,
and elevation of each city.
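A geospatial record can be pictured as a coordinate pair with attributes attached. In the Python sketch below the place names, coordinates, and attribute values are all invented for illustration, and the helper assumes a simple latitude/longitude box.

```python
# Each record ties attributes to a latitude/longitude pair (all values invented).
places = [
    {"name": "Alpha", "lat": 51.5, "lon": -0.1, "elevation_m": 11},
    {"name": "Beta", "lat": 48.9, "lon": 2.4, "elevation_m": 35},
    {"name": "Gamma", "lat": 35.7, "lon": 139.7, "elevation_m": 40},
]

def within_area(records, south, north, west, east):
    """Select records whose coordinates fall inside a latitude/longitude box.
    Assumes the box does not cross the 180 degree meridian."""
    return [r for r in records
            if south <= r["lat"] <= north and west <= r["lon"] <= east]

selected = within_area(places, south=35.0, north=60.0, west=-10.0, east=20.0)
print([r["name"] for r in selected])
```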
Visualization Techniques for Spatial Data, Geospatial
Data, Time-Oriented Data, Multivariate Data:
1. Choropleth Maps:
A choropleth map uses different colours, colour shades,
or pattern fills in various areas of a map to represent the value
or range of values of some variable. Each geographic area is
shaded or patterned according to the statistic
that is displayed on the map.
Ex: Population density in the USA. Here the states with
lower population density are shaded in lighter grey
and those with higher population density in darker grey.
States with population densities between the two
extremes are shaded on a continuum from the lightest grey to
the darkest grey.
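One way to assign shades for such a map is to divide the value range into equal-width classes and give each class a grey level. The sketch below assumes that scheme; the region names, densities, number of classes, and grey levels are arbitrary choices made for the example.

```python
def grey_shade(value, low, high, classes=5):
    """Map a value to one of `classes` grey levels using equal-width ranges.
    Returns a hex colour; low values are light and high values are dark.
    Assumes classes >= 2."""
    if high == low:
        step = 0
    else:
        step = min(int((value - low) / (high - low) * classes), classes - 1)
    # Level 230 is light grey and level 40 is dark grey (arbitrary endpoints).
    level = round(230 - step * (230 - 40) / (classes - 1))
    return f"#{level:02x}{level:02x}{level:02x}"

# Made-up densities per region, for illustration only.
density = {"Region A": 10, "Region B": 55, "Region C": 100}
low, high = min(density.values()), max(density.values())
shades = {name: grey_shade(v, low, high) for name, v in density.items()}
print(shades)
```

Other classification schemes (for example quantiles) would change which regions share a shade, so the choice of classes affects how the map reads.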
2. Heat maps :
Heat maps use colours to communicate numeric data by
varying the colour with the underlying values. Heat maps can
be used to represent large sets of spatial data. Data values are
mapped to colours, typically expressed as red, green, and
blue (RGB) components. In this way, the
colour variations represent continuous changes in the
values on the map.
The main purpose of heat map visualization is to uncover
patterns, trends, and relationships within the data by
highlighting areas of high or low values. A common scheme
maps the lowest value in the heat map to dark blue and the
highest value to bright red.
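The colour mapping described above can be sketched as a linear interpolation between two RGB colours. The endpoint colours and the sample grid are assumptions chosen for the example.

```python
def heat_colour(value, low, high):
    """Linearly interpolate from dark blue (lowest value) to bright red (highest)."""
    cold, hot = (0, 0, 139), (255, 0, 0)  # assumed RGB endpoints
    t = 0.0 if high == low else (value - low) / (high - low)
    return tuple(round(c + t * (h - c)) for c, h in zip(cold, hot))

# Made-up grid of readings.
grid = [[1.0, 2.0], [3.0, 5.0]]
flat = [v for row in grid for v in row]
low, high = min(flat), max(flat)
colours = [[heat_colour(v, low, high) for v in row] for row in grid]
print(colours)
```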
3. Dot map : Dot maps, alternatively known as point maps,
use dots to represent the data. Each dot is placed at the
approximate location where the data value was obtained. In
general, dots are equally sized and are placed based on the
geographical location. The user can obtain information by
hovering over a dot on the map.
4. Bubble map :
In a bubble map, each data point is represented
by a bubble (circle) of variable size. The size of the bubble
represents the number of cases at a particular location.
Bubbles are distributed over the map according to their
geographical coordinates. Bubble maps are simple and easy to
read.
Bubble maps can show several pieces of information in a
single view by using a different colour for each type of
information.
5. Topographic map :
A topographic map is a detailed, graphical, and
accurate representation of features that appear on the
Earth's surface.
It represents the locations of geographical features
such as mountains, valleys, plains,
water bodies, and many more.
6. Flow map : Flow maps, also known as ‘path’ maps, are
more specialized versions of line maps. Instead of focusing
on physical features of the earth, they are used to represent
the movement of things across the earth over time. Flow
maps depict the movement of people, goods, or information
between different locations.
7. Cartograms: Cartograms distort the geographic boundaries
of regions based on a specific variable (population, GDP,
etc.).
8. Proportional Symbol Maps:
Proportional symbol maps use symbols of different sizes to
represent the quantity of a variable at specific locations.
The size of the symbol corresponds to the magnitude of the
variable.
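A common way to size the symbols is to make the circle's area, rather than its radius, proportional to the value, so that large values are not visually exaggerated. The sketch below assumes that convention; the site names, counts, and maximum radius are made up.

```python
import math

def symbol_radius(value, max_value, max_radius=30.0):
    """Scale the radius so that the circle's area is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)

# Made-up case counts per location.
cases = {"Site A": 400, "Site B": 100, "Site C": 25}
largest = max(cases.values())
radii = {name: symbol_radius(v, largest) for name, v in cases.items()}
print(radii)
```

With this scaling a location with a quarter of the cases gets half the radius, and therefore a quarter of the area.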
Multivariate Data:
Multivariate data contains, at each sample point, multiple
scalar values that represent different simulated or
measured quantities.
Multivariate analysis takes place when you have a data
set with two or more dependent variables that are to be
examined against an independent variable or variables.
Visualization Techniques for Multivariate Data:
Scatter Charts, Bubble Charts, Line Charts with Multiple
Axes, Stacked Bar/Column Charts, Heat Maps, Water fall
Charts, Radar Charts, Tree Maps etc.
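Several of these techniques, radar charts in particular, work best when the variables share a common scale. A minimal sketch of min-max normalisation is shown below; the variable names and sample values are invented, and other scaling methods could be used instead.

```python
def min_max_normalise(records, keys):
    """Rescale each variable to the range 0..1 so the variables share one scale."""
    spans = {k: (min(r[k] for r in records), max(r[k] for r in records))
             for k in keys}
    result = []
    for r in records:
        scaled = {}
        for k in keys:
            lo, hi = spans[k]
            scaled[k] = 0.0 if hi == lo else (r[k] - lo) / (hi - lo)
        result.append(scaled)
    return result

# Made-up samples, each with three measured quantities.
samples = [
    {"temperature": 10.0, "pressure": 1000.0, "humidity": 30.0},
    {"temperature": 20.0, "pressure": 1010.0, "humidity": 90.0},
    {"temperature": 15.0, "pressure": 1020.0, "humidity": 60.0},
]
scaled = min_max_normalise(samples, ["temperature", "pressure", "humidity"])
print(scaled)
```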
Time-oriented Data: It is a sequence of data points that
occur in successive order over some period of time.
Visualization Techniques:
Bar charts:
Bar charts are useful for comparing data across
categories, where the length of each rectangular bar represents
the magnitude of a particular data point.
Line Charts :
Line Charts are used to visualize trends over time, with a
series of points connected by lines where each point
represents a data point for a particular period.
X-Axis: Represents the independent variable or the
domain (e.g., time, distance, categories).
Y-Axis: Represents the dependent variable or the range
(e.g., values, quantities).
Scatter plots:
Scatter plots are used to visualize the
relationship between two variables. Each point
represents a pair of values, one for each variable.
Heat maps
Heat maps are helpful in visualizing the distribution of values
in a matrix or table. The colour of each cell in a coloured grid
represents the magnitude of a particular data point.
Area Charts:
Area charts are similar to line charts but fill the area below the
line, providing a visual representation of the cumulative
values over time.
Useful for comparing multiple time series on the same chart.
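The stacking and cumulative ideas behind area charts can be sketched as follows. The series values are made up, and the functions only compute the boundaries that a chart would fill; they do not draw anything.

```python
from itertools import accumulate

def stack(series_list):
    """For each series, return its upper boundary after stacking it on the ones before."""
    tops, running = [], [0] * len(series_list[0])
    for series in series_list:
        running = [base + value for base, value in zip(running, series)]
        tops.append(running)
    return tops

# Made-up monthly values for two products.
product_a = [3, 4, 6]
product_b = [1, 2, 2]
layers = stack([product_a, product_b])

# A single series can also be shown cumulatively over time.
running_total = list(accumulate(product_a))
print(layers, running_total)
```

In a stacked area chart the top boundary of the last layer shows the combined total, while the thickness of each band shows one series' contribution.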
Power Query
Power Query is a powerful tool by Microsoft for data
connection and transformation, which is widely used in Excel
and Power BI. It does not need any advanced coding skills
and helps users in importing, cleaning, reshaping, and
combining data from various sources. Power Query makes
data preparation easy and repeatable, even if you are working
with messy spreadsheets, databases, web pages, or cloud
services.
There are four key stages in the working of Power Query:
1. Connect
Power Query starts by connecting to your data source. The
data source can be Excel files, CSV files, databases like SQL
Server and online services like SharePoint or Salesforce. You
can simply select the source, and Power Query will establish a
link to fetch the raw data.
2. Transform
Once the data is connected, Power Query loads it
into a preview window known as the Power Query Editor. In
this window you can shape your data according to
your needs. You can remove columns, filter rows, change data
types, merge tables, pivot/unpivot data, split columns, and
much more. Each transformation you apply is recorded as a
step in the background, written in a scripting language
called M (or M code).
3. Combine
In this step you can combine multiple data sources. For
instance, if you want to append monthly sales files from
January to December into a single unified table or if two
tables share a common field, you can merge them to create a
more complete dataset.
4. Load
By this step your data has been transformed and is ready, and
Power Query loads it into Excel or Power BI. You can
load it directly into a worksheet or into the data
model for further analysis with PivotTables,
dashboards, or visualizations.
There is one more, optional, step: Automation. This
is recognized as one of the biggest strengths of Power Query.
Once you have defined your steps, you can refresh your data
with just one click. Power Query then reapplies all the
transformations to the new or updated data automatically.
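The connect, transform, combine, load, and refresh stages can be imitated in a few lines of Python. This is only an analogy written in Python rather than M: the function names, the CSV text, and the two transformation steps are assumptions made for the example.

```python
import csv
import io

def connect(text):
    """Connect: read raw rows from a CSV source (a string stands in for a file)."""
    return list(csv.DictReader(io.StringIO(text)))

# Transform: steps are kept in order so they can be replayed on refresh.
steps = [
    lambda rows: [r for r in rows if r["amount"] != ""],               # filter rows
    lambda rows: [{**r, "amount": float(r["amount"])} for r in rows],  # change type
]

def run(sources):
    """Apply the recorded steps to every source and append the results."""
    combined = []
    for text in sources:            # Combine: append the sources into one table
        rows = connect(text)
        for step in steps:
            rows = step(rows)
        combined.extend(rows)
    return combined                 # Load: hand the finished table to the consumer

january = "month,amount\nJan,10\nJan,\n"
february = "month,amount\nFeb,7.5\n"
table = run([january, february])

# Refresh: the same recorded steps are reapplied when the source data changes.
february = "month,amount\nFeb,7.5\nFeb,2.5\n"
refreshed = run([january, february])
print(table, refreshed)
```

Because the steps are stored rather than applied by hand, a refresh needs no extra work, which is the point of the optional automation stage.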
Benefits of Power Query
Power Query offers many benefits for data preparation and
analysis, which make it a useful tool for analysts. The main
benefits are:
1. Graphical User Interface (GUI)
It has an intuitive graphical user interface. Individuals can use
a GUI without having extensive programming knowledge.
Anyone can interactively create and customize data
transformation steps.
2. Ease of Data Import
This data transformation tool automates data importing from
different sources, including databases, web pages, files, and
cloud services. This means there is no need for manual data
entry such as copy and paste.
3. Data Integration
It can connect with many types of data sources to extract
information from them. It combines all this information into a
single dataset that gives reporting and analysis across diverse
databases.
4. Advanced Capabilities with M Language
M is the programming language behind this data
transformation tool. It offers many data manipulation
capabilities, including complex transformations such as
grouping, aggregation, pivoting, unpivoting, merging,
and error handling.
5. Integration with Excel & Power BI
This tool can smoothly integrate with Excel and Power BI.
This integration gives the capability to build dynamic reports
and dashboards using its data.
6. Data Refresh & Automation
Users can schedule automatic data refreshes with this tool.
This way, their analysis always uses the latest data. It is
well suited to dynamic
data sources.
7. Improved Productivity
The task automation capabilities of this tool save time and
effort for analysts. This way, they can do the same work in
less time to focus on more important tasks like analysis and
decision-making
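Two of the transformations named under benefit 4, unpivoting and grouping, can be sketched in Python to show what they do to a table. This is not M code, and the table, column names, and helper functions are invented for the illustration.

```python
from collections import defaultdict

# Made-up wide table with one column per month.
wide = [
    {"product": "Pen", "Jan": 5, "Feb": 7},
    {"product": "Mug", "Jan": 2, "Feb": 4},
]

def unpivot(rows, id_column, value_columns):
    """Turn each value column into its own (attribute, value) row."""
    return [{id_column: r[id_column], "attribute": c, "value": r[c]}
            for r in rows for c in value_columns]

def group_sum(rows, key):
    """Group rows by `key` and sum their 'value' field."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r["value"]
    return dict(totals)

long_rows = unpivot(wide, "product", ["Jan", "Feb"])
by_month = group_sum(long_rows, "attribute")
print(long_rows)
print(by_month)
```

Unpivoting turns column headers into data values, which usually makes the table easier to filter, group, and chart.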