
Unit IV

Data Visualization
• Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.
• Additionally, it provides an excellent way for employees
or business owners to present data to non-technical
audiences without confusion.
• In the world of Big Data, data visualization tools and
technologies are essential to analyze massive amounts
of information and make data-driven decisions.
Advantages of data visualization
• Share information easily.
• Interactively explore opportunities.
• Visualize patterns and relationships.
Disadvantages
• Biased or inaccurate information.
• Correlation doesn’t always mean causation.
• Core messages can get lost in translation.
Why is data visualization important?
• It helps people see, interact with, and better understand data.
Whether simple or complex, the right visualization can bring everyone
onto the same page, regardless of their level of expertise.
• Every STEM field benefits from understanding data, and
so do fields in government, finance, marketing, history,
consumer goods, service industries, education, sports,
and so on.
• Data visualization discovers the trends in data.
• Data visualization provides a perspective on the data.
• Data visualization puts the data into the correct context.
• Data visualization saves time.
• Data visualization tells a data story.
General Types of Visualizations
• Chart: Information presented in a tabular, graphical form with data
displayed along two axes. Can be in the form of a graph, diagram, or map.
• Table: A set of figures displayed in rows and columns.
• Graph: A diagram of points, lines, segments, curves, or areas that
represents certain variables in comparison to each other, usually along two
axes at a right angle.
• Geospatial: A visualization that shows data in map form using different
shapes and colors to show the relationship between pieces of data and
specific locations.
• Infographic: A combination of visuals and words that represent data.
Usually uses charts or diagrams.
• Dashboards: A collection of visualizations and data displayed in one place
to help with analyzing and presenting data.
Categories of Data Visualization
Numerical Data
• Numerical data is also known as quantitative data. It is any data
that represents an amount, such as the height, weight, or age of a
person. Numerical data falls into two categories:
• Continuous Data –
• Can take any value within a range, so it can be measured ever more finely (Example: height measurements).
• Discrete Data –
• Takes only separate, countable values (Example: number of cars or children in a household).
• Numerical data is usually visualized with charts and numerical
values. Examples are pie charts, bar charts, averages, scorecards, etc.
Categorical Data
• Categorical data is also known as qualitative data. It is any data that
represents groups. It consists of categorical variables that describe
characteristics such as a person’s ranking or a person’s gender.
Categorical data visualization is all about depicting key themes, establishing
connections, and lending context. Categorical data is classified into three categories:
• Binary Data –
• Classification is based on one of two positions (Example: agrees or disagrees).
• Nominal Data –
• Classification is based on attributes with no inherent order (Example: male or female).
• Ordinal Data –
• Classification is based on the ordering of information (Example: timeline or processes).
• Categorical data is usually visualized with graphics, diagrams, and
flowcharts. Examples are word clouds, sentiment mapping, Venn diagrams, etc.
Top Data Visualization Tools
• The following are ten popular data visualization tools:
• Tableau
• Looker
• Zoho Analytics
• Sisense
• IBM Cognos Analytics
• Qlik Sense
• Domo
• Microsoft Power BI
• Klipfolio
• SAP Analytics Cloud
Generating Data
• Enter Data Manually in Editor Window
• Read Data from Clipboard
• Prepare Data using sequence of numeric and character values
• Generate Random Data
• Create Categorical Variables
Matplotlib
• Matplotlib is a low-level graph plotting library in Python
that serves as a visualization utility.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source and free to use.
• Matplotlib is mostly written in Python; a few segments
are written in C, Objective-C, and JavaScript for platform
compatibility.
Installation
Matplotlib Pyplot
• Most of the Matplotlib utilities lie under the pyplot submodule and
are usually imported under the plt alias.
Matplotlib Pyplot
Simple line plots
• First import the matplotlib.pyplot library for plotting functions. Also,
import the NumPy library if required.
• Then define data values x and y.
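A minimal sketch of these two steps follows. The sine data, the labels, and the helper name draw_line_plot are illustrative assumptions, and plain lists are used in place of NumPy arrays so that the data part runs without extra packages. Matplotlib is imported inside the helper under the usual plt alias.

```python
import math

# Step 2: define data values x and y (50 points of y = sin(x) on [0, 2*pi]).
x = [2 * math.pi * i / 49 for i in range(50)]
y = [math.sin(value) for value in x]


def draw_line_plot(x_values, y_values):
    """Draw a simple line plot and return the Matplotlib figure."""
    # Step 1: import pyplot under the conventional alias.
    # Imported here so the data above can be built without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(x_values, y_values)          # one line through the (x, y) points
    ax.set_xlabel("x")                   # axis labels and a title
    ax.set_ylabel("sin(x)")
    ax.set_title("Simple line plot")
    return fig
```

Calling draw_line_plot(x, y).savefig("line.png") would write the chart to a file; the file name is arbitrary.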
Labeling
Multiple charts
Multiple plots on the same axis
Fill the area between two plots
Matplotlib Markers
Matplotlib Adding Grid Lines
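The topics named in the headings above (labeling, multiple plots on the same axis, filling the area between two plots, markers, and grid lines) can be combined in one small sketch. The sample data and the helper name draw_combined_plot are assumptions made for illustration only.

```python
# Hypothetical sample data: two series that share the same x values.
x = list(range(6))
squares = [value ** 2 for value in x]
doubles = [2 * value for value in x]


def draw_combined_plot(x_values, y1, y2):
    """Plot two series on the same axes with markers, fill, labels, and a grid."""
    import matplotlib.pyplot as plt  # imported here so the data runs without Matplotlib

    fig, ax = plt.subplots()
    # Multiple plots on the same axis, each with its own marker style.
    ax.plot(x_values, y1, marker="o", label="x squared")
    ax.plot(x_values, y2, marker="s", linestyle="--", label="2x")
    # Shade the area between the two plots.
    ax.fill_between(x_values, y1, y2, alpha=0.3)
    # Labeling: axis labels, title, and legend.
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Two plots on the same axes")
    ax.legend()
    # Grid lines.
    ax.grid(True)
    return fig
```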
Random Walk
• A random walk, sometimes known as a drunkard's walk, is a
stochastic (random) process that describes a path consisting of a
succession of random steps on some mathematical space, such as the
integers.
• An elementary example is the random walk on the
integer number line, which starts at 0 and at each step moves +1 or −1
with equal probability.
• Other examples include the path traced by a molecule as it travels in a
liquid or a gas.
One-dimensional random walk
• A marker is placed at zero on the number line, and a fair coin is
flipped. If it lands on heads, the marker is moved one unit to the
right. If it lands on tails, the marker is moved one unit to the left.
After five flips, the marker could be on -5, -3, -1, 1, 3, or 5. With
five flips, three heads and two tails, in any order, it will land on 1.
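The coin-flip walk described above can be sketched with the standard library alone. The function name random_walk_1d and the seed parameter are illustrative choices; the second part enumerates every possible sequence of five flips to check the positions listed in the text.

```python
import random
from itertools import product


def random_walk_1d(n_steps, seed=None):
    """Return the list of positions visited, starting at 0."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_steps):
        # Fair coin: heads moves one unit right (+1), tails one unit left (-1).
        position += rng.choice((1, -1))
        path.append(position)
    return path


# All 2**5 sequences of five flips, reduced to the distinct end positions.
reachable_after_five = sorted({sum(flips) for flips in product((1, -1), repeat=5)})

# Three heads and two tails, in any order, sum to the same end position.
three_heads_two_tails = sum([1, 1, 1, -1, -1])

print(reachable_after_five)     # [-5, -3, -1, 1, 3, 5]
print(three_heads_two_tails)    # 1
```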
2D Random Walk
• The idea behind the random walk in 2D is the exact same as in one
dimension. Now the movement of the object is no longer restricted to
up/down. Instead, it can move to the left or right too.

• In the 2D case, you are not plotting against time anymore. Instead, it
is possible to visualize the walk by plotting the x, and y coordinate
pairs into the graph. This draws the 2D path the object took with n
steps.
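A possible sketch of the 2D walk: each step picks one of four unit moves with equal probability, and the x and y coordinates are collected so they can be plotted against each other. The names random_walk_2d and plot_walk_2d are hypothetical, and the plotting helper assumes Matplotlib is installed.

```python
import random

# The four unit moves: right, left, up, down.
MOVES_2D = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def random_walk_2d(n_steps, seed=None):
    """Return (xs, ys): the coordinates visited, starting at the origin."""
    rng = random.Random(seed)
    x = y = 0
    xs, ys = [x], [y]
    for _ in range(n_steps):
        dx, dy = rng.choice(MOVES_2D)
        x += dx
        y += dy
        xs.append(x)
        ys.append(y)
    return xs, ys


def plot_walk_2d(xs, ys):
    """Draw the path by plotting the (x, y) coordinate pairs."""
    import matplotlib.pyplot as plt  # imported here so the walk runs without Matplotlib

    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    ax.set_aspect("equal")   # one unit looks the same on both axes
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("2D random walk")
    return fig
```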
3D Random Walk
• Now that you have the 1D and 2D random walks working, let’s finally
implement the 3D one.

• The idea is exactly the same as in 2D, but now you can move
up/down, left/right, and also inward/outward.
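The 3D case can be sketched the same way, with six possible unit moves instead of four. As before, the function names are illustrative, and the plotting helper assumes Matplotlib with its built-in 3D projection.

```python
import random

# Six unit moves: left/right, up/down, inward/outward.
MOVES_3D = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def random_walk_3d(n_steps, seed=None):
    """Return (xs, ys, zs): the coordinates visited, starting at the origin."""
    rng = random.Random(seed)
    x = y = z = 0
    xs, ys, zs = [x], [y], [z]
    for _ in range(n_steps):
        dx, dy, dz = rng.choice(MOVES_3D)
        x, y, z = x + dx, y + dy, z + dz
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def plot_walk_3d(xs, ys, zs):
    """Draw the path on 3D axes."""
    import matplotlib.pyplot as plt  # imported here so the walk runs without Matplotlib

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot(xs, ys, zs)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    return fig
```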
JSON
• JSON stands for JavaScript Object Notation. It is a lightweight,
text-based format, derived from JavaScript object syntax, that is used
to store and transfer data.
• Python supports JSON through a built-in package called json. To use
it, we import the json package in a Python script.
• JSON text holds data as key-value pairs within { }, with keys written
as quoted strings.
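As a small illustration, the sketch below parses a piece of JSON text with the built-in json package. The record itself (name, age, and so on) is made-up sample data.

```python
import json

# Hypothetical JSON text: key-value pairs inside { }, keys as quoted strings.
json_text = '{"name": "Asha", "age": 21, "courses": ["Python", "Statistics"], "enrolled": true}'

# json.loads() turns JSON text into the matching Python objects:
# object -> dict, array -> list, true/false -> True/False.
record = json.loads(json_text)

print(type(record).__name__)   # dict
print(record["name"])          # Asha
print(record["courses"][0])    # Python
print(record["enrolled"])      # True
```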
Serializing JSON
• The process of encoding JSON is usually called serialization.
• This term refers to the transformation of data into a series of bytes
(hence serial) to be stored or transmitted across a network.
• To write data to a file, the json library in Python provides the dump()
function, which converts Python objects into their corresponding JSON
representation and writes the result to a file object.
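A short sketch of serialization with dump(), using made-up sample data and a temporary file. The related dumps() function, which returns a string instead of writing to a file, is shown for comparison, and load() reads the file back.

```python
import json
import os
import tempfile

# Hypothetical Python object to serialize.
student = {"name": "Asha", "marks": [82, 91], "passed": True, "mentor": None}

# json.dump() writes the JSON text to an open file object.
path = os.path.join(tempfile.mkdtemp(), "student.json")
with open(path, "w", encoding="utf-8") as file:
    json.dump(student, file, indent=2)

# json.dumps() returns the JSON text as a string instead.
json_string = json.dumps(student)
print(json_string)
# {"name": "Asha", "marks": [82, 91], "passed": true, "mentor": null}

# json.load() reverses the process (deserialization).
with open(path, encoding="utf-8") as file:
    restored = json.load(file)
```

Note how Python's True and None become JSON's true and null in the serialized text.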
Working with APIs
What is an API?
• An API, or Application Programming Interface, is a
server that you can use to retrieve and send data to
using code.
• APIs are most commonly used to retrieve data.
• When we want to receive data from an API, we need to make a
request.
• Requests are used all over the web.
• For instance, when you visit a web page, your web browser makes
a request to the web server, which responds with the
content of that page.
• API requests work in exactly the same way – you make a
request to an API server for data, and it responds to
your request.
Types of APIs
1. WEB APIs
A Web API, also called a web service, is widely used over the web and
can be accessed using the HTTP protocol. A Web API can be used by a
large number of clients through their phones, tablets, or PCs.
2. LOCAL APIs
Local APIs give programmers access to local middleware services. TAPI
(Telephony Application Programming Interface) and .NET are common examples
of Local APIs.
3. PROGRAM APIs
A Program API makes a remote program appear to be local by making use of RPCs (Remote
Procedure Calls). SOAP is a well-known example of this type of API.
REST APIs
• REST stands for Representational State Transfer. A REST API follows the
constraints of the REST architecture and allows interaction with RESTful web
services. It defines a set of functions that clients use to access server
data:
• GET (retrieve a record)
• PUT (update a record)
• POST (create a record)
• DELETE (delete the record)
Advantages of APIs
• Efficiency: APIs produce quicker and more reliable results than
the same work done manually by people in an organization.
• Flexible delivery of services: APIs provide fast and flexible delivery of
services according to developers’ requirements.
• Integration: APIs allow data to move between various sites, which
enhances the integrated user experience.
• Automation: Because APIs let computers rather than people do the work,
the results are more automated and consistent.
• New functionality: While using APIs, developers find new tools and
functionality for API exchanges.
Disadvantages of APIs
• Cost: Developing and implementing an API can be costly and requires
ongoing maintenance and support from developers.
• Security issues: An API adds another attack surface, so security
risks are a common problem with APIs.
HTTP
• HTTP (Hypertext Transfer Protocol) is a protocol designed to enable
communication between clients and servers.
• It works as a request-response protocol between a client and a server.
A web browser may be the client, and an application on a computer
that hosts a website may be the server.
• To request a response from the server, there are mainly two
methods:
• GET: To request data from the server.
• POST: To submit data to be processed to the server.
Making API Requests in Python
• In order to work with APIs in Python, we need tools that will make
those requests.
• In Python, the most common library for making requests and working
with APIs is the requests library.
Making Our First API Request
• There are many different types of requests. The most
commonly used one, a GET request, is used to retrieve
data.
• When we make a request, the response from the API comes
with a response code which tells us whether our request
was successful. Response codes are important because
they immediately tell us if something went wrong.
• To make a ‘GET’ request, we’ll use the requests.get() function, which
requires one argument — the URL we want to make the request to.
• The get() function returns a response object. We can read the status
code for our request from the response.status_code attribute.
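The request and response cycle can be tried out without internet access. The sketch below is an assumption-laden stand-in, not the requests-based code the text describes: it starts a tiny local server (a hypothetical API with a single /hello endpoint) and queries it with the standard library's urllib.request. With the requests library, the equivalent would be response = requests.get(url) followed by response.status_code.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical API: GET /hello returns JSON, any other path returns 404."""

    def do_GET(self):
        if self.path == "/hello":
            body = b'{"message": "hello"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# An opener that ignores any proxy settings, since the server is local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def get_status(url):
    """Make a GET request and return the response status code."""
    try:
        with opener.open(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises on 4xx/5xx; the code is still available


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

ok_status = get_status(base_url + "/hello")
missing_status = get_status(base_url + "/no-such-page")

server.shutdown()
server.server_close()

print(ok_status, missing_status)   # 200 404
```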
API Status Codes
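Status codes are grouped by their first digit. The sketch below uses the standard library's http.HTTPStatus to look up the official phrase for a code; the helper name describe_status and its category labels are illustrative.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code, its standard phrase, and its general category."""
    status = HTTPStatus(code)  # raises ValueError for a code it does not know
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirection"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {status.phrase} ({kind})"


for code in (200, 301, 400, 401, 403, 404, 503):
    print(describe_status(code))
```

For example, describe_status(404) returns "404 Not Found (client error)".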
PUT Http Method
• PUT is a request method supported by HTTP used by the World Wide
Web.
• The PUT method requests that the enclosed entity be stored under
the supplied URI.
• If the URI refers to an already existing resource, it is modified and if
the URI does not point to an existing resource, then the server can
create the resource with that URI.
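The create-or-modify behavior of PUT can be modeled with a plain dictionary standing in for the server's resources. This is only a sketch: the put function and the URIs are hypothetical, and returning 201 (Created) for a new resource and 200 (OK) for a modified one is a common convention assumed here.

```python
def put(store, uri, entity):
    """Store the entity under the URI and return an HTTP-style status code."""
    already_exists = uri in store
    store[uri] = entity          # replace the resource, or create it
    return 200 if already_exists else 201


resources = {}

# The URI does not exist yet, so the resource is created.
first = put(resources, "/users/1", {"name": "Asha"})

# The URI now exists, so the same call modifies the resource.
second = put(resources, "/users/1", {"name": "Asha K."})

print(first, second)   # 201 200
```

Repeating the same PUT leaves the stored resource unchanged, which is why PUT is described as idempotent.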
Post Method
Delete Method
Using Plotly for Interactive Data Visualization in Python
• Plotly is an open-source Python module used for data
visualization. It supports various graphs such as line charts, scatter plots,
bar charts, histograms, area plots, etc.
• Plotly has hover tool capabilities that allow us to detect outliers
or anomalies in a large number of data points.
• Its plots are visually attractive and suit a wide range of
audiences.
• It allows extensive customization of our graphs, which makes
our plots more meaningful and understandable for others.
Installation
Overview of Plotly Package Structure
• In Plotly, the main modules include:
• plotly.plotly acts as the interface between the local machine and Plotly. It
contains functions that require a response from Plotly’s server. (In Plotly
version 4 and later, this functionality lives in the separate chart_studio package.)
• plotly.graph_objects contains the objects (Figure, layout, data, and
the definitions of plots such as scatter plots and line charts) that are responsible for
creating the plots.
• A figure can be represented either as a dict or as an instance of
plotly.graph_objects.Figure, and it is serialized as JSON before it is
passed to plotly.js. Figures are represented as trees whose root node has
three top-level attributes (data, layout, and frames); the named nodes are
called ‘attributes’.
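The tree structure can be seen by writing a figure as a plain dict and serializing it with the json package. The trace values and title are made-up sample data, and the helper to_plotly_figure is a hypothetical name that assumes Plotly is installed.

```python
import json

# A figure as a plain dict: "data" holds the traces, "layout" the styling.
# ("frames" is the third top-level attribute, used for animations; omitted here.)
figure_dict = {
    "data": [
        {"type": "scatter", "mode": "lines", "x": [1, 2, 3], "y": [4, 1, 2]}
    ],
    "layout": {"title": {"text": "A figure described as a dict"}},
}

# The same structure as JSON text, the form in which figures reach plotly.js.
figure_json = json.dumps(figure_dict)


def to_plotly_figure(fig_dict):
    """Build a plotly.graph_objects.Figure from the dict."""
    import plotly.graph_objects as go  # imported here so the dict part runs without Plotly

    return go.Figure(fig_dict)
```

With Plotly installed, to_plotly_figure(figure_dict).show() would display the chart.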
Line chart
• A line chart is one of the simplest plots: a line is drawn to show the
relation between the X-axis and Y-axis values. It can be created using the
px.line() method, where each data point is represented as a vertex
(whose location is given by the x and y columns) of a polyline mark in
2D space.
Bar Chart
• A bar chart is a pictorial representation of a dataset that presents
categorical data with rectangular bars whose heights or lengths are
proportional to the values they represent. The dataset contains the
numerical values that determine the length or height of each bar. It
can be created using the px.bar() method.
Scatter Plot
• A scatter plot is a set of points that represent individual pieces of
data along the horizontal and vertical axes. The values of
two variables are plotted along the X-axis and Y-axis, and the pattern of the
resulting points reveals any correlation between them. It can be created
using the px.scatter() method.
3D Scatter Plot
• A 3D scatter plot places each data point in three-dimensional space
using its x, y, and z values. Further variables can be mapped to visual
properties such as color, size, and symbol, which helps to identify
different subsets of the data. Using redundant visual cues can make
graphics more accessible. It can be created using the scatter_3d()
function of the plotly.express module.
Histogram
• A histogram represents the distribution of data by grouping values
into bins. It is a type of bar plot where the X-axis represents the bin
ranges while the Y-axis gives information about frequency. It can be
created using the px.histogram() method.
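To make the idea of bins and frequencies concrete, the sketch below counts made-up values into bins of width 2 by hand, then wraps the px.histogram() call in a helper. The names bin_counts and draw_histogram are illustrative, and Plotly may choose different bin edges than the manual count.

```python
from collections import Counter

# Hypothetical sample data.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]
BIN_WIDTH = 2


def bin_counts(data, width):
    """Map each bin range (start, end) to the number of values in [start, end)."""
    counts = Counter((value // width) * width for value in data)
    return {(start, start + width): counts[start] for start in sorted(counts)}


frequencies = bin_counts(values, BIN_WIDTH)
print(frequencies)
# {(0, 2): 1, (2, 4): 5, (4, 6): 3, (6, 8): 1}


def draw_histogram(data):
    """Create the histogram with Plotly Express."""
    import plotly.express as px  # imported here so the counting runs without Plotly

    return px.histogram(x=data, nbins=4)
```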
Pie Chart
• A pie chart, also known as a circle graph, is a circular statistical
graphic divided into slices to illustrate numerical proportions. Each
“pie slice” (sector) shows the relative size of one part of the data,
such as its relative frequency or magnitude. It can be
created using the px.pie() method.
Box Plot
• A box plot, also known as a box-and-whisker plot, displays a
summary of a set of data values: minimum,
first quartile, median, third quartile, and maximum. In the box plot, a
box is drawn from the first quartile to the third quartile, and a
line goes through the box at the median. One axis shows the data
values; when several groups are compared, the other axis lists the
groups. It can be created using the px.box() method.
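The five values that a box plot summarizes can be computed with the standard library. In this sketch the data is made up, the helper names are illustrative, and the quartiles use the "inclusive" method of statistics.quantiles; Plotly may compute quartiles slightly differently and may draw extreme points separately as outliers.

```python
import statistics

# Hypothetical sample data.
values = [2, 4, 4, 5, 7, 8, 9, 10, 12]


def five_number_summary(data):
    """Return minimum, first quartile, median, third quartile, and maximum."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "minimum": ordered[0],
        "first quartile": q1,
        "median": median,
        "third quartile": q3,
        "maximum": ordered[-1],
    }


summary = five_number_summary(values)
print(summary)


def draw_box_plot(data):
    """Create the box plot with Plotly Express."""
    import plotly.express as px  # imported here so the summary runs without Plotly

    return px.box(y=data)
```

For this data the box would span 4 to 9, with the median line at 7.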
Violin Plot
• A violin plot is a method to visualize the distribution of numerical data
for different variables. It is similar to a box plot, but with a rotated
density plot on each side, giving more information about the density
estimate on the y-axis. The density is mirrored and flipped over and the
resulting shape is filled in, creating an image resembling a violin. The
advantage of a violin plot is that it can show nuances in the distribution
that aren’t perceptible in a box plot. On the other hand, the box plot more
clearly shows the outliers in the data. It can be created using the
px.violin() method.