Visualization Combined
DADS304
VISUALIZATION
Unit 1
Introduction to Data Visualization
Data visualization is the graphical representation of data and information. By using visual
elements such as charts, graphs, and maps, data visualization tools offer an accessible way to
observe and analyze trends, outliers, and patterns in data. They also give employees and
business executives a clear way to present data to non-technical audiences.
Data may be displayed graphically as a bar graph, pie chart, line chart, or another type of
visual representation. This style of representation yields visual insights that are hard to
obtain from other data presentation methods, because it makes patterns, trends, and outliers
easier for the brain to grasp. Humans generally absorb visual patterns more readily than long
passages of written text, and once we become familiar with a pattern, the brain is very good
at recognizing it. Visualizations therefore greatly accelerate research and data analysis, and
they are also effective communication tools.
1.1 Objectives:
Data analysis has become very popular, in part because of the importance it gives to
visualization.
[Fig. 1.1: Role of data visualization in business analytics. Elements shown: better data
analysis, discover trends, market analysis, find patterns, decision making, create impact on
audience, improve customer acquisition]
Some of the scenarios in which data visualization plays an important role are listed below
and shown in Fig.1.1.
1. Better approach for data analysis: Business stakeholders can concentrate on areas
that need attention by analyzing visualization reports containing different graphs,
charts and tables for data comparison and analysis. These visual mediums aid analysts
in comprehending crucial information required for their line of work. Whether it's a
sales report or a marketing strategy, a visual depiction of the data helps businesses
make better analyses and decisions that further enhance business revenues.
2. Quick decision making: People process images more quickly than they do long,
laborious tabular forms or reports. Decision-makers can move fast based on fresh data
insights if the data is well-communicated, accelerating both decision-making and
corporate growth.
3. Discover patterns and trends in data: Business users can use data visualizations
to make sense of their massive data sets. Data analysts benefit as well, because
visualizations help them spot new patterns and errors in the data. By making sense of
these patterns, users can focus on areas that show red flags or progress, which in
turn moves the company forward. Finding correlations between independent variables
is difficult without data visualization, and understanding those relationships leads
to better business decisions. Although this may seem like an obvious application of
data visualization, it is one of the most beneficial. Forecasts cannot be made without
knowledge of the past and the present; trends over time show us where we have been
and where we might go.
4. Highlighting and strengthening the impact of a message for audiences: Data
visualization increases the impact of your message on your target audiences and
presents the findings of data research in the most convincing way. It gives all
organizational groups and departments a common medium for communication. With
visualization, large amounts of data can be understood quickly and effectively. It
helps in assessing the impact of the data on the business and conveys the information
visually to all stakeholders, internal participants and external audiences.
5. Formulating better customer acquisition strategies: Frequency is closely tied to
patterns over time. We can get a clearer sense of how potential new consumers would
behave and respond to various marketing and customer acquisition efforts by looking
at the rate, or how frequently, they make purchases and when they do so. This helps
business personnel in formulating better approaches for customer acquisition.
Self-Assessment Questions - 1
[Figure: lifecycle diagram with six stages: 1. Business Understanding, 2. Data Understanding,
3. Data Preparation, 4. Exploratory Data Analysis and Data Modeling, 5. Data Validation,
6. Data Visualization]
3. Preparing the data: This task entails improving the data quality to the standard
required by the analytic methods you've chosen. In order to do this, one may choose
clean subsets of the data, insert appropriate defaults, or use more ambitious strategies,
including modeling to estimate missing data.
4. Performing exploratory analysis and modeling: Exploratory data analysis is the
crucial preliminary step in which data is examined, with the help of graphical
representations, in order to find patterns, identify anomalies, and test hypotheses.
After completing exploratory data analysis, you choose the actual modeling technique
for the initial modeling phase. A tool may already have been chosen during the
business understanding step, but now you have to choose the specific modeling
technique, such as decision-tree construction using C5.0 or neural network training
with backpropagation.
5. Validating your data: The accuracy and generality of the model were addressed in
earlier steps of the lifecycle. In this step, you evaluate how well the model satisfies
your business objectives and look for any business reasons why the model may be
deficient. If time and budget allow, another option is to test the model(s) in a real
application. The evaluation process also covers any additional data mining findings
you may have produced. The outcomes of data mining include models that are tied to the
original business goals, as well as other discoveries that may not be related to those
goals but may reveal new problems, details, or hints for future work.
6. Visualizing and presenting your findings: The last step of the CRISP-DM lifecycle is
to communicate the evaluated results. It involves determining the best method to
present the insights, based on the analysis and the audience concerned. It begins with
creating dynamic dashboards that highlight the business analysis and convey the
business problems to the audience with greater impact. Next, the insights are combined
into a compelling story about the business problem, ending with relevant
recommendations.
Understanding your audience before selecting a visual chart or graph can help you select the
one that will effectively convey your message. The findings you want to communicate to your
audience will have a direct impact on the chart you use. For this some relevant questions can
be put forward, such as:
❖ Do you wish to demonstrate how combining data columns can result in insightful
information?
❖ Do you wish to display the patterns in a dataset?
❖ Do you wish to demonstrate the comparison of different data variables?
❖ Would you like to illustrate the connections between the data variables?
Selecting a few of these can assist in determining which charts are most appropriate for you.
Choosing the best chart usually needs some experimentation with various charts.
Self-Assessment Questions - 2
4. Can you use data visualization during the exploratory data analysis step?
a) Yes
b) No
There are various participants in the process of Data Visualization. But there are some
important audiences for whom the visualization outputs are most important and crucial
because these audiences are directly related to the process of decision-making. These
visualizations help them achieve the goal of quick and reliable decision-making for business
progress. These data visualization audiences are listed below:
Did you know? Popular tools used for data visualization include Tableau, Dundas BI, Jupyter,
Zoho Reports, Google Charts, Visual.ly, RAW, IBM Watson, Sisense, Plotly, Datawrapper,
Highcharts, FusionCharts, Power BI and QlikView.
etc.): More often than the executive management, this group will likely be the one you work
with if you are leading a team of Data Analysts. This group is often experienced and
middle-aged, with above-average technical skills. They enjoy going a little deeper into the
numbers, but they generally prefer to maintain a high-level perspective. This category of
managers might benefit from using Tableau, since they enjoy exploring the data on their own.
3. Mid-Level Management (Marketing Automation Manager, Sales Development
Manager, etc.): This is one of the groups you will work with most frequently if you
are at the beginning of your career or have been promoted to work in a Data Analytics
team. This group is more technically skilled, younger and comparatively less
experienced in business management. They are more interested in delving into the
figures than other groups, because they grew up at the dawn of the information age and
are keen to learn new technologies for their career progress.
4. Specialized Positions/Individuals (BI Developer, Web Analyst, Customer
Development Representative, etc.): This group is young and relatively inexperienced,
but has strong technical aptitude. Any reporting and visualization tool is likely to
be effective with them, since they will readily understand it. They are tool
specialists, as they continuously work with new tools depending on the projects and
scenarios assigned to them.
Activity I
Assume that you are a project manager in company ABC. Do you think data
visualization is more attractive than tables and other formats? Justify your answer.
Charts and graphs are crucial components of working with data because they allow for the
condensing of large amounts of data into a format that is simple to comprehend. Data
visualizations can communicate findings to people who won't see the raw data as well as
reveal insights to someone looking at the data for the first time. There are innumerable chart
types, each with a unique set of applications. Choosing the right sort of chart for the task at
hand is frequently the most challenging aspect of developing a data visualization. The type
of chart you choose will rely on a number of variables: the categories of metrics,
characteristics, or other variables you intend to plot; the type of inferences you want the
audience to draw; and so on.
1. Bar Chart: Bar charts let us compare numerical quantities, such as percentages and
counts, across categories. Each category or subcategory is drawn as a bar or rectangle,
the bars are evenly spaced, and the length of each bar represents the value of the
corresponding variable. The quantitative measure can be displayed either vertically,
on the y-axis, or horizontally, on the x-axis; the choice depends on the data and the
question the visualization is meant to answer. The qualitative dimension is placed on
the axis perpendicular to the quantitative measure. The baseline of a bar chart is
usually zero. If a different starting point is chosen, the axis should be clearly
labelled to avoid misleading the viewer. There are several bar chart variations,
including stacked bar charts and side-by-side (grouped or clustered) bar charts.
Labels and legends enable the audience to understand and interpret the details
represented in bar charts, as shown in Fig.1.3.
Fig. 1.3: A vertical bar chart showing the top most populated countries as per 2020 World
Population Data
Fig. 1.4: A horizontal bar chart showing top most populated countries as per 2020 World
Population Data
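The bar chart guidelines above can be tried out with a few lines of base R. This is only a minimal sketch: the region names and values are hypothetical and are not the population figures shown in Fig. 1.3 and Fig. 1.4.

```r
# Hypothetical category totals; illustrative values only,
# not data from the course material.
sales <- c(North = 120, South = 95, East = 150, West = 70)

# Sort descending so the largest bar comes first.
sales_sorted <- sort(sales, decreasing = TRUE)

# Vertical bar chart. barplot() starts the value axis at zero by default.
bar_mid <- barplot(sales_sorted,
                   main = "Sales by region (hypothetical)",
                   xlab = "Region", ylab = "Units sold",
                   ylim = c(0, max(sales_sorted) * 1.1))

# Horizontal variant: the quantitative measure moves to the x-axis.
barplot(rev(sales_sorted), horiz = TRUE, las = 1,
        main = "Sales by region (hypothetical)", xlab = "Units sold")
```

barplot() returns the midpoints of the bars, which can be passed to text() if value labels are wanted above each bar.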
Activity II
Activity: Make a bar chart that represents exotic pet ownership in UK. The data are:
Activity III
Activity: Consider that you have drawn a bar chart (overview) to compare your
product with another’s product. Is it correct? Justify the answer.
2. Pie Chart: A pie chart is useful for organizing and displaying data by percentage of
the total. In keeping with its moniker, this type of visualization uses a circle to
represent the entire thing and slices, or ‘pies’, of that circle to symbolize the various
categories that make up the whole. A user can compare the relationship between
various dimensions (such as categories, products, people, countries, etc.) within a
particular context using this sort of chart. The numerical data (measure) is typically
divided into percentages of the overall sum on the chart. Each slice is a representation
of the value's percentage, and should be measured as such as shown in Fig.1.6.
❖ Pie charts should be used to illustrate how various components relate to the whole.
❖ They perform best when applied to dimensions with a small number of category
options.
❖ A pie chart can help the data story shine if you need to show that one part of the
whole is overrepresented or underrepresented.
❖ Pie charts are ineffective for comparing precise figures.
❖ They work well when the total can be divided into two to five groups.
❖ They are most effective when there is a big difference between the weights of the
categories.
❖ Each pie slice needs to be properly labelled and have the correct number or percentage
assigned to it.
❖ To make it simple for the user to compare the slices, the slices should be arranged
according to size, either smallest to biggest or biggest to smallest.
❖ When possible, labels must be given to the slices. Try not to make the visual display too
complex.
❖ If the chart includes more than five slices, make sure to use a legend, list, or table to
provide the reader more context.
❖ If there needs to be a comparison between several categories, think about using a line
chart. Line charts offer a quick overview of the patterns and trends present in various
data sets as well as how they interact with one another.
Fig. 1.6: Pie chart showing land area and density comparison
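As a rough illustration of the guidelines above (few categories, slices ordered by size, every slice labelled with its percentage), the following base R sketch may help. The product names and values are hypothetical.

```r
# Hypothetical shares of a total; illustrative values only.
share <- c(ProductA = 50, ProductB = 30, ProductC = 15, ProductD = 5)

# Convert the raw values to percentages of the total.
pct <- round(100 * share / sum(share), 1)

# Order the slices from biggest to smallest for easier comparison.
pct <- sort(pct, decreasing = TRUE)

# Label every slice with its category and percentage.
slice_labels <- paste0(names(pct), " (", pct, "%)")
pie(pct, labels = slice_labels, clockwise = TRUE,
    main = "Share of total (hypothetical)")
```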
Self-Assessment Questions - 3
5. Can you draw a pie chart for multiple groups with more than 100 categories?
a) Yes
b) No
3. Line Chart: A line chart, also known as a line graph or a line plot, uses a line to link a
group of data points as shown in Fig.1.7. This type of graph uses sequential values to
show trends. The x-axis (horizontal axis) often shows a succession of numbers in a
consecutive order. The values for a chosen metric across that progression are then
provided on the y-axis (vertical axis). When you need to illustrate data across time,
this basic graphic works wonderfully. To create forecasts for the coming year, one use
case would involve tracking customer interest in a particular category of good or
service over the course of the year.
Line graphs are not limited to observing change through time; they make it possible to track
the behavior of any ordered set of data.
These graphs also help to bring out variations and connections in your data. A line chart
can also assist a viewer in forecasting potential future events.
Case 1: Consider a line graph that shows how the real estate market in India is seasonal. This
knowledge could be used by a user to research many things before making a property
purchase. They can try to determine the ideal time to buy or sell a house or how a recession
might affect the availability of homes.
Case 2: Consider a stock market line graph for a particular company’s stocks. Investors
frequently use line charts like this to help them make buying and selling decisions. Line
graphs can display how a value has changed over time, at any scale from yearly to
minute-by-minute.
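The use case of tracking customer interest over the course of a year can be sketched in base R as follows. The monthly scores are hypothetical and serve only to show how sequential values are placed on the x-axis and the metric on the y-axis.

```r
# Hypothetical monthly interest scores for one product category;
# illustrative values only.
interest <- c(42, 45, 47, 53, 60, 66, 71, 69, 62, 55, 50, 48)
months <- seq_along(interest)

# Sequential values (months) go on the x-axis, the metric on the y-axis.
plot(months, interest, type = "l", xaxt = "n",
     main = "Customer interest over one year (hypothetical)",
     xlab = "Month", ylab = "Interest score")
axis(1, at = months, labels = month.abb)   # month.abb is built into R
points(months, interest, pch = 16)         # mark the individual data points

# Month in which interest peaked.
peak_month <- month.abb[which.max(interest)]
```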
Activity IV
The following table provides the information on the favorite colors with the group of
people. Draw a line graph for the information
4. Tree Map: A tree map is a visualization made of nested rectangles. The rectangles are
arranged in a hierarchy, or ‘tree’, to represent specific categories within a chosen
dimension as shown in Fig.1.8. Quantities and patterns can be compared and displayed
in a limited chart space, and tree maps show the relationships between parts and
wholes. This visualization was created by University of Maryland computer science
professor Ben Shneiderman to make the most of the available space.
❖ With the help of tree maps, readers may quickly and easily analyze their data.
Fig.1.8: Tree map showing region wise sales nested year wise
Case 1: A user may use a categorical color palette, assigning a different color to each
delivery option, or a continuous color palette for a measure such as a business's sales
figures or profit. When looking for insights in a tree map, note that the largest box
represents the largest portion of the whole and the smallest box the smallest portion. Boxes
can be nested to show sub-categories for a more in-depth investigation. The ‘Total Sales’
data set, for instance, might have a box for ‘Region wise Sales’, with a box for ‘Year wise
Sales’ nested inside it.
5. Histogram: Histograms, a particular type of bar chart, offer a way to display data
distributions as shown in Fig.1.9. A histogram displays the distribution of a single
variable as a series of adjacent bars. A single continuous measure is divided into
groups, or bins, each of which covers a particular range of values. The data points
are then assigned to these equal-width bins, and each bin is drawn as a bar placed
side by side with its neighbours. The height of a bar is the number of observations
that fall within that bin's range of values. The shape of the chart therefore depends
on where the data's values are concentrated. Skew is the term used when values are
concentrated on one side of the midpoint.
Fig.1.9: Histogram showing stock price distribution. Maximum count in bin 300-400
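The binning described above can be reproduced with hist() in base R. In this sketch the prices are hypothetical and were chosen so that, as in Fig. 1.9, the 300-400 bin has the highest count; they are not the data behind the figure.

```r
# Hypothetical stock prices; illustrative values only.
prices <- c(120, 180, 210, 250, 310, 320, 340, 355, 360, 390,
            410, 450, 470, 520, 610)

# Equal-width bins of 100 covering the range of the data.
bins <- seq(100, 700, by = 100)
h <- hist(prices, breaks = bins,
          main = "Stock price distribution (hypothetical)",
          xlab = "Price", ylab = "Count")

# h$counts holds the number of observations that fall in each bin.
bin_counts <- setNames(h$counts,
                       paste(head(bins, -1), tail(bins, -1), sep = "-"))
busiest_bin <- names(which.max(bin_counts))
```

By default hist() treats each bin as closed on the right, so a value of exactly 200 would be counted in the 100-200 bin.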
Self-Assessment Questions - 4
a) Continuous value
b) Categorical value
c) All the above
d) Skewed value
6. Map: Much of the data that is gathered contains a location variable, which makes map
visualization straightforward, as shown in Fig.1.10.
Case 1: For instance, a map visualization might show how many clients there are in
each country of the world, with each country representing a certain number of
customers. With the aid of this location information, businesses can plan expansion
into areas where they are less established than elsewhere.
7. Scatter Plot: A scatter plot is also known as an XY graph, scatter chart, or
scattergram. It displays the relationship between pairs of numerical data by
plotting them with one variable on each axis as shown in Fig.1.11.
The following situations call for the use of scatter plots:
❖ In the case of paired numerical data
❖ When more than one value of the dependent variable is associated with a
particular value of the independent variable
❖ When trying to determine how variables are related, for example when looking
for potential root causes of a problem or checking whether two effects that seem
connected share the same cause.
Correlation is a statistical measure of how two variables move in relation to each other.
If the variables are correlated, the points will fall along a line or curve. The more
closely the points cluster around that line, the stronger the correlation.
Correlation types:
A scatter plot shows the correlation between two characteristics or variables, that is, how
closely related they are. There are three possible scenarios for the relationship between
the two variables:
Positive Correlation: The scatter plot reveals a positive association when the points rise
from left to right. It indicates that the values of one variable increase as the values of
the other increase.
Negative Correlation: A negative correlation is present when the points fall from left to
right. It indicates that the values of one variable decrease as the values of the other
increase.
No Correlation: There is no association between the variables if the points are scattered
across the graph and it is impossible to tell whether the values are rising or falling.
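The three scenarios can be checked numerically with cor(), which returns the Pearson correlation coefficient. The sketch below uses small made-up vectors; the describe_correlation() helper and its cut-off of 0.1 are assumptions introduced only for illustration and are not part of R or of the course material.

```r
# Made-up data for the three scenarios; illustrative values only.
x      <- 1:10
y_pos  <- 3 * x + 2                        # rises as x rises
y_neg  <- 50 - 4 * x                       # falls as x rises
y_none <- c(5, 1, 4, 2, 3, 3, 2, 4, 1, 5)  # no linear trend

# Pearson correlation coefficient for each pair.
r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_none <- cor(x, y_none)

# Hypothetical helper: turns a coefficient into a label.
# The tolerance of 0.1 is an arbitrary choice for this sketch.
describe_correlation <- function(r, tol = 0.1) {
  if (r > tol) "positive" else if (r < -tol) "negative" else "none"
}

# Scatter plots with a fitted trend line for each case.
op <- par(mfrow = c(1, 3))
plot(x, y_pos,  main = "Positive"); abline(lm(y_pos ~ x))
plot(x, y_neg,  main = "Negative"); abline(lm(y_neg ~ x))
plot(x, y_none, main = "None");     abline(lm(y_none ~ x))
par(op)
```

Real data rarely gives coefficients of exactly 1, -1 or 0; values in between indicate weaker or stronger relationships.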
Activity V
Consider a scenario in which you want to buy a second-hand car. Consider the two
variables: age of the car and price. What kind of correlation is this?
Justify it.
Figure 1.11 gives a detailed view of the types of correlation, together with the strength
of each. Figure 1.12 shows scatter plots of the profit distribution by category of expenses,
region wise, with correlations indicated by trend lines.
Fig.1.12: Scatter plots showing the profit distribution of category of expenses region wise
along with correlations shown by trend lines
Activity VI
Assume that you are a data analyst. You have been given two features (variables), namely
the color of the house and the price of the house.
What kind of correlation exists between these variables? Justify your answer.
Self-Assessment Questions - 5
Cons:
7. SUMMARY
Now let's review the main ideas covered in this chapter:
❖ The act of converting raw data into visual representations in the form of charts, graphs,
and dashboards is known as data visualization.
❖ The idea behind using data visualization is to simply and quickly understand data.
8. GLOSSARY
9. TERMINAL QUESTIONS
10. ANSWERS
Self-Assessment Questions Answers
Data analysis has become popular, in part because of the importance it gives to
visualization.
Plateau Distribution – The frequency rises to a certain level and stays there for most of
the bins.
Edge Peak Distribution – Similar to a normal frequency distribution, except that one bin at
the end is higher than the bins around it, forming a peak at one tail.
The following are the scenarios where data visualization plays an important role
A pie chart is useful for organizing and displaying data by percentage of the total. In
keeping with its moniker, this type of visualization uses a circle to represent the entire
thing and slices, or ’pies’, of that circle to symbolize the various categories that make
up the whole. A user can compare the relationship between various dimensions (such
as categories, products, people, countries, etc.) within a particular context using this
sort of chart. The numerical data (measure) is typically divided into percentages of the
overall sum on the chart. Each slice is a representation of the value's percentage, and
should be measured as such.
• You have a sum that can be divided into two to five groups.
• There is a big difference between the weight of each category. (For more details
refer Section 1.5 Techniques of Data Visualization.)
9. Dashboard
• It is a visual display of all your data. The primary purpose of a dashboard is to
provide information at a glance (for example, KPIs).
• It allows all kinds of professionals to easily monitor performance and then create
a report. The benefits of dashboards are:
✓ The ability to identify trends
✓ An easy way to measure efficiency
✓ Provides a detailed report with a single click
✓ Helps in making decisions
✓ Easy to identify data outliers and correlations
10. Line chart
A line chart, also known as a line graph or a line plot, uses a line to link a group of data
points as shown in Fig.1.13. This type of graph uses sequential values to show trends.
The x-axis (horizontal axis) often shows a succession of numbers in a consecutive
order. The values for a chosen metric across that progression are then provided on the
y-axis (vertical axis). When you need to illustrate data across time, this basic graphic
works wonderfully. To create forecasts for the coming year, one use case would involve
tracking customer interest in a particular category of good or service over the course
of the year.
Britons’ diet: This data shows how Britons’ diet has changed over the past decades.
Trend lines show that more fatty foods and fewer healthy foods are being consumed. The data
is easier to understand and analyze when presented with trend lines.
12. REFERENCES:
Recommended Readings
• Hoelscher, J., & Mortimer, A. (2018). Using Tableau to visualize data and drive decision-
making. Journal of Accounting Education, vol. 44, pp. 49-59.
• Friendly, M. (2008). A brief history of data visualization. In Handbook of data
visualization (pp. 15-56). Springer, Berlin, Heidelberg.
• Healy, K. (2018). Data visualization: a practical introduction. Princeton University
Press.
Unit 2
Basic Visualization Using R
1. INTRODUCTION
There is an enormous amount of data available today. In recent times, online platforms are
used more than offline ones. Data is stored in multiple formats, with a high degree of
redundancy, and its volume grows exponentially day by day. If this continues, storage will
become a serious problem, so it should be addressed early. Data visualization is the process
through which raw data is displayed visually so that inferences can be drawn; based on those
inferences, appropriate data pre-processing methods can be applied. The pre-processed data
is then ready for further analysis. This in turn helps to free up storage to a certain
extent, so that it can be used more effectively. There are many platforms for visualizing
data. In this unit, R is used for data visualization, because R is free, open-source
software and part of the GNU project. R is a statistical programming language in which data
can be visualized from a basic to an advanced level, before and after statistical modelling.
1.1 Objectives:
2. FEATURES OF R SOFTWARE
R supports statistical calculations from basic to advanced, making it possible to draw
proper and effective inferences from the data.
3. STEPS TO INSTALL R
Step 1: Go to the website of the Comprehensive R Archive Network (r-project.org).
Step 2: Download R for the operating system installed on your machine. Follow the link
below that matches your OS.
https://cran.r-project.org/bin/linux/
https://cran.r-project.org/bin/macosx/
https://cran.r-project.org/bin/windows/
Step 6c: Select the components you wish to install and click next.
This part deals with basic R commands. The R console can be visualized as
Fig. 2.13: data(), names(), dim(), str(), View() commands
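The commands named in Fig. 2.13 can be tried in the R console as follows. This is a minimal sketch that assumes the built-in mtcars dataset as the example; the dataset used in the original figure is not known.

```r
# Load a dataset that ships with R (assumed here for illustration).
data(mtcars)

# names(): column (variable) names of the data frame.
col_names <- names(mtcars)

# dim(): number of rows and columns.
dims <- dim(mtcars)

# str(): compact summary of the structure (types and first values).
str(mtcars)

# View() opens a spreadsheet-style viewer; it needs an interactive
# session such as RStudio, so it is left commented out here.
# View(mtcars)
```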
Self-Assessment Questions - 1
Fig. 2.14: mean(), median(), summary(), var(), sd(), quantile() commands
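The statistical commands named in Fig. 2.14 can be tried on a small numeric vector. The values below are made up for illustration and are not the data shown in the original figure.

```r
# Made-up sample; illustrative values only.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x_mean   <- mean(x)      # arithmetic mean
x_median <- median(x)    # middle value
x_var    <- var(x)       # sample variance (divides by n - 1)
x_sd     <- sd(x)        # sample standard deviation, sqrt(var(x))
x_quart  <- quantile(x)  # minimum, quartiles and maximum

# summary() reports the minimum, quartiles, mean and maximum together.
print(summary(x))
```

Note that var() and sd() use the sample formulas with n - 1 in the denominator, not the population formulas.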
Self-Assessment Questions - 2
2. Box Plot: It is used to show the location, spread and skewness of the data.
hist()
5. pnorm(), qnorm(): pnorm() is the cumulative distribution function of the normal
distribution. qnorm() is its inverse (the quantile function): it returns the boundary value
below which a given probability lies.
6. Line Graph: It is used to plot lines showing the relationship between two variables.
plot()
7. Pie Chart: A pie chart is used to plot the percentage distribution of the data.
pie()
8. Stacked Bar Graphs: Stacked bar charts use bars divided into segments to show
numerical comparisons between categories.
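Some of the plot types listed above can be sketched in base R as follows. All data values, group names and titles are hypothetical; boxplot(), boxplot.stats(), pnorm(), qnorm() and barplot() are standard base R functions.

```r
# --- Box plot: location, spread and outliers ---
# Made-up sample with one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
boxplot(x, horizontal = TRUE, main = "Box plot (hypothetical data)")
bs <- boxplot.stats(x)   # bs$stats: whisker ends, hinges, median; bs$out: outliers

# --- pnorm() and qnorm() for the standard normal distribution ---
p_below_zero <- pnorm(0)            # P(Z <= 0)
z_975        <- qnorm(0.975)        # value with 97.5% of the area below it
round_trip   <- qnorm(pnorm(1.5))   # qnorm() undoes pnorm()

# --- Stacked bar graph: a matrix with one row per sub-group ---
counts <- matrix(c(10, 20, 15,
                    5, 25, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Online", "Offline"),
                                 c("Q1", "Q2", "Q3")))
# barplot() stacks the rows of a matrix by default (beside = FALSE).
barplot(counts, legend.text = TRUE,
        main = "Orders by channel (hypothetical)",
        xlab = "Quarter", ylab = "Orders")
bar_totals <- colSums(counts)       # total height of each stacked bar
```

Passing beside = TRUE to barplot() would draw the same matrix as a grouped (side-by-side) bar chart instead of a stacked one.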
Self-Assessment Questions -3
Data visualization is very important for identifying trends in the data. There are different
types of plots for drawing inferences from the data. Additional libraries must be loaded for
advanced plotting in R.
1. ggplot2: A plotting package that helps to create complex plots from data in a data
frame. It offers a programmatic interface for specifying which variables to plot and
how they are displayed, giving clear and readable graphics.
2. lattice: A graphics and data visualization package that originated from the Trellis
graphics package. It can plot multivariate data, starting from a basic plot and
refining it with additional features.
3. highcharter: A package built on the Highcharts JavaScript library and its modules. It
is highly flexible and customizable, has a powerful API, and is used for chart
visualization of data.
4. leaflet: Based on Leaflet, an open-source JavaScript library used to create dynamic
online maps. Maps are built up layer by layer.
5. RColorBrewer: An important tool for color management. It offers several color
palettes for use in graphics.
6. plotly: An open-source R package that helps to create interactive web-based graphics
with the help of JavaScript.
8. rgl: Used to produce interactive 3-D plots. It contains high-level graphics commands
along with basic commands, and is used for 3-D visualization with OpenGL.
9. dygraphs: Based on the dygraphs JavaScript charting library. It provides rich
facilities for charting time-series data in R.
Self-Assessment Questions -4
7. _____________ which has graphics and data visualization property originated from
Trellis graphics package.
8. ____________ produce interactive 3-D plot.
8. ACTIVITIES
Activity A
Create a database of cancer patients with proper feature set and infer using data
visualization techniques.
Activity: Identify the major symptoms of cancer and its stages, and create a data set with a
minimum of 100 cases. Visualize the data and draw inferences.
Activity B
9. SUMMARY
Data visualization is a major technique for data analysis. Data is a very important asset
of a company, but repetitive and useless data wastes storage and can lead to wrong
decisions. There are multiple tools and platforms available for working with and visualizing
data, such as Tableau, Microsoft Power BI, Python, Microsoft Excel, MongoDB, RStudio and
many more. The visualization techniques are more or less the same in all of them. The main
challenge in data visualization is to identify the proper visualization technique for a
given dataset. For example, if the data is to be displayed as percentages of a whole, a pie
chart is a natural option, and a box plot is used to identify the spread and skewness of
the data. There are also advanced plotting techniques, for which additional libraries must
be added to R. However, before undertaking any data visualization in R, the basic commands
of the platform need to be understood. In this unit, the basic R commands, essential
statistical commands, and basic and advanced data visualization techniques were discussed.
10. GLOSSARY
Braided Graph: a novel visualization technique where filled areas are sorted in depth order.
Bullet Graphs: used for comparing the performance of primary measures with other
measures.
Did You Know? About 65% of the brands use infographics for marketing. 84% of the people
accepted infographics as a powerful tool. Infographics are the fourth most used type of
content marketing. Infographics can boost website traffic by 12%.
Did You Know? 80% of the health care market players invested in big data analytics and
artificial intelligence driven by market demand.
Collect the Retinopathy database from your nearby health care center and visualize the
positive and negative cases.
Ans: There are many data visualization tools available in the market, and they share some
common features. A data visualization tool is software that is used to visualize data,
which is very important for data analysis. Popular data visualization tools include
Tableau, Microsoft Excel, Microsoft Power BI, Dundas BI, Jupyter, Zoho Reports, Google
Charts, Visual.ly, RAW, IBM Watson, Sisense, Plotly, Datawrapper, FusionCharts, QlikView,
Infogram, ChartBlocks, D3.js, Chart.js, Chartist.js, Sigma.js, and Polymaps.
Ans: Box plots are used to visualize the spread of data. Data values that fall beyond the
maximum and minimum boundary values (the ends of the whiskers) are known as outliers;
they are usually examined separately because they can distort the mean of the data. The
median lies at the middle of the box, and the quartiles divide the full data set into four
parts. The difference between the upper and lower quartiles is known as the Inter
Quartile Range (IQR).
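The quantities named in this answer can be computed directly in base R. The following is a minimal sketch: the sample vector is made up for illustration, and the boundary values are assumed to follow the common 1.5 × IQR convention, which the text does not state explicitly.

```r
# Hypothetical sample data (not from the unit); 40 is an obvious extreme value
x <- c(12, 15, 14, 10, 13, 16, 14, 15, 11, 40)

# Quartiles and the inter quartile range
q   <- quantile(x, probs = c(0.25, 0.50, 0.75))
iqr <- IQR(x)

# Assumed boundary rule: 1.5 * IQR beyond the lower and upper quartiles
lower_bound <- q[["25%"]] - 1.5 * iqr
upper_bound <- q[["75%"]] + 1.5 * iqr

# Values outside the boundaries are treated as outliers
outliers <- x[x < lower_bound | x > upper_bound]

# Draw the box plot; outliers appear as individual points
boxplot(x, horizontal = TRUE, main = "Box plot of sample data")
```

Comparing mean(x) with mean(x[!x %in% outliers]) shows how strongly a single extreme value can pull the mean.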
Ans: Several basic and advanced tools are available for data visualization using R, so it
is easy to visualize raw data and/or data that has been processed statistically. R is a
strong statistical tool for data visualization.
Ans:
1. print () – to print
Ans:
Ans:
Data visualization is an important technique for identifying trends in data. There
are different types of plots for drawing inferences from the data. Libraries need to be
loaded for advanced plotting in R.
1. ggplot2: A plotting package that helps to create complex plots from data in a
data frame. It offers a programmatic interface for building clear plots that
support proper inference.
2. lattice: A graphics and data visualization package that originated from the
Trellis graphics package. It can plot multivariate data, starting from a basic
visualization and refining it with enhanced features.
3. highcharter: An R interface to the Highcharts JavaScript library and its modules.
It is highly flexible and customizable, has a powerful API, and supports chart
visualization of data.
4. leaflet: An R interface to Leaflet, an open-source JavaScript library used to create
dynamic online maps. Maps are built up layer by layer.
5. RColorBrewer: An important tool for color management. It offers several color
palettes that help to distinguish elements in a graphical visualization.
6. plotly: An open-source R package that creates interactive web-based graphics with
the help of JavaScript.
7. sunburstR: A package for a special type of data visualization in R, the sunburst
chart. It is customizable.
8. rgl: Used to produce interactive 3-D plots. It contains high-level graphics
commands along with basic commands, and uses OpenGL for 3-D visualization.
9. dygraphs: An R interface to the dygraphs JavaScript charting library. It provides
rich facilities for charting time series data in R.
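As an illustration of loading a library for advanced plotting, the sketch below uses ggplot2 (the first package in the list) with the built-in mtcars data frame. The choice of dataset, variables, and labels is an assumption made for this example, and ggplot2 must already be installed.

```r
# install.packages("ggplot2")  # run once if the package is not installed
library(ggplot2)

# Box plot of fuel economy by number of cylinders, built from a data frame
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(
    title = "Fuel economy by number of cylinders",
    x = "Cylinders",
    y = "Miles per gallon"
  )

# Render the plot on the active graphics device
print(p)
```

The plot is built up in layers: the data and aesthetic mapping first, then the geometry, then the labels.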
17. REFERENCE
L. S. Hovhan, 7 Key Benefits of Interactive Data Visualization, October 2020, [online]
Available: https://infogram.com/blog/7-key-benefits-of-interactive-data-visualization/.
Likhitha Ravi, E. Kauffmann, J. Peral, D. Gil, A. Ferrández, R. Sellers and H. Mora, "A
framework for Big Data Analytics in Commercial Social Networks: A Case Study on Sentiment
Analysis and Fake Review Detection for Marketing Decision-Making", Industrial Marketing
Management, vol. 90, pp. 523–537, 2020.
E-References
https://onlinecourses.nptel.ac.in/noc22_mg09/
NOC | Essentials of Data Science With R Software - 1: Probability and Statistical Inference
(nptel.ac.in)
NPTEL
https://onlinecourses.nptel.ac.in/noc19_ma33
DADS304
VISUALIZATION
Unit 3
Introduction to R SHINY
Table of Contents
1. INTRODUCTION
In the developing field of visual analytics, interactive visual interfaces are used to facilitate
analytical reasoning. The fundamental concept is the combination of exceptional human
abilities for visual information exploration with enormous computing power to create a
potent environment for knowledge discovery.
Other visual analytics programmes such as Flourish, Infogram, D3, and many others are
either expensive to use or not sophisticated enough to be combined with more complex
statistical analysis, such as dynamical modelling.
R Shiny is a free and open-source package for the R programming language, and as such, it
is integrated with R's enormous array of statistical, numerical, and computational
capabilities.
2. VISUAL ANALYTICS
The use of complex methods and tools to evaluate data using graphical representations of
the information is known as visual analytics. Users can see patterns and gain useful insights
by viewing the data as graphs, charts, and maps. Organizations can improve their data-driven
decisions thanks to these insights.
Benefits:
Share findings and monitor progress: Organize and share key performance indicators across
an organisation by using interactive reports and dashboards.
Take action more quickly: When working with data sets in a visual format, users can
comprehend data insights much more quickly.
Encourages Data Literacy: Data analytics becomes more accessible by making data easy to
use and comprehend, involving more individuals inside a company.
The term "data visualisation" typically refers to the graphical representation of data, or
representing data in bubble charts, heat maps, and other visuals to aid in understanding
patterns, relationships, trends, and other important insights in datasets. The term "visual
analytics" describes the use of an analytics tool to carry out in-depth analysis of large,
complicated datasets while enabling users to interact and explore dynamic visuals.
• Define goals
• Integrate and manage the data
• Simplify visualizations
• Get Inspired
Examples:
• Marketing: By enabling the marketer in this example to see and comprehend each stage
of the customer life cycle, visual analytics helps them increase ROI.
• Supply Chain: By displaying KPIs and enabling interactive exploration, big data visual
analytics can assist supply chain managers in quickly discovering relationships across
complicated, divergent data sources.
• Sales: The clear, organised presentation of sales data helps sales managers to boost
revenue, enhance forecasting, and spot important patterns.
• Finance: A loan manager at a consumer bank can investigate how various geographic
areas, products, and loan officers fare over time and determine which factors have the
most influence on revenue and profits.
• IT: Data analytics and visualisation can be used by IT administrators to better predict
future technology requirements and spot underutilized systems and applications.
Self-Assessment Questions - 1
1. Users can see patterns and gain useful insights by viewing the data as __________,
__________ , and ________.
2. The ______________ describes the use of an analytics tool to carry out in-depth
analysis of large, complicated datasets.
3. INTRODUCTION TO R-SHINY
Imagine being able to create a web application from your data science analysis. The R Shiny
package allows you to do exactly that.
Using the R Shiny framework, you can quickly turn your data analysis into a web app. You
can create apps that your company can use in a matter of hours, not weeks or months.
Shiny is an R package that makes it possible to create interactive web applications that
run R code in the background. With Shiny, you can create dashboards, embed interactive
charts in R Markdown documents, and host standalone applications on a website. Additionally,
you can add HTML widgets, JavaScript actions, and CSS themes to your Shiny applications.
The core functionality of the Shiny web framework is the ability to gather input values from
a web page, make those inputs readily available to the application in R, and have the output
values from the R code posted back to the web page. In its most basic form, a Shiny
application therefore has two parts: a user-interface definition and a server function that
performs the calculations.
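The two parts described above can be sketched as a very small single-file app. The input and output IDs (name, greeting) and the greeting text are hypothetical and chosen only for illustration.

```r
library(shiny)

# User-interface definition: one input and one output placeholder
ui <- fluidPage(
  textInput("name", "Your name:", value = "world"),
  textOutput("greeting")
)

# Server function: reads the input value and computes the output value
server <- function(input, output, session) {
  output$greeting <- renderText({
    paste0("Hello, ", input$name, "!")
  })
}

# Combine both parts into an app object
app <- shinyApp(ui = ui, server = server)

# runApp(app)  # uncomment to launch the app in an interactive R session
```

Typing into the text box changes input$name, and Shiny re-runs the renderText() expression and posts the new greeting back to the page.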
TOOLS
Self-Assessment Questions - 2
3. ______________ can be done using R Shiny for data science.
4. There are 2 phases of R-Shiny Applications (True/False).
4. WHAT IS R SHINY?
Shiny is an R package that makes it simple to create dynamic web applications directly from
R. You can create dashboards, embed standalone apps in R Markdown documents, or host them
on a website.
Installation:
install.packages("shiny")
FEATURES
Self-Assessment Questions - 3
5. We install Shiny by using the command ____________________________________.
6. R-Shiny helps to create straightforward web apps without ___________________.
Radio buttons, panels, and selection boxes are all managed by the user interface (ui) object,
which is used to manage the app's overall design and layout.
UI OBJECT
The fluidPage() function is used to construct the layout of the default app's user
interface. The fluidPage layout adapts automatically to changes in browser size. The
fluidPage() uses a sidebarLayout() to divide the page into two sections: the sidebarPanel(),
which houses the app's input (the slider), and the mainPanel(), which houses the histogram
output in the example app. This is a reasonably standard layout for a Shiny app; note that
nothing prevents us from adding further elements before or after the sidebarLayout(), such
as the titlePanel(). Each page component is passed as an argument to the function that
contains it, with fluidPage() at the top.
There are more advanced layout options that are also more flexible, such as navbarPage(),
which creates a page with a navigation bar. A user interface can also be created entirely
from scratch using HTML, CSS, etc.
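The layout described above can be sketched as follows. The IDs bins and distPlot come from the text; the title, the slider label, and the slider range are assumptions modelled on the default example app.

```r
library(shiny)

# Each component is an argument to the function that contains it
ui <- fluidPage(
  titlePanel("Old Faithful Geyser Data"),
  sidebarLayout(
    # Sidebar: holds the app's input (the slider)
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30)
    ),
    # Main panel: placeholder for the histogram drawn by the server
    mainPanel(
      plotOutput("distPlot")
    )
  )
)
```

Printing ui at the R console shows the HTML that these nested function calls generate.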
The web application's logic is located in the server function. A render...() function
corresponds to each ...Output() function in the user interface. The R code that creates the
object we wish to render is provided as the first argument to the render...() function. For
instance, the code to create a histogram is located in the renderPlot() function of the
example app. Since this must be a single expression, it is usually wrapped in braces ({}).
The value of the slider defined in the user interface (which was given the ID bins) is
available as input$bins.
A SERVER FUNCTION
Shiny uses reactive programming. This means that when something changes (such as the slider
being adjusted), everything that depends on it is updated immediately. The distPlot graph is
the only component of the sample app that depends on the slider; as a result, if we change
the slider, the graph is redrawn.
The graph also depends on several other, less obvious aspects of the app, such as the
window size. The graph is redrawn if the window is resized, since Shiny is aware that the
graph depends on that aspect of the app.
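A server function for the example app might look like the sketch below. It assumes the built-in faithful dataset and a compact UI with a slider named bins and a plot output named distPlot; the colours and labels are illustrative choices.

```r
library(shiny)

# Compact UI: a slider input and a plot placeholder
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30),
  plotOutput("distPlot")
)

server <- function(input, output, session) {
  # renderPlot() pairs with plotOutput("distPlot") in the UI.
  # The expression in braces reads input$bins, so Shiny re-runs it
  # (and redraws the graph) whenever the slider changes.
  output$distPlot <- renderPlot({
    x <- faithful$waiting
    breaks <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = breaks, col = "darkgray", border = "white",
         main = "Waiting time between eruptions", xlab = "Minutes")
  })
}

app <- shinyApp(ui = ui, server = server)

# runApp(app)  # uncomment to launch the app in an interactive R session
```

No code is needed to tell Shiny that the plot depends on the slider; the dependency is detected because input$bins is read inside renderPlot().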
UI:
Next, the second panel, which will house all of our visualisation work, is defined. We
must specify our main content and sidebar content before combining them in our second
tabPanel().
We'll add a select widget to our sidebar so the user may choose the Y variable for our plot.
This select widget is given the name "y_var", and we'll use it later to modify the plot in
our server.R file. We refer to "plot" in our main content but define it afterwards in our
server file, which is also an R file.
Call select_values = colnames(data) to set the select_values variable used for the
selectInput choices.
SERVER:
The function that assigns values to 'output' will be created in the server.R file. It
accepts the input values specified by the UI.
To match the "plot" label we wrote in our UI main panel as plotOutput("plot"), we assign
output$plot in our server function. renderPlot() is called to construct the plot.
The main aspect is incorporating your supplied variables. We created the UI input variable
y_var to be used in choosing which variable appears on the y-axis (in a vertical bar plot, it
will show as the x-axis). This variable, which you may refer to as input$y_var, is used to
organise, label, and show data in your plot.
The connections have been made, so you may view and use your visualisation now.
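The UI and server pieces described in this section can be put together as in the sketch below. It is written as a single-file app rather than separate ui.R and server.R files, and it assumes the built-in mtcars data frame as a stand-in for the data; the panel titles and the "About" tab are hypothetical.

```r
library(shiny)

# Hypothetical stand-in dataset; replace with your own data frame
data <- mtcars
select_values <- colnames(data)

ui <- navbarPage(
  "Demo app",
  tabPanel("About", p("First panel with introductory text.")),
  # Second panel: sidebar content and main content combined
  tabPanel(
    "Visualization",
    sidebarLayout(
      sidebarPanel(
        selectInput("y_var", "Y variable:", choices = select_values,
                    selected = "mpg")
      ),
      mainPanel(plotOutput("plot"))
    )
  )
)

server <- function(input, output, session) {
  # output$plot matches plotOutput("plot") in the UI
  output$plot <- renderPlot({
    barplot(data[[input$y_var]], names.arg = rownames(data), las = 2,
            ylab = input$y_var, main = paste("Values of", input$y_var))
  })
}

app <- shinyApp(ui = ui, server = server)

# runApp(app)  # uncomment to launch the app in an interactive R session
```

Choosing a different entry in the select widget changes input$y_var, which in turn changes both the plotted values and the axis label.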
Self-Assessment Questions - 4
7. The server phase in R-Shiny loads data files and libraries, gathers data and
_________________________________________________________.
8. _____________________ features are used to design the UI interface.
6. SUMMARY
In the developing field of visual analytics, interactive visual interfaces are used to facilitate
analytical reasoning. The fundamental concept is the combination of exceptional human
abilities for visual information exploration with enormous computing power to create a
potent environment for knowledge discovery.
Shiny is an R package that makes it possible to create interactive web applications that
run R code in the background. With Shiny, you can create dashboards, embed interactive
charts in R Markdown documents, and host standalone applications on a website. In addition,
you can add HTML widgets, JavaScript actions, and CSS themes to your Shiny applications.
7. ACTIVITY
Take the gapminder data, examine it, and then build an entertaining app using R Shiny to
display the dataset.
8. GLOSSARY
R-Shiny: The open-source R package that offers a beautiful and robust web framework for
creating online apps.
Visual Analytics: The use of advanced tools and procedures to evaluate datasets using visual
representations of the data is known as visual analytics.
Here, we examine data from the Consumer Product Safety Commission's (CPSC) National
Electronic Injury Surveillance System (NEISS).
Fig.15 Summary
Fig.16 Estimated number of injuries according to age of both male and female
Fig.17 Prototype
1. What is R-Shiny?
2. What is reactivity?
1. graphs, charts, maps
2. Visual Analytics
3. Web app
4. True
5. install.packages("shiny")
6. JavaScript
7. Create graphs and charts using the server function.
8. HTML
1. What is R-Shiny?
Shiny is an R package that makes it possible to create interactive web applications that
run R code in the background. With Shiny, you can create dashboards, embed interactive
charts in R Markdown documents, and host standalone applications on a website.
Additionally, you can add HTML widgets, JavaScript actions, and CSS themes to your
Shiny applications.
2. What is reactivity?
Shiny uses reactive programming. This means that when something changes (such as the
slider being adjusted), everything that depends on it is updated immediately. The
distPlot graph is the only component of the sample app that depends on the slider; as a
result, if we change the slider, the graph is redrawn.
15. REFERENCES
• https://mastering-shiny.org/basic-case-study.html
• https://www.analyticsvidhya.com/blog/2021/05/build-interactive-models-with-r-shiny/
• https://shiny.rstudio.com/articles/build.html
• https://www.qlik.com/us/data-visualization/visual-analytics
DADS304
VISUALIZATION
Unit 4
Dashboard Design using R-Shiny
Table of Contents
1. INTRODUCTION
The R-Shiny framework is a package from RStudio that makes it easy to build interactive
web applications straight from R. R-Shiny offers powerful tools for analysis and for data
manipulation or wrangling. It also gives access to advanced forecasting and statistical
modeling packages. Through the web applications, a user can visualize data such as metadata,
bibliographic data, etc., and create efficient data reports. These visualization tools can
stimulate other users to create open repositories and to connect regional, national or
international repository networks. Shiny compiles the user's code into the HTML, JavaScript
and CSS needed to display the user's application on the web.
1.1 Objective
2.1 Dashboard
Dashboards are tools that offer current information while employing images to explain the
narratives underlying the data. They assist decision-makers in understanding the
connections in complex, massive data. They display images in a useful arrangement that
makes it easier for the organization to understand and appreciate the data.
Shiny dashboards give users inside the R environment access to a full web application
framework. You can quickly turn your R work, such as analyses, visualizations, and machine
learning models, into web applications that benefit companies. End users can use the result
as a complete application without any prior knowledge of R, so you can deliver a
comprehensive, user-friendly, and interactive product that enhances your business
operations.
You may use Shiny as a dashboard development platform to access a variety of R packages
for data research, including the Tidyverse. For the visualization of data and models, you can
access advanced graphical features. Add responsiveness and engagement by embedding
these graphics in Shiny dashboards. This can be done through R interfaces to
JavaScript-based charting packages.
Organizing and maintaining code is made easier by paying attention to the Shiny
dashboard's design, making use of functions, modules, and packages, and using rapid
prototyping. This allows simple source code control and smaller, more manageable
dashboard components.
Self-Assessment Questions - 1
1. __________ environment is used in R-Shiny.
2. __________ assist decision-makers in understanding the connections in complex,
massive data.
To ensure that your Shiny apps are not only user-friendly but also offer a pleasurable
experience for your users, follow the 7 steps listed below.
Self-Assessment Questions – 2
3. To display a lot of information on a single webpage we use ___________
4. The script is made of code that describes _________ and ____________
Start by considering the following issues while choosing the best UX design for your Shiny
app:
• What are the users going to be able to do with your app? What is the company's
vision?
• How have they accomplished it thus far? Are there any tools or processes to which
they are already accustomed?
Prior to building the Shiny app, it's critical to know the answers to these questions in order
to ensure that the interface will support the main functionality. Knowing the users' identities
will enable you to highlight the important information and conceal or even skip the rest. The
best way to provide the results—in a table, as a downloadable file, or as a graph—depends
entirely on what the users need to complete. Observing how users now complete that task
will help you better understand the entire process or possibly identify a potential
competitive advantage.
It is a wise idea to sketch out your ideas before you begin creating your app. The
explanation is straightforward: redesigning wireframes is significantly simpler than
altering the UI code.
Although there are many tools available (such as Figma) for creating wireframes and
mockups, it is perfectly acceptable to start with simply a pen and paper. The UI components
can easily be removed and moved about the page to observe how they work together.
Don't stick with the first design; try different ones, get input from the intended audience, and
refine the design.
There are many tried-and-true packages that can give your Shiny app a polished appearance.
You may utilize Appsilon's shiny.fluent to create business applications, especially for
environments that lean heavily toward Microsoft.
Get motivated:
Try finding some inspiration if you're having trouble with the page layout. Visit websites
you enjoy to see how they handle similar capabilities and what makes them easy to navigate.
You can peruse Shiny demos if you're in the mood for more application cases. Then, you can
go to websites that irritate you or on which you simply cannot seem to find the information
you seek, and try to understand the underlying issue that is affecting user experience.
If you require a more "formal" strategy, follow the established principles of UX design.
The majority of Shiny programs include data visualization, making R an excellent choice for
data processing and analysis. Although data visualization design is a large subject, there are
a few crucial points to take into account to get us started.
Type of graph
Use a line graph if the information is primarily concerned with changes over time. Use a bar
chart when you want to demonstrate how the levels of various categories differ from one
another (and don't forget to always start the bar chart at 0!).
Colors
Choose your colors carefully because an overuse of them might make your graph clumsy to
look at and hard to read.
Axes
The axis should be adjusted to correspond to the data; otherwise, it will be challenging to
identify the data's variance.
Labels
Consider the labels and whether the precise value is crucial. If so, it's a good idea to
label the values clearly on the graph. However, if the relationship between the series is
more significant, we can hide the labels and make the graph easier to understand.
The best approach to determine whether your Shiny app satisfies user expectations is
through in-depth user interviews. They are extended user sessions when we ask them to
utilise the product to complete a number of tasks. In this manner, we may determine whether
there are any persistent issues with the navigation or the usability in general.
The great news is that the app layout can be tested at any point throughout development,
even before it begins. One can manually alter the "screens" as the user "performs an action"
in the app using wireframes or mockups. Just keep in mind not to put off testing until the last
minute. Rebuilding the UI when app development is complete will be expensive.
It's simple to overlook the fact that the user experience of an app includes more than just
how it appears and functions. Check out the following items to get your Shiny app off to a
good start:
Remember that this is simply a starting point. The more complicated the app and user flow,
the more UX touch-points there are to take into account beyond the visual side of the user
interface.
Installation:
Basics:
A dashboard is composed of a body, a sidebar, and a header. Here is the dashboard page's
user interface at its most basic.
Blank dashboard:
Using the shinyApp() function, you may immediately view it at the R console. (This code can
also be used to create a single-file app.)
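A blank dashboard of this kind can be sketched as follows, assuming the shinydashboard package, which provides dashboardPage(), dashboardHeader(), dashboardSidebar(), and dashboardBody().

```r
# install.packages("shinydashboard")  # run once if the package is not installed
library(shiny)
library(shinydashboard)

# A dashboard page is composed of a header, a sidebar, and a body
ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(),
  dashboardBody()
)

# Nothing to compute yet, so the server function is empty
server <- function(input, output) { }

app <- shinyApp(ui, server)

# runApp(app)  # uncomment to view the blank dashboard from the R console
```

Saving the same code in a file named app.R gives a single-file app.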
Self-Assessment Questions – 3
5. A dashboard is composed of ________ , ________ , and ___________
6. Blank dashboard can be created by _________ function_
Basic dashboard:
The blank dashboard is obviously not very helpful. We'll need to include elements that are
functional. We can include content-filled boxes in the body.
The sidebar's content can be added next. We'll add menu options that function like tabs
for this example. These work similarly to Shiny's tabPanels in that they display a distinct
set of content in the main body when you click on a certain menu item.
There are two tasks that must be completed. You must first add sidebar menu items with the
proper tabNames.
Add tabItems with the corresponding values for tabName to the body:
The default display, which also appears when the menu item "Dashboard" is clicked:
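The two tasks above can be sketched as follows. The tab names (dashboard, widgets), the box contents, and the random data are illustrative assumptions; the important point is that each menuItem() and its tabItem() share the same tabName.

```r
library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  dashboardHeader(title = "Basic dashboard"),
  dashboardSidebar(
    # Task 1: sidebar menu items with tabNames
    sidebarMenu(
      menuItem("Dashboard", tabName = "dashboard"),
      menuItem("Widgets", tabName = "widgets")
    )
  ),
  dashboardBody(
    # Task 2: tabItems whose tabName values match the menu items
    tabItems(
      tabItem(
        tabName = "dashboard",
        fluidRow(
          box(plotOutput("plot1", height = 250)),
          box(title = "Controls",
              sliderInput("slider", "Number of observations:", 1, 100, 50))
        )
      ),
      tabItem(tabName = "widgets", h2("Widgets tab content"))
    )
  )
)

server <- function(input, output) {
  set.seed(122)
  histdata <- rnorm(500)  # hypothetical data for the histogram
  output$plot1 <- renderPlot({
    hist(histdata[seq_len(input$slider)], main = "Sample histogram")
  })
}

app <- shinyApp(ui, server)

# runApp(app)  # uncomment to launch the app in an interactive R session
```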
3.3 Boxes:
Boxes are the primary components of dashboard pages. A simple box can be built with the
box() function, and almost any Shiny UI element can be used as its content.
With the title and status arguments, boxes can also have titles and differently colored
header bars.
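A minimal sketch of boxes with titles and status colours is given below. The titles, output ID, and inputs are hypothetical; the status values are among those accepted by shinydashboard's box().

```r
library(shiny)
library(shinydashboard)

# A row of two boxes; each box is six columns wide by default
boxes <- fluidRow(
  # Title plus a coloured header bar
  box(title = "Histogram", status = "primary",
      plotOutput("plot2", height = 250)),
  # solidHeader = TRUE fills the whole header with the status colour
  box(title = "Inputs", status = "warning", solidHeader = TRUE,
      "Box content here",
      sliderInput("slider", "Slider input:", 1, 100, 50))
)

# Boxes are placed in the dashboard body
body <- dashboardBody(boxes)
```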
tabBox:
Use a tabBox if you need a container to just have tabs for showing various content sets.
infoBox:
Simple numerical or text values are typically displayed in a particular type of box with an
icon.
The first row of infoBoxes uses the default setting fill = FALSE, while the second row uses
fill = TRUE. Because infoBox content is typically dynamic, shinydashboard includes the
helper functions infoBoxOutput and renderInfoBox for dynamic content.
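Static and dynamic infoBoxes can be sketched as follows. The titles, values, icons, and the action button that drives the dynamic boxes are illustrative assumptions.

```r
library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  dashboardHeader(title = "Info boxes"),
  dashboardSidebar(),
  dashboardBody(
    # First row: default appearance (fill = FALSE)
    fluidRow(
      infoBox("New Orders", 10 * 2, icon = icon("credit-card")),  # static
      infoBoxOutput("progressBox")                                # dynamic
    ),
    # Second row: same content with fill = TRUE
    fluidRow(
      infoBox("New Orders", 10 * 2, icon = icon("credit-card"), fill = TRUE),
      infoBoxOutput("progressBox2")
    ),
    fluidRow(
      box(width = 4, actionButton("count", "Increment progress"))
    )
  )
)

server <- function(input, output) {
  # Dynamic boxes are built on the server and re-rendered on each click
  output$progressBox <- renderInfoBox({
    infoBox("Progress", paste0(25 + input$count, "%"),
            icon = icon("list"), color = "purple")
  })
  output$progressBox2 <- renderInfoBox({
    infoBox("Progress", paste0(25 + input$count, "%"),
            icon = icon("list"), color = "purple", fill = TRUE)
  })
}

app <- shinyApp(ui, server)

# runApp(app)  # uncomment to launch the app in an interactive R session
```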
ValueBoxes:
ValueBoxes resemble infoBoxes but are visually distinct from them. The following code will
create these valueBoxes. Some of these valueBoxes are static and some are dynamic, just
like the infoBoxes mentioned before.
Self-Assessment Questions – 4
7. A simple box can be created using ________ function.
8. Boxes design can be classified as ________ , ________ and ___________
Like div() and p(), the HTML tag functions in Shiny return objects that can be rendered as
HTML. For instance, when you enter the following instructions at the R console, HTML is
printed out:
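A minimal sketch of such instructions is shown here; the class name and text are arbitrary examples.

```r
library(shiny)

# Tag functions return tag objects rather than plain strings
d <- div(class = "my-class", "Div content")
para <- p("Paragraph text")

# Printing a tag object shows the HTML it represents
print(d)     # <div class="my-class">Div content</div>
print(para)  # <p>Paragraph text</p>

# as.character() gives the same HTML as a string
html <- as.character(d)
```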
These pieces of HTML are used to create the UI of a Shiny app. The shinydashboard package
offers a collection of functions designed to construct the HTML that generates a dashboard.
If you copy the UI code for a dashboard page (above) and paste it into the R console, it
prints out HTML.
Let's look at a straightforward dashboard. You can see that the title is contained in
dashboardHeader() and that dashboardSidebar() contains a sidebarMenu(). Output is
contained in dashboardBody().
• Skin
• Header
• Sidebar
• Body
• Controlbar
• Footer
We'll now examine each of the six elements that make up a shinydashboard.
• skin()
The color theme is the skin. The backdrop of the sidebar will be light if the skin is light.
Depending on the type of app you build, it is simple to select the appearance you like.
The plot color should complement the skin you select for your application. The list of
skin tones is provided below.
• header()
Dropdown menus and titles are both possible in a header. Here's an illustration:
The dropdownMenu() function creates the dropdown menus. There are three different types
of menus: messages, notifications, and tasks, and each one needs a specific kind of material
to be filled with.
Message menus
Values for from and message are required for a messageItem in a message menu. The icon
and a notification time string are also under your control. Any text can be the time string.
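A message menu might be sketched as below. The senders, message texts, icons, and time strings are made up for illustration.

```r
library(shiny)
library(shinydashboard)

header <- dashboardHeader(
  title = "Messages",
  dropdownMenu(
    type = "messages",
    # from and message are required for each messageItem
    messageItem(
      from = "Sales Dept",
      message = "Sales are steady this month."
    ),
    # icon and time are optional; time can be any text
    messageItem(
      from = "New User",
      message = "How do I register?",
      icon = icon("question"),
      time = "13:45"
    ),
    messageItem(
      from = "Support",
      message = "The new server is ready.",
      icon = icon("life-ring"),
      time = "2014-12-01"
    )
  )
)
```

The header object is then passed as the first argument to dashboardPage().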
Dynamic content
You'll want to make the content dynamic in the majority of circumstances. To put it another
way, the HTML content is created on the server and delivered to the client to be rendered.
And on the server side, you would create a renderMenu to build the complete menu, as in:
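One way to sketch this pattern uses dropdownMenuOutput() in the UI and renderMenu() on the server. The output ID messageMenu and the contents of the messageData data frame are hypothetical.

```r
library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  # Placeholder in the header; the menu itself is built on the server
  dashboardHeader(dropdownMenuOutput("messageMenu")),
  dashboardSidebar(),
  dashboardBody()
)

server <- function(input, output) {
  # Hypothetical message data; in practice this could come from a database
  messageData <- data.frame(
    from = c("Admin", "Support"),
    message = c("Backup finished.", "Ticket closed."),
    stringsAsFactors = FALSE
  )

  output$messageMenu <- renderMenu({
    # Turn each row into a messageItem
    msgs <- lapply(seq_len(nrow(messageData)), function(i) {
      messageItem(from = messageData$from[i],
                  message = messageData$message[i])
    })
    # .list accepts a list of items instead of individual arguments
    dropdownMenu(type = "messages", .list = msgs)
  })
}

app <- shinyApp(ui, server)

# runApp(app)  # uncomment to launch the app in an interactive R session
```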
Notification menus
A notification menu contains notificationItems, each of which holds a text notification.
The user can also control the status color and icon.
Task menus
Progress bars and text labels are shown on task items. The bar's color can also be chosen.
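Notification and task menus can be sketched in the same way as message menus. The texts, icons, percentages, and colours below are illustrative assumptions.

```r
library(shiny)
library(shinydashboard)

# Notification menu: text plus optional icon and status colour
notifications <- dropdownMenu(
  type = "notifications",
  notificationItem(text = "5 new users today", icon = icon("users")),
  notificationItem(text = "12 items delivered", icon = icon("truck"),
                   status = "success"),
  notificationItem(text = "Server load at 86%", icon = icon("bell"),
                   status = "warning")
)

# Task menu: each item shows a text label and a progress bar
tasks <- dropdownMenu(
  type = "tasks", badgeStatus = "success",
  taskItem(text = "Documentation", value = 90, color = "green"),
  taskItem(text = "Server deployment", value = 75, color = "yellow"),
  taskItem(text = "Overall project", value = 80, color = "red")
)

header <- dashboardHeader(title = "Menus", notifications, tasks)
```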
You can prevent a header bar from appearing by using the following command:
• sidebar()
Usually, a sidebar is used for rapid navigation. It may also have Shiny inputs like sliders
and text inputs, as well as menu items that function similarly to tabs in a tabPanel.
Fig.33. sidebar
Links in the sidebar can be utilized similarly to Shiny's tabPanels. In other words, when you
click a link, the dashboard's body will change to show new content. Here is an illustration of
a basic tabPanel:
The main body's content changes when the user selects one of the menu items:
The sidebarMenu() function is used to insert the menu items. Make sure that the tabName
values for a menuItem and a tabItem match in order to connect them.
If you specify a value for href, a menuItem can link to external content instead of
controlling a tab. By default these external links open in a new browser tab or window;
the newtab argument lets you change this behavior.
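As a brief sketch, an external link that opens in the same tab could be written as below. The link target is only an example URL.

```r
library(shiny)
library(shinydashboard)

# A menu item that links to an external page instead of a tab.
# newtab = FALSE opens the link in the current tab or window.
link_item <- menuItem(
  "Source code",
  href = "https://github.com/rstudio/shinydashboard/",
  newtab = FALSE
)
```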
As of version 0.14, Shiny can bookmark and restore an application's state. In a
shinydashboard app, you must call sidebarMenu() with an id in order to bookmark and
restore the currently selected tabItem. For instance:
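A minimal sketch of such a setup is shown here. The id "tabs" and the tab names are arbitrary choices for the example, and URL-based bookmarking is just one of the bookmarking modes Shiny supports.

```r
library(shiny)
library(shinydashboard)

# For bookmarking, the UI is written as a function of the request.
ui <- function(request) {
  dashboardPage(
    dashboardHeader(title = "Bookmarking"),
    dashboardSidebar(
      sidebarMenu(
        id = "tabs",  # the id lets Shiny save and restore the selected tab
        menuItem("Overview", tabName = "overview"),
        menuItem("Details", tabName = "details")
      ),
      bookmarkButton()
    ),
    dashboardBody(
      tabItems(
        tabItem(tabName = "overview", h2("Overview")),
        tabItem(tabName = "details", h2("Details"))
      )
    )
  )
}

server <- function(input, output, session) {}

app <- shinyApp(ui, server, enableBookmarking = "url")
```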
Dynamic content
A sidebar menu can be generated dynamically with renderMenu and sidebarMenuOutput.
A sidebar can also contain ordinary inputs, such as sliderInputs and textInputs.
A sidebarSearchForm, visible at the top of the image above, is another special kind of
input available in shinydashboard. It is essentially a specially formatted text input
together with an actionButton that appears as a magnifying glass (the icon can be
changed with the icon argument).
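A sidebar that combines a search form with ordinary inputs can be sketched as follows. The input ids and labels are hypothetical names chosen for the example.

```r
library(shiny)
library(shinydashboard)

sidebar <- dashboardSidebar(
  # Search form: a styled text input plus a magnifying-glass button.
  sidebarSearchForm(
    textId = "searchText",
    buttonId = "searchButton",
    label = "Search..."
  ),
  # Ordinary Shiny inputs also work inside the sidebar.
  sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30),
  textInput("caption", "Caption:", value = "My plot")
)
```

On the server, the search text and button would be available as input$searchText and input$searchButton.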
• controlbar()
• body()
The figure below shows an example interface for a Stock Market Forecasting application.
These templates for stock symbols include the stock, most recent trade price, price change
and percentage price change, return date, and volume.
The charts that follow are those for individual stock symbols. By using the Study button, the
user can select another ticker to explore. We will learn how to use the gradient colour for
this. For those plots, ggplot2 is used to make a static plot and Plotly to build an
interactive plot.
The final graph is a line graph that contrasts every stock symbol.
• The Footer
footer()
Self-Assessment Questions – 5
9. The __________ function creates the dropdown menus.
10. ____________ and ___________ enable the dynamic generation of a sidebar menu.
5. SUMMARY
In summary, business analytics and statistics are greatly aided by dashboards. Through a
wide range of sources, sizes, and types of data, dashboards offer a window into the
understanding and tracking of business indicators. Dashboards make it easier for users to
collaborate and make decisions. Data scientists don't need to spend much time presenting
the findings because the data story is easily understood by the general public. Additionally,
Shiny dashboards' availability on web and mobile platforms increases users' accessibility
and mobility.
R-based data specialists can easily incorporate Shiny into their development workflow.
Shiny dashboards are adaptable, dynamic, and simple to tailor to each customer's unique
requirements. This is partially attributable to the web framework's support for web
technologies including HTML, CSS, SCSS, JavaScript, and others. Shiny makes it
possible to write modular code and functions, quickly prototype ideas, and easily
manage dashboards using smaller components.
6. GLOSSARY
Dashboard: All of the user data is shown visually on a dashboard. Although it has a wide
range of applications, its main purpose is to present information quickly, like KPIs. The
information for a dashboard often comes from a linked database and is shown on its own
page.
R-Shiny: The open source R package Shiny offers a beautiful and robust web framework for
creating online apps. With the aid of Shiny, you can transform your studies into interactive
web applications without having to grasp HTML, CSS, or JavaScript.
Shinydashboard: An R package built on Shiny that makes it simple to create dashboards
directly from R. Because they are good at helping businesses draw conclusions from
the data already available, dashboards are widely used.
7. CONCEPT MAP
Did you know?: You may access a full web application framework within the R environment
with shiny dashboards. You may quickly turn your R work, analysis, and visualizations,
machine learning models, among other things, into web applications that benefit companies.
9. CASE STUDY
Shiny dashboards in healthcare:
The following Shiny app was created by Christian Luz "in the setting of a 1339-bed academic
tertiary referral hospital to process the data of over 180,000 admissions." Users of the
software can filter patients using one of 17 distinct criteria. Antimicrobial resistance,
microbiological tests, and their application can all be researched by users. The
investigation's findings can be quickly classified and stratified "to compare predefined
patient groups based on specific patient attributes."
Voter profile:
The Voters Profiles dashboard is a classic illustration of R Shiny in elections. It offers access
to graphs, maps, and a novel way to look into the voting profiles in the 2014 elections in
Brazil. It received honourable mention status in the 2019 Shiny competition.
Government representatives and others can readily assess the Brazilian elections of 2014
thanks to this dashboard. The results for both the first and second rounds are available by
state and city. The number of votes cast for each contender is also displayed as a bar graph.
Votes for governors and senators are displayed on the second tab, State level, while votes for
the president are displayed on the first tab, Federal level.
10. TERMINAL QUESTIONS
Short Answer Type:
1. List the 7 basic steps that one should consider when designing a dashboard.
2. Write the basic syntax for creating a dashboard.
3. Explain the skin component of dashboardPage() function.
4. Discuss the footer of dashboardPage() function.
1. R
2. Dashboards.
3. Dashboards.
4. user interface object and the server function
5. body, a sidebar, and a header.
6. shinyApp()
7. box()
8. tabBox, infoBox, valueBox
9. dropdownMenu()
10. renderMenu and sidebarMenuOutput
3.
skin()
The colour theme is the skin. The backdrop of the sidebar will be light if the skin is light.
Depending on the type of app you build, it is simple to select the appearance you like. The
plot colour should complement the skin you select for your application. The list of
available skins is provided below.
4.
footer()
1. The primary components of dashboard pages are boxes. With the box() function, a
simple box may be built, and (most) any Shiny UI element can be used as its content.
tabBox:
Use a tabBox if you need a container to just have tabs for showing various content sets.
2.
Dynamic content
Notification menus
A notification menu contains notificationItem entries, each of which holds a text
notification. You can also set the icon and the status color.
Task menus
13. REFERENCES
• https://appsilon.com/dashboards-in-rshiny/
• https://appsilon.com/dashboards-in-rshiny/#using
• https://rstudio.github.io/shinydashboard/structure.html
• https://bookdown.org/loankimrobinson/rshinybook/stock-front-footer.html
DADS304
VISUALIZATION
Unit 5
Creating Advanced Dashboard and
Visualization
Table of Contents
1. INTRODUCTION
What is a dashboard?
Dashboards are tools that present up-to-date information and use visuals to explain the
data. They help decision-makers understand the relationships in large, complex data sets.
They arrange visuals in a useful layout that makes it easier for the organization to
understand and appreciate the data.
Shiny
The RStudio PBC team created the open source R package known as Shiny. To offer a
beautiful and simple web framework for creating online apps in R, RStudio created Shiny. R
users can build amazing apps, interactive maps, and dashboards using Shiny. And to
construct it, you don't need significant web development abilities!
Access to a full web application framework is made possible within the R environment via
shiny dashboards. You can quickly create web applications that benefit organizations from
your R work, including analyses, visualizations, machine learning models, and more. End-
users can use it as a complete application without knowing how to use R. Deliver a
comprehensive, user-friendly, and interactive product that enhances how you conduct
business.
The dashboard may be easily customized using unique HTML, CSS, SCSS, Javascript, and
other languages thanks to Shiny's web framework. With other BI software suites, it is
impossible to develop a distinctive, customized dashboard with this level of flexibility.
Include elements like colors, logos, typefaces, and others that better reflect your company.
Cost
Compared with competitors such as Power BI and Tableau, Shiny is less expensive
and open source. On the Appsilon blog, you can examine a detailed comparison of Shiny to
Power BI and Shiny to Tableau.
One may use Shiny as a dashboard development platform to access a variety of R packages
for data research, including the Tidyverse. For the visualization of data and models, you can
access advanced graphical features. Add reactivity and interaction by embedding these
images in Shiny dashboards. This can be done by using an interface that R has allowed to
communicate with JavaScript-based charting packages.
1.1 Objectives
R Shiny is a fantastic tool for quickly producing aesthetically stunning and practical
dashboards, and it's not too difficult to master. Deploying more sophisticated solutions,
nevertheless, faces two significant obstacles.
First, without a basic understanding of CSS and JavaScript, implementing unique
dashboard designs could prove challenging. Second, putting the software into
production also calls for more advanced expertise. This is not much of a problem
when an app is never used by more than two people at once. Making it accessible to
a few hundred users, though, requires additional steps.
• Data sources
• Data transformation
• Shiny App structure
• Visualizations
• Dashboard deployment on the shinyapps.io website server.
1. Data Source:-
➔ Users can find the dataset for grads by university here: data.gov.sg/gra.
• Rate of Employment
• Gross Monthly Income
2. Data transformation
In the code given below (DataWrangling.R), we load the dataset, remove NAs, fill
missing values, alter university names to make them more readable, and then store the
data frames in .rds format, which is much quicker to read in the subsequent steps.
Fig 6: Code
Inspecting the school names in the dataset with unique(data e$school) shows that some
of them need cleaning, for instance:
Fig 7: Code
Lastly, save the cleaned data frame as an .rds file for the Shiny app:
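The wrangling steps described above can be sketched with base R as follows. This is not the original DataWrangling.R script: the small grads data frame, its column names, the use of "na" as a missing-value marker, and the choice to fill gaps with the column mean are all assumptions made for illustration.

```r
# Hypothetical stand-in for the graduate employment dataset.
grads <- data.frame(
  school = c("College of Engineering ", "School of  Business", NA,
             "School of Computing"),
  employment_rate = c("89.5", "na", "91.0", "93.2"),
  gross_monthly_income = c("3500", "3200", "3100", "na"),
  stringsAsFactors = FALSE
)

# 1. Remove rows whose school name is missing.
grads <- grads[!is.na(grads$school), ]

# 2. Convert text columns to numbers and fill missing values with the mean.
num_cols <- c("employment_rate", "gross_monthly_income")
for (col in num_cols) {
  values <- suppressWarnings(as.numeric(grads[[col]]))  # "na" becomes NA
  values[is.na(values)] <- mean(values, na.rm = TRUE)
  grads[[col]] <- values
}

# 3. Tidy the school names (collapse repeated spaces, trim the ends).
grads$school <- trimws(gsub("\\s+", " ", grads$school))

# 4. Save in .rds format and read it back, as the Shiny app would.
rds_path <- file.path(tempdir(), "grads_clean.rds")
saveRDS(grads, rds_path)
cleaned <- readRDS(rds_path)
```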
3. Shiny App structure
To run a Shiny app, you need to install the package and load the library:
Shiny apps have two crucial parts: the front-end user interface (ui.R) and the back-end
server (server.R).
To design the front end of our web application, we use the ui.R program.
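The two parts can be illustrated with a minimal single-file sketch. In a two-file layout, the ui object would live in ui.R and the server function in server.R. The input id "n", the output id "hist", and the use of the built-in faithful dataset are choices made only for this example.

```r
# install.packages("shiny")  # run once if the package is not installed
library(shiny)

# Front end: what the user sees and interacts with.
ui <- fluidPage(
  titlePanel("Minimal Shiny app"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("n", "Number of bins:", min = 5, max = 50, value = 20)
    ),
    mainPanel(plotOutput("hist"))
  )
)

# Back end: reacts to inputs and produces outputs.
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(faithful$waiting, breaks = input$n,
         main = "Waiting time between eruptions", xlab = "Minutes")
  })
}

# Combine both parts into an app object; printing it at the console runs it.
app <- shinyApp(ui = ui, server = server)
```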
4. Visualizations
The majority of visualizations are created using Plotly, and Kable is used to present the
data tables.
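As a hedged illustration of this combination, the sketch below builds one Plotly bar chart and one Kable table outside of any app. It assumes the plotly and knitr packages are installed; the income data frame and its values are invented for the example. Inside a Shiny app, the chart would typically be shown with plotlyOutput() and renderPlotly().

```r
library(plotly)
library(knitr)

# Invented example data.
income <- data.frame(
  university = c("University A", "University B", "University C"),
  gross_monthly_income = c(3400, 3650, 3200)
)

# Interactive bar chart with Plotly.
income_plot <- plot_ly(
  income,
  x = ~university,
  y = ~gross_monthly_income,
  type = "bar"
)

# HTML table with Kable.
income_table <- kable(
  income,
  format = "html",
  col.names = c("University", "Gross monthly income")
)
```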
Result:
The deployment steps assume that server.R and ui.R are located in the same folder.
4. Set up your account so that the rsconnect package can use it. On the shinyapps.io
website, click the Show button on the token page. A popup appears with the complete
rsconnect::setAccountInfo command, filled in with the correct arguments for your account.
Copy the command to your clipboard, paste it into the RStudio console, and press Enter.
5. Release the app. Use the following code to deploy your application:
library(rsconnect)
deployApp()
Once the deployment is complete, your first Shiny app has been released.
You can publish the app again after making modifications to the server.R or ui.R files.
Self-Assessment Questions - 1
1. The majority of visualizations are created using _________, and __________ is used to
present the data tables.
2. Codes to deploy your application _____________ and _____________.
The amount of data in today's world is growing exponentially, making it impossible to tell
stories without it. Even though specialized tools such as Tableau, QlikView, and d3.js are
available, nothing can substitute for a modeling and statistics tool with strong
visualization capabilities. Both feature engineering and any type of exploratory data
analysis benefit greatly from it. R is a great help in this regard.
Basic Visualization
➔ Histogram
➔ Bar Chart
➔ Line Chart
Advanced Visualization
➔ Heat Map
➔ Map Visualization
➔ Correlogram
BASIC VISUALIZATION
1. HISTOGRAM
The most popular graph to represent continuous data is a histogram. It is a bar plot that
divides the measurements into intervals and shows the number of observations that
fall within each interval. When the intervals have unequal widths, the bar height
represents the ratio of frequency to interval width.
Example: The code below defines a function called binner, which takes a vector named
var and wraps the histogram app around it. The function passes var to shinyApp,
which starts an app that displays a histogram of var.
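A function of this kind can be sketched as follows. The slider range, colors, and input and output ids are illustrative choices and may differ from the original code.

```r
library(shiny)

# binner(): wraps a histogram app around any numeric vector.
binner <- function(var) {
  shinyApp(
    ui = fluidPage(
      sidebarLayout(
        sidebarPanel(
          sliderInput("n", "Bins", min = 5, max = 100, value = 20)
        ),
        mainPanel(plotOutput("hist"))
      )
    ),
    server = function(input, output) {
      # var is captured from the enclosing function call.
      output$hist <- renderPlot(
        hist(var, breaks = input$n, col = "skyblue", border = "white")
      )
    }
  )
}
```

Calling the function at the console, as in the Output lines that follow, launches the app for the given vector.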
Output:
binner(faithful$eruptions)
binner(iris$Sepal.Length)
2. BAR CHART
Bar charts display data using rectangular bars whose lengths are proportional to the
values of the variables. Bar charts are made in R using the function barplot(). R can
draw both vertical and horizontal bars in a bar chart. Each bar in a bar graph can
have a distinct color.
Example:
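A possible example, using invented regional sales figures, draws the same data first as vertical and then as horizontal bars with a distinct color per bar.

```r
# Invented example data.
sales <- c(North = 120, South = 95, East = 140, West = 80)
bar_cols <- c("steelblue", "tomato", "seagreen", "goldenrod")

# Vertical bars; barplot() returns the bar midpoints on the x-axis.
mids <- barplot(
  sales,
  col = bar_cols,
  main = "Sales by region",
  xlab = "Region",
  ylab = "Units sold"
)

# Horizontal bars with the same data and colors.
barplot(
  sales,
  horiz = TRUE,
  col = bar_cols,
  main = "Sales by region (horizontal)",
  xlab = "Units sold"
)
```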
Output:
3. LINE CHART
A line graph, also known as a line plot or a line chart, connects individual data
points with a line. Line graphs most often show values as a function of time.
Example: The code illustrates how to give each line on the chart a distinct color.
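One way to do this in base R is with matplot(), as in the sketch below. The monthly values for the two products are invented.

```r
# Invented monthly values for two products.
months <- 1:6
series <- cbind(
  ProductA = c(10, 12, 15, 14, 18, 21),
  ProductB = c(8, 9, 11, 13, 12, 16)
)
line_cols <- c("steelblue", "tomato")

# One line per column, each with its own color.
matplot(
  months, series,
  type = "l", lty = 1, lwd = 2, col = line_cols,
  xlab = "Month", ylab = "Units sold",
  main = "Monthly sales by product"
)
legend("topleft", legend = colnames(series),
       col = line_cols, lty = 1, lwd = 2)
```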
Output:
ADVANCED VISUALIZATION
1. HEAT MAP
A heatmap is a graphical depiction of data that uses color coding to illustrate different
values. Although heatmaps can be used for many different types of analytics, they are
most frequently used to display user behavior on certain webpages or web page
layouts.
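A basic heat map can be produced with the heatmap() function from base R. The sketch below uses the built-in mtcars dataset purely as sample data; it is not the example shown in the original figure.

```r
# Numeric matrix from a built-in dataset.
car_matrix <- as.matrix(mtcars[, 1:7])

# scale = "column" standardizes each variable so the colors are comparable.
hm <- heatmap(
  car_matrix,
  scale = "column",
  col = heat.colors(16),
  main = "Car characteristics"
)
```

By default heatmap() also reorders rows and columns with hierarchical clustering; the returned list records the resulting order.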
Output:
2. MAP VISUALIZATION
Through map visualisation, spatially pertinent data is examined, shown, and presented.
This type of data expression is more comprehensible and transparent. The distribution
or percentage of data in each area can be seen visually.
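An interactive map of this kind can be sketched with the leaflet package, assuming it is installed. The cities data frame uses approximate city coordinates and an invented value column; circle sizes are scaled from that column.

```r
library(leaflet)

# Approximate coordinates; the value column is invented.
cities <- data.frame(
  name = c("Singapore", "Kuala Lumpur", "Bangkok"),
  lng = c(103.82, 101.69, 100.50),
  lat = c(1.35, 3.14, 13.76),
  value = c(40, 25, 60)
)

# Base map tiles plus one circle marker per city.
city_map <- leaflet(cities) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng,
    lat = ~lat,
    radius = ~sqrt(value) * 2,
    popup = ~paste0(name, ": ", value)
  )
```

In a Shiny app, such a map would typically be displayed with leafletOutput() and renderLeaflet().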
Output:
3. CORRELOGRAM
Fig 32 : Correlogram
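A correlogram is commonly understood as a plot of a correlation matrix. The sketch below draws a simple one with base R graphics, using a few columns of the built-in mtcars dataset as sample data. Dedicated packages exist for this purpose, but they are not assumed here.

```r
# Correlation matrix of a few numeric variables.
corr <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
n <- ncol(corr)

# Blue for negative, white for zero, red for positive correlations.
palette <- colorRampPalette(c("blue", "white", "red"))(21)

# Reverse the columns so the diagonal runs from top left to bottom right.
flipped <- corr[, n:1]
image(1:n, 1:n, flipped, zlim = c(-1, 1), col = palette,
      axes = FALSE, xlab = "", ylab = "", main = "Correlogram")
axis(1, at = 1:n, labels = rownames(corr))
axis(2, at = 1:n, labels = rev(colnames(corr)), las = 1)

# Print each correlation coefficient in its cell.
text(row(flipped), col(flipped), sprintf("%.2f", flipped))
```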
Self-Assessment Questions - 2
3. _______ provides a sufficient selection of built-in functions and libraries (such as
ggplot2, leaflet, and lattice).
4. A ________ is represented as a correlogram.
4. ACTIVITY
Activity A
In the Chinese city of Wuhan, incidences of severe respiratory sickness started to be reported
in December 2019. These were brought on by a novel coronavirus, and the illness is now
generally known as COVID-19. Midway through January, the number of COVID-19 cases
began to increase more swiftly, and the virus soon expanded outside of China. Since then,
this story has quickly developed, and every day we are presented with unsettling stories
about the outbreak's current situation.
These headlines can be challenging to understand on their own. How quickly is the virus
spreading? Is work being done to control the disease? How does the current scenario
differ from past epidemics?
……………………………………………………………………………………………………………………………………….
The objective is to develop a visualization that updates figures such as the number of
cases, deaths, and most affected countries based on the mapping date.
5. SUMMARY
R-based data specialists can easily incorporate Shiny into their development workflow.
Shiny dashboards are adaptable, dynamic, and simple to tailor to each customer's unique
requirements. This is partially attributable to the web framework's support for web
technologies including HTML, CSS, SCSS, JavaScript, and others. Writing R code to plot
graphs repeatedly can become very tiresome. Additionally, making an interactive
visualization for storytelling is very challenging. These issues can be addressed
by quickly building interactive charts in R using Shiny. Shiny makes it possible to
write modular code and functions, quickly prototype ideas, and easily manage
dashboards using smaller components.
6. GLOSSARY
Data wrangling - Eliminating errors and integrating complex data sets to make them more
accessible and understandable.
7. CONCEPT MAP
Study Notes – Shiny applets like the Covid Tracker and RadaR, an R-based interactive tool
for rapid analysis of diagnostic and antimicrobial patterns, provide an example.
Study Notes – With the help of the shiny package, you can quickly create a user interface (UI)
and use R code to update the plots and analyses that are displayed to the user in response to
their selection of various UI options.
9. CASE STUDY
The use of R-Shiny Visualizations in medical imaging has significant positive effects on
the healthcare industry. A significant study in this field is Big Data Analytics in
Healthcare, which was published in BioMed Research International. The study lists
several common imaging methods, such as computed tomography, computed
radiography, mammography, and magnetic resonance imaging (MRI). The disparity in
these images' modality, resolution, and dimensions is addressed using a variety of
techniques. To enhance image quality, more effectively extract data from photos, and
offer the most accurate interpretation, many more are currently being developed. By
learning from prior cases and then suggesting better treatment options, the deep-
learning based algorithms improve diagnostic accuracy.
Using only R Shiny mechanisms, this software interactively visualizes 3D MRI scans.
1.
2. The most popular graph to represent continuous data is a histogram. It is a bar plot that
shows the measurements' frequencies of appearance and counts the number of
observations that fall within each interval. Additionally, the height is influenced by the
ratio of frequency to interval width.
1. Data Source:-
• Rate of Employment
• Gross Monthly Income
2. Data transformation
In the code given below (DataWrangling.R), we load the dataset, remove NAs, fill
missing values, alter university names to make them more readable, and then store the
data frames in .rds format, which is much quicker to read in the subsequent steps.
There are certain names that need cleaning if users look at the names of the school
unique(data e$school) in the dataset, for instance:
Numerical transformation
Lastly, save the cleaned data frame with the Shiny.rds extension:
Cleaning Data
You must install the package and import the library in order to start a Shiny app:
Packages
Shiny apps have two crucial parts: the front-end user interface (ui.R) and the back-end
server (server.R).
To design the front end of our web application, we use the ui.R program.
Server code
4. Visualizations
The majority of visualizations are created using Plotly, and Kable is used to present the
data tables.
Visualization code
Result:
By assuming that server.R and ui.R are located in the same folder.
install.packages('rsconnect')
library (rsconnect)
2. Create an account on shinyapps.io. Please be aware that all of your apps on shinyapps.io
will use your account name as the domain name.
3. Obtain the website-generated token when you log in to shinyapps.io:
Website generated
4. Set up your account so that the rsconnect package can use it. On the shinyapps.io
website, click the Show button on the token page. A popup appears with the complete
rsconnect::setAccountInfo command, filled in with the correct arguments for your
account. Copy the command to your clipboard, paste it into the RStudio console, and
press Enter.
Setting up account
5. Release the app. Use the following codes to deploy your application:
library(rsconnect)
deployApp()
Publish
Congratulations when the deployment is complete! Your initial Shiny App has been released.
You can publish the app again after making modifications to the server.R or ui.R files.
2. ADVANCED VISUALIZATION
1. HEAT MAP
A heatmap is a graphical depiction of data that uses color coding to illustrate different
values. Although heatmaps can be used for many different types of analytics, they are
most frequently used to display user behavior on certain webpages or web page
layouts.
Output:
Heat Map
2. MAP VISUALIZATION
Through map visualisation, spatially pertinent data is examined, shown, and presented.
This type of data expression is more comprehensible and transparent. The distribution
or percentage of data in each area can be seen visually.
Output:
Map Visualization
13. REFERENCE
• https://appsilon.com/how-i-built-an-interactive-shiny-dashboard-in-2-days-without-any-experience-in-r/
• https://towardsdatascience.com/end-to-end-dashboard-in-r-shiny-app-64c40d0351d8
• https://avikarn.com/2020-05-26-correlation_shiny/
• https://shiny.rstudio.com/gallery/covid19-tracker.html
DADS304
VISUALIZATION
Unit 6
Introduction to Tableau
Table of Contents
1. INTRODUCTION
Tableau is a data visualization tool or business intelligence tool/software which analyzes
and displays data in a chart or report easily. It is very easy to use because it does not require
any coding skill.
Users can build and distribute an interactive and shareable dashboard, which shows the
trends, variations, and density of the data in the form of graphs and charts and tables.
Tableau can connect to files, relational and Big Data sources to acquire and preprocess data.
The software allows combining data from multiple sources and real-time collaboration,
which makes it unique. It is used by businesses, academic researchers, and many
government organizations for visual data analysis. It has also been positioned among the
top platforms in the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms.
Tableau offers numerous appealing and distinctive features that make it a top tool for data
visualization. You can quickly get the answers to crucial queries thanks to its robust data
finding and exploration application. Tableau’s drag and drop interface makes it simple to
explore various views, merge numerous databases, and visualize any type of data. It doesn’t
need any difficult scripting. Anyone who is familiar with the business issues can solve them
by visualizing the pertinent facts. Sharing results with others is as simple as publishing to
Tableau Server after analysis.
2. TABLEAU FEATURES
Tableau offers solutions for various departments, industries, and data settings. The following
special attributes allow Tableau to handle a variety of scenarios.
• Speed of Analysis – Since it does not call for a high level of programming knowledge, any
user with access to data can begin using it to extract value from the data.
• Tableau is self-sufficient in that it doesn’t require a convoluted software setup. Most
customers utilize the desktop version, which is simple to install and has all the
functionality required to begin and finish data analysis.
• The user investigates and evaluates the data using visual tools including colors, trend
lines, charts, and graphs. Almost everything can be done by drag and drop, thus there
is very little script to write.
• Integrate Diverse Data Sets – Tableau enables you to instantly combine many relational,
semi-structured, and unstructured data sources without incurring high upfront
integration expenses.
• Tableau operates on all types of devices where data flows, regardless of architecture.
As a result, the user does not have to be concerned with specific hardware or software
requirements to utilize Tableau.
• Real-Time Collaboration – Tableau can filter, sort, and discuss data on the fly and
embed a live dashboard in portals such as Salesforce or a SharePoint site. You can save
your view of the data and let colleagues subscribe to your interactive dashboards, so
that they see the latest data simply by refreshing their web browser.
• All of the organization's published data sources can be managed in one place with
Tableau Server. In that one place, you can delete data sources, change permissions, add
tags, and manage schedules. Extract refreshes can be easily scheduled and managed on
the data server.
Infogram: With Infogram, even non-designers can produce powerful data visualizations for
marketing reports, infographics, social media posts, maps, dashboards, and more. Infogram
is a fully featured drag-and-drop visualization tool. Finished visualizations can be
exported in the following file types: .PNG, .JPG, .GIF, .PDF, and .HTML. Additionally, interactive
visualizations are feasible and ideal for integrating into websites and applications.
Additionally, Infogram provides a WordPress plugin that streamlines the process of
integrating visualizations for WordPress users.
Power BI: A business intelligence (BI) platform called Microsoft Power BI gives non-
technical business people the means to gather, analyze, visualize, and share data. With its
strong interaction with other Microsoft products, Power BI is a versatile self-service tool that
requires little initial training. Its user interface is intuitive for Excel users.
Chartblocks: According to ChartBlocks, data may be loaded using their API from
"everywhere," even live streams. Even though they claim that it only takes a few clicks to
import data from any source, it is undoubtedly more difficult to use than other programs that
have automatic modules or extensions for particular data sources. The final representation
produced by the software can be heavily customized, and the chart construction wizard
assists users in selecting the ideal data for their charts before importing the data. A major
benefit for data visualization designers who wish to embed charts into websites that are
likely to be viewed on several devices is that designers may construct almost any type of
chart, and the output is responsive.
Datawrapper: Datawrapper was developed for including charts and maps in news reports.
The produced maps and charts can be embedded on news websites and are interactive.
However, they only have a few data sources, and the main approach is to copy and paste data
into the program. Charts can be produced after data has been imported with only one click.
They use a variety of visualizations, including choropleth and symbol maps, column, line, and
bar charts, election donuts, area charts, scatter plots, and locator maps. The final visuals
resemble those that may be found on websites like the New York Times or Boston Globe. In
fact, magazines like Mother Jones, Fortune, and The Times use their charts.
RAW: RAW, also referred to as RawGraphs, operates on delimited data such as TSV or CSV
files. It acts as a bridge between spreadsheets and data visualization. Despite being a web-
based application, RawGraphs offers strong data protection and offers a variety of
unconventional and traditional layouts.
SELF-ASSESSMENT QUESTIONS – 1
4. HISTORY OF TABLEAU
Tableau was launched in 2003 by Pat Hanrahan, Christian Chabot, and Chris Stolte, who
were affiliated with Stanford University. The fundamental motivation for its inception
was to make the database industry dynamic and comprehensive. Tableau debuted at a time
when Cognos, Microsoft Excel, and Business Objects were already well-known brands.
The main features that led Tableau Software to achieve success are:
• VizQL is the language that powers it, increasing the flexibility to pull data from any
source.
• Provide the user with the ability to alter Tableau reports using a variety of visualization
tools.
• The drag-and-drop method can be used to create any complex graphs or maps.
• Multiple platforms allow for the insertion of Tableau data visualizations.
• Real-time data analysis and visualization are both possible.
3. Start Trial
5. Registration Complete
SELF-ASSESSMENT QUESTIONS – 2
Tableau’s native connectors can connect to the following types of data sources:
• File Systems: Such as Microsoft Excel, CSV, etc.
• Cloud Systems: Such as Google BigQuery, Windows Azure, etc.
• Relational Systems: Such as Microsoft SQL Server, Oracle, DB2, etc.
• Other Sources: Any source that can be reached through ODBC.
Tableau Desktop
Data from several sources may be connected using Tableau Desktop to create dashboards,
stories, and workbooks. You can publish the workbooks on the Tableau website and share
all the insights with other users using the Tableau Desktop.
Without writing any code, a user of Tableau Desktop can query the datasets directly.
You only need to choose a visualization, such as a chart, table, graph, or map,
and then select the columns you wish to include. Additionally, Tableau Desktop creates
dashboards that combine numerous views from various data sources.
Tableau Public
This Tableau version was created with budget-conscious users in mind. The word
"Public" indicates that the generated workbooks cannot be saved locally. They must be
stored on Tableau's public cloud, which anyone can access and view.
Files stored in the cloud have no privacy, so anyone can view and download the
same information. This version is best suited to people who want to learn Tableau and
to those who wish to share their data with the world.
Tableau Online
Its functionality is comparable to that of Tableau Server, but the data is kept on
servers hosted in the cloud and managed by the Tableau group.
The data that is made available via Tableau Online can be stored indefinitely. Over 40 cloud-
hosted data sources, including Hive, MySQL, Spark SQL, Amazon Aurora, and many more, are
directly connected via Tableau Online.
The workbooks produced by Tableau Desktop must be published for Tableau Server and
Tableau Online to function. Google Analytics and Salesforce.com are two web programs that
Tableau Server and Tableau Online may access data from.
Tableau Server
This software is used specifically to distribute workbooks and visualizations produced
in the Tableau Desktop application across the organization. You must publish your
worksheet in Tableau Desktop before sharing dashboards on Tableau Server. Only
authorized users will have access to the worksheet once it has been uploaded to the server.
Authorized users don't necessarily need to have Tableau Server installed on their computers.
They merely need the login information in order to examine reports using a web browser.
Tableau Server's high level of security is advantageous for efficient and speedy data
exchange.
The organization's administrator has complete control over the server. Both the software
and the hardware are maintained by the organization.
Tableau Reader
We may view the visualizations and workbooks made using Tableau Desktop or Tableau
Public using the free utility Tableau Reader. Filtering the data is possible, but changes and
editing are limited. Tableau Reader has no security because anyone may use it to read
workbooks.
If you share a dashboard that you have built, the recipient must have Tableau Reader to
open the file.
SELF-ASSESSMENT QUESTIONS – 3
5. Tableau is _______________Software?
6. Tableau is used to find insight from data. True/False
7. Tableau integrates _______________data sources.
8. GLOSSARY
• Visualization: finding insight from data
• Import data: Load the data into Tableau for processing
9. TERMINAL QUESTIONS
1. What are the five basic Features of any data visualization Software?
Tableau offers solutions for various departments, industries, and data settings. The following
special attributes allow Tableau to handle a variety of scenarios.
Speed of Analysis Since it does not call for a high level of programming knowledge, any user
with access to data can begin utilizing it to extract value from the data.
• Tableau operates on all types of devices where data flows, regardless of architecture.
As a result, the user does not have to be concerned with specific hardware or software
requirements to utilize Tableau.
• Real-Time Collaboration – Tableau can embed a live dashboard in portals like
Salesforce or a SharePoint site and filter, sort, and debate data instantly. By simply
refreshing their web browser, colleagues can see the most recent data by subscribing
to your interactive dashboards and saving your perspective of the data.
• All of the organization’s published data sources may be managed in one place thanks to
Tableau server. In one simple place, you can remove, modify permissions, add tags, and
manage schedules. Extract refreshes may be easily scheduled and managed on the data
server.
2. Explain Different types of files used in tableau.
After data processing, Tableau's output can be saved in a variety of formats and then
delivered across several platforms. The file extensions distinguish the different file types.
The types differ in how they are produced and how they are used.
Tableau’s native connectors can connect to the following types of data sources:
• File Systems: Such as Microsoft Excel, CSV, etc.
• Cloud Systems: Such as Google BigQuery, Windows Azure, etc.
• Relational Systems: Such as Microsoft SQL Server, Oracle, DB2, etc.
• Other Sources: Connected through ODBC.
4. Explain Tableau and Other Visualization tools.
Data visualization uses visual components like graphs, charts, and maps to graphically
portray quantitative information and data. Data visualization turns both huge and small data
sets into graphics that are simple for people to comprehend and process. Data outliers,
patterns, and trends can be easily understood with data visualization tools. The tools and
technology for data visualization are essential in the realm of big data because they allow for
the analysis of enormous amounts of data.
• Chartblocks: According to ChartBlocks, data may be loaded using their API from
"everywhere," even live streams. Even though they claim that it only takes a few clicks
to import data from any source, it is undoubtedly more difficult to use than other
programs that have automatic modules or extensions for particular data sources. The
final representation produced by the software can be heavily customized, and the chart
construction wizard assists users in selecting the ideal data for their charts before
importing the data. A major benefit for data visualization designers who wish to embed
charts into websites that are likely to be viewed on several devices is that designers
may construct almost any type of chart, and the output is responsive.
• Datawrapper: Datawrapper was developed to include charts and maps in news reports.
The produced maps and charts can be embedded on news websites and are
interactive. However, they only have a few data sources, and the main approach is to
copy and paste data into the program. Charts can be produced after data has been
imported with only one click. They use a variety of visualizations, including choropleth
and symbol maps, column, line, and bar charts, election donuts, area charts, scatter
plots, and locator maps. The final visuals resemble those that may be found on websites
like the New York Times or Boston Globe. In fact, magazines like Mother Jones, Fortune,
and The Times use their charts.
• Plotly: plotly.py is an interactive, open-source, browser-based graphing library for
Python. It is a high-level, declarative charting framework built on top of plotly.js.
Plotly.js comes with more than 30 different chart types, including financial charts,
scientific charts, 3D graphs, and more. Plotly is MIT-licensed software. Plotly graphs
can be viewed in Jupyter notebooks, saved as standalone HTML files, or served as part
of Dash applications.
• RAW: RAW, also referred to as RawGraphs, operates on delimited data such as TSV or
CSV files. It acts as a bridge between spreadsheets and data visualization. Despite being
a web-based application, RawGraphs offers strong data protection and offers a variety
of unconventional and traditional layouts.
10. SUMMARY
Tableau offers numerous appealing and distinctive features that make it a top tool for data
visualization. You can quickly get the answers to significant queries because of its robust
data finding and exploration application. Any data may be visualized using Tableau's drag
and drop interface.
Explore various perspectives, and even seamlessly integrate numerous databases. It doesn't
need any difficult scripting. Anyone who is familiar with the business issues can solve them
by visualizing the pertinent facts. Sharing results with others is as simple as publishing to
Tableau Server after analysis.
11. ANSWERS
Self-Assessment Questions
1. False
2. 2003
3. Workbooks, Bookmarks, Packaged Workbooks
4. Chris Stolte
5. data visualization
6. True
7. Multiple
12. REFERENCE
• Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master
• Tableau Your Data! Fast and Easy Visual Analysis with Tableau Software
Unit 7
Editing, Building Views and Formatting
1. INTRODUCTION
In Tableau Cloud or Tableau Server, you can make changes to a view if you can see the Edit
button when viewing it. Depending on your permissions and access level, you can:
• Edit a workbook that has already been published and add worksheets for views,
dashboards, and stories.
• Create and edit a new worksheet using a published data source.
• Edit an existing workbook and add worksheets, either on the web or by opening the
workbook in Tableau Desktop.
• Connect to several published data sources while editing. See Connect to Published
Data Sources while Web Editing for more information.
Live Connection
The Connect Live feature is used to analyse data in real time. Tableau connects to a real-time
data source and continues to read the data in this case. As a result, the analysis result is up
to the second, and the most recent changes are reflected in the result. However, the source
system is burdened because it must continue to send data to Tableau.
In-Memory
Tableau can also process data in-memory by caching it in memory and not connecting to the
source while analysing it. Of course, the amount of data cached will be limited by the amount
of memory available.
Tableau data extraction creates a subset of data from the data source. This is useful for
improving performance by using filters. It also aids in applying Tableau features to data that
may not be available in the data source, such as finding distinct values in the data. The data
extract feature, on the other hand, is most commonly used to create an extract to be stored
on the local drive for offline access by Tableau.
4. Extract History: You can check the history of data extracts to see how many times the
extract has occurred and when.
Practical steps:
You have the option to alter the workbook's data source at any time while conducting your
study.
Here, you can extract data from a single table or many tables. You can also extract data based
on the number of rows.
There is an option to pick the number of rows from the dataset you want to extract. See
the diagram below.
You may now choose Extract and then History to view the history of the extract.
SELF-ASSESSMENT QUESTIONS – 1
• Column Alias
Each column in the data source can be given an alias, which aids in understanding the
column's nature.
Instead of manually creating views by dragging and dropping fields, you can use Show Me
to do so. There are two reasons to do this:
1) You might want to build views automatically for better understanding.
2) To save time.
As you can see in the graphic, the rows and columns are not identifiable after changing the
data source.
You can now alter the reference according to the requirement by going to the specific
dimension or measure.
SELF-ASSESSMENT QUESTIONS – 2
Swapping Dimensions
By swapping the positions of the dimensions, you can create a new view from an existing
one. This has no effect on the values of the measures, but it does change their position.
Consider a view for analyzing Profit for each year for each segment and product category.
You can drag the vertical line at the end of the category column to the segment column by
clicking and dragging it. The following screenshot depicts this action.
Data Joining
Data joining is a common requirement in all types of data analysis. You may need to join data
from multiple sources or from different tables within the same source. Tableau includes the
ability to join tables by using the data pane found under Edit Data Source in the Data menu.
Data blending
Tableau's Data Blending feature is extremely powerful. It is used when you want to analyze
related data from multiple data sources in a single view. Consider the following scenario:
Sales data is stored in a relational database, and Sales Target data is stored in an Excel
spreadsheet. To compare actual sales to target sales, you can now blend the data using
common dimensions to access the Sales Target measure. Primary and secondary data
sources are the two sources used in data blending. A left join is formed between the primary
and secondary data sources, with all data rows from the primary and matching data rows
from the secondary data source.
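The left-join behaviour of blending can be illustrated outside Tableau with a minimal Python sketch. It is not Tableau's implementation. The data, the field names (`region`, `sales`, `target`) and the `blend` helper are hypothetical and chosen only to mirror the Sales and Sales Target scenario above.

```python
from collections import defaultdict

# Hypothetical primary source: actual sales rows (e.g. from a relational database).
sales = [
    {"region": "East", "sales": 120},
    {"region": "East", "sales": 80},
    {"region": "West", "sales": 150},
    {"region": "North", "sales": 60},
]

# Hypothetical secondary source: sales targets (e.g. from an Excel sheet).
targets = [
    {"region": "East", "target": 180},
    {"region": "West", "target": 200},
    {"region": "South", "target": 90},  # no matching primary rows, so it is dropped
]


def blend(primary, secondary, key, primary_measure, secondary_measure):
    """Aggregate each source by the common dimension, then left-join them."""
    primary_totals = defaultdict(float)
    for row in primary:
        primary_totals[row[key]] += row[primary_measure]

    secondary_totals = defaultdict(float)
    for row in secondary:
        secondary_totals[row[key]] += row[secondary_measure]

    blended = []
    for value, total in primary_totals.items():
        blended.append({
            key: value,
            primary_measure: total,
            # None marks a primary value with no match in the secondary source.
            secondary_measure: secondary_totals.get(value),
        })
    return blended


result = blend(sales, targets, "region", "sales", "target")
for row in result:
    print(row)
```

Every region from the primary source appears in the result, a region without a target gets an empty value, and a region that exists only in the secondary source does not appear at all.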
In addition to this, you can make a folder for a certain kind of work. For instance, you can
make a folder for time-related information.
Self-Assessment Questions - 3
5. What Type of Join Is Used in Blending?
You can now drag several measures and dimensions (attributes) to the axis in Columns or
Rows.
When you select an attribute on an axis, Tableau recommends from the Show Me options the
type of visual that best fits the data, whether it is a table, pie chart, bar graph, etc.
Here you can select Color from the default properties to change the color of the visuals.
You can select an aggregation type (min, max, sum, or count) for the visuals.
You can add labels by dragging the specific attributes to the Text mark for the visuals.
Combine Data Sources
Tableau can connect to multiple data sources at once. For example, you can define multiple
connections in a single workbook to connect to a flat file and a relational source. This is used
in data blending, which is a very unique Tableau feature.
• Column Alias
Each column in the data source can be given an alias, which aids in understanding the
column's nature.
8. Explain view in tableau and which type of views are supported in tableau
A custom data view is used to supplement standard data views with additional features,
allowing the view to provide different types of charts for the same underlying data. You can,
for example, drill down a dimension field that is part of a pre-defined hierarchy to obtain
additional values of the measures at a different granularity. The following are some of the
most commonly used and important custom data views provided by Tableau.
2. Swapping Dimensions
By swapping the positions of the dimensions, you can create a new view from an existing
one. This has no effect on the values of the measures, but it does change their position.
Consider a view for analyzing Profit for each year for each segment and product category.
You can drag the vertical line at the end of the category column to the segment column by
clicking and dragging it. The following screenshot depicts this action.
You can view the map for different regions here and use the Show Me option to change the
visual type.
By clicking on the label and selecting "Show Mark Label" you may alter the label's size and
color as desired.
By clicking on the color and selecting "font and shading" you may alter the font's type and
color as desired.
By clicking on the filter and selecting Top you may keep only the top 3 or 5 values for the
visual, according to a condition.
5. CASE STUDIES
J.P. Morgan Chase & Co. is a leading multinational bank and financial services company based
in the USA. It is the largest investment bank in the US and sixth largest in the world.
The challenge
The amount of data produced by the company increased with increasing growth and
expansion of business of the firm because of successful mergers and acquisitions. This led to
the need for a robust self-service data governance and analytics solution. Initially, JPMC had
an IT-owned analytics set-up which they planned to change into a business-owned system.
The IT department used standard tools like Excel and SQL Server for data analytics and
reporting. But these tools eventually proved inefficient as they caused confusion and
obscurity in data governance due to data replication.
After this, the company switched to BI tools Cognos and Business Objects, but those tools too
were not able to meet the company’s requirement. What the company needed was a data
governance tool taking care of business aspects like data access, data analysis, IT governance,
and business priorities of the team.
Implementation
JPMC deployed Tableau as its core self-service data governance tool at an enterprise level.
JPMC with the help of COE (The Center of Excellence) team facilitated the process of Tableau
adoption to their users by recruiting a team of 8 trainers who trained 1200 new developers
and analysts to work on Tableau. The training program involved online as well as classroom
learning sessions. The initial user base was 400 Tableau Server users in 2011 which has
grown into a family of 30,000 users today.
Skilled Tableau business users work in teams across different departments and sectors of
JPMC. Currently, there are about 500 teams using Tableau for analytical and data governance
purposes.
The changes
Deploying Tableau for data governance in JP Morgan Chase was a successful step as it
brought many positive changes.
• The marketing operations team could analyze customer data to track customer journey
and preferences which helped them in deciding website design, promotional materials
and launching new products like Chase mobile app.
• Financial and branch managers used Tableau apps to analyze customer data in order
to provide better customer banking experience.
• Tableau has given self-service analytical capabilities to a whole lot of people taking care
of different operations such as traders, risk analysts, compliance team, operations
analysts, sales analysts, etc.
• JPMC was able to reduce manual reporting time. Earlier, the team took months to create
reports but with the help of Tableau, detailed reports were made in weeks. Tableau has
saved a lot of the company’s valuable time and has shifted the focus from report
generation to analyzing the reports, gaining meaningful insights into data and efficient
decision-making.
• Tableau has enabled JPMC to establish stronger customer relationships by integrating
customer’s data with the line of business aspects such as products, marketing, services
and creating common data sets. Thus, Tableau acts as a front-end tool to maintain
customer relations.
• The marketing teams at JPMC can analyze population data using Tableau to determine
optimal targets to launch new campaigns.
• Teams can work to ensure customer satisfaction by analyzing customer activities and
behavior using call center metrics and website analytics.
• JPMC’s retail branches also use Tableau dashboards to gain a better understanding of
the market and improve their business.
• Tableau successfully created a bridge between IT and business by providing apps and
dashboards for risk analysis and compliance data usage. It also functions as per the
government regulations.
6. TERMINAL QUESTIONS
1. How to extract Data in Tableau?
Tableau creates a subset of data from the data source using data extraction. By implementing
filters, this is helpful in boosting performance. Additionally, it aids in adding Tableau features
to data that may not be present in the data source, such as identifying different values in the
data. The data extract tool is typically used, though, to produce an extract that will be saved
to the local drive for offline access by Tableau.
By selecting Data > Extract Data from the menu, data can be extracted. It offers a variety of
options, including imposing constraints on the number of rows to be extracted and choosing
whether to aggregate data for dimensions. The Extract Data option is displayed on the
following screen.
You can design filters that return only the pertinent rows if you want to extract a subset
of data from the data source. As an example, create an extract using the Sample Superstore
data set. Select the list under the filter option, then check the box next to the value you
want to pull from the source when the data is extracted.
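The effect of an extract filter combined with a row limit can be pictured with a small Python sketch. It only illustrates the idea of keeping a filtered, size-limited local copy and does not reproduce Tableau's extract format. The rows, field names, and the helpers `make_extract` and `save_extract` are hypothetical.

```python
import csv
import io

# Hypothetical source rows standing in for a table in the data source.
source_rows = [
    {"category": "Furniture", "region": "East", "sales": 250.0},
    {"category": "Technology", "region": "West", "sales": 400.0},
    {"category": "Furniture", "region": "West", "sales": 100.0},
    {"category": "Office Supplies", "region": "East", "sales": 75.0},
    {"category": "Furniture", "region": "South", "sales": 310.0},
]


def make_extract(rows, keep_values, field, max_rows=None):
    """Keep rows whose `field` is in `keep_values`, optionally capped at `max_rows`."""
    filtered = [row for row in rows if row[field] in keep_values]
    return filtered if max_rows is None else filtered[:max_rows]


def save_extract(rows, handle):
    """Write the extract as CSV so it can be read later without the source."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Filter on one checked value and limit the extract to the first two rows.
extract = make_extract(source_rows, {"Furniture"}, "category", max_rows=2)

buffer = io.StringIO()  # stands in for a file on the local drive
save_extract(extract, buffer)
print(buffer.getvalue())
```

Because the extract is smaller than the source, later analysis touches fewer rows, which is the performance benefit described above.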
When Tableau connects to a data source, it displays all the tables and columns that are
present in the source. To check the metadata, consider the source "Sample Coffee shop."
Select "Connect to a data source" from the Data menu. Search for the "Sample - Coffee shop"
MS Access file. Drag the Product table onto the data canvas. The following screen, which
displays the column names and their data types, appears after selecting the file.
7. In addition to this, you can make a folder for a certain kind of work. For instance, you
can make a folder for time-related information.
7. ANSWERS
Self-Assessment Questions
1. Data blending
2. twbx
3. YES
4. No
5. Left join
8. SUMMARY
Tableau can access all of the most popular data sources. Tableau's native connectors can
connect to the following types of data sources:
• File systems, such as Excel, CSV, and others.
• Relational databases, such as Oracle, SQL Server, DB2, and others.
• Cloud systems, such as Windows Azure, Google BigQuery, and others.
9. GLOSSARY
• Visualization: finding insight from data
• Import data: Load the data into tableau for processing
• Replace Data: Change the data source
11. REFERENCE
• Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master
• Tableau Your Data! Fast and Easy Visual Analysis with Tableau Software
Unit 8
Mapping, Sorting & Filters
Table of Contents
1 Introduction
1.1 Objectives
2 Latitude and Longitude (SAQ 1)
3 Steps to simple map Geographical data with Tableau (Fig. 1-5, SAQ 2)
4 Steps to format maps (Fig. 6-9, SAQ 3)
5 Method to use custom geocoding feature within Tableau (Fig. 10-14, SAQ 4)
6 Data Blending
7 Sorting (Fig. 15-24, SAQ 5)
8 Sorting across multiple dimensions (Fig. 25-34, SAQ 6)
9 Steps to create and use filters (Fig. 35-40)
10 Quick Filters (Fig. 41-42, SAQ 7)
11 Important points while using Filters (Fig. 43-47)
12 Self Assessment Questions (SAQ 8)
13 Summary
14 Glossary
15 Caselet
16 References
17 Conceptual Map (Fig. 48)
1. INTRODUCTION
Maps are a powerful Tableau feature that can show geographical data together with the
data insights at a single glance. Plotting any data requires coordinate points. The
coordinates are chosen such that one of the numbers represents the vertical
position and the other number represents the horizontal position. A common choice of
coordinates for mapping is latitude, longitude and elevation. To specify a location on a
two-dimensional map, a map projection within Tableau is required. The map image provides the
background and the coordinates are plotted on top of it. This chapter also explains how to
exclude certain values or a range of values for any particular field, and how to sort data.
SELF-ASSESSMENT QUESTIONS – 1
Step 2: The available geographic roles are Area Code, CBSA, City, Congressional District,
Country, County, State, and Zip Code. Choose the role that matches the content in the field.
To plot the states, double click on the State geo field.
Step 3: Tableau will plot this on a map by automatically placing the generated Longitude and
Latitude fields on the Columns and Rows shelves respectively. Alternatively, place these fields
manually by dragging Longitude and Latitude from the Measures area to the Columns and
Rows shelves. On the bottom right there is a warning indicating 16 unknown locations. Double
click on this warning.
Step 4: Tableau does not recognize these locations, so click on Edit Locations. The issue
arises because the default country is set to India, but our data set belongs to the US. Change
India to US and click OK. All of the states are now recognized and plotted on the map.
Step 5: By default, Tableau uses symbol maps for geographical data, with a circle mark
indicating each location. Change this to any other shape by choosing a different mark type. The
color can be changed as well. Drag Sales to the map. There is now a gradient, with the sales
value plotted onto the map. Switch the mark type to Filled Map. Label the states for easy
analysis by dragging and dropping State onto Label. Florida has the highest sales, while
Tennessee has lower sales.
SELF-ASSESSMENT QUESTIONS – 2
STEP 2: Change the map style to Dark, depending on how you would like to display the data
on your storyboard. Washout adds transparency, which you control by moving the
slider. Select whether to repeat the background. When the Repeat Background option is
selected, the background map may show the same area more than once. The map layers allow
you to mark points of interest. Choose to include the coastline or state borders.
STEP 3: Another useful feature these layers provide is the data layers. Tableau comes with
a set of predefined data layers that show census information. Choose the per capita
income data layer, select By State, and pick a color scheme. The data layer is added to the
map, along with a legend that explains the colors.
STEP 4: Change the filled map back to symbols, choose circles from the marks, and pick
Size from the shelf. The circle symbols represent sales, and per capita income is color
coded in the background. Change the color of the symbols for readability. Click on Sales, edit
colors, change it to sea orange and click Apply. The data is now much easier to read and
interpret.
SELF-ASSESSMENT QUESTIONS – 3
STEP 2: Using the list of store addresses, find the latitude and longitude for each
address. Multiple websites can help with this task, such as latlong.net. The result is a CSV
file with the stores and their latitude and longitude. Note that the file must be saved with
the .csv extension. The new role, that is the geocode, should be the column header.
STEP 3: The Latitude and Longitude column names must be spelled correctly, and note that
they are case sensitive. Also make sure to include at least one decimal place
when specifying the values for latitude and longitude. If there is already an existing
geographical hierarchy within Tableau, make sure that your import file contains the
columns for each level in the existing hierarchy. For example, if there is an existing
hierarchy with country, state, city, the CSV file must contain the existing
hierarchy and then the new geographical role that is being added. Tableau has a set of
built-in hierarchies, and the columns must follow their order in your import file. To add a new
hierarchy in Tableau, create multiple import files, each file representing a level in the new
hierarchy. For example, let's say you have revenue by train stations. The sales will need to
be organized by station, country, region and city.
STEP 4: To create a new hierarchy from these geographical roles, you will need to create
multiple import files, each representing a level in the new hierarchy. In this example, a
new role is added to an existing hierarchy. Import the CSV file into Tableau: click on Map,
choose Geocoding and then Import Custom Geocoding. Choose the
folder that contains the CSV file for your new geographical role. In this case, our CSV file is
located within the CSV subfolder within the Datasets main folder.
STEP 5: Note that all the files within the CSV folder are imported, so make
sure that the folder contains only the files relevant to the workbook. Click on
Import and the custom geocoding data is imported into the workbook. This takes a couple
of minutes, and once the process is complete the new geographical role becomes available.
To assign the newly created geographic role to the field,
right click on the field and select the role you want to assign. The new role that you've created
is Store Address. Map this data using Tableau by double clicking on Store Address.
STEP 6: The data automatically has been plotted using the latitude and longitude that has
been input in the CSV file. Add sales and the size of the circle represents the sales for the
exact store locations. Doing this kind of mapping is particularly useful for analysis to either
open or close existing store locations based on the sales demand or demographics around
the area.
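The import file described in the steps above could be prepared with a short Python sketch like the following. The store rows and coordinates are made-up illustrations. The hierarchy column names (Country, State, City) follow the wording of this text and are assumptions, so check them against the role names shown in your Tableau version. `format_coordinate` and `write_geocoding_csv` are hypothetical helpers.

```python
import csv
import io

# Hypothetical store locations; the coordinates are illustrative values only.
stores = [
    ("United States", "Florida", "Miami", "100 Example Ave", 25.7617, -80.1918),
    ("United States", "Tennessee", "Nashville", "200 Sample St", 36.1627, -86.7816),
    ("United States", "Texas", "Austin", "300 Demo Blvd", 30, -97),
]

# Existing hierarchy levels first, then the new role, then the coordinates.
# "Latitude" and "Longitude" are spelled and capitalized exactly as required.
HEADER = ["Country", "State", "City", "Store Address", "Latitude", "Longitude"]


def format_coordinate(value):
    """Render with four decimals, which satisfies the one-decimal-place rule."""
    return f"{float(value):.4f}"


def write_geocoding_csv(rows, handle):
    """Write the header and one validated row per store."""
    writer = csv.writer(handle)
    writer.writerow(HEADER)
    for country, state, city, address, lat, lon in rows:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinate out of range for {address!r}")
        writer.writerow([country, state, city, address,
                         format_coordinate(lat), format_coordinate(lon)])


buffer = io.StringIO()  # in practice, open a .csv file in its own folder
write_geocoding_csv(stores, buffer)
print(buffer.getvalue())
```

Formatting every coordinate through one helper avoids whole-number values such as 30 being written without a decimal place.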
SELF-ASSESSMENT QUESTIONS – 4
6. DATA BLENDING
Another option in Tableau to map location data that cannot be automatically geocoded
is data blending. Data blending works well if you're adding a single
level of geographical information with a latitude and a longitude. You can use any data
source, unlike custom geocoding where you can only use text files. However, data blending
will not allow you to add these as new roles or create new geographical hierarchies, nor will
it let you reuse them in other workbooks. These are only possible
with the Custom Geocoding option.
7. SORTING
In Tableau, you can either start analyzing the data right away or start exploring by asking
questions of your data and finding answers to them. There are multiple ways to sort data.
STEP 1: The easiest way is to click on the Quick Sort option on the axis. Hovering over the
axis labels brings up the Sort icon. A single click sorts the bars in descending order and a
second click switches to ascending order. To clear the sort, click again and the original state
is restored. There is also an option on the Sort menu to clear sorts.
Here Profit is not on the axis, so there is no Quick Sort option. Right click on the pill
directly to sort this measure. Choose a manual sort such that 70,000 appears first.
STEP 2: Use the same approach as the Quick Sort and directly use the Sort option next to the
dimension. The first click sorts in descending order, the second in ascending order, and the
third clears the sort. Note that continuous pills can be sorted using the Quick Sort option, but
do not have a Sort option in the pill drop-down.
STEP 3: Discrete pills do have a Sort option in the pill drop-down. Quick Sort is limited in
the flexibility it offers, whereas the drop-down provides the full set of sorting options. Click
on the pill to open the drop-down menu and click on Sort. You can sort alphabetically,
manually, or by a particular field.
STEP 4: Choose Profit here for the category Sort. The items in the view will be sorted by the
profit values, even though Profit is not present in the view. Change the aggregation type here
and specify the sort order. Click on Apply. The category has been sorted by the profit values
accordingly. Drag the category up or down by dragging the headers in the bar chart.
STEP 5: Items in a legend can also be reordered. Color the bars by category and drag the
headers in the legend directly to order them. Move Dining to the first position and Handbags
to the second. The bars are reordered accordingly.
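Sorting a dimension by a field that is not shown in the view, as in STEP 4, can be sketched in Python as follows. The figures and the `aggregate` helper are hypothetical. The sketch only shows the logic of ordering one aggregated measure by another, not how Tableau performs it.

```python
from collections import defaultdict

# Hypothetical rows; category names echo the ones used in the steps above.
rows = [
    {"category": "Handbags", "sales": 500, "profit": 40},
    {"category": "Shoes", "sales": 300, "profit": 90},
    {"category": "Dining", "sales": 200, "profit": 70},
    {"category": "Shoes", "sales": 100, "profit": 10},
]


def aggregate(rows, dimension, measure):
    """Sum `measure` for each member of `dimension`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


sales_by_category = aggregate(rows, "category", "sales")
profit_by_category = aggregate(rows, "category", "profit")

# The view shows sales, but the order comes from total profit (descending).
ordered = sorted(sales_by_category, key=lambda c: profit_by_category[c], reverse=True)

for category in ordered:
    print(category, sales_by_category[category])
```

Changing the aggregation, for example from a sum to an average, would only require a different `aggregate` function. The sort step stays the same.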
STEP 2: Note that the sorting changes when the order of these pills changes. Click on Sales
to sort. This sorts sales in descending order. Note that for the Men's department, the sales
for Accessories are less than those for Active Wear, yet Accessories appears above Active
Wear in the view even though the view is sorted in descending order. This is because Tableau
takes into account sales across all departments for Accessories. As the sales for Accessories
are greater than the sales for Active Wear across all departments, it sorts the bars this way.
This is where a combined field might help.
STEP 3: Select Department and Category, right click and choose Create Combined Field.
Department and Category are combined into a single field. Drag
and drop this field to the view along with Sales. Sort the sales. The bars are sorted
accordingly.
STEP 4: The pills can also be sorted independently of each other. Sort Department by
Sales and sort Category by Profit. The view is now sorted by profit for Category
and by sales for Department.
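The difference between sorting a nested dimension and sorting a combined field can be reproduced with a few lines of Python. The numbers are hypothetical and chosen so that Accessories sells less than Active Wear within Men but more across all departments, as in the example above.

```python
# Hypothetical sales by department and category, already aggregated.
sales = {
    ("Men", "Accessories"): 100,
    ("Men", "Active Wear"): 150,
    ("Women", "Accessories"): 400,
    ("Women", "Active Wear"): 120,
}

# Sorting the category dimension alone ranks it by its total across departments.
category_totals = {}
for (department, category), amount in sales.items():
    category_totals[category] = category_totals.get(category, 0) + amount

category_order = sorted(category_totals, key=category_totals.get, reverse=True)

# Within each department the same category order is reused, so the bars
# inside "Men" are not in descending order of their own values.
dimension_sort = [
    (department, category)
    for department in ("Men", "Women")
    for category in category_order
]

# A combined field treats each (department, category) pair as one member,
# so sorting uses the value of that pair directly.
combined_sort = sorted(sales, key=sales.get, reverse=True)

print(dimension_sort)
print(combined_sort)
```

In `dimension_sort` the Men/Accessories bar (100) sits above Men/Active Wear (150), which is the surprising ordering described in STEP 2. In `combined_sort` every bar is ranked by its own value.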
SELF-ASSESSMENT QUESTIONS – 5
8. The complexity of the sorting algorithm measures the ________ as a function of the
number n of items to be sorted.
9. The complexity of bubble sort algorithm is__________
STEP 1: Filters help you to exclude certain values or a range of values for any particular field
within your view. The easiest way to add a filter is to drag and drop a dimension
onto the Filters shelf. Drag and drop Department, select Men and Women and click Apply. The
view has been filtered for Men and Women.
STEP 2: Take a deeper look at the various options presented while filtering either discrete
or continuous dimensions. Drag Category to the Filters shelf. A dialog box opens with various
tabs. The first tab lists the values that can be filtered on. Select all, or pick the values
that you are interested in. There is also a search option, which is useful when the list of
values is too long to scroll through. Filter for Handbags, Luggage and Shoes and click Apply.
STEP 3: The second tab is Wildcard. The Wildcard feature is very useful
when you want to include or exclude items that match a certain
pattern, for example to find all corporate contact emails or to eliminate personal emails such
as Gmail or Hotmail from your list. Enter a wildcard
pattern that matches the category Shoes and click Apply. The view is
filtered for just Shoes because that is the string that matches the wildcard pattern that was
entered. There is also an option to exclude.
STEP 4: Choose Exclude, and instead of returning the values that match the pattern, the
filter keeps them from appearing in the view. Click Apply. The view is filtered
for Luggage and Handbags, and Shoes no longer appears.
STEP 5: Filter for items that satisfy a particular criterion. List all categories that have
sales greater than 1.5 million. Pick Sales and enter sales greater than 1.5 million as the
criterion. Click Apply. The view returns only those categories that have
sales greater than 1.5 million. In this case, Handbags has a sales amount of less than
1.5 million and hence no longer appears in the view.
STEP 6: The Load button here helps to bring in the range of values that the field contains,
and a formula option here helps you to use calculations. The next tab option is top. So, Filter
also allows us to filter based on a rank for either top, end or bottom end of a field. Click apply
STEP 7: The view as filtered for the top five states by the Sales amount. Look at filtering for
measures or continuous dimensions. Drag a measure in this view and drag Sales. Measure as
a filter. Drag a measure specifically choose the aggregation type. Choose Sum and click on
Next. Specify the range, indicate a lower cut off and upper cut off. The default limits are in
the database, in this case, the relevant values for the particular field value. In this case I want
to restrict this to 5 million and click Apply
STEP 8: Going back to the options, At Least and At Most let us specify just a lower or just
an upper limit. There is also a Special option that lets us filter on null or non-null
values. When a date field is dragged to the Filters shelf, a menu pops up asking how to
filter the date. Pick a relative date or a range of dates, pick the years, quarters, or
months of the date, or pick specific dates.
STEP 9: Filtering on date parts works just like filtering on dimensions, so focus on the
relative date and the range of dates. Choose Relative Date and click Next. As the name suggests,
Relative Date lets you pick periods such as the last three years, or set ranges such as month
to date. By default the anchor is dynamic and is set to the current date. You can change
it at any time to a static date of your choice.
STEP 10: Pick the last one year of data and click Apply. The view refreshes to bring back
data for just the one year prior to 2015. The range of dates works the same way as
ranges work for measures: specify a start date and an end date.
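The filter types walked through in the steps above (wildcard include and exclude, a condition on an aggregated value, and top N) can be sketched outside Tableau to make the logic explicit. This is only an illustrative sketch in plain Python: the category totals and the function names are hypothetical and are not taken from the unit's data set or from Tableau itself.

```python
# Hypothetical category totals; the figures are illustrative only.
sales_by_category = {
    "Shoes": 2_400_000,
    "Luggage": 1_800_000,
    "Handbags": 1_200_000,
}


def wildcard_filter(members, pattern, exclude=False):
    """Keep members containing the pattern, or drop them when exclude=True."""
    def matches(member):
        return pattern.lower() in member.lower()
    return [m for m in members if matches(m) != exclude]


def condition_filter(totals, threshold):
    """Keep members whose aggregated value is greater than the threshold."""
    return {name: value for name, value in totals.items() if value > threshold}


def top_n_filter(totals, n):
    """Keep the n members with the largest aggregated value."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


shoes_only = wildcard_filter(sales_by_category, "shoe")
without_shoes = wildcard_filter(sales_by_category, "shoe", exclude=True)
over_threshold = condition_filter(sales_by_category, 1_500_000)
top_two = top_n_filter(sales_by_category, 2)
```

With these assumed totals, the condition filter drops Handbags because its total is below 1.5 million, which mirrors the behaviour described in STEP 5.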
SELF-ASSESSMENT QUESTIONS – 6
10. ________________ in business intelligence allows huge data and reports to be read in a
single graphical interface.
11. ______________ function is used to create a horizontal bar chart.
9. QUICK FILTERS
STEP 1: Quick filters are a great way to add interactivity to your views. Dragging a field to
the Filters shelf is the easiest way to filter. To show a quick filter, right-click the
dimension, in this case the Department field, and click Show Filter. The quick filter is added
to the right side of the view.
STEP 2: To edit the filter's appearance or how it works, click the arrow within the quick
filter. A menu pops up. There is an option to apply the filter to either the
current worksheet or all worksheets. Format the filter to modify the font size, colors, and
mode.
STEP 3: The Customize option can be used to show or hide the All value, show or hide the
Search button, or include an Apply button. Click Show Apply Button. An Apply button now
appears on the filter. There is also an option to hide or show the title, or to edit it.
Click Edit Title.
STEP 4: Edit the title of this filter to Select Department. The title now reads Select
Department. There is an option to customize the display as a single value list, a
drop-down, a slider, or a multiple value list. There is also an option to use a wildcard
match display. Right at the end, there is an option to display either only relevant values or
all values in the database.
STEP 5: Cascading filters are a set of filters in which the contents of one quick filter are
affected by the selection in a previous filter. For example, choose a City within a particular
State. Select a state and then click the City drop-down. A long list appears, making it
difficult for the user to search and select. Only the relevant cities should show up, so
choose Only Relevant Values from the drop-down.
Likewise, choose Only Relevant Values for the Department filter drop-down and for the Category
filter.
STEP 6: Filter Department for just Men and Women. Categories for the Home department
no longer appear in the Category filter, as only relevant values are chosen. Cascading
filters of this kind are a great way to tidy up long lists of values, to make the view
intuitive for the end user, and to make the views interactive. One thing to note, however, is
that performance might be an issue, given that queries have to go back to the database to pull
the relevant data, in this case the categories. When using quick filters in a dashboard or a
story, the placement depends on the space available, and the usage should
be in line with the actual purpose of the dashboard.
STEP 2: Keep Only filters the view down to the selected marks. The same technique can be used
by clicking on a header to filter for just that header value. Removing the filter is as easy
as dragging the pill off the shelf.
SELF-ASSESSMENT QUESTIONS – 7
STEP 2: Tableau now filters out all of the data or marks in the view at the city level
that do not meet this criterion. Many more marks are seen in the view although the filter
value is the same. This is because an aggregated filter is used here, and hence Tableau filters
at the lowest granularity in the view, which is the city level. It is also possible to filter
at the data source level. To create a data source filter, right-click the data source and edit
the data source filter.
STEP 3: Note that these filters do not appear on the Filters shelf, as they are applied to
the data source and hence filter data in all sheets rather than in one particular sheet.
There is one more type of filter, called a context filter.
In general, the filters on the Filters shelf in Tableau are completely independent of each
other. Suppose you want to apply filters in a particular order, for example to filter for all
items that have a sales value greater than 50000 in the state of California. First apply the
condition filter for sales: drag Product ID to the Filters shelf and filter for sales greater
than 50000.
STEP 4: Drag State to the Filters shelf and filter for California. Items with sales of less
than 50000 also show up, although they were supposedly filtered out in the previous step. This
is because the condition filter is applied before the state filter: it keeps every item whose
total sales across all states exceed 50000, and the view then shows only the California sales
of those items. This is where a context filter can help. It is logically applied
before any other filters that are present. A context filter forces each query to also use
a subquery on the items in context. In this case, apply a context filter for the
state of California.
STEP 5: The second condition, sales greater than 50000, is now applied within the context.
Therefore only items that have sales greater than 50000 for California appear in the view.
Context filters are shown in gray, and any subsequent filters are applied on top of the
context. Although context filters are applied before any other dimension or measure filter,
it is important to note that extract filters and data source filters, if used, are executed
before context filters.
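To see why the order of filters changes the result, the California example above can be reproduced on a few rows. The sketch below is plain Python with hypothetical rows and function names; it only imitates the logical order of the two filters and says nothing about how Tableau generates its queries.

```python
# Hypothetical rows: (product_id, state, sales). Values are illustrative only.
rows = [
    ("P1", "California", 30_000),
    ("P1", "Texas", 40_000),
    ("P2", "California", 60_000),
    ("P3", "Texas", 90_000),
]


def sum_by_product(records):
    """Total sales per product for the given records."""
    totals = {}
    for product, _state, sales in records:
        totals[product] = totals.get(product, 0) + sales
    return totals


def without_context(records, state, threshold):
    # Condition filter first: products qualify on sales across ALL states.
    qualifying = {p for p, s in sum_by_product(records).items() if s > threshold}
    # The state filter is applied afterwards, so the view shows state-only sales.
    kept = [r for r in records if r[0] in qualifying and r[1] == state]
    return sum_by_product(kept)


def with_context(records, state, threshold):
    # Context filter first: restrict the data to the chosen state.
    in_state = [r for r in records if r[1] == state]
    # The condition is then evaluated on the reduced data only.
    return {p: s for p, s in sum_by_product(in_state).items() if s > threshold}


no_context_view = without_context(rows, "California", 50_000)
context_view = with_context(rows, "California", 50_000)
```

In this assumed data, product P1 passes the condition on its combined sales of 70000 but sells only 30000 in California, so it appears in the first result and disappears once the state filter is treated as the context.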
SELF-ASSESSMENT QUESTIONS – 8
11. barh( y )
12. box and whisker plot
13. 2 dimensional data structure
14. Calculated Fields
Sets: Sets are used to compute a condition on which the dataset will be prepared. Data
will be grouped together based on a condition. Fields that are responsible for the grouping
are known as sets. For example, students having grades of more than 70%.
13. SUMMARY
Let us recapitulate the important concepts discussed in this unit:
• To analyze your data geographically, plot your data on a map in Tableau.
• The unit explains when and why you should use a map to visualize your data.
• Based on a measure used in the view, items can be sorted in a table.
• In a table, sorting can reveal relationships between dimensions by controlling the order
in which they appear.
• By using Tableau filters, you can minimize the size of the data, clean up underlying data,
remove irrelevant dimension members, and set measures or date ranges.
14. GLOSSARY
Let us have an overview of the important terms mentioned in the unit:
Latitude: It is the angular distance in degrees, minutes and seconds of a point north or south
of the equator.
Data Blending: Used in Tableau to map location data that cannot be automatically geocoded.
Filtering: Filtering involves deciding what should be kept in and what should be excluded from
a view, for example by category, date range, location, or a minimum value.
15. CASELET
1. Consider a dashboard that shows order quantity, average sales, and average profit
for customers. There are three views in it. In each view, a different data source is used
as the primary data source, but they all share one field: Customer Name. Filter the views
by Customer Name. This is an interesting dashboard with a lot of useful information, but
you might want to update all of the views in the dashboard at the same time for the
customer you are analyzing. For example, maybe you want to see the average sales,
profit, and number of orders you have received from one of your customers, Aaron
Riggs. To do so, filter all three data sources on the Customer Name field.
16. REFERENCES
References and Suggested Reading
1. Joshua N. Milligan, Learning Tableau 2022: Create effective data visualizations, build
interactive visual analytics, and improve your data storytelling capabilities, 5th
Edition.
2. Visual Analytics with Tableau, Paperback, 31 May 2019.
Fig 8.48
Unit 9
Other Features
Table of Contents
1 Introduction
1.1 Learning Objectives
2 Aggregating dimensionalities
3 Steps to calculate aggregating dimensionalities
4 INCLUDE level of detail expressions
5 Steps to calculate level of detail expression
6 EXCLUDE level of detail expression
7 Nested LOD
8 Summary
9 Glossary
10 Caselet
11 Conceptual Map
1. INTRODUCTION
Tableau Groups are sets of multiple members combined into a single dimension for the
purpose of creating a higher level dimension. Grouping single-dimensional members in
Tableau automatically creates a new dimension with the group name at the end. The original
dimension of the members is not altered by Tableau. Group is used to combine members
present in a field.
2. GROUPS
A group is used to combine members in a field. For example, a group can aggregate the values
of 'Furniture' and 'Office Supplies', and the combined value can then be shown in visuals.
The following procedure groups data in Tableau.
Step 2: The 'Create Group' window opens. Type a name for the group.
Select the members to be grouped and click the 'Group' button.
Step 3: The Edit Group window shows the group created from 'Furniture' and 'Office
Supplies'. Click OK to create the group.
Step 4: A group named Category (group) is created and added to the dimension
list. It can be used to visualize the grouped members of the field.
The following image shows the result of creating the group. The sum of sales is
visualized for Furniture and Office Supplies combined as a group.
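The effect of a group can be sketched in a few lines of Python. This is only an illustration of the idea, assuming hypothetical sales rows and a hypothetical group name; it leaves the original category values untouched and derives a new grouped field from them, as Tableau does.

```python
# Hypothetical (category, sales) rows; values are illustrative only.
rows = [
    ("Furniture", 100.0),
    ("Office Supplies", 250.0),
    ("Technology", 400.0),
    ("Furniture", 50.0),
]

# Members combined into one group, and an assumed name for that group.
group_members = {"Furniture", "Office Supplies"}
group_name = "Furniture & Office Supplies"


def category_group(category):
    """Return the group name for grouped members, else the original member."""
    return group_name if category in group_members else category


# SUM(Sales) by the new grouped dimension.
sales_by_group = {}
for category, sales in rows:
    key = category_group(category)
    sales_by_group[key] = sales_by_group.get(key, 0.0) + sales
```

Members that are not part of the group, such as Technology here, keep their own value in the new dimension.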
SELF-ASSESSMENT QUESTIONS – 1
Answers
1. Table calculation
2. Binomial trend line
Step 1: In the Data pane, right-click the field on which you want to base a parameter and then
select Create -> Parameter.
Step 2: Give the parameter a name and, optionally, a comment that explains it.
Step 4: Provide a current value, which will also be the default value of the parameter.
Step 6: Based on the option selected for 'Allowable Values', provide the values for
the parameter.
Step 7: When all the above steps are completed, click OK. The newly
created parameter is listed in the Parameters section at the bottom of the Data pane.
Using parameters in calculations is as simple as dragging them from the Data pane and
dropping them on the calculation editor, either replacing a particular part of the formula or
at a new location. Right-click the first parameter and click Duplicate to create a new one
with the same configuration, then rename it. There are now two parameters, named
Placeholder 1 Selector and Placeholder 2 Selector. From the Analysis menu, select Create
Calculated Field to open the calculation editor, define the calculated field, and click OK to
close the editor. Then create a duplicate calculated field with the same configuration for
the second parameter.
After the previous steps, it is simple to set up the view, as we can drag and drop these
calculated fields onto the shelves. Drag Placeholder 2 to Columns and Placeholder 1 to
Rows. Drag the Customer Name field to Detail and the Region field to Color. Right-click each
parameter in the Parameters section of the Data pane and select "Show Parameter Control".
The Tableau Desktop view is now ready. The parameter controls let you select the data that
will be displayed on the X and Y axes, so different combinations of measures can be explored
by changing the parameters.
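The placeholder technique above amounts to a lookup: each parameter holds the name of a measure, and the calculated field returns that measure's value. The unit does not show the formula itself, so the following Python sketch is an assumption about its shape; the records, the allowed measure names, and the function names are all hypothetical.

```python
# Hypothetical customer records; values are illustrative only.
records = [
    {"Customer Name": "A", "Sales": 120.0, "Profit": 30.0, "Quantity": 4},
    {"Customer Name": "B", "Sales": 80.0, "Profit": -5.0, "Quantity": 2},
]

# Assumed list of allowable values for both selector parameters.
ALLOWED_MEASURES = ("Sales", "Profit", "Quantity")


def placeholder(record, selector):
    """Return the measure named by the selector, like a parameter-driven field."""
    if selector not in ALLOWED_MEASURES:
        raise ValueError(f"unknown measure: {selector}")
    return record[selector]


def build_points(data, x_selector, y_selector):
    """Build (x, y) pairs for a scatter view from the two selectors."""
    return [(placeholder(r, x_selector), placeholder(r, y_selector)) for r in data]


points = build_points(records, "Sales", "Profit")
swapped = build_points(records, "Quantity", "Sales")
```

Changing either selector changes the axis without rebuilding the view, which is the purpose of the parameter controls.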
SELF-ASSESSMENT QUESTIONS – 2
Answers
3. Green colour
4. 5.03%
4. SETS
There are two types of sets: dynamic sets and fixed sets. The members of a dynamic set
change when the underlying data changes. Dynamic sets can only be based on a single
dimension.
Step 2: Configure the set in the Create Set dialog box. The following tabs can be used to
configure your set:
Step 3: General: Use the General tab to choose one or more values that will be considered
when computing the set.
Step 4: Alternatively, select the Use all option to always consider all members, even when
members are added or removed.
Step 5: Condition: Use the Condition tab to define rules that determine which members to
include in the set. For instance, specify a condition based on total sales that
only includes products with sales greater than $100,000.
Step 6: Top: Use the Top tab to define limits on what members to include in the set. For
example, specify a limit that is based on total sales that only includes the top 5 products
based on their sales. When finished, click OK.
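The Condition and Top tabs described above correspond to two simple rules for deciding set membership. A minimal Python sketch follows, assuming hypothetical product totals and function names; because the rules are re-evaluated against the data, it behaves like a dynamic set.

```python
# Hypothetical total sales per product; values are illustrative only.
product_sales = {
    "A": 150_000,
    "B": 90_000,
    "C": 210_000,
    "D": 120_000,
    "E": 40_000,
    "F": 300_000,
    "G": 95_000,
}


def condition_set(totals, threshold):
    """Members whose total sales are greater than the threshold."""
    return {name for name, total in totals.items() if total > threshold}


def top_n_set(totals, n):
    """The n members with the highest total sales."""
    ranked = sorted(totals, key=totals.get, reverse=True)
    return set(ranked[:n])


over_100k = condition_set(product_sales, 100_000)
top_five = top_n_set(product_sales, 5)

# In/out membership, which is how a set partitions a dimension in a view.
in_top_five = {name: name in top_five for name in product_sales}
```

With these assumed figures the two rules give different sets: product G is among the top five by sales but does not meet the $100,000 condition.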
Step 2: The members listed in the dialog box are included in the set by default. You can
instead choose to exclude these members; when you exclude, the set consists of all the members
that were not selected. To remove any dimensions that should not be considered, click the red
"x" icon that appears when you hover over a column heading. To remove specific rows that
should not be part of the set, click the red "x" icon that appears when you hover over the
row.
SELF-ASSESSMENT QUESTIONS – 3
Answers
5. Microsoft
6. Sales Force
5. TRENDS
Trend lines are used to estimate whether a certain trend in a variable will continue. By
observing the trend in two variables at once, it is also possible to identify the correlation
between them. There are numerous mathematical models for drawing trend lines.
Tableau offers five model types: Linear, Logarithmic, Exponential, Power, and Polynomial.
To construct a trend line, Tableau needs a time dimension and a measure field.
Step 1: Drag the measure Sales to the Rows shelf and the dimension Order Date to the
Columns shelf. Select a line chart as the chart type. Navigate to Trend Lines under the
Analysis menu. The option to add several types of trend lines appears when the Trend Line
option is clicked. Select the linear model as displayed in the screenshot below.
Step 2: After completing the above procedure, the trend lines are displayed.
The P-value and R-squared value are also displayed, along with the mathematical equation
that relates the fields.
Step 3: Right-click on the chart and select the option Describe Trend Line to get a detailed
description of the Trend Line chart. It shows the coefficients, intercept value, and the
equation. These details can also be copied to the clipboard and used in further analysis.
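The coefficients, intercept, and R-squared value reported for a linear trend line come from an ordinary least-squares fit. The sketch below shows that arithmetic in plain Python on hypothetical data; it is not Tableau's implementation, and it does not compute the P-value.

```python
def linear_trend(xs, ys):
    """Least-squares fit of y = slope * x + intercept, with R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical period index (e.g. month number) and sales values.
periods = [0, 1, 2, 3]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept, r_squared = linear_trend(periods, sales)
```

Dates have to be converted to numbers, such as a period index, before a fit like this can be computed.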
Step 2: From Dimensions, drag Genre to Rows to the right of the Year pill.
Step 4: Hover over the Worldwide Gross Amount axis, and click on the sort icon once to sort
the bars in descending order.
Step 6: Under Custom, drag Reference Line from the Analytics tab and drop it onto the Table
placeholder.
Step 9: Under Custom, drag Reference Line again from the Analytics tab and this time drop
it onto Pane.
SELF-ASSESSMENT QUESTIONS – 4
ANSWERS
7. Time
8. twbx
SUM([Profit])/SUM([Sales])
Formulas use a combination of functions, fields, and operators. When finished, click OK.
The new calculated field is added to the Data pane. If the new field computes quantitative
data, it is added to Measures. If it computes qualitative data, it is added to Dimensions.
You are now ready to use the calculated field in the view.
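The formula SUM([Profit])/SUM([Sales]) aggregates each field first and divides afterwards, which is not the same as averaging a ratio computed row by row. A small Python sketch with hypothetical rows shows the difference; the function names are illustrative only.

```python
# Hypothetical order rows; values are illustrative only.
rows = [
    {"Sales": 100.0, "Profit": 10.0},
    {"Sales": 300.0, "Profit": 90.0},
]


def profit_ratio(records):
    """Equivalent of SUM([Profit]) / SUM([Sales]) over the given rows."""
    total_profit = sum(r["Profit"] for r in records)
    total_sales = sum(r["Sales"] for r in records)
    return total_profit / total_sales


def average_of_row_ratios(records):
    """Row-level ratio averaged afterwards, shown only for comparison."""
    return sum(r["Profit"] / r["Sales"] for r in records) / len(records)


aggregate_ratio = profit_ratio(rows)
row_level_average = average_of_row_ratios(rows)
```

Because the aggregate form is recomputed for whatever rows are in each mark, the same calculated field gives a correct ratio at any level of detail in the view.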
SELF-ASSESSMENT QUESTIONS – 5
9. The icon associated with the field that has been grouped is a ________.
10. Tableau was introduced in the year of________
ANSWERS
9. paper clip
10. 2003
8. TABLE CALCULATIONS
Quick table calculations allow you to quickly apply a common table calculation to your
visualization using the most typical settings for that calculation type. This section
demonstrates how to apply a quick table calculation to a visualization using an example.
The following quick table calculations are available in Tableau for you to use:
• Running total
• Difference
• Percent difference
• Percent of total
• Rank
• Percentile
• Moving average
• YTD total
• Compound growth rate
• Year over year growth
• YTD growth
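Several of the calculations listed above are simple transformations of the values already in the table. The following Python sketch, using a hypothetical list of sales values in table order, shows four of them; it illustrates the arithmetic only and does not cover Tableau's addressing and partitioning options.

```python
from itertools import accumulate

# Hypothetical sales values in the order they appear in the table.
sales = [100.0, 150.0, 50.0, 200.0]

# Running total: cumulative sum along the table.
running_total = list(accumulate(sales))

# Difference: each value minus the previous one (no value for the first).
difference = [None] + [cur - prev for prev, cur in zip(sales, sales[1:])]

# Percent of total: each value divided by the sum of all values.
total = sum(sales)
percent_of_total = [value / total for value in sales]

# Rank (descending, competition style): 1 + number of strictly larger values.
rank = [1 + sum(other > value for other in sales) for value in sales]
```

All four operate only on values that are present in the table, which is the limitation of table calculations that level of detail expressions address.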
Step 1: Open Tableau Desktop and connect to the Sample-Superstore data source, which
comes with Tableau.
Step 3: From the Data pane, under Dimensions, drag Order Date to the Columns shelf.
Step 4: From the Data pane, under Dimensions, drag State to the Rows shelf.
Step 5: From the Data pane, under Measures, drag Sales to Text on the Marks Card.
Step 6: From the Data pane, under Measures, drag Profit to Color on the Marks Card.
Step 7: On the Marks card, click the Mark Type drop-down and select Square.
SELF-ASSESSMENT QUESTIONS – 6
ANSWERS
11. True
12. Calculated Fields
TERMINAL QUESTIONS
1. What is a Calculated Field, and How Will You Create One?
A calculated field is used to create new (modified) fields from existing data in the data source.
It can be used to create more robust visualizations and doesn’t affect the original dataset.
The data set considered here has information regarding order date and ship date for four
different regions. To create a calculated field:
Top Sales and profit can be clubbed together for different categories by creating a set:
1. Continuing with the above example of Sets, select the Bottom Customers set where
customer names are arranged based on profit.
2. Go to the ‘Groups’ tab and select the top five entries from the list.
3. Right-click and select create a group option.
4. Similarly, select the bottom five entries and create their group. Hide all the other
entries.
3. What is a Parameter in Tableau? Give an Example.
A parameter is a dynamic value that a user can select, and you can use it to replace
constant values in calculations, filters, and reference lines.
9. GLOSSARY
Let us have an overview of the important terms mentioned in the unit:
Group: A group is used to combine members in a field.
Sets: Sets are custom fields that define a subset of data based on some conditions.
10. CASELET
Lenovo designs, develops, manufactures, and markets products such as PCs, laptops, tablets,
mobile phones, and servers. Today, Lenovo operates in 160 countries.
The challenge
Creating reports in Excel was tiresome and required a team of 8 to 10 people to adapt them
for other divisions and regions. In addition, the analytics team spent six to seven hours
creating one weekly report, so creating 30 reports or more was an overwhelming task.
With Tableau, creating reports takes much less time than creating them manually.
Teams were able to deliver reports much faster, sometimes even reporting on a daily or hourly
basis.
The time saved is used to carry out analysis and draw insights from the information.
Implementation
Initially, Lenovo deployed an eight-core instance of Tableau Server which they quickly scaled
up to a 16-core server. The executives introduced Tableau in Lenovo India as a means to
analyze and govern data on a selected set of business use cases and scenarios to help in
decision making. But sooner than they realized, Tableau became an integral part of the
company’s functioning. The company experienced a cultural shift where the approach to
business and growth was more data-centric and data-driven.
The change
• Lenovo India’s BI Analytics & Visualization team created an interactive and flexible
Tableau sales dashboard for departments to use it for ad-hoc analysis and reporting.
• Lenovo has over 55,000 employees and a customer base spread across 160 plus
countries. More than 10,000 users access Tableau dashboards.
• Tableau has increased Lenovo’s efficiency by 95%, with approximately 3000 users
using it in about 28 countries by now.
• Lenovo’s e-commerce team was able to analyze customer engagement patterns to
improve brand perception and increase revenues.
• The human resource department converted 100 static reports into dynamic and
interactive Tableau dashboards which gave users and analysts a new perspective into
solving matters.
• The team is easily able to connect to data sources like Amazon Web Services and
Hortonworks Hadoop Hive. Along with this, a wholesome analysis is possible through
a dashboard where data is integrated from more than 30 data sources such as social
media, customer surveys, retailer websites, online shopping sites, etc.
• Lenovo supports self-service analytics where every user can conduct an individual
analysis on the set of data concerning their domain of activity and suiting their site
roles. Every Tableau user has identification credentials stored in the local identity store
of Tableau using which they can access and work on Tableau dashboards using single
sign-on process.
• Lenovo experienced lucrative growth in e-commerce by using Tableau to analyze
customer experience, fetching data from Lenovo’s unified customer intelligence
platform, LUCI Sky.
Unit 10
Level of Detail (LOD)
Table of Contents
1 Introduction
1.1 Learning Objectives
2 Aggregating dimensionalities
3 Steps to calculate aggregating dimensionalities
4 INCLUDE level of detail expressions
5 Steps to calculate level of detail expression
6 EXCLUDE level of detail expression
7 Nested LOD
8 Summary
9 Glossary
10 Caselet
11 Conceptual Map
1. INTRODUCTION
INCLUDE level of detail expressions compute values using the dimensions specified in the
formula in addition to whatever dimensions are already present in the view. They are useful
when you want to calculate at a finer level of detail in the database and then re-aggregate
and show the result at a coarser level in the view. Adding dimensions to or removing
dimensions from the view will change fields based on INCLUDE level of detail expressions.
2. AGGREGATING DIMENSIONALITIES
This section describes how to aggregate at dimensionalities other than the view level. Table
calculations can be used to roll data up to a higher level of aggregation, but this approach
is long-winded and also slows down performance. Additionally, table calculations are
limited to the values in the table or view. Level of detail expressions can help in this
situation. LOD, or level of detail, is a more recent addition to Tableau's capabilities that
can also calculate values for dimensions that do not appear in the view or table, using
simple formulas.
A table calculation is generated exclusively from the result of a query, whereas an LOD
expression is computed as part of the query sent to the database. How an LOD expression
behaves in Tableau therefore depends on the context. It is determined by the filters and the
level of detail in the view, that is, the dimensions on Rows, Columns, Color, Size, Detail,
and so on. An example would be State and the sum of Sales. If State is dropped onto a view,
the sum of sales gives the total of all transactions for each state. Now add Department to
this view. The sum of sales then gives the sales for each state by department. So the more
dimensions in the view, the more granular and less aggregated the results. Depending on the
dimensions present, the level of detail varies in the view. However, dimensions placed
on the Filters or Pages shelf do not change the level of detail of the view; they only modify
the data. An example of how an LOD expression can help is illustrated below.
Fig 10.1 total sales value for all states against each of the state values
Step 1: An LOD expression sums up the sales irrespective of the level of detail in the view,
which in this case is State, and replicates the result for every state in the view. The
overall sales therefore appear against each state's sales value. Now look at a different
scenario, where we want to see the maximum yearly sales for each state. In this case,
the year is not present in the view, so we cannot use a table calculation for this purpose,
but we can still use an LOD expression. A new LOD expression is defined for this case.
Step 2: Calculate the maximum yearly sales by state, and drag this field to the table.
Step 3: The maximum yearly sales is shown for each state row. Although a year
value is not present in the table or view, the level of detail is finer in
the calculation. Tableau therefore aggregates the results as needed and displays the
maximum value of the sales as a single value for each state. Having created these level of
detail expressions, consider the options that are available within Tableau
to create them. There are three options: FIXED, INCLUDE, and EXCLUDE.
Step 4: All three options are used to alter the scope of the expression. It helps to
understand the syntax structure of LOD expressions. First comes the scope keyword, which can
be FIXED, INCLUDE, or EXCLUDE. Then comes the dimension, followed by
the aggregate expression for the field. The dimension is what the LOD expression acts upon,
and the aggregate expression is the calculation that is required, such as the minimum of
sales or the sum of sales.
Step 5: The whole expression must be enclosed within braces, that is, curly brackets.
Multiple dimensions are separated by commas to aggregate at
multiple levels of detail. FIXED computes the value using the specified dimensions without
reference to any other dimensions in the view.
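The two scenarios above (overall sales repeated against every state, and maximum yearly sales per state when Year is not in the view) can be imitated in plain Python to show what is being computed. The Tableau expressions in the comments are plausible forms that follow the syntax just described; the unit does not show the exact expressions, so treat them, the sample rows, and the variable names as assumptions.

```python
# Hypothetical rows: (state, year, sales). Values are illustrative only.
rows = [
    ("California", 2014, 100.0),
    ("California", 2015, 300.0),
    ("California", 2015, 50.0),
    ("Texas", 2014, 200.0),
    ("Texas", 2015, 120.0),
]

# View level of detail: SUM(Sales) per State.
sales_by_state = {}
for state, _year, sales in rows:
    sales_by_state[state] = sales_by_state.get(state, 0.0) + sales

# Assumed form: {FIXED : SUM([Sales])} -> one overall total, repeated per row.
overall_sales = sum(sales for _state, _year, sales in rows)

# Assumed form: MAX({FIXED [State], YEAR([Order Date]) : SUM([Sales])})
# Step 1: aggregate at the finer (state, year) level.
sales_by_state_year = {}
for state, year, sales in rows:
    key = (state, year)
    sales_by_state_year[key] = sales_by_state_year.get(key, 0.0) + sales
# Step 2: re-aggregate up to the view level (state) with MAX.
max_yearly_sales = {}
for (state, _year), total in sales_by_state_year.items():
    max_yearly_sales[state] = max(max_yearly_sales.get(state, total), total)

# One row per state, as in the view: (state sales, overall sales, max yearly sales).
view = {
    state: (sales_by_state[state], overall_sales, max_yearly_sales[state])
    for state in sales_by_state
}
```

The second calculation works in two stages, first at a finer level than the view and then back up to the view level, which is what a table calculation cannot do when Year is absent from the view.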
Imagine a scenario where we know all our customers, when they were acquired, and their sales
amounts, and we would like to see whether there is a correlation between the time
when a customer is acquired and their contribution to sales. For this purpose we can
use the FIXED scope to define the LOD expression. Right-click, select Create Calculated
Field, and name it Acquisition Date.
Step 1: Define a level of detail expression that calculates the minimum order date for each
customer ID. Click OK. Now build the visualization. Drag and drop the order
year to the Columns shelf.
Step 2: Drag and drop Sales to the Rows shelf. Next, drag the customer acquisition
date to Color and change the mark type to Bar.
Step 3: This type of cohort analysis shows which customer groups, or cohorts, have
made the larger contribution to sales. Most of the purchases are repeat purchases by
customers who were acquired in 2014. A few more customers were acquired in 2015 and
contributed again to purchases in 2016, but the number of
customers acquired in 2015 is not as high as in 2014, and similarly for 2016. It is clear
that most of the purchases are made by customers who were acquired in 2014.
Step 4: The sales contribution made by customers acquired in 2015 is not as high as that of
those acquired in 2014, and similarly in 2016 the repeat purchases are again higher for the
2014 customers. This kind of analysis is especially useful when we are trying to analyze
whether there is a correlation between when we acquired a customer and that customer's
purchase patterns.
Now that we have seen how each of the level of detail scopes works and
when it can be used, it is also important to understand the order in which Tableau executes
them, so that we know what to use when. FIXED level of detail
expressions are computed after the context filters and before the dimension filters,
whereas INCLUDE and EXCLUDE level of detail expressions are computed after the dimension
filters and before the measure filters. So, when using FIXED level of detail expressions, we
should remember to add any dimension filter to context if we do not want it to be ignored.
If we prefer not to use context filters, we need to rewrite the
expressions using the EXCLUDE or INCLUDE keyword.
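The cohort analysis described above rests on one FIXED expression, the first order date per customer, which has the form {FIXED [Customer ID] : MIN([Order Date])}. The Python sketch below reproduces that calculation and the resulting breakdown of sales by order year and acquisition year; the orders and variable names are hypothetical.

```python
from datetime import date

# Hypothetical orders: (customer_id, order_date, sales). Values are illustrative.
orders = [
    ("C1", date(2014, 3, 1), 100.0),
    ("C1", date(2015, 6, 1), 80.0),
    ("C2", date(2015, 2, 1), 50.0),
    ("C1", date(2016, 1, 15), 40.0),
    ("C2", date(2016, 7, 1), 30.0),
]

# Acquisition date per customer: the earliest order date for that customer,
# independent of whatever dimensions are in the view.
acquisition_date = {}
for customer, order_date, _sales in orders:
    current = acquisition_date.get(customer)
    if current is None or order_date < current:
        acquisition_date[customer] = order_date

# View: SUM(Sales) by order year (columns) and acquisition year (color).
sales_by_cohort = {}
for customer, order_date, sales in orders:
    key = (order_date.year, acquisition_date[customer].year)
    sales_by_cohort[key] = sales_by_cohort.get(key, 0.0) + sales
```

Each (order year, acquisition year) pair corresponds to one colored segment of a bar in the view, so repeat purchases by an earlier cohort show up as segments in later years.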
SELF-ASSESSMENT QUESTIONS – 1
Fig 10.6 Sales per state of the year 2014 as Excel data
The INCLUDE level of detail keyword creates an expression
that is less aggregated and more granular by adding the dimension specified in the
expression to the level of detail of the visualization. The requirement is to display the
average sales order amount per state or region. In Figure 10.5.1, a map shows the sales per
state for the year 2014. The amount shows up as 7699 for Connecticut.
Let us verify whether this figure is correct using the same data in an Excel sheet. The steps
required to create and use an LOD expression in Tableau are listed below.
Step 1: In Figure 10.2, totaling the sales amount per order ID for Connecticut and then
taking the average for Connecticut gives 16,525. In the view, however, the calculation is
computed at the dimensionality defined by the dimensions present, so it is actually the
average sales over all the line items, or rows, belonging to the state, which is the only
dimension in the view.
Step 2: The granularity is the product ID and not the order level. The average value for the
data provided in Figure 10.2 is 7699, because the average is computed at the product ID level
for the state. The requirement, however, is for the order amount to be aggregated up to the
order level for the state, and then for the average to be computed over the order IDs
belonging to the state. An INCLUDE level of detail expression can help in this situation.
Step 3: The next step is to define the INCLUDE level of detail expression. Select Create
Calculated Field from the right-click menu and name it Average Sales Order.
Step 4: The sum of sales per order ID is computed first, and then the average
of these values is computed using the level of detail expression. Finally, click OK.
Step 5: Returning to the view, add Average Sales Order, the newly created level of
detail expression, alongside the original sales figure on the map.
Fig 10.7 Average Sales per state for the year 2014
Step 7: On examining the average sales order amount for Connecticut, the
displayed value is 16,525, as expected.
Two views have now been built, each with an average calculated at a different level of
detail. The second view, with the level of detail
expression, has the correct value aggregated at the order ID level and then the state level,
although the view itself is still at the state level. Hence the level of detail expression
extends Tableau's calculation language by introducing the capability to define the level at
which aggregations should happen.
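The difference between the two averages can be checked on a small example. The sketch below contrasts a plain row-level average with the order-level average that an expression of the form AVG({INCLUDE [Order ID] : SUM([Sales])}) would give; the rows are hypothetical and do not reproduce the 7699 and 16,525 figures from the unit's data.

```python
# Hypothetical line items: (state, order_id, sales). Values are illustrative.
rows = [
    ("Connecticut", "O1", 100.0),
    ("Connecticut", "O1", 300.0),
    ("Connecticut", "O2", 200.0),
]


def row_level_average(records, state):
    """AVG(Sales) at the view level: averages every line item in the state."""
    values = [sales for s, _order, sales in records if s == state]
    return sum(values) / len(values)


def order_level_average(records, state):
    """Sum sales per order first, then average the order totals for the state."""
    order_totals = {}
    for s, order_id, sales in records:
        if s == state:
            order_totals[order_id] = order_totals.get(order_id, 0.0) + sales
    return sum(order_totals.values()) / len(order_totals)


line_item_avg = row_level_average(rows, "Connecticut")
order_avg = order_level_average(rows, "Connecticut")
```

The two results differ whenever an order has more than one line item, which is exactly the situation described for Connecticut.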
SELF-ASSESSMENT QUESTIONS – 2
2. The icon associated with the field that has been grouped is a ______________
3. It is possible to disable the highlight option for the entire workbook. Yes/No
STEP 1: First, a State parameter is created. Right-click it and select Show
Parameter Control. The State parameter control is now available on the right, so a state can
be chosen dynamically as we navigate through the visualization. Next, define a calculated
field that computes the sales for the chosen state for the comparative analysis: right-click
State and select Create Calculated Field.
STEP 3: Define the condition: if State is equal to the chosen parameter, then
return the sales value, else return zero, and close with END.
STEP 4: A reference to Sales value that holds the value of the sales for the chosen parameter
is chosen. Click OK. Next let's define the sales for the state using Exclude lod so that it
excludes the state belonging to the row from the sales total. Once this is done, this value can
be used to repeat across all states and we can then easily calculate the difference between
each state value and the state chosen from the parameter dropdown right click on state and
choose Create calculated field. This parameter is called as Exclude state value.
Fig 10.10 Define sales for the state using Exclude LOD
STEP 5: Now the LOD expression that excludes the state and sums up the value for the
reference state (the state chosen in the parameter) is defined.
STEP 6: After defining the exclude expression, click OK. Now compute the difference between
each state's sales and the reference state's sales. Right-click Sales and choose Create
Calculated Field. This field, called Difference in Sales, is the sum of Sales minus the sum
of the Exclude Sales value. Finally, a view with the state, the sales and the difference
amount is built for the comparison analysis.
Fig 10.11 Graphical representation of each state with the sales amount
STEP 11: The comparative analysis graph shows that California and Florida are still doing
well compared to Texas, while all other states are lagging behind.
STEP 12: The exclude expression can be used to repeat a value across all the states so that
we can easily calculate the difference between each state and the state chosen from the
parameter dropdown.
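The reference-state comparison described in these steps can be sketched in Python as follows. This is an illustration under assumptions: the sales values are made up, and the function name difference_from_reference is hypothetical. The reference total plays the role of the EXCLUDE expression, because it is the same value for every state in the result.

```python
from collections import defaultdict

# Illustrative rows: (state, sales). The values are made up for this sketch.
rows = [
    ("California", 500.0), ("California", 300.0),
    ("Texas", 400.0), ("Texas", 200.0),
    ("Florida", 700.0),
    ("Ohio", 250.0),
]


def difference_from_reference(rows, reference_state):
    """Return each state's total sales minus the reference state's total sales."""
    sales_by_state = defaultdict(float)
    reference_total = 0.0
    for state, sales in rows:
        sales_by_state[state] += sales
        # Mirrors the condition: if the state equals the parameter, take the
        # sales value, else zero.
        if state == reference_state:
            reference_total += sales
    # reference_total is repeated for every state, as the EXCLUDE LOD does.
    return {state: total - reference_total for state, total in sales_by_state.items()}


diff = difference_from_reference(rows, "Texas")
for state, value in sorted(diff.items()):
    print(state, value)
```

States with a positive difference are ahead of the chosen reference state and states with a negative difference are behind it; changing the second argument corresponds to choosing another state in the parameter control.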
SELF-ASSESSMENT QUESTIONS – 3
4. Effective tables and charts for data visualization can be designed using _____________.
7. NESTED LOD
Sometimes the business requirement is more complicated and needs more than one layer of
level of detail calculation. We could start with the visualization level of detail, then
have an inner part that uses an INCLUDE expression to produce a more granular result.
This could then be wrapped in an EXCLUDE or a FIXED expression so that the inner result is
aggregated back to the outer level of detail. Finally, the calculation level of detail is
resolved to match the level of detail of the visualization. These kinds of level of detail
expressions are commonly referred to as nested level of detail expressions.
A business is interested in seeing how many orders per state end up being unprofitable.
Step 1: Create a nested level of detail expression that first calculates how many orders are
not profitable using the INCLUDE keyword. Right-click and choose Create Calculated Field.
Name it Number of Unprofitable Orders.
Step 2: Write an expression that looks for negative values and uses the INT function to
replace the false values with zero and the true values with one. The FIXED keyword is then
used to calculate the orders at the same level of detail as the state specified here. Click
OK. The number of unprofitable orders has now been defined.
Step 3: Create one more calculation for the percentage of unprofitable orders.
Step 4: Divide the total number of unprofitable orders by the number of order IDs. Click
OK. Choose Percentage from the default properties for this value.
Step 5: Drag and drop the state and the percentage into the view, and change the view to a
tree map.
Step 7: The size in this view is controlled by the total number of orders for the particular
state, and the color is controlled by the percentage of unprofitable orders. Here California
has a high percentage of unprofitable orders. Colorado also has a considerable number of
orders, but its percentage of unprofitable orders is lower than California's.
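The two layers of this calculation can be sketched in Python. The sketch below assumes made-up profit rows and a hypothetical function name; it first aggregates profit per order (the inner, more granular layer), flags each order as 1 or 0 in the way the INT function does, and then aggregates the flags back to the state level (the outer layer).

```python
from collections import defaultdict

# Illustrative line items: (state, order_id, profit). Values are made up.
rows = [
    ("California", "O1", 20.0),
    ("California", "O1", -50.0),
    ("California", "O2", 30.0),
    ("Colorado", "O3", 10.0),
    ("Colorado", "O4", 5.0),
    ("Colorado", "O5", -15.0),
    ("Colorado", "O6", 40.0),
]


def unprofitable_share(rows):
    """Return, per state, the fraction of orders whose total profit is negative."""
    # Inner layer: total profit per order within each state.
    profit_per_order = defaultdict(float)
    for state, order, profit in rows:
        profit_per_order[(state, order)] += profit
    # Outer layer: count flagged orders and all orders per state.
    unprofitable = defaultdict(int)
    orders = defaultdict(int)
    for (state, _order), profit in profit_per_order.items():
        unprofitable[state] += int(profit < 0)  # True -> 1, False -> 0
        orders[state] += 1
    return {state: unprofitable[state] / orders[state] for state in orders}


share = unprofitable_share(rows)
for state, value in sorted(share.items()):
    print(f"{state}: {value:.0%} of orders unprofitable")
```

Note that order O1 contains one profitable and one unprofitable line item; it counts as unprofitable only because its order-level total is negative, which is the reason the inner layer has to run at the order level.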
SELF-ASSESSMENT QUESTIONS – 4
Answers
1. stem()
2. Paperclip
3. Yes
4. Data-ink ratio
5. Table calculation
6. 2020.3
Dimensions are the descriptive attribute values for multiple dimensions of each attribute,
defining multiple characteristics. A dimension table, having reference of a product key from
the table, can consist of product name, product type, size, color, description, etc.
5. What are the different connections you can make with your dataset?
We can either connect live to our data set or extract data into Tableau.
• Live: Connecting live to a data set leverages its computational processing and storage.
New queries will go to the database and will be reflected as new or updated within
the data.
• Extract: An extract will make a static snapshot of the data to be used by Tableau’s
data engine. The snapshot of the data can be refreshed on a recurring schedule, either as a
whole or incrementally by appending new data. One way to set up these schedules is via
Tableau Server.
The benefit of Tableau extract over live connection is that extract can be used anywhere
without any connection and you can build your own visualization without connecting to
database.
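The practical difference between the two connection types can be illustrated with a small Python sketch that uses an in-memory SQLite database as a stand-in for the data source. This is an analogy under assumptions, not how Tableau's data engine is implemented; the table, function names and values are hypothetical.

```python
import sqlite3

# Stand-in for the source database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id TEXT, sales REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)", [("O1", 100.0), ("O2", 250.0)]
)


def live_total(conn):
    """Live connection: every question is sent to the database as a query."""
    return conn.execute("SELECT SUM(sales) FROM orders").fetchone()[0]


def take_extract(conn):
    """Extract: copy a static snapshot of the rows out of the database."""
    return conn.execute("SELECT order_id, sales FROM orders").fetchall()


def extract_total(extract):
    """Answer the same question from the snapshot, without the database."""
    return sum(sales for _order_id, sales in extract)


extract = take_extract(source)

# A new order arrives in the source after the snapshot was taken.
source.execute("INSERT INTO orders VALUES ('O3', 50.0)")

live_after = live_total(source)          # sees the new order
extract_after = extract_total(extract)   # still the old snapshot

extract = take_extract(source)           # a full refresh of the extract
refreshed = extract_total(extract)

print(live_after, extract_after, refreshed)
```

The live total reflects the new order immediately, while the extract only catches up after it is refreshed; in exchange, the extract can be queried without any connection to the source.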
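10. What is the difference between a tree map and heat map?
A heat map can be used for comparing categories with color and size. With heat maps, you
can compare two different measures together. A tree map also uses color and size, but it
arranges the data as nested rectangles, so it is additionally suited to showing hierarchical
data and part-to-whole relationships.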
8. SUMMARY
Let us recapitulate the important concepts discussed in this unit:
• How to use level of detail expressions to aggregate at dimensionalities other than
the view level.
• Provides explanations for when and why you should use a LOD to visualize your data.
• Provides explanation about Include Level of Detail Expression.
• Provides explanation about Exclude Level of Detail Expression.
• Illustration about the use of Nested LOD
9. GLOSSARY
Let us have an overview of the important terms mentioned in the unit:
LOD: Level of Detail expressions (also known as LOD expressions) allow you to compute
values at the data source level and the visualization level.
Fixed LOD: FIXED level of detail expressions compute a value using the specified dimensions,
without reference to the dimensions in the view.
10. CASELET
1. Get a Single Aggregate.
2. Isolate a Specific Value from a range of values
3. Synchronize Chart Axes
Fig 10.16
Unit 11
Dashboard and Story Telling
Table of Contents
1 Introduction
1.1 Learning Objectives
2 Aggregating dimensionalities
3 Steps to calculate aggregating dimensionalities
4 INCLUDE Level of detailed expressions
5 Steps to calculate level of detail expression
6 Exclude level of detail expression
7 Nested lod
8 Summary
9 Glossary
10 Caselet
11 Conceptual Map
1. INTRODUCTION
A dashboard is a visual display of the most critical information needed to achieve one or
more business objectives which fits entirely on a single computer screen so that all
information can be monitored in a single glance. A story on the other hand, contains multiple
sheets or dashboards that are combined together to convey a particular business story which
shows how various facts or incidents are connected and what can be done or could have been
done to improve a business outcome. In this chapter, we are going to look at the retail
dashboard and story that have been built using the retail data set.
STEP 1: The dashboard gives a comprehensive view of the sales trend for the last two years,
then the sales broken down by state, and then the top-performing cities in terms of sales.
STEP 2: The profits for 2014 and 2015 are displayed, and then the profits are broken down
by state and by the top-performing cities. All these views are built to be interactive: they
allow us to filter to a particular month, year or state. For instance, if we click on the
year 2015, all the views filter to list only the data for 2015. Clicking outside returns us
to the original state. In the same single view we can also find the top five products for a
particular state, or the overall sales by category and customer profile, for example whether
more purchases are being made by male or female customers.
STEP 3: A dashboard gives us a great deal of information, and its interactive capability
allows us to derive insights and answer questions that might come up as we review it.
Dashboards can also have multiple tabs, each representing a different dimension of the same
business information or another aspect of the business that contributes to business
performance or outcome. On the second tab, Sales by Category, the sales are broken down by
category and subcategory, along with the delivery status, both overall and for the various
locations it is shipped from. This view is again interactive, allowing us to filter by year
or month.
STEP 4: Clicking on an individual category allows us to monitor the delivery status for
that category. By clicking on various months, years or categories, we can see whether there
is a problem with a particular location or category and take any timely decision needed to
resolve it. Lastly, there is a tab that displays the sales broken down by customer profile,
or rather gender in this case. This again gives a comprehensive view of how gender
influences sales, both regionally and by category. So we have seen a dashboard that contains
all the critical information within a single view or set of tabs, allowing users to monitor
their business performance effectively.
STEP 5: Again, each tab that we saw here is a single view of a particular business function
or area that the user is interested in analyzing and deriving insights from. In the
dashboard, we saw how various visualizations help the user analyze the data, but in a story
we present our findings from the analysis to the end user in a conclusive manner. Different
facts are knit together to form a story that conveys why a particular scenario happened and
what could have been done to prevent something bad from happening, or that makes a
conclusive recommendation based on the facts. From an analysis of the retail data set, we
have seen that sales went down in August and September of 2015. Our story is aimed at
presenting our findings about this decline in sales: why it happened and what could have
been done to prevent it.
STEP 6: The story starts by displaying the sales trend across 2014 and 2015 for the Women,
Men and Home categories. The first tab shows that sales took a huge dip in August and
September of 2015, especially for the Women category. Comparing this to the previous year,
we see that this is not a seasonal dip, as the previous year's sales seem to be fine during
the same period; this is exactly what the initial story view conveys. We then move on to the
next tab, which goes into the details of which women's categories experienced this decline.
Here we see that the clothing and handbag category sales in particular have gone down
significantly in both of these months.
STEP 7 : Go ahead and look further at the customer sentiment for these months. Here we see
that there is significant negative sentiment factor associated with June and July of 2015 and
analyzing these sentiments by clicking on these months here further shows that a lot of it is
due to delayed or bad service. So we move on and analyze the delivery status for these
particular months. Here it is noticed that the clothing and the handbag shipments
particularly were delayed during June and July due to bad weather conditions in Chicago.
This in turn led to a lot of social media negative sentiment which we saw in the previous tab.
So this affected the sales for the next two months as customers weren't too keen on ordering
given the shipments were delayed.
Fig 11.7 Different facts are knit together to form the story
STEP 8: If this had been addressed sooner, and the company had explained to customers the
reasons for the delay, that it was a temporary problem, and that they were actively working
on it, the negative impact on sales could have been avoided. The story puts all these facts
together and illustrates the steps that could have been adopted to improve the outcome.
Fig 11.9 This Story contains all facts and recommendations for an improved business
outcome
3. BUILDING DASHBOARDS
Once the views have been built, the next logical step is to use a dashboard or a story
to present them. To create a dashboard, we can either click on the new dashboard
icon next to the add sheet icon, or click on the Dashboard link on the top menu ribbon.
Step 1: Initially the dashboard workspace has opened up. A dashboard pane on the left opens
up with the dashboard properties in place of the data pane. This pane displays all available
sheets that can be used in the dashboard.
Step 2: The first thing to do while building a dashboard is to set its size in the Size
drop-down. The options range from Automatic to Exactly to Range, along with various portrait
and landscape options and an option for iPad. With Automatic, the dashboard resizes itself
to fill the available area. Exactly, as the name suggests, gives us the flexibility to set a
fixed size for the dashboard to display in. Then there is the Range option, which sets the
limits within which the dashboard can expand or shrink. It is very important to plan ahead
and set the size so that we do not end up redesigning the layout at a later point. Next, add
the views to the dashboard. It is as easy as dragging and dropping a view onto the dashboard
layout. When we drag Sheet 1, it takes up the entire space. Go ahead and drag the next view.
Step 3: Once a view has been dragged and dropped, a checkmark icon appears next to it,
indicating that the view is used by the dashboard. The options on the left panel are
available to format the dashboard layout: a horizontal container, a vertical container,
images, web pages, text and a blank box can be added to the dashboard. Before we dig deeper
into the containers, consider the two layout types. Tiled objects are arranged in a
single-layer grid that adjusts in size based on the total size of the dashboard objects,
whereas floating objects can be layered on top of other objects and have a fixed custom
position and size. Drag the views using the tiled layout option. The objects are arranged
next to each other using a horizontal layout container for both the top and the bottom row.
Now change these views to use the floating layout.
Step 4: The second view is layered on top of the first view. It is also possible to switch
between tiled and floating by clicking on the Floating option in the shortcut menu. Change
the tiled layout option for the first view so that both views are now floating. The layout
section on the left lists all the items that have been added to the dashboard. The order of
floating items can be changed by dragging and dropping them to a different position in the
hierarchy; tiled layout items can never be reordered. Change the tiled layout option for the
Sales Trend view to floating and position the views on top of each other. The order in
which they are arranged in the layout section on the left determines which of these views
is placed in front.
Step 5: Move Sales by State to the top. The order has been changed, and the item Sales
Trend automatically goes to the back. Only floating layout items can be reordered by
dragging and dropping them within the hierarchy; tiled layout items can never be reordered.
Next are the position and size options. The position specifies, in pixels, where a floating
object is placed, and the size determines its width and height in the dashboard. Suppose we
want the Sales by State view to be placed in the top left corner. Then specify the Y
position as zero for Sales by State.
Step 6: The Sales by state view has moved to the top left corner. The height or width for the
floating object can be controlled. Then the show title option is displayed. This can be used to
toggle the display for the title at both the sheet level and the dashboard level.
SELF-ASSESSMENT QUESTIONS – 1
4. FORMATTING DASHBOARDS
To see why formatting dashboards matters, consider an example that filters the top view, the
top ten cities by state, to show just Los Angeles. The views remain fixed in the place where
they were initially put, although there is now enough free space for the bottom view, the
sales per category and department view, to move up. Now consider a dashboard that has the
same two views placed inside a vertical layout container, and apply the same filter there.
The second view at the bottom automatically moves up into the empty space, eliminating the
need for a scrollbar. These container objects help create a seamless experience for users by
repositioning and resizing the dashboard objects whenever necessary.
Step 1: First, there is the horizontal layout container. It allows us to group worksheets
and any other dashboard components from left to right, and to edit the height of all the
objects placed within it at the same time. Then there is the vertical layout container,
which allows us to group worksheets and other dashboard components from top to bottom and to
edit the width of all the objects placed within it at the same time.
Suppose you want to have one top view and two bottom views. In this case, place a vertical
container at the top of the dashboard layout and add a horizontal container at the bottom.
Drop the view you want to see at the top of the dashboard into the vertical container, and
the two views you want to see at the bottom into the horizontal container. This is now the
layout for the dashboard. So, depending on the views to be placed and where they need to be
placed, it is necessary to decide whether horizontal or vertical containers are required.
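The one-top, two-bottom arrangement can be modelled as a small tree of containers. The Python sketch below is only a conceptual illustration: the View and Container classes, the view names and the even split of space among children are assumptions made for the example, and Tableau's actual sizing rules are more flexible than this.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class View:
    name: str


@dataclass
class Container:
    direction: str  # "horizontal" (left to right) or "vertical" (top to bottom)
    children: List[Union["Container", View]] = field(default_factory=list)


def layout(item, x, y, width, height, out=None):
    """Assign each view a rectangle (x, y, width, height) in pixels.

    Assumption for this sketch: a container splits its space evenly
    among its children along its direction.
    """
    if out is None:
        out = {}
    if isinstance(item, View):
        out[item.name] = (x, y, width, height)
        return out
    n = len(item.children)
    for i, child in enumerate(item.children):
        if item.direction == "horizontal":
            layout(child, x + i * width // n, y, width // n, height, out)
        else:
            layout(child, x, y + i * height // n, width, height // n, out)
    return out


# One vertical container on top, one horizontal container at the bottom.
dashboard = Container("vertical", [
    Container("vertical", [View("Top view")]),
    Container("horizontal", [View("Bottom left"), View("Bottom right")]),
])

rectangles = layout(dashboard, 0, 0, 1000, 800)
for name, rect in rectangles.items():
    print(name, rect)
```

Because every view's rectangle is derived from its container, changing the dashboard size or removing a child changes the positions of the remaining views automatically, which is the behaviour that makes containers preferable to fixed positions.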
Step 2: Layout containers always have a blue border when highlighted. They are generally
pushed to the background of the view and are very hard to select. An easy way to select any
particular element on the dashboard, and a layout container in particular, is to click on
the element in the layout section, especially when there are multiple or nested container
objects.
Another easy way to select a layout container is as follows. First, select the gray
container. Then use the dropdown arrow at the top right corner and choose Select Layout
Container. This highlights the layout container within which the particular view has been
placed. Next, add a web page to the dashboard. The Web Page option embeds any web page
directly within the dashboard as a live connection that displays the contents of the page
whenever the dashboard is open. Suppose there is a finance dashboard and we are interested
in seeing the daily stock market status within it. This can be done by adding a web page
object with the URL pointing to that particular web page.
Step 3: First, add a horizontal container to the bottom. Drag the web page object onto this
horizontal container. Enter the address of the Yahoo Finance page and click OK. This brings
in the details of the current stock market status from the Yahoo web page and displays them
within this object container. To add an image, click on the image object and drag it to the
location where the image has to be added. Go ahead and add a logo for this dashboard by
picking the PNG image or logo file that has already been saved on the desktop. Once it is
added, it can be centered or fitted by choosing an option from the menu. By right-clicking
on the image, it is also possible to set a URL from the menu so that the image links to a
web page.
Step 4: Go ahead and create the sample dashboard for this session using the views that have
already been created.
Step 5: Start by adding these views: Sales Trend, Sales by State and Top Ten Cities by
State. Then add a horizontal container for the dashboard title.
Step 6: Drop a text object into it to add the dashboard title. Drag one more horizontal
container onto the dashboard layout space.
Step 7: Arrange all three views that have been dropped onto the workspace into this
container. Expand the size so that all three views fit into the first row within this
container.
Step 8: Adjust the size by clicking on the arrow mark on the horizontal container and on
the views. To format the dashboard, click on Dashboard and choose Format.
Step 9: Change the default dashboard shading to another color if preferred. The dashboard
title, font, alignment, shading and so on can be changed within these Format Dashboard
options. Then format the layout containers. Finally, change the color to a medium gray.
Step 10: To change the colors and formatting of a view, it is necessary to do it directly
within the sheet view. This can be done by going back to the view or by right-clicking the
view within the dashboard and choosing Format.
SELF-ASSESSMENT QUESTIONS – 2
In Fig 11.5.1, Male has been selected, and therefore only the male category is highlighted
in all the other views. In this case a single item has been highlighted, but it is possible
to highlight a single item or multiple items within the legend. This option can be enabled
or disabled from the toolbar option at the top. The steps to add interactivity to a
dashboard are illustrated below.
Step 1: Highlighting based on custom criteria has to be defined through advanced highlight
actions. On a worksheet, click on Worksheet and then Actions. In the Actions dialog box,
click the Add Action button and select Highlight. Add a name and then select the trigger
option: Hover, Select or Menu. To trigger the action on hover, choose Hover. Hover is
especially useful in a dashboard: when the mouse moves over a mark in one view, the
corresponding marks in all other views are highlighted, or rather the action that has been
added is run.
Step 2: The Select option highlights marks based on a click action: when a mark is clicked,
the action is run. Finally, there is the Menu option, which allows us to highlight by
choosing an option from the context menu after right-clicking on a mark. Next, select the
target sheets.
Step 3: Select the source sheet as Sales by State year and select the target sheets as Sales
by State, Sales Trend and Top Ten Cities by State.
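The parts of a highlight action (a name, a source sheet, target sheets and a trigger) can be summarised in a small data model. The Python sketch below is a conceptual illustration only; the HighlightAction class, the function name and the sample marks are hypothetical and do not correspond to any Tableau API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Mark = Tuple[str, str]  # (label, gender): an illustrative mark in a view


@dataclass
class HighlightAction:
    name: str
    source_sheet: str
    target_sheets: List[str]
    run_on: str  # "hover", "select" or "menu"


def marks_to_highlight(action, event, sheet, value, sheets):
    """Return the marks to highlight in each target sheet for a user event.

    Nothing is highlighted unless the event type matches the action's
    trigger and the event happened on the action's source sheet.
    """
    if event != action.run_on or sheet != action.source_sheet:
        return {}
    return {
        target: [mark for mark in sheets[target] if mark[1] == value]
        for target in action.target_sheets
    }


sheets: Dict[str, List[Mark]] = {
    "Sales by State": [("California", "Male"), ("California", "Female"), ("Texas", "Male")],
    "Sales Trend": [("2014", "Male"), ("2015", "Female")],
}
action = HighlightAction(
    name="Highlight gender",
    source_sheet="Sales by State year",
    target_sheets=["Sales by State", "Sales Trend"],
    run_on="select",
)

clicked = marks_to_highlight(action, "select", "Sales by State year", "Male", sheets)
hovered = marks_to_highlight(action, "hover", "Sales by State year", "Male", sheets)
print(clicked)
print(hovered)
```

With the trigger set to select, a click on Male highlights the matching marks in both target sheets, while merely hovering over the same mark does nothing.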
Step 4: Go ahead and complete the rest of the dashboard offline so that it can be reviewed
further. Next, check whether the layout is suitable for the end presentation or device. The
overall size can then be specified in the Dashboard menu, including whether it should resize
automatically to fit the window it is displayed in.
Step 5: Check whether any unwanted items are present so that they can be cleaned up.
Step 6: Shading affects the dashboard objects themselves and not the views.
6. BUILDING STORIES
A story is a sequence of worksheets or dashboards that conveys information in a narrative
manner, so that it is possible to show how the plot is connected, provide some context and
make a compelling case using a storyline, which makes the business decision-making process
much easier. Each individual sheet in a Tableau story is called a story point. A story is
not a static presentation: it is a live connection to data and can change as the underlying
data changes. Stories work well as a presentation medium, making the narrative flow easier
to present to the audience. Once the new story icon is clicked, a new layout window opens
with the left panel listing the available worksheets and dashboards that can be added to the
story.
Step 2: Go ahead and choose a desktop size. To edit the title, double click and choose your
report title.
Step 3: Add the sheet that is to be displayed as the first story point; here, add Sales
Trend. It is also possible to customize a sheet within the story point by selecting a
particular range of marks, by sorting them in a particular order, or by filtering to a
particular field, and to retain these changes as part of the story. Any customization that
is done on the story point sheet will not be automatically updated in the original sheet.
However, edits in the original sheet will be carried forward to the story points.
Step 4: To format the story pane, click on Story and choose Format. It is possible to
change the shading, set a color and a transparency level, set the navigator shading, or set
the alignment, the font and so on. To add a caption to a story point, double-click it and
enter the story point name.
Step 5: To create a new story point, click on the new blank point option, and then drag
another sheet onto this story point.
Step 6: To delete a story point, click on the delete icon that appears right next to it. To
rearrange story points, just drag a story point and drop it in whatever order is required.
To duplicate a story point, click on the duplicate icon. This creates a copy of the story
point with a separate caption. To navigate between the story points, click on them or use
the back and forward buttons in case there are a lot of story points.
Step 7: If the navigator buttons are not required, deselect them. There are different tabs
for the different story points. Once the story is created, view or rather present it using
the presentation mode at the top. This gives a magnified view of the story.
SELF-ASSESSMENT QUESTIONS – 3
Answers
1. Reporting requirements
2. Dashboard
3. Box plot and histogram
4. Yes, by using a Web Page object
5. Sum
6. Countd
TERMINAL QUESTIONS
1. Give a brief about the tableau dashboard?
A Tableau dashboard is a group of various views that allows you to compare different types
of data simultaneously. Datasheets and dashboards are connected: any modification to the
data is reflected directly in the dashboards. It is the most efficient approach to
visualize the data and analyze it.
Dimensions are descriptive attributes of data. Those will be stored in the dimensions table.
For example, customer’s information like name, number, and email will be stored in the
dimension table.
Extract: Extract is a snapshot of data that will be extracted from the data source and put into
the Tableau repository. This snapshot can be refreshed periodically fully or incrementally.
This can be scheduled in Tableau Server.
Live: It creates a direct connection to the data source and data will be fetched directly from
tables. So, data will be up to date and consistent. But, this also affects access speed.
Filters are used to provide the correct information to viewers after removing unnecessary
data. There are various types of filters available in Tableau.
▪ Extract Filters – Extract filters are used to apply filters on extracted data from the
data source. For this filter, data is extracted from the data source and placed into the
Tableau data repository.
▪ Datasource Filters – Data source filters are similar to extract filters in that they
restrict the data available to the workbook. The difference is that they work with both
live and extract connections.
▪ Context Filters – Context filters are applied to the data rows before any other worksheet
filters. They are limited to views, but they can be applied to selected sheets. They define
the aggregation and disaggregation of data in Tableau.
▪ Dimension Filters – Dimension filters are used to apply filters on dimensions in
worksheets. Dimension filters are applied through the top or bottom conditions,
formula, and wildcard match.
▪ Measure Filters – Measure filters are applied to the values present in the measures.
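The filter types listed above are applied in a fixed sequence: extract filters first, then data source filters, context filters, dimension filters, and finally measure filters on the aggregated values. The Python sketch below imitates that sequence on a made-up table; the rows, the filter conditions and the function name are assumptions chosen for illustration.

```python
from collections import defaultdict

# Illustrative rows; every value here is made up for the sketch.
rows = [
    {"year": 2013, "state": "Texas", "category": "Women", "sales": 90.0},
    {"year": 2014, "state": "Texas", "category": "Women", "sales": 120.0},
    {"year": 2015, "state": "Texas", "category": "Men", "sales": 80.0},
    {"year": 2015, "state": "Ohio", "category": "Women", "sales": 30.0},
    {"year": 2015, "state": "California", "category": "Women", "sales": 200.0},
    {"year": 2015, "state": "California", "category": "Home", "sales": 60.0},
]


def sales_by_state(rows, row_filters, measure_filter):
    """Apply row-level filters in order, aggregate, then filter the aggregate."""
    for keep in row_filters:
        rows = [row for row in rows if keep(row)]
    totals = defaultdict(float)
    for row in rows:
        totals[row["state"]] += row["sales"]
    # The measure filter sees SUM(sales) per state, not individual rows.
    return {state: total for state, total in totals.items() if measure_filter(total)}


row_filters = [
    lambda row: row["year"] >= 2014,                  # extract filter
    lambda row: row["state"] != "Ohio",               # data source filter
    lambda row: row["year"] == 2015,                  # context filter
    lambda row: row["category"] in {"Women", "Men"},  # dimension filter
]

result = sales_by_state(rows, row_filters, lambda total: total >= 100.0)
print(result)
```

Only California survives here: Texas is removed by the measure filter because its total, after the earlier filters have already dropped its 2014 rows, falls below the threshold. The outcome of a later filter therefore depends on what the earlier ones let through.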
8. Differentiate between Tiled and Floating in dashboards?
In a tiled layout, items don’t overlap. The layout will be adjusted according to dashboard size.
In the floating layout, items can be placed on some other layers. Floating items can have fixed
positions and sizes.
8. SUMMARY
Let us recapitulate the important concepts discussed in this unit:
• How to use dashboards, make them interactive and format them.
• Provides explanations for when and why you should use a dashboard and story to
visualize your data.
• Provides explanation about steps involved in creating a dashboard.
• Provides explanation about steps involved in creating a story.
7. GLOSSARY
Let us have an overview of the important terms mentioned in the unit:
Dashboard: A dashboard is a collection of several views, letting you compare a variety of data
simultaneously.
8. CASELET
1. Finance: Understanding and sharing variances
Given the rapid pace of changes in the economy and business operations, it can be harder
than ever to manage budgets, expenses and forecasts. The finance team’s dashboards
provide up-to-date results and enable the team to change forecasts as uncertainty decreases.
By sharing this dashboard with leaders across the business, the finance team supports
strategic decision-making and resource allocation.
The COVID-19 pandemic has forced human resources departments to ask new questions
around employee safety and working from home. The Tableau people analytics team used
data to inform their response: They enriched their data with publicly available information
and by gathering new data from an employee survey.
9. CONCEPTUAL MAP
Fig 11.21
Unit 12
Power BI - Connecting To Data Using Power Query
1. INTRODUCTION
MS Excel can be used to convert raw data into meaningful visualizations. Although Excel has
its own limitations, there are add-ins that make Excel an excellent path for data visualization.
Data Discovery
Data discovery is the process of finding data sources by connecting to various data sources
which include:
• Relational data
• Structured data
• Semi-structured data
• Unstructured data
Data Loading
Data loading is the process of loading data from data sources into Excel for analysis.
Data Modification
Data modification covers how to:
• Modify data and filter it.
• Join separate data structures.
The combined application of data discovery, data loading and data modification is also called
ETL (Extract, Transform, Load).
For earlier versions of Excel (before 2016), Power Query was an add-in used to discover,
access, and consolidate information from various sources. In updated versions, Power Query
is known as the Get & Transform feature, which helps with the process of collecting,
combining, and refining data sources. The four phases of Power Query are:
• Connect
• Transform
• Combine
• Load
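The four phases can be pictured as a small pipeline. The following Python sketch is an analogy only and does not use Power Query itself: the two CSV texts, the column names and the function names are hypothetical, and a plain list stands in for the worksheet that receives the loaded data.

```python
import csv
import io

# Hypothetical source data: a master table and a transaction table.
PATIENTS_CSV = "patient_id,name\n1,Asha\n2,Ravi\n"
TRANSACTIONS_CSV = "patient_id,amount\n1,100\n1,50\n2,75\n3,20\n"


def connect(text):
    """Connect: read the raw records from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(transactions):
    """Transform: shape the data, e.g. convert amounts from text to numbers."""
    return [
        {"patient_id": t["patient_id"], "amount": float(t["amount"])}
        for t in transactions
    ]


def combine(patients, transactions):
    """Combine: join the two sources on patient_id, keeping matched rows only."""
    names = {p["patient_id"]: p["name"] for p in patients}
    return [
        dict(t, name=names[t["patient_id"]])
        for t in transactions
        if t["patient_id"] in names
    ]


def load(rows, destination):
    """Load: place the shaped rows into the destination and report the count."""
    destination.extend(rows)
    return len(rows)


worksheet = []  # stand-in for the worksheet receiving the query result
patients = connect(PATIENTS_CSV)
transactions = transform(connect(TRANSACTIONS_CSV))
loaded = load(combine(patients, transactions), worksheet)
print(loaded, worksheet)
```

The transaction for patient 3 is dropped in the combine phase because there is no matching master record; choosing which rows to keep at this stage is a design decision, comparable to choosing a join kind when merging queries.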
Figure 1 shows a blank excel worksheet. Power Query (also known as Get & Transform in
Excel) can help in importing or connecting to external data and then shaping that data
according to requirement, and then loading the query into Excel to create charts and reports.
Let’s look at the process of connecting the data. Using the get and transform utility we can
connect to file data such as Excel, CSV, XML, text, JSON, etc as shown in figure 2.
Figure 3 shows that we can also connect to databases such as SQL Server, Microsoft Access,
Oracle, Sybase, and SAP HANA. Similarly, connections are possible to the different Azure
storage services and databases.
Online services such as a SharePoint Online list, Microsoft Exchange, Facebook, and
Salesforce can also be connected to as a new query.
Other sources, such as a web page, SharePoint list, OData feed, Hadoop file, Active
Directory, and ODBC, can likewise be used to connect to external data.
First, select the New Query option in the Data tab, click From File, then click From Excel
Workbook, and select the workbook that needs to be opened, as shown in figure 6.
Selecting the file opens the Navigator window shown in figure 7. Select the master data
option and click the Load button.
The Load button has two sub-options, Load and Load To. Load imports the data into the
current worksheet, whereas Load To allows us to create a new worksheet for the dataset.
Import the dataset into a new workbook by selecting the Load To option and then selecting
the Only Create Connection option.
This creates a connection that is displayed in the workbook queries area of the Excel sheet,
as shown in figure 10.
Now, to import the patient transaction details, follow the same steps as for figure 8, select
the patient transaction data, and load it. Figure 11 shows the transaction data loaded into
the worksheet.
The Navigator window shown in figure 11 gives a quick look at the available data models that
are connected to or being loaded into the data model.
In the Navigator window, we can also use the peek pop-up feature, which previews the data
when we hover over a data model. The scroll bar can be used to view the data in the model,
as shown in figure 12.
The peek pop-up allows data discovery without loading unnecessary data. Note that each load
is stored as an individual query and appears in the Navigator window as connections to other
data sources are made.
In the Navigator window, right-clicking a data model gives access to options to copy, edit,
load, and perform other such operations, as shown in figure 13.
For example, consider a SQL Server database. Selecting the SQL database option opens a
window in which we enter the server name and the details of any specific query or
relationship column required, as shown in figure 15.
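The idea of connecting to a database and supplying a specific query can be sketched in Python. As an assumption for illustration, an in-memory SQLite database stands in for the SQL Server database, and the table and column names are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the SQL Server database.
# The table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (case_number INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(101, "Boston"), (102, "Chicago")],
)

# A specific query narrows what is imported, in the same spirit as the
# optional query entered in the connection window.
rows = conn.execute(
    "SELECT case_number, city FROM patients WHERE city = ?", ("Boston",)
).fetchall()
conn.close()
print(rows)
```

Supplying a query at connection time means only the required rows are brought in, rather than the whole table.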
From the From File options, choose the XML option. Select the patients.xml file, which
contains the patient data in XML format, and import the data as shown in figure 17.
Select the book option and load the data as shown in figure 18. This loads the data from
the XML file into the Excel data model, as shown in figure 19.
Fig 19: Patients Data Loaded into the Excel Data Model
Figure 19 also shows the query connection that has been established to the XML schema, in
the workbook queries window on the right side of the Excel sheet.
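To illustrate what loading an XML file into a table involves, here is a small Python sketch. The layout of the real patients.xml is not shown in this unit, so the XML content below is a hypothetical stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical content; the real patients.xml layout is not shown here.
PATIENTS_XML = """
<patients>
  <patient><case_number>101</case_number><first_name>Ann</first_name></patient>
  <patient><case_number>102</case_number><first_name>Raj</first_name></patient>
</patients>
"""

root = ET.fromstring(PATIENTS_XML)

# Each <patient> element becomes one row; each child element becomes a column.
patients = [
    {child.tag: child.text for child in patient}
    for patient in root.findall("patient")
]
print(patients)
```

Note that every value arrives as text, which is one reason data types usually need to be checked after loading.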
To obtain data from a web source, select the New Query option in the Data tab and, under
‘From Other Sources’, select the From Web option, as shown in figure 20.
After these steps, a pop-up window appears on the worksheet, in which we enter the URL in
the empty field, as shown in figure 21.
For this scenario, we load the hospital data for the California region, as shown in
figure 21. Click OK to go to the Navigator window, where the HTML table element is loaded as
shown in figure 22.
Figure 23 shows a data table that consists of three columns (maps, name, city). Since we do
not require the maps column, click Edit to open the Query Editor window, where unnecessary
columns can be removed.
Select the maps column and click the Remove Columns option in the top menu, as shown in
figure 24. This deletes the maps column, leaving only the name and city columns, which are
required for the data model.
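Removing a column can be sketched in Python as follows. The rows are hypothetical stand-ins for the web table, and remove_columns is an illustrative helper, not a Power Query function.

```python
# Hypothetical rows standing in for the web table (maps, name, city).
hospitals = [
    {"maps": "link-1", "name": "Hospital A", "city": "Fresno"},
    {"maps": "link-2", "name": "Hospital B", "city": "Oakland"},
]

def remove_columns(rows, unwanted):
    """Return new rows without the unwanted columns.

    The source rows are left unchanged, just as the query editor
    does not alter the original data source.
    """
    return [
        {name: value for name, value in row.items() if name not in unwanted}
        for row in rows
    ]

trimmed = remove_columns(hospitals, {"maps"})
print(trimmed)
```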
The same table can also be loaded using a blank query (New Query, From Other Sources, Blank
Query). This opens a blank query window in which we enter the formula in the empty field
provided. Enter the formula as shown in figure 26.
Once the formula is entered, the HTML table structure is loaded. On expanding the data
column, we can see the various data attributes of the data table, as shown in figure 27.
Figure 27 shows the attributes available in the data table. Here we can choose which data
to display; as we do not require the maps data, uncheck the maps attribute and select OK.
This shows the data with the city and name columns, as displayed in figure 28.
For this scenario we have created a demo page, VV Hospital Training, shown in figure 29.
From it we can extract data such as posts, comments, and likes. To connect to this Facebook
data, click New Query and, under Online Services, select ‘From Facebook’, as shown in
figure 30.
This opens a Facebook window that requests an object or a connection in the Facebook graph.
The Facebook graph is the primary way for any external app to communicate with Facebook.
Using it, we can access all the information we require that can be handled by Excel. To find
the required object ID, open the Facebook page in the browser and copy the number that
appears after the page name at the end of the URL, as shown in figure 31.
The highlighted value in figure 31 is the object ID of the demo page, which we need in order
to establish the connection from Excel.
The above figure shows the pop-up window that appears after the steps for figure 30. Enter
the object ID obtained from the URL and choose posts as the connection attribute. Clicking
OK retrieves all information related to the page, as shown in figure 33.
Clicking Edit opens the Query Editor, where the column header can be used to see the
additional information available for that attribute, as shown in figure 34.
Figure 34 shows the data attributes that can be extracted from the comments data. Clicking
OK adds the comments data to the table, as shown in figure 35.
The steps above show how to connect to an online Facebook source to extract data. The same
data can also be extracted with a formula in a blank query (refer to figure 25). To do this,
choose New Query from the Data tab, click From Other Sources, and then click Blank Query,
which opens the Power Query Editor window.
Figure 36 shows the formula used to connect to the Facebook ID.
In the Power Query Editor window, enter the formula that extracts the posts for the Facebook
graph ID. Pressing Enter displays a similar window with all the data that appears on that
page. The comments data can be extracted using the same steps as for figure 34, and the
resulting data will be the same as shown in figure 35. These examples show how to connect to
various data sources.
Summary
In this topic we discussed,
• How to load data into the Excel model
• Various methods to connect and load data into the Excel sheet.
• Various methods to connect to Web data source.
Unit 13
Merging and Appending Data Sources
Table of Contents
1. Introduction (Figures 1 to 26; SAQ 1)
2. Summary
3. Terminal Questions
4. Answers
1. INTRODUCTION
In Unit 12 we discussed connecting to various data sources. Now let’s look at merging data
from different data sources into a single data source in order to visualize the result.
Consider two sheets, patient master data and patient experience data, connected from the
file data and the XML data. To merge them, right-click the data model and select the Merge
option, then choose the sheet or data source to be merged, which in this case is the patient
experience data. The steps are shown below.
Choose Left Outer as the join kind and then select the common join element, as shown in
figure 3. In this case, the common column to join on is the case number. After selecting it,
click ‘OK’. The data editor window opens, showing the columns from both data sheets in a
single window, as shown in figure 4.
The columns from the second sheet are combined into a single column called the new column.
Click the expand icon and select the data fields required for the new merged data source.
Since some columns are repeated, choose only the required data. In this case, the required
fields are ‘wait time for check in’, ‘wait time in waiting room’, ‘wait time for physician’,
‘wait time at check out’, ‘total wait time’, and ‘patient satisfaction rating’, as shown in
figure 5. Then click OK to display the selected data, as shown in figure 6.
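A left outer join keeps every row of the first (left) table and adds the matching columns from the second table, leaving them empty where no match exists. A minimal Python sketch of this behaviour follows; the sample rows and the helper name left_outer_join are hypothetical, and the helper assumes the case number is unique in the second table.

```python
# Hypothetical sample rows for the two sheets.
master = [
    {"case_number": 1, "patient": "A"},
    {"case_number": 2, "patient": "B"},
    {"case_number": 3, "patient": "C"},
]
experience = [
    {"case_number": 1, "total_wait_time": 42, "patient_satisfaction_rating": 4},
    {"case_number": 3, "total_wait_time": 65, "patient_satisfaction_rating": 2},
]

def left_outer_join(left, right, key, wanted):
    """Keep every left row; add the wanted right columns, or None if unmatched."""
    lookup = {row[key]: row for row in right}  # assumes key is unique in `right`
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        merged.append({**row, **{col: match.get(col) for col in wanted}})
    return merged

merged = left_outer_join(
    master, experience, "case_number",
    ["total_wait_time", "patient_satisfaction_rating"],
)
for row in merged:
    print(row)
```

Case number 2 has no experience record, so it is kept with empty (None) values, which is what distinguishes a left outer join from an inner join.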
We now have the required fields from the data sources we wanted to merge. Next, select the
Close and Load option at the top left of the window. The merged data source, which contains
the data from both the master data file and the experience data file, is loaded into the
workbook queries window, as shown in figure 7.
Once this is done, right-click the merged query and rename it as needed. Here the merged
data source is named hospital performance data; it will be used to visualize the data, as
shown in figure 8.
First, we need to shape the dataset by deleting unnecessary columns, adding specific
calculated columns, and filtering the data to our requirements. To do this, right-click the
merged data source that we renamed in figure 8 and click Edit. The Query Editor window
opens, where we can perform any data modification or cleansing required to prepare the
dataset for visualization.
Let’s begin with renaming existing columns. This can be done either by double-clicking the
column name and changing it directly, or by right-clicking the column name and selecting the
Rename option.
To remove an existing column, right-click the column name and choose the “Remove” option,
which deletes the column from the worksheet.
Note: the Query Editor keeps track of each transformation under the Applied Steps window.
Together, the applied transformations make up the new query. None of these actions change
the original source data; Excel records each step that is performed, takes a snapshot of the
data, and brings the result back to the workbook.
To speed up the development of complex data transformation processes, we can filter or
remove data to reduce the sample dataset. The ‘Remove Rows’ option in the top menu offers
operations such as removing top rows, bottom rows, alternate rows, duplicates, and errors.
Removing such records makes developing the transformation process faster and easier.
To append data, select the required data model, then right-click and select the Append
option in the drop-down menu. This option appends two or more tables that have the same data
structure. For example, some rows may appear in one table and the rest, with the same column
structure, in another table. Using the Append option, we choose the primary table, which
contains the first set of rows, and the second table, which contains the second set, and
then press OK. This appends the rows from both data sources.
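Appending stacks the rows of one table under another, which only makes sense when the columns match. The following Python sketch shows the idea; the sample rows and the helper name append_tables are hypothetical.

```python
def append_tables(primary, *others):
    """Stack rows from tables that share the same column structure."""
    columns = set(primary[0])
    for table in others:
        for row in table:
            if set(row) != columns:
                raise ValueError("tables must have the same column structure")
    return primary + [row for table in others for row in table]

# Hypothetical tables: the first set of rows and the second set of rows.
first_rows = [{"case_number": 1, "city": "Boston"}]
second_rows = [
    {"case_number": 2, "city": "Dallas"},
    {"case_number": 3, "city": "Miami"},
]

appended = append_tables(first_rows, second_rows)
print(len(appended))
```

Unlike a merge, which adds columns by matching on a key, an append adds rows and leaves the set of columns unchanged.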
To get the top 20 patients, sort the column in descending order.
The patients are sorted by longest wait time, as shown in figure 11. Once the data is
sorted, select the Keep Top Rows option and enter the number of rows required. For this
scenario we need 20 rows.
Only the top 20 records with the highest physician wait time are then displayed, as shown
in figure 13.
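Sorting in descending order and then keeping the first rows can be sketched in Python as below. The wait times are generated, hypothetical values, and the column name is illustrative.

```python
# Hypothetical physician wait times (minutes) for 37 cases.
visits = [
    {"case_number": n, "wait_time_for_physician": (n * 7) % 50}
    for n in range(1, 38)
]

# Step 1: sort by wait time, longest first.
ranked = sorted(
    visits, key=lambda row: row["wait_time_for_physician"], reverse=True
)

# Step 2: keep only the top 20 rows.
top_20 = ranked[:20]
print(len(top_20))
```

The order of the two steps matters: keeping the first 20 rows before sorting would return the first 20 rows of the source rather than the 20 longest waits.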
Note that all the steps performed are added to the Applied Steps window in the Query
Settings, as shown in figure 14.
Here we can revert to any previous version of the records simply by deleting the step we
want to undo. For example, to go back from the top or bottom rows, or from a specified range
of records, to the full data sheet, we just delete the keep first rows step, and all the
data records are displayed again.
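One way to picture the applied steps list is as an ordered list of functions that is replayed from the untouched source every time. The Python sketch below is an analogy under that assumption, not a description of Excel's internals; the step names and data are hypothetical.

```python
source = [
    {"case_number": n, "wait": w}
    for n, w in [(1, 30), (2, 55), (3, 10), (4, 45)]
]

def sort_desc(rows):
    return sorted(rows, key=lambda r: r["wait"], reverse=True)

def keep_first_two(rows):
    return rows[:2]

def run(steps, rows):
    """Replay every step in order, starting from the untouched source rows."""
    for _name, step in steps:
        rows = step(rows)
    return rows

applied_steps = [("Sorted Rows", sort_desc), ("Kept First Rows", keep_first_two)]
with_limit = run(applied_steps, source)

# Deleting a step and replaying the remaining ones brings all records back.
applied_steps = [s for s in applied_steps if s[0] != "Kept First Rows"]
without_limit = run(applied_steps, source)
print(len(with_limit), len(without_limit))
```

Because the source is never modified, removing a step is always safe: the result is simply recomputed from the remaining steps.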
Most datasets contain duplicates, which lower data quality. Duplicate values can be removed
from the records in the Query Editor.
Consider the case in figure 15, where there are duplicate records in the case number column.
The data has 37 rows, 3 of which are duplicates. To remove them, select the case number
column and then choose the Remove Duplicates option under the Remove Rows feature in the top
menu, as displayed in figure 16.
The number of rows is now reduced to 34, and the 3 duplicate records are deleted. Similarly,
if any other column contains duplicate records that need to be deleted, select the column
and choose the Remove Duplicates option, as shown in figure 16.
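Removing duplicates on a column can be sketched in Python as follows. The records are hypothetical and are constructed only to mirror the counts in the text (37 rows, 3 of them repeats); the helper keeps the first occurrence of each case number, which is an assumption of this sketch.

```python
def remove_duplicates(rows, key):
    """Keep the first row for each key value and drop later repeats."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

# 34 distinct case numbers plus 3 repeated ones gives 37 rows.
records = [{"case_number": n} for n in range(1, 35)]
records += [{"case_number": n} for n in (5, 12, 20)]

deduplicated = remove_duplicates(records, "case_number")
print(len(records), len(deduplicated))  # prints: 37 34
```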
Filtering Data
Let’s undo the changes made to the data records to get back the complete data. To do this,
delete the filtered rows step.
Once the complete data is restored, it can be limited by filtering: choose the filter option
from the expand arrow adjacent to each column, as shown in figure 18.
Here we can filter out the records we do not need and display a limited set of data. For
example, if the data from Boston and Chicago is not required, we can deselect those values
so that only the required data is displayed.
Once this option is applied, the data from Boston and Chicago is no longer displayed, and
only the remaining data is shown.
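Deselecting values in a column filter amounts to keeping only the rows whose value is not in the excluded set. A minimal Python sketch, with hypothetical records:

```python
# Hypothetical records; only the city column matters for this filter.
records = [
    {"case_number": 1, "city": "Boston"},
    {"case_number": 2, "city": "Dallas"},
    {"case_number": 3, "city": "Chicago"},
    {"case_number": 4, "city": "Miami"},
]

excluded_cities = {"Boston", "Chicago"}

# Keep only the rows whose city was not deselected.
remaining = [row for row in records if row["city"] not in excluded_cities]
print([row["city"] for row in remaining])  # prints: ['Dallas', 'Miami']
```

The original list is untouched, so removing the filter restores the complete data.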
Figure 20 shows the Text Filters option, which offers advanced conditions for filtering the
data, such as equals, does not equal, begins with, does not begin with, and ends with.
When the data sheet contains date records, we can filter the data using date-specific
options such as next week, quarter, and last year.
Figure 21 shows the Add Custom Column window. Here we add a column named full name; in the
column formula, we add the first name and last name attributes and concatenate them with a
space. Clicking OK then adds the new custom column ‘full name’ to the worksheet, as shown in
figure 22.
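The custom column is computed row by row from existing columns. A Python sketch of the same calculation, with hypothetical names and rows:

```python
# Hypothetical rows with separate first and last name columns.
people = [
    {"first_name": "Ann", "last_name": "Lee"},
    {"first_name": "Raj", "last_name": "Patel"},
]

def add_full_name(rows):
    """Add a full_name column: first name, a space, then last name."""
    return [
        {**row, "full_name": row["first_name"] + " " + row["last_name"]}
        for row in rows
    ]

with_full_name = add_full_name(people)
print([row["full_name"] for row in with_full_name])
```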
We can change the data type manually, as shown in figure 23. Selecting the data and then
right-clicking the data type provides a drop-down menu listing all the data type options,
such as date, binary, text, decimal, and whole number, from which we can choose the one
needed.
Some data types cannot be applied to certain columns. For example, if we apply the date data
type to a text field, the entire column shows error values, as displayed in figure 24.
To undo this error, delete the change type step from the Applied Steps window. This restores
the column to its original form.
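The effect of applying a date type to a text column can be imitated in Python. This is a sketch under assumptions: the date format, the "Error" marker, and the helper name are all hypothetical choices for illustration.

```python
from datetime import datetime

def change_type_to_date(values, fmt="%Y-%m-%d"):
    """Convert each value to a date; values that cannot be parsed become 'Error'."""
    converted = []
    for value in values:
        try:
            converted.append(datetime.strptime(value, fmt).date())
        except ValueError:
            converted.append("Error")
    return converted

city_column = ["Boston", "Chicago"]             # a text field
as_dates = change_type_to_date(city_column)     # every cell fails to convert
print(as_dates)

date_column = ["2020-03-01"]                    # text that really holds dates
print(change_type_to_date(date_column))
```

The original city_column list is unchanged, so discarding the converted result corresponds to deleting the change type step.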
To replace values, enter the value to find, then enter the value that should replace it, and
click OK. In this example, this changes all the female records in the column to ‘FE’, as
displayed in figure 26.
Changing data types and replacing values are fundamental techniques for manipulating and
cleansing data in Excel.
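Replacing values can be sketched in Python as below. The rows and the helper name are hypothetical, and the sketch assumes whole-cell matching, so a cell is replaced only when its entire content equals the value searched for.

```python
def replace_values(rows, column, find, replace_with):
    """Replace cells in `column` whose whole content equals `find`."""
    return [
        {**row, column: replace_with if row[column] == find else row[column]}
        for row in rows
    ]

# Hypothetical records with a gender column.
patients = [
    {"case_number": 1, "gender": "Female"},
    {"case_number": 2, "gender": "Male"},
    {"case_number": 3, "gender": "Female"},
]

updated = replace_values(patients, "gender", "Female", "FE")
print([row["gender"] for row in updated])  # prints: ['FE', 'Male', 'FE']
```

Matching the whole cell avoids accidental changes where the searched text is only part of another value.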
SELF-ASSESSMENT QUESTIONS – 1
2. SUMMARY
In this topic we discussed,
• How to remove unnecessary columns and add specific calculated columns.
• How to remove duplicates and filter the data.
• How to transform and cleanse data so as to visualize it effectively.
3. TERMINAL QUESTIONS
Q1. Explain the need for appending data.
Q2. Why are deleting unnecessary columns, adding specific calculated columns, and filtering
data important when editing and modifying data?
Q3. How does data cleansing help in data visualization?
Q4. Explain the sort and filter functions in MS Excel.
Q5. What is meant by data consistency? Explain with an example.
4. ANSWERS
Self-Assessment Questions
1. Option B, AVG is not a function; AVERAGE is the function
2. Option D, Ampersand (&)
3. Option B, 94
4. Option C, Data
5. Option D, Function Filter
6. Option D, View>Freeze Panes>Flash Fill
7. Option D, Integer
8. Query Editor
9. csv or text files
10. Option A, Text Filter
Answers-Terminal Questions
Ans1= A comprehensive data appending service usually includes a data normalization
process, standardization of data, appending missing data, removing redundant data and
setting up of data automation. Here is why your business should consider investing in data
appending services.
• Cleaner data
Apart from verifying and completing your information, data appending can help you correct
typos, update information (zip codes, place names or addresses) and check up on
email/postal address errors. Data appending services can strengthen the validity of your
mailing list.
• Better segmentation
With access to more information, your business can customize its services through
segmentation. Instead of only having access to name and age, data appending services can
give you access to other critical information like income, which will help you in your
marketing efforts. For instance, if you have a product for women in their mid-30s who make
X amount of income per month, you will be able to find them through data appending.
• Minimize cost
Data appending services can help your business keep the cost down. With verified lists, you
can save on the cost of research, recruiting staff, error correction and much more.
Ans2= First, we need to shape the dataset by deleting unnecessary columns, adding specific
calculated columns, and filtering the data to our requirements. To do this, right-click the
merged data source that we renamed and click Edit. The Query Editor window opens, where we
can perform any data modification or cleansing required to prepare the dataset for
visualization.
Let’s begin with renaming existing columns. This can be done either by double-clicking the
column name and changing it directly, or by right-clicking the column name and selecting the
Rename option.
To remove an existing column, right-click the column name and choose the “Remove” option,
which deletes the column from the worksheet.
Note: the Query Editor keeps track of each transformation under the Applied Steps window.
Together, the applied transformations make up the new query. None of these actions change
the original source data; Excel records each step that is performed, takes a snapshot of the
data, and brings the result back to the workbook.
To speed up the development of complex data transformation processes, we can filter or
remove data to reduce the sample dataset. The ‘Remove Rows’ option in the top menu offers
operations such as removing top rows, bottom rows, alternate rows, duplicates, and errors.
Removing such records makes developing the transformation process faster and easier.
Ans3= Before data can be consumed, it must be cleansed and modified.
The Query Editor automatically assigns data types when data is loaded from Excel or other
databases, so checking the data types is very important, especially for slicing and dicing
the data. When data is loaded from CSV or text files, the Query Editor is unable to assign
the data types, and they must be assigned manually.
The Query Editor occasionally attempts to apply a data type on its own; this adds a new step
named ‘changed type’ to the Applied Steps window.
We can change the data type manually, as shown in figure 1. Selecting the data and then
right-clicking the data type provides a drop-down menu listing all the data type options,
such as date, binary, text, decimal, and whole number, from which we can choose the one
needed.
Some data types cannot be applied to certain columns. For example, if we apply the date data
type to a text field, the entire column shows error values, as displayed in figure 2.
To undo this error, delete the change type step from the Applied Steps window. This restores
the column to its original form.
Ans4= There are many built-in Excel tools to help with data management and the sorting and
filtering features are among the best. The filter tool gives you the ability to filter a column of
data within a table to isolate the key components you need. The sorting tool allows you to
sort by date, number, alphabetic order and more. In the following example, we will explore
the usage of sorting and filtering and show some advanced sorting techniques.
Let’s say you had the spreadsheet above and wanted to sort by price. This process is fairly
simple. You can either highlight the whole column or even click on the first cell in the column
to get started. Then you will:
• Right click to open the menu
• Go down to the Sort option – when hovering over Sort the sub-menu will appear
• Click on Largest to Smallest
• Select Expand the selection
• Click OK
The whole table has now adjusted for the sorted column. Note: when the data in one column
is related to the data in the remaining columns of the table, you want to select Expand the
selection. This will ensure the data in that row carries over with sorted column data.
In addition to sorting, you may find that adding a filter allows you to better analyse your data.
When data is filtered, only rows that meet the filter criteria will display and other rows will
be hidden. With filtered data, you can then copy, format, print, etc., your data, without having
to sort or move it first. To use a filter,
• Go to the Home ribbon, click the arrow below the Sort & Filter icon in the Editing
group and choose Filter.
OR
• Go to the Data ribbon, and then click Filter in the Sort & Filter group.
You will notice that all of your column headings now have an arrow next to the heading name.
Click on the arrow next to the heading with which you want to filter, and you will see a list
of all the unique values in that column. Check the box next to the criteria you wish to match
and click OK. Click on the arrow next to another heading to further filter the data.
Ans5= Data that is consistent refers to data that is formatted in a consistent way. This is great
for people working with data because it means all the data can be handled in the same way.
Additionally, data consistency can also refer to data that is constant over time or some other
relationship. For example, if you have a weather dataset, it would be considered consistent
if there were no missing days/hours (depending on your metric).
In summary, data consistency is a principle that people in data science typically try to
follow and keep in mind, because it ultimately makes the data much easier to use.
There are many instances when data needs to be transferred or processed according to a
business use case. After processing, different tables may hold different values for the
same record. This calls for a redesign with improved data consistency.
For example, consider a table with three columns: employee name, employee number, and phone
number, holding Employee Name - XYZ, Employee Number - 123, and Phone Number - 0000.785.563.
Suppose you want to use this data in another table, or your source of data is different this
time. If the new table holds the same details for this employee (Employee Name - XYZ,
Employee Number - 123, Phone Number - 0000.785.563), the data is consistent; if it holds a
different phone number for the same employee, the data is inconsistent.
Hence, data must be validated against rules using constraints, triggers, transactions, and
so on to improve data consistency.
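The employee example can be turned into a simple consistency check in Python. The helper name and the second phone number are hypothetical, introduced only to show an inconsistent case.

```python
def find_inconsistencies(table_a, table_b, key, field):
    """Return the key values whose `field` differs between the two tables."""
    b_values = {row[key]: row[field] for row in table_b}
    return [
        row[key]
        for row in table_a
        if row[key] in b_values and b_values[row[key]] != row[field]
    ]

original = [{"employee_number": 123, "name": "XYZ", "phone": "0000.785.563"}]
same_copy = [{"employee_number": 123, "name": "XYZ", "phone": "0000.785.563"}]
# Hypothetical copy in which the phone number has drifted.
changed_copy = [{"employee_number": 123, "name": "XYZ", "phone": "0000.111.222"}]

print(find_inconsistencies(original, same_copy, "employee_number", "phone"))
print(find_inconsistencies(original, changed_copy, "employee_number", "phone"))
```

A check like this only detects inconsistency after the fact; constraints and triggers aim to prevent it at the point where data is written.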
Unit 14
Visualization with Power BI Desktop
1. INTRODUCTION
Power BI Desktop is a Windows application that enables users to create advanced data
visualizations and business intelligence reports using data from a variety of sources. It is a
free, standalone desktop application that is part of the Power BI suite of business intelligence
tools developed by Microsoft. Power BI Desktop allows users to create custom data
visualizations, reports, and dashboards with an intuitive drag-and-drop interface. It includes
a range of built-in data connectors for a variety of data sources.
Power BI Desktop can be downloaded and installed from the Microsoft website. Once the app
is installed and launched, a UI similar to Figure 1 appears.
The start-up (launch) page contains various sections. ‘Get Data’ or ‘Recent Sources’ can be
used to connect to a data source or import data from a file. Files or reports that were
previously worked on are listed as ‘quick link’ options. The start-up page also displays
links to tutorial videos, for example ‘Getting started with Power BI Desktop’ and ‘Building
Reports’.
Power BI updates are released every month. The details of the latest update are available
as a link on this screen under “What’s new”. In addition, the start-up page has links to
forums, blogs, and tutorials.
These options are shown at every launch of the app. To stop displaying them at start-up,
disable the option “Show the screen on the start-up”.
Once this launch page is closed, the main interface appears as shown in Figure 2.
1. File Operation
2. Main Menu
3. Insert
4. Modelling
5. View
6. Help
3. REPORT CANVAS
The report canvas in Power BI is the main work area where users can create and design
reports and visualizations using data from various sources. It is the area where users can
drag and drop fields from their data sources, create charts and tables, and arrange
visualizations to create a cohesive report. Users can add new pages to their reports, and each
page has its own report canvas. The canvas can be customized to fit the user's needs,
including adding background colours or images and changing the page size.
The canvas in Power BI Desktop includes several layout tools that allow users to position
and resize their visualizations. It also includes a formatting pane that allows users to adjust
the formatting and appearance of their visualizations, including font size, colours, and other
formatting options.
Pages can be added to the report using the ‘+’ option, and existing pages can be deleted.
Along with the report canvas, there is an option to add filters, which can be applied to a
single page or to all pages. The interface also offers standard, pre-defined visualisations
that can be added to the main canvas.
Once a connection is established with a dataset, the columns of the dataset are available
under Fields. These can be dragged and dropped into the respective fields of a visual to
populate the chart on the canvas.
The UI also has options to display the data model view and the relationship view. The data
model view is used to preview the data. The relationship view displays the relationships
between tables and their common columns.
The report view in Power BI is where you design and build the visualizations and reports
that you want to present to your audience. It offers a range of data visualization choices that
may be used to present your data in a meaningful way, and it enables you to construct custom
dashboards and reports using a drag-and-drop interface.
The workspace where you can design and generate your report in Power BI is called the
report canvas page. It is the primary location where you can format the report layout, add
and arrange visualizations, and apply filters. The canvas effectively serves as a blank page
where you can start from the beginning when creating your report and include any
visualizations you require, including maps, charts, tables, matrices, and images.
To start with an empty report canvas, click the three dots at the right-end corner of the
page and click Remove to remove the content or object on the page (if any), as shown in
figure 5.
Figure 6 shows the basic report canvas that we get at the start when we don’t have any object
placed on the report canvas (it’s just an empty canvas).
Options
This page has the report canvas options as well as the basic menus (File, Home, Insert, etc.).
To change the background or theme of the report canvas, click View and select the required
background or theme, as shown in figure 7.
Page Option
• Gridlines – The user may enable or disable the gridlines that show up in the report
canvas by checking the "Gridlines" checkbox in the report canvas settings.
• Snap to Grid - You can enable or disable the snap-to-grid capability in Power BI's report
canvas settings by checking the "Snap to grid" checkbox. When you move or resize
visualizations with snap-to-grid enabled, they will automatically align to the closest
grid line.
Some of the key features and components of the report view are:
• Pages: A report in Power BI can have one or more pages. The "New Page" button can
be used to add new pages, and the page navigation bar at the bottom of the screen can
be used to switch between pages as shown in figure 7.
• Fields: This section is where the actual datasets or data tables appear. You can view
and manage each data field present in your dataset in the Fields pane, and add fields to
existing visualizations or create new ones by dragging fields from the Fields pane
onto the canvas, as shown in figure 8.
• Visualizations: These are the graphs, tables, maps, and other types of data
visualizations that Power BI lets you make. A new visualization can be added to your
report by selecting it in the Visualizations pane on the right side of the screen and
dragging it onto the canvas, as shown in figure 8.
The Visualizations pane has a field view pane and a field formatting pane.
✓ Field View Pane: The values in the fields list change based on the visualization that
is selected.
✓ Field Formatting Pane: Clicking the formatting option shows page-related settings such
as Page Size and Page Background, as shown in figure 8. Each setting can be expanded by
clicking its drop-down arrow to see its options, as shown in figure 9.
• Filters: You may customize which data is shown in your visualizations by using filters.
When creating a report, choose the visualization you want to filter, then click the
"Filters" button in the Visualizations pane as shown in figure 10.
To the left of the Visualization section, we have Filter page/Section which is used to filter
your visualization based on certain conditions. On this page, we have 3 options available i.e.,
For the demonstrations in this unit, data from the “Adventure Works” dataset is used.
AdventureWorks is a free sample database of retail sales data.
Once the data has been imported, the columns (both inherent and derived) are available on
the right of the Power BI Desktop view as shown in figure 11. The different charts available
are also displayed here.
Once a column is selected, the most appropriate chart corresponding to it is displayed on the
view. For example, if the column ‘Total Orders’ is selected, a bar graph is selected and
displayed on the screen. The attributes of the bar plot are available on the screen as shown
in Figure 12.
The type of graph can be changed by selecting the most appropriate one in the visualisation
section. For instance, if the ‘stacked bar chart’ is selected, the graph will be converted into a
horizontal chart, and the attributes displayed will be those of a ‘horizontal bar chart’.
We can create a new chart by selecting the ‘New Visual’ option as shown in Figure 13. This
will create a default column chart.
The third method of creating a chart is by double-clicking on the canvas. This will create a
default ‘Q&A’ chart.
Let us go back to the ‘Total Orders’ selection as shown in Figure 14. The text corresponding
to the ‘Total Count’ can be formatted using the options shown there. The number can be
displayed as a comma-separated value by selecting the appropriate option.
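The effect of the comma-separated display option can be illustrated outside Power BI. The following is a minimal Python sketch, not Power BI's own formatting engine; the value 25164 is a made-up total used only for illustration.

```python
# Hypothetical total; not a figure taken from the AdventureWorks data.
total_orders = 25164

# The "," format specifier inserts a thousands separator.
formatted = f"{total_orders:,}"
print(formatted)
```

The underlying number is unchanged; only its displayed text gains the separator.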
To display the total orders subdivided by subcategory name, select the subcategory name in
the ‘Fields’ pane and drag and drop it into the ‘Axis’ attribute. The resulting chart is shown
in Figure 15.
Note that there is another attribute named ‘Legend’. If the column ‘Subcategory name’ is
moved to an attribute named ‘Legend’, the chart as shown in Figure 16 appears. This is not
very easy to interpret.
The chart can be brought into focus by clicking on the ‘Focus Mode’ option. The type of the
chart can be changed by clicking on the ‘Pie chart’ option as shown in Figure 17. Note that the
attributes shown for the bar chart are then updated to those of the pie chart.
Tooltips are pieces of information that appear when the mouse pointer hovers over the visual.
Additional content can be added to the tooltip by adding the columns to the respective
attributes as shown in Figure 18.
Additional Formatting can be done by using the ‘Format’ option as shown in Figure 19 below.
There are general options. Also, there are options specific to the X-axis and Y-axis. For
example, to change the colour of the content, the ‘Color’ option of the y-axis can be changed
as shown in Figure 20.
Similarly, the alignment, font, bar size, inner padding between the bars and
other formatting options can be changed by setting the appropriate attributes in this section.
This can be done for the X-axis and Y-axis separately. There is an option to “Return to
Default” settings after the changes have been made.
There is an option called the ‘zoom slider’. As shown in Figure 21, once the zoom slider is
turned on, a slider appears that controls which content appears on the screen. In Figure 21,
only the bars with values greater than 5k are displayed. This option can also be used to
display a range of values.
There are options available to change the colour of the bar chart, and the colour of the text
on the bar chart. There are options available to change the Title, the format of the title and
so on.
Similarly, there are formatting options to modify and enhance the background, border and
so on.
Power BI has several filtering options that can be used to refine and analyze data. This
section covers the filtering options available on the Power BI report canvas.
To explain the filtering options, some visuals are first added to the report canvas as shown
in figure 23.
Here the user has added two different graphs from the visualization pane by selecting the
required fields.
With no visual selected, the Filters pane shows two filtering options (i.e., Filters on this
page and Filters on all pages).
If the user selects a particular visual, the Filters pane also shows the filters that can be
applied to that visual, as shown in figure 24.
Now, let us create a new page, i.e., page 2, copy visual 2 from page 1, paste it into
page 2, and resize the graph if required.
Back on page 1, select the bar graph. The ‘Filters on this visual’ section of the Filters
pane offers filters only for the attributes present in that graph, i.e., Category Name and
Total Orders, as shown in figure 25.
Figure 25: Selected Visual 2 and Filters options have been enabled for the visual.
For the category name, clicking the drop-down arrow reveals the filter type option. The
first two filter types are described below.
Basic filtering: the user selects the categories to be shown in the graph.
For example, in figure 26 the user selects the Bikes and Clothing options, and the graph
displays only those categories.
Advanced filtering: this option filters the data based on whether a specific column
contains a certain string of text, which is entered in the ‘contains’ field as shown in
figure 27.
For example, in figure 28, the user has entered Bikes in the ‘contains’ field, so the graph
contains only Bikes data.
Figure 28: Resultant Graph after applying the Advanced Filtering for ‘Bikes’
Apart from ‘contains’, advanced filtering has other options such as ‘does not contain’, ‘is’,
‘is not’, and ‘starts with’, as shown in figure 29.
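The logic behind basic and advanced text filtering can be sketched in a few lines of Python. This is only an illustration of the idea, not how Power BI implements it: the category list, the function names, and the case-insensitive matching are all assumptions of the sketch.

```python
# Hypothetical category names used only for illustration.
categories = ["Accessories", "Bikes", "Clothing", "Components"]

# Each advanced-filter condition is modelled as a predicate on (value, text).
# Case-insensitive comparison is an assumption of this sketch.
CONDITIONS = {
    "contains": lambda value, text: text.lower() in value.lower(),
    "does not contain": lambda value, text: text.lower() not in value.lower(),
    "starts with": lambda value, text: value.lower().startswith(text.lower()),
    "is": lambda value, text: value.lower() == text.lower(),
    "is not": lambda value, text: value.lower() != text.lower(),
}


def basic_filter(values, selected):
    """Keep only the values ticked in the basic-filtering list."""
    return [v for v in values if v in selected]


def advanced_filter(values, condition, text):
    """Keep only the values that satisfy the chosen condition."""
    return [v for v in values if CONDITIONS[condition](v, text)]


print(basic_filter(categories, {"Bikes", "Clothing"}))
print(advanced_filter(categories, "contains", "Bikes"))
print(advanced_filter(categories, "starts with", "C"))
```

Basic filtering picks items from a fixed list, while advanced filtering evaluates a condition against every value, which is why it can match items the user has not explicitly ticked.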
Apart from these, the Filter type field offers “Top N” and “Bottom N” options. These options
allow users to filter their data to show the top or bottom N items, where N is a specified
number.
Let us work on Top N filtering. Figure 30 shows the Top N filter type, where the number of
items (i.e., the value of N) is set in the “Show items” field. The filter is then applied and
the visual displays the top N items.
In figure 31(a) the number of items has been set to 2 (i.e., N=2). The data field to rank by
is then set in the “By value” field (here Order Quantity, selected from the Fields pane) and
the filter is applied as shown in figure 31(b).
Figure 31(a): N value set to 2 Figure 31(b): Values set to ‘Total Orders’
After the filter is applied, figure 32 shows the graph with the top 2 products based on
order quantity (i.e., Bikes and Accessories).
Figure 32: Resultant Graph with Top 2 Products based on order quantity.
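A Top N or Bottom N filter amounts to ranking the items by a chosen value and keeping the first N. The Python sketch below illustrates this under stated assumptions: the order quantities are made-up numbers (not AdventureWorks figures), `top_n` is a hypothetical helper, and tie handling is not modelled.

```python
# Hypothetical order quantities per category (made-up numbers).
order_quantity = {"Accessories": 57000, "Bikes": 24000, "Clothing": 12000}


def top_n(values, n, bottom=False):
    """Return the n categories with the largest (or smallest) values."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=not bottom)
    return [name for name, _ in ranked[:n]]


print(top_n(order_quantity, 2))               # Top N with N = 2
print(top_n(order_quantity, 1, bottom=True))  # Bottom N with N = 1
```

Here `n` plays the role of the “Show items” field and the dictionary values play the role of the “By value” field.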
Now, let us see another filter under ‘Filters on this visual’, called “Total Orders”, which
contains a subfield ‘Show items when the value’.
Clicking the ‘Show items when the value’ field displays options such as is less than, is
less than or equal to, is, is not, and is greater than, as shown in figure 33. (The user can
select any option based on the requirement.)
Next to ‘Show items when the value’ there is an empty field in which to enter the value.
For example, if ‘Show items when the value’ is set to ‘is less than’, the empty field is set
to 5000, and the filter is applied, the graph is empty, indicating that no category has
fewer than 5000 orders, as shown in figure 34.
Figure 34: Empty graph indicating there is no such value where the order is less than 5000.
If the user enters 15000 (instead of 5000), the graph in figure 35 is displayed,
indicating that Bikes and Clothing each have fewer than 15000 orders.
Figure 35: Graphs indicating that bikes and clothing are ordered less than 15000.
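The ‘Show items when the value’ filter can be thought of as a comparison applied to each category's total. The sketch below is a hedged illustration in Python: the totals are made-up numbers chosen only so that the two thresholds from the text behave as in figures 34 and 35, and `value_filter` is a hypothetical helper, not a Power BI function.

```python
import operator

# Hypothetical Total Orders per category (made-up numbers chosen so that the
# thresholds 5000 and 15000 give the outcomes described in the text).
total_orders = {"Accessories": 17000, "Bikes": 14000, "Clothing": 7000}

# Map each condition label to the comparison it performs.
COMPARISONS = {
    "is less than": operator.lt,
    "is less than or equal to": operator.le,
    "is": operator.eq,
    "is not": operator.ne,
    "is greater than": operator.gt,
    "is greater than or equal to": operator.ge,
}


def value_filter(values, condition, threshold):
    """Keep the categories whose value satisfies the chosen comparison."""
    compare = COMPARISONS[condition]
    return {name: v for name, v in values.items() if compare(v, threshold)}


print(value_filter(total_orders, "is less than", 5000))   # nothing qualifies
print(value_filter(total_orders, "is less than", 15000))  # Bikes and Clothing
```

An empty result corresponds to the empty graph: the filter is working, but no item meets the condition.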
Suppose the user has added Product Category from the Fields section to the data field under
‘Filters on this page’. The same filter types discussed before, i.e., basic filtering and
advanced filtering, are available.
If an option, say Bikes, is selected and the filter is applied, the change is applied to all
the visuals (graphs) on that page, as shown in figure 36.
Similarly, advanced filtering can be applied by selecting an option from ‘Show items when
the value’ and entering the value in the next field.
For example, to visualize either Bikes or Clothing data, combine two ‘contains’ conditions
with the ‘or’ operation:
1. Select ‘contains’, enter Bikes in the empty field next to it, and check the ‘or’ option.
2. Select ‘contains’ again, enter the second category, i.e., Clothing, and apply the filter.
The result is shown in figure 37, where the filter has been applied to both visuals.
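Combining two advanced-filter conditions with ‘or’ (or ‘and’) is ordinary Boolean logic. The following minimal Python sketch shows the idea; the category list and the helper names `contains`, `either`, and `both` are hypothetical and exist only for this illustration.

```python
# Hypothetical category names used only for illustration.
categories = ["Accessories", "Bikes", "Clothing", "Components"]


def contains(text):
    """Build a condition that checks whether a value contains the text."""
    needle = text.lower()
    return lambda value: needle in value.lower()


def either(first, second):
    """Combine two conditions with 'or'."""
    return lambda value: first(value) or second(value)


def both(first, second):
    """Combine two conditions with 'and'."""
    return lambda value: first(value) and second(value)


bikes_or_clothing = either(contains("Bikes"), contains("Clothing"))
result_or = [c for c in categories if bikes_or_clothing(c)]

bikes_and_clothing = both(contains("Bikes"), contains("Clothing"))
result_and = [c for c in categories if bikes_and_clothing(c)]

print(result_or)
print(result_and)
```

With ‘or’ a category qualifies if it meets at least one condition, whereas with ‘and’ it must meet both, which no single category name does in this example.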
Now, look at page 2, where no changes have been applied to the visuals (because the filter
was applied to the current page only and not to all pages). To filter page 2 as well, clear
the filters (by clicking the eraser icon near the category name field) and use Filters on
all pages.
Under Filters on all pages, select the ‘category name’ from ‘Fields’. Again, Basic
filtering and Advanced filtering are available.
With Basic filtering, select Clothing and Bikes from the options. The visuals on all the
pages (here Page 1 and Page 2) are then updated to show only Clothing and Bikes, as shown
in figure 38(a) and (b).
Figure 38(a): Applied filter to page 1 Figure 38(b): Applied filter to page 2.
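The difference between the three filter scopes can be summarised with a small model. The Python sketch below is an assumption-laden illustration: the report layout mirrors the walkthrough (two visuals on Page 1 and a copy on Page 2), and `affected_visuals` is a hypothetical helper that simply lists which visuals a filter at each scope would touch.

```python
# Hypothetical report layout mirroring the walkthrough: two visuals on Page 1
# and a copy of the second visual on Page 2.
report = {
    "Page 1": ["Visual 1", "Visual 2"],
    "Page 2": ["Visual 2 (copy)"],
}


def affected_visuals(report, scope, page=None, visual=None):
    """List the (page, visual) pairs that a filter at the given scope touches."""
    if scope == "all pages":
        return [(p, v) for p, visuals in report.items() for v in visuals]
    if scope == "this page":
        return [(page, v) for v in report[page]]
    if scope == "this visual":
        return [(page, visual)]
    raise ValueError(f"unknown scope: {scope}")


print(affected_visuals(report, "this visual", page="Page 1", visual="Visual 2"))
print(affected_visuals(report, "this page", page="Page 1"))
print(affected_visuals(report, "all pages"))
```

Each scope is a superset of the previous one: a visual-level filter touches one visual, a page-level filter touches every visual on that page, and a filter on all pages touches every visual in the report.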
Let’s clear all the filters and work on Page 1. If the user selects Bikes by clicking its bar
in the bar graph as shown in figure 39, the Bikes bar is highlighted and the corresponding
subcategories in the second graph are highlighted as well, which shows that clicking a
visual also acts as a filtering option.
Figure 39: Highlighting the Particular category and sub category in the graph.
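This highlighting behaviour can be pictured as marking, for every subcategory in the second graph, whether it belongs to the selected category. The sketch below is a rough Python illustration only; the category and subcategory pairs are illustrative and `cross_highlight` is a hypothetical helper.

```python
# Hypothetical category/subcategory pairs (illustrative only).
subcategories = [
    ("Bikes", "Mountain Bikes"),
    ("Bikes", "Road Bikes"),
    ("Clothing", "Jerseys"),
    ("Accessories", "Helmets"),
]


def cross_highlight(rows, selected_category):
    """Mark each subcategory as highlighted (True) or dimmed (False)."""
    return {sub: cat == selected_category for cat, sub in rows}


highlighted = cross_highlight(subcategories, "Bikes")
print(highlighted)
```

Unlike the filters in the Filters pane, nothing is removed in this model: items outside the selection remain present and are only marked as not highlighted.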
Power BI is a well-known application for data visualization and reporting that gives you a
variety of formatting options to help you produce polished and interesting reports.
Now, let’s understand the report formatting options by considering a sample graph as shown
in figure 40.
To change the view of the displayed visual, perform the steps below:
1. Select the visual (bar graph) by clicking on it, so that it gets highlighted. As soon as the
visual is highlighted, the field pane is updated to show all the fields present in the
visual, as shown in figure 41.
2. Click on the formatting options. These are also updated to those relevant to bar graphs,
such as the x-axis and y-axis, as shown in figure 42.
3. To change the visual type, say from a bar graph (shown in fig 42) to a pie chart, click
the pie-chart symbol in the visualization pane to update the graph. As soon as the graph is
updated, the formatting options are also updated with options like Legend, Data colors,
and Title. The updated pie chart and formatting options can be seen in figure 43.
Figure 43: Bar graph converted to the pie chart with updated formatting options.
In short, the field pane and the formatting options are specific to the chart type the user
selects to visualize the data.
For example, if the user views the data as a line graph, the field pane and formatting
options are updated accordingly, as shown in figure 44.
Figure 44: Visualizing Data as a line graph with updated formatting options view
Self-Assessment Questions -1
1. Name some of the information displayed in the startup or the launch page
2. The ___________in Power BI is the main work area where users can create and
design reports and visualizations using data from various sources.
3. The default chart selected for an integer column with continuous values is a
____________________.
4. To display a horizontal bar chart per subcategory, set the subcategory
column to the __________ attribute.
5. To control the range of values that appear on the screen the
______________option can be enabled.
6. The ____________ in Power BI is where you design and build the visualizations
and reports that you want to present to your audience.
7. Visualizations can be moved or resized with _____________ enabled, and the
visualizations will automatically align to the closest grid line.
8. The actual data sets or data tables will appear in ________ section
9. The user can customize which data is shown in the visualizations by using
________
10. Numbers appearing in the Tooltip can be formatted to include symbols like ‘,’, ‘%’
etc. True or False
8. TERMINAL QUESTIONS
1. What are the different types of visualizations available in Power BI and how can a user
format them?
2. Explain the procedure to apply conditional formatting to the report visuals in Power
BI.
3. Elucidate the procedure to apply a filter to a specific visual or group of visuals in Power
BI.
4. How can a user use a filter to highlight specific data in a Power BI report?
5. Briefly explain the key features and components of the report view.