Unit II. Methods and Techniques For Data Analytics
Looker
• Looker is a data visualization tool that can go in-depth
in the data and analyze it to obtain useful insights. It provides
real-time dashboards of the data for more in-depth analysis
so that businesses can make instant decisions based on the
data visualizations obtained. Looker also provides connections
with Redshift, Snowflake, BigQuery, and more than 50
SQL-supported dialects, so you can connect to multiple
databases without any issues.
• https://www.youtube.com/watch?v=8Pzmrcu63oY
Zoho Analytics
• Zoho Analytics is a Business Intelligence and Data Analytics
software that can help you create wonderful looking data
visualizations based on your data in a few minutes. You can
obtain data from multiple sources and mesh it together to
create multidimensional data visualizations that allow you to
view your business data across departments.
• Zoho Analytics allows you to share or publish your reports with
your colleagues and add comments or engage in conversations
as required.
• https://www.youtube.com/watch?v=Pc72RNNtXzc
Sisense
• Sisense is a business intelligence-based data visualization
system that provides various tools allowing data analysts
to simplify complex data and obtain insights both for their
organization and for outsiders. Sisense believes that eventually,
every company will be a data-driven company and every
product will be related to data in some way.
• https://www.youtube.com/watch?v=6N3mkTWI5R4
IBM Cognos Analytics
• IBM Cognos Analytics is an Artificial Intelligence-based
business intelligence platform that supports data analytics
among other things. You can visualize as well as analyze your
data and share actionable insights with anyone in your
organization. Even if you have limited or no knowledge about
data analytics, you can use IBM Cognos Analytics easily as it
interprets the data for you and presents you with actionable
insights in plain language.
• https://www.youtube.com/watch?v=CMn-65yUM4U
Qlik Sense
• Qlik Sense is a data visualization platform that helps
companies to become data-driven enterprises by providing an
associative data analytics engine, sophisticated Artificial
Intelligence system, and scalable multi-cloud architecture that
allows you to deploy any combination of SaaS, on-premises or
a private cloud.
• https://www.youtube.com/watch?v=sd84bsRWSLY
Domo
• Domo is a business intelligence platform that contains multiple
data visualization tools that provide a consolidated platform
where you can perform data analysis and then create
interactive data visualizations that allow other people to
easily understand your data conclusions. You can combine
cards, text, and images in the Domo dashboard so that you
can guide other people through the data while telling a data
story as they go.
• https://www.youtube.com/watch?v=S3VW8FC47io
Microsoft Power BI
• Microsoft Power BI is a Data Visualization platform focused on
creating a data-driven business intelligence culture in all
companies today. To fulfill this, it offers self-service analytics
tools that can be used to analyze, aggregate, and share the
data in a meaningful fashion.
• https://www.youtube.com/watch?v=yKTSLffVGbk
Klipfolio
• Klipfolio is a Canadian business intelligence company that
provides one of the best data visualization tools. You can
access your data from hundreds of different data sources like
spreadsheets, databases, files, and web services applications
by using connectors. Klipfolio also allows you to create custom
drag-and-drop data visualizations wherein you can choose
from different options like charts, graphs, scatter plots, etc.
• https://www.youtube.com/watch?v=sw7qApKnS8U
SAP Analytics Cloud
• SAP Analytics Cloud uses business intelligence and data
analytics capabilities to help you evaluate your data and
create visualizations in order to predict business outcomes. It
also provides you with the latest modeling tools that help you
by alerting you of possible errors in the data and categorizing
different data measures and dimensions. SAP Analytics Cloud
also suggests Smart Transformations to the data that lead to
enhanced visualizations.
• https://www.youtube.com/watch?v=eGGZ33fzxK0
Machine learning
• Machine learning algorithms are a part of artificial intelligence
(AI) that imitates the human learning process.
• Humans learn through multiple experiences how to perform a
task.
• Similarly, machine learning algorithms usually develop
multiple models, and each model is equivalent to an
experience.
• Two groups: knowledge acquisition and skill refinement
Supervised Learning Algorithms
• When the training data set has both predictors (input) and
outcome (output) variables, we use supervised learning
algorithms.
• Learning is supervised by the fact that both the predictors and
the outcome are available for the model to use.
• Techniques such as regression, logistic regression, decision
tree learning, and random forest are examples of supervised
learning algorithms.
• https://www.youtube.com/watch?v=Bo5dJT1QlHc
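As a minimal sketch of the supervised setting, where both predictors and outcomes are available at training time, the toy example below classifies a new point with a nearest-neighbour rule (a simple supervised technique, not one of those listed above); all figures are invented for illustration:

```python
# Tiny supervised-learning sketch: the training data has both
# predictors (hours studied) and outcomes (pass/fail labels).
train = [(1.0, "fail"), (2.0, "fail"), (4.5, "pass"), (6.0, "pass")]

def predict_1nn(x):
    """Label a new point with the outcome of its nearest training point."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

print(predict_1nn(5.0))  # nearest training point is 4.5 -> "pass"
```

The outcome labels are what make this supervised: without them, the algorithm would have nothing to learn the mapping toward.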
Unsupervised Learning Algorithms
• When the training data has only predictor (input) variables,
but not the outcome variable, then we use unsupervised
learning algorithms.
• Techniques such as K means clustering and Hierarchical
clustering are examples of unsupervised learning algorithms.
• https://www.youtube.com/watch?v=4oB0fuOLWIY
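K-means clustering, one of the unsupervised techniques named above, can be sketched on one-dimensional toy data; note there are no outcome labels, only inputs:

```python
# Minimal k-means sketch on 1-D data (k = 2): inputs only, no labels.
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [data[0], data[-1]]           # naive initialisation

for _ in range(10):                       # a few refinement passes
    clusters = {0: [], 1: []}
    for x in data:                        # assign each point to its nearest centroid
        nearest = min((0, 1), key=lambda i: abs(x - centroids[i]))
        clusters[nearest].append(x)
    # move each centroid to the mean of its assigned points
    centroids = [sum(c) / len(c) for c in clusters.values()]

print(centroids)  # the two natural groups: [1.5, 8.5]
```

The algorithm discovers the two groups purely from the structure of the inputs, which is exactly what distinguishes unsupervised from supervised learning.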
Reinforcement Learning Algorithms
• Reinforcement learning algorithms learn by interacting with
an environment: the model adjusts its behavior based on
rewards and penalties rather than on labeled training data.
• https://forms.gle/GMXVfb9vVEtePoTs6
Statistics
• Digital data analytics is exploratory , observational , visual and
mathematical
• Common data analysis methods are used widely in
organizations.
• Quantitative techniques applied judiciously to data to answer
business questions
• Certain techniques exist for understanding data
• Correlation is used to check the relationship between two or
more variables.
• Regression analysis to determine if certain data can predict
other data
• Distribution and assessment of probability
• Hypothesis testing to create the best-fitting model for
predictive power.
Correlating
• The statistics adage is that “Correlation is not causation,”
which is certainly true. Correlation, however, does imply
association and dependence.
• The analyst’s job is thus to prove that observed associations in
data are truly dependent and relevant to the business
questions, and ultimately to determine whether the variable(s)
cause the calculated relationship.
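A common way to quantify the association described above is the Pearson correlation coefficient; the sketch below computes it from scratch on invented figures:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative figures: ad spend and sales moving together.
spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]
print(round(pearson_r(spend, sales), 3))  # close to +1
```

A value near +1 or -1 signals a strong linear association, but, as the adage says, it proves nothing about causation on its own.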
Regressing Data: Linear, Logistic, and So On
• The phrase regression analysis means applying a
mathematical method for understanding the relationship
among two or more variables.
• In more formal vocabulary, a regression analysis attempts to
identify the impact of one or more independent variables on a
dependent variable.
• Analytics professionals and the people who ask for analytical
deliverables often talk about regression, regression analysis,
the best fitting line, and ways to describe determining or
predicting the impact of one or more factors on a single or
multiple other factors.
• Impact of marketing program on sales
• In digital analytics, the regression analysis is used to
determine the impact of one or more factors on another
factor.
• Single and Multiple Linear Regression
• a simple linear regression is used when an analyst
hypothesizes that there is a relationship between the
movements of two variables in which the movements of one
variable impact either positively or negatively the movements
of another variable.
• Multiple linear regression and other forms of regression
where the dependent variable—that is, the variable for which
you are predicting—is predicted based on more than one
independent variable are used in digital analytics.
Understanding the marketing mix and how different
marketing channels impact response is often modeled using
multiple linear regression.
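A simple linear regression of the kind described above can be sketched with a direct ordinary least-squares fit; the spend/sales numbers are invented for illustration:

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical data: marketing spend (independent) vs. sales (dependent).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]            # exactly sales = 2*spend + 1
a, b = fit_line(spend, sales)
print(a, b)  # 2.0 1.0
```

The slope `a` is the estimated impact of one unit of spend on sales; multiple linear regression generalises this to several independent variables at once.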
• Logistic Regression
• Logistic regression enables predicting a categorical variable
based on several independent (predictor) variables.
• The output of a logistic regression is binomial if only two
answers are possible or multinomial if more than two answers
are possible.
• A 0 or 1 may be the results of binomial logistic regression,
whereas an output of “yes, no, or maybe” may be the output
of a multinomial logistic regression.
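A minimal sketch of how a fitted binomial logistic regression turns a predictor into a probability and a 0-or-1 answer; the coefficients here are invented, standing in for values a real fitting procedure would estimate:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Binomial logistic regression: probability of the '1' outcome,
    thresholded at 0.5 to give a 0/1 label."""
    p = sigmoid(w * x + b)
    return (1 if p >= 0.5 else 0), p

# Hypothetical, pre-fitted coefficients (w, b) for illustration only.
label, prob = predict(3.0, w=1.2, b=-2.0)
print(label, round(prob, 3))  # predicts the '1' class
```

The multinomial case extends this by computing one score per class and picking the most probable one.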
• Probability and Distributions
• The shape of the data, and observing that shape, can help an
analyst understand the data and the type of analytical
methods to use on it.
• After all, the way an analyst applies a method to a normal
distribution versus a non-normal distribution is different.
• Probability simply stated is the study of random events. In
analytics, you use statistics and math to model and
understand probability of all sorts of things.
• In digital analytics, you are concerned about probabilities
related to whether a person will buy, visit again, or have a
deeper and more engaging experience and so on.
A digital analyst should be familiar with the
following concepts:
• Modeling probability and conditionality
• Building a model requires selecting (and often in analytics,
creating/collecting) accurate data, the dimensions, and the
measures that can create your predictor variables.
• Central to the ability to create models is statistical aptitude
and an understanding of measures, probability, and
conditionality.
• Measuring random variables
• A random variable is a type of data in which the value isn’t
fixed; it keeps changing based on conditions. In digital
analytics, most variables, whether continuous or discrete, are
random.
• Understanding binomial distributions and hypothesis testing
• A common way to test for statistical significance is to use the
binomial distribution when you have two possible values
(such as yes or no, heads or tails).
• This type of testing considers the null hypothesis and is done
using Z and T tables and P-values. The tests are one-tailed or
two-tailed.
• If you want to test more than two values, you would use a
multinomial test and go beyond simple hypothesis testing to
perhaps chi-squares.
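The two-tailed test described above can be sketched using the normal approximation to the binomial; the coin-toss figures are invented:

```python
import math

def z_test_proportion(successes, n, p0=0.5):
    """Two-tailed z-test of an observed proportion against null value p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / se
    # Two-tailed p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Could 60 heads in 100 tosses plausibly come from a fair coin?
z, p = z_test_proportion(60, 100)
print(round(z, 2), round(p, 4))  # z = 2.0, p ~ 0.0455
```

With p below the conventional 0.05 cutoff, the null hypothesis of a fair coin would be rejected at the 95 percent level; in practice you would consult the Z table (or a statistics library) exactly as the slide describes.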
• Learning from the sample mean.
• The sample mean helps you understand the distribution and
is subject, of course, to the central limit theorem, which
indicates that the larger the sample size, the more closely the
distribution of the sample mean will approximate normal.
• Thus, when modeling data, the sample mean and the related
measures of standard deviation and variance can help you
understand the relationship between variables, especially
with smaller data sets.
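The central limit theorem is easy to see in a short simulation: even when the underlying population is clearly non-normal, the distribution of sample means clusters tightly around the population mean. This sketch uses a uniform population as an example:

```python
import random
import statistics

random.seed(42)

# Draw many sample means from a non-normal (uniform) population;
# by the central limit theorem, their distribution approaches normal,
# centred on the population mean of Uniform(0, 1), which is 0.5.
sample_means = [statistics.mean(random.random() for _ in range(50))
                for _ in range(2000)]

print(round(statistics.mean(sample_means), 2))  # close to 0.5
```

Increasing the sample size from 50 narrows the spread of the sample means, which is exactly the theorem's claim.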
Experimenting and Sampling Data
• In digital analytics, testing and experimentation do not mean
the same thing.
• Thus, experimenting in digital means controlled
experimentation. A controlled experiment is an experiment
that uses statistics to validate the probability that a sample is
as close as possible to identical to the control group.
• Although the boundaries of a controlled experiment may be
perceived as less rigorous than a true experiment in which
only one variable changes, that’s not actually true because
controlled experiments, when performed correctly, use the
scientific method and are statistically valid.
Population
• The aggregate group of people on which the controlled
experiment is performed or whose already-collected data is
analyzed.
• The population is divided into at least two groups: the control
group and the test group.
• The control group does not receive the test, whereas the test
group, of course, does.
Sampling method
• The way you select the people, customers, visitors, and so on
for your experiment is important. And it depends on whether
you want to understand a static population or a process
because different sampling methods are required. Sampling is
important because a poorly or sloppily sampled group can
give you poor results from experimentation.
• Random sample
• Stratified sampling
• Systematic sampling
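The three sampling methods listed above can be sketched side by side; the population here is a hypothetical list of 100 visitor IDs, with a made-up new/returning split for the stratified case:

```python
import random

random.seed(7)
population = list(range(1, 101))            # e.g. 100 visitor IDs

# Simple random sample: every member is equally likely to be chosen.
random_sample = random.sample(population, 10)

# Systematic sample: every k-th member after a random start.
k = 10
start = random.randrange(k)
systematic_sample = population[start::k]

# Stratified sample: sample separately within each stratum so each
# subgroup is represented (split point is invented for illustration).
strata = {"new": population[:40], "returning": population[40:]}
stratified_sample = [x for group in strata.values()
                     for x in random.sample(group, 5)]

print(len(random_sample), len(systematic_sample), len(stratified_sample))
```

Stratified sampling is the one to reach for when a simple random sample might under-represent a small but important subgroup.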
Expected error
• When analyzing the results of experiments, you need to go
into your experiment with an idea of the expected amount of
error you are willing to tolerate.
• There are various types of errors (such as type 1 and type 2).
Confidence intervals and confidence levels are applied to
understand and limit expected error (or variability by chance)
to an acceptable level that meets your business needs.
• Independent variable
• The independent variables are what you hold static in the
population or what is shared among the population or its
subgroups.
• Dependent variables
• The predicted variables that are the outcome of the data
analysis.
Confidence intervals
• Commonly stated at 95 percent or 99 percent. Other times
they could be as low as 50 percent. A confidence interval is
generally interpreted as “99 percent of the population will do X
or has Y,” but that interpretation would be incorrect.
• A better way to think of confidence intervals in digital analysis
is that were you to perform the same analysis again on a
different sample, the model would include the population you
are testing 99 percent of the time.
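A sketch of computing a 95 percent confidence interval for a conversion rate, using the normal approximation; the 80-of-400 figures are invented:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.
    z = 1.96 corresponds roughly to a 95 percent confidence level."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# E.g. 80 of 400 sampled visitors converted (a 20% observed rate).
low, high = proportion_ci(80, 400)
print(round(low, 3), round(high, 3))  # about 0.161 to 0.239
```

Repeating the analysis on fresh samples, intervals built this way would capture the true rate about 95 percent of the time, which matches the interpretation given above.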
Significance testing
• Involves calculating how much of an outcome is explained by
the model and its variables. Often expressed between 10
percent and 0.01 percent, the significance test enables you to
determine whether the results were caused by error or
chance.
• Done right, analysts can say that their model was significant
to 99 percent, meaning that there’s a 1 in 100 chance that the
observed behavior was random.
Comparisons of data over time
• Such as Year over Year, Week over Week, and Day over Day
are helpful for understanding data movements positively and
negatively over time. Outlier comparisons need to be
investigated.
Inferences
• Inferences are made as a result of analysis. Inferences are the
logical conclusions—the insights—derived by using statistical
techniques and analytical methods.
• The result of an inference is a recommendation and insight
about the sampled population.
Attribution: Determining the Business Impact
and Profit
• During the last few years, the concept of attribution has become
important within digital analytics.
• In digital analytics, attribution is the activity and process for
establishing the origin of the people who visited a digital experience.
Attribution is a rich area explored by data scientists worldwide.
• The roots of attribution for digital analytics come from traditional
web and site analytics where business people, primarily marketing,
wanted to understand the reach (that is, the number of people), the
frequency, and the monetary impact of marketing programs and
campaigns.
• Going back even further, the idea of attribution has roots in
financial management and measurement.
• Attribution enables an analyst to identify from data that an
absolute number of visits or visitors came from a particular
source, such as paid search, display advertising, or an email
campaign.
• By understanding the sources that send people who convert
(and thus create economic value), business people can then
fine-tune and optimize their work to produce the best
financial result.
• Attribution in digital analytics includes the click but also goes
beyond the click.
• Interactions with digital experiences that may not require a
click (think of a touch-enabled smart device) can be attributed
—as can exposures to events, types of content, or advertising
(as in the case of view-thru conversion).
• Fairly common attribution models are listed below, where a
“click” can also describe an interaction or exposure:
First click (or interaction or exposure)