
BIG DATA

What is Big Data?


Big data refers to extremely large and diverse collections of structured,
unstructured, and semi-structured data that continue to grow
exponentially over time. These datasets are so huge and complex in volume,
velocity, and variety that traditional data management systems cannot
store, process, or analyze them.

The Vs of big data


Definitions of big data vary slightly, but it is almost always described in
terms of volume, velocity, and variety. These big data characteristics are
often referred to as the “3 Vs of big data” and were first defined by Gartner
in 2001.
1. Volume
As its name suggests, the most common characteristic associated with big
data is its high volume. This describes the enormous amount of data that is
available for collection and produced from a variety of sources and devices
on a continuous basis.
2. Velocity
Big data velocity refers to the speed at which data is generated. Today, data
is often produced in real time or near real time, and therefore, it must also
be processed, accessed, and analyzed at the same rate to have any
meaningful impact.
3. Variety
Data is heterogeneous, meaning it can come from many different sources
and can be structured, unstructured, or semi-structured. More traditional
structured data (such as data in spreadsheets or relational databases) is
now supplemented by unstructured text, images, audio, video files, or semi-
structured formats like sensor data that can’t be organized in a fixed data
schema.
In addition to these three original Vs, three others are often mentioned
in relation to harnessing the power of big data: veracity, variability,
and value.
1. Veracity: Big data can be messy, noisy, and error-prone, which makes it
difficult to control the quality and accuracy of the data. Large datasets
can be unwieldy and confusing, while smaller datasets could present an
incomplete picture. The higher the veracity of the data, the more
trustworthy it is.
2. Variability: The meaning of collected data is constantly changing,
which can lead to inconsistency over time. These shifts include not only
changes in context and interpretation but also changes in data collection
methods, driven by the information that companies want to capture and analyze.
3. Value: It’s essential to determine the business value of the data you
collect. Big data must contain the right data and then be effectively
analyzed in order to yield insights that can help drive decision-making.
 Big data analytics is the process of analyzing large amounts of data to
find patterns and insights. It's used to help businesses make better
decisions.

Data analytics examples


1. Text analysis: What is happening?
Text analysis, also known as text mining, involves pulling insights from large
amounts of unstructured, text-based data sources: emails, social media,
support tickets, reviews, and so on. You would use text analysis when the
volume of data is too large to sift through manually.
Here are a few methods used to perform text analysis, to give you a sense of
how it's different from a human reading through the text:
 Word frequency identifies the most frequently used words. For example,
a restaurant monitors social media mentions and measures the frequency
of positive and negative keywords like "delicious" or "expensive" to
determine how customers feel about their experience.
 Language detection indicates the language of text. For example, a global
software company may use language detection on support tickets to
connect customers with the appropriate agent.
 Keyword extraction automatically identifies the most used terms. For
example, instead of sifting through thousands of reviews, a popular brand
uses a keyword extractor to summarize the words or phrases that are most
relevant.
Because text analysis is based on words, not numbers, it's a bit more
subjective. Words can have multiple meanings, of course, and Gen Z makes
things even tougher with constant coinage. Natural language
processing (NLP) software will help you get the most accurate text analysis,
but it's rarely as objective as numerical analysis.
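To make the word-frequency method concrete, here is a minimal sketch in Python using only the standard library; the sample reviews and the positive/negative keyword lists are hypothetical, and a real pipeline would usually lean on an NLP library for anything beyond simple counting.

```python
from collections import Counter
import re

# Hypothetical customer reviews a restaurant might collect from social media.
reviews = [
    "The pasta was delicious but the wine list felt expensive.",
    "Delicious desserts, friendly staff, will come back!",
    "Too expensive for the portion sizes, honestly.",
]

# Tokenize into lowercase words and count how often each one appears.
words = re.findall(r"[a-z']+", " ".join(reviews).lower())
frequencies = Counter(words)

# Compare mentions of a few positive vs. negative keywords (assumed lists).
positive, negative = {"delicious", "friendly"}, {"expensive", "bland"}
pos_count = sum(frequencies[w] for w in positive)
neg_count = sum(frequencies[w] for w in negative)

print(frequencies.most_common(5))
print(f"positive mentions: {pos_count}, negative mentions: {neg_count}")
```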
2. Statistical analysis: What happened?
Statistical analysis pulls past data to identify meaningful trends. Two
primary categories of statistical analysis exist: descriptive and inferential.
3. Descriptive analysis
Descriptive analysis looks at numerical data and calculations to determine
what happened in a business. Companies use descriptive analysis to
determine customer satisfaction, track campaigns, generate reports, and
evaluate performance.
Here are a few methods used to perform descriptive analysis:
 Measures of frequency identify how frequently an event occurs. For
example, a popular coffee chain sends out a survey asking customers what
their favorite holiday drink is and uses measures of frequency to determine
how often a particular drink is selected.
 Measures of central tendency use mean, median, and mode to identify
results. For example, a dating app company might use measures of central
tendency to determine the average age of its users.
 Measures of dispersion measure how data is distributed across a range.
For example, HR may use measures of dispersion to determine what salary
to offer in a given field.
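As a rough illustration of these three descriptive measures, the sketch below uses Python's built-in statistics module; the survey responses, ages, and salaries are made-up sample data.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Measure of frequency: how often each holiday drink was chosen (hypothetical survey).
drink_votes = ["peppermint mocha", "eggnog latte", "peppermint mocha", "chai", "peppermint mocha"]
print(Counter(drink_votes).most_common(1))   # most popular drink and its count

# Measures of central tendency: average age of app users (hypothetical sample).
ages = [24, 27, 31, 27, 35, 29, 41, 27]
print(mean(ages), median(ages), mode(ages))

# Measure of dispersion: spread of salaries in a given field (hypothetical sample).
salaries = [52_000, 58_000, 61_000, 49_000, 75_000, 66_000]
print(f"std dev: {stdev(salaries):.0f}, range: {max(salaries) - min(salaries)}")
```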
4. Inferential analysis
Inferential analysis uses a sample of data to draw conclusions about a much
larger population. This type of analysis is used when the population you're
interested in analyzing is very large.
Here are a few methods used when performing inferential analysis:
 Hypothesis testing identifies which variables impact a particular topic.
For example, a business uses hypothesis testing to determine if increased
sales were the result of a specific marketing campaign.
 Confidence intervals indicate how accurate an estimate is. For example,
a company using market research to survey customers about a new
product may want to determine how confident they are that the individuals
surveyed make up their target market.
 Regression analysis shows the effect of independent variables on a
dependent variable. For example, a rental car company may use
regression analysis to determine the relationship between wait times and
number of bad reviews.
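Here is a hedged sketch of two of these inferential methods, using SciPy (assumed to be available); the sales and review figures are simulated purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothesis test: did weekly sales differ during a marketing campaign?
# (both samples are simulated here purely for illustration)
sales_before = rng.normal(100, 10, 30)
sales_during = rng.normal(108, 10, 30)
t_stat, p_value = stats.ttest_ind(sales_during, sales_before)
print(f"t={t_stat:.2f}, p={p_value:.4f}")  # a small p-value suggests the campaign had an effect

# Regression: effect of rental-car wait time (minutes) on number of bad reviews.
wait_times = np.array([5, 10, 15, 20, 30, 45, 60])
bad_reviews = np.array([1, 1, 2, 3, 4, 7, 9])
result = stats.linregress(wait_times, bad_reviews)
print(f"slope={result.slope:.2f}, r={result.rvalue:.2f}")
```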
5. Diagnostic analysis: Why did it happen?
Diagnostic analysis, also referred to as root cause analysis, uncovers the
causes of certain events or results.
Here are a few methods used to perform diagnostic analysis:
 Time-series analysis analyzes data collected over a period of time. A
retail store may use time-series analysis to determine that sales increase
between October and December every year.
 Data drilling uses business intelligence (BI) to show a more detailed view
of data. For example, a business owner could use data drilling to see a
detailed view of sales by state to determine if certain regions are driving
increased sales.
 Correlation analysis determines the strength of the relationship between
variables. For example, a local ice cream shop may determine that as the
temperature in the area rises, so do ice cream sales.
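A correlation analysis like the ice cream example can be a near one-liner with NumPy; the temperature and sales figures below are invented for the sketch.

```python
import numpy as np

# Hypothetical daily observations for a local ice cream shop.
temperature_c = np.array([18, 21, 24, 27, 30, 33, 35])
ice_cream_sales = np.array([120, 135, 160, 190, 230, 260, 300])

# Pearson correlation coefficient: values near +1 indicate a strong positive relationship.
r = np.corrcoef(temperature_c, ice_cream_sales)[0, 1]
print(f"correlation between temperature and sales: {r:.2f}")
```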
6. Predictive analysis: What is likely to happen?
Predictive analysis aims to anticipate future developments and events. By
analyzing past data, companies can predict future scenarios and make
strategic decisions.
Here are a few methods used to perform predictive analysis:
 Machine learning uses AI and algorithms to predict outcomes. For
example, e-commerce sites employ machine learning to recommend products
that online shoppers are likely to buy based on their browsing history.
 Decision trees map out possible courses of action and outcomes. For
example, a business may use a decision tree when deciding whether to
downsize or expand.
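As an illustrative sketch only, the snippet below trains a tiny decision tree with scikit-learn on made-up browsing data to predict whether a shopper is likely to buy; a real model would need far more data and validation.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [pages_viewed, minutes_on_site, has_item_in_cart]
X = [
    [3, 2, 0],
    [12, 15, 1],
    [1, 1, 0],
    [8, 10, 1],
    [15, 22, 1],
    [2, 3, 0],
]
y = [0, 1, 0, 1, 1, 0]  # 1 = the shopper went on to buy, 0 = they did not

model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Predict whether a new shopper is likely to buy based on their browsing behaviour.
print(model.predict([[10, 12, 1]]))  # e.g. array([1])
```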
7. Prescriptive analysis: What action should we take?
The highest level of analysis, prescriptive analysis, aims to find the best
action plan. Typically, AI tools model different outcomes to predict the best
approach. While these tools serve to provide insight, they don't replace
human consideration, so always use your human brain before going with the
conclusion of your prescriptive analysis. Otherwise, your GPS might drive
you into a lake.
Here are a few methods used to perform prescriptive analysis:
 Lead scoring is used in sales departments to assign values to leads based
on their perceived interest. For example, a sales team uses lead scoring to
rank leads on a scale of 1-100 depending on the actions they take (e.g.,
opening an email or downloading an eBook). They then prioritize the leads
that are most likely to convert.
 Algorithms are used in technology to perform specific tasks. For example,
banks use prescriptive algorithms to monitor customers' spending and
recommend that they deactivate their credit card if fraud is suspected.
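A minimal, rule-based lead-scoring sketch in plain Python; the actions and point values are hypothetical and would normally be tuned by the sales team or learned from historical conversion data.

```python
# Hypothetical point values for lead actions.
ACTION_POINTS = {
    "opened_email": 5,
    "downloaded_ebook": 15,
    "visited_pricing_page": 25,
    "requested_demo": 40,
}

def score_lead(actions):
    """Return a 1-100 score based on the actions a lead has taken."""
    return min(100, max(1, sum(ACTION_POINTS.get(a, 0) for a in actions)))

leads = {
    "lead_a": ["opened_email", "downloaded_ebook"],
    "lead_b": ["opened_email", "visited_pricing_page", "requested_demo"],
}

# Prioritize the leads most likely to convert (highest score first).
for name, actions in sorted(leads.items(), key=lambda kv: score_lead(kv[1]), reverse=True):
    print(name, score_lead(actions))
```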

Data analysis process: How to get started


The actual analysis is just one step in a much bigger process of using data
to move your business forward. Here's a quick look at all the steps you need
to take to make sure you're making informed decisions.
Data decision
As with almost any project, the first step is to determine what problem
you're trying to solve through data analysis.
Make sure you get specific here. For example, a food delivery service may
want to understand why customers are canceling their subscriptions. But to
enable the most effective data analysis, they should pose a more targeted
question, such as "How can we reduce customer churn without raising
costs?"
These questions will help you determine your KPIs and what type(s) of data
analysis you'll conduct, so spend time honing the question—otherwise your
analysis won't provide the actionable insights you want.
Data collection
Next, collect the required data from both internal and external sources.
 Internal data comes from within your business (think CRM software,
internal reports, and archives), and helps you understand your business
and processes.
 External data originates from outside of the company (surveys,
questionnaires, public data) and helps you understand your industry and
your customers.
You'll rely heavily on software for this part of the process. Your analytics or
business dashboard tool, along with reports from any other internal tools
like CRMs, will give you the internal data. For external data, you'll
use survey apps and other data collection tools to get the information you
need.
Data cleaning
Data can be seriously misleading if it's not clean. So before you analyze,
make sure you review the data you collected. Depending on the type of
data you have, cleanup will look different, but it might include:
 Removing unnecessary information
 Addressing structural errors like misspellings
 Deleting duplicates
 Trimming whitespace
 Human checking for accuracy
You can use your spreadsheet's cleanup suggestions to quickly and
effectively clean data, but a human review is always important.
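Here is one way such a cleanup pass might look with pandas, assuming a small, hypothetical survey export; the exact steps depend entirely on your own data.

```python
import pandas as pd

# Hypothetical raw survey export with the kinds of problems listed above.
raw = pd.DataFrame({
    "name":  [" Alice ", "Bob", "Bob", "Carol"],
    "state": ["CA", "texas", "texas", "NY"],
    "spend": [120.0, 80.0, 80.0, None],
})

cleaned = (
    raw
    .drop_duplicates()                                           # delete duplicate rows
    .assign(
        name=lambda df: df["name"].str.strip(),                  # trim whitespace
        state=lambda df: df["state"].replace({"texas": "TX"}),   # fix structural errors
    )
    .dropna(subset=["spend"])                                    # drop rows missing key values
)

print(cleaned)
```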
Data analysis
Now that you've compiled and cleaned the data, use one or more of the
above types of data analysis to find relationships, patterns, and trends.
Data analysis tools can speed up the process and reduce the
risk of human error. Here are some examples.
 Spreadsheets sort, filter, analyze, and visualize data.
 Business intelligence platforms model data and create dashboards.
 Structured query language (SQL) tools manage and extract data in
relational databases.
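To give a feel for the SQL bullet above, the sketch below runs an aggregation query against an in-memory SQLite database via Python's standard sqlite3 module; the sales table is invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a relational data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("West", 1200.0), ("West", 800.0), ("East", 950.0), ("East", 400.0)],
)

# Extract and aggregate data with SQL: total sales per region, highest first.
query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
for region, total in conn.execute(query):
    print(region, total)

conn.close()
```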
Data interpretation
After you analyze the data, you'll need to go back to the original question
you posed and draw conclusions from your findings. Here are some common
pitfalls to avoid:
 Correlation vs. causation: Just because two variables are associated
doesn't mean that one causes the other.
 Confirmation bias: This occurs when you interpret data in a way that
confirms your own preconceived notions. To avoid this, have multiple
people interpret the data.
 Small sample size: If your sample size is too small or doesn't represent
the demographics of your customers, you may get misleading results. If
you run into this, consider widening your sample size to give you a more
accurate representation.
Data visualization
Last but not least, visualizing the data in the form of graphs, maps, reports,
charts, and dashboards can help you explain your findings to decision-
makers and stakeholders. While it's not absolutely necessary, it will help tell
the story of your data in a way that everyone in the business can understand
and make decisions based on.
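If your team works in Python rather than a dashboard tool, a simple chart like the sketch below (using matplotlib, with invented revenue figures) is often enough to communicate a finding to stakeholders.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures to present to stakeholders.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [42, 48, 45, 53, 61, 58]  # in thousands of USD

fig, ax = plt.subplots()
ax.bar(months, revenue)
ax.set_title("Monthly revenue (USD, thousands)")
ax.set_ylabel("Revenue")
fig.savefig("monthly_revenue.png")  # export the chart for a report or slide deck
```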

Why is Data Analysis so Important?


How do businesses make better decisions, analyze trends, or invent better
products and services? The simple answer is that they leverage the distinct
methods of data analysis to reveal insights that would otherwise get lost in
the mass of information, and big data analytics is becoming even more
prominent for the reasons below.

1. Informed Decision-making

The modern business world relies on facts rather than intuition, and data
analysis is the foundation of informed decision-making. UX (user experience)
design illustrates this well, particularly when it comes to dealing with
non-numerical, subjective information.

Qualitative research digs into the "why" and "how" behind user behavior and
reveals nuanced insights. It provides a foundation for well-informed decisions
about color, layout, and typography, and applying these insights means you can
create visuals that deeply resonate with your target audience.

2. Better Customer Targeting and Predictive Capabilities

Data has become the lifeblood of successful marketing, and organizations rely
on data science techniques to create targeted strategies and marketing
campaigns that work. Big data analytics helps uncover deep insights about
consumer behavior. For instance, Google collects and analyzes many different
data types, examining search history, geography, and trending topics to
deduce what consumers want.

3. Improved Operational Efficiencies and Reduced Costs

Another big benefit of data analytics is streamlining operations and reducing
organizational costs. It makes it easier for businesses to identify
bottlenecks and opportunities for improvement, putting them in a better
position to optimize resource allocation and, ultimately, bring costs down.

Procter & Gamble (P&G), for example, uses data analytics to optimize its
supply chain and inventory management, helping the company reduce excess
inventory and stockouts and achieve cost savings.

4. Better Customer Satisfaction and Retention

Customer behavior patterns enable you to understand how customers feel about
your products, services, and brand. Beyond that, different data analysis
models help uncover future trends, and those trends allow you to personalize
the customer experience, improve satisfaction, and win more loyalty as
customers stick with your brand over time.

The eCommerce giant Amazon learns what each customer wants and likes, then
recommends the same or similar products when they return to the shopping app.
Data analysis helps create personalized experiences for Amazon customers and
improves the user experience; it is part of the "magic" behind the feeling
that the brand knows what they need, want, and enjoy.

Types of Data Analysis

1. Quantitative Data Analysis

The name gives away what quantitative analysis is about: it looks at the
"what" you've got, the actual numbers in the rows and columns. It's best
illustrated with a scenario.

Your e-commerce company wants to assess the sales team's performance, so you
gather quantitative data on various key performance indicators (KPIs),
including the number of units sold, sales revenue, conversion rates, and
customer acquisition costs. By analyzing these numeric data points, the
company can calculate its monthly sales growth, the average order value, and
the return on investment (ROI) for each sales representative.
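A small sketch of how those calculations might look in plain Python, assuming made-up revenue, order, and acquisition-spend figures for two hypothetical reps.

```python
# Hypothetical quarterly KPIs for two sales representatives.
reps = {
    "rep_a": {"revenue": [42_000, 45_000, 51_000], "orders": 380, "spend": 9_000},
    "rep_b": {"revenue": [30_000, 29_000, 33_000], "orders": 310, "spend": 7_500},
}

for name, kpi in reps.items():
    revenue = kpi["revenue"]
    growth = (revenue[-1] - revenue[0]) / revenue[0] * 100   # sales growth over the period, in %
    avg_order_value = sum(revenue) / kpi["orders"]
    roi = (sum(revenue) - kpi["spend"]) / kpi["spend"]       # return on acquisition spend
    print(f"{name}: growth {growth:.1f}%, AOV {avg_order_value:.0f}, ROI {roi:.1f}x")
```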

How does it help?

Quantitative analysis helps you identify the top-performing sales reps, the
best-selling products, and the most cost-effective customer acquisition
channels. These metrics help the company make data-driven decisions and
improve its sales strategy, which is invaluable in a competitive marketplace.

2. Qualitative Data Analysis

Quantitative data can deliver plenty of insights, but there are situations
where numbers in rows and columns simply don't fit. That's where qualitative
research helps you understand the data's underlying factors, patterns, and
meanings through non-numerical means.

To see qualitative data in action, imagine you're a product manager for an
online shopping app who wants to improve the app's user experience and boost
user engagement. Quantitative data tells you what's going on but not why, so
you collect customer feedback through open-ended questions and online reviews
and conduct in-depth interviews to explore users' experiences.

Best Data Analysis and Modeling Techniques


Humanity generates an estimated 120 zettabytes of data per year, roughly 330
exabytes every day. Calling that an ocean of information would be an
understatement, and without sound data analysis techniques, businesses of any
size will never be able to collect, analyze, and interpret data into real,
actionable insights. That's why they, and you, need a workable set of data
analysis methods and tools. The table below compares some of the most widely
used tools.
Data Analysis Tool | Type | Ideal For | Best For | Pricing
--- | --- | --- | --- | ---
Microsoft Excel | Spreadsheet | Business Analysts, Managers | Basic data manipulation | Paid (Microsoft 365)
Google Sheets | Spreadsheet | Individuals, Small-Medium Businesses | Basic data analysis and collaboration | Free with Paid upgrades
Google Analytics | Web Analytics | Digital Marketers, Web Analysts | Digital marketing analysis | Free and Paid (Google Analytics 360)
RapidMiner | Data Science | Data Scientists, Analysts | Predictive analytics | Free and Paid (various licensing options)
Tableau | Data Visualization | Business Analysts, Data Teams | Interactive dashboards | Paid (various plans)
Power BI | Business Intelligence | Business Analysts, Enterprises | Business reporting | Paid (various plans)
KNIME | Visual Workflow | Data Scientists, Analysts | Data science workflows | Free and Open-source
Zoho Analytics | Business Intelligence | Small-Medium Businesses | Collaboration and reporting | Paid (various plans)
Qlik Sense | Business Intelligence | Business Analysts, Data Teams | Interactive analysis | Paid (various plans)

1. Microsoft Excel
The world's most widely used and user-friendly spreadsheet software features
calculation and graphing functions, and it's ideal for non-technical users
performing basic data analysis and creating charts and reports.

Pros

 No coding is required.
 User-friendly interface.

Cons

 Runs slow with complex data analysis.


 Less automation compared to specialized tools.

2. Google Sheets

Similar to Microsoft Excel, Google Sheets stands out as a remarkable and
cost-effective tool for fundamental data analysis. Not only does it handle
everyday data analysis tasks, including sorting, filtering, and simple
calculations, but it is known for its seamless collaboration capabilities as
well.

Pros

 Easily accessible.
 Compatible with Microsoft Excel.
 Seamless integration with other Google Workspace tools.

Cons

 Lacks some of the advanced features available in Microsoft Excel.
 May struggle with very large datasets.

3. Google Analytics

Widely used by digital marketers and web analysts, this tool helps businesses
understand how people interact with their websites and apps. It provides
insights into website traffic, user behavior, and performance to support
data-driven business decisions.

Pros

 Free version available.


 Integrates with Google services.

Cons

 Limited customization for specific business needs.


 May not support non-web data sources.

4. RapidMiner

RapidMiner is ideal for data mining and model development. The platform
offers strong machine learning and predictive analytics capabilities and
allows professionals to work with data at many stages, including preparation,
visualization, and analysis.

Pros

 User-friendly interface.
 Excellent support for machine learning.
 Large library of pre-built models.

Cons

 Can be expensive for advanced features.


 Limited data integration capabilities.

5. Tableau

Tableau is one of the best-known commercial data analysis tools, famous for
its interactive dashboards and data exploration capabilities. Data teams can
create visually appealing and interactive data representations through its
easy-to-use interface.
Pros

 Intuitive drag-and-drop interface.


 Interactive and dynamic data visualization.
 Backed by Salesforce.

Cons

 Steeper learning curve for advanced features.

6. Power BI

Power BI is an excellent choice for creating insightful business dashboards.
It boasts strong data integration features and interactive reporting, making
it ideal for enterprises.

Pros

 Intuitive drag-and-drop interface.
 Interactive and dynamic data visualization.
 Deep integration with the Microsoft ecosystem (Excel, Azure, Teams).

Cons

 Steeper learning curve for advanced features.

7. KNIME

The name is short for Konstanz Information Miner, and KNIME is an outstanding
tool for data mining. Its user-friendly graphical interface makes it
accessible even to non-technical users, enabling them to create data
workflows easily. What's more, KNIME is a cost-effective choice, making it
ideal for small businesses operating on a limited budget.

Pros

 Visual workflow for data blending and automation.


 Active community and user support.

Cons

 Complex for beginners.


 Limited real-time data processing.

8. Zoho Analytics

Zoho Analytics is a robust data analysis platform powered by artificial
intelligence and machine learning. Its data integration capabilities let you
seamlessly connect and import data from diverse sources, and it offers an
extensive array of analytical functions.

Pros

 Affordable pricing options.


 User-friendly interface.

Cons

 Limited scalability for very large datasets.


 Not as widely adopted as some other tools.

9. Qlik Sense

Qlik Sense offers a wide range of augmented analytics capabilities, from
AI-generated analysis and insights to automated data preparation, machine
learning, and predictive analytics.

Pros

 Impressive data exploration and visualization features.


 Can handle large datasets.

Cons
 Steep learning curve for new users.

How to Pick the Right Tool?

Consider the following factors to find the right data analysis tool for
your organization:

 Your organization’s business needs.


 Who needs to use the data analysis tools?
 The tool’s data modeling capabilities.
 The tool’s pricing.
