Big Data

This document provides an introduction to big data, including definitions, types, architectures and importance. It defines big data as large and complex datasets that are difficult to process using traditional tools. The challenges of big data include capture, storage, analysis and visualization. It discusses the 7 V's of big data - volume, variety, velocity, veracity, value, validity and volatility. It also outlines traditional and modern big data architectures and limitations of big data including prioritizing correlations, security, transferability and inconsistent data collection.

Introduction to Big Data

Associate Professor
Department of Computer Science and Engineering
Session Objectives

• Big Data Definition


• Big Data Types
• Big Data Architectures
• Big Data Warehousing
Importance of Big Data

• Cost Reductions
• Time Reductions
• New product development and optimized offerings
• Smart Decision Making
Big Data
• A term for any collection of large and complex data sets.
• Such data sets are difficult to process using database management tools or
traditional data processing applications.
• The challenges include capture, refining, storage, search, sharing, transfer,
analysis and visualization.
• Most companies collect ‘millions’ of data items. Many more are available
via Google, Facebook, Twitter, Amazon, etc.
• These data are seldom structured.
• Many companies use “Big Data” for manual queries (marketing and
sales), to answer research questions etc.
• It is still not common to utilize Big Data automatically and
systematically within an algorithmic (forecasting) framework.
• We argue that such use will contribute to both the analysis and the
forecasting.
Big Data Types: 7 V's
Volume:
-Big data implies enormous volumes of data
-How much data is really relevant to the problem solution?
-Cost of processing?
-So, can you really afford to store and process all that data?
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially

[Figure: exponential increase in collected/generated data]
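
The growth figures above can be checked with a little arithmetic. The sketch below uses only the two numbers quoted on the slide (0.8 ZB in 2009 and 35 ZB in 2020) and assumes smooth compound growth between them, which is a simplification for illustration rather than a claim about how the data actually grew.

```python
# Figures taken from the slide: 0.8 ZB in 2009, 35 ZB projected for 2020.
start_zb, end_zb = 0.8, 35.0
years = 2020 - 2009

# Overall multiplier between the two endpoints (the slide rounds this to 44x).
growth_factor = end_zb / start_zb

# Assumption: constant compound growth, so rate = factor^(1/years) - 1.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Overall growth: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_rate:.0%}")
```

Under that assumption the yearly growth comes out at a little over 40%, which is what "exponential" means here: a roughly constant percentage increase each year, not a constant amount.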
Volume: Example
• 12+ TBs of tweet data every day
• ? TBs of data every day
• 25+ TBs of log data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 2+ billion people on the Web by end of 2011
• 76 million smart meters in 2009… 200M by 2014
Variety:
-Variety refers to the many sources and types of data, both structured and unstructured.
-A small fraction is in structured formats: relational, XML, etc.
-A fair amount is semi-structured, such as web logs.
-The rest of the data is unstructured: text, photographs, etc.
-So, no single data model can currently handle the diversity.
Different Types of Data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc)

To extract knowledge, all these types of data need to be linked together
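
The "scan the data once" constraint on streaming data changes how even simple statistics are computed. As an illustration (not something the slides prescribe), the sketch below uses Welford's one-pass algorithm to get a mean and sample variance from a stream without storing it. The function name and the sample readings are made up for the example.

```python
from typing import Iterable, Tuple


def running_mean_variance(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Welford's one-pass algorithm: returns (count, mean, sample variance)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count          # update the running mean
        m2 += delta * (x - mean)       # accumulate squared deviations
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# A generator can be consumed only once, like a real stream.
readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
count, mean, variance = running_mean_variance(readings)
print(count, mean, variance)
```

Each value is seen once and then discarded, so memory use stays constant no matter how long the stream runs.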


Veracity:
-Big Data Veracity refers to the biases, noise and abnormality in data.
-Is the data that is being stored and mined meaningful to the problem being
analyzed?
-Accuracy, Precision, Reliability, Integrity
-So, what is it that you don’t know you don’t know about the data?
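
One way to make "noise and abnormality" concrete is to flag values that sit far from the bulk of the data. The sketch below is an assumption-laden illustration: it uses the modified z-score based on the median absolute deviation, with the commonly quoted cutoff of 3.5, and the temperature readings are invented. The slides do not name a specific technique.

```python
from statistics import median
from typing import List


def flag_outliers(values: List[float], threshold: float = 3.5) -> List[float]:
    """Return values whose modified z-score exceeds the threshold."""
    center = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to a normal z-score.
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical sensor readings with one implausible value.
temperatures = [21.0, 21.5, 20.8, 21.2, 95.0, 21.1, 20.9]
suspect = flag_outliers(temperatures)
print(suspect)
```

A flagged value is only a candidate for review: it may be a sensor fault, or it may be the one reading that matters.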
[Figure: the customer at the center, surrounded by data sources: social media, banking and finance, gaming, entertainment, purchases, and our known history]
Velocity:
Big Data Velocity deals with the pace at which data flows in from sources
like business processes, machines, networks and human interaction with
things like social media sites, mobile devices, etc.
- The flow of data is massive and continuous.
- This creates a need for streaming data analysis.
- So, how do we analyze data in flight and combine it with data at rest?
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missed opportunities
• Examples
– Pizza ordering and processing
– Zomato servicing
– Healthcare monitoring: sensors monitor your activities and body → any abnormal
measurements require immediate reaction
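
The healthcare example above can be sketched as a small stream processor that checks each reading in flight against reference data at rest. Everything specific here is hypothetical: the patient IDs, the heart-rate ranges, and the function name are placeholders, and a real system would use a streaming platform rather than a Python generator.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Data at rest: per-patient normal heart-rate ranges (hypothetical values).
BASELINES: Dict[str, Tuple[int, int]] = {
    "patient-a": (50, 100),
    "patient-b": (60, 110),
}


def detect_abnormal(
    readings: Iterable[Tuple[str, int]],
    baselines: Dict[str, Tuple[int, int]],
) -> Iterator[Tuple[str, int]]:
    """Yield each in-flight reading that falls outside the stored range."""
    for patient_id, heart_rate in readings:
        low, high = baselines[patient_id]
        if not low <= heart_rate <= high:
            # Emitted as soon as it is seen, before the stream ends.
            yield patient_id, heart_rate


# Data in flight: readings arrive one at a time.
stream = iter([
    ("patient-a", 72),
    ("patient-b", 130),
    ("patient-a", 45),
    ("patient-b", 90),
])
alerts = list(detect_abnormal(stream, BASELINES))
print(alerts)
```

Because the function is a generator, an alert is available the moment its reading arrives, which is the point of the "late decisions" bullet.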
Example: Real-time/Fast Data

Mobile devices
(tracking all objects all the time)
Social media and networks Scientific instruments
(all of us are generating data) (collecting all sorts of data)
Sensor technology and networks
(measuring all kinds of data)

• Progress and innovation are no longer hindered by the ability to collect data
• But by the ability to manage, analyze, summarize, visualize, and discover knowledge from
the collected data in a timely manner and in a scalable fashion
Value:

- How much value is created for each unit of data (whatever it is)?
- So, what is the contribution of subsets of the data to the problem solution?
Validity:

- Is the data correct and accurate for the intended use?


- Valid data is key to making the right decisions.
- How to validate the data for accuracy?
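
One common answer to the validation question is to run explicit rules against each record before it is used. The sketch below is a minimal illustration with a hypothetical order record; the field names and rules are assumptions, not part of the slides.

```python
from typing import Dict, List


def validate_order(record: Dict[str, object]) -> List[str]:
    """Return a list of rule violations; an empty list means the record passed."""
    errors: List[str] = []
    if not record.get("customer_id"):
        errors.append("missing customer_id")
    quantity = record.get("quantity")
    if not isinstance(quantity, int) or quantity <= 0:
        errors.append("quantity must be a positive integer")
    price = record.get("unit_price")
    if not isinstance(price, (int, float)) or price < 0:
        errors.append("unit_price must be a non-negative number")
    return errors


good = {"customer_id": "c-17", "quantity": 2, "unit_price": 9.5}
bad = {"customer_id": "", "quantity": -1, "unit_price": "free"}

print(validate_order(good))
print(validate_order(bad))
```

Returning every violation, instead of stopping at the first, makes it easier to see whether a source is systematically wrong or only occasionally so.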
Real-Time Analytics/Decision Requirement
[Figure: influencing customer behavior at the center, surrounded by the following uses]
• Product recommendations that are relevant and compelling
• Learning why customers switch to competitors and their offers, in time to counter
• Improving the marketing effectiveness of a promotion while it is still in play
• Friend invitations to join a game or activity that expands business
• Preventing fraud as it is occurring, and preventing more proactively
Volatility:

-Big data volatility refers to how long data remains valid and how long it should be
stored.
-With real-time data, you need to determine the point at which data is no longer
relevant to the current analysis.
-So, how long should the data be kept?
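
A retention decision like this usually ends up as a simple rule applied to timestamps. The sketch below assumes a hypothetical 30-day retention window and made-up records; the right window depends entirely on the analysis.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def drop_expired(records: List[Dict], now: datetime, retention: timedelta) -> List[Dict]:
    """Keep only records whose timestamp is within the retention window."""
    cutoff = now - retention
    return [r for r in records if r["timestamp"] >= cutoff]


# Fixed "now" so the example is reproducible.
now = datetime(2024, 1, 31)
records = [
    {"id": 1, "timestamp": datetime(2024, 1, 30)},
    {"id": 2, "timestamp": datetime(2023, 12, 1)},
    {"id": 3, "timestamp": datetime(2024, 1, 2)},
]

fresh = drop_expired(records, now, retention=timedelta(days=30))
print([r["id"] for r in fresh])
```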
TRADITIONAL BIG DATA ARCHITECTURE
STREAMING BIG DATA ARCHITECTURE
LAMBDA BIG DATA ARCHITECTURE
KAPPA BIG DATA ARCHITECTURE
Limitations of Big Data
Prioritizing correlations
• Data analysts use big data to tease out correlation: when one variable is
linked to another.
• However, not all these correlations are substantial or meaningful.
• More specifically, just because two variables are correlated or linked
doesn’t mean that a causal relationship exists between them.
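
A small experiment shows why this matters at scale. The sketch below generates 40 short series of pure random noise (so no real relationship exists between any of them) and reports the strongest correlation found among all pairs. The series count, length, and seed are arbitrary choices for illustration.

```python
import random
from itertools import combinations
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the run is repeatable

# 40 unrelated series of 10 points each: independent Gaussian noise.
series = [[random.gauss(0, 1) for _ in range(10)] for _ in range(40)]

# 780 pairs are compared; some will look strongly related purely by chance.
strongest = max(abs(pearson(a, b)) for a, b in combinations(series, 2))
print(f"Strongest correlation among unrelated series: {strongest:.2f}")
```

With enough variables and few observations, a strong-looking correlation is almost guaranteed to appear somewhere, which is why a correlation found by searching a large data set needs independent confirmation.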

Security
• As with many technological endeavors, big data analytics is prone to
data breaches.
• The information that you provide to a third party could get leaked to
customers or competitors.
Limitations of Big Data (continued)
Transferability
• Because much of the data you need analyzed lies behind a firewall or on a
private cloud, it takes technical know-how to efficiently get this data to
an analytics team.
• Furthermore, it may be difficult to consistently transfer data to specialists
for repeat analysis.

Inconsistency in data collection


• Sometimes the tools we use to gather big data sets do not behave consistently.
• For example, Google is famous for its tweaks and updates; the results of a
search on one day will likely be different from those on another day.
• Hence, if Google search is used to generate data sets, the correlations
will change as the data changes.
Summary

• Define Big Data


• Big Data Types
• Big Data Analysis Architecture
Thanks
