Introduction to Big Data
BS (CS) 6th
Lecture # 1
Dr. Syed Attique Shah (Ph.D.)
Assistant Professor
Department of Information Technology,
Faculty of Information & Communication Technologies (FICT),
BUITEMS, Quetta, Pakistan
About Me
Dr. Syed Attique Shah
PhD
Department of Geographical Information Technologies,
Informatics Institute,
Istanbul Technical University, Turkey.
Assistant Professor / Chairperson
Department of Information Technology,
Faculty of Information & Communication Technologies (FICT),
BUITEMS, Quetta, Pakistan.
DATA: SOME INTERESTING FACTS
In 2020, there will be around 40 trillion gigabytes of data (40 zettabytes).
(Source: Dell EMC)
90% of all data has been created in the last two years.
(Source: IBM)
In 2012, only 0.5% of all data was analyzed.
(Source: The Guardian)
97.2% of organizations are investing in big data and AI.
(Source: New Vantage)
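As a quick sanity check on the first figure, the zettabyte-to-gigabyte conversion takes only a few lines. This is a minimal sketch that assumes decimal (SI) prefixes, where 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes. With binary prefixes (GiB, ZiB) the numbers differ slightly. The function name is illustrative only.

```python
# Assumption: decimal (SI) prefixes, not binary GiB/ZiB.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes (10**12 GB per ZB)."""
    return zb * (BYTES_PER_ZB // BYTES_PER_GB)

total_gb = zettabytes_to_gigabytes(40)
print(f"{total_gb:,} GB")  # 40,000,000,000,000 GB, i.e. 40 trillion GB
```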
DATA: SOME INTERESTING FACTS (cont.)
The big data and analytics market was worth $49 billion in 2019.
(Source: Wikibon)
In 2020, the big data market is expected to grow by 14%.
(Source: Statista)
Job listings for data science and analytics will reach around 2.7 million by 2020.
(Source: Forbes)
HOW IS BIG DATA DIFFERENT?
• Typically an entirely new source of data (e.g. social media, mobile devices)
• Automatically generated by a machine (e.g. a sensor embedded in an engine)
• Not designed to be analysis-friendly (e.g. text streams, unstructured data)
• May not have much value if an effective analytical technique is not used
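The last point (raw text streams carry little value until some analytical technique gives them structure) can be illustrated with a small sketch. It uses hypothetical social-media posts, not real data, and relies on Python's standard `re` module to pull hashtags and mentions into structured records that can then be counted. Real pipelines would need far more robust parsing.

```python
import re
from collections import Counter

# Hypothetical sample of unstructured social-media text (illustrative only).
posts = [
    "Traffic jam near the bus stand again #Traffic @citytransport",
    "Enjoying the new #BigData course #Analytics",
    "Power outage in my area, no update from @utility #Traffic",
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def to_record(text: str) -> dict:
    """Turn one free-text post into a structured record."""
    return {
        "length": len(text),
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": MENTION.findall(text),
    }

records = [to_record(post) for post in posts]

# Once structured, the data supports simple analytics such as trending tags.
tag_counts = Counter(tag for record in records for tag in record["hashtags"])
print(tag_counts.most_common(1))
```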
TACKLING NEW BIG DATA
DATA is the “NEW OIL”
But then again:
• Mostly unstructured data (approx. 80%+ of all data)
• Lots of machine-generated data
• Key-value pairs instead of data tables
• NoSQL vs. RDBMS
• Storage and computing on commodity hardware
• Distributed storage and computing
• Lots of open-source solutions
• Complex data pre-processing (parsing, ETL, etc.)
• New analytics technologies (Hadoop/MR, 2005)
• New visualization techniques
• Cloud-based analytics vs. local analytics
Source: Andrei Khurshudov, PhD, Alchemy IoT, Colorado, USA
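Several of the points above (key-value pairs instead of tables, distributed computing, Hadoop/MapReduce) come together in the classic MapReduce word count. The sketch below is a single-process simulation in plain Python, not Hadoop's API. The map step emits `(word, 1)` key-value pairs, a shuffle step groups them by key, and the reduce step sums each group. The lists in `partitions` are hypothetical stand-ins for data blocks stored on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str):
    """Emit a (word, 1) key-value pair for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the partial counts for one key."""
    return key, sum(values)

# Each inner list stands in for a data block on a different node (hypothetical input).
partitions = [
    ["big data needs new tools", "data is the new oil"],
    ["new tools for big data"],
]

mapped = chain.from_iterable(map_phase(line) for part in partitions for line in part)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts["data"], counts["new"])
```

In a real Hadoop cluster the map tasks run in parallel on the nodes that hold each block, and the framework performs the shuffle over the network; the per-key logic stays the same.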
BIG DATA ANALYTICS (BDA) VS. TRADITIONAL ANALYTICS
Source: Andrei Khurshudov, PhD, Alchemy IoT, Colorado, USA
WHAT DOES BDA OFFER?
• A perfect consumer of data and a generator of decisions/commands/processed data
• Highly dependent on fast, reliable, robust distributed infrastructure
• Abundance of cheap distributed storage and computing makes progress faster
• Fully suitable for working with the IoT ecosystem and users
• Main future direction: intelligent, autonomous, self-learning algorithms
• Over time, as analytics becomes more autonomous, so will decision-making in various applications
A GENERAL BIG DATA FRAMEWORK
A GENERAL BIG DATA FRAMEWORK (TOOL OPTIONS)
WHAT IS THERE IN THE FUTURE FOR ICT?
• Data, data and more data, equipped with enhanced data storage and processing infrastructure
• An enhanced global network of connected sensors, devices, and machines (IoT)
• The ever-scalable Internet
• Smart, autonomous, self-learning data-processing and decision-making algorithms (Analytics)
• Integration of technologies (Solutions)
WHERE DOES THE INTERNET OF THINGS (IOT) STAND?
Sensors + Connectivity + People and Processes
• Sensors: We are giving our world a digital nervous system. Location data comes from GPS sensors; eyes and ears come from cameras and microphones, along with sensory organs that can measure everything from temperature to pressure changes.
• Connectivity: These inputs are digitized and placed onto networks.
• People and Processes: These networked inputs can then be combined into bi-directional systems that integrate data, people, processes and systems for better decision making.
Source: https://www.postscapes.com/what-exactly-is-the-internet-of-things-infographic/
IOT APPLICATIONS (To name a few)
All these applications will generate and consume a lot of data.
Source: https://www.postscapes.com/what-exactly-is-the-internet-of-things-infographic/
WHAT DOES IOT OFFER?
• A huge new source of data and a big consumer of decisions/commands/processed data
• Today, relatively little IoT data is collected, and little of what is collected (less than 5%) is analyzed
• Main future direction: “to Connect Everything to Everything and to Everyone”
• Highly dependent on fast and smart analytics
• People will become less involved in decision-making over time
• Up to 40% of IoT is expected to rely on local analytics (Edge/Fog IoT)
GENERAL BIG DATA ANALYTICS VS. IOT ANALYTICS
The main possible differentiator:
• Hybrid Analytics (Edge + Cloud)
• Edge analytics deals with high-velocity data processed near the sensors (because cloud-based analytics is too slow and too expensive for this task)
• Cloud-based analytics complements “Edge” analytics
• Both analytics systems work together to deliver the best possible value
Source: Andrei Khurshudov, PhD, Alchemy IoT, Colorado, USA
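The edge/cloud split above can be sketched as two functions. One runs near the sensor and reduces a high-velocity window of raw readings to a compact summary, plus any readings over a threshold. The other runs in the cloud and combines summaries from many devices. Everything here is an illustrative assumption: the device names, the temperature readings, the 90.0 alert threshold and the `WindowSummary` message format are hypothetical and not part of any IoT platform.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class WindowSummary:
    """Hypothetical compact message an edge node sends upstream instead of raw readings."""
    device_id: str
    count: int
    mean: float
    maximum: float
    alerts: list

def edge_summarize(device_id, readings, alert_threshold):
    """Edge side: reduce a window of raw readings to one summary near the sensor."""
    alerts = [r for r in readings if r > alert_threshold]
    return WindowSummary(device_id, len(readings), mean(readings), max(readings), alerts)

def cloud_aggregate(summaries):
    """Cloud side: combine per-device summaries into a fleet-wide view."""
    total = sum(s.count for s in summaries)
    # Count-weighted mean, so each raw reading contributes equally.
    fleet_mean = sum(s.mean * s.count for s in summaries) / total
    alerting = [s.device_id for s in summaries if s.alerts]
    return {"readings": total, "fleet_mean": fleet_mean, "alerting_devices": alerting}

# Hypothetical engine-temperature windows from two devices.
window_a = [71.0, 72.5, 70.8, 95.2]
window_b = [68.0, 69.5, 70.0, 69.5]

summaries = [
    edge_summarize("engine-A", window_a, alert_threshold=90.0),
    edge_summarize("engine-B", window_b, alert_threshold=90.0),
]
report = cloud_aggregate(summaries)
print(report)
```

Sending one summary per window instead of every raw reading is what keeps edge-to-cloud traffic small, while the cloud still sees which devices need attention.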
THE INTEGRATION OF IOT AND BDA
• IoT is a giant, fast-growing data-generating platform
• BDA is a giant, fast-growing data-processing engine
• IoT is nearly helpless without fast and powerful analytics that enable the best decisions in real time
• BDA strengths become more pronounced in the presence of large amounts of fast, diverse, noisy, multidimensional data, which is what IoT generates
• Growth and success in one area will promote growth and success in another area as well
CONCLUSION
• Data has become an unprecedented driver of innovation
• In the era of the Internet of Things and mobility, huge volumes of data are becoming available at high velocity, creating enormous scope for efficient data analytics systems
• Big Data frameworks are an efficient way to handle IoT data as we move to real-time use cases
• IoT is on the way to adopting fully automated, autonomous, self-learning Big Data Analytics
• Big Data and IoT are changing the way data is leveraged and can lead to a paradigm shift in various function-specific systems