How Much Data Is Generated Every Minute?
Data never sleeps… 24/7/365
• Email users send 204,166,667 emails
• Google receives over 2,000,000 search queries
• Apple receives about 47,000 app downloads
• Brands on Facebook get 34,722 likes
Big Data Facts
• According to McKinsey, a retailer using big data to the fullest could increase its operating margins by more than 60%.
• According to Zuckerberg, 5 billion pieces of content are shared daily via Facebook's Open Graph.
• Bad data or poor-quality data costs US businesses $600 billion annually.
• Google's Sundar claims that every two days we now create as much information as we did from the dawn of civilization until 2003.
• By 2020, 5.4 million IT jobs will be created to support Big Data, generating 1.9 million jobs in the United States.
• According to Gartner, Big Data will drive $232 billion in spending through 2021.
Big & Fast Data
• The data volume gets bigger (in the range of petabytes, exabytes, zettabytes (ZB) and yottabytes (YB))
• The data generation, transmission and crunching frequency have gone up significantly (the velocity varies from batch to real-time)
• The data structure has also diversified (poly-structured data, the variety)
• The correctness, accuracy, timeliness, and trustworthiness of data (veracity)
• The intriguing and interesting relationships between data items (viscosity)
The Key Drivers for Big & Fast Data
Primarily due to newer Data Sources, Devices & Emerging Technologies
• Sentient Materials / Smart Objects / Digitized Entities through deeper digitization enabled by edge
technologies (Nano and micro-scale sensors, actuators, codes, chips, controllers, specks, smart dust,
tags, stickers, LED, etc.)
• Connected, resource-constrained, and embedded devices and machines (Device integration buses,
data transmission protocols, operating systems, etc.)
• Ambient Sensing, Vision, and Perception Technologies
• Social media and Professional and Knowledge Sharing Sites
• Consumerization (Mobiles and Wearables)
• Centralization, Commoditization, Containerization & Industrialization (Cloud Computing)
• Communication (Ambient, Autonomic and Unified)
• Integration (D2D, D2E, D2C, C2C, etc.)
• Big & Fast Data Platforms and Infrastructures
The convergence of technologies lays a profound foundation for large-scale data generation:
• Cloud Computing
• Social Media
• Mobile
• Internet of Things
The extreme connectivity enables data generation in huge volumes.
The Pragmatic Forecast for the Next Generation (Market Analysts)
• Software applications and services in billions
• Connected devices in trillions
• Digitized objects in quadrillions
The transition from systems of record (SoR) to systems of engagement (SoE) is under way.
The IT Implications of Data Explosion
ICT infrastructures, platforms, toolsets and human resources are the typical challenges in handling big data and making sense of it. The main tasks are to:
1. Aggregate from multiple data sources through data integration and virtualization
2. Pre-process and transform different data formats, structures and sizes as per the target
systems’ needs
3. Store (in memory and on disk)
4. Investigate the data to extract actionable and timely insights, leveraging competent statistical and mining algorithms, queries, and methods such as filtering, slicing, dicing, etc.
5. Visualize the generated knowledge via portals, dashboards, report-generation tools, maps, charts, graphs, etc. on different devices (smartphones, wearables, etc.)
Data leads to insights, which in turn empower individuals, innovators and institutions in their enterprising journey.
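The five tasks above can be illustrated as a chain of small functions. This is a minimal, in-memory Python sketch under stated assumptions: two hypothetical sources (a CSV export and a JSON feed), a dictionary standing in for the store, and a plain-text bar chart standing in for a dashboard. Real deployments would use dedicated integration, storage and visualization tools for each step.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical raw inputs from two sources with different formats.
CSV_SOURCE = "device,reading\nsensor-1,21.5\nsensor-2,19.0\n"
JSON_SOURCE = '[{"device": "sensor-1", "reading": "22.5"}, {"device": "sensor-3", "reading": "30.1"}]'


def aggregate():
    """Step 1: pull records from multiple sources into one list."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def transform(rows):
    """Step 2: normalise field types to what the target system expects."""
    return [{"device": r["device"], "reading": float(r["reading"])} for r in rows]


def store(rows):
    """Step 3: keep readings in an in-memory store keyed by device."""
    db = defaultdict(list)
    for r in rows:
        db[r["device"]].append(r["reading"])
    return db


def investigate(db):
    """Step 4: derive a simple insight (average reading per device)."""
    return {device: sum(vals) / len(vals) for device, vals in db.items()}


def visualize(insights):
    """Step 5: render a plain-text bar chart in place of a dashboard."""
    lines = []
    for device, avg in sorted(insights.items()):
        lines.append(f"{device:<10} {'#' * round(avg)} {avg:.1f}")
    return "\n".join(lines)


insights = investigate(store(transform(aggregate())))
print(visualize(insights))
```

Keeping each step a separate function mirrors the list: any one step (for example, the store) can be swapped for a real system without touching the others.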
The Data Explosion: the Opportunities and Possibilities
1. Novel Analytical Capabilities and Competencies
2. To have and sustain a service repository comprising
• Insights-driven and Innovation-filled services
• Real-time and Real-world services
• People-centric Physical services
• Cognitive & Context-aware services
3. To craft Smarter Applications out of these services by dynamic service integration and orchestration
In short, it is all about fulfilling the Smarter Planet vision by smartly leveraging versatile and resilient technologies and tools.
The Next-Gen Analytical Capabilities
The Emergence of Newer Analytics and Applications (as per McKinsey, at least 13 industry verticals were identified as standing to gain immense benefits from big data analytics)
Generic (Horizontal):
• Real-time Analytics
• Predictive Analytics
• Prescriptive Analytics
• High-Performance Analytics
• Diagnostic Analytics
• Descriptive Analytics
• Personalized Analytics
• Stream Analytics
Specific (Vertical):
• Social Media Analytics
• Operational Analytics
• Machine Analytics
• Retail and Security Analytics
• Sentiment Analytics
• Security & Fraud Analytics
• Weather Analytics
• Watson Content Analytics
Big Data Analytics
Definition
Phases in Analytics
Data Analytics Stack
Typical Analytical Architecture
Data Analytics: Applications & Case Studies
Big Data Analytics - Definitions
• Big Data analytics is the process of collecting, organizing and
analyzing large sets of data (called Big Data) to discover patterns and
other useful information – Webopedia
• Big data analytics is the use of advanced analytic techniques against
very large, diverse data sets that include structured, semi-structured
and unstructured data, from different sources, and in different sizes
from terabytes to zettabytes – IBM
Data vs. Processing & Analytics
• Big Data: Batch Processing & Real-time Processing
• Fast and Streaming Data: Real-time Processing & Historical Processing
The Four-Step Analytics Process
Aggregation & Ingestion → Storage → Analytics → Visualization
The Data Aggregation & Ingestion Mechanisms
• RDBMS Connectors to Hadoop (Apache Sqoop)
• NoSQL DB Connectors to Hadoop (MongoDB, HBase, Cassandra, Riak, etc.)
• Specific File Adaptors
• Log Files to Hadoop (Apache Flume)
• Sensors, Actuators and Machines Data Ingestion Solutions
• Middleware for Operational & Transactional Systems to Hadoop
• Custom Data Adaptors, Connectors and Drivers
• Social Sites (Facebook, Twitter, etc.) to Hadoop via RESTful APIs
• Professional Sites (LinkedIn, etc.) to Hadoop via RESTful APIs
There are also ODBC, JDBC, persistence APIs, scripts, automated tools, and bridges for integration.
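One common pattern behind RDBMS-to-Hadoop connectors such as Apache Sqoop is incremental import: only rows whose check column (for example, an auto-increment id) exceeds the last imported value are copied on each run. The sketch below mimics that idea with Python's DB-API, assuming sqlite3 as a stand-in for a JDBC/ODBC connection and a local newline-delimited JSON file as a stand-in for an HDFS landing directory. The `orders` table and the `extract_incremental` helper are hypothetical.

```python
import json
import os
import sqlite3
import tempfile


def extract_incremental(conn, table, check_column, last_value, out_path):
    """Append rows whose check_column exceeds last_value; return the new high-water mark."""
    # Table/column names cannot be bound as parameters, so they are assumed trusted here.
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    columns = [d[0] for d in cur.description]
    new_last = last_value
    with open(out_path, "a", encoding="utf-8") as out:
        for row in cur:
            record = dict(zip(columns, row))
            out.write(json.dumps(record) + "\n")
            new_last = record[check_column]
    return new_last


# Hypothetical source table standing in for an operational RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

landing = os.path.join(tempfile.mkdtemp(), "orders.jsonl")
mark = extract_incremental(conn, "orders", "id", 0, landing)     # first run copies ids 1-2
conn.execute("INSERT INTO orders (id, amount) VALUES (3, 7.25)")
mark = extract_incremental(conn, "orders", "id", mark, landing)  # second run copies only id 3
```

Persisting the returned high-water mark between runs is what keeps repeated imports from duplicating rows.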
Phases in Analytics
• Basic phases in analytics for deriving new facts:
  • Descriptive – visualization & reports
  • Predictive – predictions / forecasts
  • Prescriptive – better decisions
  • Cognitive – derivation of additional value
Analytics Models
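[Figure: analytics models plotted by VALUE (vertical axis) against DIFFICULTY (horizontal axis), rising from Descriptive Analytics (What happened?) to Diagnostic Analytics (Why did it happen?), Predictive Analytics (What will happen?) and Prescriptive Analytics (How can we make it happen?)]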
Descriptive Analytics
• Descriptive analytics, such as reporting/OLAP, dashboards, and data visualization, has been widely used for some time.
• It is the core of traditional BI.
What has occurred?
Descriptive analytics, such as data visualization, is
important in helping users interpret the output from
predictive analytics.
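Typical OLAP operations behind descriptive reports include roll-up (aggregating a measure along chosen dimensions) and slicing (fixing one dimension to a single value). A minimal sketch in plain Python, assuming a small hypothetical sales fact table; a real BI stack would run equivalent GROUP BY queries against a warehouse or OLAP cube.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, product, revenue)
SALES = [
    ("North", "Q1", "A", 120.0),
    ("North", "Q2", "B", 80.0),
    ("South", "Q1", "A", 200.0),
    ("South", "Q1", "B", 50.0),
    ("South", "Q2", "A", 150.0),
]
REGION, QUARTER, PRODUCT, REVENUE = 0, 1, 2, 3


def roll_up(rows, *dims):
    """Sum revenue grouped by the chosen dimension indexes (an OLAP roll-up)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[REVENUE]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Keep only rows where one dimension equals a fixed value (an OLAP slice)."""
    return [row for row in rows if row[dim] == value]


by_region = roll_up(SALES, REGION)
q1_by_region_product = roll_up(slice_rows(SALES, QUARTER, "Q1"), REGION, PRODUCT)
print(by_region)
print(q1_by_region_product)
```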
Predictive Analytics
• Algorithms for predictive analytics, such as regression analysis, machine
learning, and neural networks, have also been around for some time.
What will occur?
• Marketing is the target for many predictive analytics applications.
• Descriptive analytics, such as data visualization, is important in helping
users interpret the output from predictive and prescriptive analytics.
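As a minimal illustration of regression-based prediction, the sketch below fits a straight line by ordinary least squares and uses it to forecast a value that has not been observed yet. The marketing spend and sales figures are hypothetical and chosen to lie exactly on y = 1 + 2x so the result is easy to check; real predictive models would be validated on held-out data.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical monthly campaign data: ad spend (x) vs. sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(spend, sales)
forecast = a + b * 6.0  # predicted sales at a spend level not yet observed
print(f"sales = {a:.2f} + {b:.2f} * spend; forecast at 6.0: {forecast:.2f}")
```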
Prescriptive Analytics
• Prescriptive analytics is often referred to as advanced analytics.
• Often used for the allocation of scarce resources
• Optimization
What should occur?
Prescriptive analytics can benefit healthcare strategic planning by combining operational and usage data with data on external factors such as economic data, population demographic trends and population health trends. This allows more accurate planning of future capital investments, such as new facilities and equipment utilization, and a clearer understanding of the trade-offs between adding beds and expanding an existing facility versus building a new one.
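To make the resource-allocation idea concrete, the sketch below picks the combination of capital projects that maximizes added capacity within a budget, assuming that expanding the existing facility and building a new one are mutually exclusive. All option names, costs and capacities are hypothetical (in practice they would come from predictive models), and exhaustive search is used only because the option list is tiny; realistic problems are usually handed to a linear or mixed-integer programming solver.

```python
from itertools import combinations

# Hypothetical options: (name, capital cost in $M, projected extra capacity in beds)
OPTIONS = [
    ("add_beds", 4, 20),
    ("expand_facility", 12, 70),
    ("build_new", 30, 150),
    ("new_equipment", 6, 25),
]
EXCLUSIVE = {"expand_facility", "build_new"}  # assumed: pick at most one of these


def best_plan(options, budget):
    """Exhaustively search option subsets; return the feasible one with most capacity."""
    best, best_capacity = (), 0
    for k in range(1, len(options) + 1):
        for plan in combinations(options, k):
            names = {name for name, _, _ in plan}
            cost = sum(c for _, c, _ in plan)
            capacity = sum(cap for _, _, cap in plan)
            feasible = cost <= budget and len(names & EXCLUSIVE) <= 1
            if feasible and capacity > best_capacity:
                best, best_capacity = plan, capacity
    return [name for name, _, _ in best], best_capacity


print(best_plan(OPTIONS, 25))
print(best_plan(OPTIONS, 50))
```

With a budget of 25 the best plan combines the smaller projects; at 50, building a new facility wins, while the expand-plus-build pair is rejected by the exclusivity constraint even though its cost would fit.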
Cognitive Analytics
• “A subfield of artificial intelligence, [which] simulates human thought
processes in machines using self-learning algorithms through data mining,
pattern recognition, and Natural Language Processing.”
The goal of cognitive analytics is to blend traditional analytics techniques with AI and ML features to achieve advanced analytics outcomes.
• Cognitive analytics enables analytic tools to think like humans.
The Knowledge Discovery in the Pre-Big Data Era
The Knowledge Discovery in the Post-Big Data Era
Traditional and Big Data Analytics – Reference Model
Four-Layer Architecture of Big Data Analytics Stack
Big Data Architecture
• “Big Data architecture is the logical and/or physical layout/structure
of how Big Data will be stored, accessed and managed within a Big
Data or IT environment” – Techopedia
• Logically defines how a Big Data solution will work, the core components used (hardware, database, software, storage), the flow of information, security and more
Design of Logical Layers in a Data Processing Architecture
Lowest Layer L1
• Considers the amount of data needed at the ingestion layer (L2), and whether data is pushed from L1 or pulled by L2, depending on the usage mechanism
• Source data types: database, files, web or service
• Source formats, i.e., semi-structured, unstructured or
structured
Data Ingestion and Acquisition Layer L2
• Considers ingestion and ETL processes either in real time, which means storing and using the data as it is generated, or in batches
• Batch processing uses discrete datasets at scheduled or periodic intervals of time.
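The difference between the two ingestion modes can be sketched as follows, assuming events carry an integer timestamp `ts` in seconds (a hypothetical field). In real-time mode each event goes to the consumer as it arrives; in batch mode events are grouped into discrete datasets per fixed time interval, which a scheduler would then process periodically.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def ingest_real_time(events: Iterable[dict], handle: Callable[[dict], None]) -> None:
    """Real-time mode: pass each event to the consumer as soon as it arrives."""
    for event in events:
        handle(event)


def ingest_batches(events: Iterable[dict], interval_s: int) -> Dict[int, List[dict]]:
    """Batch mode: group events into discrete datasets, one per time interval."""
    batches: Dict[int, List[dict]] = defaultdict(list)
    for event in events:
        window_start = event["ts"] - event["ts"] % interval_s
        batches[window_start].append(event)
    return dict(batches)


events = [{"ts": t, "value": t * 2} for t in (0, 20, 65, 70, 130)]

received = []
ingest_real_time(events, received.append)  # one call per event, in arrival order

batches = ingest_batches(events, interval_s=60)  # keys 0, 60 and 120
```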
Data Storage Layer L3
• Data storage type (historical or incremental), format, compression,
incoming data frequency, querying patterns and consumption
requirements for L4 or L5
• Data storage using the Hadoop Distributed File System (HDFS) or NoSQL data stores such as HBase, Cassandra and MongoDB
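Storage format, compression and querying patterns interact: if most queries filter by date, partitioning files by date lets a query read only the relevant partition. The sketch below assumes a local directory in place of HDFS, Hive-style `date=YYYY-MM-DD` directory names, and gzip-compressed JSON lines as the file format; the helper names are hypothetical, and columnar formats such as Parquet or ORC are common choices in practice.

```python
import gzip
import json
import os
import tempfile
from collections import defaultdict


def write_partitioned(records, root):
    """Append records as gzip-compressed JSON lines, one directory per event date."""
    by_date = defaultdict(list)
    for rec in records:
        by_date[rec["date"]].append(rec)
    for date, recs in by_date.items():
        part_dir = os.path.join(root, f"date={date}")
        os.makedirs(part_dir, exist_ok=True)
        # "at" appends a new gzip member, so incremental loads add to the partition.
        with gzip.open(os.path.join(part_dir, "part-0000.jsonl.gz"), "at", encoding="utf-8") as f:
            for rec in recs:
                f.write(json.dumps(rec) + "\n")


def read_partition(root, date):
    """Read only the partition a date-filtered query needs, skipping all others."""
    path = os.path.join(root, f"date={date}", "part-0000.jsonl.gz")
    if not os.path.exists(path):
        return []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]


root = tempfile.mkdtemp()
write_partitioned(
    [
        {"date": "2021-01-01", "user": "u1", "clicks": 3},
        {"date": "2021-01-01", "user": "u2", "clicks": 5},
        {"date": "2021-01-02", "user": "u1", "clicks": 1},
    ],
    root,
)
day1 = read_partition(root, "2021-01-01")
```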
Data Processing Layer L4
• Data processing software such as MapReduce, Hive, Pig, Spark, Mahout and Spark Streaming
• Processing in scheduled batches, in real time, or hybrid
• Processing as per the synchronous or asynchronous processing requirements at L5
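MapReduce, the first engine listed above, splits work into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. The classic word-count example below runs all three phases in a single Python process purely to show the data flow; a real framework distributes the map and reduce tasks across a cluster, and the function names here are illustrative only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs big storage", "fast data needs fast processing"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```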
Data Consumption Layer L5
• Data integration
• Dataset usage for reporting and visualization, analytics (real time, near real time, scheduled batches), business processes (BPs), business intelligence (BI) and knowledge discovery
• Export of datasets to cloud, web or other systems
BDA REFERENCE ARCHITECTURE
Big Data Analytics Solution Architectures for Different Industry Segments
Big Data Analytics in Cloud
Big Data Insights for Media Industry – A Solution Architecture
Social Network Analytics – A Solution Architecture