
Big Data and Hadoop Overview

Big data is generated from a variety of sources like IoT devices, social media, and industries like retail, banking, healthcare, and more. It is characterized by the 5 V's - volume, variety, velocity, value, and veracity. Hadoop is an open-source framework that allows processing of large data sets in a distributed manner across clusters of computers. It uses a master-slave architecture with the NameNode as master and DataNodes as slaves. HDFS provides fault tolerance by replicating data blocks across multiple DataNodes. MapReduce is a programming model used for parallel processing of large datasets.

Uploaded by

Aman Singh
Copyright
© All Rights Reserved

BIG DATA

(Bigger Data, Better Insights)


Evolution of Technology (what can you notice here?)

How does technology change? Every action now generates data, and the volume grows
exponentially, beyond what an RDBMS can handle.
IoT: 50 billion devices by 2020
Social Media
• Social media is the most important
factor contributing to big data.
• This data is unstructured and
huge in size.
Other Factors Contributing to Big Data
• Retail
• Banking & Finance
• Media & Entertainment
• Healthcare
• Education
• Government
• Transportation
• Insurance
• Etc.
What Is Big Data?
How do we know that data is big data?
(i.e., what are the 5 V’s?)
• Volume

• Variety

• Velocity

• Value

• Veracity
Volume
Variety
• Structured
With a defined schema

• Semi-Structured
Schema not strictly defined
JSON, XML, CSV, etc.

• Unstructured

Logs, audio, video, etc.
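
The three kinds of variety listed above can be illustrated with a minimal Python sketch. The table, records, and log line below are made-up sample data, and the field names are assumptions chosen only for illustration.

```python
import json
import re
import sqlite3

# Structured: every row must fit a schema declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", 250.0), (2, "Ravi", 120.0)],  # hypothetical sample rows
)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Semi-structured: records describe themselves, but fields may differ
# from one record to the next.
records = json.loads('[{"id": 1, "name": "Asha"}, {"id": 2, "tags": ["new"]}]')
field_sets = [sorted(record.keys()) for record in records]

# Unstructured: free text with no schema; any structure has to be extracted.
log_line = "2020-01-01 12:00:01 ERROR disk full on node7"  # hypothetical log line
match = re.search(r"\b(ERROR|WARN|INFO)\b", log_line)
level = match.group(1) if match else None

print(total, field_sets, level)
```

The structured data can be queried directly, the semi-structured data must be inspected record by record, and the unstructured data needs pattern matching before it yields any fields at all.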


Velocity
Value

Getting the value from big data is a challenge


Veracity

Processing of data like this is difficult


5 V’s of Big Data (more V’s may be added in future)
BIG DATA as an Opportunity
• Cost-effective storage for huge data sets (cost reduction, i.e. using
commodity hardware to save cost while staying reliable)

• Provides a way to analyze information more quickly and effectively

(faster and better decision making)

• Evaluation of customer needs and satisfaction

• Automated cars, healthcare, etc.


BIG Data Analytics
Examples:

How Amazon recommends products that you may like

How Netflix recommends movies that you may like

How YouTube recommends videos that you may like


What are the business benefits of BIG data?
There are six potential benefits for organizations:
• Better insight into customers
• Operational improvements
• Increased market intelligence
• More agile supply chain operations
• Data-driven product innovation
• More sophisticated recommendation engines
What are common BIG data challenges?
Because of its very nature, big data tends to be challenging to process, manage and use
effectively. The data itself is also complex, particularly when data sets are large and varied or
involve streaming data. There are ten common challenges in big data deployments:
1. Managing large volumes of data
2. Finding and fixing data quality issues
3. Dealing with data integration and preparation complexities
4. Scaling big data systems efficiently and cost effectively
5. Evaluating and selecting big data technologies
6. Generating business insights
7. Hiring and retaining workers with big data skills
8. Keeping costs from getting out of control
9. Governing big data environments
10. Ensuring data context and use cases are understood
Categories of big data challenges
Those issues can be broken down into the following categories:
• Technical challenges
that include selecting the right big data tools and technologies and designing big data
systems so they can be scaled as needed;
• Data management challenges,
from processing and storing large amounts of data to cleansing, integrating, preparing
and governing it;
• Analytics challenges,
such as ensuring that business needs are understood and that analytics results are
relevant to an organization's business strategy; and
• Program management challenges
that include keeping costs under control and finding workers with the required big
data skills.
Key elements of big data environments
Big data management and analytics initiatives involve various components and
functions:
• Big data architecture
• Big data analytics
• Big data collection
• Big data integration and preparation
• Big data governance
• Big data technologies and tools
Big data technologies and tools
The technologies that now are common options for big data environments include the following
categories:
Processing engines. 
Examples include Spark, Hadoop MapReduce and stream processing platforms like Flink,
Kafka, Samza, Storm and Spark's Structured Streaming module.
Storage repositories. 
Examples include the Hadoop Distributed File System and cloud object storage services like
Amazon Simple Storage Service and Google Cloud Storage.
NoSQL databases.
Examples include Cassandra, Couchbase, CouchDB, HBase, MarkLogic Data Hub and MongoDB.
SQL query engines.
Examples include Drill, Hive, Presto and Trino.
Data lake and data warehouse platforms. 
Examples include Amazon Redshift, Delta Lake, Google BigQuery, Kylin and Snowflake.
Commercial platforms and managed services. 
Examples include Amazon EMR, Azure HDInsight, Cloudera Data Platform and Google Cloud Dataproc.
IBM Big Data Analytics
Big Data Collected by Smart Meters
How Smart Meter Big Data Is Analyzed
Smart Meter Solution
BIG Data Analytics and Application
Need for Big Data Analytics
• Identify criminal activity before it
occurs.
• Analyze and geolocate
historical patterns and map them to
supporting events such as rainy days,
traffic flow and holidays.
• Use data patterns, scientific
analytics and technological tools.
• Identify crime hotspots.
Predictive
2. Optimizing Business Operations
Big Data Analytics
Stages in Big Data Analytics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
Diagnostic Analytics
Domains Using Big Data Analytics
Use Cases
Use Case #1: Starbucks
Use Case #2: P&G
Introduction to HADOOP
Story of Big Data and the Traditional System
Traditional Scenario
Failure of the Traditional System

Still a problem? What should Bob do now?


Solution #1: Hire Multiple Cooks

So hire multiple cooks. Two orders per cook per hour? OK, good.


But the cooks sharing one food shelf at the same time is still a problem.
Need for an Effective Solution

The solution is a parallel and distributed approach.


Effective Solution

Bob solved his problem, but how do we solve ours?


Need of Framework
Apache Hadoop: A Framework to Solve Big Data Problems
Hadoop: Master-Slave Architecture
Hadoop Core Components
NameNode

DataNode

Secondary NameNode
NameNode & DataNode
Secondary NameNode & Checkpointing
HDFS Data Block
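
As a rough illustration of how HDFS divides a file into blocks, here is a minimal Python sketch. It assumes a 128 MB block size (the usual default in Hadoop 2.x and later, configurable through `dfs.blocksize`); the function name and the 500 MB file are hypothetical.

```python
BLOCK_SIZE_MB = 128  # assumed block size; real clusters set this via dfs.blocksize


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of the given size occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only takes up as much space as the data left over.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(500)  # hypothetical 500 MB file
print(len(blocks), blocks)
```

Under these assumptions a 500 MB file occupies four blocks: three full 128 MB blocks and one 116 MB block.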
Fault Tolerance
Replication Factor
Solution: Fault Tolerance Replication Factor
Fault Tolerance Replication Factor
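
The idea that replication lets data survive DataNode failures can be sketched in a few lines of Python. This is a toy model, not HDFS's actual placement policy: it assumes a replication factor of 3 (the usual `dfs.replication` default) and places replicas round-robin rather than rack-aware. All node and block names are hypothetical.

```python
REPLICATION_FACTOR = 3  # assumed; the usual HDFS default for dfs.replication


def place_replicas(block_ids, datanodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct DataNodes (simple round-robin)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication factor")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            datanodes[(i + k) % len(datanodes)] for k in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    }


datanodes = ["dn1", "dn2", "dn3", "dn4"]
block_ids = ["blk_1", "blk_2", "blk_3"]
placement = place_replicas(block_ids, datanodes)

# With three replicas per block, losing any two nodes leaves every block readable.
print(readable_blocks(placement, {"dn1", "dn2"}))
```

In this model a block becomes unreadable only when every node holding one of its replicas has failed.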
Write Mechanism
HDFS Write Mechanism: Pipeline Setup
HDFS Write Mechanism: Write a Block
HDFS Write Mechanism: Acknowledgement
Multi-Block HDFS Write Mechanism
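
The three write stages named above (pipeline setup, writing a block, acknowledgement) can be mimicked with a small Python simulation. It is only a sketch: in real HDFS the NameNode selects the DataNodes and data moves in packets, while here the pipeline is a hard-coded list and the function and node names are hypothetical.

```python
def write_block(block_data, pipeline):
    """Simulate writing one block through a DataNode pipeline.

    Returns the per-node storage and the order in which nodes acknowledge.
    """
    if not pipeline:
        raise ValueError("pipeline needs at least one DataNode")
    storage = {}
    # Write a block: the client sends data to the first node, and each node
    # stores it and forwards it to the next node in the pipeline.
    for node in pipeline:
        storage[node] = block_data
    # Acknowledgement: the last node acknowledges first, and acknowledgements
    # travel back along the pipeline toward the client.
    ack_order = list(reversed(pipeline))
    return storage, ack_order


# Pipeline setup: hard-coded here; chosen by the NameNode in a real cluster.
pipeline = ["dn1", "dn4", "dn6"]
storage, ack_order = write_block(b"block-A", pipeline)
print(ack_order)
```

For a multi-block file, the same three stages would simply repeat once per block, possibly with a different pipeline each time.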
HDFS Read Mechanism
MapReduce
Story of MapReduce
What is MapReduce?
MapReduce Word Count Program
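
The word count program is not reproduced in this text, so the following is a minimal single-machine Python sketch of the same idea rather than the Hadoop Java program itself. It imitates the map, shuffle, and reduce phases in memory; the function names and sample lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]  # sample input
counts = word_count(lines)
print(counts)
```

On a cluster, each map call would run on the node holding that part of the input and the reducers would run in parallel per key; the sketch keeps only the data flow.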
Reference
https://www.youtube.com/watch?v=1vbXmCrkT3Y
