Ashish Presentation Stage1 Modify LR

The seminar presentation discusses the challenges of Big Data analytics, focusing on data and metadata management, analysis platforms, and the need for optimized solutions to handle the complexities of large datasets. It highlights the 4Vs of Big Data: Volume, Velocity, Variety, and Veracity, and emphasizes the importance of developing suitable platforms for effective data processing. The presentation also covers applications of Big Data, the Hadoop framework, and various challenges faced in the field.

Uploaded by

Gauri Pawar


BHARATI VIDYAPEETH DEEMED UNIVERSITY

COLLEGE OF ENGINEERING
Seminar Presentation On
Big Data Analytics: Challenges within New Data, Metadata Management &
Analysis Platforms
Under the guidance of
Dr. Debnath Bhattacharyya
By
Research scholar : Mr. Ashish Nandkumar Patil
Department of Computer Engineering.
Seat No: 1011-149

Date of Presentation: 02/05/2016

Prerequisites

• Data Vs Information Vs Metadata

• History of Database System
• Database Languages
• Data Models
– Classical DBMS: Hierarchical, Network, Relational
– New Directions: Extended Relational (ORDBMS), Object-Oriented, Distributed DB
• Data Warehouse & Data Mining
• DBMS (Conventional & Advanced)

Contents
• Introduction to Big Data
• Literature Survey
• Applications of Big Data
• Open Source Hadoop – case study
• Hadoop Framework Architecture
• Open Source Hadoop Components
• Challenges, Research Areas & Topics
• Big Data Datasets
• Conclusion
• References
Introduction
• Big Data: a collection of data sets so large and complex that their size is beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.

• The trend toward larger data sets:
– A single large set of related data vs. separate smaller sets
– Large sets help to spot business trends, prevent diseases, and combat crime
– Scientists, business executives, practitioners of medicine, advertisers, and governments all meet difficulties with large data sets
Introduction (Cont.)
META Group (now Gartner) analyst Doug Laney described Big Data with a three-dimensional "3Vs" model:
• Volume (amount of data): too big
• Velocity (speed of data in and out): too fast
• Variety (range of data types and sources): too hard

Additionally, a new V:
• Veracity: the unreliability inherent in some sources of data
Big Data: Complex type of Data
Data Growth
• Developed economies increasingly use data-intensive technologies: there are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet.
• Between 1990 and 2005, more than 1 billion people worldwide entered the middle class. More people became literate, which in turn led to information growth.
• The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007. Predictions put the amount of internet traffic at 667 exabytes annually by 2014.
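The capacity figures above imply a compound annual growth rate, which can be worked out with a short calculation. The sketch below is only illustrative: it assumes decimal units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes), and the function name `annual_growth_rate` is made up for this example. The 667 exabyte forecast is left out because it measures annual internet traffic, not exchange capacity.

```python
# Figures quoted on the slide. Decimal units are an assumption of this sketch.
PETABYTE = 10**15
EXABYTE = 10**18

capacity_bytes = {
    1986: 281 * PETABYTE,
    1993: 471 * PETABYTE,
    2000: 2.2 * EXABYTE,
    2007: 65 * EXABYTE,
}


def annual_growth_rate(start_year, end_year, data=capacity_bytes):
    """Compound annual growth rate between two years present in `data`."""
    years = end_year - start_year
    ratio = data[end_year] / data[start_year]
    return ratio ** (1 / years) - 1


if __name__ == "__main__":
    milestones = sorted(capacity_bytes)
    # Growth between consecutive milestones, then over the whole period.
    for start, end in zip(milestones, milestones[1:]):
        print(f"{start}-{end}: {annual_growth_rate(start, end):.1%} per year")
    print(f"1986-2007: {annual_growth_rate(1986, 2007):.1%} per year")
```

Running it shows that growth was much faster in the 2000 to 2007 interval than in the 1986 to 1993 interval, which is the point the slide makes.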
Literature Survey
• The survey reviews data analytics studies from traditional data analysis to recent big data analysis. It focuses on performance-oriented and result-oriented issues in big data analytics frameworks and platforms. It introduces data mining and big data mining algorithms, covering clustering, classification, and frequent pattern mining, and looks for solutions for the new age of Big Data.
[1] Chun-Wei Tsai, Chin-Feng Lai, Han-Chieh Chao and Athanasios V. Vasilakos, “Big data analytics: a survey”, Journal of Big Data (2015), A Springer open journal.

• The paper identifies the inefficiency of Hadoop when executing binary-input applications. It introduces Bi-Hadoop, an extension to Hadoop that better supports such applications and integrates an easy-to-use user interface. It implements a binary-input-aware scheduler and a transparent caching mechanism, which improve the Hadoop framework. It also gives future directions for developing scheduling algorithms that support more general task sharing patterns.
[2] Xiao Yu and Bo Hong, Dept. of Electrical and Computer Engineering, Georgia Institute of Technology, “Bi-Hadoop: Extending Hadoop To Improve Support For Binary-Input Applications”, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (IEEE Computer Society, 2013)
Applications of Big Data
• Transformations due to Big Data
• TwitterHealth: flu epidemics.
• NASA Center for Climate Simulation (NCCS): stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster.
• Google Trends: future orientation index based on searches, compared against GDP.
• Facebook: handles 50 billion photos from its user base.
• FICO: the Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
Case Study- Hadoop
• Apache Hadoop
– Apache open-source software framework for reliable, scalable, distributed computing over massive amounts of data
– Hides underlying system details and complexities from the user
– Developed in Java
• Core sub-projects
– MapReduce
– Hadoop Distributed File System (HDFS)
• Supported by several Hadoop-related projects
– HBase, Hive, ZooKeeper, Avro, etc.
• Meant for heterogeneous commodity hardware
Design principles of Hadoop
• Scalable
– New nodes can be added on the fly
• Performance & reliability
– Adaptive MapReduce, compression, indexing, flexible scheduler
• Affordable
– Massively parallel computing on commodity servers
• Flexible
– Hadoop is schema-less and can absorb any type of data
• Fault tolerant
– Through the MapReduce software framework
Hadoop Framework Architecture
Two Key Aspects of Hadoop
• Hadoop Distributed File System (HDFS)
– Where Hadoop stores data
– A file system that spans all the nodes in a Hadoop cluster
– It links together the file systems on many local nodes to make them into one big file system

• MapReduce framework
– How Hadoop understands and assigns work to the nodes
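The idea of one file system spanning many nodes can be pictured with a toy model. The sketch below is not HDFS code and uses no Hadoop API. The function names, the 8-byte block size, and the round-robin placement are all assumptions made for illustration. Real HDFS uses far larger blocks, replicates them, and chooses placement differently.

```python
from collections import defaultdict

BLOCK_SIZE = 8  # bytes; deliberately tiny, an assumption for illustration only


def store_file(name, data, nodes, namespace, node_storage):
    """Split `data` into blocks and spread them round-robin over `nodes`.

    `namespace` maps file name -> ordered list of (node, block_id).
    `node_storage` maps node -> {block_id: bytes}, i.e. each node's local store.
    """
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    locations = []
    for index, block in enumerate(blocks):
        node = nodes[index % len(nodes)]      # hypothetical placement policy
        block_id = f"{name}#{index}"
        node_storage[node][block_id] = block
        locations.append((node, block_id))
    namespace[name] = locations


def read_file(name, namespace, node_storage):
    """Reassemble a file by fetching its blocks, in order, from the nodes."""
    return b"".join(node_storage[node][block_id]
                    for node, block_id in namespace[name])


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    namespace = {}
    node_storage = defaultdict(dict)
    payload = b"big data spans many local file systems"
    store_file("/logs/a.txt", payload, nodes, namespace, node_storage)
    # The client sees one file, although its bytes live on three nodes.
    assert read_file("/logs/a.txt", namespace, node_storage) == payload
    for node in nodes:
        print(node, sorted(node_storage[node]))
```

The `namespace` dictionary plays the role of the central metadata that lets many local stores appear as one big file system.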
MapReduce
Take a large problem and divide it into sub-problems
– Break data set down into small chunks

Perform the same function on all sub-problems

Combine the output from all sub-problems

MapReduce co-locating with HDFS
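The three steps listed above (divide, apply the same function, combine) can be sketched in plain Python as a word count. This is a single-process illustration of the pattern, not Hadoop code. The function names and the chunk size are chosen for this example only.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(lines, chunk_size):
    """Break the data set down into small chunks (the sub-problems)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map: run the same function on every chunk, emitting (word, 1) pairs."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_pairs(pairs):
    """Reduce: combine the output of all sub-problems by summing per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines, chunk_size=2):
    chunks = split_into_chunks(lines, chunk_size)
    # Each chunk is processed independently, so this step could run in parallel.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_pairs(chain.from_iterable(mapped))


if __name__ == "__main__":
    text = ["big data is big", "data about data", "metadata is data"]
    print(word_count(text))
```

Because `map_chunk` never looks outside its own chunk, the chunks can be handed to different machines, which is what makes the pattern suitable for a cluster.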
MapReduce Processing
• The user runs a program on a client computer.
• The program submits a job to Hadoop. The job contains:
– Input data
– MapReduce program
– Configuration information
• The job is sent to the JobTracker.
• The JobTracker communicates with the NameNode and assigns parts of the job to TaskTrackers (a TaskTracker runs on each DataNode).
– A task is a single map or reduce operation over a piece of data.
– Hadoop divides the input to a MapReduce job into fixed-size splits.
• The JobTracker knows (from the NameNode) which nodes contain the data and which other machines are nearby.
• Each TaskTracker does the processing and sends heartbeats to the JobTracker.
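The assignment step can be illustrated with a small simulation: a tracker asks where each input split lives and prefers a node that already holds it. The classes below are hypothetical stand-ins named after the Hadoop roles; they are not the Hadoop API. The least-loaded tie-break is an assumption, and heartbeats and rack awareness are left out.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Simplified stand-in: knows which nodes hold each input split."""
    split_locations: dict  # split id -> list of node names


@dataclass
class JobTracker:
    """Simplified stand-in: assigns one map task per split."""
    name_node: NameNode
    task_trackers: list                       # nodes running a TaskTracker
    assignments: dict = field(default_factory=dict)

    def assign(self, split_ids):
        """Prefer a node that holds the split; otherwise use any node."""
        load = {node: 0 for node in self.task_trackers}
        for split_id in split_ids:
            holders = [n for n in self.name_node.split_locations.get(split_id, [])
                       if n in load]
            candidates = holders or self.task_trackers
            # Assumed tie-break: pick the least-loaded candidate.
            node = min(candidates, key=lambda n: load[n])
            load[node] += 1
            self.assignments[split_id] = node
        return self.assignments


if __name__ == "__main__":
    name_node = NameNode({"s0": ["node1", "node2"], "s1": ["node2"],
                          "s2": ["node3"], "s3": []})
    tracker = JobTracker(name_node, ["node1", "node2", "node3"])
    for split_id, node in tracker.assign(["s0", "s1", "s2", "s3"]).items():
        print(split_id, "->", node)
```

Moving the computation to the node that stores the data avoids shipping the split across the network, which is why the JobTracker consults the NameNode before assigning work.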
Open Source Hadoop Components
Challenges Within
• New metadata/data management platforms
• Analysis platforms
• Techniques for data pre-processing and addressing imbalanced data sets
• Handling huge streaming data
• Enhancement of traditional iteration-based mining techniques over new frameworks
• Advanced machine learning algorithms
• Soft computing techniques for efficient processing
• Privacy management
Techniques for Data Preprocessing and Addressing Imbalanced Data Sets
Conclusion
Big Data has increased the demand not only for data and information management specialists but also for data analysts. Because of its 4Vs properties (Volume, Velocity, Variety, Veracity), Big Data is a major issue in the new digital age. To address these issues, there is a need to build suitable, optimized platforms for managing new data and metadata and for information analysis, which will improve the process of Big Data analytics.
References
[1] Chun-Wei Tsai, Chin-Feng Lai, Han-Chieh Chao and Athanasios V. Vasilakos, “Big data analytics: a survey”, Journal of Big Data (2015), A Springer open journal.
[2] Sara del Río, Victoria López, José Manuel Benítez, Francisco Herrera, “On the use of MapReduce for imbalanced big data using Random Forest”, Information Sciences 285 (2014) 112–137, Elsevier Inc.
[3] Jesús Maillo, Isaac Triguero, Francisco Herrera, “A MapReduce-based k-Nearest Neighbor Approach for Big Data Classification”, IEEE (Computer Society, 2015), DOI 10.1109/Trustcom-BigDataSe-ISPA.2015.577
[4] Sergio Ramírez-Gallego, Salvador García, Héctor Mouriño-Talín, David Martínez-Rego, “Distributed Entropy Minimization Discretizer for Big Data Analysis under Apache Spark”, IEEE (Computer Society, 2015), DOI 10.1109/Trustcom-BigDataSe-ISPA.2015.559
[5] Daniel Peralta, Sara del Río, Sergio Ramírez-Gallego, Isaac Triguero, José M. Benítez, Francisco Herrera, “Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach”
[6] M. B. Chandak, “Role of big-data in classification and novel class detection in data streams”, Journal of Big Data (2016) 3:5, A Springer open journal, DOI 10.1186/s40537-016-0040-9
[7] A. Silberschatz, H. Korth, S. Sudarshan, “Database System Concepts”, 4th Edition, McGraw-Hill Publishers.
[8] J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, 2nd Edition, Morgan Kaufmann, 2006.
