Big Data Camp, Delhi, Sep 10, 2011
Introduction to Hadoop / Big Data
Good Times < Year 2000
Online Applications: OLTP
Web Users, Web Servers, RDBMS
Analytics and Reporting: OLAP
Report Users, Reporting Servers, RDBMS, DW
Year 2000 +
Online Applications: OLTP
Web Users, Web Servers, RDBMS
Analytics and Reporting: OLAP
Report Users, Reporting Servers, RDBMS, DW
Big Data: Problems to Solve
Storage
Failure
Scalability
The Knight in Shining Armor
Engine + Logic
File system
Video: What can Apache Hadoop Do for You?
Who Uses Hadoop?
Search: Yahoo, Amazon, Zvents
Log processing: Facebook, Yahoo
Recommendation Systems: Facebook
Data Warehouse: Facebook, AOL
Video and Image Analysis: New York Times, Eyealike
Indian Government: UUID project
HDFS: Design Principles
Hardware will Fail!
Petabyte Scale Store!
Map Reduce
Origin in Lisp!
Google- GFS paper!
Divide and Rule!
Map Reduce Programming Model
Borrows from functional programming
Users implement an interface of two functions:
map (in_key, in_value) -> (out_key, intermediate_value) list
reduce (out_key, intermediate_value list) -> out_value list
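The two signatures above can be sketched in plain Python. This is only an in-memory illustration, not Hadoop's API. Word count is an assumed example task, and the names map_fn, reduce_fn, and run_mapreduce are hypothetical. The small driver stands in for the grouping step that a real framework performs between map and reduce.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(in_key: int, in_value: str) -> Iterator[Tuple[str, int]]:
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    # in_key is the line number (unused here); in_value is the line text.
    for word in in_value.split():
        yield word.lower(), 1


def reduce_fn(out_key: str, values: List[int]) -> List[int]:
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(values)]


def run_mapreduce(
    records: Iterable[Tuple[int, str]],
    mapper: Callable[[int, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], List[int]],
) -> Dict[str, List[int]]:
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in mapper(in_key, in_value):
            groups[out_key].append(value)
    # Reduce each key's values, visiting keys in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


lines = ["big data big cluster", "data node"]
word_counts = run_mapreduce(enumerate(lines), map_fn, reduce_fn)
print(word_counts)
```

Because each map call sees one record and each reduce call sees one key, both phases can be spread over many machines, which is the point of the model.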
Hadoop Map Reduce
Hadoop Example
Weather sensors collecting data every hour at many locations across the globe gather a large volume of log data, which is a good candidate for analysis with MapReduce, since it is semi-structured and record-oriented.
Data Format: The data is stored using a line-oriented ASCII format, in which each line is a record. The format supports a rich set of meteorological elements, many of which are optional or with variable data lengths. For simplicity, we shall focus on the basic elements, such as temperature, which are always present and are of fixed width.
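One analysis such records invite is finding the maximum temperature per year. The sketch below assumes a made-up fixed-width layout (characters 0-3 hold the year, characters 4-8 hold a signed temperature in tenths of a degree Celsius). That layout is an illustration only and is not the real sensor format. The function names are hypothetical, and the grouping is done in memory rather than by Hadoop.

```python
from collections import defaultdict

# Assumed column positions for the illustrative record layout.
YEAR = slice(0, 4)   # e.g. "1950"
TEMP = slice(4, 9)   # e.g. "+0022" means 2.2 degrees Celsius


def map_record(line):
    """Map step: emit (year, temperature) for one fixed-width record."""
    yield line[YEAR], int(line[TEMP])


def reduce_max(year, temps):
    """Reduce step: keep the highest temperature seen for a year."""
    return year, max(temps)


def max_temperature_by_year(lines):
    """In-memory stand-in for the map, group-by-key, and reduce phases."""
    grouped = defaultdict(list)
    for line in lines:
        for year, temp in map_record(line):
            grouped[year].append(temp)
    return dict(reduce_max(year, temps) for year, temps in sorted(grouped.items()))


sample = ["1950+0022", "1950-0011", "1949+0111", "1949+0078"]
max_by_year = max_temperature_by_year(sample)
print(max_by_year)
```

Fixed-width fields keep the map step to plain string slicing, which is why the always-present, fixed-width elements are the convenient ones to start with.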
Hadoop Ecosystem Map
[Figure: Hadoop ecosystem map. Labels: Workflow; Cascading; Support; More High Level Interfaces; High Level Interfaces; JAQL; Unstructured Data; Structured Data; Engine + Logic; File system; RDBMS; OLTP; Java Applications; hiho; Sqoop; Monitor/manage]
How can You Contribute?
Apache Hadoop Projects
Learn more about Hadoop
Contribute to source code
Participate in Mailing Lists/Forums
Share blogs etc.
Impetus Open Source Projects
GitHub/Google Code hosted projects
Contribute to source code
Thank you
Visit [Link]
Big Data in EDW
Building Big Data Analytics Platform
Commercial
Open source
Hybrid
Teradata/ Netezza
CloverETL/ Kettle/ Talend
ETL - Open Source and Commercial
Greenplum/ Vertica/ Aster
Jaspersoft/ Pentaho Reporting
Hadoop
Apache Cassandra
Analytics - Open Source or Commercial
Informatica
Commercial Hadoop Versions
SAS/ Microstrategy/ Business Objects
Pentaho/ Jasper
Web Analytics