Intro to Hadoop
   Jaideep Dhok
Hi!


●   I work at

●   Involved with Hadoop for 2+ years
Outline
Brief History of Hadoop
●   2005 -
●   Inspired by the GFS and MapReduce papers
    published by Google.
●   Promoted heavily by Yahoo! since 2006
●   Today, the de facto standard in 'Big Data'
    computing
The Buzz
Why?
●   'Big Data'
●   How big? - petabyte scale
●   Scalable
●   Robust
●   Secure!
Scalability
When To Use It
●   Can you use Hadoop to do X?
    ●   Is your problem 'embarrassingly' parallel?
    ●   Workflow?
        –   Dependent/Independent Tasks
    ●   Data/CPU intensive?
●   Can you use Hadoop to do X in the cloud?
    ●   Depends where your data is
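A rough illustration of what 'embarrassingly parallel' means, in plain Python rather than Hadoop itself: each chunk of input is processed with no knowledge of any other chunk, and the partial results are combined at the end. The function names, the "ERROR" marker, and the thread pool are assumptions made for this sketch only.

```python
from concurrent.futures import ThreadPoolExecutor


def count_errors(chunk):
    """Process one chunk on its own; it needs nothing from any other chunk."""
    return sum(1 for line in chunk if "ERROR" in line)


def parallel_error_count(chunks, workers=4):
    """Run count_errors on every chunk independently, then add up the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_errors, chunks))
```

If the per-chunk step needed results from other chunks, the tasks would be dependent and the problem would be a harder fit.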
Why To Use It?
●   Ad hoc analysis
    ●   Semi-structured and structured data
        –   Log files
        –   Text
        –   CSV, XML, anything really
        –   RDBMS
        –   NoSQL!
Use Cases
●   Analytics
    ●   User behavior
●   Reporting
●   Filtering
●   Machine Learning
●   Just storing your data
Just From The Logs
●   Suppose you run a website
    ●   User breakdown by browsers
    ●   Location
    ●   Understanding user session
        –   How long do they use it?
        –   Who are the active users?
        –   What part of my app do they use the most?
        –   What part of my app is user X's favorite?
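A small sketch of how two of these questions reduce to "extract a key per log line, then count", which is the shape a MapReduce job would take. This is plain Python on an in-memory list, not the Hadoop API, and the tab-separated log format (user, browser, path) is a hypothetical one invented for the example.

```python
from collections import Counter

# Hypothetical log format: "<user>\t<browser>\t<path>", one request per line.
SAMPLE_LOG = [
    "alice\tFirefox\t/home",
    "bob\tChrome\t/search",
    "alice\tFirefox\t/search",
    "carol\tChrome\t/home",
    "bob\tChrome\t/home",
]


def map_user_browser(line):
    """Map step: pull the (user, browser) pair out of one log line."""
    user, browser, _path = line.split("\t")
    return (user, browser)


def browser_breakdown(lines):
    """Reduce step: count distinct users per browser."""
    unique_pairs = {map_user_browser(line) for line in lines}
    return Counter(browser for _user, browser in unique_pairs)


def most_used_path(lines):
    """Return the most requested path and its request count."""
    return Counter(line.split("\t")[2] for line in lines).most_common(1)[0]
```

On a real cluster the counting would be spread across many machines, but the per-line logic stays about this small.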
Tools
●   Native Hadoop APIs – Java
●   Streaming – Perl, Python, Ruby, or any language
    that can read 'stdin' and write 'stdout'
●   Pig
●   Hive
●   Pipes – C and C++
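To make the Streaming option concrete, here is a minimal word-count sketch in Python. It follows the Streaming convention that the mapper writes tab-separated key/value lines and the reducer receives those lines sorted by key. The script name and the `map`/`reduce` command-line argument are assumptions for illustration, not part of Hadoop.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a Streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: `python wordcount.py map` or `python wordcount.py reduce`.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The same pair can be tried without a cluster by piping a file through the map step, then `sort`, then the reduce step.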
Ecosystem
Don't Wait
●   Hadoop
    ●   hadoop.apache.org
●   Cloudera tutorials on Hadoop
●   Books
Questions?
Thank You!
jaideep.dhok@gmail.com

Geek camp
