Cloud Computing Lecture #1
What is Cloud Computing?
(and an intro to parallel/distributed processing)
Jimmy Lin
The iSchool
University of Maryland
Wednesday, September 3, 2008
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet,
Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)
Source: http://www.free-pictures-photos.com/
What is Cloud Computing?
1. Web-scale problems
2. Large data centers
3. Different models of computing
4. Highly-interactive Web applications
1. Web-Scale Problems
 Characteristics:
 Definitely data-intensive
 May also be processing intensive
 Examples:
 Crawling, indexing, searching, mining the Web
 “Post-genomics” life sciences research
 Other scientific data (physics, astronomy, etc.)
 Sensor networks
 Web 2.0 applications
 …
How much data?
 Wayback Machine has 2 PB + 20 TB/month (2006)
 Google processes 20 PB a day (2008)
 “all words ever spoken by human beings” ~ 5 EB
 NOAA has ~1 PB climate data (2007)
 CERN’s LHC will generate 15 PB a year (2008)
“640K ought to be enough for anybody.”
[Photos: Maximilien Brice, © CERN]
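To put the figures on this slide in perspective, a quick back-of-the-envelope conversion helps. The sketch below takes the slide's numbers at face value and assumes decimal units (1 PB = 10^15 bytes) and constant growth; the function and variable names are illustrative only.

```python
# Back-of-the-envelope conversions for the data volumes quoted on the slide.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes, 1 PB = 10**15 bytes.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(bytes_per_day):
    """Average throughput in GB/s needed to handle a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / 10**9


def months_to_double(current_bytes, growth_bytes_per_month):
    """Months until an archive doubles, assuming constant linear growth."""
    return current_bytes / growth_bytes_per_month


google_rate = sustained_rate_gb_per_s(20 * PB)         # 20 PB a day
wayback_doubling = months_to_double(2 * PB, 20 * TB)   # 2 PB + 20 TB/month
lhc_rate = sustained_rate_gb_per_s(15 * PB / 365)      # 15 PB a year

print(f"Google: ~{google_rate:.0f} GB/s sustained")
print(f"Wayback Machine: ~{wayback_doubling:.0f} months to double")
print(f"LHC: ~{lhc_rate:.2f} GB/s sustained")
```

Under these assumptions, 20 PB a day works out to roughly 231 GB/s around the clock, which is far beyond what a single machine can read from disk and is the motivation for spreading the work over many machines.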
There’s nothing like more data!
s/inspiration/data/g;
(Banko and Brill, ACL 2001)
(Brants et al., EMNLP 2007)
What to do with more data?
 Answering factoid questions
 Pattern matching on the Web
 Works amazingly well
 Learning relations
 Start with seed instances
 Search for patterns on the Web
 Use patterns to find more instances
Factoid example: Who shot Abraham Lincoln? → X shot Abraham Lincoln
Seed instances: Birthday-of(Mozart, 1756), Birthday-of(Einstein, 1879)
Matching text: “Wolfgang Amadeus Mozart (1756 - 1791)”, “Einstein was born in 1879”
Learned patterns: “PERSON (DATE –”, “PERSON was born in DATE”
(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
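The relation-learning loop on this slide can be sketched in a few lines. This is a toy illustration, not the method of the cited papers: the two regular expressions are one possible rendering of the slide's patterns, the assumption that a PERSON is a run of capitalized words and a DATE is a four-digit year is mine, and the last two corpus sentences are made up for the example.

```python
import re

# Assumption: PERSON = one or more capitalized words, DATE = a 4-digit year.
PERSON = r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
DATE = r"(?P<date>\d{4})"

# Regex renderings of the slide's patterns
# "PERSON (DATE –" and "PERSON was born in DATE".
patterns = [
    re.compile(PERSON + r" \(" + DATE + r" [-–]"),
    re.compile(PERSON + r" was born in " + DATE),
]

# Toy corpus: the first two sentences are from the slide,
# the last two are hypothetical additions standing in for "the Web".
corpus = [
    "Wolfgang Amadeus Mozart (1756 - 1791)",
    "Einstein was born in 1879",
    "Charles Darwin (1809 - 1882)",
    "Marie Curie was born in 1867",
]


def extract_birthdays(sentences):
    """Apply every pattern to every sentence and collect (person, year) pairs."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            match = pattern.search(sentence)
            if match:
                found.add((match.group("person"), match.group("date")))
    return found


found = extract_birthdays(corpus)
for person, year in sorted(found):
    print(f"Birthday-of({person}, {year})")
```

The patterns recover the seed instances and also pick up the two new ones, which is the "find more instances" step. With a corpus the size of the Web, even crude patterns like these match often enough to be useful.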
2. Large Data Centers
 Web-scale problems? Throw more machines at it!
 Clear trend: centralization of computing resources in large data centers
 Necessary ingredients: fiber, juice, and space
 What do Oregon, Iceland, and abandoned mines have in common?
 Important Issues:
 Redundancy
 Efficiency
 Utilization
 Management
Source: Harper’s (Feb, 2008)
Maximilien Brice, © CERN
Key Technology: Virtualization
[Figure: two software stacks. Traditional stack: Hardware, then one Operating System, then Apps. Virtualized stack: Hardware, then a Hypervisor, then several OS instances, each running its own Apps.]
3. Different Computing Models
 Utility computing
 Why buy machines when you can rent cycles?
 Examples: Amazon’s EC2, GoGrid, AppNexus
 Platform as a Service (PaaS)
 Give me a nice API and take care of the implementation
 Example: Google App Engine
 Software as a Service (SaaS)
 Just run it for me!
 Example: Gmail
“Why do it yourself if you can pay someone to do it for you?”
4. Web Applications
 A mistake on top of a hack built on sand held together by duct tape?
 What is the nature of software applications?
 From the desktop to the browser
 SaaS == Web-based applications
 Examples: Google Maps, Facebook
 How do we deliver highly-interactive Web-based applications?
 AJAX (asynchronous JavaScript and XML)
 For better, or for worse…
What is the course about?
 MapReduce: the “back-end” of cloud computing
 Batch-oriented processing of large datasets
 Ajax: the “front-end” of cloud computing
 Highly-interactive Web-based applications
 Computing “in the clouds”
 Amazon’s EC2/S3 as an example of utility computing
Amazon Web Services
 Elastic Compute Cloud (EC2)
 Rent computing resources by the hour
 Basic unit of accounting = instance-hour
 Additional costs for bandwidth
 Simple Storage Service (S3)
 Persistent storage
 Charge by the GB/month
 Additional costs for bandwidth
 You’ll be using EC2/S3 for course assignments!
This course is not for you…
 If you’re not genuinely interested in the topic
 If you’re not ready to do a lot of programming
 If you’re not open to thinking about computing in new ways
 If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software
 If you can’t put in the time
Otherwise, this will be a richly rewarding course!
Source: http://davidzinger.wordpress.com/2007/05/page/2/
Cloud Computing Zen
 Don’t get frustrated (take a deep breath)…
 This is bleeding edge technology
 Those W$*#T@F! moments
 Be patient…
 This is the second time I’ve taught this course
 Be flexible…
 There will be unanticipated issues along the way
 Be constructive…
 Tell me how I can make everyone’s experience better
Source: Wikipedia
Things to go over…
 Course schedule
 Assignments and deliverables
 Amazon EC2/S3
Web-Scale Problems?
 Don’t hold your breath:
 Biocomputing
 Nanocomputing
 Quantum computing
 …
 It all boils down to…
 Divide-and-conquer
 Throwing more hardware at the problem
Simple to understand… a lifetime to master…
Divide and Conquer
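[Figure: the “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”, which produces a partial result r1, r2, r3; the partial results are combined into the “Result”.]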
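The partition, work, and combine steps of this slide can be sketched as follows. This is a minimal illustration under my own assumptions: the "work" is a list of numbers, each worker sums the squares in its chunk, and the workers are threads from Python's standard `concurrent.futures` pool. The function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_parts):
    """Split a list into n_parts nearly equal contiguous chunks (w1, w2, ...)."""
    size, extra = divmod(len(work), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks


def worker(chunk):
    """Each worker turns its chunk w_i into a partial result r_i."""
    return sum(x * x for x in chunk)


def combine(partials):
    """Merge the partial results r1, r2, r3 into the final result."""
    return sum(partials)


work = list(range(1, 101))
chunks = partition(work, 3)

# One thread per chunk; pool.map returns the partial results in chunk order.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(worker, chunks))

result = combine(partials)
print(result)  # 338350, the sum of squares from 1 to 100
```

Threads keep the sketch short; in CPython they will not speed up CPU-bound work like this, so a real job would swap in processes or separate machines while keeping the same three-step shape.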
Different Workers
 Different threads in the same core
 Different cores in the same CPU
 Different CPUs in a multi-processor system
 Different machines in a distributed system
Choices, Choices, Choices
 Commodity vs. “exotic” hardware
 Number of machines vs. processors vs. cores
 Bandwidth of memory vs. disk vs. network
 Different programming models
Flynn’s Taxonomy
                        Instructions: Single (SI)        Instructions: Multiple (MI)
Data: Single (SD)       SISD: Single-threaded process    MISD: Pipeline architecture
Data: Multiple (MD)     SIMD: Vector processing          MIMD: Multi-threaded programming
SISD
[Figure: one processor, driven by a single instruction stream, working through a single stream of data items (D D D …).]
SIMD
[Figure: one processor, driven by a single instruction stream, operating on multiple data streams at once; each input is a vector of elements D0, D1, D2, D3, D4, …, Dn.]
MIMD
[Figure: two processors, each with its own instruction stream and its own stream of data items (D D D …).]
Memory Typology: Shared
[Figure: four processors all attached to a single shared memory.]
Memory Typology: Distributed
[Figure: four processors, each with its own local memory, connected by a network.]
Memory Typology: Hybrid
[Figure: four nodes connected by a network; each node has two processors sharing one memory.]
Parallelization Problems
 How do we assign work units to workers?
 What if we have more work units than workers?
 What if workers need to share partial results?
 How do we aggregate partial results?
 How do we know all the workers have finished?
 What if workers die?
What is the common theme of all of these problems?
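Several of these questions can be made concrete with a small sketch. It assumes a thread pool from Python's standard `concurrent.futures` module; `flaky_worker`, its failure rule, and `run_all` are hypothetical and exist only to simulate a worker dying. The pool queues the surplus when there are more work units than workers, a failed unit is resubmitted, and the loop ends only when every unit has a result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def flaky_worker(unit, attempt):
    """Hypothetical worker that 'dies' on the first attempt at even units."""
    if unit % 2 == 0 and attempt == 1:
        raise RuntimeError(f"worker died on unit {unit}")
    return unit * 10


def run_all(units, n_workers=3, max_attempts=3):
    """Run every unit to completion, retrying units whose worker failed."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # More work units than workers: the pool queues the surplus.
        pending = {pool.submit(flaky_worker, u, 1): (u, 1) for u in units}
        while pending:
            for future in as_completed(list(pending)):
                unit, attempt = pending.pop(future)
                try:
                    results[unit] = future.result()
                except RuntimeError:
                    if attempt >= max_attempts:
                        raise
                    # The worker failed: hand the unit out again.
                    retry = pool.submit(flaky_worker, unit, attempt + 1)
                    pending[retry] = (unit, attempt + 1)
    # An empty pending set means all workers have finished.
    return results


results = run_all(range(8))
total = sum(results.values())  # aggregate the partial results
print(total)  # 280
```

Even in this toy, the bookkeeping (who has which unit, what failed, what is still outstanding) outweighs the actual computation.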
General Theme?
 Parallelization problems arise from:
 Communication between workers
 Access to shared resources (e.g., data)
 Thus, we need a synchronization system!
 This is tricky:
 Finding bugs is hard
 Solving bugs is even harder
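A shared counter is the classic small case of access to a shared resource. The sketch below uses Python's standard `threading` module; the `Counter` class is a hypothetical helper written for this example. Four threads increment the same counter, and the lock makes each read-modify-write step atomic.

```python
import threading


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value and
        # both write back old + 1, losing an update (a race condition).
        with self._lock:
            self.value += 1


def add_many(counter, n):
    for _ in range(n):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=add_many, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

Removing the lock may still print 40000 on many runs, which is part of why such bugs are hard to find: the failure depends on timing and does not reproduce on demand.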
Managing Multiple Workers
 Difficult because
 (Often) don’t know the order in which workers run
 (Often) don’t know where the workers are running
 (Often) don’t know when workers interrupt each other
 Thus, we need:
 Semaphores (lock, unlock)
 Condition variables (wait, notify, broadcast)
 Barriers
 Still, lots of problems:
 Deadlock, livelock, race conditions, ...
 Moral of the story: be careful!
 Even trickier if the workers are on different machines
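Two of the primitives named on this slide can be sketched with Python's standard `threading` module. The two-phase workload and all names here are assumptions made for illustration. A barrier keeps any worker from starting phase 2 until all have finished phase 1, and a condition variable lets the main thread sleep until the workers report completion.

```python
import threading

N_WORKERS = 3
barrier = threading.Barrier(N_WORKERS)
cond = threading.Condition()      # also serves as the lock for shared state
state = {"finished": 0}
log = []


def worker(worker_id):
    with cond:
        log.append(("phase1", worker_id))
    # Barrier: block here until all N_WORKERS threads have arrived.
    barrier.wait()
    with cond:
        log.append(("phase2", worker_id))
        state["finished"] += 1
        # Broadcast: wake every waiter so it can re-check its predicate.
        cond.notify_all()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()

# Wait: sleep (releasing the lock) until the predicate becomes true.
with cond:
    cond.wait_for(lambda: state["finished"] == N_WORKERS)

for t in threads:
    t.join()

phases = [phase for phase, _ in log]
print(phases)
```

The order of worker ids inside each phase varies from run to run, but every phase1 entry precedes every phase2 entry. That partial guarantee is typical of what synchronization buys: some orderings are enforced, the rest remain unknown.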
Patterns for Parallelism
 Parallel computing has been around for decades
 Here are some “design patterns” …
Master/Slaves
[Figure: one master node dispatching work to several slave nodes.]
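One way to read this pattern is that the master decides who does what, and the subordinate nodes (called workers in the code) only execute. The sketch below is an assumption-laden illustration using threads and Python's standard `queue` module: the master assigns tasks round-robin to per-worker inboxes and gathers the results. All names are hypothetical.

```python
import queue
import threading

STOP = None  # sentinel telling a worker to shut down


def worker_loop(worker_id, inbox, results):
    """Worker: process tasks from its own inbox until told to stop."""
    while True:
        task = inbox.get()
        if task is STOP:
            break
        results.put((worker_id, task, task ** 2))


def master(tasks, n_workers=3):
    """Master: assign tasks round-robin, then gather every result."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker_loop, args=(i, inboxes[i], results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for n, task in enumerate(tasks):
        inboxes[n % n_workers].put(task)   # the master picks the worker
    for inbox in inboxes:
        inbox.put(STOP)
    for t in threads:
        t.join()
    gathered = []
    while not results.empty():             # safe: all workers have exited
        gathered.append(results.get())
    return gathered


gathered = master(list(range(6)))
squares = sorted(square for _, _, square in gathered)
print(squares)  # [0, 1, 4, 9, 16, 25]
```

Static assignment is simple but can leave workers idle when tasks differ in cost.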
Producer/Consumer Flow
[Figure: two flow diagrams in which producer (P) nodes feed consumer (C) nodes.]
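A producer/consumer flow can be sketched as a chain of stages connected by queues. The example below is a minimal illustration using Python's standard `queue` and `threading` modules; the three-stage layout, the doubling step, and the `DONE` sentinel are my assumptions. The middle stage consumes from one queue and produces into the next.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def producer(out_q, items):
    """Produce items, then signal that no more are coming."""
    for item in items:
        out_q.put(item)
    out_q.put(DONE)


def transformer(in_q, out_q):
    """Consume from one queue and produce into the next."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)   # pass the end-of-stream marker downstream
            break
        out_q.put(item * 2)


def consumer(in_q, sink):
    """Consume items until the end-of-stream marker arrives."""
    while True:
        item = in_q.get()
        if item is DONE:
            break
        sink.append(item)


# Small bounded queues: a fast producer blocks until the consumer catches up.
q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
sink = []
stages = [
    threading.Thread(target=producer, args=(q1, range(5))),
    threading.Thread(target=transformer, args=(q1, q2)),
    threading.Thread(target=consumer, args=(q2, sink)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()

print(sink)  # [0, 2, 4, 6, 8]
```

With a single thread per stage the order of items is preserved; adding more threads to a stage would require one sentinel per downstream consumer and gives up that ordering.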
Work Queues
[Figure: producers (P) place work units (W) on a shared queue; consumers (C) take them off.]
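With a work queue, nobody assigns work in advance: each consumer pulls the next unit from the shared queue whenever it is free. The sketch below assumes threads and Python's standard `queue.Queue`; the work units and the upper-casing "job" are placeholders.

```python
import queue
import threading

N_WORKERS = 2
work_queue = queue.Queue()        # the shared queue
results = []
results_lock = threading.Lock()


def worker():
    """Pull work units until a None sentinel says there is nothing left."""
    while True:
        unit = work_queue.get()
        if unit is None:
            break
        processed = unit.upper()  # placeholder for real work
        with results_lock:
            results.append(processed)


# Producer side: enqueue five work units.
for unit in ["w1", "w2", "w3", "w4", "w5"]:
    work_queue.put(unit)

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()

# One sentinel per worker so that every worker eventually stops.
for _ in threads:
    work_queue.put(None)
for t in threads:
    t.join()

print(sorted(results))  # ['W1', 'W2', 'W3', 'W4', 'W5']
```

Because workers pull rather than being assigned, a slow unit delays only the worker that took it, so the load balances itself. Which worker handles which unit is not determined in advance, hence the sort before printing.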
Rubber Meets Road
 From patterns to implementation:
 pthreads, OpenMP for multi-threaded programming
 MPI for cluster computing
 …
 The reality:
 Lots of one-off solutions, custom code
 Write your own dedicated library, then program with it
 Burden on the programmer to explicitly manage everything
 MapReduce to the rescue!
 (for next time)
