Mixing the Grid and Clouds:
High-throughput Science using Nimrod
                            or
              Is the Grid Dead?
                     David Abramson
    Monash e-Science and Grid Engineering Lab (MESSAGE Lab)
               Faculty of Information Technology

          Science Director: Monash e-Research Centre

                    ARC Professorial Fellow



Instructions ..
• To highlight their science and its successes so
  far, and how their work utilizes advanced
  cyberinfrastructure
• Identify potential ICT barriers to ongoing
  success
• Paint a picture of the future for their
  research in the next 3 to 5 years and what
  demands it may create for cyberinfrastructure
• Identify concrete experiments or
  demonstrations which will utilize and/or
  stress the infrastructure within 12-24
  months
What have we been doing
over the Pacific?
PRAGMA
A Practical Collaborative Framework

[Slide graphic: PRAGMA partner sites across the Pacific Rim, including IOIT-VN]

Strengthen Existing and Establish New Collaborations
Work with Science Teams to Advance Grid Technologies and Improve the Underlying Infrastructure
In the Pacific Rim and Globally

http://www.pragma-grid.net
PRIME @ Monash
• Engaged in PRIME since 2004
• Projects range from bio-engineering and
  theoretical chemistry to computer science
• Has underpinned long-lasting academic
  collaborations
  – Publications
  – Presentations at conferences
• Undergraduate students without research
  experience!
MURPA Seminars

“I’ve participated in numerous video conferences to date but nothing like this. The quality was so high that the experience was almost as if we were all in the same room. The massively increased bandwidth was transformational. Quantity begat quality.”

– Alan Finkel, Chancellor, Monash Univ

Students give seminars
Clouds and Grids and ….

A bit about hype …
Gartner Hype Cycle 2000
Gartner Hype Cycle 2005
Gartner Hype Cycle 2007
Gartner Hype Cycle 2008
Gartner Hype Cycle 2009
Background and motivation
Introduction
• University research groups have used varying sources of
  infrastructure to perform computational science
   – Rarely provided on a strict commercial basis
   – Access controlled by the users
   – High-end facilities allocated through peer-reviewed grants,
     usually expressed in CPU hours
• Cloud computing is a major shift in
   – provisioning
   – delivery of computing infrastructure and services.
• Shift from
   – distributed, unmanaged resources to
   – scalable centralised services managed in professional data centres,
     with rapid elasticity of resource and service provisioning to users.
   – Commercial cloud services
Policy and technical challenges
• Free resources will not disappear!
• Commercial clouds could provide an
  overflow capability
• Potential
  – perform base-load computations on “free”
    resources,
  – use pay-as-you-go services to meet peak user
    demand (a placement sketch follows below)
• To date, very few tools support
  both styles of resource provisioning.
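The slide argues for base-load work on “free” resources with overflow to pay-as-you-go clouds. A minimal sketch of that placement policy follows, assuming hypothetical resource names, per-job costs and a budget; it is illustrative only and is not Nimrod’s actual scheduler.

```python
# Sketch of a base-load / overflow placement policy.
# Resource names, costs and the budget below are hypothetical.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_slots: int       # idle cores currently available
    cost_per_job: float   # 0.0 for institutional ("free") resources

def place_jobs(jobs, grid, cloud, budget):
    """Fill the free grid first, overflow to the cloud while the budget
    lasts, and leave anything else queued."""
    placement, spent = {}, 0.0
    for job in jobs:
        if grid.free_slots > 0:
            placement[job] = grid.name
            grid.free_slots -= 1
        elif spent + cloud.cost_per_job <= budget:
            placement[job] = cloud.name
            spent += cloud.cost_per_job
        else:
            placement[job] = "queued"
    return placement, spent

if __name__ == "__main__":
    grid = Resource("campus-cluster", free_slots=3, cost_per_job=0.0)
    cloud = Resource("commercial-cloud", free_slots=10**6, cost_per_job=0.10)
    print(place_jobs([f"job{i}" for i in range(8)], grid, cloud, budget=0.30))
```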
Grid Enabled Elasticity
• Resources maintained by home
  organisation
• Distinct administrative domains
• Unified compute, instruments and data
• Middleware layer
• Never solved deployment
   – See Goscinski, W. and Abramson, D., “An Infrastructure for the
     Deployment of e-Science Applications”, in “High Performance
     Computing (HPC) and Grids in Action”, Advances in Parallel
     Computing, Vol. 16, ed. L. Grandinetti, March 2008,
     ISBN: 978-1-58603-839-7.
• Standards exploded this vision!
   – Plus a whole load of useless computer
     scientists!
Cloud Enabled Elasticity
• Home resource expands
  elastically
• Cloud providers “join” home
  resource
• Virtual machines deployed on
  demand
• Scalable infrastructure
  – Compute
  – Doesn’t address instruments
    and data
• Do we still have a whole load
  of useless computer
  scientists?
Hybrid solutions
• Grid (Wide Area)
  –   Wide area computing
  –   Instruments, data
  –   Security
  –   File transport
• Cloud (Local Area)
  – Elastic resources
  – Virtual machines (deployment)
• Underpinned by a computational economy (a resource-selection sketch follows below)!
  – Abramson, D., Giddy, J. and Kotler, L., “High Performance Parametric Modeling
    with Nimrod/G: Killer Application for the Global Grid?”, International Parallel
    and Distributed Processing Symposium (IPDPS), pp. 520-528, Cancun, Mexico,
    May 2000.
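As a rough illustration of what a computational economy means for scheduling, the sketch below picks the cheapest resource that can still meet a deadline. The throughput and cost figures are invented, and it simplifies rather than reproduces the Nimrod/G deadline-and-budget scheduling described in the paper above.

```python
# Illustrative only: deadline-aware, cost-aware resource selection.
# Throughput and cost figures are invented for the example.
def choose_resource(resources, jobs_left, hours_left):
    """Return the cheapest resource that can finish jobs_left within
    hours_left; fall back to the fastest resource if none can."""
    feasible = [r for r in resources
                if r["jobs_per_hour"] * hours_left >= jobs_left]
    if feasible:
        return min(feasible, key=lambda r: r["cost_per_job"])
    return max(resources, key=lambda r: r["jobs_per_hour"])

resources = [
    {"name": "campus-grid", "jobs_per_hour": 40,  "cost_per_job": 0.00},
    {"name": "cloud",       "jobs_per_hour": 200, "cost_per_job": 0.12},
]
print(choose_resource(resources, jobs_left=500, hours_left=4))
```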
High throughput science with
Nimrod
Nimrod supporting “real” science
• A full parameter sweep is the cross product of all the
  parameters (Nimrod/G); a sketch follows below
• An optimization run minimizes some output metric and returns
  the parameter combinations that do this (Nimrod/O)
• Design of experiments limits the number of combinations
  (Nimrod/E)
• Workflows (Nimrod/K)
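To make the cross-product point concrete, here is a sketch that enumerates a full sweep over three hypothetical parameters; in Nimrod/G each combination becomes one independent job.

```python
# Full parameter sweep = cross product of all parameter values.
# The parameter names and values are hypothetical.
import itertools

parameters = {
    "temperature": [280, 300, 320],
    "pressure":    [1.0, 2.0],
    "catalyst":    ["A", "B"],
}

# 3 * 2 * 2 = 12 combinations, one job per combination.
combinations = [dict(zip(parameters, values))
                for values in itertools.product(*parameters.values())]

for job_id, combo in enumerate(combinations):
    print(job_id, combo)
```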
Antenna Design   Aerofoil Design   Drug Docking




Nimrod/K Workflows
• Nimrod/K integrates Kepler with
  –    Massively parallel execution mechanism (a sketch of this idea follows after the architecture diagram below)
  –    Special-purpose functions of Nimrod/G/O/E
  –    General-purpose workflows from Kepler
  –    Flexible IO model: streams to files



[Kepler architecture diagram: Kepler GUI extensions (Vergil) with Authentication and Documentation; Kepler Object Manager; SMS; Actor & Data SEARCH; Type System Ext; Smart Re-run / Failure Recovery; Provenance Framework; Kepler Core Extensions on top of Ptolemy]
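A rough sketch of the parallel execution idea behind the diagram: when many parameter tokens arrive, each one can fire its own copy of an actor concurrently. The actor body and the thread pool below are stand-ins; Nimrod/K realises this inside Kepler rather than in Python.

```python
# Token-level parallelism sketch: one actor firing per parameter token.
# run_model is a hypothetical stand-in for a real Kepler actor.
from concurrent.futures import ThreadPoolExecutor

def run_model(token):
    """Process a single parameter token and emit a result token."""
    return {"input": token, "result": token["x"] ** 2}

tokens = [{"x": x} for x in range(16)]   # tokens produced by a sweep

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_model, tokens))  # firings run concurrently

print(results[:3])
```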
Parameter Sweep Actors




• Using a MATLAB actor provided by
  Kepler
• Local spawn
   • Multiple threads ran concurrently on
     a computer with 8 cores (2 x quad-core)
   • Workflow execution was just under
     8 times faster (an illustrative timing sketch follows below)
• Remote spawn
   • Hundreds of remote processes
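For intuition about the reported speedup, here is a small illustrative timing harness (not the MATLAB/Kepler setup measured on the slide) that compares serial execution with an 8-way process pool on a CPU-bound stand-in for the model.

```python
# Illustrative timing harness; model_run is a CPU-bound stand-in,
# not the MATLAB actor used in the actual experiment.
import time
from concurrent.futures import ProcessPoolExecutor

def model_run(seed):
    total = 0
    for i in range(1_000_000):
        total += (seed * i) % 7
    return total

if __name__ == "__main__":
    seeds = list(range(32))

    t0 = time.time()
    for s in seeds:
        model_run(s)
    t_serial = time.time() - t0

    t0 = time.time()
    with ProcessPoolExecutor(max_workers=8) as pool:
        list(pool.map(model_run, seeds))
    t_parallel = time.time() - t0

    # On an 8-core machine the speedup approaches, but rarely reaches, 8x.
    print(f"serial {t_serial:.1f}s  parallel {t_parallel:.1f}s  "
          f"speedup {t_serial / t_parallel:.1f}x")
```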
Nimrod/EK Actors




• Actors for generating
  and analyzing designs
• Leverage concurrent
  infrastructure
Nimrod/OK Workflows
• Nimrod/K supports parallel execution
• General template for search
  – Built from key components
• Can mix and match optimization algorithms (a search-template sketch follows below)
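A minimal sketch of the “general template for search”: the evaluation loop stays fixed and the optimisation algorithm is a pluggable component. The objective function and both toy algorithms are hypothetical placeholders; in Nimrod/OK each evaluation would be a model run on the grid.

```python
# Pluggable search template; objective and algorithms are toy examples.
import random

def objective(x):
    """Hypothetical model output to minimise."""
    return (x - 3.0) ** 2

def random_search(obj, lo, hi, evaluations=50):
    best_x, best_f = None, float("inf")
    for _ in range(evaluations):
        x = random.uniform(lo, hi)
        f = obj(x)                      # one model run per evaluation
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

def hill_climb(obj, lo, hi, evaluations=50, step=0.5):
    x = random.uniform(lo, hi)
    f = obj(x)
    for _ in range(evaluations):
        cand = min(hi, max(lo, x + random.uniform(-step, step)))
        fc = obj(cand)
        if fc < f:
            x, f = cand, fc
    return x, f

# Mix and match: same template, different algorithm plugged in.
for algorithm in (random_search, hill_climb):
    print(algorithm.__name__, algorithm(objective, -10.0, 10.0))
```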




A recent experiment
Resource   #Jobs completed   Total job time (h:m:s)   Job runtime μ / σ (mins)
East       818               1245:37:23               91 / 5.7
EC2        613               683:34:05                67 / 14.2
A Grid exemplar
Grid Architecture for Microscopy

[Diagram: microscopes, clusters and storage connected through grid middleware to visualization]
ARC Linkage Grant with Leica
• Remote control of Leica microscope from Kepler (Nov 2008)
• First OptIPortal/Kepler link (Feb 2009)
• First remote control of a Leica microscope in Germany, driving an OptIPortal in Australia, using Kepler (March 2009)
Bird’s eye capture and display
Zooming into area of interest
Image cleanup and rendering
Image cleanup and rendering
   Parallelism for free!
Strawman Project:
    Grid Enabled Microscopy Across the
            Pacific (GEMAP)?
• Remote microscopes
  – Currently Leica
• Mix of compute clusters
  – University Clusters (Monash)
  – NCRIS (APAC grid)
  – Rocks Virtual Clusters (UCSD)
  – Commercial services (Amazon)
• Distributed display devices
  – OptIPortals
• Cloud time
  – Which cloud?
  – Who pays?
• Network
  – Reservation?
  – Who pays?
• Project funding
  – Who pays?
• Faculty Members
  – Jeff Tan
  – Maria Indrawan
• Research Fellows
  – Blair Bethwaite
  – Slavisa Garic
  – Donny Kurniawan
  – Tom Peachy
• Admin
  – Rob Gray
• Current PhD Students
  – Shahaan Ayyub
  – Philip Chan
  – Colin Enticott
  – ABM Russell
  – Steve Quinette
  – Ngoc Dinh (Minh)
• Completed PhD Students
  – Greg Watson
  – Rajkumar Buyya
  – Andrew Lewis
  – Nam Tran
  – Wojtek Goscinski
  – Aaron Searle
  – Tim Ho
  – Donny Kurniawan
• Funding & Support
  – Axceleon
  – Australian Partnership for Advanced Computing (APAC)
  – Australian Research Council
  – Cray Inc
  – CRC for Enterprise Distributed Systems (DSTC)
  – GrangeNet (DCITA)
  – Hewlett Packard
  – IBM
  – Microsoft
  – Sun Microsystems
  – US Department of Energy
Questions?


• More information:
    http://messagelab.monash.edu.au
