変通 [hen-tsoo]
noun
1. Resourcefulness – the quality of being able to cope with a difficult situation
2. Adaptability – the ability to change (or be changed) to fit changed circumstances
3. Agility – the power of moving quickly and easily; nimbleness
INFINITELY SCALABLE CLUSTERS
Grid computing on public cloud
AGENDA
• Grid Computing Background
• On-Premise & Public Cloud
• Google Cloud Platform
• Demo
January 2017
BACKGROUND
TERMINOLOGY
• Public Cloud (AWS, Azure, Google)
• Private Cloud (Your data centre)
• High Performance Computing (HPC)
• Grid Computing
• Compute Cluster
• CPUs / Processors / Cores
• RAM and Disk Storage
• IaaS (virtual hardware and networking)
• PaaS (software services)
WHAT IS PUBLIC CLOUD?
“A service provider makes resources, such as virtual machines, applications and
storage, available to the general public.”
• Utility model
• No contracts
• Shared hardware / multi-tenant
• Self managed
WHAT IS GRID COMPUTING?
Traditional Resource Limitations:
• Data Store Performance
• PC Processor / Memory / Storage
• Network Bandwidth
The researcher may wait a long time for results.
• Grid computing moves the computational work from the PC to a cluster of servers
• The cluster processes the data on behalf of the researcher and returns the results
• Processing time is reduced
• Larger datasets can be tackled
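The scatter-and-gather pattern described in the bullets above can be sketched in a few lines of Python. This is only a local illustration: a thread pool stands in for the cluster, and `analyse`, `split` and `run_on_grid` are hypothetical names, not part of any grid product mentioned in this deck.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse(chunk):
    """Hypothetical per-task work: sum of squares of one data chunk."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_on_grid(data, workers=4):
    """Scatter chunks to workers, then gather and combine the partial results."""
    chunks = split(data, workers)
    # Assumption: a local thread pool stands in for a real cluster scheduler.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyse, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_on_grid(list(range(10)), workers=3))  # prints 285
```

On a real grid the same shape applies, but each chunk is shipped to a separate server, which is why both processing time and the largest tractable dataset improve.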
KEY CONCEPTS
The Challenges (chart axes: Number of Tasks against Size of Data):
• Big Data
• High Throughput Computing
• MapReduce
• High Performance Computing
The Workflows:
• Ingest
• Process
• Analyse
• Visualise
• Store
CHOICE OF TOOLS AND PLATFORMS
ON-PREMISE & PUBLIC CLOUD
HARDWARE INFLEXIBILITY
• Buy 22-core processors at 2.2GHz or 6-core processors at 3.6GHz?
• Buy 8GB, 16GB or 32GB memory modules (RAM per core ratio)?
• Graphics Processing Units (GPUs)?
• How much local storage per server?
• What network devices between servers (32 or 48 port switches)?
• What size file server?
Grid usage varies depending on research priorities:
[Chart: jobs per day, Monday to Sunday, on a scale of 0 to 120]
EXAMPLE OF MATLAB GRID WITH PUBLIC CLOUD
- Pay only for what you use
- Scale compute resource up AND down
- Minimal capital outlay on hardware
- Experiment with grid computing platforms quickly, cheaply and with no commitment
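To make "pay only for what you use" concrete, the sketch below compares node-hours for a fixed cluster sized for the peak against an elastic one that follows demand. The hourly demand profile is invented purely for illustration and is not taken from the deck's charts; no prices are assumed, only node-hours.

```python
def node_hours_fixed(demand_per_hour):
    """A fixed cluster is sized for the peak and runs for every hour."""
    return max(demand_per_hour) * len(demand_per_hour)


def node_hours_elastic(demand_per_hour):
    """An elastic cluster runs only the nodes that each hour needs."""
    return sum(demand_per_hour)


# Hypothetical 24-hour profile: idle overnight, busy during the working day.
demand = [0] * 8 + [4, 16, 32, 32, 16, 8, 4, 2] + [0] * 8

if __name__ == "__main__":
    print(node_hours_fixed(demand))    # prints 768
    print(node_hours_elastic(demand))  # prints 114
```

Multiplying either figure by a provider's per-node hourly rate gives a cost estimate; the ratio between the two depends entirely on how spiky the workload is.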
A DAY IN A PUBLIC CLOUD CLUSTER
[Chart: workers and tasks in queue over one day, 00:30 to 23:10, on a scale of 0 to 180]
- Cluster consisting of 32 x 4 cores
- Max 128 worker nodes
- Ramps up as jobs get submitted
- Tears down nodes when jobs are finished
- Minimises costs when not in use
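The ramp-up and tear-down behaviour listed above can be sketched as a simple sizing rule. This is an assumed policy, not the scheduler used in the talk: it reads the slide as 32 nodes of 4 cores each (128 workers in total) and assumes one task per core; `desired_nodes` is a hypothetical helper.

```python
import math

CORES_PER_NODE = 4  # from the slide: 4 cores per node
MAX_NODES = 32      # assumed reading of the slide: 32 nodes, 128 workers in total


def desired_nodes(queued_tasks, running_tasks):
    """Return how many nodes should be up, assuming one task per core."""
    workers_needed = queued_tasks + running_tasks
    nodes = math.ceil(workers_needed / CORES_PER_NODE)
    # Never exceed the cluster ceiling; zero work means zero nodes (and zero cost).
    return min(nodes, MAX_NODES)


if __name__ == "__main__":
    print(desired_nodes(0, 0))     # prints 0
    print(desired_nodes(5, 0))     # prints 2
    print(desired_nodes(1000, 0))  # prints 32
```

A control loop would call a function like this periodically and start or terminate virtual machines to match the returned count.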
IDEAL CLUSTER SIZE?
[Chart: job run time in seconds (0 to 1400) against number of cores (8 to 224)]
Workflow: Ingest, Process, Analyse, Visualise, Store
Optimise other parts of the workflow?
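One way to reason about the question above is Amdahl's law: adding cores only shrinks the part of the job that runs in parallel, so the remaining stages eventually dominate. The sketch below uses invented timings (100 s of non-parallel work, 9600 core-seconds of parallel work); they are assumptions for illustration, not readings from the chart.

```python
def run_time(cores, parallel_core_seconds, serial_seconds):
    """Amdahl-style estimate: fixed serial time plus parallel work shared by cores."""
    return serial_seconds + parallel_core_seconds / cores


if __name__ == "__main__":
    # Hypothetical job: core counts match the chart's x-axis, timings are made up.
    for cores in (8, 16, 32, 64, 96, 128, 160, 192, 224):
        print(cores, round(run_time(cores, 9600, 100), 1))
```

With these numbers, going from 8 to 64 cores cuts the run time from 1300 s to 250 s, while going from 64 to 128 saves only a further 75 s, which is the point at which optimising the other workflow stages pays off more than adding cores.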
RUNNING HYBRID CLUSTER ON IAAS
AWS vCPUs are hyper-threaded™
Each vCPU is a hyper thread of an Intel Xeon core for 2nd generation instance types (M4, M3, C4, C3, R3, HS1, G2, I2, and D2)
https://aws.amazon.com/ec2/instance-types/
Azure does not overcommit memory or cores. vCPUs are physical cores. Azure does not use hyper-threading.
https://aws.amazon.com/ec2/instance-types/
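When sizing a hybrid cluster, the difference matters because the same vCPU count maps to a different number of physical cores. The sketch below encodes only what this slide states (as of January 2017): two hyper-threads per core on AWS, one vCPU per core on Azure. The table and function names are hypothetical.

```python
# Threads per physical core, as stated on this slide (January 2017).
THREADS_PER_CORE = {"aws": 2, "azure": 1}


def physical_cores(vcpus, provider):
    """Convert an advertised vCPU count into physical cores for a provider."""
    return vcpus // THREADS_PER_CORE[provider]


if __name__ == "__main__":
    print(physical_cores(16, "aws"))    # prints 8
    print(physical_cores(16, "azure"))  # prints 16
```

Provider behaviour changes over time, so the ratios should be checked against current instance documentation before they are used for capacity planning.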
CLOUD GRID DEPLOYMENT OPTIONS
1. Infrastructure as a Service (IaaS) DIY
Spin up a compute cluster on VMs for additional capacity and new workloads
2. Burst
Use the existing on-premises compute cluster and burst to the cloud as required
3. Software as a Service (SaaS)
Software vendors and Managed Service Providers provide their own SaaS solutions. Pay for compute and application software per hour
4. Platform as a Service (PaaS)
Cloud providers’ data analytics platforms as a service: Google BigQuery & Datalab, Microsoft HDInsight, Amazon EMR
CLOUD HOSTED DATA AND ANALYTICS AS A SERVICE
GOOGLE BIG DATA REFERENCE ARCHITECTURE
BIGQUERY – A GOOGLE CLOUD PLATFORM SERVICE
• Fully managed and serverless architecture
• Massively scalable to petabytes of data, without the need to capacity plan
• Resources are deployed as necessary in the background to run queries in seconds
• Standard SQL queries
• Table partitioning
• No indexing needed
• Simple pricing model:
• Data storage, streaming inserts, and queries are charged
• Data loading and exporting are free of charge
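As an illustration of the "standard SQL queries" point, the sketch below runs an aggregate query with the `google-cloud-bigquery` Python client. It assumes that package is installed and that Google Cloud credentials are configured; the table is Google's public Shakespeare sample, and `run_query` is a hypothetical wrapper, not something shown in the original demo.

```python
# Standard SQL against a Google-hosted public sample table.
SQL = """
SELECT corpus, SUM(word_count) AS total_words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus
ORDER BY total_words DESC
LIMIT 5
"""


def run_query(sql=SQL):
    """Run the query and return the rows as plain dictionaries."""
    # Imported here so the module loads even without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()        # uses the default project and credentials
    rows = client.query(sql).result() # blocks until the query job finishes
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

No cluster is provisioned or sized anywhere in this code, which is the practical meaning of "serverless" on this slide; queries are billed by the amount of data they scan.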
BIGQUERY TECHNICAL BACKGROUND
A “service that enables interactive analysis of massively large datasets”. BigQuery runs on Google's Dremel query engine rather than on Hadoop itself, but it shares two ideas with Hadoop:
• Distributed File System - Stores data that’s larger than can fit on a single machine
• Map Reduce - Distributes processing across multiple systems
http://blogs.forrester.com/mike_gualtieri/13-06-07-what_is_hadoop
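The Map Reduce idea can be shown with a single-machine word count. This is a minimal sketch of the pattern, not of Hadoop's or BigQuery's internals; all function names are illustrative, and on a real cluster the map and reduce phases would run on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: summarise each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
    # prints {'big': 2, 'data': 1, 'compute': 1}
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across as many systems as there are documents or keys.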
GOOGLE BIGQUERY AND DATALAB DEMO
FINAL NOTES – DON’T FORGET SECURITY
Security considerations:
• Secure transfer and storage of data and code
• Secure remote access to cloud hosted environment
• Secure authentication
• Windows AD Credentials
• AWS IAM Credentials
• Google Accounts
• Microsoft Accounts
• Auditing (who accessed what, who changed what)
SUMMARY
• Traditional grid and HPC tools can benefit from moving into cloud
• Vast landscape of available tools
• Off-the-shelf PaaS offerings
• Integrations and ecosystems
• Cheap and very quick to experiment
MORE INFORMATION?
hello@hentsu.com
https://hentsu.com
London: 1 Fore Street, London EC2Y 9DT
New York: 450 Lexington Ave, New York 10017
NEXT NEW YORK EVENT: MAY 2017
Cognitive Cloud Computing
Machine learning and AI for trading strategies

Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York

Editor's Notes

  • #5: HPC: use of parallel processing, high bandwidth and high throughput. Grid: networked compute resources. Compute cluster: similar to a grid, with compute tied together as a single processing unit.
  • #8: MapReduce: Map() filters the data and Reduce() summarises the results.
  • #15: Hyperthreading: a vCPU is half a physical core, with possibly inconsistent performance.