High Performance Computing

with Applications in R
Florian Schwendinger, Gregor Kastner, Stefan Theußl
October 4, 2021
1 / 68
Outline

Four parts:
I Introduction
I Computer Architecture
I The parallel Package
I (Cloud Computing)

Outline 2 / 68
Part I
Introduction
About this Course

After this course students will


I be familiar with concepts and parallel programming paradigms in High Performance
Computing (HPC),
I have a basic understanding of computer architecture and its implications on parallel
computing models,
I be capable of choosing the right tool for time consuming tasks depending on the type of
application as well as the available hardware,
I know how to use large clusters of workstations,
I know how to use the parallel package,
I be familiar with parallel random number generators in order to run large scale simulations
on various nodes (e.g., Monte Carlo simulation),
I understand the cloud: definitions and terminology.

Part I: Introduction About this Course 4 / 68


Administrative Details

I A good knowledge of R is required.


I You should be familiar with basic Linux commands since we use some in this course.
I The course material is available at https://atc.r-forge.r-project.org/.

Part I: Introduction About this Course 5 / 68


What is this course all about?

High performance computing (HPC) refers to the use of (parallel) supercomputers and
computer clusters. Furthermore, HPC is a branch of computer science that
concentrates on developing high performance computers and software to run on
these computers.
Parallel computing is an important area of this discipline. It refers to the development
of parallel processing algorithms and software.
Parallelism is physically simultaneous processing of multiple threads or processes with the
objective to increase performance (this implies multiple processing elements).

Part I: Introduction About this Course 6 / 68


Complex Applications in Finance

I In quantitative research we are increasingly facing the following challenges:


I more accurate and more time-consuming models (1),
I computationally intensive applications (2),
I and/or large datasets (3).
I Thus, one could
  I wait (1+2),
  I reduce problem size (3),
  I or
    I run similar tasks on independent processors in parallel (1+2),
    I load data onto multiple machines that work together in parallel (3).

Part I: Introduction Challenges in Computing 7 / 68


Moore’s Law

Figure: Microprocessor Transistor Counts 1971−2017 & Moore's Law
[Scatter plot of transistor count (logarithmic scale) against date of introduction, from the Intel 4004 (1971) up to the 32-core AMD Epyc and 32-core SPARC M7 (2017); the exponential trend illustrates Moore's Law.]

Part I: Introduction Challenges in Computing 8 / 68


Some Recent Developments

I Consider Moore’s law: number of transistors on a chip doubles every 18 months.


I Until recently this corresponded to a speed increase at about the same rate.
I However, speed per processing unit has remained flat because of:
I heat
I power consumption
I technological reasons
I Graphics cards have been equipped with multiple logical processors on a chip.
+ more than 500 specialized compute cores for a few hundred Euros.
− special libraries are needed to program them; interfaces to languages like R are still experimental.
I Parallel computing is likely to become essential even for desktop computers.

Part I: Introduction Challenges in Computing 9 / 68


Almost Ideal Scenario

Application: pricing a European call option using several CPUs in parallel.

Task: Parallel Monte Carlo Simulation

[Plot: execution time in seconds (roughly 10 to 60 s) against the number of CPUs (2 to 10) for the serial ("normal") and MPI-parallel versions.]

Source: Theußl (2007, p. 104)

Figure: Runtime for a simulation of 5000 payoffs repeated 50 times

Part I: Introduction Challenges in Computing 10 / 68


Parallel Programming Tools

I We know that it is hard to write efficient sequential programs.


I Writing correct, efficient parallel programs is even harder.
I However, several tools facilitate parallel programming on different levels.
I Low level tools: (TCP) sockets [1] for distributed memory computing and threads [2] for
shared memory computing.
I Intermediate level tools: message-passing libraries like MPI [3] for distributed memory
computing or OpenMP [4] for shared memory computing.
I Higher level tools: integrate well with higher level languages like R.
I The latter let us parallelize existing code without too much modification and are the main
focus of this course.
[1] http://en.wikipedia.org/wiki/Network_socket
[2] http://en.wikipedia.org/wiki/Thread_(computing)
[3] http://www.mcs.anl.gov/research/projects/mpi/
[4] http://openmp.org/wp/

Part I: Introduction Challenges in Computing 11 / 68


Performance Metrics

The simplest way to analyze the performance of an application is to measure its execution
time. An application can then be compared with an improved version via their execution times.

Speedup = t_s / t_e    (1)

where
t_s denotes the execution time of the program without enhancements (serial version),
t_e denotes the execution time of the program using the enhancements (enhanced version).
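As a minimal illustration (not from the slides; slow_version() and fast_version() are hypothetical stand-ins for the two program versions), the two times can be measured in R with system.time():

slow_version <- function() Sys.sleep(2)   # stands in for the serial program
fast_version <- function() Sys.sleep(1)   # stands in for the enhanced program
ts <- system.time(slow_version())["elapsed"]
te <- system.time(fast_version())["elapsed"]
ts / te                                   # speedup, roughly 2 in this toy example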

Part I: Introduction Performance Metrics 12 / 68


Parallelizable Computations

I A simple model says intuitively that a computation runs p times faster when split over p
processors.
I More realistically, a problem has a fraction f of its computation that can be parallelized;
the remaining fraction 1 − f is inherently sequential.
I Amdahl’s law:

  Maximum Speedup = 1 / (f/p + (1 − f))
I Problems with f = 1 are called embarrassingly parallel.
I Some problems are (or seem to be) embarrassingly parallel: computing column means,
bootstrapping, etc.

Part I: Introduction Performance Metrics 13 / 68


Amdahl’s Law

Amdahl’s Law for Parallel Computing

[Plot: speedup against number of processors (1 to 10) for f = 1, 0.9, 0.75, and 0.5.]
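The curves can be reproduced with a short R sketch (assumed, not part of the original slides):

amdahl <- function(p, f) 1 / (f / p + (1 - f))   # maximum speedup by Amdahl's law
p <- 1:10
plot(p, amdahl(p, 1), type = "l", xlab = "number of processors", ylab = "speedup")
for (f in c(0.9, 0.75, 0.5)) lines(p, amdahl(p, f), lty = 2)
legend("topleft", c("f = 1", "f = 0.9", "f = 0.75", "f = 0.5"), lty = c(1, 2, 2, 2))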

Part I: Introduction Performance Metrics 14 / 68


Literature

I Schmidberger et al. (2009)


I HPC Task View http://CRAN.R-project.org/view=HighPerformanceComputing by
Dirk Eddelbuettel
I Kontoghiorghes (2006)
I Rossini et al. (2003)
I Notes on WU’s cluster and cloud system can be found at
http://statmath.wu.ac.at/cluster/ and http://cloud.wu.ac.at/manual/,
respectively.

Part I: Introduction Performance Metrics 15 / 68


Applications

We want to improve the following applications:


I Global optimization using a multistart approach.
I Option pricing using Monte Carlo Simulation.
I Markov Chain Monte Carlo (MCMC) – cf. lecture on Bayesian Computing.
These will be the assignments for this course.

Part I: Introduction Applications 16 / 68


Global Optimization

We want to find the global minimum of the following function:


f(x, y) = 3(1 − x)^2 e^{−x^2 − (y+1)^2} − 10 (x/5 − x^3 − y^5) e^{−x^2 − y^2} − (1/3) e^{−(x+1)^2 − y^2}
[Surface plot of f(x, y) for x, y ∈ [−2, 2].]
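The surface can be reproduced with base R (a sketch; the original plotting code is not shown on the slides):

f <- function(x, y)
  3 * (1 - x)^2 * exp(-x^2 - (y + 1)^2) -
  10 * (x / 5 - x^3 - y^5) * exp(-x^2 - y^2) -
  1 / 3 * exp(-(x + 1)^2 - y^2)
x <- y <- seq(-2, 2, length.out = 50)
z <- outer(x, y, f)                   # evaluate f on a grid
persp(x, y, z, theta = 30, phi = 30)  # perspective surface plot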

Part I: Introduction Applications 17 / 68


European Call Options

I Underlying S_t (non-dividend paying stock)
I Expiration date or maturity T
I Strike price X
I Payoff C_T

  C_T = 0 if S_T ≤ X,  S_T − X if S_T > X    (2)
      = max{S_T − X, 0}

I Can also be priced analytically via the Black-Scholes-Merton model
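For reference, a sketch of the textbook Black-Scholes-Merton call price in R (not taken from the slides; the parameter values match the MC example used later in the course material):

bsm_call <- function(S, X, T, r, sigma) {
  d1 <- (log(S / X) + (r + sigma^2 / 2) * T) / (sigma * sqrt(T))
  d2 <- d1 - sigma * sqrt(T)
  S * pnorm(d1) - X * exp(-r * T) * pnorm(d2)   # call price on a non-dividend payer
}
bsm_call(S = 120, X = 130, T = 1, r = 0.05, sigma = 0.2)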

Part I: Introduction Applications 18 / 68


MC Algorithm (1)

1. Sample a random path for S in a risk neutral world.


2. Calculate the payoff from the derivative.
3. Repeat steps 1 and 2 to get many sample values of the payoff from the derivative in a risk
neutral world.
4. Calculate the mean of the sample payoffs to get an estimate of the expected payoff in a
risk neutral world.
5. Discount the expected payoff at a risk free rate to get an estimate of the value of the
derivative.

Part I: Introduction Applications 19 / 68


MC Algorithm (2)

Require: option characteristics (S, X, T), the volatility σ, the risk-free yield r, the number of simulations n
1: for i = 1 : n do
2:   generate Z^i ~ N(0, 1)
3:   S_T^i = S(0) exp((r − σ^2/2) T + σ √T Z^i)
4:   C_T^i = e^{−rT} max(S_T^i − X, 0)
5: end for
6: Ĉ_T^n = (C_T^1 + ... + C_T^n) / n
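A direct sequential translation into R could look as follows (a sketch; the course's own implementation is provided in 'HPC_course.R' and may differ):

mc_call <- function(n, S, X, T, r, sigma) {
  Z  <- rnorm(n)                                               # step 2
  ST <- S * exp((r - sigma^2 / 2) * T + sigma * sqrt(T) * Z)   # step 3
  CT <- exp(-r * T) * pmax(ST - X, 0)                          # step 4
  mean(CT)                                                     # step 6
}
set.seed(123)
mc_call(n = 400000, S = 120, X = 130, T = 1, r = 0.05, sigma = 0.2)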

Part I: Introduction Applications 20 / 68


Sampling Accuracy

The number of trials carried out depends on the accuracy required. If n independent
simulations are run, the standard error of the estimate Ĉ_T^n of the payoff is

  s / √n

where s is the (estimated) standard deviation of the discounted payoff given by the simulation.
According to the central limit theorem, a 95% confidence interval for the “true” price of the
derivative is given asymptotically by

  Ĉ_T^n ± 1.96 s / √n.

The accuracy of the simulation is therefore inversely proportional to the square root of the
number of trials n. This means that to double the accuracy the number of trials has to be
quadrupled.
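In R, the standard error and the asymptotic 95% confidence interval follow directly from the vector of simulated discounted payoffs; a self-contained sketch with the same assumed parameters as before (S = 120, X = 130, T = 1, r = 0.05, σ = 0.2):

n  <- 400000
Z  <- rnorm(n)
ST <- 120 * exp((0.05 - 0.2^2 / 2) * 1 + 0.2 * sqrt(1) * Z)
CT <- exp(-0.05 * 1) * pmax(ST - 130, 0)   # discounted payoffs
est <- mean(CT)
se  <- sd(CT) / sqrt(n)                    # standard error s / sqrt(n)
c(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)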

Part I: Introduction Applications 21 / 68


General Strategies

I “Pseudo” parallelism by simply starting the same program with different parameters
several times.
I Implicit parallelism, e.g., via parallelizing compilers, or built-in support of packages.
I Explicit parallelism with implicit decomposition.
  I Parallelism is easy to achieve using compiler directives (e.g., OpenMP).
  I Sequential code can be parallelized incrementally.
I Explicit parallelism, e.g., with message passing libraries.
  I Use R packages porting the API of such libraries.
  I Development of parallel programs is difficult.
  I Delivers good performance.

Part I: Introduction Applications 22 / 68


Part II
Computer Architecture
Computer Architecture

Shared Memory Systems (SMS) host multiple processors which share one global main memory
(RAM), e.g., multi core systems.
Distributed Memory Systems (DMS) consist of several units connected via an interconnection
network. Each unit has its own processor with its own memory.
DMS include:
I Beowulf Clusters are scalable performance clusters based on commodity hardware, on a
private system network, with open source software (Linux) infrastructure (e.g.,
clusterwu@WU).
I The Grid connects participating computers via the Internet (or other wide area networks)
to reach a common goal. Grids are more loosely coupled, heterogeneous, and
geographically dispersed.
The Cloud or cloud computing is a model for enabling convenient, on-demand network access
to a shared pool of configurable computing resources.

Part II: Computer Architecture Overview 24 / 68


Excursion: Process vs. Thread

Processes are executions of a list of statements (a sequential program). Processes have
their own state information, use their own address space, and interact with other
processes only via an interprocess communication mechanism, generally managed
by the operating system. (A master process may spawn subprocesses which are
logically separated from the master process.)
Threads are typically spawned from processes for a short time to achieve a certain task
and then terminate – the fork/join principle. Within a process, threads share the
same state and the same memory space, and can communicate with each other
directly through shared variables.

Part II: Computer Architecture Overview 25 / 68


Excursion: Overhead and Scalability

Overhead is generally considered any combination of excess or indirect computation time,
memory, bandwidth, or other resources that have to be utilized or expended to
enable a particular goal.
Scalability refers to the capability of a system to increase its performance under an
increased load when resources (in this case CPUs) are added.
Scaling Efficiency

  E = t_s / (t_e · p)
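For example, if a serial run takes t_s = 60 s and the parallel run on p = 8 CPUs takes t_e = 10 s (numbers made up for illustration), the scaling efficiency can be computed as:

efficiency <- function(ts, te, p) ts / (te * p)
efficiency(ts = 60, te = 10, p = 8)   # 0.75, i.e. 75% of ideal linear scaling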

Part II: Computer Architecture Overview 26 / 68


Shared Memory Platforms

I Multiple processors share one global memory


I Connected to global memory mostly via bus technology
I Communication via shared variables
I SMPs are now commonplace because of multi-core CPUs
I Limited number of processors (up to around 64 in one machine)

Part II: Computer Architecture Overview 27 / 68


Distributed Memory Platforms

I Provide access to cheap computational power


I Can easily scale up to several hundreds or thousands of processors
I Communication between the nodes is achieved through common network technology
I Typically we use message passing libraries like MPI or PVM

Part II: Computer Architecture Overview 28 / 68


Distributed Memory Platforms

Part II: Computer Architecture Overview 29 / 68


Shared Memory Computing

Parallel computing involves splitting work among several processors. Shared memory parallel
computing typically has
I single process,
I single address space,
I multiple threads or light-weight processes,
I all data is shared,
I access to key data needs to be synchronized.

Part II: Computer Architecture Shared Memory Computing 30 / 68


Typical System

IBM System p 550


4 2-core IBM POWER6 @ 3.5 GHz
128 GB RAM
This is a total of 8 64-bit computation nodes which have access to 128 gigabytes of shared
memory.

Part II: Computer Architecture Shared Memory Computing 31 / 68


Cluster Computing

Cluster computing or distributed memory computing usually has


I multiple processes, possibly on different computers
I each process has its own address space
I data needs to be exchanged explicitly
I data exchange points are usually points of synchronization
Hybrid models, i.e., combinations of distributed and shared memory computing, are possible.

Part II: Computer Architecture Cluster Computing 32 / 68


Cluster@WU (Hardware)

Cluster@WU consists of computation nodes accessible via so-called queues (node.q,


hadoop.q), a file server, and a login server.

Login server
2 Quad Core Intel Xeon X5550 @ 2.67 GHz
24 GB RAM
File server
2 Quad Core Intel Xeon X5550 @ 2.67 GHz
24 GB RAM
node.q – 44 nodes
2 Six Core Intel Xeon X5670 @ 2.93 GHz
24 GB RAM
This is a total of 528 64-bit computation cores (544 including login- and file server) and more
than 1 terabyte of RAM.

Part II: Computer Architecture Cluster Computing 33 / 68


Connection Technologies

I Sockets: everything is managed by R, thus “easy”. Socket connections run over TCP/IP and
are therefore usable on almost any system. Advantage: no additional software required.
I Message Passing Interface (MPI): basically a definition of a networking protocol. Several
different implementations exist, but Open MPI (see http://www.open-mpi.org/) is the most
common and widely used.
I Parallel Virtual Machine (PVM): nowadays obsolete.
I NetWorkSpaces (NWS): is a framework for coordinating programs written in scripting
languages.

Part II: Computer Architecture Cluster Computing 34 / 68


Cluster@WU (Software)

I Debian GNU/Linux
I Compiler Collections
I GNU 4.4.7 (gcc, g++, gfortran, . . . ), [g]
I R, some packages from CRAN
I R-g latest R-patched compiled with [g]
I R-<g>-<date> R-devel compiled at <date>
I Linear algebra libraries (BLAS, LAPACK, INTEL MKL)
I OpenMPI, PVM and friends
I various editors (emacs, vi, nano, etc.)

Part II: Computer Architecture Cluster Computing 35 / 68


Cluster Login Information

You can use the following account:


I login host: clusterwu.wu.ac.at
I user name: provided in class
I password: provided in class
The software packages R 3.4.3, OpenMPI 1.10.0, Rmpi 0.6-7 and snow 0.4-1 are pre-installed
for this account. All relevant data and code are supplied in the directory '~/HPC_examples'.

Part II: Computer Architecture Cluster Computing 36 / 68


Using Cluster@WU

I Remote connection can be established by


I Secure shell (ssh): type ssh <username>@clusterwu.wu.ac.at on the terminal
I Windows
I MobaXterm (http://mobaxterm.mobatek.net/download-home-edition.html)
I Combination of PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/) and
WinSCP (http://winscp.net)
I First time configuration:
I Add the following lines to the beginning of your ’~/.bashrc’ (and make sure that
’~/.bashrc’ is sourced at login). Done for you.
## OPENMPI
export MPI=/opt/libs/openmpi-1.10.0-GNU-4.9.2-64/
export PATH=${MPI}/bin:/opt/R/bin:/opt/sge/bin/lx-amd64:${PATH}
export LD_LIBRARY_PATH=${MPI}/lib:${MPI}/lib/openmpi:${LD_LIBRARY_PATH}
## Personal R package library
export R_LIBS="~/lib/R/"
I Create your package library using mkdir -p ~/lib/R. Done for you.

Part II: Computer Architecture Using Cluster@WU 37 / 68


SGE

Son of Grid Engine (SGE) is an open source cluster resource management and scheduling
software. It is used to run cluster jobs which are user requests for resources (i.e., actual
computing instances or nodes) available in a cluster/grid.
In general the SGE has to match the available resources to the requests of the grid users. SGE
is responsible for
I accepting jobs from the outside world
I delaying a job until it can be run
I sending jobs from the holding area to an execution device (node)
I managing running jobs
I logging of jobs
Useful SGE commands:
I qsub submits a job.
I qstat shows statistics of jobs running on cluster@WU.
I qdel deletes a job.
I sns shows the status of all nodes in the cluster.
Part II: Computer Architecture Using Cluster@WU 38 / 68
Submitting SGE Jobs

1. Login,
2. create a plain text file (e.g., ’myJob.qsub’) with the job description, containing e.g.:
#!/bin/bash
## This is my first cluster job.

#$ -N MyJob

R-g --version
sleep 10
3. then type qsub myJob.qsub and hit enter.
4. Output files are provided as ’<jobname>.o<jobid>’ (standard output) and
’<jobname>.e<jobid>’ (error output), respectively.

Part II: Computer Architecture Using Cluster@WU 39 / 68


SGE Jobs

An SGE job typically begins with commands to the grid engine. These commands are prefixed
with #$.
E.g., the following arguments can be passed to the grid engine:
-N specifying the actual jobname
-q selecting one of the available queues. Defaults to node.q.
-pe [type] [n] sets up a parallel environment of type [type] reserving [n] cores.
-t <first>-<last>:stepsize creates a job-array (e.g. -t 1-20:1)
-o [path] redirects stdout to path
-e [path] redirects stderr to path
-j y[es] merges stdout and stderr into one file
For an extensive listing of all available arguments type qsub -help into your terminal.

Part II: Computer Architecture Using Cluster@WU 40 / 68


A Simple MPI Example

I We want our processes to send us the following message: "Hello World from processor
<ID>" where <ID> is the processor ID (or rank in MPI terminology).
I MPI uses the master-worker paradigm, thus a master process is responsible for starting
(spawning) worker processes.
I In R we can utilize the MPI library via package Rmpi.

Expert note: Rmpi can be installed using install.packages("Rmpi",


configure.args="--with-mpi=/opt/libs/openmpi-1.10.4-GNU-4.9.2-64") in R. Done
for you.

Part II: Computer Architecture Using Cluster@WU 41 / 68


A Simple MPI Example

I Job script ('Rmpi_hello_world.qsub'):


#!/bin/sh
## A simple Hello World! cluster job
#$ -cwd # use current working directory
#$ -N RMPI_Hello # set job name
#$ -o RMPI_Hello.log # write stdout to RMPI_Hello.log
#$ -j y # merge stdout and stderr
#$ -pe mpi 4 # use the parallel environment "mpi" with 4 slots

R-g --vanilla < Rmpi_hello_world.R

Part II: Computer Architecture Using Cluster@WU 42 / 68


I R code ('Rmpi_hello_world.R'):
library("Rmpi")

## number of slots granted by the grid engine
slots <- as.integer(Sys.getenv("NSLOTS"))

mpi.is.master()                     # are we the master process?
mpi.get.processor.name()
mpi.spawn.Rslaves(nslaves = slots)  # spawn one worker per slot
mpi.remote.exec(mpi.comm.rank())    # ranks of the workers

hello <- function(){
  sprintf("Hello World from processor %d", mpi.comm.rank())
}

mpi.bcast.Robj2slave(hello)         # ship the function to the workers
mpi.remote.exec(hello())            # and call it there
I MPI is used via package Rmpi.

Part II: Computer Architecture Using Cluster@WU 43 / 68


Part III
The parallel Package
The parallel Package

I Available since R version 2.14.0.


I Builds on the CRAN packages multicore and snow.
I multicore for parallel computation on shared memory (Unix) platforms
I snow for parallel computation on distributed memory platforms
I Allows the use of both shared memory (via forked processes, POSIX systems only) and
distributed memory systems (sockets).
I Additionally, package snow extends parallel with other communication technologies for
distributed memory computing like MPI.
I Integrates handling of random numbers.
For more details see the parallel vignette.

Part III: The parallel Package 45 / 68


Functions of the Parallel Package

Shared memory
Function              Description                        Example
detectCores           detect the number of CPU cores     ncores <- detectCores()
mclapply              parallelized version of lapply     mclapply(1:5, runif, mc.cores = ncores)

Distributed memory
Function              Description                        Example
makeCluster           start the cluster (1)              cl <- makeCluster(10, type = "MPI")
clusterSetRNGStream   set the RNG seed on the cluster    clusterSetRNGStream(cl, 321)
clusterExport         export variables to the workers    clusterExport(cl, c("a", "x"))
clusterEvalQ          evaluate expressions on workers    clusterEvalQ(cl, {x <- 1:3; myFun <- function(x) runif(x)})
clusterCall           call a function on all workers     clusterCall(cl, function(y) 3 + y, 2)
parLapply             parallelized version of lapply     parLapply(cl, 1:100, Sys.sleep)
parLapplyLB           parLapply with load balancing      parLapplyLB(cl, 1:100, Sys.sleep)
stopCluster           stop the cluster                   stopCluster(cl)

(1) Allowed cluster types are PSOCK, FORK, SOCK, MPI and NWS.
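A minimal end-to-end sketch combining several of these functions on a local socket (PSOCK) cluster, so that it runs without MPI (illustrative only, not from the slides):

library("parallel")
ncores <- detectCores()
cl <- makeCluster(ncores)      # PSOCK cluster on the local machine
clusterSetRNGStream(cl, 321)   # reproducible parallel RNG
a <- 1:10
clusterExport(cl, "a")         # copy 'a' to every worker
parLapply(cl, 1:4, function(i) sum(a) + runif(1))
stopCluster(cl)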
Part III: The parallel Package 46 / 68
A Simple Multicore Example

I We use the parallel package to parallelize certain constructs.


I E.g., instead of lapply() we use mclapply() which implicitly applies a given function
to the supplied parameters in parallel.
I Example: global optimization using a multistart approach.

Part III: The parallel Package 47 / 68


A Simple Multicore Example

Sequential:
fun <- function(x){
  3*(1-x[1])^2*exp(-x[1]^2-(x[2]+1)^2) -
    10 * (x[1]/5 - x[1]^3 - x[2]^5) * exp(-x[1]^2 - x[2]^2) -
    1/3 * exp(-(x[1]+1)^2 - x[2]^2)
}
start <- list( c(0, 0), c(-1, -1), c(0, -1), c(0, 1) )
seqt <- system.time(
  sol <- lapply(start, function(par)
    optim(par, fun, method = "Nelder-Mead", lower = -Inf, upper = Inf,
          control = list(maxit = 1000000, beta = 0.01, reltol = 1e-15)))
)["elapsed"]
seqt

Parallel:
require(parallel)
ncores <- detectCores()
part <- system.time(
  sol <- mclapply(start, function(par)
    optim(par, fun, method = "Nelder-Mead", lower = -Inf, upper = Inf,
          control = list(maxit = 1000000, beta = 0.01, reltol = 1e-15)),
    mc.cores = ncores)
)["elapsed"]
part
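The two elapsed times can then be compared directly with the speedup metric from Part I:

c(sequential = unname(seqt), parallel = unname(part), speedup = unname(seqt / part))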

Part III: The parallel Package 48 / 68


Random Numbers and Parallel Computing

I You need to be careful when generating pseudo random numbers in parallel, especially if
you want the streams to be independent and reproducible.
I Identical streams produced on each node are likely but not guaranteed.
I Parallel PRNGs usually have to be set up by the user, e.g., via clusterSetRNGStream()
in package parallel.
I The source file 'snow_pprng.R' shows how to use such parallel PRNGs.
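A sketch of such a set-up with the parallel package (the actual contents of 'snow_pprng.R' may differ):

library("parallel")
cl <- makeCluster(4)
clusterSetRNGStream(cl, iseed = 123)   # one L'Ecuyer-CMRG stream per worker
r1 <- parSapply(cl, 1:4, function(i) runif(1))
clusterSetRNGStream(cl, iseed = 123)   # resetting the seed reproduces the draws
r2 <- parSapply(cl, 1:4, function(i) runif(1))
identical(r1, r2)                      # TRUE
stopCluster(cl)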

Part III: The parallel Package 49 / 68


Example: Pricing European Options (1)

MC simulation using the parallel package. Job script ('snow_mc_sim.qsub'):


#!/bin/sh
## Parallel MC simulation using parallel/snow

#$ -N SNOW_MC
#$ -pe mpi 4

R-g --vanilla < snow_mc_sim.R

Note: to run this example package snow has to be installed since no functionality to start MPI
clusters is provided with package parallel.

Part III: The parallel Package 50 / 68


Example: Pricing European Options (2)

R script ('snow_mc_sim.R'):


require("parallel")

source("HPC_course.R")

## number of paths to simulate
n <- 400000

slots <- as.integer(Sys.getenv("NSLOTS"))

## start MPI cluster and retrieve the nodes we are working on.
cl <- snow::makeMPIcluster(slots)
clusterCall(cl, function() Sys.info()[c("nodename", "machine")])

## note that this must be an integer
sim_per_slot <- as.integer(n/slots)

## setup PRNG
clusterSetRNGStream(cl, iseed = 123)
price <- MC_sim_par(cl, sigma = 0.2, S = 120, T = 1, X = 130, r = 0.05,
                    n_per_node = sim_per_slot, nodes = slots)
price

## finally shut down the MPI cluster
stopCluster(cl)

Part III: The parallel Package 51 / 68


Literature

I R-core (2013)
I Mahdi (2014)
I Jing (2010)
I McCallum and Weston (2011)

Part III: The parallel Package 52 / 68


Part IV
Cloud Computing
Overview

What is the Cloud?

Source: Wikipedia, http://en.wikipedia.org/wiki/File:Cloud_computing.svg, accessed 2011-12-05

Part IV: Cloud Computing Overview 54 / 68


Cloud Computing

According to the NIST Definition of Cloud Computing, see


http://csrc.nist.gov/groups/SNS/cloud-computing/, the cloud model
I promotes availability,
I is composed of five essential characteristics (On-demand self-service, Broad network
access, Resource pooling, Rapid elasticity, Measured Service),
I three service models:
IaaS: Infrastructure as a Service
PaaS: Platform as a Service
SaaS: Software as a Service
I and four deployment models (private, community, public, hybrid clouds).

Part IV: Cloud Computing Overview 55 / 68


Essential Characteristics

I On-demand self-service. Provision computing capabilities, such as server time and


network storage, as needed.
I Broad network access. Service available over the network and accessed through API (via
mobile, laptop, Internet).
I Resource pooling. Different physical and virtual resources (e.g., storage, memory, network
bandwidth, virtual machines) are dynamically assigned and reassigned according to
consumer demand.
I Rapid elasticity. Capabilities can be rapidly and elastically provisioned.
I Measured Service. Resource usage can be monitored, controlled, and reported providing
transparency for both the provider and consumer of the utilized service.

Part IV: Cloud Computing Overview 56 / 68


Service Models

I Cloud Infrastructure as a Service (IaaS). provides (abstract) infrastructure as a service


(computing environments available for rent, e.g., Amazon EC2, see
http://aws.amazon.com/ec2/)
I Cloud Platform as a Service (PaaS). corresponds to programming environments, or,
platform as a service (development environment for web applications, e.g., Google App
Engine).
I Cloud Software as a Service (SaaS). refers to the provision of software as a service (web
applications like office environments, e.g., Google Docs).

Part IV: Cloud Computing Overview 57 / 68


Deployment Models

I Private cloud. The cloud infrastructure is operated solely for an organization (e.g.,
wu.cloud).
I Community cloud. The cloud infrastructure is shared by several organizations and
supports a specific community that has shared concerns.
I Public cloud. The cloud infrastructure is made available to the general public or a large
industry group and is owned by an organization selling cloud services (e.g., Amazon EC2).
I Hybrid cloud. The cloud infrastructure is a composition of two or more clouds (private,
community, or public) bound together by standardized or proprietary technology.

Part IV: Cloud Computing Overview 58 / 68


Terminology

Image (also called “appliance”): A stack of an operating system and applications


bundled together. wu.cloud users can select from a variety of appliances (both
GNU/Linux and Windows 7 based) with standard scientific tools (R, Matlab,
Mathematica, STATA, etc.).
Instance: Cloud images are started in their own separate virtual machine environments
(with up to 196 GB RAM and 8 CPU cores). This process is called “instancing”,
running appliances are called “instances”.
EBS Volume: is “off-instance storage that persists independently from the life of an instance.”
(Amazon, 2011). EBS Volumes can be attached to running instances to provide
virtual disk space for large datasets, calculation results and custom software
configurations.

Part IV: Cloud Computing Terminology 59 / 68


Private Clouds

wu.cloud is a private cloud service and thus the following characteristics hold.
I Emulate public cloud on (existing) private resources,
I thus, provides benefits of clouds (elasticity, dynamic provisioning, multi-OS/arch
operation, etc.),
I while maintaining control of resources.
I Moreover, there is always the option to scale out to the public cloud (going hybrid).

Part IV: Cloud Computing wu.cloud 60 / 68


wu.cloud

wu.cloud is
I solely operated for WU members and projects,
I thus, network access only via Intranet/VPN (https://vpn.wu.ac.at),
I on-demand self-service,
I resource pooling via virtualization,
I extensible/elastic,
I Infrastructure as a Service (IaaS),
I Platform as a Service (PaaS).

Part IV: Cloud Computing wu.cloud 61 / 68


wu.cloud Software

I wu.cloud is a private cloud system based on the open source software package
Eucalyptus (see http://open.eucalyptus.com/).
I Accessible via http://cloud.wu.ac.at/.
I Consists of a frontend (website, management software) and a backend (providing
resources) system.

Figure: wu.cloud setup

Part IV: Cloud Computing wu.cloud 62 / 68


wu.cloud Hardware

Backend system:

I 2x IBM X3850 X5
I 8x8 (64) core Intel Xeon CPUs 2.26 GHz
I 1 TB RAM
I EMC2 Storage Area Network: 7 TB fast + 4 TB slow disks
I SUSE Linux Enterprise Server 11 SP1
I Xen 4.0.1
I Eucalyptus backend components (cluster, storage, node controller)
Frontend system:
I Virtual (Xen) instance
I Apache Webserver
I Eucalyptus frontend components (cloud controller, walrus)
(Image credit: (c) 2010 IBM Corporation, from Datasheet XSD03054-USEN-05)

Part IV: Cloud Computing wu.cloud 63 / 68


wu.cloud Characteristics

wu.cloud aims at scaling in three different dimensions:


I Compute-nodes: number of cloud instances and cores employed
I Memory: amount of memory per instance requested
I Software: Windows vs. Linux and software packages installed
[Plot: CPU cores against RAM (GB) per instance (1 to 256 GB) for the available appliance types, e.g., Linux and Windows base systems, R development environments, Matlab/PASW/Stata, GUI-based R/Mathematica/Matlab systems, a Debian/R high-memory instance, a Windows/R high-CPU instance, a customized system, and a Debian/gridMathematica virtual cluster.]

Part IV: Cloud Computing wu.cloud 64 / 68


wu.cloud User Interface

I Amazon EC2 API


I allows for using tools like ec2/euca2ools, Hybridfox, etc., primarily designed for EC2
I transparent use of wu.cloud and EC2/S3 side by side
I Remote connection to cloud instances can be established by
I Secure shell (ssh), PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/)
I VNC (Linux)
I Remote Desktop (Windows)

Part IV: Cloud Computing wu.cloud 65 / 68


wu.cloud User Interface

Part IV: Cloud Computing wu.cloud 66 / 68


Contact

Florian Schwendinger
Institute for Statistics and Mathematics
email: [email protected]
URL: http://www.wu.ac.at/statmath/faculty_staff/faculty/fschwendinger
WU Vienna
Welthandelsplatz 1/D4/level 4
1020 Wien
Austria

Part IV: Cloud Computing wu.cloud 67 / 68


References

L. Jing. Parallel Computing with R and How to Use it on High Performance Computing Cluster, 2010. URL
http://datamining.dongguk.ac.kr/R/paraCompR.pdf.
E. Kontoghiorghes, editor. Handbook of Parallel Computing and Statistics. Chapman & Hall, 2006.
E. Mahdi. A survey of R software for parallel computing. American Journal of Applied Mathematics and Statistics, 2(4):
224–230, 2014. ISSN 2333-4576. doi: 10.12691/ajams-2-4-9. URL http://pubs.sciepub.com/ajams/2/4/9.
Q. E. McCallum and S. Weston. Parallel R. O’Reilly Media, Inc., 2011. ISBN 1449309925, 9781449309923.
R-core. Package ’parallel’, 2013. URL
https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf.
https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.R.
A. Rossini, L. Tierney, and N. Li. Simple Parallel Statistical Computing in R. UW Biostatistics Working Paper Series,
(Working Paper 193), 2003. URL http://www.bepress.com/uwbiostat/paper193.
M. Schmidberger, M. Morgan, D. Eddelbuettel, H. Yu, L. Tierney, and U. Mansmann. State of the art in parallel
computing with R. Journal of Statistical Software, 31(1):1–27, 8 2009. ISSN 1548-7660. URL
http://www.jstatsoft.org/v31/i01.
S. Theußl. Applied high performance computing using R. Master’s thesis, WU Wirtschaftsuniversität Wien, 2007. URL
http://statmath.wu-wien.ac.at/~theussl/publications/thesis/Applied_HPC_Using_R-Theussl_2007.pdf.

Part IV: Cloud Computing wu.cloud 68 / 68
