
Available online at www.prace-ri.eu

Partnership for Advanced Computing in Europe

Delft3D Performance Benchmarking Report


J. Donners(a)*, A. Mourits(b), M. Genseberger(b), B. Jagers(b)
(a) SURFsara, Amsterdam, The Netherlands
(b) Deltares, Delft, The Netherlands

Abstract

The Delft3D modelling suite has been ported to the PRACE Tier-0 and Tier-1 infrastructure. The portability of
Delft3D was improved by removing platform-dependent options from the build system and replacing non-standard constructs in the source. Three benchmarks were used to investigate the scaling of Delft3D: (1) a
large, regular domain; (2) a realistic, irregular domain with a low fill-factor; (3) a regular domain with a
sediment transport module. The first benchmark clearly shows good scalability up to a thousand cores for a
suitable problem. The other benchmarks show reasonable scalability up to about 100 cores. For test case (2) the
main bottleneck is the serialized I/O. It was attempted to implement a separate I/O server by using the last MPI
process only for the I/O, but this work is not yet finished. The imbalance due to the irregular domain can be
reduced somewhat by using a cyclic placement of MPI tasks. Test case (3) benefits from inlining of often-called
routines.

Introduction
Delft3D [1] is a world-leading 3D modelling suite used to investigate hydrodynamics, sediment transport, morphology and water quality for fluvial, estuarine and coastal environments. As of 1 January 2011, the Delft3D flow (FLOW), morphology (MOR) and waves (WAVE) modules are available as open source. Delft3D has over 350k lines of code and is developed by Deltares(b).
The software is used and has proven its capabilities all over the world, e.g. in the Netherlands, the USA, Hong Kong, Singapore, Australia and Venice. It is continuously improved and extended with innovative advanced modelling techniques resulting from the research work of Deltares, and is intended to remain a world-leading software package.

Description
The FLOW module is the heart of Delft3D; it is a multi-dimensional (2D or 3D) hydrodynamic (and transport) simulation programme which calculates non-steady flow and transport phenomena resulting from tidal and meteorological forcing on a curvilinear, boundary-fitted grid or in spherical coordinates. A more flexible grid approach is under development. In 3D simulations, the vertical grid is defined following the so-called sigma-coordinate approach or the Z-layer approach. The MOR module computes sediment transport (both suspended and bed total load) and morphological changes for an arbitrary number of cohesive and non-cohesive fractions.

* Corresponding author. E-mail address: [email protected]


(b) Deltares is an independent institute for applied research in the field of water, subsurface and infrastructure. For more information, see http://www.deltares.nl/en
The application consists of mainly Fortran 90, with some routines in C and C++ and some features from Fortran
2003. The parallel version that we considered uses MPI with 1-D domain decomposition as its parallelisation
strategy, where it automatically selects the longest dimension to be partitioned. The length of the domain (i.e. the
direction with most grid-points) is split across MPI processes. It uses an alternating direction implicit (ADI)
method to solve the momentum and continuity equations. The parallel implementation of the ADI method in Delft3D uses the halo regions of each process's part of the computational domain, to some extent, as internal boundary conditions for the iterations local to that process. Convergence could therefore become a problem when scaling up to higher process counts. I/O is implemented using a master-only technique. Although the application should scale well, as shown by similar models run with the same input data set, this was not the case on local hardware at Deltares, and the developers could not obtain an insightful profile that revealed the main bottleneck limiting scalability.
The MPI routines are wrapped in custom routines that are mostly used for halo exchanges and the reduction of
convergence parameters. Halo exchanges are executed by two calls to MPI_Isend and MPI_Irecv, immediately
followed by separate calls to MPI_Wait for all communication. The haloes are stored in temporary arrays.
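As an illustration, the following Fortran sketch shows the general pattern of such a halo exchange between a left and a right neighbour; the routine name, array names and sizes are hypothetical and do not correspond to the actual Delft3D wrapper routines.

    ! Hypothetical sketch of a 1-D halo exchange with non-blocking sends and
    ! receives, followed by separate MPI_Wait calls, as described above.
    subroutine halo_exchange(field, nhalo, left, right, comm)
      use mpi
      implicit none
      double precision, intent(inout) :: field(:)   ! local strip including haloes
      integer, intent(in) :: nhalo, left, right, comm
      double precision :: sendl(nhalo), sendr(nhalo), recvl(nhalo), recvr(nhalo)
      integer :: req(4), stat(MPI_STATUS_SIZE), i, n, ierr

      n = size(field)
      sendl = field(nhalo+1:2*nhalo)                ! copy haloes to temporary arrays
      sendr = field(n-2*nhalo+1:n-nhalo)

      call MPI_Irecv(recvl, nhalo, MPI_DOUBLE_PRECISION, left,  0, comm, req(1), ierr)
      call MPI_Irecv(recvr, nhalo, MPI_DOUBLE_PRECISION, right, 1, comm, req(2), ierr)
      call MPI_Isend(sendl, nhalo, MPI_DOUBLE_PRECISION, left,  1, comm, req(3), ierr)
      call MPI_Isend(sendr, nhalo, MPI_DOUBLE_PRECISION, right, 0, comm, req(4), ierr)

      do i = 1, 4                                   ! separate waits for all communication
        call MPI_Wait(req(i), stat, ierr)
      end do

      field(1:nhalo)     = recvl                    ! unpack received haloes
      field(n-nhalo+1:n) = recvr
    end subroutine halo_exchange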

Configuration and setup


The program uses automake, autoconf and libtool to create a configure script, which then configures the whole package for a particular system. For now, the only way to obtain the sources is to check out the svn version. A packaged version of the Delft3D source is expected to come with a single configure script that can target any platform; at the moment, a few platform-specific options in the configure.ac file still prevent this.

Porting
Delft3D has been ported to and tested on the following systems:
• IBM Power6 system “Huygens” running Linux at SURFsara (IBM XL compilers + IBM POE)
• Intel Xeon Nehalem cluster “Lisa” running Linux at SURFsara (Intel compilers + OpenMPI)
• BullX Intel Xeon cluster “Curie” running Linux at CEA (Intel compilers + BullxMPI)
• BullX Intel Xeon cluster “Cartesius” running Linux at SURFsara (Intel compilers + IntelMPI)
During the porting to these systems, several portability issues were identified and fixed in the mainstream
releases of Delft3D:
• The MPI implementation in Delft3D would check the environment variable PMI_RANK, which is only used by the MPICH2 library and is therefore not portable. New releases of Delft3D have fixed this and now support MPICH2, Intel MPI, MVAPICH, OpenMPI and POE.
• In case of an abnormal exit, the code would write an error message to the output file without closing it. Some compilers (e.g. IBM's) buffer the output, so the error message was never written to disk. This has been fixed.
• An erroneous attempt to de-allocate a static object caused a runtime error with the IBM compiler; this has been fixed.
• The MPI implementation would not call MPI_Finalize when there was only one MPI task, in which case Scalasca would not write its final report. This is now fixed.
• Variables of LOGICAL and INTEGER type were used interchangeably, which is now fixed.
• A bug due to an assumed pointer size of 4 bytes has been fixed.
• When running the first benchmark with 3 or more MPI tasks, a signed integer overflow occurred when multiplying the total number of grid points (9M) with the running sum of the CPU weights (in this case 300). This can be fixed by using INTEGER*8 variables, as illustrated in the sketch after this list.
• The OpenMP option in configure.ac was updated.
• When opening a file, the non-standard specifier access='append' was used; this has been replaced with the standard specifier position='append' (also shown in the sketch after this list).

• The non-standard Fortran function iargc() was replaced with the standard Fortran 2003 intrinsic command_argument_count().
• Delft3D would crash unexpectedly when using more than 60 MPI tasks. This could eventually be traced back to some initial conditions that were left empty, which Delft3D did not handle correctly.
• A compiler-specific flag (-fPIC) was included in some Makefile.am files; it was removed to make the build system (more) platform-independent.
• Some issues were noted, but not yet fixed completely due to other constraints:
  o The library libstdc++ was assumed to be required for every C++ compiler, but this is not true; the IBM XL C++ compiler, for example, uses its own libraries. The hard-coded dependency was therefore removed from configure.ac. As a result, compilation fails because some C++ parts of the Delft3D code are built as static libraries (libstream.a and libesm_c.a) and linked with Fortran parts, which then lack the C++ libraries. The automake files (*/Makefile.am) were changed to build the C++ parts as shared libraries through libtool, which then automatically links in the C++ libraries, even when combined with Fortran code.
  o The latest autoconf releases include an option to detect the Fortran compiler-specific flags for passing preprocessor directives; this has not yet been incorporated.
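For illustration, the following Fortran fragment sketches the nature of some of these standard-conformance fixes: the 64-bit product that avoids the signed integer overflow, the standard position='append' specifier, and command_argument_count(). The variable names, values and file name are made up for the example.

    program conformance_example
      ! Illustrative fragment only; names, values and the file name are hypothetical.
      implicit none
      integer          :: npoints, wsum, nargs, lun
      integer(kind=8)  :: work64

      npoints = 9000000                       ! ~9 million grid points
      wsum    = 300                           ! running sum of the CPU weights
      work64  = int(npoints, kind=8) * int(wsum, kind=8)   ! avoids 32-bit overflow

      nargs = command_argument_count()        ! standard replacement for iargc()

      lun = 11
      ! Standard-conforming append (instead of the non-standard access='append'):
      open(unit=lun, file='example.log', position='append', status='unknown', &
           action='write')
      write(lun,*) 'work estimate: ', work64, '  command arguments: ', nargs
      close(lun)
    end program conformance_example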

Benchmarking
The benchmarks in this report have all been run with Delft3D binaries that use double-precision numbers, in order to represent a real production environment.

Test Case: Waal river

Figure 1: Overview of part of the simulation domain. The grid representation of a groin can be seen at the bottom right.

The simulation is a schematic representation of the Waal, one of the main rivers in the Netherlands, with groins
and part of the floodplain. This model is used to estimate the effect of lowering the groins on the water level
when the area is flooded.
The resolution is high enough to reach good scaling up to at least 80 processors using a similar software package that is maintained by Deltares (this package has only a limited distribution and is applied only in the Netherlands). The total domain is 30x2 km and uses a resolution of 2x2 m in the main channel and 4x2 m on the floodplain. The total number of grid cells is more than 9 million. The domain is homogeneous and has 15,000 grid points in the direction of the 1-D domain decomposition, which makes this test case ideal to investigate scalability.
At the beginning of the project, there was not yet access to a Tier-0 system, so porting started on local systems.
After the first porting issues were resolved, the Delft3D model was compiled without any optimization flags on a
local Intel Xeon cluster “Lisa” and an IBM Power6 system “Huygens”. An initial scaling benchmark showed
about 15 timesteps per minute when using 40 cores with a scaling efficiency above 80%.
Compilation of Delft3D with the -O2 optimization flag shows good scalability on the Curie Tier-0 system (see
Figure 2). Scaling starts to tail off from 512 cores. Scalability was not tested above 1,000 cores, as Delft3D uses temporary files for each process with a maximum of 1,000 files. The results clearly show that the computational core of Delft3D can scale to 1,000 cores with a suitable input dataset that is sufficiently large and has a regular domain that is homogeneously distributed across the processes.

[Chart: steps/min dp and perfect scaling dp plotted against the number of cores.]

Figure 2: Performance of Delft3D with Waal schematic setup in timesteps per minute on Curie thin nodes. Perfect scaling is measured
relative to 1 node (16 cores).

Scalasca was used to profile Delft3D. Scalasca version 1.4.1 was specially compiled with position-independent code, so that the Scalasca libraries can be linked with the shared libraries in Delft3D. By default, the Delft3D code loads shared libraries dynamically, which is not supported by the Scalasca tool. Therefore, we used the option to compile Delft3D as a monolithic executable, with all shared libraries added at link time. It was also necessary to move the call to MPI_Init to the main C++ routine.
The most important routines in terms of computing time are:
• uzd (solve the continuity equation), and

• sud (solve the momentum equation)


A more in-depth analysis of the important bottlenecks for realistic benchmarks can be found in the following
section.
Gperftools
Gperftools was used to create a profile at the source-line level. It was not possible to profile the full program; instead, profiling had to be restricted to the main loop using the routines ProfilerStart and ProfilerStop. A Fortran interface was written that uses the intrinsic module ISO_C_BINDING. Also, the profiler library had to be linked with the Delft3D executable, instead of being preloaded at runtime. No significant insights were gained from these profiles.
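A minimal sketch of such an interface is shown below, assuming the gperftools C functions ProfilerStart(const char*) and ProfilerStop(); it is an illustration, not the exact interface used in Delft3D.

    ! Sketch of a Fortran binding to the gperftools CPU profiler.
    module gperftools_iface
      use, intrinsic :: iso_c_binding, only: c_char, c_int
      implicit none
      interface
        function ProfilerStart(fname) bind(c, name='ProfilerStart') result(ok)
          import :: c_char, c_int
          character(kind=c_char), dimension(*), intent(in) :: fname
          integer(c_int) :: ok
        end function ProfilerStart
        subroutine ProfilerStop() bind(c, name='ProfilerStop')
        end subroutine ProfilerStop
      end interface
    end module gperftools_iface

    ! Usage around the main loop (the profile file name is hypothetical):
    !   use gperftools_iface
    !   use, intrinsic :: iso_c_binding, only: c_int, c_null_char
    !   integer(c_int) :: ok
    !   ok = ProfilerStart('delft3d.prof'//c_null_char)
    !   ... main time loop ...
    !   call ProfilerStop()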

Test Case: Zeedelta

Figure 3: Full domain for the Zeedelta benchmark. Each colour represents a computational domain; in this case there are 8 domains in total.

The Zeedelta model is a simulation of the Rotterdam estuary and represents a real production case.
This is a 3D model with 501x1539 horizontal grid points (total 771,039), of which 171,659 are active grid points and 600,055 inactive grid points (total 771,714). The 3D model has 10 layers. This model has a heterogeneous domain, with a very high number of inactive grid points. It represents the other extreme in comparison to the first benchmark, which had a regular and completely active domain.
The processes at the left of the domain have only active points, while for the processes at the right of the domain a large fraction of the points in memory are inactive. Delft3D splits the domain into partitions with an equal number of active points, so the processes at the right of the domain cover a longer stretch of the domain.
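The following Fortran sketch illustrates this static balancing idea in a simplified form: partition boundaries along the decomposed dimension are chosen so that each process receives roughly the same number of active points. The routine and variable names are invented for the example and do not correspond to Delft3D code.

    ! Simplified illustration of balancing a 1-D decomposition by active points.
    subroutine balance_columns(active, ncols, nproc, last_col)
      implicit none
      integer, intent(in)  :: ncols, nproc
      integer, intent(in)  :: active(ncols)     ! active points per grid column
      integer, intent(out) :: last_col(nproc)   ! last column owned by each process
      integer :: total, cum, p, j

      total = sum(active)
      cum   = 0
      p     = 1
      do j = 1, ncols
        cum = cum + active(j)
        ! hand the partition over once the cumulative count reaches its share
        if (p < nproc .and. cum*nproc >= total*p) then
          last_col(p) = j
          p = p + 1
        end if
      end do
      last_col(p:nproc) = ncols                 ! remaining processes end at the last column
    end subroutine balance_columns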
Scalasca was used to generate a profile of the test case at different numbers of processes. Small routines that are frequently called (comparerealdouble, redvic, reddic, dens_unes) have been filtered from instrumentation to minimize the impact of profiling on performance. The following results were obtained on the PRACE Tier-0 Curie system, using 96 MPI processes on 96 cores of the thin-node partition. The most important routines are:
1. uzd: 30% runtime

2. tritra (compute transports for conservative constituents): 22% runtime


3. postpr (I/O postprocessing): 15% runtime

4. tratur (compute transports of turbulent kinetic energy and dissipation): 14% runtime

5. sud: 10% runtime


which in total represents more than 90% of the runtime, excluding initialisation and finalisation.
The default placement was used, with MPI tasks placed in consecutive blocks on the nodes and cyclically over the sockets within each node. Cyclic placement of tasks across nodes was shown to increase performance by approximately 5% for 64 processes and above.

[Chart: steps/min dp, steps/min dp (cyclic) and perfect scaling plotted against the number of cores.]

Figure 4: Performance of Zeedelta benchmark on Curie thin nodes in timesteps/min. Perfect scaling is measured relative to 16 cores (1 node).

When the number of processes is increased to 128, the master-only I/O becomes the most expensive part of the code, with 22% of the runtime, while uzd only takes 19% of the time. Due to the serial nature of the master-only I/O and the increasing number of messages needed to gather all data, the I/O actually becomes slower at higher process counts.
Unfortunately, for process counts higher than 128, the ADI algorithm for the transport equation no longer converged within 50 iterations from timestep 192 onwards. Although the model does not crash, we decided not to investigate further due to the severe impact on both the performance and the scientific results.
Several scaling bottlenecks were identified:
1. postpr: master-only I/O

2. uzd, tritra: imbalance. MPI_Wait synchronizes neighbours: a process needs to wait for its slowest
neighbour to send the data. MPI_Allreduce is used to check for convergence at every timestep and
synchronizes all processes. From a test with an MPI_Barrier in front of the iterative ADI algorithm,
it is clear that there is no imbalance in the earlier part of the code. The processes with a higher fill
factor need more computation time.
3. tratur: imbalance

4. sud: Imbalance in the routine cucnp that causes waiting time at MPI_Barrier.
5. incbc: boundary conditions are defined per grid point. Boundary data are gathered and broadcast multiple times at every timestep. This routine takes up about 4% of the runtime with 96 tasks and will not scale due to its serial implementation.
This test case has a high fill factor for the processes at the left of the domain and a much lower fill factor at the right. The blocked placement of tasks onto nodes therefore puts all the slower tasks together on the first node, each competing for the limited memory bandwidth. The cyclic placement of tasks circumvents this and decreases the imbalance, which results in a 5% overall performance increase. The lower imbalance reduces waiting time for collective communication, which more than compensates for the longer nearest-neighbour communication. Another solution would be to undersubscribe the nodes, giving every process a second core to increase the available memory bandwidth per process. This improves performance relative to the blocked placement by over 20%, but it should be weighed against the performance of simply using twice as many processes, so it would not usually be advantageous. If energy usage is taken into account, this balance might shift.
Since the main scaling bottleneck for this particular benchmark is the I/O, it was attempted to implement a
separate I/O server by using the last MPI process only for the I/O. Unfortunately, the implementation is not yet
finished and therefore has not yet been tested.
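One possible way to set this up is sketched below, under the assumption that MPI_Comm_split is used to separate the last rank from the compute ranks; this is an illustration of the approach, not the unfinished Delft3D implementation.

    ! Sketch: dedicate the last MPI rank to I/O by splitting MPI_COMM_WORLD.
    subroutine split_io_server(compcomm, is_io_server)
      use mpi
      implicit none
      integer, intent(out) :: compcomm        ! communicator for the compute ranks
      logical, intent(out) :: is_io_server
      integer :: rank, nprocs, colour, ierr

      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      is_io_server = (rank == nprocs - 1)     ! last rank only does file output
      if (is_io_server) then
        colour = 1
      else
        colour = 0
      end if
      call MPI_Comm_split(MPI_COMM_WORLD, colour, rank, compcomm, ierr)

      ! The compute ranks run the flow solver on compcomm and send output
      ! fields to rank nprocs-1 of MPI_COMM_WORLD, which writes them to disk.
    end subroutine split_io_server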
Test Case: Sediment transport
The last test case includes sediment transport and morphology updates. Due to these extra processes, different
parts of the code are activated, which are highly compute-intensive. The computational domain is rectangular
with 243 grid points in each dimension. Since Delft3D uses 1D domain decomposition and each domain needs a
minimum number of columns, the maximum number of processes that we could use is 160.
For 16 processes on the Curie thin nodes, the CPU time is dominated by two routines:
1. erosed (compute sediment fluxes): 48%
2. bott3d (update depth due to changes in bottom sediment): 20%

3. adi (uzd+sud): 11%


4. tritra: 7%

5. taubot: 6%
again representing more than 90% of the computing time. I/O takes less than 1% for this test case.

[Chart: steps/min dp and perfect scaling plotted against the number of cores.]

Figure 5: Performance of Sediment transport benchmark on Curie thin nodes in timesteps/min. Perfect scaling is measured relative to 16
cores (1 node).

Figure 5 shows the performance of the Sediment transport benchmark as measured on the PRACE Tier-0 Curie
system. It can be seen that the benchmark does not scale well and levels off above 64 cores, although the performance does still increase up to 128 cores.
The routines erosed and bott3d call several functions for each grid-point with active sediment transport and
depth changes (bedbc1993, calseddf1993, bedtr1993, comparerealdouble, getsedthick_1point and
more). These functions are called tens of billions of times, even for these short benchmark runs. The overhead of calling these routines is significant, so it is important to make sure that they are inlined by the compiler. Because Delft3D is composed of several dynamic and static libraries, the compiler does not automatically inline functions from other libraries, or even functions within the same static library. For the Intel compiler this requires adding the optimization flag -ipo and replacing the archiver ar with xiar and the linker ld with xild. The resulting performance increase is about 7% when using 96 cores.
The sediment transport routines only exchange haloes for some of their variables; halo values for many other variables are recomputed locally by each process to reduce communication. This reinforces the conclusion that it is important to check that these often-called routines are correctly inlined.

Unfortunately, this particular benchmark is sensitive to the number of processes: the solution of the ADI solver depends on the process count, which cascades into changes in sediment transport and the resulting height changes, which in turn feed back into the circulation, even for short simulations like these. This affects the reliability of the results, although we do not know to what extent.
A cyclic placement of the processes across the nodes results in a 25% penalty for this benchmark, which shows
that this option is not a panacea and should only be used if there is an imbalance between the processes.

Conclusions
Delft3D is a complex application that is used in a broad range of real-world applications. Consequently, it
contains a large number of different modules, many of which can be activated separately. For this white paper
we have selected a representative set of benchmarks with different use patterns. The first benchmark clearly
shows that Delft3D can scale up to 1,000 cores on PRACE Tier-0 systems for suitably large problems. The
second benchmark has a domain with a realistic, irregular bathymetry that has a large fraction of inactive points.
Delft3D uses static load-balancing by assigning an equal number of active points to each domain. Together with
a cyclic placement of processes across the nodes, this works reasonably well. However, the scalability flattens around 100 cores, which can be attributed in large part to the I/O in this particular benchmark. An initial
implementation of a separate I/O server was created, but not yet tested. The third benchmark has a regular
domain and includes an extra module for sediment transport and depth changes. This module takes up about 70%
of the runtime. The benchmark shows little scaling potential beyond 64 cores. Several functions are called
billions of times and it is important to make sure that these are inlined by the compiler.

References
[1] http://oss.deltares.nl/web/opendelft3d/home

Acknowledgements
This work was financially supported by the PRACE project, funded in part by the EU's 7th Framework Programme (FP7/2007-2013) under grant agreement no. RI-283493.
