Nvidia Quadro Vdws App Sizing Guide For Esri Final
APPLICATION GUIDE
Ver 1.0
NVIDIA QUADRO VIRTUAL DATA CENTER WORKSTATION SIZING GUIDE FOR ESRI ARCGIS PRO
EXECUTIVE SUMMARY
This document provides insights into how to deploy NVIDIA® Quadro® Virtual Data Center Workstation
(Quadro vDWS) software for Esri ArcGIS Pro users. Recommendations are based on actual customer
deployments and benchmarking data and cover three common questions:
How do I select the right profile(s) for the types of users I will have?
Since user behavior varies and is a critical factor in determining the best GPU and profile size,
recommendations are made for three user types along with two levels of quality of service (QoS) for
each user type: Dedicated Performance and Typical Customer Deployment. User types are segmented
as either light, medium or heavy based on type of workflow and the size of the model/data they are
working with. Users with more advanced graphics requirements and using larger data sets are
categorized as heavy users, for example. Light and medium users require less graphics and typically
work with smaller model sizes. Recommendations for each of those users within each level of service
along with the server configuration are shown below.
The vGPU profiles listed in the Dedicated Performance and Typical Customer Deployment tables below
were created by first understanding the graphics performance of a Quadro workstation GPU
(for example, Quadro P2000). The benchmark scores of the physical workstation card were then aligned
with the scores produced by the virtual GPU. Note that the Dedicated Performance table
is based upon the Equal Share scheduler and does not oversubscribe the GPU compute engine, resulting
in the same GPU performance at all times. Similar to vCPU to physical core oversubscription, many
virtual GPUs can share the same physical GPU compute engine. The GPU compute engine can be
oversubscribed by selecting the Best Effort GPU scheduler policy, which makes the best use of the GPU
during idle and underutilized periods. In many customer deployments, it is unlikely that 12 users will
execute rendering requests simultaneously, or to the degree replicated in dedicated performance
testing. Selecting the best effort scheduler therefore often results in a 2X to 3X oversubscription
of the GPU compute engine, which supports two to three times the number of users.
The degree to which higher scalability is achieved depends on the typical day-to-day activities of
your users, such as the number of meetings, the length of lunch and other breaks, and multitasking. It
is recommended that you test and validate the appropriate GPU scheduling policy to meet the needs of
your users.
DEDICATED PERFORMANCE *
User type    Users per Server
Light        12
Medium       6
Heavy        3
These recommendations are meant to be a guide. The most successful customer deployments start
with a proof of concept and are “tuned” throughout the lifecycle of the deployment. Beginning with a
POC enables customers to understand the expectations and behavior of their users and optimize their
deployment for the best user density while maintaining required performance levels. Continued
maintenance is important because user behavior can change over the course of a project and as the
role of an individual changes in the organization. A GIS Analyst that was once a light graphics user
might become a heavy graphics user when they change teams or are assigned a different project.
Management and monitoring tools enable administrators and IT staff to ensure their deployment is
optimized for each user.
This guide focuses on 3D visualization as well as spatial analytical tools which execute compute and
deep learning inferencing within ArcGIS Pro 2.2.
Esri works closely with NVIDIA to certify the deployment of ArcGIS Pro in the cloud using VDI with
NVIDIA Quadro vDWS software. VDI certification eliminates the need to install ArcGIS Pro on a local
client, which can help reduce IT support and maintenance costs and enables greater mobility and
collaboration. This virtual workstation deployment option enhances flexibility and further expands the
wide variety of platform choices available to Esri customers.
NVIDIA Quadro is the world’s preeminent visual computing platform, trusted by millions of creative and
technical professionals to accelerate their workflows. With Quadro vDWS software, you can deliver the
most powerful virtual workstation from the data center. This frees your most innovative professionals
to work from anywhere and on any device, with access to the familiar tools they trust. Certified with
over 140 servers and supported by every major public cloud vendor, Quadro vDWS is the industry
standard for virtualized enterprises.
To deploy an NVIDIA vGPU solution for Esri ArcGIS Pro, you will need NVIDIA GPUs and a Quadro vDWS
software license for each user.
The Esri PerfTools Add-in for ArcGIS Pro allows gathering of rendering metrics during benchmarking.
Esri provided their “3D Philly” dataset and project file which contained 10 spatial bookmarks. PerfTools
scripting ran the application through 10 spatial bookmarks with a 10 second think time. This allows the
benchmark to more closely mimic a user pause between interactions within the application. An
automated scripting framework enabled the test to be scaled out to multiple virtual desktops. Esri
applies a frame rate limiter for ArcGIS Pro. By default, the application limits frames to 60 frames per
second. This application sizing guide uses the default render settings for ArcGIS Pro which includes
utilizing the DirectX rendering engine.
The following test metrics are reported by the PerfTools add-in for each virtual machine and then
analyzed:
• Draw Time Sum: The total time elapsed for all of the benchmarks to fully draw. Less
time means a better user experience (UX); more time means a worse UX.
• Frames Per Second (FPS): Esri stated that most users perceive 30 FPS as a good
UX; 60 FPS is optimal, but most users do not see a significant difference.
• FPS Minimum: Esri stated that a drop below 5-10 FPS would make it appear to an end user that
the drawing had stopped or “frozen”.
• Standard Deviation: This measures how far individual test results deviate from the
average. A high value typically indicates a faulty test or that scalability thresholds
have been exceeded. Values should be < 4 for 3D workloads.
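As an illustration of how the metrics above might be checked for one virtual machine, here is a minimal Python sketch. The thresholds come from the list above; the input layout (a list of draw times and a list of FPS samples), the function name, and the way the checks are combined into a single verdict are assumptions made for this example and are not part of the PerfTools add-in.

```python
from statistics import mean, pstdev

# Thresholds quoted in the guide. Combining them into one "acceptable"
# flag is an assumption made for illustration.
GOOD_FPS = 30.0      # most users perceive 30 FPS as a good experience
FROZEN_FPS = 10.0    # below 5-10 FPS the drawing appears frozen
MAX_STD_DEV = 4.0    # values should be < 4 for 3D workloads


def summarize_run(draw_times_s, fps_samples):
    """Summarize one VM's benchmark run (hypothetical input layout)."""
    summary = {
        "draw_time_sum": sum(draw_times_s),    # seconds, lower is better
        "fps_avg": mean(fps_samples),
        "fps_min": min(fps_samples),
        "fps_std_dev": pstdev(fps_samples),    # spread around the average
    }
    summary["acceptable"] = (
        summary["fps_avg"] >= GOOD_FPS
        and summary["fps_min"] >= FROZEN_FPS
        and summary["fps_std_dev"] < MAX_STD_DEV
    )
    return summary
```

A run whose average FPS is high but whose minimum dips below the "frozen" threshold is flagged as not acceptable, which mirrors how the guide treats FPS Minimum as a separate metric.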
Deep Learning Inferencing tests execute the Detect Objects Using Deep Learning tool, which is
available with the Image Analyst license. The test dataset is from OpenAerialMap and is based
upon Esri’s Learn ArcGIS Deep Learning lesson. Test metrics were reported as Total Execution
Time within the ArcGIS Pro output window. The tests used TensorFlow, which was configured to run
on the GPU. For more information on how to configure TensorFlow to run on the GPU, please refer
to Appendix C.
3. Spatial Analysis (CUDA Compute) – Source: California– Shuttle Radar Topography Mission NASA L
Spatial Analysis tests execute the Calculate ViewShed2 tool, which is available with the Spatial
Analyst license. The test dataset was downloaded from the NASA Shuttle Radar Topography Mission
website and was geographically limited to the State of California. Test metrics were reported
as Total Execution Time within the ArcGIS Pro output window. The tests used CUDA to execute the
processing on the GPU. For more information on how to configure 3D Analyst tools to run on the
GPU, please refer to Appendix D.
The GPU Profiler was used as a tool for evaluating GPU/CPU utilization rates during each of the three
aforementioned benchmarks. ArcGIS Pro demonstrates a great balance between CPU and GPU
resources.
Table 7. Common user types for ArcGIS Pro and application licensing description
The following table aligns the three Esri ArcGIS Pro benchmarks to user type and ArcGIS Pro licensing.
FINDINGS
To determine the optimal configuration of Quadro vDWS for Esri ArcGIS Pro, both user performance and
scalability were considered based on data from industry benchmarks as well as insights from customer
best practices.
The following tables summarize the recommended configurations based on benchmarking data and
customer best practices. These recommendations take into account the performance requirements for
different user types as well as optimizing for scale, or user density, on the server to achieve the best
total cost of ownership. The performance of the equivalent physical Quadro workstation card was also
measured and then analyzed. A 10% threshold was used to align the equivalent physical Quadro
workstation card with the reported VDI performance score.
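The 10% alignment described above can be read as a relative-difference check. The sketch below assumes that interpretation (the guide does not spell out the formula), and the function name and the scores in the usage note are hypothetical.

```python
def within_threshold(vgpu_score, physical_score, threshold=0.10):
    """Return True if the vGPU score is within `threshold` of the physical card.

    Assumes the threshold is a relative difference measured against the
    physical workstation score; the guide does not define it further.
    """
    if physical_score <= 0:
        raise ValueError("physical_score must be positive")
    relative_difference = abs(vgpu_score - physical_score) / physical_score
    return relative_difference <= threshold
```

For example, a hypothetical vGPU score of 95 against a physical score of 100 would be considered aligned, while a score of 85 would not.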
The dedicated performance table illustrates recommendations based upon the fixed share scheduler,
which provides the most consistent dedicated performance at all times. However, most customer
deployments typically select the best effort GPU scheduler policy to achieve better utilization of the
GPU, which usually results in supporting more users per server and better TCO per user. It is important
to keep the scheduling policy options in mind when comparing the two tables to one another.
For more on the GPU scheduling options, refer to Deployment Best Practices, Section 5 below.
The T4 GPU is a single width, half height form factor and requires less power than other GPUs, allowing
it to be powered via the standard PCIe bus. This results in a high-density solution accommodating up to
six T4 GPUs per 2 rack unit server. Esri ArcGIS Pro benchmark results show that six T4 GPUs in a server
configured with two Intel Xeon Gold 6154 CPUs is a well-balanced configuration for ArcGIS Pro.
Based on benchmark results, there are enough CPU resources available to host six T4 GPUs in a single 2
rack unit (RU), 2-socket server running Esri ArcGIS Pro on 12 virtual machines.
heavy users that require the additional performance of a Tesla P40 over a T4, but the P40 does not
offer the enhanced Turing generation benefits. The Tesla P40 is a dual slot card, which allows for up to
three GPUs to be installed in many 2RU, 2-socket servers.
A typical configuration for non-persistent virtual machines is to use the direct attached storage (DAS)
on the server in a RAID 5 or RAID 10 configuration. For persistent virtual machines, a high performing
all-flash storage solution is the preferred option.
As with the recommended best practice, which is optimized for both performance and user density,
NVIDIA T4 is recommended for both light and medium users. For heavy users, the Tesla P40 or RTX
6000 is recommended. We also recommend that a larger profile size be used, a T4-8Q for light users,
T4-16Q for medium users and P40-24Q or RTX6000-24Q for heavy users. As a result, fewer users can be
supported on each server. If only performance is important, it is recommended that the fixed share
scheduler is utilized.
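The relationship between profile size and users per server can be sketched with simple arithmetic. The example assumes the frame buffer sizes of the physical cards (16 GB for the T4 and 24 GB for the Tesla P40) and the GPU counts per server mentioned in this guide (six T4 or three P40); the function name is hypothetical, and the result is an upper bound set by frame buffer only, not a guarantee that CPU or other host resources can sustain that many users.

```python
def users_per_server(gpu_frame_buffer_gb, profile_gb, gpus_per_server):
    """Upper bound on vGPU-backed users per server, limited by frame buffer only."""
    if profile_gb <= 0 or profile_gb > gpu_frame_buffer_gb:
        raise ValueError("profile size must be between 1 GB and the GPU frame buffer")
    vgpus_per_gpu = gpu_frame_buffer_gb // profile_gb
    return vgpus_per_gpu * gpus_per_server


# Dedicated performance profiles named in this guide.
dedicated = {
    "light (T4-8Q)": users_per_server(16, 8, 6),
    "medium (T4-16Q)": users_per_server(16, 16, 6),
    "heavy (P40-24Q)": users_per_server(24, 24, 3),
}

for user_type, users in dedicated.items():
    print(f"{user_type}: {users} users per server")
```

The same arithmetic applied to a T4-4Q profile gives 24 users on a six-T4 server, which is the frame buffer ceiling for that profile.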
This configuration for “performance-only” is based on running ArcGIS Pro across all virtual machines
since it shows the impact of a peak workload on all resources of the server, including CPU, memory,
GPU, and network, to best architect the solution. The dedicated performance data in this application
sizing guide shows benchmarks running at scale.
Tests are simultaneously executed on all virtual machines with minimal pauses or idle time. This
workflow is not typical in a true production environment but provides a methodology for assessing
dedicated performance during these worst-case scenarios.
Table 5. Comparison of VMs per GPU performance utilization based on Dedicated Performance vs. Best Effort Configurations
We highly recommend a proof of concept (POC) is run prior to doing a full deployment to provide a
better understanding of how your users work and how many GPU resources they really need, analyzing
the utilization of all resources, both physical and virtual. Consistently analyzing resource utilization
and gathering subjective feedback allows for optimizing the configuration to meet the performance
requirements of end users while optimizing the configuration for best scale.
To identify bottlenecks of individual end users or of the physical GPU serving multiple end users,
execute the following nvidia-smi commands on the hypervisor.
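The specific commands are not reproduced in this text. As one possible approach, and not necessarily the commands the guide intended, the sketch below polls nvidia-smi through its CSV query interface and parses the result in Python. The field selection and the function names are assumptions for this example; the parser is kept separate so that it can also be applied to output captured on the hypervisor and analyzed elsewhere.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; this selection is an assumption.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(text):
    """Convert a numeric field, returning None for values such as [N/A]."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse output of --query-gpu with --format=csv,noheader,nounits."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        index, name, util, used, total = (field.strip() for field in record)
        rows.append({
            "index": _to_int(index),
            "name": name,
            "gpu_util_pct": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi on the local host and return one dict per physical GPU."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Sampling `query_gpus()` at a regular interval during a POC gives a per-GPU utilization and frame buffer history that can be compared across hosts.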
Another benefit of performing a POC prior to deployment is that it enables more accurate
categorization of user behavior and GPU requirements for each virtual workstation. Customers often
segment their end users into user types for each application and bundle similar user types on a host.
Light users can be supported on a smaller GPU and smaller profile size, while heavy users require more
GPU resources and a large profile size, and may be best supported on a larger GPU.
The graphic below demonstrates how workflows processed by end users are typically interactive, which
means there are multiple short idle breaks when users require less performance and resources from the
hypervisor and NVIDIA vGPU.
Table 8. Comparison of the Esri ArcGIS Pro benchmarks utilization versus a typical end user workflow
NVIDIA used a custom-designed benchmarking engine to conduct vGPU testing at scale. This
benchmarking engine automates the testing process from provisioning virtual machines, establishing
remote connections, executing the benchmark, and analyzing the results across all virtual machines.
Dedicated performance scores mentioned in this application guide are based on benchmark data which
was run in parallel on all virtual machines with scores averaged across three runs.
1) Fixed share scheduling guarantees the same dedicated quality of service at all times.
2) Best effort scheduling [1] provides consistent performance at a higher scale and therefore reduces
the TCO per user.
3) Equal share scheduling provides equal GPU resources to each running VM. As vGPUs are added or
removed, the share of GPU processing cycles allocated to each changes accordingly, so performance
increases when utilization is low and decreases when utilization is high.
Organizations typically select the best effort GPU scheduler policy for their deployment to achieve
better utilization of the GPU, which usually results in supporting more users per server with a lower
quality of service (QoS) and better TCO per user.
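The difference between the three policies can be illustrated with a deliberately simplified model of how much of a physical GPU one busy vGPU can use. This is a sketch for intuition only, not NVIDIA's scheduler implementation: it assumes fixed share divides the GPU by the maximum number of vGPUs the profile allows, equal share divides it by the number of running vGPUs, and best effort divides it among the vGPUs that currently have work. The function and parameter names are hypothetical.

```python
def gpu_share(policy, max_vgpus, running_vgpus, busy_vgpus):
    """Approximate fraction of GPU cycles available to one busy vGPU.

    max_vgpus:     most vGPUs the profile allows on the physical GPU
    running_vgpus: vGPUs currently powered on
    busy_vgpus:    running vGPUs that are actively submitting work
    """
    if not 1 <= busy_vgpus <= running_vgpus <= max_vgpus:
        raise ValueError("expected 1 <= busy <= running <= max")
    if policy == "fixed":
        return 1.0 / max_vgpus        # same share regardless of load
    if policy == "equal":
        return 1.0 / running_vgpus    # changes as VMs start and stop
    if policy == "best_effort":
        return 1.0 / busy_vgpus       # idle VMs give up their time
    raise ValueError(f"unknown policy: {policy}")
```

With four vGPUs possible, two running and only one busy, the model gives 0.25 for fixed share, 0.5 for equal share and 1.0 for best effort, which is why best effort tends to support higher density when users are frequently idle.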
The example below demonstrates the different numbers of users per server that can be reached by
applying different QoS thresholds via GPU scheduling policies. Choosing the fixed share scheduler
always guarantees a particular QoS. In this example, two users on a T4 will always experience
performance similar to a workstation with a Quadro P2200 GPU. The best effort scheduler, which is
the most commonly chosen GPU scheduling option for enterprises, does not provide the same level
of QoS. It could allow more users to experience Quadro P2200-level performance, but user performance
will vary depending on the load from other users on the same T4 at any given time. A single user on a
T4 will experience performance similar to a Quadro P4000, but as density increases to 3-4 users per
GPU, performance can be similar to a workstation with a Quadro P620 card. The example assumes
sufficient frame buffer at all scales in order to demonstrate how GPU scheduling policies can impact
scale.
[1] Best effort scheduling has been available since 2013, when NVIDIA virtual GPU technology was first
introduced.
Table 9. T4 user density with Fixed Share Scheduler versus Best Effort Scheduler
The fixed share scheduling policy guarantees equal GPU performance across all vGPUs sharing the
same physical GPU. Dedicated quality of service simplifies a POC since it allows common
benchmarks used to measure physical workstation performance, such as SPECviewperf, to be used to
compare performance with current physical or virtual workstations.
The best effort scheduler uses a round-robin scheduling algorithm that shares GPU resources
based on actual demand. Because it makes use of the GPU during idle and underutilized periods, it
delivers consistent performance and a good QoS at optimized user density.
The table below shows that when using the best effort GPU scheduling policy, performance for an
individual user that shares a GPU with other users can be as good as having a dedicated GPU, if the
other end users aren’t executing GPU intensive tasks in parallel.
For details on the NVIDIA test environment used for this report, refer to the Appendix.
SUMMARY
When sizing a Quadro vDWS deployment for Esri ArcGIS Pro, NVIDIA recommends conducting a POC and
fully analyzing resource utilization using objective measurements and subjective feedback. The best
effort scheduler option is recommended for enterprise deployments, and user density will be
dependent on the hardware configuration and user types.
To see how you can virtualize Esri ArcGIS Pro using Quadro vDWS software, try it for free. Or learn
more about Quadro vDWS software.
APPENDIX A
NVIDIA TEST ENVIRONMENT
APPENDIX B
The recommended users per server within the Dedicated Performance table is limited to 12 light users
per server because host CPU utilization reached 100% at the beginning of the 24 VM test execution. The
following graph shows host CPU utilization at a scale of 24 VMs. This spike in host CPU utilization
illustrates that the host was CPU bound and, as a result, rendering times within the VMs increased by
more than 25%. The host GPUs were not a bottleneck during the 24 VM test. The following graphs
illustrate that all six T4 GPUs were heavily, and optimally, utilized during test execution:
The following graph provides a baseline for the benchmark dataset by showing the GPU and vRAM
utilization on a single T4-8Q VM. Keep in mind that the dataset used within the 3D multi-patch
rendering benchmark is less complex and smaller (less than 1.5GB on disk) than a typical
production-level dataset. It is highly recommended to baseline your own production datasets using the
GPU Profiler tool in order to understand their frame buffer requirements. Doing so will let you
determine whether your production datasets will fit within the 8GB frame buffer for dedicated
performance.
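Once a peak frame buffer figure has been measured for a production dataset, checking it against the available profile sizes is straightforward. The sketch below uses the T4 Q-series profiles named in this guide (T4-4Q, T4-8Q, T4-16Q); the 10% headroom and the function name are assumptions chosen for illustration, not a recommendation from the guide.

```python
# T4 Q-series profiles mentioned in this guide, with frame buffer in GB.
T4_Q_PROFILES_GB = {"T4-4Q": 4, "T4-8Q": 8, "T4-16Q": 16}


def smallest_fitting_profile(peak_vram_gb, headroom=0.10):
    """Return the smallest T4 profile that holds the measured peak usage.

    headroom is an assumed safety margin on top of the measured peak.
    Returns None when no listed profile is large enough.
    """
    needed_gb = peak_vram_gb * (1 + headroom)
    for name, size_gb in sorted(T4_Q_PROFILES_GB.items(), key=lambda item: item[1]):
        if size_gb >= needed_gb:
            return name
    return None
```

A dataset that peaks at 7.5 GB, for instance, would be pushed to the 16 GB profile under this margin, while one that exceeds every listed profile returns None and points to a larger GPU.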
Using the information from the two aforementioned graphs (host CPU and VM GPU utilization rates),
the Dedicated Performance recommendations ensure that the host CPU is not a bottleneck and that the VM
has plenty of frame buffer and GPU compute to deliver optimal performance to ArcGIS Pro.
Throughout this guide, it is noted that benchmarks, like the three used for Esri ArcGIS Pro, have
some limitations. The benchmarks do not account for the times when the system is not fully utilized,
which the hypervisor and the best effort scheduling policy can leverage to achieve higher user
densities with consistent performance. For this reason, the Typical Customer Deployment table shows
that customers have achieved 16-24 users per server using the T4-4Q profile. We highly recommend that
a proof of concept (POC) be run prior to a full deployment to provide a better understanding of how
your users work and how many GPU resources they really need, analyzing the utilization of all
resources, both physical and virtual. Consistently analyzing resource utilization and gathering
subjective feedback allows for optimizing the configuration to meet the performance requirements of
end users while optimizing the configuration for best scale.
APPENDIX C
Configuring TensorFlow to execute on GPU.
Esri ArcGIS Pro can use TensorFlow to execute Deep Learning. To run TensorFlow on a GPU, you will
first need to install the NVIDIA GPU drivers, the CUDA Toolkit and the cuDNN SDK.
NOTE: If TensorFlow was previously added within the ArcGIS Python Package Manager, you
will need to uninstall it first. Once TensorFlow (TensorFlow-base and Tensorboard) has been
uninstalled from the Python Package Manager, you can verify the uninstall by typing import
tensorflow as tf within the Python command window. If the error reports that no module named
tensorflow was found, then the uninstall was successful.
NOTE: Unselecting the Display Driver will keep the virtualization host/guest driver in sync
which is essential for a successful deployment of vGPU.
4. Open the Python command prompt and verify the install by typing nvcc -V
NOTE: This is not a traditional installer file. The download link will contain a zip file with
several folders, each containing the cuDNN files (1-Dll, 1-header and 1-library).
a. Locate your CUDA installation (should be something like C:\Program Files\NVIDIA GPU
Computing Toolkit\CUDA\v10.1).
NOTE: The directories within the CUDA installation match those within the zip
file: bin, include, and lib.
b. Copy the files from the zip to the relevant directory. For example, cudnn64_7.dll into
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin. Do the same for the
other files.
c. Add the CUDA and cuDNN installation directories to the %PATH% environment variable.
6. To create an environment within ArcGIS Pro, open the Python backstage, click the Manage
Environments button, select an environment and click the environment’s Clone button.
7. Add Tensorflow-gpu and Tensorflow-gpu-base Packages within the Python Package Manager in
ArcGIS Pro.
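The manual cuDNN file copy described in the steps above can also be scripted. The following is a minimal sketch, assuming the zip has already been extracted and that its bin, include, and lib folders mirror the CUDA installation as the note states. The function name and the paths in the usage note are hypothetical, and writing under C:\Program Files normally requires an elevated prompt.

```python
import shutil
from pathlib import Path


def copy_cudnn_files(cudnn_root, cuda_root, subdirs=("bin", "include", "lib")):
    """Copy extracted cuDNN files into the matching CUDA subdirectories.

    Relative paths are preserved, so a file under lib\\x64 in the extracted
    archive lands under lib\\x64 in the CUDA installation.
    """
    cudnn_root = Path(cudnn_root)
    cuda_root = Path(cuda_root)
    copied = []
    for subdir in subdirs:
        source_dir = cudnn_root / subdir
        if not source_dir.is_dir():
            continue  # this archive does not contain that folder
        for source in source_dir.rglob("*"):
            if source.is_file():
                destination = cuda_root / source.relative_to(cudnn_root)
                destination.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source, destination)
                copied.append(destination)
    return copied
```

A call might look like `copy_cudnn_files(r"C:\Temp\cudnn\cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1")`, where the first path is a hypothetical extraction location.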
TensorFlow has now been successfully configured to run on the GPU when executing Esri Deep Learning
Tools within ArcGIS Pro.
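One way to confirm the result from the Python command window is to ask TensorFlow which devices it can see. This is a minimal sketch rather than an Esri-provided check; the function name and the returned status strings are made up for this example. It uses tf.config.list_physical_devices where the installed TensorFlow provides it and falls back to tf.test.is_gpu_available on older releases.

```python
import importlib


def tensorflow_gpu_status():
    """Report whether TensorFlow is importable and whether it can see a GPU.

    Returns one of "not-installed", "cpu-only" or "gpu".
    """
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # Also the expected outcome right after uninstalling TensorFlow.
        return "not-installed"

    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:
        return "gpu" if list_devices("GPU") else "cpu-only"

    # Older TensorFlow releases expose this helper instead.
    return "gpu" if tf.test.is_gpu_available() else "cpu-only"


print(tensorflow_gpu_status())
```

A result of "cpu-only" after following the steps above usually points to a CUDA or cuDNN version or %PATH% mismatch worth rechecking.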
APPENDIX D
Configuring 3D Analyst tools to execute on GPU.
Esri ArcGIS Pro can use CUDA to execute Spatial Analysis tools on the GPU. For more information
regarding which tools currently support GPU processing, please refer to the Esri ArcGIS Pro
documentation.
3. Edit the System variable to use the GPU; a value of 0 will use the GPU.
4. Optional: TDRDelay, a timeout value, may prematurely time out GPU CUDA processing unless the
timeout is set under the following registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
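To check whether a timeout has already been set, the value can be read with Python's standard winreg module. This is a minimal read-only sketch, assuming the value is stored under the name TdrDelay in the key above; the function name is hypothetical, and the guide's recommended timeout is not reproduced in this text, so no value is written here.

```python
import sys

# Subkey of HKEY_LOCAL_MACHINE named in the guide.
GRAPHICS_DRIVERS_KEY = r"System\CurrentControlSet\Control\GraphicsDrivers"


def read_tdr_delay():
    """Return the TdrDelay value in seconds, or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GRAPHICS_DRIVERS_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return int(value)
    except FileNotFoundError:
        return None  # key or value is not present, so the Windows default applies


print(read_tdr_delay())
```

A result of None on a Windows guest means no explicit timeout has been configured for that virtual machine.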
©2019 NVIDIA Corporation. All rights reserved. NVIDIA, NVIDIA Quadro, the NVIDIA logo, and Tesla are trademarks and/or registered trademarks of
NVIDIA Corporation. All company and product names are trademarks or registered trademarks of the respective owners with which they are
associated. AUGUST 2019