Multithreading with Deferred Contexts in DX11
X SBGames - Salvador - BA, November 7th - 9th, 2011
SBC - Proceedings of SBGames 2011 Tutorials Track - Computing

Abstract

In this tutorial we cover one of the innovations brought by DirectX11. Previous versions of DirectX did not support native multithreading, and most of the API was not thread-safe. To support a multithreaded renderer, the user needed to add mutexes to the code to avoid race conditions. Moreover, without native support the API could not properly manage render-state changes made by multiple threads. This kind of guarantee can also be an application requirement.

With the support of Deferred Contexts in DirectX11, the game engine can keep the submission thread from becoming the bottleneck of the application. One may now queue API calls in command lists, recorded on multiple threads, to be executed later. The API is now responsible for inter-thread synchronization for final submission to the GPU. The main goal behind multithreading is to use every cycle of the CPU and GPU without making the GPU wait, since waiting lowers the game frame rate.

The full set of DirectX 11 improvements is reviewed briefly, followed by an explanation of why API state must be synchronized for proper rendering. A considerable part of the course explains how to use Deferred Contexts and how to properly build command lists for later submission. The focus is on the comparison with previous APIs, highlighting the issues in previous versions that motivated the new DirectX11 improvements.

We then present code samples and cases that would see good performance improvements from adopting Deferred Contexts. While presenting the samples, the important parts of the code are discussed briefly at a high level of abstraction in order to consolidate the audience's understanding.

This tutorial is a sequel to the SBGames 2010 course entitled "Understanding Shader Model 5.0 with DirectX11" [Valdetaro et al. 2010]. That tutorial presented another set of innovations brought by DirectX11, namely the Tessellator pipeline.

Keywords: DirectX 11, Multithreaded, Deferred Context, Immediate Context, Shader Model 5

Author's Contact:

{rodrigo, alexandre, gustavo}@[Link]
bfeijo@[Link]
abraposo@[Link]

1 Introduction

One of the most important capabilities introduced in the DirectX 11 API concerns multithreading. The number of cores in PCs has increased significantly in the past few years, and developers have started to seek ways of spreading the computation of a game among the available cores. Tasks such as physics or AI could already use a parallel paradigm to take advantage of multiple cores, but rendering tasks were mostly done in a single thread. Although one could implement a multithreaded renderer with past rendering APIs (DirectX9 and DirectX10), a great deal of synchronization work had to be guaranteed by the application for it to function properly. The DirectX11 API was specifically designed to handle the synchronization issues of a multithreaded application, and the concept of a deferred context was created. With this new concept one may call many API functions in a thread-safe environment. In this tutorial our intent is to explain this new DirectX11 feature and to illustrate it with examples and cases where a multithreaded approach may improve rendering and loading performance for games and 3D applications.

2 DirectX11 Improvements

This section gives a quick review of the main improvements brought by DirectX11. Later we focus on DirectX11 deferred contexts, which are the main subject of this tutorial.

2.1 Compute Shader

The Compute Shader technology is also known as DirectCompute. It is the DirectX11 solution for GPGPU. One may find it easier to use than other solutions (e.g. CUDA, OpenCL) because of its tight integration with the DirectX11 API and because it adds no further dependencies to the project. With this technology programmers are able to use the GPU as a general-purpose processor. For GPGPU purposes it provides more control than the regular shader stages, for example through global shared memory. With the full parallel processing power of modern graphics hardware at hand, programmers can create new techniques that may assist existing rendering algorithms. For example, one may pass the image rendered to the bound render target to a compute shader for a post-processing effect.

2.2 Tessellation

The new graphics pipeline provides a way to adaptively tessellate a mesh on the GPU. This capability means trading a lot of CPU-GPU bus bandwidth for GPU ALU operations, which is a fair trade, as modern GPUs have massive processing power and bus bandwidth is a constant bottleneck.

Aside from this straightforward performance advantage, the Tessellator also enables faster dynamic computations such as skinning animation, collision detection, morphing and any per-vertex transform on a model. These computations are now faster because they use the pre-tessellated mesh, which is tessellated into a highly detailed mesh only later on. Another advantage of the Tessellator is the possibility of applying continuous level-of-detail to a model, which has always been a crucial issue in any rendering engine. For a detailed introduction to the Tessellation stage please refer to [Valdetaro et al. 2010].

2.3 Multithreading

When older Direct3D versions were released, there was no real focus on supporting multithreading, as multi-core CPUs were not so popular back then. However, with the recent growth in the number of CPU cores, there is an increasing need for a better way to control the GPU from a multithreaded scenario. DirectX11 addresses this matter with great care.

Asynchronous graphics device access is now possible through the DirectX11 device object, so programmers are able to make API calls from multiple threads. This feature is possible because of the improvements in synchronization between the device object and the graphics driver in DirectX11.

The DirectX11 device object can now have extra rendering contexts. The main immediate context that controls data flow to the GPU remains, but there are now additional deferred contexts that can be created as needed. Deferred contexts can be created on separate threads and issue commands to the GPU that will be
processed when the immediate context is ready to send a new task to the GPU.

3 Process and Thread

We briefly introduce the concepts of process and thread before starting with the DirectX11 API.

3.1 Process

A process is the structure responsible for maintaining all the information needed for the execution of a program. A process stores information about its hardware context, software context and address space. This information is important in a multitasking environment where many processes are being executed concurrently, because the system needs to know how to alternate between them without losing data. However, switching between processes is costly, so the concept of multiple threads within a single process is introduced. Each process is created with at least one execution thread, although more threads may be created for the same process.

3.2 Thread

A thread is a line of execution inside a process. Although threads have different hardware contexts, every line of execution inside a process has the same software context and shares the same memory space. As a result, the cost of exchanging information between threads is much lower than the cost of exchanging information between processes.

Figure 1: Multiple processes, each with a single thread

3.3 Multithreaded

In a multithreaded environment the process has at least one execution thread. That thread may share the address space with other threads, which can be executed concurrently, and quickly, when multiple processors are available. With this approach, computers with many cores can gain performance by executing tasks in parallel. However, one must consider how the threads access shared resources. This kind of control is necessary to prevent a thread from changing the data of a shared resource while another thread is still using the old data. This kind of guarantee is called thread safety.

3.4 Thread-safe

Thread safety is a commonly used technique in multithreaded applications. It ensures that a particular snippet of code, when executed by thread #1, does not interfere with the data shared with another thread #2. In other words, multiple threads can run concurrently with the assurance that they will not corrupt the data they have in common. Moreover, in an environment with multiple processors these threads can be executed simultaneously and not only concurrently.

4 Threading Differences between DirectX Versions

In DirectX9 and DirectX10 it was possible to set a multithreading flag that made some API methods thread-safe. However, even with those methods thread-safe, some synchronization issues still had to be respected by the application, and it was necessary to use synchronization solutions (such as mutexes) to make critical code sections thread-safe and prevent them from being accessed by more than one thread at a given time. Sometimes this synchronization overhead was so significant that the use of multiple threads with the previous rendering APIs was avoided completely.

The DirectX11 API has a built-in synchronization system that does not depend on the application. The runtime is responsible for synchronizing threads on behalf of the application, allowing them to run concurrently. This improvement makes the DirectX11 synchronization solution much more efficient than the thread-safe flags of previous DirectX versions.

5 Multithreading in DirectX11

In DirectX11 the use of the ID3D11Device interface is thread-safe. This interface may be called by any number of threads concurrently. Its main purpose is the creation of resources such as vertex buffers, index buffers, constant buffers, shaders, render targets, textures and more. With the ID3D11Device the application is also able to create an ID3D11DeviceContext, which is NOT thread-safe; one device context should be created for each core. Only one ID3D11DeviceContext is the Immediate Context. It is used by the main rendering thread, which is the thread that submits the rendering calls to the pipeline. The other ID3D11DeviceContext objects that might be created are called Deferred Contexts. They work by saving command lists that will later be executed by the main thread (Immediate Context). Please see Figure 3.

There are two main improvements that can be used with multiple threads in DirectX11: parallel resource creation and command list recording. The first is achieved by using the ID3D11Device from multiple threads. The latter, and most important, is achieved by using Deferred Contexts.
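The first improvement named in Section 5, parallel resource creation, relies on the device doing its own locking, so that loader threads need no mutex in application code. The sketch below models that idea in portable C++ and does not call Direct3D. `ModelDevice`, `CreateResource`, `ResourceCount` and `LoadInParallel` are hypothetical names introduced only for illustration, and the internal `std::mutex` is an assumption about how such a guarantee could be provided, not a description of the DirectX11 runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread-safe device. This is NOT ID3D11Device.
// The lock lives inside the class, so callers need no mutex of their own.
class ModelDevice {
public:
    // Registers a named resource and returns its handle (its index).
    std::size_t CreateResource(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        resources_.push_back(name);
        return resources_.size() - 1;
    }

    std::size_t ResourceCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return resources_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> resources_;
};

// Creates `perThread` resources on each of `threadCount` loader threads
// and returns how many resources the device holds afterwards.
std::size_t LoadInParallel(ModelDevice& device, unsigned threadCount,
                           unsigned perThread) {
    const std::size_t before = device.ResourceCount();
    std::vector<std::thread> loaders;
    for (unsigned t = 0; t < threadCount; ++t) {
        loaders.emplace_back([&device, t, perThread] {
            for (unsigned i = 0; i < perThread; ++i) {
                device.CreateResource("thread" + std::to_string(t) +
                                      "_res" + std::to_string(i));
            }
        });
    }
    for (std::thread& loader : loaders) {
        loader.join();
    }
    // No creation may be lost, whatever the interleaving of the threads.
    assert(device.ResourceCount() ==
           before + static_cast<std::size_t>(threadCount) * perThread);
    return device.ResourceCount();
}
```

Calling `LoadInParallel(device, 4, 250)` on a fresh `ModelDevice` leaves 1000 resources registered. The order in which the threads register them is not fixed, which is acceptable for resource creation but is exactly why rendering commands need the ordered command lists of the second improvement.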
the function shows, the device has one and only one immediate context, which can retrieve data from the GPU. However, in order to use deferred device contexts and asynchronous, free-threaded resource loading, we need to check whether driver support for them is available, so we use the following code after creating the device:
    D3D11_FEATURE_DATA_THREADING threadingFeature;
    device->CheckFeatureSupport(D3D11_FEATURE_THREADING,
                                &threadingFeature, sizeof(threadingFeature));
    if (threadingFeature.DriverConcurrentCreates &&
        threadingFeature.DriverCommandLists)
    {
        // Application code
    }
Figure 6: Sequence of execution [Jansen 2011]

    deferredContexts[threadNumber]->IASetInputLayout(vertexLayout);
    deferredContexts[threadNumber]->IASetPrimitiveTopology(
        D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    deferredContexts[threadNumber]->IASetVertexBuffers(0, 1, vertexBuffer, stride, 0);
    deferredContexts[threadNumber]->VSSetShader(vertexShader, NULL, 0);
    deferredContexts[threadNumber]->PSSetShader(pixelShader, NULL, 0);
    deferredContexts[threadNumber]->Draw(count, 0);

After finishing, it is time to put the rendering code into the command list:

    shadowImmCtx->OMSetRenderTargets(1, &shadowMapTexRTView, NULL);
    actorsDefCtx->OMSetRenderTargets(1, &outputTexRTView, NULL);
    actorsDefCtx->PSSetShaderResources(0, 1, &shadowMapTexSRView);
    AODefCtx->OMSetRenderTargets(1, &outputTexRTView, NULL);
    AODefCtx->PSSetShaderResources(0, 1, &outputTexSRView);

    // Begin the traversal through the spatial structure
    // (the arguments of the Draw calls are omitted in this listing)

    /* Pass 0: Renders the scene from the light's point of view
       onto the shadow map */
    shadowImmCtx->Draw();

    /* Pass 1: Renders the scene from the camera's point of view
       onto the outputTexture */
    actorsDefCtx->Draw();

    /* Pass 2: Renders the AO mask and blends it with the outputTexture */
    AODefCtx->Draw();
    // End the traversal

    // Execute Pass 1
    // ExecuteCommandList requires a second argument (RestoreContextState);
    // FALSE is assumed here.
    actorsDefCtx->FinishCommandList(FALSE, &actorCommandList);
    shadowImmCtx->ExecuteCommandList(actorCommandList, FALSE);

    // Execute Pass 2
    AODefCtx->FinishCommandList(FALSE, &AOCommandList);
    shadowImmCtx->ExecuteCommandList(AOCommandList, FALSE);
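The record-then-execute flow of the listing above can be modeled without Direct3D to make its ordering guarantee visible. The sketch below is portable C++ under stated assumptions: `ModelDeferredContext`, `ModelImmediateContext`, `CommandList` and `RenderFrame` are hypothetical stand-ins rather than the D3D11 interfaces, and a "draw" merely appends a label to a log. It shows passes 1 and 2 being recorded on worker threads while pass 0 is issued immediately, and the recorded lists then being played back in pass order on the immediate context.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins. These are NOT the D3D11 interfaces.
using Log = std::vector<std::string>;
using Command = std::function<void(Log&)>;
using CommandList = std::vector<Command>;

// Records commands without running them. Like a deferred context, one
// instance must only be used by one thread at a time.
class ModelDeferredContext {
public:
    void Draw(const std::string& label) {
        recorded_.push_back([label](Log& log) { log.push_back(label); });
    }

    // Hands over everything recorded so far and resets the context.
    CommandList FinishCommandList() {
        CommandList finished = std::move(recorded_);
        recorded_.clear();
        return finished;
    }

private:
    CommandList recorded_;
};

// Runs commands at once, in call order, like the immediate context.
class ModelImmediateContext {
public:
    void Draw(const std::string& label) { log_.push_back(label); }

    void ExecuteCommandList(const CommandList& list) {
        for (const Command& command : list) {
            command(log_);
        }
    }

    const Log& Executed() const { return log_; }

private:
    Log log_;
};

// Models one frame of the three-pass example and returns the execution log.
Log RenderFrame() {
    ModelImmediateContext immediate;
    ModelDeferredContext actors;
    ModelDeferredContext ambientOcclusion;

    // Passes 1 and 2 are recorded on worker threads, each thread owning
    // its context, while pass 0 is issued directly on this thread.
    std::thread actorsThread([&actors] { actors.Draw("pass1:actors"); });
    std::thread aoThread([&ambientOcclusion] {
        ambientOcclusion.Draw("pass2:ao");
    });
    immediate.Draw("pass0:shadow");
    actorsThread.join();
    aoThread.join();

    // Playback order is fixed here, not by the order of recording.
    immediate.ExecuteCommandList(actors.FinishCommandList());
    immediate.ExecuteCommandList(ambientOcclusion.FinishCommandList());
    assert(immediate.Executed().size() == 3);
    return immediate.Executed();
}
```

Whichever worker thread finishes recording first, `RenderFrame()` returns the labels in the order pass 0, pass 1, pass 2, because only the single-threaded playback step touches the shared log.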