Vulkan 101
Tom Olson
Director, Graphics Research, ARM
Chair, Vulkan Working Group
© Copyright Khronos Group 2016 - Page 8
What is Vulkan?
• A 3D graphics API for the next 20 years
- Logical successor to OpenGL / OpenGL ES
- Modern, efficient design
- An open, industry-controlled standard
• Here, now
- Released in February 2016
- Available today for Windows / Linux
- Shipping in Samsung Galaxy S7
- Support announced in Android ‘N’
• Different!
- Fundamental change in philosophy
- Requires corresponding changes in applications
Why did we do this?
• Traditional APIs had issues…
• Developers weren’t happy
[Link]
[Link]
Problems with OpenGL / OpenGL ES
• Programming model doesn’t match GPU HW
- Especially in mobile
- Driver magic hides the mismatch
• CPU intensive
- Lots of state validation, dependency tracking
• Complex, buggy, unpredictable drivers
- Different bugs and fast-paths on every GPU
• Fundamentally single-threaded
- Can’t use multi-core CPUs effectively
• …not to mention twenty years of legacy cruft
Enter Vulkan…
• Design discussions start in October 2012
• Moves into high gear in July/August 2014
- Commitment from key ISVs
- AMD donation of Mantle
• A lot of very hard work follows…
• Release to public in February 2016
- Conformant drivers from four IHVs
- GLSL to SPIR-V compiler
- Debug and validation tools
Vulkan in one slide
Resources (textures, buffers)
Memory
Instance Device
Queues
Command Buffers
Vulkan in two slides
[Diagram: a Command Buffer containing Copy and Sync commands and a Render Pass with two Draw Calls, each with its Pipeline, Shaders, and Descriptor Sets. Speaker names mark who covers each part: Andrew, Neil / Hans-Kristian, Tobias, Michael, Jesse.]
The principle of Explicit Control
• You promise to tell the driver
- What you are going to do
- In sufficient detail that it doesn’t have to guess
- When the driver needs to know it
• In return, driver promises to do
- What you asked for
- When you asked for it
- Very quickly
• No driver magic!
OpenGL lets you specify important information very late, and change it at any time. It’s convenient, but has huge performance costs.
OpenGL drivers often defer work until later, move it to another thread, or even ignore your commands, based on guesses about your intent. Vulkan drivers won’t.
Loader, layers, and extensions
• Vulkan has no dependencies on external APIs
- ICD loader is built-in
- Window system binding is (semi) built-in
• A side benefit: Layers
- Loader can install intercept libraries (“layers”)
- E.g. trace, debug
• Extensions
- Must be enabled at initialization time
Multithreading
• All objects visible / accessible to all threads
• Most operations are externally synchronized
- Application must prevent unsafe concurrent access
- E.g., recording to the same command buffer
- E.g., submitting to the same queue
- Application must manage object lifetimes
- Note, many objects are immutable
- Concurrent read access is OK
• Allocation / creation are internally synchronized and may block
- Per-thread pool allocators keep this reasonably cheap
Error handling
• Vulkan is optimized for correct applications
- Does not (generally) check for invalid usage
- Does not track dependencies
- Does not (generally) provide thread safety
- Breaking the rules results in undefined behavior
• Vulkan does check for errors you can’t predict
- Out of memory
- Device lost
- Other system errors…
• Layers to the rescue!
- Can enable validation layers during development
Community
• A new attitude
- ISV member input drove key decisions
- Consulted with hundreds of developers
• Strong commitment to open source
- Loader
- Validation and other layers
- SPIR-V tools: compiler, validator, …
- Conformance tests
- Specification
• All at [Link]
Should you be using Vulkan?
• Challenges
- Verbose and complex
- Lots of exposed sharp edges
- Lots to learn
• Opportunities
- Much lower driver overhead
- …which you can spread across multiple threads
- More predictable performance
- Mobile friendly
• Realities
- Ecosystem is still immature
- Will need to ship GL/DX versions for years to come
Command Buffers and Pipelines
Michael Worcester – Driver Engineer
([Link]@[Link])
26 May 2016 [Link]
Command Buffers – Deferring the work
OpenGL is immediate (ignoring display lists)
Driver does not know how much work is incoming
Has to guess
Bad!
Vulkan splits recording of work from submission of work
Removes guesswork from driver
Reducing hitching
Helps eliminate unexplained resource usage
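The split between recording and submission can be sketched with the C++ standard library alone. This is a toy model, not the Vulkan API: `RecordedCommands` and `record_then_submit` are hypothetical names, and an `int` stands in for GPU state. Recording only stores work; nothing runs until submission, and the same recording can be submitted again.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a command buffer: recording only stores work.
class RecordedCommands {
public:
    void record(std::function<void(int &)> command) {
        commands_.push_back(std::move(command));
    }

    std::size_t size() const { return commands_.size(); }

    // "Submission": run everything that was recorded, in order.
    void submit(int &state) const {
        for (const auto &command : commands_) command(state);
    }

private:
    std::vector<std::function<void(int &)>> commands_;
};

// Records three commands, checks that nothing has run, then submits twice.
int record_then_submit() {
    int framebuffer = 0;  // toy stand-in for GPU-visible state
    RecordedCommands cmd;
    cmd.record([](int &fb) { fb += 1; });
    cmd.record([](int &fb) { fb *= 10; });
    cmd.record([](int &fb) { fb += 2; });
    assert(framebuffer == 0);  // recording executed nothing
    cmd.submit(framebuffer);   // (0 + 1) * 10 + 2 = 12
    cmd.submit(framebuffer);   // reuse: (12 + 1) * 10 + 2 = 132
    return framebuffer;
}
```

Because all the work is known before `submit` is called, nothing in this model has to guess how much work is coming.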
© Imagination Technologies
Command Buffers – Pooling Resource
Command Buffers always belong to a Command Pool
Buffers are allocated from pools
Pools provide lightweight synchronisation
Pools can be reset, reclaiming all resources
Two flavours of pool:
Individual reset of command buffers
Group reset only
Command Buffers – Going wide
Single Thread OpenGL Context
Thread 1 VkCommandBuffer
Thread 2 VkCommandBuffer
Thread N VkCommandBuffer
Command Buffers – Command Types
Deferred recording of commands
Transfer
Graphics
Compute
Synchronisation
Command Buffers – Transfers
Transfer commands are raw copies
However, they can change the tiling of an image (this is the only way!)
CPU -> GPU
Texture upload
Static buffer data
GPU -> CPU
Read back of data
GPU -> GPU
Pipelined updates of data
Mipgen
Command Buffers – “Inside” or “Out”
Transfer Compute RenderPass Compute
Graphics Graphics Graphics
Dispatch BindPipeline BindDescriptors BeginRenderPass PushConstants Draw
Command Buffers – Secondaries
Primary Transfer Compute RenderPass Compute
ExecuteCommands ExecuteCommands
Secondaries BindPipeline BindDescriptors Draw BindPipeline BindDescriptors Draw Draw
Command Buffers – Reuse
Camera
Command Buffers – Lifetime
Ownership
CPU GPU
Allocated
Begin
Record End Begin Pending Submit Wait Active
Pipelines - An anatomy
VI IA VS CS TS ES GS VP RS MS DS FS CB
Fixed Function States
Programmable Shaders
Descriptor Layout
Renderpass (more later)
Dynamic State
Pipelines – Fixed Function States
VI IA VS CS TS ES GS VP RS MS DS FS CB
Everything that isn’t a shader
Buffer formats/layouts
VertexInput
InputAssembly
Tessellation
Viewport
Raster
Multisample
DepthStencil
ColorBlend
Pipelines – Shader Stages
VI IA VS CS TS ES GS VP RS MS DS FS CB
Currently same as OpenGL
Vertex
Control
Evaluation
Geometry
Fragment
Note: Tessellation and Geometry are optional features
Pipelines – Descriptor Layout
Describes the set of resources that a shader can access
Uniforms
Storage Buffers
Images
Samplers
Push Constants
Pipelines – Dynamic State
Per-draw state
Tedious to compile each one
Combinatorial explosion
Dynamic state!
Opt-in
Only use when required
Viewport
Scissor
Line Width
Depth Bias
Blend Constant Colour
Depth Bounds
Stencil: Compare, Write, Reference
Pipelines – The Cache
Share common state
Load/Store
Introduction to SPIR-V Shaders
Neil Hickey
Compiler Engineer, ARM
SPIR History
SPIR-V Purpose
Parse HLSL Parse GLSL Parse OpenCL C Parse ISPC Parse Static C++
SPIR-V CFG Optimize SPIR-V CFG
Binary IHV Compiler SPIR-V Print SPIR-V
Developer Ecosystem
• Multiple Developer Advantages:
• Same front-end compiler for multiple
platforms
• Reduces runtime kernel compilation time
• Don’t have to ship shader/kernel source
code
• Drivers are simpler and more reliable
Vulkan and OpenCL
 | SPIR 1.2 | SPIR 2.0 | SPIR-V 1.0
LLVM Interaction | Uses LLVM 3.2 | Uses LLVM 3.4 | 100% Khronos defined; round-trip lossless conversion
Compute Constructs | Metadata/Intrinsics | Metadata/Intrinsics | Native
Graphics Constructs | No | No | Native
Supported Language Feature Sets | OpenCL C 1.2 | OpenCL C 1.2, OpenCL C 2.0 | OpenCL C 1.2 – 2.0, OpenCL C++ and GLSL
OpenCL Ingestion | OpenCL C 1.2 Extension | OpenCL C 2.0 Extension | OpenCL 2.1 Core; OpenCL 1.2 / 2.0 Extensions
Vulkan Ingestion | - | - | Vulkan 1.0 Core
Compiler flow
[Diagram: front ends feeding SPIR-V, with tools around it]
Front ends: GLSL, OpenCL C, OpenCL C++, third party kernel and shader languages
SPIR-V Tools: SPIR-V Validator, SPIR-V (Dis)Assembler
LLVM to SPIR-V Bi-directional Translator, linking LLVM and other intermediate forms
Legend: Khronos has open sourced these tools and translators; Khronos plans to open source these tools soon
SPIR-V
• 32-bit word stream
• Extensible and easily parsed
• Retains data object and control flow information for effective code generation and translation
SPIR-V Capabilities
• OpenCL and Vulkan
• Capabilities define feature sets
• Separate capabilities for Vulkan shaders and OpenCL kernels
• Validation layer checks correct capabilities requested
OpCapability Addresses
OpCapability Linkage
OpCapability Kernel
OpCapability Vector16
OpCapability Int16
SPIR-V Extensions
• OpExtension
• New functionality
• New instructions
• New semantics
OpExtInstImport "[Link]"
Vulkan shaders vs. GL shaders
GL shaders
• Program GLSL/ESSL shaders in high level language
• Ship high level source with application
• Graphics drivers compile at runtime
• Each driver needs a full compilation tool chain
Vulkan shaders
• Shaders in binary format
• Compile offline
• Ship intermediate language with application
• Graphics drivers “just” lower from IL
• Higher level compilation can be shared among vendors (provided by Khronos)
Vulkan shaders vs. GL shaders
#version 310 es
precision mediump float;
uniform sampler2D s;
in vec2 texcoord;
out vec4 color;
void main()
{
    color = texture(s, texcoord);
}

; SPIR-V
; Version: 1.0
; Generator: Khronos Glslang Reference Front End; 1
; Bound: 20
; Schema: 0
OpCapability Shader
%1 = OpExtInstImport "[Link].450"
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %4 "main" %9 %17
OpExecutionMode %4 OriginUpperLeft
OpSource ESSL 310
OpName %4 "main"
OpName %9 "color"
OpName %13 "s"
OpName %17 "texcoord"
OpDecorate %9 RelaxedPrecision
OpDecorate %13 RelaxedPrecision
OpDecorate %13 DescriptorSet 0
OpDecorate %14 RelaxedPrecision
OpDecorate %17 RelaxedPrecision
OpDecorate %18 RelaxedPrecision
OpDecorate %19 RelaxedPrecision
%2 = OpTypeVoid
%3 = OpTypeFunction %2
%6 = OpTypeFloat 32
%7 = OpTypeVector %6 4
%8 = OpTypePointer Output %7
%9 = OpVariable %8 Output
%10 = OpTypeImage %6 2D 0 0 0 1 Unknown
%11 = OpTypeSampledImage %10
%12 = OpTypePointer UniformConstant %11
%13 = OpVariable %12 UniformConstant
%15 = OpTypeVector %6 2
%16 = OpTypePointer Input %15
%17 = OpVariable %16 Input
%4 = OpFunction %2 None %3
%5 = OpLabel
%14 = OpLoad %11 %13
%18 = OpLoad %15 %17
%19 = OpImageSampleImplicitLod %7 %14 %18
OpStore %9 %19
OpReturn
OpFunctionEnd
Khronos SPIR-V Tools
• Reference frontend (glslang): glslangValidator -V -o [Link] [Link]
• SPIR-V disassembler (spirv-dis): spirv-dis -o [Link] [Link]
• SPIR-V assembler (spirv-as): spirv-as -o [Link] [Link]
• SPIR-V reflection (spirv-cross): spirv-cross [Link]
Vulkan shaders in a high level language
• GL_KHR_vulkan_glsl
• Exposes SPIR-V features
• Similar to GLSL with some changes
• Extends #version 140 and higher on desktop and #version 310 es for mobile content
Vulkan_glsl removed features
• Default uniforms
• Atomic-counter bindings
• Subroutines
• Packed block layouts
Vulkan_glsl new features
• Push constants
• Separate textures and samplers
• Descriptor sets
• Specialization constants
• Subpass inputs
Push Constants
• Push constants replace non-opaque uniforms
- Think of them as small, fast-access uniform buffer memory
• Update in Vulkan with vkCmdPushConstants
// New
layout(push_constant, std430) uniform PushConstants {
mat4 MVP;
vec4 MaterialData;
} RegisterMapped;
// Old, no longer supported in Vulkan GLSL
uniform mat4 MVP;
uniform vec4 MaterialData;
// Opaque uniform, still supported
uniform sampler2D sTexture;
Separate textures and samplers
• sampler contains just filtering information
• texture contains just image information
• combined in code at the point of texture lookup
uniform sampler s;
uniform texture2D t;
in vec2 texcoord;
...
void main()
{
fragColor = texture(sampler2D(t,s), texcoord);
}
Descriptor sets
• Bound objects can optionally define a descriptor set
• Allows bound objects to be updated in one block
• Allows objects in other descriptor sets to remain the same
• Enabled with the set = ... syntax in the layout specifier
layout(set = 0, binding = 0) uniform sampler s;
layout(set = 1, binding = 0) uniform texture2D t;
Specialization constants
• Allows for special constants to be created whose value is overridable at pipeline
creation time.
• Can be used in expressions
• Can be combined with other constants to form new specialization constants
• Declared using layout(constant_id=...)
• Can have a default value if not overridden at runtime
layout(constant_id = 1) const int arraySize = 12;
vec4 data[arraySize];
Specialization constants(2)
• gl_WorkGroupSize can be specialized with values for the x, y and z components.
layout(local_size_x_id = 2, local_size_z_id = 3) in;
• These specialization constants can be set at pipeline creation time by using VkSpecializationInfo and VkSpecializationMapEntry
const VkSpecializationMapEntry entries[] =
{
{ 1, // constantID
0*sizeof(uint32_t), // offset
sizeof(uint32_t) // size
},
};
Specialization constants(3)
const uint32_t data[] = { 16};
const VkSpecializationInfo info =
{
1, // mapEntryCount
entries, // pMapEntries
1*sizeof(uint32_t), // dataSize
data, // pData
};
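How the map entries and the data blob fit together can be sketched without the Vulkan headers. The `MapEntry` and `SpecInfo` structs below are hypothetical mirrors of the fields shown on the slides, and `resolve_constant` is an illustrative helper, not something a driver exposes: it finds the entry whose constantID matches and reads the override from the data at that entry's offset, falling back to the shader's default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirrors of the map entry and info structures on the slides,
// declared here so the sketch builds without the Vulkan headers.
struct MapEntry {
    std::uint32_t constantID;
    std::uint32_t offset;
    std::size_t size;
};

struct SpecInfo {
    std::uint32_t mapEntryCount;
    const MapEntry *pMapEntries;
    std::size_t dataSize;
    const void *pData;
};

// Returns the override for constant_id if an entry supplies one,
// otherwise the default value declared in the shader.
std::uint32_t resolve_constant(const SpecInfo &info, std::uint32_t constant_id,
                               std::uint32_t default_value) {
    for (std::uint32_t i = 0; i < info.mapEntryCount; ++i) {
        const MapEntry &e = info.pMapEntries[i];
        if (e.constantID != constant_id) continue;
        // Ignore entries that do not describe a 32-bit value inside the blob.
        if (e.size != sizeof(std::uint32_t) || e.offset + e.size > info.dataSize) break;
        std::uint32_t value = 0;
        std::memcpy(&value, static_cast<const unsigned char *>(info.pData) + e.offset,
                    sizeof(value));
        return value;
    }
    return default_value;
}
```

With the values from the slides (constant ID 1 mapped to offset 0, data `{ 16 }`), `arraySize` would resolve to 16 instead of its default of 12, while a constant with no entry keeps its default.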
Subpass Inputs
• Vulkan supports subpasses within render passes
• Standardized GL_EXT_shader_pixel_local_storage!
// GLSL
#extension GL_EXT_shader_pixel_local_storage : require
__pixel_local_inEXT GBuffer {
layout(rgba8) vec4 albedo;
layout(rgba8) vec4 normal;
...
} pls;
// Vulkan
layout(input_attachment_index = 0) uniform subpassInput albedo;
layout(input_attachment_index = 1) uniform subpassInput normal;
...
Acknowledgements
• Hans-Kristian Arntzen – ARM
• Benedict Gaster – University of the West of England
• Neil Henning – Codeplay
Using SPIR-V in practice with
SPIRV-Cross
Hans-Kristian Arntzen
Engineer, ARM
Contents
• Moving to offline compilation of SPIR-V
• Creating pipeline layouts with SPIRV-Cross
- Descriptor sets
- Push constants
- Multipass input attachments
• Making SPIR-V portable to other graphics APIs
• Debugging complex shaders with your C++ debugger of choice
Offline Compilation to SPIR-V
• Shader compilation can be part of your build system
• Catching compilation bugs in build time is always a plus
• Strict, mature GLSL frontends available
- glslang: [Link]
- shaderc: [Link]
• Full freedom for other languages in the future
# Makefile rules
FRAG_SHADERS := $(wildcard *.frag)
SPIRV_FILES := $(FRAG_SHADERS:.frag=.[Link])
shaders: $(SPIRV_FILES)
%.[Link]: %.frag
	glslc -o $@ $< $(GLSL_FLAGS) -std=310es
Vulkan Pipeline Layouts
• Need to know the “function signature” of our shaders
[Link] = <layout goes here>;
vkCreateGraphicsPipelines(..., &pipelineInfo, ..., &pipeline);
The Contents of a Pipeline Layout
layout(set = 0, binding = 1) uniform UBO {
mat4 MVP;
};
layout(set = 1, binding = 2) uniform sampler2D uTexture;
layout(push_constant) uniform PushConstants {
vec4 FastConstant;
} constants;
• Signature
- 16 bytes of push constant space
- Two descriptor sets
- Set #0 has one UBO at binding #1
- Set #1 has one combined image sampler at binding #2
• Need to figure this out automatically, or write every layout by hand
- Latter is fine for tiny applications
- Vulkan does not provide reflection here; after all, this is vendor-neutral information
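Figuring the signature out automatically amounts to grouping reflected resources by descriptor set. A minimal sketch, assuming some reflection step has already produced a flat list: `ReflectedResource`, `PipelineSignature` and `build_signature` are hypothetical names, and real code would go on to create Vulkan layout objects from the result.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one resource found by reflection.
struct ReflectedResource {
    std::uint32_t set;
    std::uint32_t binding;
    std::string type;  // e.g. "UBO", "CombinedImageSampler"
};

// Hypothetical description of a shader's "function signature".
struct PipelineSignature {
    std::uint32_t push_constant_bytes = 0;
    std::map<std::uint32_t, std::vector<ReflectedResource>> sets;  // keyed by set index
};

// Groups the flat resource list into one bucket per descriptor set.
PipelineSignature build_signature(const std::vector<ReflectedResource> &resources,
                                  std::uint32_t push_constant_bytes) {
    PipelineSignature signature;
    signature.push_constant_bytes = push_constant_bytes;
    for (const auto &resource : resources) {
        signature.sets[resource.set].push_back(resource);
    }
    return signature;
}
```

For the shader on this slide, the input would be one UBO at set 0, binding 1, one combined image sampler at set 1, binding 2, and 16 bytes of push constants, giving two sets with one resource each.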
Introducing SPIRV-Cross
• SPIRV-Cross is a new tool hosted by Khronos
- [Link]
• Extensive reflection
• Decompilation to high level languages
Khronos SPIR-V Toolbox: glslang, SPIRV-Tools, SPIRV-LLVM, SPIRV-Cross
Reflecting Uniforms and Samplers
• SPIRV-Cross has a simple API to retrieve resources
using namespace spirv_cross;
vector<uint32_t> spirv_binary = load_spirv_file();
Compiler comp(move(spirv_binary));
// The SPIR-V is now parsed, and we can perform reflection on it.
ShaderResources resources = comp.get_shader_resources();
for (auto &u : resources.uniform_buffers)
{
uint32_t set = comp.get_decoration([Link], spv::DecorationDescriptorSet);
uint32_t binding = comp.get_decoration([Link], spv::DecorationBinding);
printf("Found UBO %s at set = %u, binding = %u!\n",
[Link].c_str(), set, binding);
}
Stepping it up with Push Constants
• SPIRV-Cross can figure out which push constant elements are in use
- Push constant blocks are typically shared across the various stages
- Only parts of the push constant block are referenced in a single stage
layout(push_constant) uniform PushConstants {
mat4 MVPInVertex;
vec4 ColorInFragment;
} constants;
FragColor = [Link]; // Fragment only uses element #1.
uint32_t id = resources.push_constant_buffers[0].id;
vector<BufferRange> ranges = comp.get_active_buffer_ranges(id);
for (auto &range : ranges)
{
printf("Accessing member #%u, offset %u, size %u\n",
[Link], [Link], [Link]);
}
// Possible to get names for struct members as well
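Once the active ranges for a stage are known, one plausible next step is to collapse them into a single offset and size for that stage's push constant range. The sketch below is standalone C++: `ActiveRange` is a hypothetical mirror of the member number, offset and size printed above, and `enclosing_range` is an illustrative helper rather than part of SPIRV-Cross.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of one reflected range: member index, byte offset, byte size.
struct ActiveRange {
    unsigned index;
    std::size_t offset;
    std::size_t range;
};

// Smallest contiguous span (offset, size) covering every active member.
struct PushRange {
    std::size_t offset;
    std::size_t size;
};

PushRange enclosing_range(const std::vector<ActiveRange> &ranges) {
    if (ranges.empty()) return {0, 0};  // stage uses no push constants
    std::size_t begin = ranges.front().offset;
    std::size_t end = ranges.front().offset + ranges.front().range;
    for (const auto &r : ranges) {
        begin = std::min(begin, r.offset);
        end = std::max(end, r.offset + r.range);
    }
    return {begin, end - begin};
}
```

For the block above, assuming the usual 64-byte mat4 followed by a 16-byte vec4, a fragment stage that touches only member #1 would get offset 64 and size 16.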
Subpass Input Attachments
• Subpass attachments are similar to regular images
- Set
- Binding
- Input attachment index
layout(set = 0, binding = 0, input_attachment_index = 0) uniform subpassInput uAlbedo;
layout(set = 0, binding = 1, input_attachment_index = 1) uniform subpassInput uNormal;
vec4 lastColor = subpassLoad(uLastPass);
for (auto &attachment : resources.subpass_inputs)
{
// ...
}
Taking SPIR-V Beyond Vulkan
• SPIR-V is a great format to rally around
- Makes sense to be able to use it in older graphics APIs as well
• Will take some time before exclusive Vulkan support is mainstream
• How to make use of Vulkan features while being compatible?
- Push constants
- Subpass
- Descriptor sets
• Without tools, Vulkan features will be harder to take advantage of
GL + GLES + Vulkan Pipeline
• Implemented in our internal demo engine
• Write shaders in Vulkan GLSL
• Use Vulkan features directly
• No need for platform #ifdefs
• Can target mobile and desktop GL from same SPIR-V binary
Subpasses in OpenGL
• The subpass attachment is really just a texture read from gl_FragCoord
- Enables reading directly from tile memory on tiled architectures
- Great for deferred rendering and programmable blending
// Vulkan GLSL
uniform subpassInput uAlbedo;
...
FragColor = accumulateLight(
subpassLoad(uAlbedo),
subpassLoad(uNormal).xyz,
subpassLoad(uDepth).x);
// Translated to GLSL in SPIRV-Cross
uniform sampler2D uAlbedo;
...
FragColor = accumulateLight(
texelFetch(uAlbedo, ivec2(gl_FragCoord.xy), 0),
texelFetch(uNormal, ivec2(gl_FragCoord.xy), 0).xyz,
texelFetch(uDepth, ivec2(gl_FragCoord.xy), 0).x);
Push Constants in OpenGL
• Push constants bundle up old-style uniforms into buffer blocks
- Translates directly to uniform structs
- Use reflection to stamp out a list of glUniform() calls
// Vulkan GLSL
layout(push_constant) uniform PushConstants {
vec4 Material;
} constants;
FragColor = [Link];
// Translated to GLSL in SPIRV-Cross
struct PushConstants {
vec4 Material;
};
uniform PushConstants constants;
FragColor = [Link];
Descriptor Sets in OpenGL
• OpenGL has a binding space per type
• Find some remapping scheme that fits your application
• SPIRV-Cross can tweak bindings before decompiling to GLSL
// Vulkan GLSL
layout(set = 1, binding = 1) uniform sampler2D uTexture;
// SPIRV-Cross
uint32_t newBinding = 4;
glsl.set_decoration([Link], spv::DecorationBinding, newBinding);
glsl.unset_decoration([Link], spv::DecorationDescriptorSet);
string glslSource = [Link]();
// GLSL
layout(binding = 4) uniform sampler2D uTexture;
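One possible remapping scheme, shown purely as an illustration: give every descriptor set a fixed-size block of GL binding slots. `flatten_binding` and the block size are assumptions, not something SPIRV-Cross prescribes. With three slots per set, set 1 / binding 1 lands on GL binding 4, matching the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Maps a Vulkan (set, binding) pair to a flat GL binding index by reserving
// bindings_per_set consecutive slots for each descriptor set.
std::uint32_t flatten_binding(std::uint32_t set, std::uint32_t binding,
                              std::uint32_t bindings_per_set) {
    if (binding >= bindings_per_set) {
        throw std::out_of_range("binding does not fit in the slots reserved for its set");
    }
    return set * bindings_per_set + binding;
}
```

Since GL keeps a separate binding space per resource type, an application could apply a function like this per type, with a different block size for each.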
gl_InstanceIndex in OpenGL
• Vulkan adds the base instance to the instance ID
- GL does not
- Workaround is to have GL backend pass in the base index as a uniform
// Vulkan GLSL
layout(set = 0, binding = 0) uniform UBO {
mat4 MVPs[MAX_INSTANCES];
};
gl_Position = MVPs[gl_InstanceIndex] * Position;
// GLSL through SPIRV-Cross
layout(binding = 0) uniform UBO {
mat4 MVPs[MAX_INSTANCES];
};
uniform int SPIRV_Cross_BaseInstance; // Supplied by application
gl_Position = MVPs[(gl_InstanceID + SPIRV_Cross_BaseInstance)] * Position;
Debugging Shaders in C++
• If you have thought …
- “I wish I could assert() in a compute shader”
- “I wish I could instrument a shader with logging”
- “I wish I could use clang address sanitizer to debug out-of-bounds access”
- “I want to reproduce a shader bug outside the driver”
- “I want to run regression tests when optimizing a shader”
- “I want to step through a compute thread in <insert C++ debugger here>”
• … the C++ backend in SPIRV-Cross could be interesting
• Still a very experimental feature
• Hope to expand this further in the future
Basic Idea
• With GLM, C++ can be near GLSL compatible
• Reuse the GLSL backend to emit code which also works in C++
- Minor differences like references vs. in/out, etc
• Add some scaffolding to redirect shader resources
- Easily done with macros, the actual C++ output is kept clean
• The C++ output implements a simple C-compatible interface
• Add instrumentation to the C++ file as desired
• Compile C++ file to a dynamic library with debug symbols
• Instantiate from test program, bind buffers and invoke
- And have fun running shadertoy raymarchers at seconds per frame
On the Command Line
# Compile to SPIR-V
glslc -o [Link] [Link]
# Create C++ interface
spirv-cross --output [Link] [Link] --cpp
# Add some instrumentation to the shader if you want
$EDITOR [Link]
# Build library
g++ -o [Link] -shared [Link] -O0 -g -Iinclude/spirv_cross
# Run your test app
./<my app> --shader [Link]
Another tool supporting Vulkan:
Mali Graphics Debugger is an advanced API tracer tool for Vulkan, OpenGL ES, EGL and
OpenCL. It allows developers to trace their graphics and compute applications to debug
issues and analyze the performance.
• Vulkan Support
- Trace all the function calls in the spec.
- Allows you to see exactly what calls compose your application.
- Contact the Mali forums and we would love to get you set up.
[Link]
arm-mali-graphics
Investigation with the Mali Graphics Debugger
[Screenshot: Mali Graphics Debugger window, with callouts for Frame Statistics, Frame Outline, Frame Capture, Assets View, States, Uniforms, Vertex Attributes, Buffers, Framebuffers, API Trace, Textures, Shaders, Dynamic Help]
References
• SPIRV-Cross
- [Link]
• Glslang
- [Link]
• Shaderc
- [Link]
• SPIRV-Tools
- [Link]
• Mali Graphics Debugger
- [Link]
Feeding Your Shaders
Jesse Barker
Principal Software Engineer
Moving to Vulkan: How to make your 3D graphics more explicit
May 26, 2016
© ARM 2016
What is a Vulkan Resource?
Shader Input/Output
Referenced via Descriptors
Some are specialized in the hardware
Buffers
Images
Samplers
Input Attachments
Vertex Input Attributes
Render Targets
What are Vulkan Descriptors?
Handle Type
myImageView SAMPLED_IMAGE
Image View
Image Device
Memory
What are Descriptor Sets?
// uniform blocks:
layout(set = 0, binding = 0) uniform Type0 { ... } ubo0;
// textures:
layout(set = 0, binding = 1) uniform sampler2D tex0;
// SSBO:
layout(set = 0, binding = 2) buffer Type2 { ... } ssbo0;
void main()
{
    // ...
}

binding | type | stages
0 | Uniform Buffer | Graphics
1 | Image/Sampler | Graphics
2 | Storage Buffer | Graphics
What is a Descriptor Pool?
Parent object of a Descriptor Set
Allows Descriptor Set management to be threaded
Manages memory for hardware descriptors

typedef struct VkDescriptorPoolSize {
    VkDescriptorType type;
    uint32_t descriptorCount;
} VkDescriptorPoolSize;

typedef struct VkDescriptorPoolCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkDescriptorPoolCreateFlags flags;
    uint32_t maxSets;
    uint32_t poolSizeCount;
    const VkDescriptorPoolSize* pPoolSizes;
} VkDescriptorPoolCreateInfo;
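One way to arrive at the per-type counts for a pool is to count descriptors of each type in a set layout and multiply by the number of sets you plan to allocate. The sketch below uses hypothetical stand-ins (`DescriptorKind`, `PoolSize`, `pool_sizes_for`) so that it builds without the Vulkan headers; a real application would fill `VkDescriptorPoolSize` entries the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the descriptor type enumeration.
enum class DescriptorKind { UniformBuffer, CombinedImageSampler, StorageBuffer };

// Hypothetical mirror of the type/descriptorCount pair shown above.
struct PoolSize {
    DescriptorKind type;
    std::uint32_t descriptorCount;
};

// Given the descriptor types in one set layout and the number of sets to be
// allocated with that layout, returns one pool size entry per distinct type.
std::vector<PoolSize> pool_sizes_for(const std::vector<DescriptorKind> &layout,
                                     std::uint32_t set_count) {
    std::map<DescriptorKind, std::uint32_t> per_set;
    for (DescriptorKind kind : layout) ++per_set[kind];

    std::vector<PoolSize> sizes;
    for (const auto &entry : per_set) {
        sizes.push_back({entry.first, entry.second * set_count});
    }
    return sizes;
}
```

Here `set_count` plays the role of maxSets, and the length of the returned vector plays the role of poolSizeCount.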
Allocating Descriptor Sets
Define desired layouts of descriptors
Ask the Descriptor Pool to allocate a Descriptor Set per layout
What is a Pipeline Layout?
// uniform blocks:
layout(set = 0, binding = 0) uniform Type0 { ... } ubo0;
layout(set = 0, binding = 0) uniform Type1 { ... } ubo1;
// textures:
layout(set = 0, binding = 1) uniform sampler2D tex0;
layout(set = 1, binding = 0) uniform sampler2D tex1;
// SSBO:
layout(set = 1, binding = 1) buffer Type2 { ... } ssbo0;
void main() {
    // ...
}

Descriptor Set 0
binding | type | stages
0 | Uniform Buffer | Graphics
0 | Uniform Buffer | Graphics
1 | Image/Sampler | Graphics

Descriptor Set 1
binding | type | stages
0 | Image/Sampler | Graphics
1 | Storage Buffer | Graphics
How do Descriptors get into Descriptor Sets?
VKAPI_ATTR void VKAPI_CALL vkUpdateDescriptorSets(
    VkDevice device,
    uint32_t descriptorWriteCount,
    const VkWriteDescriptorSet* pDescriptorWrites,
    uint32_t descriptorCopyCount,
    const VkCopyDescriptorSet* pDescriptorCopies);

typedef struct VkWriteDescriptorSet {
    VkStructureType sType;
    const void* pNext;
    VkDescriptorSet dstSet;
    uint32_t dstBinding;
    uint32_t dstArrayElement;
    uint32_t descriptorCount;
    VkDescriptorType descriptorType;
    const VkDescriptorImageInfo* pImageInfo;
    const VkDescriptorBufferInfo* pBufferInfo;
    const VkBufferView* pTexelBufferView;
} VkWriteDescriptorSet;
typedef struct VkCopyDescriptorSet {
VkStructureType sType;
const void* pNext;
VkDescriptorSet srcSet;
uint32_t srcBinding;
uint32_t srcArrayElement;
VkDescriptorSet dstSet;
uint32_t dstBinding;
uint32_t dstArrayElement;
uint32_t descriptorCount;
} VkCopyDescriptorSet;
Finally, I’m ready to use my Descriptor Sets
VKAPI_ATTR void VKAPI_CALL vkCmdBindDescriptorSets(
    VkCommandBuffer commandBuffer,
    VkPipelineBindPoint pipelineBindPoint,
    VkPipelineLayout layout,
    uint32_t firstSet,
    uint32_t descriptorSetCount,
    const VkDescriptorSet* pDescriptorSets,
    uint32_t dynamicOffsetCount,
    const uint32_t* pDynamicOffsets);
Bound sets must match pipeline layout
Graphics or compute?
Simple layout is best
What about Vertex Input?
Vertex Input Description
If your shader declares:
in vec3 position;
in uvec2 texcoord;

Your C code declares:
struct Position
{
    float x, y, z;
};
struct Texcoord
{
    uint8_t u, v;
};

const VkVertexInputBindingDescription binding[] =
{
    {
        0, // binding
        sizeof(float) * 3, // stride
        VK_VERTEX_INPUT_RATE_VERTEX // inputRate
    },
    {
        1, // binding
        sizeof(uint8_t) * 2, // stride
        VK_VERTEX_INPUT_RATE_VERTEX // inputRate
    },
};
const VkVertexInputAttributeDescription attributes[] =
{
    {
        0, // location
        binding[0].binding, // binding
        VK_FORMAT_R32G32B32_SFLOAT, // format
        0 // offset
    },
    {
        1, // location
        binding[1].binding, // binding
        VK_FORMAT_R8G8_UINT, // format (integer format, to match uvec2 in the shader)
        0 // offset
    }
};
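The strides above can also be derived from the C structs themselves, which keeps the description in step with the data layout. A small sketch, assuming tightly packed structs and a 4-byte float: `BindingDesc` is a hypothetical mirror of the binding and stride fields only, used so the example builds without the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Position {
    float x, y, z;
};

struct Texcoord {
    std::uint8_t u, v;
};

// Hypothetical mirror of the binding/stride fields of a vertex binding description.
struct BindingDesc {
    std::uint32_t binding;
    std::uint32_t stride;
};

// Strides taken from the struct sizes rather than written out by hand.
constexpr BindingDesc kBindings[] = {
    {0, sizeof(Position)},
    {1, sizeof(Texcoord)},
};

// Fail the build if padding ever makes a struct larger than its members.
static_assert(sizeof(Position) == sizeof(float) * 3, "Position must be tightly packed");
static_assert(sizeof(Texcoord) == sizeof(std::uint8_t) * 2, "Texcoord must be tightly packed");
```

If a struct later gains a member, the stride follows automatically and the static_assert flags any unexpected padding.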
Questions?
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM
Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured
may be trademarks of their respective owners.
Copyright © 2016 ARM Limited
Vulkan Subpasses
or
The Frame Buffer is Lava
Andrew Garrard
Samsung R&D Institute UK
UK Khronos Chapter meet, May 2016
Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of
keeping the GPU busy
UK Khronos Chapter meet, May 2016 Vulkan subpasses — Page 96
Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of
keeping the GPU busy
- Efficient generation of work on multiple CPU cores
Core 1: CmdBuf CmdBuf CmdBuf (command buffer recording)
Core 2: CmdBuf CmdBuf CmdBuf (command buffer recording)
Core 3: CmdBuf CmdBuf CmdBuf (command buffer recording)
Core 4: Submit Submit Submit
Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of
keeping the GPU busy
- Efficient generation of work on multiple CPU cores
- Reuse of command buffers to avoid CPU build time
Record 2ry command buffer Record primary command buffer
Invoke
Invoke
Invoke
Invoke
2ry 2ry 2ry 2ry
Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of
keeping the GPU busy
- Efficient generation of work on multiple CPU cores
- Reuse of command buffers to avoid CPU build time
vkQueueSubmit vkQueueSubmit vkQueueSubmit
Record command buffer CmdBuf CmdBuf CmdBuf
Record command buffer CmdBuf CmdBuf
Record command buffer CmdBuf
Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of
keeping the GPU busy
- Efficient generation of work on multiple CPU cores
- Reuse of command buffers to avoid CPU build time
- Potentially more efficient memory management
Heap 1 Heap 2
User-defined memory reuse
Pool 1 Pool 2
Explicit state transitions
Image 1 Image 2 Image 3 Cost invoked at defined points
View 1 View 2
Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of
keeping the GPU busy
- Efficient generation of work on multiple CPU cores
- Reuse of command buffers to avoid CPU build time
- Potentially more efficient memory management
- Avoiding unpredictable shader compilation
Compile to SPIR-V (slow) Offline
Record command buffer (slow-ish) 2ry thread
Submit command buffer (fast) Submitting thread
Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of
keeping the GPU busy
- Efficient generation of work on multiple CPU cores
- Reuse of command buffers to avoid CPU build time
- Potentially more efficient memory management
- Avoiding unpredictable shader compilation
•Mostly, the message has been that if you’re entirely
limited by shader performance or bandwidth, Vulkan
can’t help you (there is no magic wand)
•Actually, that’s not entirely true...
•APIs like OpenGL were designed when the GPU
looked very different (or was partly software)
•The way to design an efficient mobile GPU is
not a perfect match for OpenGL
- The driver has to translate (think of a CPU’s command decode unit/microcode)
•But the translation isn’t always perfectly
efficient

Tiled GPUs
•Most (not all) mobile GPUs use tiling
- It’s all about the bandwidth (size and power limits)
[Figure: scene description → binning pass → shading pass]
•On-chip tile memory is much faster than the
main frame buffer

Not everything reaches memory
•Rendering requires lots of per-pixel data
- Z, stencil
- Full multisample resolution
•We usually only care about the final image
- We can throw away Z and stencil
- We only need a downsampled (A)RGB
- Don’t need to load anything from a previous frame
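To put rough numbers on this, the sketch below estimates the off-chip write traffic for one frame with and without discarding the per-pixel working data. The function name, the 4-byte colour format and the 4-byte depth/stencil format are illustrative assumptions, not figures from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written to main memory for one frame.
 * width, height : framebuffer size in pixels
 * samples       : multisample count used while rendering
 * keep_all      : nonzero = write back every colour sample plus depth/stencil,
 *                 zero    = write back only the resolved colour image */
uint64_t frame_write_bytes(uint32_t width, uint32_t height,
                           uint32_t samples, int keep_all)
{
    const uint64_t color_bytes = 4;         /* assumed RGBA8 */
    const uint64_t depth_stencil_bytes = 4; /* assumed 24-bit depth + 8-bit stencil */
    const uint64_t pixels = (uint64_t)width * height;

    assert(samples > 0);
    if (keep_all) {
        return pixels * samples * (color_bytes + depth_stencil_bytes);
    }
    return pixels * color_bytes;
}
```

Under these assumptions a 1920x1080 target with 4x multisampling writes 66,355,200 bytes per frame if everything is kept, against 8,294,400 bytes for the resolved colour image alone.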

Sometimes we want the results of rendering
•Output from one rendering job can be used by
the next
•Z buffer for shadow maps
•Rendering for environment maps
•HDR bloom
•These can have low resolution and may not
take much bandwidth

Sometimes you do need framebuffer resolution
•Deferred shading
[Figure: a lightweight render stores per-surface content at each fragment (Z, Diffuse/α, Specular/Specularity, Normal); a full-screen quad is then rendered to perform fragment shading]
•Deferred lighting
[Figure: a lightweight render writes the lighting input (Z, Specularity, Normal); a full-screen quad is rendered to calculate the lighting output (Diffuse, Specular); the scene is then re-rendered with full fragment shading, using the lighting inputs]
•Order-independent transparency
•HDR tone mapping

Rendering outputs separately
•Rendering to each surface separately is bad
•Geometry has a per-bin cost
- Sometimes the cost is low, but it’s there
- Vertices in multiple bins get processed repeatedly
- Rendering the scene repeatedly is painful
•Even immediate-mode renderers hate this!

Multiple render targets don’t help much
•Using MRTs means multiple buffers in one pass
- Single scene traversal
Note: This is a typical approach for immediate-mode renderers (e.g. desktop/console systems)
•Reduces the geometry load (only process once)
•Still writing a lot of data off-chip
- Tilers are all about trying not to do this!
- Increased use of shader resources may slow some hardware

Pixel Local Storage (OpenGL ES extension)
•Tiler-friendly (at last)
- Store only the current tile values
- Read them later in the tile processing
•But not portable!
- Not practical on immediate renderers
- Debugging on desktop won’t work!
- Capabilities vary between devices
- Driver doesn’t have visibility
- Data access is restricted

Vulkan: Explicit dependencies
•Vulkan has direct support for this type of
rendering workload
•By telling the driver how you intend to use the
rendered results, the driver can produce a
better mapping to the hardware
- The extra information is a little verbose, but simpler
than handling all possible cases yourself!

Vulkan render passes and subpasses
•A render pass groups dependent operations
- All images written in a render pass are the same size
[Figure: a single render pass containing geometry, lighting and fragment work]
•A render pass contains a number of subpasses
- Subpasses describe access to attachments
- Dependencies can be defined between subpasses
[Figure: subpass 1: geometry → subpass 2: lighting → subpass 3: fragment]
•Each render pass instance has to be contained
within a single command buffer (unit of work)
- Some tilers schedule by render pass

Defining a render pass
•VkRenderPassCreateInfo
- VkAttachmentDescription *pAttachments
- Just the descriptions, not the actual attachments!
- VkSubpassDescription *pSubpasses
- VkSubpassDependency *pDependencies
•vkCreateRenderPass(device, createInfo,.. pass)
- Gives you a VkRenderPass object
- This is a template that you can use repeatedly
- When we use it, we get a render pass instance

Describing attachments for a render pass
•VkAttachmentDescription
- format/samples
- loadOp
- VK_ATTACHMENT_LOAD_OP_LOAD to preserve
- VK_ATTACHMENT_LOAD_OP_DONT_CARE for overwrites
- VK_ATTACHMENT_LOAD_OP_CLEAR for uniform clears (e.g. Z)
- storeOp
- VK_ATTACHMENT_STORE_OP_STORE to output it
- VK_ATTACHMENT_STORE_OP_DONT_CARE may discard after
the render pass

Defining a subpass
•VkSubpassDescription
- pInputAttachments
- Which of the render pass’s attachments this subpass reads
- pColorAttachments
- Which ones this subpass writes (1:1 - optional)
- pResolveAttachments
- Which ones this subpass writes (resolving multisampling)
- pPreserveAttachments
- Which attachments need to persist across this subpass
- Subpasses are numbered and ordered

Defining subpass dependencies
•VkSubpassDependency
- srcSubpass
- dstSubpass
- Where the dependency applies (can be external)
- srcStageMask
- dstStageMask
- Execution dependencies between subpasses
- srcAccessMask
- dstAccessMask
- Memory dependencies between subpasses

Vulkan framebuffers
•A VkFramebuffer defines the set of
attachments used by a render pass instance
•VkFramebufferCreateInfo
- renderPass
- pAttachments
- These are actual VkImageViews this time!
- width
- height
- layers

Starting to use a render pass
•vkCmdBeginRenderPass/vkCmdEndRenderPass
- Starts a render pass instance in a command buffer
- You start in the first (maybe only) subpass implicitly
- pRenderPassBegin contains configuration
•VkRenderPassBeginInfo
- VkRenderPass renderPass
- The render pass “template”
- VkFramebuffer framebuffer
- Specifies targets for rendering

Putting it all together…
[Figure: object relationships]
- VkAttachmentDescription, VkSubpassDescription and VkSubpassDependency arrays → VkRenderPassCreateInfo → vkCreateRenderPass → VkRenderPass
- VkImageViews and the VkRenderPass → VkFramebufferCreateInfo → vkCreateFramebuffer → VkFramebuffer
- VkRenderPass and VkFramebuffer → VkRenderPassBeginInfo; VkRenderPassBeginInfo and VkCommandBuffer → vkCmdBeginRenderPass
Key:
• Objects are dark grey
• Functions are light grey
• Arrows between objects are references of some sort
• Arrows into functions are arguments
• Arrows out of functions are constructed objects

Simple rendering
•vkAllocateCommandBuffers (VK_COMMAND_BUFFER_LEVEL_PRIMARY)
•vkBeginCommandBuffer
•vkCmdBeginRenderPass
•vkCmdDraw (etc.)
•vkCmdEndRenderPass
•vkEndCommandBuffer
•vkQueueSubmit
[Figure: a command buffer containing one render pass with four draws, submitted to a queue]

Multiple render passes
•You can have more than one render pass in a command buffer
- Yes, Leeloo multipass, we know…
- So a command buffer can render to many outputs
- E.g. you could render to the same shadow and environment maps every frame by reusing the same command buffer
- But it must be the same outputs each time you submit
- A specific render pass instance has fixed VkFramebuffers!
[Figure: one command buffer containing two render passes, each with draws]

Two limitations…
•Different render passes ⇒ independent outputs
- Rendering goes off-chip, there’s no PLS-style on-chip
reuse of pixel contents
•You can’t reuse the same command buffer with
a different render target
- E.g. for double buffering or streamed content
- We’ll come back to this…
•Still sometimes all you need, though!

More than one subpass
•vkCmdNextSubpass moves to the next subpass
- Implicitly start in the first subpass of the render pass
- Dependencies say what you’re accessing from previous subpasses
- Same render pass so accesses stay on chip (if possible)
[Figure: a command buffer containing one render pass; draws, a "new subpass" marker, then more draws]

Using multiple subpasses
•vkBeginCommandBuffer
•vkCmdBeginRenderPass
•vkCmdDraw (etc.)
•vkCmdNextSubpass
•vkCmdDraw (etc.)
•vkCmdEndRenderPass
•vkEndCommandBuffer

Accessing subpass output in fragment shaders
•In SPIR-V, previous subpass content is read with OpImageRead
- Coordinates are sample-relative, and need to be 0
- OpTypeImage Dim = SubpassData
•In GLSL (using GL_KHR_vulkan_glsl):
- Types for subpass access are [ui]subpassInput(MS)
- layout(input_attachment_index = i, …) uniform subpassInput t; to select an input attachment
- subpassLoad() to access the pixel
Note: C.f. __pixel_localEXT layouts in EXT_shader_pixel_local_storage when using OpenGL ES

Avoiding unnecessary allocations
•If we’re using subpasses, we likely don’t need
the images in memory
- A tiler may be able to process the subpasses entirely
on-chip, without needing an allocation
- Still need to “do the allocation” in case the tiler can’t
handle the request/on an immediate-mode renderer!
- Won’t commit resources unless it actually needs to
•vkCreateImage flags for “lazy committal”
- VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT

Vulkan subpasses: advantages
•The driver knows what you’re doing
- It can reorder subpasses
- It can change the tile size
- It can balance resources between subpasses
- It will fall back to memory for you if it has to
- Under the hood, mechanism likely matches PLS
Note: EXT_shader_pixel_local_storage is actually more explicit than Vulkan here (and may still be offered as an extension)
•Works on immediate mode renderers
- Probably MRTs and normal external writes
- Desktop debugging tools will work!

There’s more: Secondary command buffers
•Vulkan has two levels of command buffers
- Determined by vkAllocateCommandBuffers
•VK_COMMAND_BUFFER_LEVEL_PRIMARY
- Main command buffer, as we’ve seen so far
•VK_COMMAND_BUFFER_LEVEL_SECONDARY
- Command buffer that can be invoked from the
primary command buffer

Use of secondary command buffers
•vkBeginCommandBuffer
- Takes a VkCommandBufferBeginInfo
•VkCommandBufferBeginInfo
- flags include:
- VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT
- pInheritanceInfo
•VkCommandBufferInheritanceInfo
- renderPass and subpass
- framebuffer (can be null, more efficient if known)

Secondary command buffers and passes
•Why do we need the “continue bit”?
- Render passes (and subpasses) can’t start in a
secondary command buffer
- Non-render pass stuff can be in a secondary buffer
- You can run a compute shader outside a render pass
- Otherwise, the render pass is inherited from the
primary command buffer

Secondary command buffers and passes
•Why specify render pass/framebuffer?
- Command buffers need to know this when recording
- Some operations depend on render pass info (e.g. format)
- Framebuffer is optional (can just inherit)
- If you can specify the actual framebuffer, the command
buffer can be less generic and therefore may be faster

Invoking the secondary command buffer
•You can’t submit a secondary command buffer
•You have to invoke it from a primary command
buffer with vkCmdExecuteCommands
[Figure: a primary command buffer with two render passes; vkCmdExecuteCommands (vkCEC) calls, one of them after a new subpass, each invoke a secondary buffer containing draws]

Secondary command buffer code
•vkBeginCommandBuffer
•vkCmdBeginRenderPass
•vkCmdExecuteCommands
•vkCmdNextSubpass
•vkCmdExecuteCommands
•vkCmdEndRenderPass
•vkEndCommandBuffer
[Figure: a primary command buffer whose render pass issues vkCmdExecuteCommands (vkCEC) before and after a new subpass; each call invokes a secondary buffer containing draws]

Performance and parallelism
•Creating a command buffer can be slow
- Lots of state to check, may require compilation
- This happens in GLES as well, you just don’t control when!
•So create secondary command buffers on
different threads
- Lots of 4- and 8-core CPUs in cell phones these days
•Invoking the secondary buffer is lightweight
- Primary command buffer generation is quick(er)

What does this have to do with passes?
•Remember:
- Render passes exist within (primary) command buffers
- The command buffer sets up the GPU for the render pass
- On-chip rendering happens within a render pass
- If you want content to persist between render passes, it’ll
reach memory (or at least cache), not stay in the tile buffer
- You can’t use multiple threads to build work for a
primary command buffer in parallel
- You can build many secondary command buffers at once

You can’t mix and match
•Within a subpass you can either (but not both):
- Execute rendering commands directly in the primary
command buffer
- VK_SUBPASS_CONTENTS_INLINE
- Invoke secondary command buffers from the primary
command buffer with vkCmdExecuteCommands
- VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS
- Chosen by vkCmdBeginRenderPass/vkCmdNextSubpass
- Remember: you can only do these in a primary command
buffer!

Command buffer reuse: even faster
•Primary command buffers work with a fixed
render pass and framebuffer
- You can reuse a primary command buffer, but it will
always access the same images – often good enough
- May have to wait for execution to end; can’t be “one-time”
•What if you want to access different targets?
- E.g. a cycle of framebuffers or streamed content?
- You can round-robin several command buffers
- Or you can use secondary command buffers!

Compatible render passes and frame buffers
•The render pass a secondary command buffer
uses needn’t be the one it was recorded with
- It can be “compatible”
- Same formats, number of sub-passes, etc.
•You can have primary command buffers with
different outputs, and they can re-use
secondary command buffers
- The primary has to be different to record new targets
- The primary may have to patch secondary addresses

Almost-free use with changing framebuffers
•No cost for secondary command buffers
•Primary command buffer is simple and quick
[Figure: two primary command buffers, one per target image (Target image 1, Target image 2); the render pass in each issues vkCmdExecuteCommands (CEC) calls that invoke the same two secondary command buffers]

So I can do bloom/DoF/rain/motion blur…!
•No! Remember, you can only access the
current pixel
•Tilers process one tile at a time
- If you tried to access a different pixel, the tile
containing it might not be there
- You have to write out the whole image to do this
- Slow, painful, last resort!
- Yes, we can think of possible solutions too
- Give it time (lots of different hardware out there)

Coming out of the shadow(buffer)s
•Render passes are integral to the Vulkan API
- Reflects modern, high-quality rendering approaches
•The driver has more information to work with
- It can do more for you
- Remember this if you complain it’s verbose!
•Hardware resource management is hard
- Expect drivers to get better over time
•Another tool for better mobile gaming

Thank you
•Over to you…
Andrew Garrard
[Link] at [Link]

Keeping your GPU fed without getting bitten
Tobias Hector
May 2016
Introduction
• You have delicious draw calls
- Yummy!
• Your GPU wants to eat them
- It’s really hungry
• Keep it fed at all times
- So it keeps making pixels
• Don’t want it biting your hand
- Look at those teeth!

Keeping it fed
• GPU needs a constant supply of food
- It doesn’t want to wait
• Certain foods are tough to digest
- Provide multiple operations to hide stalls
• Draw calls provide a variety of nutrition
- Vertex work, raster work, tessellation, vitamins A-K, etc.
[Figure: system timeline with CPU work items 0, 1, 2 and GPU work items 0, 1, 2]
[Figure: GPU timeline with vertex work items 0, 1, 2 and fragment work items 0, 1, 2]

Not getting bitten
• GPU eating from lots of different plates
- Don’t touch anything it’s using!
• It doesn’t want a mouthful of beef choc chip ice cream
- Don’t change data whilst it’s accessing a resource
• Hey I’m eating that!
- Don’t delete resources whilst the GPU is still using them

On to the serious bits…

Terminology
• Operation
- Anything that can be executed
- Includes synchronization and memory barriers
• Execution Dependency
- Operations waiting on other operations
- All synchronization expresses these
• Memory Barrier
- Flush/invalidate caches
- Determination of access and visibility
• Memory Dependency
- Execution dependency involving a Memory Barrier
Note: Memory barrier does not mean quite the same thing as GL’s memory barrier, though there is some relation.

Synchronization Types
• 3 types of explicit synchronization in Vulkan
• Pipeline Barriers, Events and Subpass Dependencies
- Within a queue
- Explicit memory dependencies
• Semaphores
- Between Queues
• Fences
- Whole queue operations to CPU
Note: OpenGL has just two, very coarse synchronization primitives: memory barriers and fences. They are loosely similar to the equivalently named concepts in Vulkan.

Pipeline Barriers
• Pipeline Barriers
- Precise set of pipeline stages
- Memory Barriers to execute
- Single point in time
void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);
Note: Executing a pipeline barrier is roughly equivalent to a glMemoryBarrier call, though with much more control.

Events
• Events
- Same info as Pipeline Barriers
- …but operate over a range
void vkCmdSetEvent(
    VkCommandBuffer commandBuffer,
    VkEvent event,
    VkPipelineStageFlags stageMask);
void vkCmdResetEvent(
    VkCommandBuffer commandBuffer,
    VkEvent event,
    VkPipelineStageFlags stageMask);
void vkCmdWaitEvents(
    VkCommandBuffer commandBuffer,
    uint32_t eventCount,
    const VkEvent* pEvents,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);

Events
• Events
- Same info as Pipeline Barriers
- …but operate over a range
• CPU interaction
- No explicit CPU wait
- No Memory Barriers
• Warning!
- OS may apply a timeout
- Set events soon after submission
- Could you just defer submission?
VkResult vkSetEvent(
    VkDevice device,
    VkEvent event);
VkResult vkResetEvent(
    VkDevice device,
    VkEvent event);
VkResult vkGetEventStatus(
    VkDevice device,
    VkEvent event);

Pipeline Barriers vs Events
• Use pipeline barriers for point synchronization
- The operation being depended on immediately precedes the operation that depends on it
- May be more efficient than a set/wait event pair
• Use events if other work is possible between two operations
- Set immediately after the operation being depended on
- Wait immediately before the operation that depends on it
• Use events for CPU/GPU synchronization
- Memory accesses between processors
- Late latching of data to reduce latency

Memory Barrier Types
• Global Memory Barrier
- All memory-backed resources
• Buffer Barrier
- For a single buffer range
• Image Barrier
- For a single image subresource range
Note: OpenGL’s memory barriers imply execution dependencies, which Vulkan memory barriers do not; execution barriers are provided by a pipeline barrier, event or subpass dependency.

Global Memory Barriers
• Global Memory Barriers
- All memory used by accessed stages
- Effectively flushes entire caches
• Use when many resources transition
- Cheaper than one-by-one
- Don’t transition unnecessarily!
• User must define prior access
- Driver not tracking for you
typedef struct VkMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
} VkMemoryBarrier;

Buffer Barriers
• Buffer Barriers
- A single buffer range
- Defines access stages
- Defines queue ownership
• User must define prior access
- Driver not tracking for you
typedef struct VkBufferMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkBuffer buffer;
    VkDeviceSize offset;
    VkDeviceSize size;
} VkBufferMemoryBarrier;

Image Barriers
• Image Barriers
- A single image subresource range
- Defines access stages
- Defines queue ownership
- Defines image layout
• User must define prior access
- Driver not tracking for you
- For images, this includes prior layout
• Appropriate layouts allow compression
- GPU may use image compression
- Saves bandwidth
- Use GENERAL instead of switching frequently
typedef struct VkImageMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkImageLayout oldLayout;
    VkImageLayout newLayout;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkImage image;
    VkImageSubresourceRange subresourceRange;
} VkImageMemoryBarrier;

Subpass Dependencies
• Subpass dependencies
- Similar info to Pipeline Barriers
- Explicitly between two subpasses
• Memory barriers
- Implicit for attachments
- Explicit for other resources
• Pixel local dependencies
- Same fragment/sample location
- Cheap for most implementations
- Use region dependency flag:
- VK_DEPENDENCY_BY_REGION_BIT
typedef struct VkSubpassDependency {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkDependencyFlags dependencyFlags;
} VkSubpassDependency;

Subpass Dependencies
• Subpass self-dependencies
- Subpasses can wait on themselves
- A pipeline barrier in the subpass
• Forward progress only
- Can’t wait on later stages
- Must wait on earlier or same stage
• Pixel local only between fragments
- Must use flag:
- VK_DEPENDENCY_BY_REGION_BIT

Subpass Dependencies
• Subpass external dependencies
- Wait on ‘external’ operations
- vkCmdWaitEvents in the subpass
- Events set outside the render pass

Example – Texture Upload
// Transition the buffer from host write to transfer read
bufferBarrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
bufferBarrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
// Transition the image to transfer destination
imageBarrier.srcAccessMask = 0;
imageBarrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
imageBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageBarrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
    0, NULL, 1, &bufferBarrier, 1, &imageBarrier);
vkCmdCopyBufferToImage(commandBuffer, srcBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copy);
// Transition the image from transfer destination to shader read
imageBarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
imageBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
imageBarrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
imageBarrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
    0, NULL, 0, NULL, 1, &imageBarrier);

Example – Compute to Draw Indirect
// Add a subpass dependency to express the wait on an external event
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.srcStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT;
dependency.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
dependency.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
// Dispatch a compute shader that generates indirect command structures
vkCmdDispatch(...);
// Set an event that can be later waited on (same source stage).
vkCmdSetEvent(commandBuffer, event, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT);
vkCmdBeginRenderPass(...);
// Transition the buffer from shader write to indirect command
bufferBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
bufferBarrier.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
bufferBarrier.buffer = indirectBuffer;
vkCmdWaitEvents(commandBuffer, 1, &event, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
    0, NULL, 1, &bufferBarrier, 0, NULL);
vkCmdDrawIndirect(commandBuffer, indirectBuffer, ...);
Semaphores
• Semaphores
- Used to synchronize queues
- Not necessary for single-queue
• Fairly coarse grain
- Per submission batch
- E.g. a set of command buffers
- Multiple per submit command
• Implicit memory guarantees
- Effects visible to future operations on the same device
- Not guaranteed visible to host
typedef struct VkSubmitInfo {
VkStructureType sType;
const void* pNext;
uint32_t waitSemaphoreCount;
const VkSemaphore* pWaitSemaphores;
const VkPipelineStageFlags* pWaitDstStageMask;
uint32_t commandBufferCount;
const VkCommandBuffer* pCommandBuffers;
uint32_t signalSemaphoreCount;
const VkSemaphore* pSignalSemaphores;
} VkSubmitInfo;
Example – Acquire and Present
// Acquire an image. Pass in a semaphore to be signalled
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, acquireSemaphore, VK_NULL_HANDLE, &imageIndex);
// Submit command buffers
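submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = &acquireSemaphore;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &graphicsSemaphore;
vkQueueSubmit(graphicsQueue, 1, &submitInfo, fence);
// Present images to the display
presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores = &graphicsSemaphore;
presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = &swapchain;
presentInfo.pImageIndices = &imageIndex;
vkQueuePresentKHR(presentQueue, &presentInfo);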
Example – Acquire and Present (same queue)
// Acquire an image. Pass in a semaphore to be signalled
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, acquireSemaphore, VK_NULL_HANDLE, &imageIndex);
// Submit command buffers
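submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = &acquireSemaphore;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
submitInfo.signalSemaphoreCount = 0;
vkQueueSubmit(universalQueue, 1, &submitInfo, fence);
// Present images to the display
presentInfo.waitSemaphoreCount = 0;
presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = &swapchain;
presentInfo.pImageIndices = &imageIndex;
vkQueuePresentKHR(universalQueue, &presentInfo);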
Fences
• Fences
- Used to synchronize queue to CPU
• Very coarse grain
- Per queue submit command
• Implicit memory guarantees
- Effects visible to future operations on the same device
- Not guaranteed visible to host
GL’s fences are like a combination of a semaphore and a fence in Vulkan – they can synchronize GPU and CPU in multiple ways at a coarse granularity.
VkResult vkQueueSubmit(
VkQueue queue,
uint32_t submitCount,
const VkSubmitInfo* pSubmits,
VkFence fence);
VkResult vkResetFences(
VkDevice device,
uint32_t fenceCount,
const VkFence* pFences);
VkResult vkGetFenceStatus(
VkDevice device,
VkFence fence);
VkResult vkWaitForFences(
VkDevice device,
uint32_t fenceCount,
const VkFence* pFences,
VkBool32 waitAll,
uint64_t timeout);
Example – Multi-buffering
// Have enough resources and fences to have one per in-flight-frame, usually the swapchain image count
VkBuffer buffers[swapchainImageCount];
VkFence fence[swapchainImageCount];
// Can use the index from the presentation engine - 1:1 mapping between swapchain images and resources
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, semaphore, VK_NULL_HANDLE, &nextIndex);
// Make absolutely sure that the work has completed
vkWaitForFences(device, 1, &fence[nextIndex], VK_TRUE, UINT64_MAX);
// Reset the fences we waited on, so they can be re-used
vkResetFences(device, 1, &fence[nextIndex]);
// Change the data in your per-frame resources (with appropriate events/barriers!)
...
// Submit any work to the queue, with those fences being re-used for the next time around
vkQueueSubmit(graphicsQueue, 1, &sSubmitInfo, fence[nextIndex]);
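One detail the loop above depends on: the first wait on each slot happens before anything has been submitted with that fence, so the fences need to start out signaled (in the real API, by creating them with VK_FENCE_CREATE_SIGNALED_BIT). The sketch below is a plain C model of that life cycle, not Vulkan code. FrameRing, FRAMES_IN_FLIGHT and the frame_ring_* helpers are hypothetical names, and a bool stands in for each VkFence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FRAMES_IN_FLIGHT 3  /* hypothetical: one slot per swapchain image */

/* Plain C model of per-frame fences; true means "signaled". */
typedef struct {
    bool fence_signaled[FRAMES_IN_FLIGHT];
    int  submissions[FRAMES_IN_FLIGHT];
} FrameRing;

/* Fences start signaled so the very first wait on each slot returns at once. */
static void frame_ring_init(FrameRing *ring) {
    for (size_t i = 0; i < FRAMES_IN_FLIGHT; ++i) {
        ring->fence_signaled[i] = true;
        ring->submissions[i] = 0;
    }
}

/* Models vkWaitForFences followed by vkResetFences.
 * Returns false where a real wait would still be blocking. */
static bool frame_ring_begin(FrameRing *ring, size_t index) {
    if (!ring->fence_signaled[index]) {
        return false;
    }
    ring->fence_signaled[index] = false;  /* reset for re-use */
    return true;
}

/* Models vkQueueSubmit with the slot's fence attached. */
static void frame_ring_submit(FrameRing *ring, size_t index) {
    ring->submissions[index] += 1;
}

/* Models the GPU finishing the submitted work and signaling the fence. */
static void frame_ring_gpu_done(FrameRing *ring, size_t index) {
    ring->fence_signaled[index] = true;
}
```

In this model a slot cannot be re-used between frame_ring_submit and frame_ring_gpu_done, which is the guarantee the per-frame resources rely on.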
Wait Idle
• Ensures execution completes
- VERY heavy-weight
• vkQueueWaitIdle
- Wait for queue operations to finish
- Equivalent to waiting on a fence
• vkDeviceWaitIdle
- Waits for device operations to finish
- Includes vkQueueWaitIdle for queues
These are a lot like glFinish, and should be treated similarly – use them VERY SPARINGLY.
VkResult vkQueueWaitIdle(
VkQueue queue);
VkResult vkDeviceWaitIdle(
VkDevice device);
Wait Idle
• Useful primarily at teardown
- Use it to quickly ensure all work is done
• Favour other synchronization at all other times
- Extremely heavyweight, will cause serialization!
Programmer Guidelines
• Specify EXACTLY the right amount of synchronization
- Too much and you risk starving your GPU
- Miss any and your GPU will bite you
• Use the validation layers to help!
- Won’t catch everything yet, but improving over time
• Pay particular attention to the pipeline stages
- Fiddly but become intuitive as you use them
• Consider Image Layouts
- If your GPU can save bandwidth it will
• Different behaviour depending on implementation
- Test/Tune on every platform you can find!
Keep your GPU fed without getting bitten!
Questions?
Swapchains Unchained!
(What you need to know about Vulkan WSI)
Alon Or-bach, Chair, Vulkan System
Integration Sub-Group – May 2016
@alonorbach (disclaimers apply!)
Intro to Vulkan Window System Integration
• Explicit control for acquisition and presentation of images
- Designed to fit the Vulkan API and today’s compositing window systems
• Not all extensions are supported by every platform
- You MUST check and enable the extensions your app/engine uses!!!
• Today’s presentation should help you get presentation working
- Learn how to present through a swapchain
- Overview of Vulkan objects used by the WSI extensions
WSI Jargon Buster
• Platform
Our terminology for an OS / window system e.g. Android, Windows, Wayland, X11 via XCB
• Presentation Engine
The platform’s compositor or display engine
• Application
Your app or game engine
How many WSI extensions are there?
• Two cross-platform instance extensions
- VK_KHR_surface
- VK_KHR_display
• Six (platform) instance extensions
- VK_KHR_android_surface
- VK_KHR_mir_surface
- VK_KHR_wayland_surface
- VK_KHR_win32_surface
- VK_KHR_xcb_surface
- VK_KHR_xlib_surface
• Two cross-platform device extensions
- VK_KHR_swapchain
- VK_KHR_display_swapchain
Vulkan Surfaces
• VkSurfaceKHR
- Vulkan’s way to encapsulate a native window / surface
• Platform-independent surface queries
- Find out crucial information about your surface’s properties
- Such as format, transform, image usage
- Some platforms provide additional queries
• Presentation support is per queue family
- An implementation may support multiple platforms e.g. both xlib and xcb
- Or may not support presentation at all
Unlike an EGLSurface, creating a Vulkan Surface doesn’t mean you’ve got your render targets created …yet
[Diagram: Physical Devices A, B and C with their queue families (A: Families 0–2; B and C: Families 0–1), shown alongside Platform X and Platform Y]
Vulkan Swapchains: VK_KHR_swapchain
• Array of presentable images associated with a surface
- Application requests a minimum number of presentable images
- Implementation creates at least that number
- Implementation may have a limit
• Upfront allocation of presentable images
- No allocation hitching at crucial moment
- Pre-record fixed content command buffers
• Present mode determines behavior
- FIFO support mandatory
- Platforms can offer mailbox, immediate, FIFO relaxed
FIFO is like eglSwapInterval = 1
Mailbox/Immediate is like eglSwapInterval = 0
FIFO relaxed is like EXT_swap_control_tear
const VkSwapchainCreateInfoKHR createInfo =
{VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR, // sType
NULL, // pNext
0, // flags
mySurface, // surface
desiredNumberOfPresentableImages, // minImageCount
surfaceFormat, // imageFormat
surfaceColorSpace, // imageColorSpace
myExtent, // imageExtent
1, // imageArrayLayers
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT, // imageUsage
VK_SHARING_MODE_EXCLUSIVE, // imageSharingMode
0, // queueFamilyIndexCount
NULL, // pQueueFamilyIndices
surfaceCapabilities.currentTransform, // preTransform (variable name assumed)
VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR, // compositeAlpha
swapchainPresentMode, // presentMode
VK_TRUE, // clipped
VK_NULL_HANDLE // oldSwapchain
};
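The two decisions described above (which present mode to ask for, and how many images to request) can be sketched in plain C. This is an illustration only. PresentMode and its MODE_* values are local stand-ins for VkPresentModeKHR, and choose_present_mode and choose_image_count are hypothetical helpers. The rule that a maximum of 0 means "no upper limit" follows the maxImageCount field of VkSurfaceCapabilitiesKHR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for VkPresentModeKHR values; not the real enum. */
typedef enum {
    MODE_IMMEDIATE,
    MODE_MAILBOX,
    MODE_FIFO,
    MODE_FIFO_RELAXED
} PresentMode;

/* Use the preferred mode if the platform offers it; otherwise fall back
 * to FIFO, which every implementation has to support. */
static PresentMode choose_present_mode(const PresentMode *supported,
                                       size_t count,
                                       PresentMode preferred) {
    for (size_t i = 0; i < count; ++i) {
        if (supported[i] == preferred) {
            return preferred;
        }
    }
    return MODE_FIFO;
}

/* Clamp the desired image count to the implementation's limits.
 * max_images == 0 is treated as "no upper limit". */
static uint32_t choose_image_count(uint32_t desired,
                                   uint32_t min_images,
                                   uint32_t max_images) {
    uint32_t n = desired < min_images ? min_images : desired;
    if (max_images != 0 && n > max_images) {
        n = max_images;
    }
    return n;
}
```

The result of choose_image_count would go into minImageCount; the implementation may still create more images than requested.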
Vulkan Swapchains: They’re good!
• Application knows which image within a swapchain it is presenting
- Content of image preserved between presents
• Application is responsible for explicitly recreating swapchains - no surprises
- Platform informs app if the current swapchain is:
- Suboptimal: e.g. after window resize, swapchain still usable for present via image scaling
- Surface Lost: swapchain no longer usable for present
- Application is responsible for creating a new swapchain
Similar to, but neater than, how EGL_KHR_partial_update / EGL_EXT_buffer_age and preserved behavior achieve this
In EGL, the EGLSurface may be resized by the platform after an eglSwapBuffers call. Vulkan requires the application to intervene
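A minimal sketch of how an application might react to the states listed above, in plain C. PresentResult and SwapchainAction are hypothetical local enums, not the real VkResult codes; the mapping simply restates the slide: a suboptimal swapchain can keep presenting until a convenient moment, a lost one has to be replaced before the next present.

```c
#include <assert.h>

/* Hypothetical stand-ins for the results the platform reports. */
typedef enum {
    PRESENT_SUCCESS,
    PRESENT_SUBOPTIMAL,
    PRESENT_SURFACE_LOST
} PresentResult;

/* What the frame loop should do next. */
typedef enum {
    KEEP_SWAPCHAIN,
    REPLACE_WHEN_CONVENIENT,
    REPLACE_NOW
} SwapchainAction;

static SwapchainAction action_for(PresentResult result) {
    switch (result) {
    case PRESENT_SUCCESS:
        return KEEP_SWAPCHAIN;
    case PRESENT_SUBOPTIMAL:
        /* Still presentable (e.g. via scaling), so finish the frame first. */
        return REPLACE_WHEN_CONVENIENT;
    case PRESENT_SURFACE_LOST:
    default:
        /* No longer usable for present. */
        return REPLACE_NOW;
    }
}
```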
Vulkan Swapchains: They’re jolly good!
• Presenting and acquiring are separate operations
- No need to submit a new image to acquire another one, unless presentation engine cannot release it
• Application must only modify presentable images it has acquired
• Presentation engine must only display presentable images that have been presented!
Stalls in frame loop are very bad!
In EGL, calling eglSwapBuffers both presents the current back buffer and acquires a new one. Vulkan splits this up into separate operations
Steps to set up your presentable images
1 – Create a native window/surface (platform-specific APIs)
2 – Create a Vulkan surface (VK_KHR_<platform>_surface)
3 – Query information about your surface (VK_KHR_surface)
4 – Create a Vulkan swapchain (VK_KHR_swapchain)
5 – Get your presentable images
Vulkan Frame Loop – as easy as 1-2-3!
0 – Create your swapchain
1 – Acquire the next presentable image
2 – Submit command buffer(s) for that image
3 – Present the image
VK_KHR_swapchain
[Diagram legend: Setup; Steady-state; Response to suboptimal / surface_lost]
Vulkan Displays: VK_KHR_display
• Vulkan’s way to discover display devices (screens, panels) outside a window system
- Reminder: Not supported on all platforms
• Defines VkDisplayKHR and VkDisplayModeKHR objects
- Represent the display devices and the modes they support connected to a VkPhysicalDevice
- Determine if a display supports multiple planes that are blended together
• Enables creation of a VkSurfaceKHR to represent a display plane
A Vulkan display represents an actual display! (Whereas an EGLDisplay is actually just a connection to a driver – like a Vulkan Device)
[Diagram: a Physical Device connected to Display 0 and Display 1, each with Display Modes 0 and 1; Planes 0–2; a Surface representing a display plane]
VK_KHR_display_swapchain
• Extends the information provided at vkQueuePresentKHR
- What region to present from the swapchain image
- What region to present to on the display
- Whether the display should persist the image
• Adds ability to create a shared swapchain
- Swapchain that takes multiple VkSwapchainCreateInfoKHR structs
- Allows multiple displays to be presented to simultaneously
- No guarantee that presents are atomic ...presently!
Any questions?
[Link]@[Link]
@alonorbach
Moving To Vulkan
Asynchronous Compute
Chris Hebert, Dev Tech Software Engineer, Professional Visualization
Who am I?
Chris Hebert
@chrisjhebert
Dev Tech Software Engineer- Pro Vis
20 years in the industry
Joined NVIDIA in March 2015.
Real time graphics makes me happy
I also like helicopters
Chris Hebert - Circa 1974
Agenda
• Some Context
• Sharing The Load
• Pipeline Barriers
Some Context
GPU Architecture
In a nutshell
NVIDIA Maxwell 2
Register File
Core
Load Store Unit
Execution Model
Thread Hierarchies
[Diagram: logical view of a work group next to the hardware view on an SMM, where the work group is split into warps of 32 threads]
Resource Partitioning
Resources Are Limited
Key resources impacting local execution:
• Program Counters: partitioned amongst threads
• Registers: partitioned amongst threads
• Shared Memory: partitioned amongst work groups
e.g. GTX 980 Ti
64K 32-bit registers per SM
96 KB shared memory per SM
Resource Partitioning
Registers
The more registers a kernel uses, the fewer warps can be resident on the SM
Fewer Registers: More Threads
More Registers: Fewer Threads
Resource Partitioning
Shared Memory
The more shared memory a work group uses, the fewer work groups can be resident on the SM
Less SMEM: More Groups
More SMEM: Fewer Groups
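The two trade-offs above reduce to integer division against the per-SM budget. The sketch below is a simplified model, assuming no allocation granularity and taking the hardware caps (maximum resident warps and work groups per SM) as caller-supplied parameters. resident_warps and resident_groups are hypothetical helper names, and the register and shared memory sizes used with them are the GTX 980 Ti figures quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

enum { THREADS_PER_WARP = 32 };

static uint32_t min_u32(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

/* Warps that fit on one SM when registers are the limiting resource.
 * Simplified: ignores the allocation granularity real hardware applies. */
static uint32_t resident_warps(uint32_t regs_per_sm,
                               uint32_t regs_per_thread,
                               uint32_t max_warps_per_sm) {
    uint32_t regs_per_warp = regs_per_thread * THREADS_PER_WARP;
    if (regs_per_warp == 0) {
        return max_warps_per_sm;
    }
    return min_u32(regs_per_sm / regs_per_warp, max_warps_per_sm);
}

/* Work groups that fit on one SM when shared memory is the limiting resource. */
static uint32_t resident_groups(uint32_t smem_per_sm,
                                uint32_t smem_per_group,
                                uint32_t max_groups_per_sm) {
    if (smem_per_group == 0) {
        return max_groups_per_sm;
    }
    return min_u32(smem_per_sm / smem_per_group, max_groups_per_sm);
}
```

With 64K registers per SM, a kernel using 32 registers per thread leaves room for 64 warps in this model, while one using 64 registers per thread leaves room for 32.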
Keeping It Moving
Occupancy
• Some small kernels may have low occupancy
• Depending on the algorithm
• Compute resources are limited
• Shared across threads or work groups on a per SM basis
• Warps stall when they have to wait for resources
• This latency can be hidden
• If there are other warps ready to execute.
Keeping It Moving
Occupancy – Simple Theoretical Example
• Simple kernel that updates positions of 20480 particles
• 1 FMAD - ~20 cycles (instruction latency)
• 20480 particles = 640 warps
• To hide this latency, according to Little's Law
• Required Warps = Latency x Throughput
• Throughput should be 32 threads * 16 SMs = 512 to keep GPU busy
• Required warps is 20*512 = 10240
• ….oh….
Keeping It Moving
Occupancy – Simple Theoretical Example
• Simple kernel that updates positions of 20480 particles
• 1 FMAD - ~20 cycles (instruction latency)
• 20480 particles = 640 warps
• To hide this latency, according to Little's Law, but only on 1 SM
• Required Warps = Latency x Throughput
• Throughput should be 32 threads * 1 SM = 32 to keep GPU busy
• Required warps is 20*32 = 640
• And we theoretically have 15 SMs to use for other stuff.
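The arithmetic on these two slides can be reproduced with a couple of helpers. This is only a sketch of the slide's simplified model: it keeps the slide's figures (20 cycles of latency, 32 threads per SM per cycle) and its way of counting, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { WARP_SIZE = 32 };

/* Number of warps needed to cover a given thread count (rounded up). */
static uint32_t warps_for_threads(uint32_t threads) {
    return (threads + WARP_SIZE - 1) / WARP_SIZE;
}

/* Little's Law as used on the slide: work in flight = latency x throughput,
 * where throughput is threads issued per cycle across the SMs in use. */
static uint32_t in_flight_needed(uint32_t latency_cycles,
                                 uint32_t threads_per_cycle_per_sm,
                                 uint32_t sm_count) {
    return latency_cycles * threads_per_cycle_per_sm * sm_count;
}
```

For the slide's numbers, warps_for_threads(20480) gives 640, in_flight_needed(20, 32, 16) gives 10240 and in_flight_needed(20, 32, 1) gives 640, which is the comparison the two slides draw.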
Queuing It Up
Working with 1 Queue
• Scheduler will distribute work across all SMs
• Kernels execute in sequence (there may be some overlap)
• Low occupancy kernels will waste GPU time
[Diagram: command buffers feed a single command queue, which runs kernels and transfers one after another]
Sharing The Load
Queuing It Up
Working with N Queues
• NVIDIA hardware gives you 16 all-powerful queues
• 1 Queue family that supports all operations
• 16 queues available for use
[Diagram: command buffers feed Command Queues #1–#3, each running its own sequence of kernels]
Queuing It Up
Working with N Queues
• Application decides which queues for which kernels
• Load balance for best performance
• Profile (Nsight) to gain insights
[Diagram: command buffers feed Command Queues #1–#3, each running its own sequence of kernels]
Queuing It Up
Compute and Graphics In Harmony
• Some hardware can even run compute and graphics work concurrently
• Needs fast context switching and at high granularity (not just at draw commands)
• Simple Graphics work tends to have high occupancy
• Complex graphics work can reduce occupancy
• Profile for performance insights
Queuing It Up
Compute and Graphics In Harmony
• Profile to understand occupancy of both graphics and compute workloads
• Queues can support both compute and graphics
[Diagram: compute and graphics command buffers feed Command Queues #1–#3, each running its own sequence of kernels]
An Example
Compute and Graphics In Harmony
Free Surface Navier Stokes Solver
• 11 Compute Kernels
• 4 Shaders
• The output of each kernel is the input to the next
• Some kernels have very low occupancy
• Still opportunities for concurrency with compute
An Example
Many discretized operations are separable
Examples
• Fluid Sims
• Gaussian Blurs
• Convolution Kernels
Driver handles dispatching groups
Use semaphores to synchronize
[Diagram: two command queues, one processing the X axis (and half the Z) and one processing the Y axis (and the other half of Z), each dispatching to the SMs and signalling a semaphore]
An Example
Compute and graphics run concurrently
[Diagram: a compute command queue and a graphics command queue share the SMs and are linked by a semaphore; compute runs one frame ahead, working on frame N+1 while graphics renders frame N, and so on through frames N+4 and N+3]
An Example
Putting it all together
[Diagram: two compute command queues (X axis and half the Z; Y axis and the other half of Z) and one graphics command queue share the SMs, synchronized by semaphores; compute runs one frame ahead of graphics]
Memory Transfers
More opportunity for concurrency
• Memory transfers are handled by MMU
• Can run concurrently with Kernels
• As long as the current kernel isn't using the memory
Why do this?
[Diagram: Command Queue #1 alternating Kernel and Transfer; the MMU may be idle while kernels run and the ALUs may be idle while transfers run]
Memory Transfers
More opportunity for concurrency
When you can do this
• DtoH and HtoD transfers can run concurrently
Examples
• Large image processing
• Video processing
[Diagram: a Host to Device queue (transfers), a Compute queue (kernels) and a Device to Host queue (transfers) running in parallel]
Conclusion
Takeaways
There is more than 1 queue available
Keep registers and shared memory to a minimum
Low occupancy leads to an underutilized GPU
Maximize GPU utilization by running kernels concurrently
Profile to understand the occupancy profiles of kernels and shaders
Some hardware can run kernels AND shaders concurrently
Use Semaphores to synchronize between queues
Be sensible at the beer festival
Thank You Enjoy Vulkan!!
Questions?
Chris Hebert, Dev Tech Software Engineer, Professional Visualization
Porting to Vulkan
Hans-Kristian Arntzen
Engineer, ARM
(Credit for slides: Marius Bjørge)
Agenda
• API flashback
• Engine design
- Command buffers
- Pipelines
- Render passes
- Memory management
API Flashback
[Diagram: the split between application and driver before and after the logic shift; logic moves out of the driver and into the application]
API Flashback
vkDevice
vkQueue vkCommandPool
vkCommandBuffer
vkBeginRenderPass vkCmdBindXXX vkCmdBindPipeline vkCmdBindDescriptorSets vkCmdDraw vkEndRenderPass
vkRenderPass vkBuffer vkPipeline vkDescriptorSet
State vkBufferView
vkFramebuffer Shaders vkImageView
vkImageView vkRenderPass vkSampler
vkDeviceMemory vkDeviceMemory vkDescriptorPool
Heap
Porting from OpenGL to Vulkan?
• Most graphics engines today are designed around the principles of implicit driver
behaviour
- A direct port to Vulkan won’t necessarily give you a lot of benefits
• Approach it differently
- Re-design for Vulkan, and then port that to OpenGL
Allocating Memory
• Memory is first allocated and then bound to Vulkan objects
- Different Vulkan objects may have different memory requirements
- Allows for aliasing memory across different Vulkan objects
• Driver does no ref counting of any objects in Vulkan
- Cannot free memory until you are sure it is never going to be used again
- Also applies to API handles!
• Most of the memory allocated during run-time is transient
- Allocate, write and use in the same frame
- Block based memory allocator
Block Based Memory Allocator
• Relaxes memory reference counting
• Only entire blocks are freed/recycled
• Sub-allocations take refcount on block
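A minimal sketch of the idea in plain C, assuming a simple bump allocator inside each block. MemoryBlock, block_alloc and block_release are hypothetical names, and offsets stand in for positions inside a larger device memory allocation. This is not the presenters' implementation. In a real engine the final release would additionally have to wait until the GPU has finished with the frame that used the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALLOC_FAILED UINT64_MAX

/* One block of memory; sub-allocations only ever hold a reference to it. */
typedef struct {
    uint64_t size;      /* total bytes in the block */
    uint64_t offset;    /* bump pointer: first free byte */
    uint32_t refcount;  /* live sub-allocations */
} MemoryBlock;

/* Sub-allocate from the block and take a reference on it.
 * alignment must be a non-zero power of two.
 * Returns the offset inside the block, or ALLOC_FAILED if it does not fit. */
static uint64_t block_alloc(MemoryBlock *block, uint64_t size, uint64_t alignment) {
    uint64_t start = (block->offset + alignment - 1) & ~(alignment - 1);
    if (start > block->size || size > block->size - start) {
        return ALLOC_FAILED;
    }
    block->offset = start + size;
    block->refcount += 1;
    return start;
}

/* Drop one reference. Only when the last one goes is the entire block
 * recycled; individual sub-allocations are never freed on their own.
 * Returns true if the block was recycled. */
static bool block_release(MemoryBlock *block) {
    assert(block->refcount > 0);
    block->refcount -= 1;
    if (block->refcount == 0) {
        block->offset = 0;
        return true;
    }
    return false;
}
```

Tracking one counter per block, rather than one per allocation, is what relaxes the reference counting burden.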
Command Buffers
• Request command buffers on the fly
- Recorded with VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT
- Recycled
• Separate command pools per
- Thread
- Frame
- Primary/secondary
Secondary Command Buffers
vkCommandPool vkCommandBuffer
Main thread
vkBeginRenderPass vkCmdExecuteCommands vkEndRenderPass
Thread 0 vkCommandPool Secondary command buffer
Thread 1 vkCommandPool Secondary command buffer
Thread 2 vkCommandPool Secondary command buffer
Shaders
• Standardize on SPIR-V binary shaders
• Extensively use the Khronos SPIRV-Cross library
- Cross compiling back to GLSL
- Provides shader reflection for
- Vertex attributes
- Subpass attachments
- Pipeline layouts
- Push constants
Pipelines
Pipeline state
Dynamic state Shaders Render pass
Blend State Pipeline layout
Rasterizer state Vertex input
Depth/stencil state Input assembly
Pipelines
• Not trivial to create all required pipeline state objects upfront
• Our approach:
- Keep track of all pipeline state per command buffer
- Flush pipeline creation when required
- In our case this is implemented as an async operation
Public interface: SetRenderState(), SetShaders(), SetVertexBuffer(), SetIndexBuffer(), Draw()
Internal (command buffer): Flush, RequestPipeline, CreateNewPipeline
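The RequestPipeline step can be pictured as a lookup keyed on the tracked state, creating a pipeline only on a miss. The sketch below is a plain C illustration, not the presenters' engine code. PipelineKey and its fields, PipelineTable and request_pipeline are hypothetical, the handle is a counter standing in for a created pipeline object, and the fixed-size table assumes fewer than 64 distinct pipelines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state key. All fields are uint32_t so the struct has no
 * padding bytes and can be hashed and compared as raw memory. */
typedef struct {
    uint32_t shader_id;
    uint32_t blend_state;
    uint32_t raster_state;
    uint32_t depth_stencil_state;
    uint32_t render_pass_id;
} PipelineKey;

#define PIPELINE_SLOTS 64u  /* power of two; this sketch never grows the table */

typedef struct {
    PipelineKey keys[PIPELINE_SLOTS];
    uint32_t    handles[PIPELINE_SLOTS];  /* 0 = empty slot */
    uint32_t    created;                  /* number of pipelines created so far */
} PipelineTable;

/* FNV-1a over the bytes of the key. */
static uint32_t hash_key(const PipelineKey *key) {
    const unsigned char *bytes = (const unsigned char *)key;
    uint32_t hash = 2166136261u;
    for (size_t i = 0; i < sizeof *key; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;
    }
    return hash;
}

/* Return the pipeline for this state, creating it only on first use. */
static uint32_t request_pipeline(PipelineTable *table, const PipelineKey *key) {
    assert(table->created < PIPELINE_SLOTS);  /* keeps at least one slot empty */
    uint32_t slot = hash_key(key) & (PIPELINE_SLOTS - 1u);
    while (table->handles[slot] != 0) {
        if (memcmp(&table->keys[slot], key, sizeof *key) == 0) {
            return table->handles[slot];  /* hit: re-use the existing pipeline */
        }
        slot = (slot + 1u) & (PIPELINE_SLOTS - 1u);  /* linear probing */
    }
    table->keys[slot] = *key;
    table->handles[slot] = ++table->created;  /* stands in for real pipeline creation */
    return table->handles[slot];
}
```

A zero-initialized PipelineTable is an empty table, so memset is enough to set one up.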
Pipelines
• In an ideal world…
- All pipeline combinations should be created upfront
• …but this requires detailed knowledge of every potential shader/state combination that
you might have in your scene
- As an example, one of our fragment shaders has ~9000 combinations
- Every one of these shaders can use different render state
- We also have to make sure the pipelines are bound to compatible render passes
- An explosion of combinations!
Pipeline cache
• Vulkan has built-in support for pipeline caching
- Store to disk and re-use on next run
• Can also speed up pipeline creation during run-time
- If the pipeline state is already in the cache it can be re-used
[Diagram: pipeline state (dynamic state, shaders, render pass, blend state, pipeline layout, rasterizer state, vertex input, depth/stencil state, input assembly) is stored in a vkPipelineCache, which can be saved to disk]
Pipeline layout
• Defines what kind of resources are in each binding slot in your shaders
- Textures, samplers, buffers, push constants, etc
• Can be shared among different pipeline objects
Pipeline layout
• Use SPIRV-Cross to automatically get binding information from SPIR-V shaders
[Diagram: SPIR-V shader goes through SPIRV-Cross to produce the pipeline layout (descriptor set layout, push constant range)]
Descriptor Sets
• Textures, uniform buffers, etc. are bound to shaders in descriptor sets
- Hierarchical invalidation
- Order descriptor sets by update frequency
• Ideally all descriptors are pre-baked during level load
- Keep track of low level descriptor sets per material
- But, this is not trivial
Descriptor Sets
• Our solution:
- Keep track of bindings and update descriptor sets when necessary
- Keep cache of descriptor sets used with immutable Vulkan objects
Public interface: SetShaders(), SetConstantData(), SetTexture(), Draw()
Internal (command buffer): Request cached descriptor sets, Allocate descriptor sets (descriptor pool), Write descriptor sets (descriptor set layouts), BindDescriptorSets
Descriptor Set emulation
• We also need to support this in OpenGL
• Our solution:
- Emulate descriptor sets in our OpenGL backend
- SPIRV-Cross collapses and serializes bindings
Descriptor Set emulation
Shader
Set 0: 0 GlobalVSData, 1 GlobalFSData
Set 1: 0 MeshData
Set 2: 0 MaterialData, 1 TexAlbedo, 2 TexNormal, 3 TexEnvmap
SPIR-V library to GLSL
Uniform block bindings: 0 GlobalVSData, 1 GlobalFSData, 2 MeshData
Texture bindings: 0 TexAlbedo, 1 TexNormal, 2 TexEnvmap
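The collapsing step can be sketched as a single pass that hands out one running index per resource kind. This is an illustration of the mapping on the slide, not how SPIRV-Cross is implemented. ShaderResource, ResourceKind and flatten_bindings are hypothetical names, and the set and binding numbers given to the textures in the usage note are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RES_UNIFORM_BLOCK = 0, RES_TEXTURE = 1 } ResourceKind;

typedef struct {
    const char  *name;
    uint32_t     set;           /* Vulkan descriptor set index */
    uint32_t     binding;       /* binding within that set */
    ResourceKind kind;
    uint32_t     flat_binding;  /* output: GL-style binding point */
} ShaderResource;

/* Assign consecutive GL binding points per resource kind.
 * The input must already be sorted by (set, binding). */
static void flatten_bindings(ShaderResource *resources, size_t count) {
    uint32_t next[2] = { 0, 0 };  /* one running counter per ResourceKind */
    for (size_t i = 0; i < count; ++i) {
        resources[i].flat_binding = next[resources[i].kind]++;
    }
}
```

Fed GlobalVSData (set 0, binding 0), GlobalFSData (0, 1), MeshData (1, 0) as uniform blocks and the three textures in set 2, this yields uniform block bindings 0, 1, 2 and texture bindings 0, 1, 2, matching the slide.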
Push Constants
• Push constants replace non-opaque uniforms
- Think of them as small, fast-access uniform buffer memory
• Update in Vulkan with vkCmdPushConstants
• Directly mapped to registers on Mali GPUs
// New
layout(push_constant, std430) uniform PushConstants {
mat4 MVP;
vec4 MaterialData;
} RegisterMapped;
// Old, no longer supported in Vulkan GLSL
uniform mat4 MVP;
uniform vec4 MaterialData;
Push Constant Emulation
• But again, we need to support OpenGL as well
• Our solution:
- Use SPIRV-Cross to turn push constants into regular non-opaque uniforms
- Logic in our OpenGL/Vulkan backends redirects the push constant data appropriately
Render pass
• Used to denote beginning and end of rendering to a framebuffer
• Can be re-used but must be compatible
- Attachments: Framebuffer format, image layout, MSAA?
- Subpasses
- Attachment load/store
[Diagram: DepthStencil and Color targets]
Public interface: BeginRenderPass
Internal (command buffer): RequestFramebuffer, RequestRenderPass, CreateCompatibleRenderPass, CreateFramebuffer, BeginRenderPass
Subpass Inputs
• Vulkan supports subpasses within render passes
• Standardized GL_EXT_shader_pixel_local_storage!
• Also useful for desktop GPUs
// GLSL
#extension GL_EXT_shader_pixel_local_storage : require
__pixel_local_inEXT GBuffer {
layout(rgba8) vec4 albedo;
layout(rgba8) vec4 normal;
...
} pls;
// Vulkan
layout(input_attachment_index = 0) uniform subpassInput albedo;
layout(input_attachment_index = 1) uniform subpassInput normal;
...
Subpass Input Emulation
• Supporting subpasses in GL is not trivial, and probably not feasible on a lot of
implementations
• Our solution:
- Use SPIRV-Cross to rewrite subpass inputs to Pixel Local Storage variables or texture
lookups
- This will only support a subset of the Vulkan subpass features, but good enough for our
current use
Synchronization
• Submitted work is completed out of order by the GPU
• Dependencies must be tracked by the application and handled explicitly
- Using output from a previous render pass
- Using output from a compute shader
- Etc
• Synchronization primitives in Vulkan
- Pipeline barriers and events
- Fences
- Semaphores
Render passes and pipeline barriers
• Most of the time the application knows upfront how the output of a renderpass is going to be used afterwards
• Internally we have a couple of usage flags that we assign to a render pass
- On EndRenderPass we implicitly trigger a pipeline barrier
Public interface: BeginRenderPass, DrawSomething, EndRenderPass
Render pass usage flags: Pipeline stages? Memory domains?
Internal (command buffer): vkCmdEndRenderPass, vkCmdPipelineBarrier
Image Layout Transitions
• Must match how the image is used at any time
• Pedantic or relaxed
- Some implementations will require careful tracking of previous and new layout to achieve
optimal performance
- For Mali we can be quite relaxed with this – most of the time we can keep the image
layout as VK_IMAGE_LAYOUT_GENERAL
Summary
• Don’t allocate or release during runtime
• Batching still applies
• Multi-thread your code!
• Use push-constants as much as possible
• Multi-pass is fantastic on mobile GPUs