AMD Intermediate Language (IL) Specification v2
October 2011 v. 2.4
doc_rev 2.4
© 2011 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Accelerated Parallel Processing, the AMD Accelerated Parallel Processing logo, ATI, the ATI logo, Radeon, FireStream, FirePro, Catalyst, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft, Visual Studio, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the U.S. and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (AMD) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur.
AMD reserves the right to discontinue or make changes to its products at any time without notice.
Advanced Micro Devices, Inc. One AMD Place P.O. Box 3453 Sunnyvale, CA 94088-3453 www.amd.com
Copyright 2011 by Advanced Micro Devices, Inc. All rights reserved.

Contents

Preface
Chapter 1 Overview
  1.1 Open Design
  1.2 DirectX as a Design Basis
  1.3 Threading Model
  1.4 Access Model for Local Shared Memory
Chapter 2 Binary Stream Format
  2.1 IL Stream
  2.2 IL Token Descriptions
    2.2.1 Language Token
    2.2.2 Version Token
    2.2.3 Opcode Token
    2.2.4 Destination Token
    2.2.5 Destination Modifier Token
    2.2.6 Source Token
    2.2.7 Source Modifier Token
    2.2.8 Source Token Examples
Chapter 3 Text Instruction Syntax
  3.1 Version
  3.2 Registers
  3.3 Control Specifiers
  3.4 Destination Modifiers
  3.5 Write Mask
  3.6 Source Modifiers
  3.7 Comments
  3.8 Checkerboard Shader Example
Chapter 4 Shader Operations
  4.1 Shader Requirements
  4.2 Link Restrictions
  4.3 Multi-Pass Shaders
  4.4 ...
Chapter 5 Register Types
  ...
  vWINCOORD
Chapter 6 Enumerated Types
  6.1 ILAddressing
  6.2 ILAnisoFilterMode
  6.3 ILCmpValue
  6.4 ILComponentSelect
  6.5 ILDefaultVal
  6.6 ILDivComp
  6.7 ILElementFormat
  6.8 ILFirstBitType
  6.9 ILImportComponent
  6.10 ILImportUsage
  6.11 ILInterpMode
  6.12 ILLanguageType
  6.13 ILLdsSharingMode
  6.14 ILLoadStoreDataSize
  6.15 ILLogicOp
  6.16 ILMatrix
  6.17 ILMipFilterMode
  6.18 ILModDstComponent
  6.19 ILNoiseType
  6.20 ILOpcode
  6.21 ILOutputTopology
  6.22 ILPixTexUsage
  6.23 ILRegType
  6.24 ILRelOp
  6.25 ILShader
  6.26 ILShiftScale
  6.27 ILTexCoordMode
  6.28 ILTexFilterMode
  6.29 ILTexShadowMode
  6.30 ILTopologyType
  6.31 ILTsDomain
  6.32 ILTsOutputPrimitive
  6.33 ILTsPartition
  6.34 ILZeroOp
Chapter 7 Instructions
  7.1 Formats
  7.2 Instruction Notes
    7.2.1 Notes on Comparison Instructions
    7.2.2 Notes on Flow Control Instructions
    7.2.3 Notes on Input/Output Instructions
    7.2.4 Notes on Conversion Instructions
    7.2.5 Notes on Double Precision Instructions
    7.2.6 Notes on Arithmetic Instructions
    7.2.7 Notes on Shift Instructions
    7.2.8 Notes on Simple 64-Bit Integer Instructions
    7.2.9 Notes on Bit Operations
    7.2.10 Note on LDS Memory Operations
    7.2.11 Notes on GDS Memory Operations
    7.2.12 Notes on UAV Memory Operations
    7.2.13 Notes on Multi-Media Instructions
    7.2.14 Notes on Evergreen GPU Series Memory Controls
  7.3 Prefix Instruction
  7.4 Flow Control Instructions
  7.5 Declaration and Initialization Instructions
  7.6 Input/Output Instructions
  7.7 Integer Arithmetic Instructions
  7.8 Unsigned Integer Operations
  7.9 Bit Operations
  7.10 Conversion Instructions
  7.11 Float Instructions
  7.12 Double-Precision Instructions
  7.13 Multi-Media Instructions
  7.14 Miscellaneous Special Instructions
  7.15 Evergreen GPU Series Memory Controls
  7.16 LDS Instructions
  7.17 GDS Instructions
  7.18 Virtual Function / Interface Support
  7.19 Macro Processor Support
Appendix A Shadow Texture Loads
Appendix B IL Enumerations Sequence
Appendix C ASIC- and Model-Specific Restrictions
  C.1 IL_Opcode Restrictions
  C.2 ShaderModel Restrictions
  C.3 Instruction Restrictions

Tables

  2.1 IL Stream Instruction Packet Ordering
  2.2 IL_Lang: Source Language Information
  2.3 IL_Version: Source Version Information
  2.4 IL_Opcode: Instruction Opcode Details
  2.5 IL_Dst: Destination Operand Information
  2.6 Instruction Modifiers
  2.7 IL_Dst_Mod: Destination Modification Information
  2.8 IL_Src: Source Operand Information
  2.9 IL_Src_Mod: Source Operand Modification Information
  2.10 Modifiers and Types Relationship
  3.1 Destination Modifiers
  3.2 Source Modifiers
  5.1 Registers Mapping: DX10 to IL
  5.2 Mapping of DX10 Declaration Information to IL
  5.3 Mapping of DX9 (3.0) Registers to IL Types
  5.4 Mapping of DX9 (2.0 and Lower) Registers to IL Types
  5.5 Special HOS-Related Fields
  5.6 Mapping of DX10 and DX11 Compute Shader (CS) Registers to IL Types
  5.7 Registers and Their Restrictions
  5.8 IL Register Types Overview
  6.1 ILAddressing Enumeration Types
  6.2 ILAnisoFilterMode Enumeration Types
  6.3 ILCmpValue Enumeration Types
  6.4 ILComponentSelect Enumeration Types
  6.5 ILDefaultVal Enumeration Types
  6.6 ILDivComp Enumeration Types
  6.7 ILElementFormat Enumeration Types
  6.8 IL_FIRSTBIT Enumeration Types
  6.9 ILImportComponent Enumeration Types
  6.10 ILImportUsage Enumeration Types
  6.11 ILInterpolation Enumeration Types
  6.12 ILLanguageType Enumeration Types
  6.13 IL LDS Sharing Mode
  6.14 IL LOAD_STORE_DATA_SIZE
  6.15 ILLogicOp Enumeration Types
  6.16 ILMatrix Enumeration Types
  6.17 ILMipFilterMode Enumeration Types
  6.18 ILModDstComp Enumeration Types
  6.19 ILNoiseType Enumeration Types
  6.20 ILOpcode Enumeration Types
  6.21 IL_OUTPUT_TOPOLOGY Enumeration Types (Output from a Geometry Shader)
  6.22 ILPixTexUsage Enumeration Types
  6.23 ILRelOp Enumeration Types
  6.24 ILShader Enumeration Types
  6.25 ILShiftScale Enumeration Types
  6.26 ILTexCoordMode Enumeration Types
  6.27 ILTexFilterMode Enumeration Types
  6.28 ILTexShadowMode Enumeration Types
  6.29 IL_TOPOLOGY Enumeration Types (Input to a Geometry Shader)
  6.30 ILTsDomain Enumeration Types
  6.31 ILTsOutputPrimitive Enumeration Types
  6.32 ILTsPartition Enumeration Types
  6.33 ILZeroOp Enumeration Types
  C-1 Instructions for SM40 Only
  C-2 Registers for SM40 Only
  C-3 Instructions for SM30 Only
  C-4 Registers for SM30 Only
Preface
Audience
This document is intended for programmers writing application and system software, including operating systems, compilers, loaders, linkers, device drivers, and system utilities. It assumes an understanding of the AMD GPU microarchitecture and of programming practices for either graphics or general-purpose computing.
Contact Information
To submit questions or comments about this document, contact our technical documentation staff at: [email protected]. For questions concerning AMD Accelerated Parallel Processing products, please email: [email protected]. For questions about developing with AMD Accelerated Parallel Processing, please submit a helpdesk request at AMD_Software_Developer_Help_Request. You can learn more about AMD Accelerated Parallel Processing at: http://www.amd.com/stream. We also have a growing community of AMD Accelerated Parallel Processing users! Come visit us at the AMD Accelerated Parallel Processing Developer Forum (http://www.amd.com/streamdevforum) to find out what applications other users are trying on their AMD Accelerated Parallel Processing products!
Organization
This document begins with an overview summarizing the similarities and differences between AMD Intermediate Language (IL) and general-purpose computer languages. It then describes the text and binary formats of IL program instructions, followed by a detailed description of the instruction types, including a high-level description of the instruction fields and the restrictions that must be observed. It also describes the instruction syntax for the text representation and presents the specification of each type of instruction. A glossary of terms and acronyms ends the document.
Endian Order
The R600, R700, and Evergreen GPU architectures address memory and registers using little-endian byte-ordering and bit-ordering. Multi-byte values are stored with their least-significant (low-order) byte (LSB) at the lowest byte address; they are illustrated with their LSB at the right side. Byte values are stored with their least-significant (low-order) bit (lsb) at the lowest bit address; they are illustrated with their lsb at the right side.
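As a small illustration of this byte ordering, the C sketch below serializes a 32-bit value, such as an IL stream token, with its least-significant byte at the lowest address, then reads it back. The helper names store_le32 and load_le32 are hypothetical and are not part of any AMD API. Using shifts and masks, rather than copying the value's in-memory representation, keeps the result independent of the host CPU's own byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value in little-endian order:
   the least-significant byte goes to the lowest address. */
static void store_le32(uint8_t *dst, uint32_t value)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(value & 0xFFu);          /* LSB at lowest address  */
    dst[1] = (uint8_t)((value >> 8) & 0xFFu);
    dst[2] = (uint8_t)((value >> 16) & 0xFFu);
    dst[3] = (uint8_t)((value >> 24) & 0xFFu);  /* MSB at highest address */
}

/* Read a little-endian 32-bit value back, regardless of host byte order. */
static uint32_t load_le32(const uint8_t *src)
{
    assert(src != NULL);
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

For example, storing 0x12345678 produces the bytes 78 56 34 12 at increasing addresses.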
Conventions
The following conventions are used in this document.
mono-spaced font   A filename, file path, or code.
*                  Any number of alphanumeric characters in the name of a microcode format, microcode parameter, or instruction.
<>                 Angle brackets denote streams.
[1,2)              A range that includes the left-most value (in this case, 1) but excludes the right-most value (in this case, 2).
[1,2]              A range that includes both the left-most and right-most values (in this case, 1 and 2).
{x | y}            One of the multiple options listed. In this case, x or y.
0.0                A single-precision (32-bit) floating-point value.
1011b              A binary value, in this example a 4-bit value.
7:4                A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first.
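The bit-range notation maps directly onto shift-and-mask code. The C sketch below (the function name bit_range is hypothetical) extracts the inclusive range hi:lo from a 32-bit word; applied to 0xB0, the range 7:4 yields 1011b, or 11.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the inclusive bit range hi:lo from a 32-bit word,
   following the "7:4" convention (high-order bit written first). */
static uint32_t bit_range(uint32_t word, unsigned hi, unsigned lo)
{
    assert(hi < 32u && lo <= hi);
    unsigned width = hi - lo + 1u;
    /* Avoid the undefined 1u << 32 when the full word is requested. */
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}
```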
Terminology
In graphics applications, programs written for a GPU often are called shaders. In compute applications, similar programs usually are called kernels. This document defines a low-level language that can be used to write both shaders and kernels.
Related Documents
AMD, R600-Family Instruction Set Architecture, Sunnyvale, CA, 2008. This document includes the RV670 GPU instruction details.
ISO/IEC 9899:TC2, International Standard, Programming Languages: C.
Kernighan, Brian W., and Ritchie, Dennis M., The C Programming Language, Prentice-Hall, Inc., Upper Saddle River, NJ, 1978.
IEEE, 754-1985 IEEE Standard for Binary Floating-Point Arithmetic, 2003.
Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P., "Brook for GPUs: stream computing on graphics hardware," ACM Trans. Graph., vol. 23, no. 3, pp. 777-786, 2004.
Buck, Ian; Foley, Tim; Horn, Daniel; Sugerman, Jeremy; Hanrahan, Pat; Houston, Mike; Fatahalian, Kayvon. BrookGPU, at http://graphics.stanford.edu/projects/brookgpu/
Buck, Ian. Brook Spec v0.2. October 31, 2003, at http://merrimac.stanford.edu/brook/brookspec-05-20-03.pdf
OpenGL Programming Guide, at http://www.glprogramming.com/red/
Microsoft DirectX Reference Website, at http://msdn.microsoft.com/en-us/library/bb219740(VS.85).aspx
Microsoft Programming Guide for HLSL, at http://msdn2.microsoft.com/en-us/library/bb509635.aspx
GPGPU, at http://www.gpgpu.org, and Stanford BrookGPU discussion forum, at http://www.gpgpu.org/forums/
Chapter 1 Overview
This document defines the format and behavior of the AMD Intermediate Language (IL). The IL is an abstract representation of hardware vertex, pixel, and geometry shaders, as well as compute kernels, that can be taken as input by other modules implementing the IL. An IL compiler uses an IL shader or kernel, in conjunction with driver state information, to translate it into hardware instructions or into a software emulation layer.
unconditional
conditional (comparing two float values)
logical (comparing a single integer value with zero)
Boolean (comparing a Boolean register with true)
To support DX10, many operations required several similar IL opcodes. For example, there are three forms of break: unconditional, break on a Boolean, and break on a logical value. Other operations were required in both vector and scalar forms. For example, there are two rsq instructions: rsq corresponds to the IL scalar opcode, and rsq_vec corresponds to the DX10 vector form that computes the reciprocal square root of each component.

In DX9, 0 * any value was defined to be 0. DX10 changed this to more closely match IEEE arithmetic, which defines 0 * NaN = NaN. All float operations containing a multiply now take a flag that specifies the NaN behavior.
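The difference between the two multiply conventions can be sketched in C. The flag type and function below are hypothetical; they model only the behavior described above and are not the IL's actual flag or instruction encoding. With the DX9-style setting, a zero operand forces a zero result even when the other operand is NaN or infinity; with the IEEE-style setting, the ordinary multiply applies, so 0 * NaN and 0 * infinity both produce NaN.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical flag selecting the multiply convention. */
typedef enum {
    ZERO_TIMES_ANY_IS_ZERO, /* DX9-style behavior */
    IEEE_MULTIPLY           /* DX10 / IEEE-style behavior */
} mul_nan_mode;

static float mul_with_mode(float a, float b, mul_nan_mode mode)
{
    assert(mode == ZERO_TIMES_ANY_IS_ZERO || mode == IEEE_MULTIPLY);
    if (mode == ZERO_TIMES_ANY_IS_ZERO && (a == 0.0f || b == 0.0f))
        return 0.0f;  /* DX9: 0 * x is 0, even if x is NaN or infinity */
    return a * b;     /* IEEE: 0 * NaN and 0 * infinity are NaN */
}
```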
SR: Globally shared registers. Sharing between all wavefronts in a SIMD (column sharing on the SIMD). Persistent registers.
LDS: Local data share, read/write. These read/write registers support data sharing between all work-items in a group. Requires synchronization.
GDS: Global data share. These read/write registers support sharing between all work-items in all groups. Requires synchronization.
Memory: Read/write.
Constant buffers
Texture cache
The following chapter defines the format in which kernels written in the IL are passed to the compiler.

Chapter 2 Binary Stream Format
2.1 IL Stream
Clients pass kernels as a stream of 32-bit tokens organized as variable-length instruction packets. These tokens include information about the client language, the shader type, and the instruction packets that describe the operation of the kernel. Table 2.1 indicates the ordering of packets.

Table 2.1 IL Stream Instruction Packet Ordering
Instruction Packet   Description
1                    IL_Lang token. See Section 2.2.1, on page 2-2.
2                    IL_Version token. See Section 2.2.2, on page 2-2.
3                    IL_Opcode token describing the operation of the first instruction in the stream and the beginning of the first IL instruction packet.
...                  More IL instruction packets. See Chapter 3, Text Instruction Syntax.
n                    IL instruction packet for an END instruction. See page 7-21.
(n is the number of 32-bit tokens in the stream.)
Instruction packets begin with a special token, IL_Opcode. A packet contains all the information needed to perform the single instruction specified in this token. This information can include data about source and destination operands, destination or target labels, and additional data needed to perform the instruction. Various IL statements can be used to declare resources, samplers, or registers. Any declaration of an object must appear before all uses of the object; there is no requirement to group all declarations at the start of the program. Most IL statements and types can be used in any kind of shader. However, as noted below, some statements and types are restricted to specific kinds of shaders.
2.2.1 Language Token
This token indicates the type of client generating the IL. This token must be at the beginning of every IL stream passed to the compiler.
Table 2.2
Field Name    Bits    Description
client_type   7:0     Type of client generating the IL.
reserved      31:8
2.2.2 Version Token
This token specifies the version of IL used in this IL stream. It also specifies the type of kernel the IL stream represents (pixel or vertex).
Table 2.3
Field Name      Bits
minor_version
major_version
shader_type
multipass
realtime
reserved        31:26
IL Text combines the IL_Lang and IL_Version tokens into a single version instruction with one of the following forms:
il_ps_major_minor
il_vs_major_minor
il_cs_major_minor
il_gs_major_minor
IL Token Descriptions
Copyright 2011 by Advanced Micro Devices, Inc. All rights reserved.
2.2.3 Opcode Token
This token specifies the current operation and information required to perform the operation.
Table 2.4
Field Name             Bits    Description
code                   15:0    The current operation.
control                29:16   Opcode-specific control. Possible values depend on the value of code. Specifies further instruction behavior. This field must be zero for all instructions not using it.
sec_modifier_present   30      Specifies whether an opcode-specific token describing further instruction behavior follows the primary modifier token. 0 = an opcode-specific token does not follow; 1 = an opcode-specific token follows. This field must be zero for all instructions not using it.
pri_modifier_present   31      Specifies whether an opcode-specific token describing further instruction behavior follows this token. 0 = an opcode-specific token does not follow; 1 = an opcode-specific token follows. This field must be zero for all instructions not using it.
2.2.4 Destination Token
This token specifies the register to which the hardware passes the result of the current instruction and other information pertaining to this result. This token can only be issued after an IL_Opcode token has been issued as part of an instruction packet. By default, all components of the register specified are written unless the modifier_present field is 1 and an IL_Dst_Mod token follows. DX10 allows an additional kind of indexing: some temporary objects can be indexed by a register; thus, the IL now allows an additional modifier (register relative modifier). Two kinds of destinations can be indexed: IL_TEMP_ARRAY and IL_OUTPUT. The immediate_present field is used for indexed data types: itemp, cb, etc. It can be used if the data type uses absolute, reg-relative, loop, or addr relative addressing. See Section 2.2.6, Source Token, page 2-5, for examples.
Table 2.5
Field Name          Bits    Description
register_num
register_type
modifier_present
relative_address
dimension           25
immediate_present   26
reserved            30:27   Must be zero.
extended            31      0 or 1.
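2.2.5 Instruction Modifiers
Table 2.6
Modifier   Description                                                   Example
_x2        Shift scale left by 2 modifier (multiplies the result by 2).  add_x2 r0, r1, r2
_x4        Shift scale left by 4 modifier (multiplies the result by 4).  add_x4 r0, r1, r2
_x8        Shift scale left by 8 modifier (multiplies the result by 8).  add_x8 r0, r1, r2
_d2        Shift scale right by 2 modifier (divides the result by 2).    add_d2 r0, r1, r2
_d4        Shift scale right by 4 modifier (divides the result by 4).    add_d4 r0, r1, r2
_d8        Shift scale right by 8 modifier (divides the result by 8).    add_d8 r0, r1, r2
_sat       Clamp modifier.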
The modifiers are applied in the following order of precedence:
1. shift_scale
2. clamp
An example of all these operations performed on the x-component of a destination operand is:
dstCmpMod(clamp(shift_scale(dst.x)))

Table 2.7
Field Name      Bits    Description
component_x_r           See Section 6.18, ILModDstComponent, page 6-9.
component_y_g           See Section 6.18, ILModDstComponent, page 6-9.
component_z_b           See Section 6.18, ILModDstComponent, page 6-9.
component_w_a           See Section 6.18, ILModDstComponent, page 6-9.
clamp
shift_scale     12:9    See Section 6.26, ILShiftScale, page 6-19.
reserved        31:13
Destination scale modifiers can be applied to either float or double operations, but cannot be applied to integer or unsigned operations. The clamp modifier can be used only with float operands.
2.2.6 Source Token
This token specifies the register that the instruction uses as a source operand. This token can only be issued after an IL_Opcode token as part of an instruction packet. If an IL_Src_Mod token does not follow, then:
the first component is set to the x component, the second component is set to the y component, the third component is set to the z component, and the fourth component is set to the w component.
Starting with IL 2.0, some source tokens are allowed to use register relative indexing. Only one level of indexing is allowed. The type of register that has indexed sources must be one of the following: IL_REGTYPE_ITEMP, IL_REGTYPE_CONST_BUFF, or IL_REGTYPE_INPUT. Since the index must be a scalar value, a modifier field must be used to replicate a single component into all four slots. Table 2.8 lists and briefly describes the IL_Src source operands.

IL version 2.0 also allows a source token to refer to a literal defined in a dcl_literal statement. DX10 statements such as:
add r1, r2, float4 (1.0f, 2.0f, 3.0f, 4.0f)
can be translated into:
dcl_literal_float4, l1, 1.0f, 2.0f, 3.0f, 4.0f
add r1, r2, l1
Table 2.8
Field Name          Bits    Description
register_num
register_type
modifier_present
relative_address    24:23
dimension           25
immediate_present   26
reserved            30:27   Must be zero.
extended            31      0 or 1.
2.2.7 Source Modifier Token
If this token is not present, x, y, z, and w correspond to the first, second, third, and fourth components, respectively.

For floating-point arithmetic instructions, the negate modifier simply flips the sign of the number(s) in the source operand, including on INF values. Applying negate to NaN preserves NaN, although the particular NaN bit pattern that results is not defined. For double instructions, the negate modifier modifies only the upper half of the double source; it is ignored on the lower half. In the same way, abs modifies only the upper half of a double source (changing the sign), without any effect on the lower half. For integer instructions, the negate modifier takes the two's complement of the number(s) in the source operand.

For floating-point arithmetic instructions, the abs modifier simply forces the sign of the number(s) in the source operand to positive, including on INF values. Applying abs to NaN preserves NaN, although the NaN bit pattern that results is not defined.

The 1 swizzle inserts a floating-point 1.0f, even if the opcode is an integer operation. This can lead to unexpected results. For example, when evaluating the second source of iadd r1, r1, r1.1_neg(xyzw), implementations take a floating-point 1.0f and treat it as an integer. To negate an integer, use the INEGATE function.
The modifiers use the following precedence:
1. swizzle: rearranges and/or replicates components.
2. _invert: inverts components (1 - x).
3. _bias: biases components (x - 0.5).
4. _x2: multiplies components by 2.0. _bx2: signed scaling; combines the _bias and _x2 modifiers.
5. _sign: signs components: components < 0 become -1; components = 0 become 0; components > 0 become 1.
6. _divComp(type): performs division based on the divComp value; type is y, z, w, or unknown.
7. _abs: takes the absolute value of components.
8. neg(comp): provides per-component negate.
9. clamp: clamps the value.
An example of all of these operations performed on the x-component of a source operand is:
clamp(negate(abs(divComp(sign(x2(bias(invert(swizzle(srcs)))))))))
The modifiers bias, x2, divComp, and clamp cannot be used when the opcode of the instruction specifies an integer or logical operation. Other examples are:
mov r0, r1.zyx1
mov r0, r1.xxyy
add r0, r1_invert, r2
add r0, r1, r2_bias
add r0, r1, r2_x2
add r0, r1, r2_bx2
mov r0, r1_sign
texld_stage(0) r0, vT0_divcomp(y)
mov r0, r1_abs
mov r0, r1_neg(xw)

Table 2.9
Field Name    Bits
swizzle_x_r   2:0
negate_x_r    3
swizzle_y_g   6:4
negate_y_g    7
swizzle_z_b   10:8
negate_z_b    11
swizzle_w_a   14:12
negate_w_a    15
invert        16
bias          17
x2            18
sign          19
abs           20
divComp       23:21
clamp         24
reserved      31:25
Table 2.10 shows the relationship between the modifiers and types.

Table 2.10
Modifier: swizzle, _invert, _bias, _x2, _bx2, _sign, _divcomp(type), _abs, neg
2.2.8
Example 1: Source Operand: x5[6].y
IL_Src: register_num = 5, register_type = IL_REGTYPE_ITEMP (see Section 7.5, Declaration and Initialization Instructions, page 7-32), modifier_present = 1, relative_address = IL_ADDR_ABSOLUTE, dimension = 0, immediate_present = 1, extended = 0
IL_Src_Mod: swizzle_x_r = y, swizzle_y_g = y, swizzle_z_b = y, swizzle_w_a = y
IL_Literal: val = 6
Example 2: Source Operand: x5[r2.x+6].y
Here are the IL tokens for this:
IL_Src (x5): register_num = 5, register_type = IL_REGTYPE_ITEMP, modifier_present = 1, relative_address = IL_ADDR_REG_RELATIVE, dimension = 0, immediate_present = 1, extended = 0
IL_Src_Mod: swizzle_x_r = y, swizzle_y_g = y, swizzle_z_b = y, swizzle_w_a = y
IL_Src (r2): register_num = 2, register_type = IL_REGTYPE_TEMP, modifier_present = 1, relative_address = IL_ADDR_ABSOLUTE, dimension = 0, immediate_present = 0, extended = 0
IL_Literal: val = 6
Example 3: Source Operand: v[1][2] (all fields set to zero, unless otherwise stated)
IL_Src
Example 4: Source Operand: v[1][2].xyxx (all fields set to zero unless otherwise stated)
IL_Src: register_num = 1, register_type = IL_REGTYPE_VERTEX, dimension = 1, modifier_present = 1
IL_Src_Mod: swizzle_x_r = x, swizzle_y_g = y, swizzle_z_b = x, swizzle_w_a = x
IL_Src: register_num = 2, register_type = IL_REGTYPE_VERTEX
IL_Src (cb): register_num = 0, register_type = IL_REGTYPE_CONST_BUFF, modifier_present = 1, relative_address = IL_ADDR_REG_RELATIVE, dimension = 1, immediate_present = 1, extended = 0
IL_Src_Mod (.y): swizzle_x_r = y, swizzle_y_g = y, swizzle_z_b = y, swizzle_w_a = y
IL_Src (r6): register_num = 6, register_type = IL_REGTYPE_TEMP, modifier_present = 1, relative_address = IL_ADDR_ABSOLUTE, dimension = 0, immediate_present = 0, extended = 0
IL_Src_Mod (.w for r6): swizzle_x_r = w, swizzle_y_g = w, swizzle_z_b = w, swizzle_w_a = w
IL_Literal: val = 2
IL_Src: register_num = 0, register_type = 0, modifier_present = 0, relative_address = IL_ADDR_REG_RELATIVE, dimension = 0, immediate_present = 1, extended = 0
IL_Src (r2): register_num = 2, register_type = IL_REGTYPE_TEMP, modifier_present = 1, relative_address = IL_ADDR_ABSOLUTE, dimension = 0, immediate_present = 1, extended = 0
IL_Src_Mod (.x for r2): swizzle_x_r = x, swizzle_y_g = x, swizzle_z_b = x, swizzle_w_a = x
IL_Literal: val = 4
IL Text syntax is designed to closely match the IL specification, so that there is an almost complete one-to-one mapping. Below is a simple vertex and pixel shader pair written in IL Text syntax that renders green stripes.

il_vs
dclv_elem(0) v0 ; Declare position
dclv_elem(1) v1 ; Declare color
dclv_elem(2) v2 ; Declare texture coordinates
mmul_matrix(4x4) oPos, v0, c[0] ; Transform position to clip space
mov oPriColor0, v1 ; Export vertex color
mov oT0, v2 ; Export texture coordinates
end

il_ps
dclpi_x(1)_y(1)_z(1)_w(1) vPriColor0 ; Declare primary color import
dclpi_x(1)_y(1)_z(*)_w(*) vT0 ; Declare vs import texture coordinates
def c0, 0.5, 1, 0, 0
def c1, 0.0, 1.0, 0.0, 1.0 ; Green color
mod r0.x, vT0.x, c0.y ; x = mod( s, 1.0 )
ifc_relop(lt) r0.x, c0.x ; if ( x < 0.5 )
mov oC0, vPriColor0 ; Output surface color
else ; else
mov oC0.rgb1, c1 ; Output green color
endif
end
3.1 Version
The first two tokens in an IL binary stream are IL_Lang and IL_Version. IL Text syntax combines these into a single version instruction. The IL translator sets the language type to IL_LANG_VERSION and disables the language defaults. The IL_Version token has the following syntax:
il_ps_major_minor_mp_rt
il_vs_major_minor_mp_rt
AMD Intermediate Language Reference Guide
If the major and minor version are not specified, the IL translator inserts them based on its own version. Typically, the version information is omitted, unless a specific optimization made in the compiler is wanted. If a shader represents a multipass shader, append _mp to the statement. If this shader represents a realtime shader, append _rt to the statement.
3.2 Registers
Registers prefixed with the letter v are read-only (import) buffers; registers prefixed with the letter o are write-only (export) buffers. This section only lists the registers. See Chapter 5 for more information on register types.
Common Registers: b#, c#, i#, a#, aL, and r#
Vertex Shader: v#, oPos#, oPriColor#, oSecColor#, oT#, oInterp#, oFog, oSprite, vBaryCoord, vPrimIndex, vQuadIndex
Pixel Shader: vPriColor#, vSecColor#, vT#, vInterp#, vFog, vSprite, vFace, vWinCoord, oC#, oDepth
<instr>[_ctrl]
If the control specifier _ctrl is left off the instruction, the default action is performed. Control specifiers that have more than two modes of operation have the following syntax:
instruction_ctrl(value)
For these specifiers, the parentheses are mandatory, and the value specifies the mode of operation. Sections 3.4, 3.5, 3.6, and 3.7 describe the control specifiers in the IL spec. See Chapter 7, Instructions, to determine which specifiers go with which instructions.
Table 3.1 Destination Modifiers
Modifier: _x2, _x4, _x8, _d2, _d4, _d8, _sat
Description: Shift scale modifiers.
Example: add_x2 r0, r1, r2

Source Modifiers
Description: Rearrange and/or replicate components.
Example:
mov r0, r1.zyx1
mov r0, r1.xxyy

Write Mask

Table 3.2
Modifier: _abs, _bias, _bx2, _x2
3.7 Comments
Only single-line comments are supported, using a semicolon as the delimiter. Other commenting styles, such as /* */ matching pairs, are not supported. A comment can start at any position on the line. Example:
; The following instruction moves the contents of r1 into r0
mov r0, r1 ; mov instruction
}
}
The corresponding IL text example of a vertex and pixel shader pair that implements the same red checkerboard pattern is shown below. Note that this example purposely does not use the IL mod instruction in order to show the syntax for the call instruction.

il_vs
dclv_elem(0) v0 ; Declare position
dclv_elem(1) v1 ; Declare color
dclv_elem(2) v2 ; Declare texture coordinates
mmul_matrix(4x4) oPos, v0, c[0] ; Xform position to clip space
mov oPriColor0, v1
mov oT0, v2
end

il_ps
dclpi_x(1)_y(1)_z(1)_w(1) vPriColor0 ; Declare vs import color
dclpi_x(1)_y(1)_z(*)_w(*) vT0 ; Declare vs import texture coordinates
def c0, 0, 1, 0.5, 0
def c1, 2.0, 2.0, 0.0, 0.0
def c2, 1.0, 0.0, 0.0, 0.0

; Perform mod( s * sfreq, 1 )
mul r10.x, vT0.x, c1.x
mov r10.y, c0.y
call 1
mov r1.x, r11.x

; Perform mod( t * tfreq, 1 )
mul r10.x, vT0.y, c1.y ; a = t * tfreq
mov r10.y, c0.y ; b = 1
call 1 ; tmod = mod( a, b )
mov r2.x, r11.x

ifc_relop(lt) r1.x, c0.z ; if ( smod < 0.5 ) {
ifc_relop(lt) r2.x, c0.z ;   if ( tmod < 0.5 )
mov r0, vPriColor0 ;     Ci = Cs
else ;   else
mov r0, c2 ;     Ci = redcolor
endif
else ; } else {
ifc_relop(lt) r2.x, c0.z ;   if ( tmod < 0.5 )
mov r0, vPriColor0 ;     Ci = Cs
else ;   else
mov r0, c2 ;     Ci = redcolor
endif
endif ; }
mov oC0, r0.rgb1 ; ensure alpha is 1
endmain

; float c = mod( float a, float b )
; r10.x is argument a
; r10.y is argument b
; r11.x is return value c
; Reference: Ebert et al. Texturing and Modeling, pg. 28
; Translated from DX shader written by D. Mooney
func 1
rcp
mul
frc
sub
mul
sub
ifc_relop(lt) r10.x, c0.x
add r10.x, r10.x, r10.y
endif
mov r11.x, r10.x ; return a;
ret
end
Pixel shaders must write to a PCOLOR or DEPTH register (i.e., must export a color or depth value) unless the shader is multipass.
Vertex shaders must write to the POS register or to a VOUTPUT register declared as having usage IL_IMPORTUSAGE_POS (i.e., must export a position) unless the shader is multipass.
The shader must begin with a Language token immediately followed by a Version token.
There can be only one END instruction in a shader. The END instruction must be the last instruction in the shader (therefore, it cannot be within a flow-control block).
The ENDMAIN instruction must be used before any subroutines are defined. (ENDMAIN is required only if functions are defined.) All instructions after an ENDMAIN instruction, except for the END instruction, must be in a subroutine.
Loop relative addressing can be used only on PINPUT, VOUTPUT, VERTEX, INTERP, or TEXCOORD registers.
Base relative addressing can be used only on CONST_FLOAT registers.
A DCLARRAY instruction must be issued to use a range of INTERP or TEXCOORD registers with loop relative addressing.
An INITV or DCLV instruction must be issued on a VERTEX register before it is used as a source in any other instruction.
In a regular pixel shader, a DCLPI instruction must be issued on each INTERP, TEXCOORD, PRICOLOR, SECCOLOR, FOG, and WINCOORD register before that register is used.
A shader can use the same constant more than once in an instruction, and can also use more than one different constant in a single instruction. For example, r0 = c0 + c0 is legal; r0 = c0 + c1 is also legal.
A pixel shader can use an INTERP, TEXCOORD, PRICOLOR, SECCOLOR, or FOG register even if it was not written to in the vertex shader. The value of these registers depends on the DCLPI instruction.
It must be possible to nest CALL and CALLNZ up to four levels.
It must be possible to nest LOOP-ENDLOOP up to four levels.
It must be possible to nest IFNZ-ELSE-ENDIF and IFC-ELSE-ENDIF up to 24 levels.
A DCLPP instruction must be issued on each PINPUT register before it is used in a real-time pixel shader.
A real-time pixel shader follows all of the same rules as a normal pixel shader, with the following exceptions:
The DCLPI and DCLPIN instructions cannot be used. Instead, a DCLPP instruction must be used.
WINCOORD, SPRITECOORD, and FACE registers cannot be used.
A DCLPT instruction must be issued for each texture stage before TEXLD, TEXLDD, TEXLDB, TEXLDMS, TEXWEIGHT, PROJECT, or LOD is used on the stage.
At least one DCLPIN instruction must be issued on each PINPUT register before it is used.
At least one DCLVOUT instruction must be issued on each VOUTPUT register before it is used.
A vertex shader using a VOUTPUT register cannot link with a pixel shader using INTERP, TEXCOORD, PRICOLOR, SECCOLOR, or FOG registers. It can only link with a pixel shader using PINPUT registers. A pixel shader using a PINPUT register cannot link with a vertex shader using POS, SPRITE, INTERP, TEXCOORD, PRICOLOR, SECCOLOR, or FOG registers. It can only link with a vertex shader using VOUTPUT registers. If multiple usage-usageIndex pairs are packed in a single VOUTPUT register in a vertex shader, they must also be packed in a single PINPUT register in the linked pixel shader. If multiple usage-usageIndex pairs are packed in a single PINPUT register in a pixel shader, they must also be packed in a single VOUTPUT register in the linked vertex shader.
register or a VOUTPUT declared as having usage IL_IMPORTUSAGE_POS is not required (i.e., exporting a position is not required, and any exports to position are ignored). In a pixel shader, writes to the PCOLOR and DEPTH registers are ignored; they are also not required.
The DCLPI and DCLPIN instructions cannot be used. Instead, DCLPP must be used. WINCOORD, SPRITECOORD, and FACE registers cannot be used. INTERP, TEXCOORD, PRICOLOR, SECCOLOR, and FOG registers also cannot be used.
A vertex shader cannot be real-time. Therefore, if the shader_type field of the IL_Version token is set to IL_SHADER_VERTEX, the realtime bit must be set to 0.
Real-Time Shaders
This chapter describes the AMD IL valid register types. Tables 5.1 through 5.4 show the mappings of DX register types to IL types. Table 5.1
DX10 Temp r# Indexed temp x# Input register v# Output register o# Input resource t# Sampler s# Constant Buffer Literal f4 Literal I4 Literal f1 Literal i1 vPrim VertexID (semantic) in cb literal literal4 literalf1 literal1
dcl_indexableTemp dcl_indexableTemp dcl_input dcl_output dcl_resource -- by use --dcl_cb literal literal literal literal NA dcl_input_sv dcl_input_sv dcl_input_sv NA dcl_input_sv or dcl_output_sv dcl_input_sv or dcl_output_sv dcl_input_sv or dcl_output_sv dcl_output_sv dcl_output_sv dcl_input NA dcl_resource -- by use -dcl_cb literal literal literal literal NA NA dcl_input dcl_input dcl_input_sv dcl_input_sv dcl_input_sv dcl_input_sv dcl_input_sv dcl_input_sv
PrimitiveID (semantic) in InstanceID (semantic) in IsFrontFace semantic) in ClipDistance CullDistance Position RenderTargetArrayIndex ViewPortArrayIndex
Table 5.1
DX10
Table 5.2
DX10 Input primitive
Maxoutputvertexcount Topology
Table 5.3
DX9 (3.0)
V# in vertex shader Temps Constants Address register a0 Boolean constants Integer constants Loop counter Al Predicate register Sampler O# in vertex shader Face vPos V# in pixel shader Oc# oDepth
Table 5.4
DX9 (2.0 and Lower) V# in vertex shader Temps Constants Address register a0 Boolean constants Integer constants Loop counter Al oPos oFog oPts oD0/oD1 color oT# V0/v1 Color register Predicate register Sampler T# Oc# oDepth Window coord
Table 5.5
Indexing Mode index object_index Barycentric coord Primitive index Quad index
Table 5.6
Table 5.7
Name
IL Regtype_
Syntax
Read
Write
Notes
Addr Boolean
ADDR CONST_BOOL
A0 b#
4 1
Yes No
No No
No No
c# vINDEX v# I# o# vPrim
4 4 4 4 4 1
No No No No No
Yes No No No No
No No No No No To support DX10. Can be used in all shader types except vertex. To support DX11. Cannot be used in pixel shader. To support DX10.
Timer Vertex
INPUT VERTEX
Tmr V#
2 4
Yes Yes
No Dclv or initv
No No
No No
No No
Table 5.8
Register
IL_REGTYPE_ABSOLUTE_THREAD_ID
IL_REGTYPE_ABSOLUTE_THREAD_ID_FLAT
IL_REGTYPE_ADDR
IL_REGTYPE_BARYCENTRIC_COORD
IL_REGTYPE_CLIP
IL_REGTYPE_CONST_BOOL
IL_REGTYPE_CONST_BUFF
IL_REGTYPE_CONST_FLOAT
IL_REGTYPE_CONST_INT
IL_REGTYPE_DEPTH
IL_REGTYPE_DEPTH_GE
IL_REGTYPE_DEPTH_LE
IL_REGTYPE_DOMAINLOCATION
IL_REGTYPE_EDGEFLAG
IL_REGTYPE_FACE
IL_REGTYPE_FOG
IL_REGTYPE_GENERIC_MEM
IL_REGTYPE_GLOBAL
IL_REGTYPE_IMMED_CONST_BUFF
IL_REGTYPE_INDEX
IL_REGTYPE_INPUT
IL_REGTYPE_LINE_STIPPLE
IL_REGTYPE_INPUT_ARG
IL_REGTYPE_INPUT_COVERAGE_MASK
IL_REGTYPE_INPUTCP
IL_REGTYPE_INTERP
IL_REGTYPE_ITEMP
IL_REGTYPE_LITERAL
IL_REGTYPE_OBJECT_INDEX
IL_REGTYPE_OCP_ID
IL_REGTYPE_OMASK
IL_REGTYPE_OUTPUT
IL_REGTYPE_OUTPUT_ARG
IL_REGTYPE_OUTPUTCP
IL_REGTYPE_PATCHCONST
IL_REGTYPE_PCOLOR
IL_REGTYPE_PERSIST
IL_REGTYPE_PINPUT
IL_REGTYPE_POS
IL_REGTYPE_PRICOLOR
IL_REGTYPE_PRIMCOORD
IL_REGTYPE_PRIMITIVE_INDEX
IL_REGTYPE_PRIMTYPE
IL_REGTYPE_PS_OUT_FOG
IL_REGTYPE_QUAD_INDEX
IL_REGTYPE_SECCOLOR
IL_REGTYPE_SHADER_INSTANCE_ID
IL_REGTYPE_SHARED_TEMP
IL_REGTYPE_SPRITE
IL_REGTYPE_SPRITECOORD
IL_REGTYPE_STENCIL
IL_REGTYPE_TEMP
IL_REGTYPE_TEXCOORD
IL_REGTYPE_THIS
IL_REGTYPE_THREAD_GROUP_ID
IL_REGTYPE_THREAD_GROUP_ID_FLAT
IL_REGTYPE_THREAD_ID_IN_GROUP
IL_REGTYPE_THREAD_ID_IN_GROUP_FLAT
IL_REGTYPE_TIMER
IL_REGTYPE_VERTEX
IL_REGTYPE_VOUTPUT
IL_REGTYPE_VPRIM
IL_REGTYPE_WINCOORD
5.1 ABSOLUTE_THREAD_ID
Enum: IL_REGTYPE_ABSOLUTE_THREAD_ID Text Syntax: vAbsTid Components per Register: 3 Description: This read-only input register contains an absolute work-item ID. The ID is three-dimensional. The xyz components of the register can be used as an index or in integer operations. The w component is not valid and must not be used. This register is used only in compute shaders. Valid in R7XX GPUs and later. Example:
mov r2.x, vAbsTid.xyzx
mov g[vAbsTid.x], r2
5.2 ABSOLUTE_THREAD_ID_FLATTENED
Enum: IL_REGTYPE_ABSOLUTE_THREAD_ID_FLAT Text Syntax: vAbsTidFlat (also accepted as vaTid for backward compatibility) Components per Register: 1 Description: This read-only input register contains the flattened absolute work-item ID. Assuming the work-group size is (Dx, Dy, Dz), the flattened value is computed as vAbsTidFlat.x = vThreadGrpIdFlat.x*Dx*Dy*Dz + vTidInGrpFlat.x. This register can be used as an index or in integer operations. Only the x component has a meaningful value; the y, z, and w components replicate the value of the x component. This register is used only in compute shaders. Valid in R7XX GPUs and later. Example:
mov g[vAbsTidFlat.x], r2
5.3 BARYCENTRIC_COORD
Enum: IL_REGTYPE_BARYCENTRIC_COORD Text Syntax: vBaryCoord Components per Register: 4
Description: This register type is valid only for higher-order surface (HOS) rendering. For non-HOS rendering, this register type is invalid and its contents are undefined. For HOS rendering, this is a vertex shader import for the barycentric coordinates of the current tessellated vertex. This read-only register cannot be used with relative addressing. It is an error to use this register in a pixel shader.
5.4 CONST_BUFF
Enum: IL_REGTYPE_CONST_BUFF Text Syntax: cb#[n] Components per Register: 4 Description: Read-only register with a maximum of 4096 elements.
5.5 DEPTH
Enum: IL_REGTYPE_DEPTH Text Syntax: oDepth Components per Register: 1 Description: Pixel shader export for depth data. This is a scalar register where the depth value is contained in the first component. This write-only register cannot be the source of an instruction. It is an error to use this register in a vertex shader. This register cannot be used with relative addressing. The second, third, and fourth components of this register are unused and undefined. DEPTH, DEPTH_LE, and DEPTH_GE are mutually exclusive; use at most one of these in a shader.
5.6 DEPTH_GE
Enum: IL_REGTYPE_DEPTH_GE Text Syntax: oDepthGE Components per Register: 1 Description: Pixel shader export for depth data, guaranteed to be greater than or equal to the rasterizer depth value. This is a scalar register where the depth value is contained in the first component. This write-only register cannot be the source of any instruction. It is an error to use this register in a vertex shader. This register cannot be used with relative addressing. The second, third, and fourth components of this register are unused and undefined. DEPTH, DEPTH_LE, and DEPTH_GE are mutually exclusive; use at most one of these in a shader. If rasterizer depth is not declared in the shader, its interpolation mode is set to sample if the shader runs at sample rate (the shader declares a sample index or sample attributes); otherwise, centroid interpolation mode is used. Valid in Evergreen GPUs and later.
5.7 DEPTH_LE
Enum: IL_REGTYPE_DEPTH_LE Text Syntax: oDepthLE Components per Register: 1 Description: Pixel shader export for depth data, guaranteed to be less than or equal to the rasterizer depth value. This scalar register's depth value is in the first component. This write-only register cannot be the source of any instruction. It is an error to use this register in a vertex shader. This register cannot be used with relative addressing. The second, third, and fourth components of this register are unused and undefined. DEPTH, DEPTH_LE, and DEPTH_GE are mutually exclusive; use at most one of these in a shader. If rasterizer depth is not declared in the shader, its interpolation mode is set to sample if the shader runs at sample rate (the shader declares a sample index or sample attributes); otherwise, centroid interpolation mode is used. Valid in Evergreen GPUs and later.
5.8 DOMAINLOCATION
Enum: IL_REGTYPE_DOMAINLOCATION Text Syntax: vDomain Components per Register: 4 Description: This read-only register is used in the domain shader as input only. Example:
mov r1, vDomain
5.9 EDGEFLAG
Enum: IL_REGTYPE_EDGEFLAG Text Syntax: oEdgeFlag Components per Register: 1 Description: This is the vertex shader export for an edge flag. The first channel contains the edge flag. This write-only register cannot be the source of any instruction. This register cannot be used with relative addressing. It is an error to use this register if a VOUTPUT register is used. It is an error to use this register in a pixel shader. The y, z, and w components of this register are undefined.
5.10 FACE
Enum: IL_REGTYPE_FACE Text Syntax: vFace Components per Register: 1 Description: Pixel shader import for primitive facing. The x component is negative if the pixel is the back-face of the primitive. The x component is positive if the pixel is the front-face of the primitive. Point and line primitives are always front-facing. Points and lines rendered as a result of polygons using point or line fill mode inherit the facing of the polygon. This read-only register cannot be the destination of an instruction. It is an error to use this register in a vertex shader. It is an error to use this register in a real-time pixel shader. This register cannot be used with relative addressing. The second, third, and fourth components of this register are undefined.
5.11 FOG
Enum: IL_REGTYPE_FOG Text Syntax (VS): oFog (write-only) Text Syntax (PS): vFog (read-only) Components per Register: 1 Description: Vertex shader export and pixel shader import for interpolated fog data. This is a scalar register where the value is contained in the first component. In a vertex shader, the second, third, and fourth components must be masked (cannot be written to). In a pixel shader, the second, third, and fourth components are undefined. Perspective-correct interpolation is performed on the values of this register when they are passed from the vertex shader to the pixel shader. A single shader can use a total of at most 16 INTERP, TEXCOORD, PRICOLOR, SECCOLOR, and FOG registers. A DCLPI instruction must be issued on a register of this type before it is used in any other instruction. This register cannot be used with relative addressing. It is an error to use a register of this type in a vertex shader if a VOUTPUT register is used; you can achieve similar functionality by using the DCLVOUT instruction on a VOUTPUT register and declaring its usage as IL_IMPORTUSAGE_FOG. It is an error to use a register of this type in a pixel shader if a PINPUT register is used; you can achieve similar functionality by using the DCLPIN instruction on a PINPUT register and declaring its usage as IL_IMPORTUSAGE_FOG. In a vertex shader, this is a write-only register; it cannot be the source of an instruction. In a pixel shader, this is a read-only register; it cannot be the destination of any instruction.
5.12 GENERIC_MEMORY
Enum: IL_REGTYPE_GENERIC_MEM Text Syntax: mem Components per Register: 4 Description: This register provides a mask or swizzle. It is used by instructions such as write_lds, which do not have a dst register but require a dst mask. Valid in Evergreen GPUs and later. Example:
write_lds mem.x_z_, r0.xyzw
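The mask in this example can be read position by position: each of the four slots either names its component or is `_`. The following minimal Python sketch models that reading, assuming the mask simply selects which source components are stored; `parse_write_mask` and `masked_store` are hypothetical illustration helpers, not part of any AMD API.

```python
# Hypothetical host-side model of a destination write mask such as the
# "x_z_" in "mem.x_z_". Each position either names its component or is "_".
COMPONENTS = "xyzw"


def parse_write_mask(mask: str) -> list:
    """Return four booleans; True means that component is written."""
    if len(mask) != 4:
        raise ValueError("a write mask has exactly four positions")
    flags = []
    for pos, ch in enumerate(mask):
        if ch == COMPONENTS[pos]:
            flags.append(True)
        elif ch == "_":
            flags.append(False)
        else:
            raise ValueError(f"unexpected mask character {ch!r} at position {pos}")
    return flags


def masked_store(old: list, src: list, mask: str) -> list:
    """Keep old values where the mask has '_', take src values elsewhere."""
    return [s if on else o for o, s, on in zip(old, src, parse_write_mask(mask))]


r0 = [1.0, 2.0, 3.0, 4.0]
print(masked_store([0.0, 0.0, 0.0, 0.0], r0, "x_z_"))  # [1.0, 0.0, 3.0, 0.0]
```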
5.13 GLOBAL
Enum: IL_REGTYPE_GLOBAL Text Syntax: g[address] Note that the address must be a single-component integer. Components per Register: 4 Description: This is a read/write register that can be used to address global memory in the text form of the shader. Global variables are indicated by a g[address]. Example:
add g[2].x, r4, g[4]
This example reads address 4, adds r4 to the contents of that address, and scatters the result to address 2. Each address corresponds to a 128-bit (four-dword) location. Most address locations are indexed: global g registers can be indexed using the temp (r) registers. For example:
add g[r5.x].x, r4, g[r6.y]
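The gather/add/scatter sequence above, together with the 128-bit slot size, can be modeled on the host. This is a minimal Python sketch assuming global memory is a flat array of four-component slots packed from byte offset 0; `byte_offset` and `add_to_global` are hypothetical names used only for illustration.

```python
# Hypothetical model of g[] addressing: global memory viewed as an array of
# 128-bit slots, each holding four 32-bit components (x, y, z, w).
SLOT_BYTES = 16  # 128 bits = four dwords


def byte_offset(address: int) -> int:
    """Byte offset of g[address], assuming slots are packed from offset 0."""
    return address * SLOT_BYTES


def add_to_global(g: list, dst: int, dst_mask: str, r: list, src: int) -> None:
    """Model 'add g[dst].<mask>, r, g[src]': gather g[src], add r, scatter to g[dst]."""
    result = [a + b for a, b in zip(r, g[src])]
    g[dst] = [new if comp in dst_mask else old
              for comp, new, old in zip("xyzw", result, g[dst])]


g = [[0.0] * 4 for _ in range(8)]
g[4] = [10.0, 20.0, 30.0, 40.0]
r4 = [1.0, 2.0, 3.0, 4.0]

add_to_global(g, 2, "x", r4, 4)              # add g[2].x, r4, g[4]
print(g[2])                                  # [11.0, 0.0, 0.0, 0.0]

r5 = [3, 0, 0, 0]                            # integer addresses held in temps
r6 = [0, 4, 0, 0]
add_to_global(g, r5[0], "x", r4, r6[1])      # add g[r5.x].x, r4, g[r6.y]
print(g[3])                                  # [11.0, 0.0, 0.0, 0.0]
print(byte_offset(2))                        # 32
```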
5.14 IMMED_CONST_BUFFER
Enum: IL_REGTYPE_IMMED_CONST_BUFF Text Syntax: icb[n] Components per Register: 4 Description: Read-only register. Used to access the immediate constant buffer. Used in the same way as cb[n] for accessing a constant buffer.
5.15 INDEX
Enum: IL_REGTYPE_INDEX Text Syntax: vIndex Components per Register: 4 Description: Vertex shader import for the index, from the index buffer, of the current vertex processed. The index is not guaranteed to be incremental. For non-HOS (high-order surface) rendering, the first component is the index of the current vertex processed. For HOS rendering (when HOS is enabled), the first, second, third, and fourth components represent the indices of the superprim vertices for the current tessellated vertex being processed. The superprim indices are output by the tessellation engine based on the relevant HOS state. This read-only register cannot be used with relative addressing. It is an error to use this register in a pixel shader.
5.16 INPUT
Enum: IL_REGTYPE_INPUT Text Syntax: v#[n] Components per Register: 4 Description: This read-only register is the formal parameter of a macro. The register can be used only within a macro definition. When a macro is called, the actual parameters are copied into the macro input registers. When the macro returns, the registers are restored.
5.17 INPUT_ARG
Enum: IL_REGTYPE_INPUT_ARG Text Syntax: in# Components per Register: 4 Description: This read-write register is the formal parameter of a macro. It can be used only within a macro definition. When a macro is called, the actual parameters are copied into the macro input registers. When the macro returns, the registers are restored.
5.18 INPUT_COVERAGE_MASK
Enum: IL_REGTYPE_INPUT_COVERAGE_MASK Text Syntax: vCoverageMask Components per Register: 1 Description: Pixel shader input coverage mask. This is a scalar register where the coverage mask is contained in the first component. It is an error to use this register outside of a pixel shader. This register cannot be used with relative addressing. The second, third, and fourth components of this register are unused and undefined. The input coverage mask is a bitfield in which bit i (counting from the LSB) is 1 if the current primitive covers sample i in the current pixel on the RenderTarget. Regardless of whether the pixel shader is configured to be invoked at pixel frequency or sample frequency, when an n-sample-per-pixel RenderTarget and/or Depth/Stencil buffer is bound at the Output Merger, the first n bits of the input coverage mask (from the LSB) indicate primitive coverage. The remaining bits are 0. The input coverage bitfield is not affected by depth/stencil tests, but it is ANDed with the SampleMask rasterizer state. If no samples are covered, as with helper pixels executed outside the bounds of a primitive to fill out 2x2 pixel stamps, the input coverage mask is 0. Supported on Evergreen GPUs and later.
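The bitfield rules above (bit i for sample i, only the low n bits used, ANDed with SampleMask, depth/stencil ignored) can be sketched as a small Python function. This is a hypothetical host-side model for illustration; `input_coverage_mask` and its parameters are not an AMD API.

```python
# Hypothetical model of the value a pixel shader would see in vCoverageMask.
# Depth/stencil results are deliberately ignored, as the text specifies.
def input_coverage_mask(covered_samples, num_samples: int, sample_mask: int) -> int:
    """covered_samples: indices of samples the primitive covers in this pixel.
    num_samples: n, the sample count of the bound RenderTarget/Depth-Stencil.
    sample_mask: the SampleMask rasterizer state."""
    mask = 0
    for i in covered_samples:
        if not 0 <= i < num_samples:
            raise ValueError(f"sample {i} is outside 0..{num_samples - 1}")
        mask |= 1 << i  # bit i (from the LSB) set means sample i is covered
    # Bits at or above n stay 0; the only adjustment is the SampleMask AND.
    return mask & sample_mask


print(bin(input_coverage_mask([0, 2], 4, 0xFFFFFFFF)))  # 0b101
print(bin(input_coverage_mask([0, 2], 4, 0b0001)))      # 0b1
print(input_coverage_mask([], 4, 0xFFFFFFFF))           # 0 (helper pixel)
```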
5.19 INPUTCP
Enum: IL_REGTYPE_INPUTCP Text Syntax: vicp[vertex#][attr#] Components per Register: 4 Description: Used in the hull shader and domain shader as input only. The vertex# is between 0 and 31. Valid in Evergreen GPUs and later. Example:
mov r1, vicp[5][2]
5.20 INTERP
Enum: IL_REGTYPE_INTERP Text Syntax (VS): oInterp# (write-only) Text Syntax (PS): vInterp# (read-only) Components per Register: 4 Description: General-purpose vertex shader export and pixel shader import for interpolated data. Perspective-correct interpolation is performed on the values of these registers when they are passed from the vertex shader to the pixel shader. A DCLPI instruction must be issued on this register type before it is used in any other instruction. It is an error to use a register of this type in a vertex shader if a VOUTPUT register is used. You can achieve similar functionality by using the DCLVOUT instruction on a VOUTPUT register and declaring its usage as IL_IMPORTUSAGE_GENERIC.
It is an error to use a register of this type in a pixel shader if a PINPUT register is used. You can achieve similar functionality by using the DCLPIN instruction on a PINPUT register and declaring its usage as IL_IMPORTUSAGE_GENERIC. This register can be used only with loop relative addressing. In a vertex shader, this write-only register cannot be the source of an instruction. In a pixel shader, this read-only register cannot be the destination of an instruction.
5.21 ITEMP
Enum: IL_REGTYPE_INDEXED_TEMP Text Syntax: x#[n] Components per Register: 4 Description: Read-write register. See DCL_INDEXED_TEMP_ARRAY (page 7-34) for how to declare indexed temps. x#[n] can be used in any ALU instruction.
5.22 LINE_STIPPLE
Enum: IL_REGTYPE_LINE_STIPPLE Text Syntax: vLineStipple Components per Register: 2 Description: Pixel shader input for the line stipple texture coordinate. SPI calculates the 32-bit line stipple texture coordinate and stores it in the position buffer. The x component holds the 32-bit texture coordinate; the y component holds the primitive type (POINT = 0, LINE = 1, TRI = 2). Supported on Evergreen and later GPUs.
5.23 LITERAL
Enum: IL_REGTYPE_LITERAL Text Syntax: l# where # is the literal register number.
Components per Register: 4 Description: This four-component, constant, typeless, read-only register can be used in place of a GPR. The format of this register can be integer, floating point, or a four-byte hex value. A given literal can be defined only once in a shader.
5.24 OBJECT_INDEX
Enum: IL_REGTYPE_OBJECT_INDEX Text Syntax: vObjIndex Components per Register: 1 Description: Vertex shader import for the ordered index of the current vertex processed (ordered vertex shader instance). Pixel shader import for the ordered index of the current pixel processed (ordered pixel shader instance). This value starts at 0 and is incremented for each successive pixel or vertex. The first component of this register contains the index of the current vertex (or pixel) processed. This read-only register cannot be used with relative addressing. It is an error to use the second, third, and fourth components of this register.
5.25 OCP_ID
Enum: IL_REGTYPE_OCP_ID Text Syntax: vOutputControlPointID0 (only 0 is allowed) Components per Register: 4 Description: Output Control Point ID used in the hull shader as input only. Read-only register.
5.26 OMASK
Enum: IL_REGTYPE_OMASK Text Syntax: oMask Components per Register: 1 Description: When the pixel shader runs at sample frequency, the coverage mask is ANDed with a mask that selects the sample currently being processed. As a result, sample N is always masked by bit N of oMask. This allows a shader to run at either sample frequency or pixel frequency with identical oMask behavior. The same rule applies to Alpha and to Coverage when the shader runs at sample frequency. This is a scalar register where the mask value is contained in the first component. Values assigned to oMask are treated as integers. This write-only register cannot be the source of an instruction and cannot be used with relative addressing. It is an error to use this register in a vertex shader. The second, third, and fourth components of this register are unused and undefined.
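The claim that oMask behaves identically at pixel and sample frequency can be checked with a small Python model. It assumes the shader writes the same oMask value in every invocation for a given pixel; both function names are hypothetical.

```python
# Hypothetical model of how oMask combines with coverage. Assumes the shader
# writes the same oMask value in every invocation for a given pixel.
def pixel_frequency_coverage(coverage: int, omask: int) -> int:
    """One invocation per pixel: every sample N is masked by bit N of oMask."""
    return coverage & omask


def sample_frequency_coverage(coverage: int, omask: int, sample: int) -> int:
    """One invocation per sample: oMask is also ANDed with the current sample's bit."""
    return coverage & omask & (1 << sample)


coverage, omask, num_samples = 0b1111, 0b0110, 4

combined = 0
for s in range(num_samples):
    combined |= sample_frequency_coverage(coverage, omask, s)

# Both execution modes leave the same set of samples covered.
print(bin(combined), bin(pixel_frequency_coverage(coverage, omask)))  # 0b110 0b110
```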
5.27 OUTPUT
Enum: IL_REGTYPE_OUTPUT Text Syntax: o#[n] Components per Register: 4 Description: This is an output from the shader. See DCL_OUTPUT (page 7-45) for more detail.
5.28 OUTPUT_ARG
Enum: IL_REGTYPE_OUTPUT_ARG Text Syntax: out# Components per Register: 4 Description: This write-only register is the formal output parameter of a macro. It can be used only within a macro definition. When a macro returns, the values in the macro output registers are copied to the actual arguments.
5.29 OUTPUT_CONTROL_POINT
Enum: IL_REGTYPE_OUTPUTCP Text Syntax: vocp[#][#] Components per Register: 1 Description: Output Control Point used in the hull shader as input only. Read-only register.
5.30 PATCH_CONST
Enum: IL_REGTYPE_PATCHCONST Text Syntax: vpc[id#] Components per Register: 4 Description: This is used in the domain shader as input only. The id# is between 0 and 31. Example:
mov r1, vpc[5]
5.31 PCOLOR
Enum: IL_REGTYPE_PCOLOR Text Syntax: oC# Components per Register: 4 Description: Pixel shader export for color data. The register number corresponds to the color buffer to which data is output. This write-only register cannot be the source of an instruction. It is an error to use this register in a vertex shader. This register cannot be used with relative addressing.
5.32 PERSIST
Enum: IL_REGTYPE_PERSIST Text Syntax: p[address] Components per Register: 4 Description: This read/write register addresses persistent memory.
5.33 PINPUT
Enum: IL_REGTYPE_PINPUT Text Syntax: vPixIn# Components per Register: 4 Number Per Shader: 16 Description: Pixel shader input data. It is an error to use this register in a vertex shader. This read-write register can be used with loop-relative addressing as a source only. A DCLPIN or DCLPP instruction must be issued on a register of this type before it is used. It is an error to use this register if an INTERP, TEXCOORD, PRICOLOR, SECCOLOR, or FOG register is used.
5.34 POS
Enum: IL_REGTYPE_POS Text Syntax: oPos Components per Register: 4 Description: Vertex shader export for position data. The position of the vertex in clip space. This write-only register cannot be the source of an instruction. It is an error to use this register if a VOUTPUT register is used. Similar functionality is provided by using the DCLVOUT instruction on a VOUTPUT register and declaring its usage as IL_IMPORTUSAGE_POS. This register cannot be used with relative addressing. It is an error to use this register in a pixel shader.
5.35 PRICOLOR
Enum: IL_REGTYPE_PRICOLOR Text Syntax (VS): oPriColor# (write-only) Text Syntax (PS): vPriColor# (read-only) Components per Register: 4 Description: Vertex shader export and pixel shader import for interpolated primary color data. By convention, register number 0 represents the front-facing color, while register number 1 represents the back-facing color. When flat shading is used, no interpolation is performed on the values of these registers when they are passed from the vertex shader to the pixel shader; instead, only the value of the provoking vertex is passed to the pixel shader. When smooth shading is used, perspective-correct interpolation is performed on the values. A DCLPI instruction must be issued on this register type before it is used in any other instruction. This register cannot be used with relative addressing. It is an error to use a register of this type in a vertex shader if a VOUTPUT register is used. You can achieve similar functionality by using the DCLVOUT instruction on a VOUTPUT register and declaring its usage as IL_IMPORTUSAGE_COLOR or IL_IMPORTUSAGE_BACKCOLOR. It is an error to use a register of this type in a pixel shader if a PINPUT register is used. You can achieve similar functionality by using the DCLPIN instruction on a PINPUT register and declaring its usage as IL_IMPORTUSAGE_COLOR or IL_IMPORTUSAGE_BACKCOLOR. In a vertex shader, these are write-only registers. In a pixel shader, these are read-only registers.
5.36 PRIMCOORD
Enum: IL_REGTYPE_PRIMCOORD Text Syntax: vPrimCoord Components per Register: 2 Description: This is a graphics-only feature. Pixel shader input for point-aa or line-aa texture coordinates. SPRITECOORD cannot be used in any shader that uses PRIMCOORD. The first and second components contain the pixel's S and T coordinate for the point/line primitive rendered. The values of this register are undefined if the primitive rendered is not a point or a line. The third and fourth components of this register are undefined. This read-only register cannot be the destination of an instruction. This register cannot be used with relative addressing. It is an error to use this register in a vertex or geometry shader.
5.37 PRIMITIVE_INDEX
Enum: IL_REGTYPE_PRIMITIVE_INDEX Text Syntax: vPrimIndex Components per Register: 4 Description: This read-only register type is valid only for HOS rendering. For normal, non-HOS rendering, this register type is invalid, and its contents are undefined. For HOS rendering, this is the incremental index of the current tessellation primitive generated by the tessellation engine for certain types of HOS rendering. This register cannot be used with relative addressing. It is an error to use this register in a pixel shader.
5.38 PRIMTYPE
Enum: IL_REGTYPE_PRIMTYPE Text Syntax: vPrimType Components per Register: 2 Description: This is a graphics-only feature. It is a pixel shader input for the primitive type. If the sign bit of the first component is 1, the primitive is a point; the values in the other bits are undefined. If the sign bit of the second component is 1, the primitive is a line; the values in the other bits are undefined. The third and fourth components of this register are undefined. This read-only register cannot be the destination of an instruction. This register cannot be used with relative addressing. It is an error to use this register in a vertex or geometry shader.
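Because only the sign bits of vPrimType are defined, a decoder should test the bit itself rather than compare against zero. The minimal Python sketch below assumes each component arrives as an IEEE-754 single-precision value; `sign_bit` and `decode_prim_type` are hypothetical helpers. Testing `value < 0` would misclassify -0.0, whose sign bit is also 1.

```python
import struct


# Hypothetical decoder for vPrimType, assuming each component is an
# IEEE-754 single-precision value. Only the sign bit is defined.
def sign_bit(value: float) -> int:
    """Return bit 31 of the single-precision encoding of value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits >> 31


def decode_prim_type(first: float, second: float) -> str:
    if sign_bit(first):
        return "point"
    if sign_bit(second):
        return "line"
    return "neither point nor line"


print(decode_prim_type(-0.0, 0.0))   # point
print(decode_prim_type(1.0, -2.5))   # line
print(decode_prim_type(1.0, 1.0))    # neither point nor line
```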
5.39 PSOUTFOG
Enum: IL_REGTYPE_PS_OUT_FOG Text Syntax: oPsFog# Components per Register: 1 Description: Pixel shader output for fog factor. The first component contains a fog factor. The second, third, and fourth components of this register are ignored. This write-only register cannot be used with relative addressing. It is an error to use this register in a vertex or geometry shader.
5.40 QUAD_INDEX
Enum: IL_REGTYPE_QUAD_INDEX Text Syntax: vQuadIndex Components per Register: 4 Description: This register type is valid only for HOS rendering. For normal, non-HOS rendering, this register type is invalid and its contents are undefined. For HOS rendering, the first component is a quad index generated by the tessellation engine for certain types of HOS rendering. This read-only register cannot be used with relative addressing. It is an error to use this register in a pixel shader.
5.41 SECCOLOR
Enum: IL_REGTYPE_SECCOLOR Text Syntax (VS): oSecColor# (write-only) Text Syntax (PS): vSecColor# (read-only) Components per Register: 4 Description: Vertex shader export and pixel shader import for interpolated secondary color data. By convention, register number 0 represents the front-facing color, while register number 1 represents the back-facing color. When flat shading is used (AS_SHADE_MODE is set to FLAT), no interpolation is performed on the values of these registers when they are passed from the vertex shader to the pixel shader; instead, only the value of the provoking vertex is passed to the pixel shader. When smooth shading is used (AS_SHADE_MODE is set to SMOOTH), perspective-correct interpolation is performed on the values. This register cannot be used with relative addressing. A DCLPI instruction must be issued on this register type before it is used in any other instruction. It is an error to use a register of this type in a vertex shader if a VOUTPUT register is used. You can achieve similar functionality by using the DCLVOUT instruction on a VOUTPUT register and declaring its usage as IL_IMPORTUSAGE_COLOR or IL_IMPORTUSAGE_BACKCOLOR. It is an error to use a register of this type in a pixel shader if a PINPUT register is used. You can achieve similar functionality by using the DCLPIN instruction on a PINPUT register and declaring its usage as IL_IMPORTUSAGE_COLOR or IL_IMPORTUSAGE_BACKCOLOR. In a vertex shader, these are write-only registers. In a pixel shader, these are read-only registers.
5.42 SHADER_INSTANCE_ID
Enum: IL_REGTYPE_SHADER_INSTANCE_ID Text Syntax: vInstanceID Components per Register: 1 Description: This read-only register is used by the geometry shader or hull shader as a system-generated input. Example:
mov r1, vInstanceID
5.43 SHARED_TEMP
Enum: IL_REGTYPE_SHARED_TEMP Text Syntax: sr# (where # is any shared register number) Components per Register: 4 Description: This read/write register is shared by all wavefronts running on a SIMD. Only absolute address mode is allowed; for example: sr2. It is used only in compute shaders. Operations on shared registers are guaranteed atomic only when the read and write occur in the same instruction. Supported on R7XX and Evergreen GPUs. Example:
5.44 SPRITE
Enum: IL_REGTYPE_SPRITE Text Syntax: oSprite Components per Register: 1 Description: Vertex shader export for point size. The first component contains the point size. This write-only register cannot be the source of an instruction. This register cannot be used with relative addressing. It is an error to use this register if a VOUTPUT register is used. You can achieve similar functionality by using the DCLVOUT instruction on a VOUTPUT register and declaring its usage as IL_IMPORTUSAGE_POINTSIZE. It is an error to use this register in a pixel shader. The second, third and fourth components of this register are undefined.
5.45 SPRITECOORD
Enum: IL_REGTYPE_SPRITECOORD Text Syntax: vSpriteCoord Components per Register: 2 Description: Pixel shader input for the sprite texture coordinate. The first and second components contain the pixel's S and T coordinates for the point primitive rendered. The values of this register are undefined if the primitive rendered is not a point. The third and fourth components of this register are undefined. This read-only register cannot be the destination of an instruction. This register cannot be used with relative addressing. It is an error to use this register in a vertex shader.
5.46 STENCIL
Enum: IL_REGTYPE_STENCIL Text Syntax: oSTENCIL Components per Register: 1 Description: This is a scalar register where the stencil value is contained in the first component. The second, third, and fourth components of this register are unused and undefined. Values assigned to Stencil are treated as integers. This write-only register cannot be the source of an instruction. It is an error to use this register in a vertex shader. This register cannot be used with relative addressing.
5.47 TEMP
Enum: IL_REGTYPE_TEMP Text Syntax: r# Components per Register: 4 Description: This is a simple, non-indexable, read-write temporary register.
5.48 TEXCOORD
Enum: IL_REGTYPE_TEXCOORD Text Syntax (VS): oT# (write-only) Text Syntax (PS): vT# (read-only) Components per Register: 4 Description: Vertex shader export and pixel shader import for interpolated texture coordinate data. Perspective-correct interpolation is performed on the values of these registers when they are passed from the vertex shader to the pixel shader. A DCLPI instruction must be issued on this register type before it is used in any other instruction. It is an error to use a register of this type in a vertex shader if a VOUTPUT register is used. You can achieve similar functionality by using the DCLVOUT instruction on a VOUTPUT register and declaring its usage as IL_IMPORTUSAGE_GENERIC. In a vertex shader, this write-only register cannot be the source of an instruction. In a vertex shader, this register can be used with loop-relative addressing as a destination only. It is an error to use a register of this type in a pixel shader if a PINPUT register is used. You can achieve similar functionality by using the DCLPIN instruction on a PINPUT register and declaring its usage as IL_IMPORTUSAGE_GENERIC. In a pixel shader, this is a read-only register. In a pixel shader, this register can be used with loop-relative addressing as a source only.
5.49 THIS
Enum: IL_REGTYPE_THIS Text Syntax: this Components per Register: 4 Description: This read-only register can be used only within the body of a virtual function/interface. The four values contain instance data for the function:
x: selects the constant buffer.
y: specifies the offset into the selected constant buffer where data for this instance begins.
z: contains the instance sample ID.
w: contains the instance texcoord.
This is always indexed, so this[r0.x] refers to the instance with number r0.x. Valid in Evergreen GPUs and later. Example:
mov r1.y, this[1].x
mov r1, this[r3.w + 2].z
5.50 THREAD_GROUP_ID
Enum: IL_REGTYPE_THREAD_GROUP_ID Text Syntax: vThreadGrpId Components per Register: 3 Description: This read-only input register contains the work-group ID, which is three-dimensional. The x, y, and z components of this register can be used as an index or in integer operations. The w component is not valid; do not use it. This register is used only in a compute shader. Valid in R7XX GPUs and later. Example:
mov r2, vThreadGrpId.xyzy
mov g[vThreadGrpId.x], r2
5.51 THREAD_GROUP_ID_FLATTENED
Enum: IL_REGTYPE_THREAD_GROUP_ID_FLAT Text Syntax: vThreadGrpIdFlat (also as vTGroupid for back-compatibility) Components per Register: 1 Description: This read-only input register contains the flattened work-group ID. It assumes the number of work-groups to be dispatched is (Dx, Dy, Dz). The flattened value is computed as: vThreadGrpIdFlat.x = vThreadGrpId.z*Dx*Dy + vThreadGrpId.x + vThreadGrpId.y*Dx
This register can be used as index or in integer operations. Only the x component has a meaningful value. The y, z, and w components replicate the value of the x component. This register is used only in a compute shader. Valid in R7XX GPUs and later. Example:
mov g[vThreadGrpIdFlat.x], r2
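The flattening formula above assigns each work-group a distinct value in 0 to Dx*Dy*Dz - 1. A minimal Python sketch of the same arithmetic follows; `flatten_group_id` is a hypothetical name, and its bounds check is an illustrative addition not stated in the text.

```python
# Sketch of the flattening formula for vThreadGrpIdFlat.x, given the dispatch
# size (Dx, Dy, Dz). The bounds check is an illustrative addition.
def flatten_group_id(gid, dispatch):
    x, y, z = gid
    dx, dy, dz = dispatch
    if not (0 <= x < dx and 0 <= y < dy and 0 <= z < dz):
        raise ValueError("work-group ID outside the dispatch")
    return z * dx * dy + x + y * dx


dispatch = (4, 3, 2)
print(flatten_group_id((0, 0, 0), dispatch))  # 0
print(flatten_group_id((3, 2, 1), dispatch))  # 23

# Every work-group gets a distinct flat ID in 0 .. Dx*Dy*Dz - 1.
ids = sorted(flatten_group_id((x, y, z), dispatch)
             for z in range(2) for y in range(3) for x in range(4))
print(ids == list(range(24)))  # True
```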
5.52 THREAD_ID_IN_GROUP
Enum: IL_REGTYPE_THREAD_ID_IN_GROUP Text Syntax: vTidInGrp Components per Register: 3 Description: This read-only input register contains the work-item ID within a work-group. The ID is three-dimensional. The x, y, and z components of this register can be used as an index or in integer operations. The w component is not valid and must not be used. This register is used only in a compute shader. Example:
mov r1, vTidInGrp.xyzz
mov g[vTidInGrp.x], r1
5.53 THREAD_ID_IN_GROUP_FLATTENED
Enum: IL_REGTYPE_THREAD_ID_IN_GROUP_FLAT Text Syntax: vTidInGrpFlat (also as vTid for back-compatibility) Components per Register: 1 Description: This read-only input register contains the flattened work-item ID within a work-group. It assumes the work-group size is (Dx, Dy, Dz). The flattened value is computed as: vTidInGrpFlat.x = vTidInGrp.z*Dx*Dy + vTidInGrp.x + vTidInGrp.y*Dx
This register can be used as an index or in integer operations. Only the x component has a meaningful value. The y, z, and w components replicate the value of the x component. It is used only in a compute shader. Valid in R7XX GPUs and later. Example:
mov g[vTidInGrpFlat.x], r2
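The flattening is invertible, so the three-dimensional ID can be recovered from vTidInGrpFlat.x with integer division and remainder. This Python sketch derives that inverse from the formula above; `flatten_tid` and `unflatten_tid` are hypothetical helpers for illustration, not IL instructions.

```python
# Sketch of the vTidInGrpFlat formula and its inverse, given the work-group
# size (Dx, Dy, Dz). Both helper names are hypothetical.
def flatten_tid(tid, size):
    x, y, z = tid
    dx, dy, _ = size
    return z * dx * dy + x + y * dx


def unflatten_tid(flat, size):
    dx, dy, dz = size
    if not 0 <= flat < dx * dy * dz:
        raise ValueError("flat ID outside the work-group")
    return (flat % dx, (flat // dx) % dy, flat // (dx * dy))


size = (8, 4, 2)
print(unflatten_tid(45, size))               # (5, 1, 1)
print(flatten_tid(unflatten_tid(45, size), size))  # 45
```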
5.54 TIMER
Enum: IL_REGTYPE_TIMER Text Syntax: Tmr Components per Register: 2 Description: The current value of the cycle timer. The time is an absolute cycle count and is incremented even when the shader instance is not active. The result is a 64-bit unsigned integer value returned as Timer.xy. This read-only register is available to any type of shader, not just compute shaders. The third and fourth components of this register are undefined. The timer can appear only as the source of a move instruction. No source modifiers are allowed on this input. The counter is an implementation-dependent measure of cycles in the GPU engine. A single reading of the cycle counter is meaningless, but any shader invocation can poll the counter value any number of times. Computing a delta from cycle counter readings within a shader invocation is meaningful. Computing a delta from cycle counter readings across separate shader invocations is not meaningful on all hardware. Because execution of a shader can be interrupted by wavefront switching, delta measurements can be arbitrarily larger than the number of cycles spent executing instructions in a given work-item. There is no supported way to find out the frequency of the counter, and no way to correlate this shader-internal counter with external timers such as asynchronous time queries. If the GPU speed changes (for power saving), there is no way to know that this happened or what effect it had on cycle measurements. The compiler treats reads of the cycle counter as memory barriers. In addition, instructions cannot be moved across a counter read, and counter reads cannot be merged. The x value of the timer holds the low 32 bits (LSBs) of the counter; the y value holds the upper bits. The counter wraps on overflow. There is only one timer, so tmr4 is an error. Valid for Evergreen GPUs and later.
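Since x holds the low 32 bits, y the upper bits, and the counter wraps, a delta between two readings from the same invocation should be taken modulo 2^64. The Python sketch below shows that arithmetic on the host, assuming the readings have been copied out as integers; `timer_value` and `cycle_delta` are hypothetical helpers. As the text notes, the result can still include time spent switched out.

```python
# Hypothetical host-side helpers for Tmr readings: x holds the low 32 bits and
# y the upper bits of a 64-bit counter that wraps on overflow.
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def timer_value(x: int, y: int) -> int:
    """Combine the x (low) and y (high) dwords into one 64-bit count."""
    return ((y & MASK32) << 32) | (x & MASK32)


def cycle_delta(start: int, end: int) -> int:
    """Cycles between two readings from the same invocation, modulo 2**64,
    so a single wrap between the readings is handled correctly."""
    return (end - start) & MASK64


start = timer_value(0xFFFFFFFB, 0xFFFFFFFF)  # 5 cycles before wrapping
end = timer_value(5, 0)                       # 5 cycles after wrapping
print(cycle_delta(start, end))                # 10
```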
5.55 VERTEX
Enum: IL_REGTYPE_VERTEX Text Syntax: v# Components per Register: 4 Description: Read-only register. Input to a vertex shader that typically is generated in a previous phase and passed to the current phase. It is most frequently passed as values in the .xy channels, although all channels are available.
5.56 VOUTPUT
Enum: IL_REGTYPE_VOUTPUT Text Syntax: oVtxOut# Components per Register: 4 Number Per Shader: 18 Description: Vertex shader data. This write-only register cannot be the source of an instruction. It is an error to use this register in a pixel shader. This register can be used with loop-relative addressing as a destination only. A DCLVOUT instruction must be issued on a register of this type before it is used. It is an error to use this register if a POS, SPRITE, INTERP, TEXCOORD, PRICOLOR, SECCOLOR, or FOG register is used.
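The exclusivity rule above, combined with the 16-register limit stated for the FOG register type, can be expressed as a small declaration checker. This Python sketch is hypothetical, and reading "Number Per Shader: 18" as a hard cap on VOUTPUT registers is an assumption made for illustration.

```python
# Hypothetical declaration checker for a vertex shader, encoding two rules:
# VOUTPUT excludes the legacy export types, and at most 16 INTERP/TEXCOORD/
# PRICOLOR/SECCOLOR/FOG registers may be used in one shader.
# Treating "Number Per Shader: 18" as a cap on VOUTPUT is an assumption.
LEGACY_VS_EXPORTS = {"POS", "SPRITE", "INTERP", "TEXCOORD", "PRICOLOR", "SECCOLOR", "FOG"}
INTERPOLATED_TYPES = {"INTERP", "TEXCOORD", "PRICOLOR", "SECCOLOR", "FOG"}
MAX_VOUTPUT = 18
MAX_INTERPOLATED = 16


def check_vs_registers(used):
    """used: iterable of (register_type, register_number) pairs."""
    used = set(used)
    types = {t for t, _ in used}
    errors = []
    if "VOUTPUT" in types:
        clash = sorted(types & LEGACY_VS_EXPORTS)
        if clash:
            errors.append("VOUTPUT cannot be combined with " + ", ".join(clash))
        if sum(1 for t, _ in used if t == "VOUTPUT") > MAX_VOUTPUT:
            errors.append(f"more than {MAX_VOUTPUT} VOUTPUT registers")
    if sum(1 for t, _ in used if t in INTERPOLATED_TYPES) > MAX_INTERPOLATED:
        errors.append(f"more than {MAX_INTERPOLATED} interpolated registers")
    return errors


print(check_vs_registers([("VOUTPUT", 0), ("VOUTPUT", 1)]))  # []
print(check_vs_registers([("VOUTPUT", 0), ("POS", 0)]))
# ['VOUTPUT cannot be combined with POS']
```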
5.57 VPRIM
Enum: IL_REGTYPE_VPRIM Text Syntax: vPrim Components per Register: 4 Description: This is a scalar that can be applied to each interior primitive in a geometry shader.
5.58 vWINCOORD
Enum: IL_REGTYPE_WINCOORD Text Syntax: vWinCoord.xy Components per Register: 2 Description: Pixel shader import for screen position data. The first and second components are the X and Y position of the pixel in the domain of execution. The third component is the Z coordinate of the pixel in window space. The fourth component is W. A DCLPI instruction must be issued on this register type before it is used in any other instruction. The DCLPI instruction specifies whether the X and Y coordinates are relative to the lower-left or upper-left corner of the window and whether they represent the center or upper-left corner of the pixel. This read-only register cannot be the destination of an instruction. It is an error to use this register in a vertex shader. This register cannot be used with relative addressing.
This chapter lists and briefly describes the AMD IL enumerated types.
6.1 ILAddressing
See IL_Dst (page 3) and IL_Src (page 5) for more information.
Table 6.1
Enumeration
IL_ADDR_ABSOLUTE (no = 0)
IL_ADDR_REG_RELATIVE
IL_ADDR_RELATIVE
6.2 ILAnisoFilterMode
See the TEXLD (page 103), TEXLDB (page 106), and TEXLDD (page 110) instructions for more information.
Table 6.2
Enumeration
IL_ANISOFILTER_DISABLED
IL_ANISOFILTER_MAX_1_TO_1
IL_ANISOFILTER_MAX_16_TO_1
IL_ANISOFILTER_MAX_2_TO_1
IL_ANISOFILTER_MAX_4_TO_1
IL_ANISOFILTER_MAX_8_TO_1
IL_ANISOFILTER_UNKNOWN
6.3 ILCmpValue
See the CMP instruction (page 155) for usage.
Table 6.3
Enumeration
IL_CMPVAL_0_0
IL_CMPVAL_0_5
IL_CMPVAL_1_0
IL_CMPVAL_NEG_0_5
IL_CMPVAL_NEG_1_0
6.4 ILComponentSelect
See Section 2.2.7, Source Modifier Token, page 2-7 for usage. IL Text details for component selection can be found in Chapter 3, Text Instruction Syntax.
Table 6.4
Enumeration
IL_COMPSEL_0
IL_COMPSEL_1
IL_COMPSEL_W_A
IL_COMPSEL_X_R
IL_COMPSEL_Y_G
IL_COMPSEL_Z_B
6.5 ILDefaultVal
See the DCLDEF instruction (page 53) for usage.
Table 6.5 ILDefaultVal Enumeration Types
In the text syntax, <comp> is x, y, z, or w.
IL_DEFVAL_0
  Text Syntax: _<comp>(0)
  Description: Indicates that the default value for this component of the register type is 0.0.
IL_DEFVAL_1
  Text Syntax: _<comp>(1)
  Description: Indicates that the default value for this component of the register type is 1.0.
IL_DEFVAL_NONE
  Text Syntax: _<comp>(*)
  Description: Indicates that there is no default value for this component of the register type.
Example: dcldef_z(*)_w(*)
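The per-component suffixes in Table 6.5 are regular enough to parse mechanically. The Python sketch below maps each suffix of an opcode token such as dcldef_z(*)_w(*) to its ILDefaultVal name. It is hypothetical: it assumes the input is only the opcode token (no operands), and components that are not listed are left out of the result rather than given an assumed default.

```python
import re

# Hypothetical parser for the per-component suffixes of a DCLDEF opcode token,
# e.g. "dcldef_z(*)_w(*)". Unlisted components are simply absent.
TOKEN_TO_ENUM = {"0": "IL_DEFVAL_0", "1": "IL_DEFVAL_1", "*": "IL_DEFVAL_NONE"}
SUFFIX = re.compile(r"_([xyzw])\(([01*])\)")


def parse_dcldef(text: str) -> dict:
    if not text.startswith("dcldef"):
        raise ValueError("expected a dcldef opcode token")
    rest = text[len("dcldef"):]
    if SUFFIX.sub("", rest):  # anything left over is not a valid suffix
        raise ValueError(f"unrecognized suffix text in {text!r}")
    return {comp: TOKEN_TO_ENUM[tok] for comp, tok in SUFFIX.findall(rest)}


print(parse_dcldef("dcldef_z(*)_w(*)"))
# {'z': 'IL_DEFVAL_NONE', 'w': 'IL_DEFVAL_NONE'}
```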
6.6 ILDivComp
See Section 3.6, Source Modifiers, page 3-3 for usage.
Table 6.6
Enumeration            Text Syntax
IL_DIVCOMP_NONE
IL_DIVCOMP_UNKNOWN
IL_DIVCOMP_W           _divcomp(w)
IL_DIVCOMP_Y           _divcomp(y)
IL_DIVCOMP_Z           _divcomp(z)
6.7 ILElementFormat
Table 6.7
Enumeration
IL_ELEMENTFORMAT_FLOAT
IL_ELEMENTFORMAT_MIXED
IL_ELEMENTFORMAT_SINT
IL_ELEMENTFORMAT_SNORM
IL_ELEMENTFORMAT_SRGB
IL_ELEMENTFORMAT_UINT
IL_ELEMENTFORMAT_UNKNOWN
IL_ELEMENTFORMAT_UNORM
6.8 ILFirstBitType
Table 6.8
Enumeration
IL_FIRSTBIT_TYPE_LOW_UINT
IL_FIRSTBIT_TYPE_HIGH_UINT
IL_FIRSTBIT_TYPE_HIGH_INT
6.9 ILImportComponent
See the DCLPI (page 54), DCLPIN (page 56), and DCLVOUT (page 62) instructions for usage.
Table 6.9 ILImportComponent Enumeration Types
In the text syntax, <comp> is x, y, z, or w.
IL_IMPORTSEL_DEFAULT0
  Text Syntax: _<comp>(0)
  Description: This component is enabled and can be used. In a vertex shader, if this register or this component of this register is not written to, the component defaults to 0.0 when used as a source or when the shader terminates. In a pixel shader, if this register or this component of this register is not exported in the vertex shader, the component is set to 0.0. If used with the DCLVOUT instruction in the vertex shader, this component defaults to 0.0 if it is not written to. The component is considered to be exported in the vertex shader in this case; thus, any component in a pixel shader mapped to this component is set to 0.0, regardless of its default value set by the DCLPIN instruction.
IL_IMPORTSEL_DEFAULT1
  Text Syntax: _<comp>(1)
  Description: This component is enabled and can be used. In a vertex shader, if this register or this component of this register is not written to, the component defaults to 1.0 when used as a source or when the shader terminates. In a pixel shader, if this register or this component of this register is not exported in the vertex shader, the component is set to 1.0. If used with the DCLVOUT instruction in the vertex shader, this component defaults to 1.0 if it is not written to. The component is considered to be exported in the vertex shader in this case; thus, any component in a pixel shader mapped to this component is set to 1.0, regardless of its default value set by the DCLPIN instruction.
IL_IMPORTSEL_UNDEFINED
  Text Syntax: _<comp>(*)
  Description: This component is enabled and can be used. If this register or this component of the register is not exported in the vertex shader, the value of the component is undefined (the value of the component does not matter).
IL_IMPORTSEL_UNUSED
  Text Syntax: _<comp>(-)
  Description: This component is disabled and cannot be used in the shader. It is an error to reference the component in the shader.
Example: dclpi_z(-)_w(-)
6.10 ILImportUsage
See the DCLVOUT (page 62) and DCLPIN (page 56) instructions for more information. Table 6.10
Enumeration
IL_IMPORTUSAGE_BACKCOLOR    _usage(backcolor)
IL_IMPORTUSAGE_COLOR    _usage(color)
When used to declare a vertex shader output, this usage indicates the register contains a color value to be interpolated across the primitive and passed to the pixel shader once the shader terminates. The processed value can be read from the PINPUT register declared with this usage and the matching usageIndex. When used to declare a pixel shader input, this usage indicates the register contains a color value that has been interpolated across the primitive. The value originates from the VOUTPUT register in the vertex shader declared with this usage and matching usageIndex. Hardware can use lower-precision interpolators for colors.
IL_IMPORTUSAGE_DENSITY_TESSFACTOR    _usage(density_tessfactor)
Can be used only in a hull shader to declare that an output is a density tessfactor.
IL_IMPORTUSAGE_DETAIL_TESSFACTOR    _usage(detail_tessfactor)
Can be used only in a hull shader to declare that an output is a detail tessfactor.
IL_IMPORTUSAGE_EDGE_TESSFACTOR    _usage(edge_tessfactor)
Can be used only in a hull shader to declare that an output is an edge tessfactor.
IL_IMPORTUSAGE_FOG
IL_IMPORTUSAGE_GENERIC    _usage(generic)
When used to declare a vertex shader output, this usage indicates the register contains a generic value to be interpolated across the primitive and passed to the pixel shader once the shader terminates. The processed value can be read from the PINPUT register declared with this usage and the matching usageIndex. When used to declare a pixel shader input, this usage indicates the register contains a generic value that has been interpolated across the primitive. The value originates from the VOUTPUT register in the vertex shader declared with this usage and matching usageIndex.
IL_IMPORTUSAGE_INSIDE_TESSFACTOR    _usage(inside_tessfactor)
Can be used only in a hull shader to declare that an output is an inside tessfactor.
IL_IMPORTUSAGE_POINTSIZE    _usage(pointsize)
When used to declare a vertex shader output, this usage indicates that the x component of the register contains the vertex point size when the shader terminates. usageIndex must be zero when this usage is set. Only VOUTPUT register 1 can have this usage. This usage cannot be used to declare a pixel shader input.
IL_IMPORTUSAGE_POS    _usage(pos)
When used to declare a vertex shader output, this usage indicates that the register contains the vertex position when the shader terminates. usageIndex must be zero when this usage is set. Only VOUTPUT register 0 can have this usage. This usage cannot be used to declare a pixel shader input.
IL_IMPORTUSAGE_WINCOORD    _usage(wincoord)
Can be used only in a pixel shader. The x, y, z, and w values correspond to the screen position.
6.11 ILInterpMode
Table 6.11
Enumeration
IL_INTERPMODE_CONSTANT
IL_INTERPMODE_LINEAR
IL_INTERPMODE_LINEAR_CENTROID
IL_INTERPMODE_LINEAR_NOPERSPECTIVE
IL_INTERPMODE_LINEAR_NOPERSPECTIVE_SAMPLE
IL_INTERPMODE_LINEAR_SAMPLE
IL_INTERPMODE_NOTUSED = 0
IL_INTERPMODE_LINEAR_NOPERSPECTIVE_CENTROID    _interp(noper_centroid)
6.12 ILLanguageType
Table 6.12
Enumeration
IL_LANG_DX10_GS
IL_LANG_DX10_PS
IL_LANG_DX10_VS
IL_LANG_DX11_CS
IL_LANG_DX11_DS
IL_LANG_DX11_GS
IL_LANG_DX11_HS
IL_LANG_DX11_PS
IL_LANG_DX11_VS
IL_LANG_DX8_PS
IL_LANG_DX8_VS
IL_LANG_DX9_PS
IL_LANG_DX9_VS
IL_LANG_GENERIC
IL_LANG_OPENGL
6.13 ILLdsSharingMode
Table 6.13
Enumeration
IL_LDS_SHARING_MODE_RELATIVE
IL_LDS_SHARING_MODE_ABSOLUTE
6.14 ILLoadStoreDataSize
Table 6.14
Enumeration    Description
IL_LOAD_STORE_DATA_SIZE_DWORD    Dword, 32 bits.
IL_LOAD_STORE_DATA_SIZE_SHORT    Short, 16 bits.
IL_LOAD_STORE_DATA_SIZE_BYTE    Byte, 8 bits.
6.15 ILLogicOp
Table 6.15
Enumeration
IL_LOG_EQ
IL_LOG_NE
6.16 ILMatrix
Table 6.16
Enumeration
IL_MATRIX_3X2
IL_MATRIX_3X3
IL_MATRIX_3X4
IL_MATRIX_4X3
IL_MATRIX_4X4
6.17 ILMipFilterMode
See the TEXLD (page 103), TEXLDB (page 106), and TEXLDD (page 110) instructions for more information. Table 6.17
Enumeration
IL_MIPFILTER_BASE
IL_MIPFILTER_LINEAR
IL_MIPFILTER_POINT
IL_MIPFILTER_UNKNOWN
6.18 ILModDstComponent
See Section 3.4, Destination Modifiers, page 3-2, for usage. For the IL text syntax of the write mask, see Section 3.5, Write Mask, page 3-3. Table 6.18
Enumeration
IL_MODCOMP_0
IL_MODCOMP_1
IL_MODCOMP_NOWRITE
IL_MODCOMP_WRITE
6.19 ILNoiseType
See the NOISE instruction (page 191) for more details. Table 6.19 ILNoiseType Enumeration Types
Enumeration    Text Syntax    Description
IL_NOISETYPE_PERLIN1D    _type(perlin1D)    Compute 1D Perlin noise function.
IL_NOISETYPE_PERLIN2D    _type(perlin2D)    Compute 2D Perlin noise function.
IL_NOISETYPE_PERLIN3D    _type(perlin3D)    Compute 3D Perlin noise function.
IL_NOISETYPE_PERLIN4D    _type(perlin4D)    Compute 4D Perlin noise function.
6.20 ILOpcode
Table 6.20
Enumeration
IL_OP_ABS
IL_DCL_CONST_BUFFER
IL_DCL_INDEXED_TEMP_ARRAY
IL_DCL_INPUT
IL_DCL_INPUT_PRIMITIVE
IL_DCL_LITERAL
IL_DCL_MAX_OUTPUT_VERTEX_COUNT    Used to declare the maximum number of vertices that will be emitted by a shader.
IL_DCL_ODEPTH    Used to declare that the pixel shader intends to write to its scalar output oDepth register.
IL_DCL_OUTPUT    Declares an output register.
IL_DCL_OUTPUT_TOPOLOGY    Used to declare the output topology of a primitive.
IL_DCL_PERSIST    Used to declare the amount of persistent storage used by a shader.
IL_DCL_RESOURCE    Declares an input buffer.
IL_OP_AND
IL_OP_ASIN
IL_OP_ATAN
IL_OP_BREAK
IL_OP_BREAK_LOGICALNZ
IL_OP_BREAK_LOGICALZ
IL_OP_BREAKC
IL_OP_CALL
IL_OP_CALL_LOGICALNZ
IL_OP_CALL_LOGICALZ
IL_OP_CALLNZ
IL_OP_CASE
IL_OP_CLAMP
IL_OP_CLG
IL_OP_CMOV
IL_OP_CMOV_LOGICAL
IL_OP_CMP
IL_OP_D_MUL
IL_OP_D_NE
IL_OP_DCLARRAY
IL_OP_DCLDEF
IL_OP_DCLPI
IL_OP_DCLPIN
IL_OP_DCLPP
IL_OP_DCLPT
IL_OP_DCLV
IL_OP_DCLVOUT
IL_OP_DEF
IL_OP_DEFAULT
IL_OP_DEFB
IL_OP_DET
IL_OP_DISCARD_LOGICALNZ
IL_OP_DISCARD_LOGICALZ
IL_OP_DIST
IL_OP_DIV
IL_OP_DP2
IL_OP_DP2ADD
IL_OP_DP3
IL_OP_DP4
IL_OP_DST
IL_OP_DSX
IL_OP_DSY
IL_OP_DXSINCOS
IL_OP_ELSE
IL_OP_EMIT
IL_OP_EMIT_THEN_CUT
IL_OP_END
IL_OP_ENDFUNC
IL_OP_ENDIF
IL_OP_ENDLOOP
IL_OP_ENDMAIN
IL_OP_ENDSWITCH
IL_OP_EQ
IL_OP_EXN
IL_OP_EXP
IL_OP_EXP_VEC
IL_OP_EXPP
IL_OP_F_2_D
IL_OP_FACEFORWARD
IL_OP_FLR
IL_OP_FRC
IL_OP_FTOI
IL_OP_FTOU
IL_OP_FUNC
IL_OP_I_MAX
IL_OP_I_MIN
IL_OP_I_MUL
IL_OP_I_MUL_HIGH
IL_OP_I_NE
IL_OP_I_NEGATE
IL_OP_I_NOT
IL_OP_I_OR
IL_OP_I_SHL
IL_OP_I_SHR
IL_OP_I_XOR
IL_OP_IF_LOGICALNZ
IL_OP_IF_LOGICALZ
IL_OP_IFC
IL_OP_IFNZ
IL_OP_INITV
IL_OP_ITOF
IL_OP_KILL
IL_OP_MAX
IL_OP_MMUL
IL_OP_MOD
IL_OP_MOV
IL_OP_MUL
IL_OP_NE
IL_OP_NOISE
IL_OP_NOP
IL_OP_NRM
IL_OP_PIREDUCE
IL_OP_POW
IL_OP_PROJECT
IL_OP_RCP
IL_OP_REFLECT
IL_OP_RESINFO
IL_OP_RET
IL_OP_RET_DYN
IL_OP_RET_LOGICALNZ
IL_OP_RET_LOGICALZ
IL_OP_U_GE
IL_OP_U_LT
IL_OP_U_MAD
IL_OP_U_MAX
IL_OP_U_MIN
IL_OP_U_MOD
IL_OP_U_MUL
IL_OP_U_MUL_HIGH
6.21 ILOutputTopology
Primitive types that can be output from a geometry shader. Table 6.21
Enumeration
IL_OUTPUT_TOPOLOGY_LINESTRIP
IL_OUTPUT_TOPOLOGY_POINTLIST
IL_OUTPUT_TOPOLOGY_TRIANGLE_STRIP
6.22 ILPixTexUsage
There is a maximum of eight values. See the DCLPT instruction (page 59) for more information. Table 6.22
Enumeration
IL_USAGE_PIXTEX_1D
IL_USAGE_PIXTEX_1DARRAY
IL_USAGE_PIXTEX_2D
IL_USAGE_PIXTEX_2DARRAY
IL_USAGE_PIXTEX_2DARRAYMSAA
IL_USAGE_PIXTEX_2DMS_ARRAY
IL_USAGE_PIXTEX_2DMSAA
IL_USAGE_PIXTEX_3D
IL_USAGE_PIXTEX_4COMP
IL_USAGE_PIXTEX_BUFFER
IL_USAGE_PIXTEX_CUBEMAP
IL_USAGE_PIXTEX_CUBEMAPARRAY
IL_USAGE_PIXTEX_UNKNOWN
6.23 ILRegType
See Chapter 5, Register Types, for information on the IL register types.
6.24 ILRelOp
See IFC, CONTINUEC, BREAKC, CMP, and SET for usage. Table 6.23
Enumeration
IL_RELOP_EQ
IL_RELOP_GE
IL_RELOP_GT
IL_RELOP_LE
IL_RELOP_LT
IL_RELOP_NE
6.25 ILShader
Table 6.24
Enumeration
IL_SHADER_COMPUTE
IL_SHADER_DOMAIN
IL_SHADER_GEOMETRY
IL_SHADER_HULL
IL_SHADER_PIXEL
IL_SHADER_VERTEX
6.26 ILShiftScale
See Section 3.4, Destination Modifiers, page 3-2 for usage. Table 6.25
Enumeration    Text Syntax
IL_SHIFT_D2
IL_SHIFT_D4
IL_SHIFT_D8
IL_SHIFT_NONE
IL_SHIFT_X2    _x2
IL_SHIFT_X4    _x4
IL_SHIFT_X8    _x8
6.27 ILTexCoordMode
See the DCLPT instruction for more information. Table 6.26 ILTexCoordMode Enumeration Types
Enumeration    Text Syntax    Description
IL_TEXCOORDMODE_NORMALIZED    _coordmode(normalized)
The texture coordinates given in the texture load instructions are nonparametric (the coordinate range [0.0, 1.0] spans the entire texture).
IL_TEXCOORDMODE_UNKNOWN    _coordmode(unknown)
At shader create time, it is not known whether the texture coordinates given in the texture load instructions are normalized. Instead, this is determined at shader run time based on a state value.
IL_TEXCOORDMODE_UNNORMALIZED    _coordmode(unnormalized)
The texture coordinates given in the texture load instructions are parametric (the coordinate range [0.0, dimension of the texture] spans the entire dimension of the texture).
6.28 ILTexFilterMode
See the TEXLD (page 103), TEXLDB (page 106), and TEXLDD (page 110) instructions for more information. Table 6.27
Enumeration IL_TEXFILTER_ANISO
IL_TEXFILTER_LINEAR
IL_TEXFILTER_POINT
IL_TEXFILTER_UNKNOWN
6.29 ILTexShadowMode
See the TEXLD (page 103), TEXLDB (page 106), and TEXLDD (page 110) instructions for more information. Table 6.28
Enumeration
IL_TEXSHADOWMODE_NEVER
IL_TEXSHADOWMODE_UNKNOWN
IL_TEXSHADOWMODE_Z
6.30 ILTopologyType
Primitive types that can be input to a geometry shader.
Table 6.29
Enumeration
6.31 ILTsDomain
Tessellation domain declared in the hull shader. Table 6.30
Enumeration
IL_TS_DOMAIN_ISOLINE
IL_TS_DOMAIN_QUAD
IL_TS_DOMAIN_TRI
6.32 ILTsOutputPrimitive
Tessellation output primitive declared in the hull shader. Table 6.31
Enumeration
IL_TS_OUTPUT_LINE
IL_TS_OUTPUT_POINT
IL_TS_OUTPUT_TRIANGLE_CW
IL_TS_OUTPUT_TRIANGLE_CCW
6.33 ILTsPartition
Tessellation partitioning declared in the hull shader. Table 6.32
Enumeration
IL_TS_PARTITION_FRACTIONAL_EVEN
IL_TS_PARTITION_FRACTIONAL_ODD
IL_TS_PARTITION_INTEGER
IL_TS_PARTITION_POW2
6.34 ILZeroOp
See the RSQ (page 201), RCP (page 196), LOG (page 179), LOGP (page 181), LN (page 178), NRM (page 192), and DIV (page 161) instructions for more information. Table 6.33
Enumeration    Text Syntax
IL_ZEROOP_0
IL_ZEROOP_FLTMAX    _zeroop(fltmax)
IL_ZEROOP_INF_ELSE_MAX    _zeroop(inf_else_max)
IL_ZEROOP_INFINITY    _zeroop(infinity)
Chapter 7 Instructions
An IL stream consists mainly of IL instruction packets. Each packet begins with an IL_Opcode token. The type and number of tokens that follow depend on the value of the code field in the initial IL_Opcode token, and on the modifier_present field in each of the following IL_Dst and IL_Src tokens. This chapter describes each instruction packet, including its operation and usage within the IL stream. Some instructions are defined by pseudocode. In this chapter, the symbol V[i] refers to source i after swizzling.
7.1 Formats
Most instructions use a standardized format. Rather than repeat the format for each instruction, the common format is given here. Tokens appear in the following order:
1. Opcode token.
2. Destination information (zero or one set of tokens: dst token, any relative addressing, any modifiers).
3. Source information (zero or more sets of tokens: src token, any relative addressing information, any modifiers).
Formats include zero or one destination, followed by any number of sources. See the specific opcode to determine the number of destinations and sources allowed.
Assuming that l5.x contains 0x40A00000 (the IEEE float representation of 5.0), this can be written as:
ieq r3.x, r1.x, r2.y
iand r3.x, r3.x, l5.x
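The effect of this ieq/iand pair can be reproduced on a CPU. The C sketch below is an illustration, not part of the IL. It assumes IEEE-754 single-precision floats and models the compare result as 0xFFFFFFFF (true) or 0 (false). ANDing that mask with the bit pattern 0x40A00000 then yields either 5.0 or +0.0. The helper names (ieq, select_five, float_bits, bits_float) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float. */
static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Models "ieq r3.x, r1.x, r2.y": all bits set when equal, otherwise 0. */
static uint32_t ieq(int32_t a, int32_t b) {
    return (a == b) ? 0xFFFFFFFFu : 0u;
}

/* Models "iand r3.x, r3.x, l5.x" with l5.x = 0x40A00000 (5.0f). */
static float select_five(int32_t a, int32_t b) {
    return bits_float(ieq(a, b) & 0x40A00000u);
}
```

The trick works because ANDing with all ones keeps the literal unchanged, while ANDing with zero produces the bit pattern of +0.0.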
Integer instructions use signed arithmetic when comparing operands; unsigned integer instructions use unsigned arithmetic. Float instructions use signed arithmetic when comparing operands; however, denorms are flushed before all float instructions (the original source registers remain untouched). Thus, +0 and -0 are equivalent for float comparisons. Also, float instructions return FALSE when either operand holds NaN. See the IEEE 754 documentation for more information on floating-point rules. The result of a double compare is either 0 or 0xFFFFFFFF, broadcast to all destination channels. The IL compare always looks at the first two components of the sources, so it computes one 32-bit result, which is broadcast to all channels of the destination. The corresponding DX instruction can compare two values and produce two results. An int64/uint64 compare behaves like the double compare: it always looks at the first two components of the source and broadcasts the result to all destination channels.
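The float-compare rules above can be modeled in a few lines of C. This sketch assumes IEEE-754 binary32. It also assumes that "flushing" replaces a denormal with a zero of the same sign. The function names are hypothetical. C's own == and < already return false for NaN and treat +0 and -0 as equal, so the model only needs to add the denorm flush and the all-ones result.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from a raw bit pattern (used to create denormals and NaNs). */
static float from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Flush a denormal to a zero of the same sign; other values pass through. */
static float flush_denorm(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7F800000u) == 0u)   /* exponent field zero: zero or denormal */
        u &= 0x80000000u;          /* keep only the sign bit */
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Float EQ: 0xFFFFFFFF if equal after flushing, else 0 (false for NaN). */
static uint32_t il_feq(float a, float b) {
    return (flush_denorm(a) == flush_denorm(b)) ? 0xFFFFFFFFu : 0u;
}

/* Float LT: 0xFFFFFFFF if a < b after flushing, else 0 (false for NaN). */
static uint32_t il_flt(float a, float b) {
    return (flush_denorm(a) < flush_denorm(b)) ? 0xFFFFFFFFu : 0u;
}
```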
7.2.2
A subroutine: code between FUNC and RET, or between FUNC and ENDFUNC.
Between IFNZ and ENDIF.
Between IFC and ENDIF.
Between LOG_IF and ENDIF.
Between ELSE and ENDIF.
Between IFNZ and ELSE.
Between IFC and ELSE.
Between LOOP and ENDLOOP.
There are two forms of loop: LOOP/ENDLOOP used for DX9-style counted loops, and WHILELOOP/ENDLOOP used for DX10-style while loops. The following are the restrictions on when particular control flow instructions are allowed.
LOOPs and WHILELOOPs must terminate in the same flow-control block in which they begin. END, ENDMAIN, and ENDFUNC cannot be placed within a flow-control block.
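These nesting rules can be checked with a simple stack. The C sketch below is a hypothetical checker, not an AMD tool. OP_IF stands for any of IFNZ, IFC, or LOG_IF, and OP_LOOP stands for LOOP or WHILELOOP. FUNC/ENDFUNC handling is omitted for brevity. The sketch also assumes that an IF block holds at most one ELSE and that nesting depth is capped at 64.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode tags for this sketch; they are not IL_OP_* token values. */
enum cf_op { OP_IF, OP_ELSE, OP_ENDIF, OP_LOOP, OP_ENDLOOP, OP_END, OP_ENDMAIN, OP_OTHER };

#define MAX_DEPTH 64

/* Returns true if IF/ELSE/ENDIF and LOOP/ENDLOOP blocks nest properly and
   END/ENDMAIN appear only outside every flow-control block. */
static bool check_flow_control(const enum cf_op *ops, size_t n) {
    enum cf_op stack[MAX_DEPTH];
    size_t depth = 0;
    for (size_t i = 0; i < n; ++i) {
        switch (ops[i]) {
        case OP_IF:
        case OP_LOOP:
            if (depth == MAX_DEPTH) return false;
            stack[depth++] = ops[i];
            break;
        case OP_ELSE:
            /* ELSE must follow the "then" part of an open IF block. */
            if (depth == 0 || stack[depth - 1] != OP_IF) return false;
            stack[depth - 1] = OP_ELSE;
            break;
        case OP_ENDIF:
            if (depth == 0 || (stack[depth - 1] != OP_IF && stack[depth - 1] != OP_ELSE))
                return false;
            --depth;
            break;
        case OP_ENDLOOP:
            /* A loop must end in the same block in which it began. */
            if (depth == 0 || stack[depth - 1] != OP_LOOP) return false;
            --depth;
            break;
        case OP_END:
        case OP_ENDMAIN:
            if (depth != 0) return false;   /* not allowed inside a block */
            break;
        default:
            break;
        }
    }
    return depth == 0;
}
```

For example, IF, LOOP, ENDIF, ENDLOOP is rejected, because the loop would end outside the block in which it began.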
7.2.3
Many instructions use the control field to indicate the resource and sampler used. If the indexed_args bit is set to 1, there are two additional source arguments, corresponding to the resource index and the sampler index. These arguments can be either a register or a literal. The resource-index argument is added to the controls_resource field to form the final resource index. The sampler-index argument is added to the control_sampler field to form the final sampler index. If the instruction takes only a resource, the controls_sampler field is ignored, and the sampler-index argument must be literal 0. If the pri_modifier_present bit is set to 1, the Dword following the opcode token is the primary modifier, IL_PrimarySample_Mod, which is used by sample instructions in shader model 4.0 and later.
IL_PrimarySample_Mod
Bits    Field Name
1:0    gather4_comp_sel
3:2    tex_coord_type
4    is_uav
31:5    reserved
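Packing and unpacking the IL_PrimarySample_Mod Dword is plain bit manipulation. This C sketch follows the bit layout in the table. The struct and function names are hypothetical. The sketch does not assume any particular numeric values for the ILComponentSelect or ILTexCoordMode enumerators; it only extracts the raw field values.

```c
#include <stdint.h>

/* Fields of the IL_PrimarySample_Mod Dword, following the bit layout above. */
struct primary_sample_mod {
    unsigned gather4_comp_sel; /* bits 1:0, holds an ILComponentSelect value */
    unsigned tex_coord_type;   /* bits 3:2, holds an ILTexCoordMode value */
    unsigned is_uav;           /* bit 4 */
};

static struct primary_sample_mod decode_primary_mod(uint32_t dw) {
    struct primary_sample_mod m;
    m.gather4_comp_sel = dw & 0x3u;
    m.tex_coord_type = (dw >> 2) & 0x3u;
    m.is_uav = (dw >> 4) & 0x1u;
    return m;                  /* bits 31:5 are reserved and ignored */
}

static uint32_t encode_primary_mod(struct primary_sample_mod m) {
    return (m.gather4_comp_sel & 0x3u)
         | ((m.tex_coord_type & 0x3u) << 2)
         | ((m.is_uav & 0x1u) << 4);   /* reserved bits 31:5 stay zero */
}
```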
gather4_comp_sel: Holds a value of the enumerated type ILComponentSelect; only IL_COMPSEL_X_R, IL_COMPSEL_Y_G, IL_COMPSEL_Z_B, and IL_COMPSEL_W_A are valid. If present, it specifies the component to fetch from a multi-component texture resource. If absent, the x channel is fetched. Applicable to fetch4, fetch4c, fetch4po, and fetch4poc. Example: fetch4_resource(n)_sampler(m)[_compselect(comp)] dst, src0
tex_coord_type: Holds a value of the enumerated type ILTexCoordMode. Applicable to sample, sample_b, fetch4, fetch4c, fetch4po, fetch4poc, sample_g, sample_l, sample_c_lz, sample_c, sample_c_g, sample_c_l, and sample_c_b. Example: sample_resource(n)_sampler(m)[_coordtype(ILTexCoordMode)] dst, src0
is_uav: 1 if the resource is a UAV; otherwise, 0. Applicable to resinfo and bufinfo. Cannot be enabled when the indexed_args field is enabled. Example: bufinfo_resource(n)[_uav] dst
If sec_modifier_present bit is set to 1, the next Dword is the secondary modifier. If the indexed_args bit is set to 1, the next Dword is the resource format token ILPixTexUsage. If the aoffset_present bit is set to 1, the next Dword is the address offset. For example:
sample_ext_resource(n)_sampler(m)[_resourcetype(pixtexusage)][_addroffimmi(u,v,w)] dst, src0, src1, src2
Final sampler index = src2 + m. One extra Dword, holding an ILPixTexUsage value, is required after the opcode token for the resource format.
7.2.4
7.2.5
All source double inputs must be in xy (after swizzle operations). For example:
d_add r1.xy, r2.xy, r2.xy
or
d_add r1.zw, r2.xy, r2.xy
Each computes twice the value in r2.xy and places the result in either xy or zw; the user can set the output mask to either xy or zw. The most significant bits are in y/w, so the sign of the result can be tested with single-precision operations. These instructions are supported on RX70 GPUs (R670, R770, Cypress, etc.).
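The register layout for doubles can be modeled in C. This sketch assumes IEEE-754 binary64. It places the low 32 bits of the double in x and the high 32 bits (sign, exponent, upper mantissa) in y, which matches the statement that the most significant bits are in y/w. struct reg4 and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* One 4-component register viewed as four 32-bit channels. */
struct reg4 { uint32_t x, y, z, w; };

/* Store a double in r.xy: low 32 bits in x, high 32 bits in y. */
static void store_double_xy(struct reg4 *r, double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    r->x = (uint32_t)bits;
    r->y = (uint32_t)(bits >> 32);
}

/* Reassemble the double from r.xy. */
static double load_double_xy(const struct reg4 *r) {
    uint64_t bits = ((uint64_t)r->y << 32) | r->x;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Sign test using a single 32-bit operation on y alone. */
static int double_is_negative(const struct reg4 *r) {
    return (int)(r->y >> 31);
}
```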
7.2.6
7.2.6.1 IAND, IOR, IXOR, INOT
It is often useful to treat a vector element as if it were a vector of 32 individual bits. IL provides a set of logical operations that operate simultaneously on each bit of an element. Each of these operations reads the components of the sources, applies the operation to each bit separately, and writes the 32-bit result into the corresponding component of dst. Logical NOT computes the one's complement of each 32-bit value in src0.
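A per-component C model of these four operations follows. It uses a hypothetical vec4u type for a register of four 32-bit components. The function names are illustrative only and are not IL mnemonics.

```c
#include <stdint.h>

/* A register with four 32-bit components: c[0]=x, c[1]=y, c[2]=z, c[3]=w. */
typedef struct { uint32_t c[4]; } vec4u;

/* Each function applies its bitwise operation independently to every component. */
static vec4u il_iand(vec4u a, vec4u b) {
    for (int i = 0; i < 4; ++i) a.c[i] &= b.c[i];
    return a;
}

static vec4u il_ior(vec4u a, vec4u b) {
    for (int i = 0; i < 4; ++i) a.c[i] |= b.c[i];
    return a;
}

static vec4u il_ixor(vec4u a, vec4u b) {
    for (int i = 0; i < 4; ++i) a.c[i] ^= b.c[i];
    return a;
}

static vec4u il_inot(vec4u a) {
    for (int i = 0; i < 4; ++i) a.c[i] = ~a.c[i];   /* one's complement */
    return a;
}
```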
Logical operations do not support input or output modifiers.
7.2.6.2 Simple Arithmetic Instructions
IL provides a set of simple arithmetic operations. Each of these operations reads the components of the sources (src0, src1, etc.), applies the operation, and writes the 32-bit result into the corresponding component of dst. For all of these operations, the control field must be zero. Integer and unsigned integer operations are available on all GPUs, starting with the R6XX series. Double operations are available on all GPUs that support doubles, starting with the R7XX series.
7.2.7
7.2.8
7.2.9
7.2.10
7.2.10.1 LDS Access
Access by work-items within a wavefront differs from access by different wavefronts within a work-group. This is specified by the sharing mode, which is either relative or absolute. If the sharing mode is relative, new and consecutive space is allocated for each wavefront; if it is absolute, all wavefronts are mapped to the same memory, starting at address 0. In absolute mode, wavefronts can overwrite each other's data. Owner-computes is a legacy mode; programmers are expected to move to random access when possible. The second compute model is general read/write: each work-item can read or write any address in the LDS. This model is supported on Evergreen GPUs and later. Both models allow work-items to read or write memory (video or system), but neither provides synchronization to memory. IL provides two ways to allocate LDS memory:
Owner-compute, which allocates addresses in the LDS for each work-item.
Random access, which allocates the LDS independently of work-items.
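The difference between the relative and absolute sharing modes can be expressed as a base-address calculation. This C sketch assumes that each wavefront in relative mode receives a consecutive region of per_wave_bytes bytes, indexed by wave_id. Those names and the enum are hypothetical; they are not the IL_LDS_SHARING_MODE_* values, and the real allocation granularity is not specified here.

```c
#include <stdint.h>

/* Hypothetical tags mirroring the relative and absolute sharing modes. */
enum lds_share { LDS_SHARE_RELATIVE, LDS_SHARE_ABSOLUTE };

/* Physical LDS byte address seen by a wavefront for a local byte address.
   Relative: each wavefront gets its own consecutive region.
   Absolute: every wavefront starts at address 0, so regions overlap. */
static uint32_t lds_physical_addr(enum lds_share mode, uint32_t wave_id,
                                  uint32_t per_wave_bytes, uint32_t addr) {
    uint32_t base = (mode == LDS_SHARE_RELATIVE) ? wave_id * per_wave_bytes : 0u;
    return base + addr;
}
```

In absolute mode, two wavefronts that use the same local address reach the same physical location, which is why they can overwrite each other's data.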
Each style has its own read and write operations, and it is not valid to mix them. For example, using owner-compute allocation with random-access reads is not expected to work correctly.
7.2.10.2 LDS Programming Model
The Evergreen series of GPUs adds much functionality to the LDS and removes most of the LDS instructions of the R7XX series of GPUs. The programming model consists of the following.
1. There is one LDS per SIMD. DX11 calls this group (or g) memory.
2. Work-items are organized in communicating units called work-groups.
3. All addresses to LDS are relative to the work-group; thus, no work-item can read data from a different work-group.
4. Most LDS operations read or write Dwords from LDS memory; however, the address is in bytes.
5. When a work-group starts, the LDS memory is not in a known state; the application code must initialize it.
6. LDS references can be used only in compute shaders.
7. Both pixel and compute shaders can reference memory; DX11 calls this UAV memory.
The dcl_num_threads_per_group statement declares the number of work-items in a work-group. This statement is required in a kernel that uses LDS.
7.2.10.3 LDS Operations
Starting with the Evergreen series of GPUs, IL supports a large number of binary atomic LDS operations. Each operation reads v, the old scalar (32-bit) value at a given LDS location, combines v and src1.x using a specified operation, and stores the result back into LDS. There are additional atomic read-and-op versions that return the original value of v. If multiple work-items execute the same atomic operation, IL does not guarantee any specific ordering. Even repeated executions of the same sequence of instructions need not produce repeatable answers.
The operation is a 32-bit integer ADD. If LDS is declared typeless, src0.x specifies a byte address relative to the work-group. The address must be aligned to a Dword (the lower two bits of the address must be zero).
a = src0.x/4
lds[a] += src1.x
If the LDS is declared as a struct, src0.x specifies the index into the array, and src0.y specifies the offset into the struct. The offset is in bytes.
a = (src0.x*lds_stride + src0.y)/4
lds[a] += src1.x
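The two address calculations for the LDS ADD operation can be written out in C. This is a single-threaded sketch and does not model atomicity or ordering between work-items. The lds array, its size LDS_DWORDS, the bounds checks, and the function names are assumptions made for illustration; lds_stride is taken from the formula above.

```c
#include <stdbool.h>
#include <stdint.h>

#define LDS_DWORDS 1024u          /* hypothetical LDS size for this sketch */

static uint32_t lds[LDS_DWORDS];  /* LDS memory viewed as Dwords */

/* Typeless LDS: src0.x is a Dword-aligned byte address.
   Returns false (and writes nothing) if unaligned or outside the sketch's array. */
static bool lds_add_typeless(uint32_t src0_x, uint32_t src1_x) {
    if ((src0_x & 3u) != 0u) return false;   /* lower two bits must be zero */
    uint32_t a = src0_x / 4u;
    if (a >= LDS_DWORDS) return false;
    lds[a] += src1_x;                        /* 32-bit integer add, wraps */
    return true;
}

/* Structured LDS: src0.x indexes the array, src0.y is a byte offset in the struct. */
static bool lds_add_struct(uint32_t src0_x, uint32_t src0_y,
                           uint32_t lds_stride, uint32_t src1_x) {
    uint32_t a = (src0_x * lds_stride + src0_y) / 4u;
    if (a >= LDS_DWORDS) return false;
    lds[a] += src1_x;
    return true;
}
```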
7.2.11
There is one GDS across all SIMDs. Most GDS operations read or write Dwords from or to GDS memory; however, the address is in bytes. When an application starts, the GDS memory is not in a known state, so application code must initialize it. Both pixel and compute shaders can reference GDS memory.
Starting with the Evergreen family of GPUs, IL supports a large number of binary atomic GDS operations. Each operation reads v, which is a scalar (32-bit) value at a given GDS location, combines v and src1.x using a specified operation, and stores the result back into GDS. There are additional atomic read and op versions that return the original value of v.
If multiple work-items execute the same atomic operation, IL does not guarantee a specific ordering. Even repeated executions of the same sequence of instructions need not produce repeatable answers. Starting with the Evergreen family of GPUs, IL supports a large number of binary atomic GDS operations. Each operation reads v, a 32-bit scalar value at a given GDS location, combines v and src1.x using a specified operation, and returns v. If multiple work-items execute the same atomic operation, then IL does not guarantee any specific ordering. Even repeated executions of the same sequence of instructions need not produce repeatable answers.
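On a CPU, C11 atomics give a rough analogy for the two forms described above. One form is an atomic op that only updates memory; the other is a read-and-op form that also returns the original value. This sketch is only an analogy and says nothing about how GDS is implemented. The gds_word variable and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One GDS Dword, modeled as a C11 atomic (illustration only). */
static _Atomic uint32_t gds_word;

/* Atomic op form: combine v into memory; nothing is returned. */
static void gds_add(uint32_t v) {
    atomic_fetch_add(&gds_word, v);
}

/* Atomic read-and-op form: combine v into memory and return the original value. */
static uint32_t gds_read_add(uint32_t v) {
    return atomic_fetch_add(&gds_word, v);
}
```

As with the IL operations, concurrent callers get no guarantee about which of them observes which original value.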
7.2.12
If typed, the number of components used for the address depends on the dimension of the UAV. For example, for Texture1D arrays, src0.x provides the buffer address, and src0.y provides the index/offset of the array. For typed UAV access, the address is in elements (Dwords).
a = src0.x
uav[a] += src1.x
If the UAV is declared as a struct, src0.x specifies the index into the array, and src0.y specifies the offset into the struct. The offset is in bytes.
a = (src0.x*lds_stride + src0.y)/4
uav[a] += src1.x
The register src1.x provides a 32-bit Dword. The 32-bit UAV memory specified by the address in src0 is updated atomically by iadd(uav[src0], src1.x). Nothing is returned. Instructions with out-of-range addresses write nothing to the UAV surface, with one exception: for structured UAVs, if the offset is out of bounds, an undefined value is written to the UAV. If the kernel invocation is inactive, nothing is written to the UAV surface. The arena modifier differentiates between regular UAV IDs and arena UAV IDs. For example:
uav_add_id(1) r0.x, r1.x
uav_add_id(1)_arena r0.w, r1.x
When an atomic operation is performed on an arena UAV, the data size must be a Dword. Arena UAVs are supported only on Evergreen and Northern Islands GPUs, and only UAV 8 can be the arena UAV.
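A minimal C sketch of the typed-UAV case follows, including the rule that out-of-range addresses write nothing. It models a one-dimensional typed UAV as a plain array of n_elems Dwords, ignores atomicity, and uses hypothetical names. The structured-UAV exception (an undefined value written for an out-of-bounds offset) is deliberately not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Typed 1D UAV add: the address in src0.x is in elements (Dwords).
   Returns true if the UAV was updated, or false if the address was
   out of range, in which case nothing is written. */
static bool uav_add_typed(uint32_t *uav, size_t n_elems,
                          uint32_t src0_x, uint32_t src1_x) {
    if ((size_t)src0_x >= n_elems)
        return false;
    uav[src0_x] += src1_x;
    return true;
}
```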
7.2.13
7.2.14
4. Most LDS operations read or write Dwords from LDS memory; however, the address is in bytes.
5. When a work-group starts, the LDS memory is not in a known state; thus, the application code must initialize it.
6. LDS references can be used only in compute shaders.
7. Both pixel and compute shaders can reference memory. DX11 calls this UAV memory.
The dcl_num_threads_per_group statement declares the number of work-items in a work-group. This statement is required in a shader that uses LDS.
Must be set to IL_OP_PREFIX.
precise x/y/z/w: Per-component precise control. If set to 1, the operation on the given component in the following instruction must remain precise (not refactorable). This control overrides REFACTORING_ALLOWED declared in the global flag.
Valid for Evergreen GPUs and later.
Usage
If components of a MAD instruction are tagged as PRECISE, the hardware must execute a MAD or an exact equivalent, and cannot split it into a multiply followed by an add. Conversely, a multiply followed by an add, where either or both are flagged as PRECISE, cannot be merged into a fused MAD. This affects any operation, not just arithmetic. Take the following sequence of instructions as an example.
1. Write the value of the variable foo to memory address x in a UAV.
2. ... (execute any sequence of instructions)
3. Read from memory address x in the UAV.
If REFACTORING_ALLOWED is present, the above sequence of instructions can be optimized so that the value that would normally be read from address x is replaced with the variable foo. This optimization does not occur if a memory fence operation is requested between the write and the read. If REFACTORING_ALLOWED is not declared for the shader, or if it is present but the read from x is marked as PRECISE, the compiler/drivers must leave the read as is. This can reveal a behavior difference between the optimized version and the PRECISE version. For example, if memory address x is out of bounds of the UAV, the write does not happen, and the out-of-bounds read has some other well-defined behavior; thus, the read does not produce foo.
Text Syntax
In text, the prefix is specified on the instruction that follows it. It takes two forms:
_prec: all of precise x/y/z/w are set.
_prec(mask): mask lists the channels enabled in precise x/y/z/w.
Examples
Note that the prec modifier always immediately follows the opcode.
add_prec r0.xyz, r1, r2    All output channels must remain precise.
add_prec(y) r0.xyz, r1, r2    The result in y must remain precise.
uav_load_prec_id(1) r1, r2    All output channels of the load must remain precise.
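The text form of the prefix can be decoded into a per-component mask with a few lines of C. This sketch is hypothetical in three ways. It maps x, y, z, w to bits 0 to 3, because the actual bit positions of precise x/y/z/w in the prefix token are not given here. It treats a bare _prec as selecting all four channels. It reads a parenthesized channel list as in add_prec(y).

```c
#include <string.h>

/* Returns a 4-bit precise mask (bit 0 = x, 1 = y, 2 = z, 3 = w) for an opcode
   spelling such as "add_prec(y)"; returns 0 if no _prec modifier is present. */
static unsigned prec_mask(const char *opcode) {
    const char *p = strstr(opcode, "_prec");
    if (p == NULL)
        return 0u;
    p += 5;                          /* skip "_prec" */
    if (*p != '(')
        return 0xFu;                 /* bare _prec: all channels precise */
    unsigned mask = 0u;
    for (++p; *p != '\0' && *p != ')'; ++p) {
        switch (*p) {
        case 'x': mask |= 1u; break;
        case 'y': mask |= 2u; break;
        case 'z': mask |= 4u; break;
        case 'w': mask |= 8u; break;
        default: break;              /* ignore anything else */
        }
    }
    return mask;
}
```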
Format
2-input.
Opcode
Field Name    Bits    Description
code    15:0    IL_OP_BREAKC
control    18:16    relop(op). See Table 6.23 on page 6-18.
reserved    31:19    Must be zero.
Related
This form of BREAK compares a register component to zero using an integer compare. The break is taken if all bits of src0 are zero. src0 must have a swizzle that selects a single component. The BREAK statement can occur only in a loop statement or a switch statement. It terminates the smallest enclosing loop or switch statement; control passes to the statement following the terminated statement, if any. Valid for all GPUs.
Format
1-input.
Opcode
Field Name    Bits    Description
code    15:0    See Syntax, above.
reserved    31:16    Must be zero.
Related
BREAK, BREAKC.
Unconditional CALL
Instructions Syntax Description CALL
Format Opcode
0-input. Token Field Name 1 code control sec_modifier_present pri_modifier_present 2 3 Must be zero. Unsigned integer representing label of the subroutine. Bits 15:0 29:16 30 31 Description IL_OP_CALL Must be zero. Must be zero. Must be zero.
Related
sec_modifier_present 30 pri_modifier_present 31 2
IL_Src token (src0) with register_type set to IL_REGTYPE_CONST_BOOL. The modifier_present and relative_address fields must be set to 0.
Field Name    Bits    Description
register_num    15:0    Any available register.
register_type    21:16    CONST_BOOL
Remaining fields    31:22    Must be zero.
3 Related
IL_Src token and IL_Src_Mod token (if required) representing any valid IL Source. See Section 2.2.6, Source Token, page 2-5. Unsigned integer representing the label of the subroutine.
CALL, CALLNZ.
IL_Src token and IL_Src_Mod token (if required) representing any valid IL Source. See Section 2.2.6, Source Token, page 2-5. Unsigned integer representing the label of the subroutine.
CALL, CALLNZ
CASE Statement
Instructions    Syntax    Description
CASE    case case-value
This case statement is used in a switch. This instruction is followed by a 32-bit integer that identifies the case. Compares are done using integer arithmetic. Falling through cases is valid, as in C. This can be used to implement a DX10 case instruction. See the SWITCH instruction for operation details. Valid for R6XX GPUs and later.
Format
1-input, 0-output.
Opcode
Field Name    Bits    Description
code    15:0    IL_OP_CASE
control    29:16    Must be zero.
sec_modifier_present    30    Must be zero.
pri_modifier_present    31    Must be zero.
Related
Unconditional CONTINUE
Instructions    Syntax    Description
CONTINUE    continue
CONTINUE causes shader execution to continue at the previous LOOP or WHILELOOP instruction. This is used only in a LOOP - ENDLOOP instruction block. Valid for R6XX GPUs and later.
Operation:
if this is a counted loop {
    LoopIterationCount = LoopIterationCount - 1;
    LoopCounter = LoopCounter + LoopStep;
    LoopCounter = (LoopCounter > 0) ? LoopCounter : 0;
    if (LoopIterationCount > 0)
        Continue execution at the StartLoopOffset;
} else {
    Continue execution at the StartLoopOffset;
}
Format
0-input.
Opcode
Field Name    Bits    Description
code    15:0    IL_OP_CONTINUE
reserved    31:16    Must be zero.
Related
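The counted-loop branch of this pseudocode can be transcribed into C to show how the counters evolve. The struct and function names are hypothetical. The sketch assumes, as the pseudocode implies, that execution leaves the loop instead of returning to StartLoopOffset once LoopIterationCount reaches zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* State of a DX9-style counted loop, named after the pseudocode above. */
struct counted_loop {
    int32_t iteration_count; /* LoopIterationCount */
    int32_t counter;         /* LoopCounter */
    int32_t step;            /* LoopStep */
};

/* Applies CONTINUE to a counted loop. Returns true if execution resumes at
   StartLoopOffset, or false if the loop has run out of iterations. */
static bool continue_counted(struct counted_loop *l) {
    l->iteration_count -= 1;
    l->counter += l->step;
    l->counter = (l->counter > 0) ? l->counter : 0;   /* clamp at zero */
    return l->iteration_count > 0;
}
```

With an iteration count of 3, a counter of 0, and a step of 2, the body executes three times, and the counter ends at 6.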
Format
2-input.
Opcode
Field Name    Bits    Description
code    15:0    IL_OP_CONTINUEC
control    18:16    relop(op). See Table 6.23 on page 6-18.
reserved    31:19    Must be zero.
Related
Instructions    Syntax    Description
IL_OP_CONTINUE_LOGICALNZ    continue_logicalnz src0    If src0.x != 0, continue.
IL_OP_CONTINUE_LOGICALZ    continue_logicalz src0    If src0.x == 0, continue.
Description
Conditionally continues execution at the beginning of the current loop: CONTINUE_LOGICALZ continues the loop if all bits of src0.x are zero; CONTINUE_LOGICALNZ continues the loop if any bit of src0.x is not zero. Can be used only within a LOOP - ENDLOOP block. Valid for R6XX GPUs and later.
Format
1-input, 0-output.
Opcode
Field Name    Bits    Description
code    15:0    See Syntax, above.
reserved    31:16    Must be zero.
Related
CONTINUE, CONTINUEC.
DEFAULT Statement
Instructions: DEFAULT
Syntax: default
Description: DEFAULT starts an instruction block within a SWITCH instruction block (see page 30). Unlike a CASE label, a DEFAULT label does not provide a value for comparison. This is like the default in C. Falling through or into a DEFAULT section is valid. There can be only one DEFAULT statement in each SWITCH block. Valid for R6XX GPUs and later.
Format: 0-input.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_DEFAULT
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
ELSE
Instructions: ELSE
Syntax: else
Description: This instruction is the start of the ELSE clause of an IFNZ-ELSE-ENDIF, IFC-ELSE-ENDIF, or LOG_IF-ELSE-ENDIF block. ELSE must be after an IFC, IFNZ, or LOG_IF instruction in the stream. Valid for all GPUs.
Format: 0-input.
Opcode:
Field Name  Bits   Description
code        15:0   IL_OP_ELSE
reserved    31:16  Must be zero.
End of Stream
Instructions: END
Syntax: end
Description: END indicates the end of an IL stream and must be the last statement in the stream. All shader programming, including subroutines, must be placed before this instruction. Valid for all GPUs.
Format: 0-input.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_END
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
End of IF block
Instructions: ENDIF
Syntax: endif
Description: Indicates the end of an IFNZ - ENDIF, IFC - ENDIF, IFNZ - ELSE - ENDIF, or IFC - ELSE - ENDIF block. The ENDIF statement must follow an ELSE, IFC, IFNZ, or LOG_IF instruction. Valid for all GPUs.
Format: 0-input, 0-output.
Opcode:
Field Name  Bits   Description
code        15:0   IL_OP_ENDIF
reserved    31:16  Must be zero.
ENDFUNC, END.
These are integer versions of the IF statement. They skip a block of code based on the value of src0.x. The LOG_IF block must end with an ELSE or ENDIF instruction. The source selector must replicate the component to be tested into all four components. The comparison is an integer test, so values such as NaN or -0 are not equal to zero. Valid for R6XX GPUs and later.
Operation for IF_LOGICALNZ:
if (src0.x has any bit non-zero) {
    Execute following instructions;
} else {
    Jump to the instruction following the next ELSE or ENDIF instruction;
}
Operation for IF_LOGICALZ:
if (src0.x has all bits zero) {
    Execute following instructions;
} else {
    Jump to the instruction following the next ELSE or ENDIF instruction;
}
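Because the test is done on the raw 32-bit pattern, a float that compares equal to zero in floating-point arithmetic can still count as non-zero here. The sketch below illustrates this, assuming IEEE-754 binary32 floats; the helper names logical_nz, logical_z, and float_bits are hypothetical. The same bit-level test is described for RET_LOGICALNZ and RET_LOGICALZ.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit-level tests as described for IF_LOGICALNZ / IF_LOGICALZ:
   the component is examined as a raw 32-bit pattern, never as a float. */
bool logical_nz(uint32_t bits) { return bits != 0u; } /* any bit set */
bool logical_z(uint32_t bits)  { return bits == 0u; } /* all bits zero */

/* Reinterprets a float's bits without violating aliasing rules
   (assumes IEEE-754 binary32). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under these assumptions, +0.0 (0x00000000) selects the IF_LOGICALZ path, while -0.0 (0x80000000) and a quiet NaN such as 0x7FC00000 both select the IF_LOGICALNZ path.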
Format: 1-input, 0-output.
Opcode:
Field Name  Bits   Description
code        15:0   See Syntax, above.
reserved    31:16  Must be zero.
Related
Format: 1-input, 0-output.
Opcode:
Field Name  Bits   Description
code        15:0   IL_OP_LOOP
rep         16     Repeat flag.
                   0: src0.y holds the initial value for the current loop counter used for relative addressing, and src0.z holds the loop step. src0.y and src0.z cannot be negative.
                   1: src0.y and src0.z are not used, and the current auto-increment loop counter is not incremented during this loop. src0.y and src0.z can be negative.
28:17 29 30 31
These instructions conditionally return to the instruction after the call. src0.x is tested after swizzle. The instructions can appear anywhere in a subroutine, any number of times. The 32-bit value supplied by src0 is tested at the bit level: for RET_LOGICALNZ, if any bit is non-zero, the statement returns; for RET_LOGICALZ, if all bits are zero, the statement returns. Valid for R6XX GPUs and later.
Format: 1-input, 0-output.
Opcode:
Field Name            Bits   Description
code                  15:0   See Syntax, above.
control               29:16  logic_op
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related
RET, RET_DYN.
switch src0
A switch/endswitch construct behaves exactly like a switch construct in the C language. src0 must be a 32-bit register component or an immediate value. Compares are done using integer arithmetic. Falling through cases is valid, as in C. This instruction can be used to implement a DX10 case instruction. Switch statements can be nested without limit. Valid for R6XX GPUs and later.
Operation: Same as a C switch statement.
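Because SWITCH, CASE, and DEFAULT are defined by reference to C, a plain C switch shows the intended control flow: integer compares, fall-through from one case into the next, and a single default block. The function classify below is hypothetical, and the comments only indicate which IL statement each C construct corresponds to conceptually.

```c
#include <stdint.h>

/* Illustrates the C semantics that switch/case/default/endswitch mirror.
   The IL names in the comments are a conceptual mapping only. */
int32_t classify(int32_t sel)
{
    int32_t acc = 0;
    switch (sel) {        /* switch src0                       */
    case 1:               /* case 1                            */
        acc += 10;
        /* fall through: valid, as in C */
    case 2:               /* case 2                            */
        acc += 20;
        break;
    default:              /* default: at most one per switch   */
        acc = -1;
        break;
    }                     /* endswitch                         */
    return acc;
}
```

A selector of 1 executes both case bodies (30), a selector of 2 executes only the second (20), and any other value reaches the default block (-1).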
Format: 1-input, 0-output.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_SWITCH
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related
pri_modifier_present == 0     pri_modifier_present == 1
IL_Src token (src0).          Number of elements, n, in an immediate constant buffer.
Not used.                     First (index 0) 32-bit element of the immediate constant buffer.
Not used.                     Second (index 1) through nth (index n-1) 32-bit elements of the immediate constant buffer.
1. Currently not used. Given an IL shader that uses raw or structured buffers, the shader compiler compiles it if the underlying hardware supports it; otherwise, it fails.
23:21 24
Declare a Primitive
Instructions: DCL_INPUTPRIMITIVE
Syntax: dcl_input_primitive prim_type(op)
Description: Declares the type of primitive that a geometry shader can accept. This declaration must appear in a geometry shader and is valid only in a geometry shader. The prim_type must be an element of the enumeration IL_TOPOLOGY. Valid for R600 GPUs and later.
Format: 0-input, 0-output.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_DCL_INPUTPRIMITIVE
control               29:16  prim_type.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related: None.
30 31
Related
dcl_lds_size_per_thread 8 INIT_SHARED_REGISTERS.
Declare a Literal
Instructions: DCL_LITERAL
Syntax: dcl_literal src0, <x-bits>, <y-bits>, <z-bits>, <w-bits>
Description: DCL_LITERAL declares the literal to be used in the following instruction. The instruction is followed by four words containing the actual bits of the literal, in order x, y, z, w. The lexically nearest preceding value is used. The 32-bit component literals (x-bits, y-bits, z-bits, and w-bits) are untyped, so both integer and float literals can be initialized with this instruction. src0 must be an IL_REGTYPE_LITERAL or IL_REGTYPE_MLITERAL register type (see Section 5.23, LITERAL, page 5-15). No modifier bits are allowed. A given literal can be defined only once in a shader. This instruction cannot be placed in an unreachable code block, such as after an unconditional break or return instruction. Valid for R600 GPUs and later.
Format: 1-input, 0-output.
Tokens:
Token  Field Name  Bits   Description
1      code        15:0   IL_DCL_LITERAL
       control     31:16  Must be zero.
2      IL_Src token (src0) where the register_type field is set to IL_REGTYPE_LITERAL.
3      x-bits, 32-bit untyped literal.
4      y-bits, 32-bit untyped literal.
5      z-bits, 32-bit untyped literal.
6      w-bits, 32-bit untyped literal.
Related: None.
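Because the four literal words are untyped, a float component has to be supplied as its IEEE-754 bit pattern and an integer component as its two's-complement bits. The sketch below shows that conversion, assuming binary32 floats; the literal_bits struct and make_literal helper are hypothetical and do not describe the IL token layout.

```c
#include <stdint.h>
#include <string.h>

/* Four untyped 32-bit words, in the order x, y, z, w. Illustrative only. */
typedef struct { uint32_t x, y, z, w; } literal_bits;

/* Reinterprets a float as its raw bits (assumes IEEE-754 binary32). */
uint32_t bits_of_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Mixes float and integer components in one literal, as the untyped
   words allow. */
literal_bits make_literal(float x, int32_t y, uint32_t z, float w)
{
    literal_bits l;
    l.x = bits_of_float(x);
    l.y = (uint32_t)y;       /* two's-complement integer bits */
    l.z = z;
    l.w = bits_of_float(w);
    return l;
}
```

With these inputs (1.0f, -1, 7, 0.5f), the words are 0x3F800000, 0xFFFFFFFF, 0x00000007, and 0x3F000000, which could then be written in the dcl_literal syntax above, for example against a literal register assumed here to be named l0.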
Statically Declare Number of Input Control Points per Patch in Hull Shader
Instructions: DCL_NUM_ICP
Syntax: dcl_num_icp n
Description: Statically declares the number of input control points per patch. Only used in a hull shader. Valid for Evergreen GPUs and later.
Format: 0-input, 0-output, 1 additional token.
Opcode (token 1):
Field Name            Bits   Description
code                  15:0   IL_OP_DCL_NUM_ICP
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Statically Declare Number of Output Control Points per Patch in Hull Shader
Instructions: DCL_NUM_OCP
Syntax: dcl_num_ocp n
Description: Statically declares the number of output control points per patch. Only used in a hull shader. Valid for Evergreen GPUs and later.
Format: 0-input, 0-output, 1 additional token.
Opcode (token 1):
Field Name            Bits   Description
code                  15:0   IL_OP_DCL_NUM_OCP
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related
Declare that the Pixel Shader intends to write to its scalar output oDepth register
Instructions: DCL_ODEPTH
Syntax: dcl_odepth
Description: Declares that the pixel shader intends to write to its scalar output oDepth register. DX10 has some rules for what happens if oDepth is declared but the shader does not write it. Valid for R600 GPUs and later.
Format: 0-input, 0-output.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_DCL_ODEPTH
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related: None.
DCL_OUTPUT_TOPOLOGY.
The optional _unnorm option in the textual source can be used to specify this bit.
Token 2: The return type identifies the data type fetched from the input buffer. It consists of four 3-bit groups, one per component, each of which must be set to a value of the enumerated type ILElementFormat (see Section 6.7, ILElementFormat, page 6-3). Return types are specified per component. The DX10 specification does not need to repeat identical return types; IL, however, requires the type to be repeated for all four components.
Field Name  Bits   Description
reserved    19:0   Must be zero.
fmtx        22:20  x-component format.
fmty        25:23  y-component format.
fmtz        28:26  z-component format.
fmtw        31:29  w-component format.
Example:
dcl_resource_id(1)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
Related: None.
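The return-type token packs four 3-bit fields above 20 reserved bits. The sketch below shows the packing and extraction for the bit positions in the table above. The numeric ILElementFormat values are defined in Section 6.7 and are not reproduced here, so the values used with these helpers are arbitrary placeholders; the function names are hypothetical.

```c
#include <stdint.h>

/* Packs fmtx (22:20), fmty (25:23), fmtz (28:26), and fmtw (31:29);
   bits 19:0 are reserved and stay zero. Each argument is a 3-bit
   ILElementFormat value (numeric values not assumed here). */
uint32_t pack_return_types(uint32_t fx, uint32_t fy, uint32_t fz, uint32_t fw)
{
    return ((fx & 0x7u) << 20) |
           ((fy & 0x7u) << 23) |
           ((fz & 0x7u) << 26) |
           ((fw & 0x7u) << 29);
}

/* Extracts the format of one component: 0 = x, 1 = y, 2 = z, 3 = w. */
uint32_t return_type_of(uint32_t token, unsigned comp)
{
    return (token >> (20u + 3u * (comp & 3u))) & 0x7u;
}
```

Because IL requires the type to be repeated for all four components, a resource that returns the same format everywhere passes the same value four times; with a placeholder value of 1 that gives 0x24900000.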
Format: 1-input, 0-output.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_DCL_STREAM
control               29:16  Unsigned integer, representing the stream ID.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related: None.
Additional tokens: up to three unsigned integers representing the literal value for n (n1, n2, n3).
Related
DCL_TS_OUTPUT_PRIMITIVE, DCL_TS_PARTITION.
DCL_TS_DOMAIN, DCL_TS_PARTITION.
DCL_TS_DOMAIN, DCL_TS_OUTPUT_PRIMITIVE.
Declare a Primitive ID
Instructions: DCL_VPRIM
Syntax: dcl_vprim
Description: Declares that the geometry shader intends to use its scalar input register vPrim. For the geometry shader, input primitive data only comes in the form of a scalar (vPrim, no mask). Also, no Primitive Data for adjacent primitives is available in a geometry shader invocation. Valid for R600 GPUs and later.
Format: 0-input, 0-output.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_DCL_VPRIM
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related: None.
The component default types can be any value of the enumerated type ILDefaultVal. See Table 6.5 on page 6-2.
Token 2: IL_Dst token (dst) where the register_type field is set to IL_REGTYPE_TEMP or IL_REGTYPE_ADDR. The modifier_present and relative_address fields must be set to 0.
Example: dcldef_z(*)_w(*) dst indicates that there is no default value for the z and w components of the dst register.
Related: None.
Related
None.
usage       20:16
usageIndex  28:21
reserved    30:29
pri_instruction 31 _modifier
yimport  3:2
zimport  5:4
wimport  7:6
centroid
Constant interpolation. Do not perform perspective divide during interpolation. Must be zero.
IL_Dst token (dst) where the register_type field is set to IL_REGTYPE_PINPUT. The modifier_present and relative_address fields must be set to 0.
IL_Dst token (dst), where register_type is set to IL_REGTYPE_PINPUT. The modifier_present and relative_address fields must be set to 0.
Related
None.
reserved              30:22
pri_modifier_present  31
Token 2: Primary vertex shader input register declaration modifier: the IL_PrimaryDCLV_Mod token, described below. The IL_PrimaryDCLV_Mod token is present only if the pri_modifier_present field is 1 in the previous IL_Opcode token.
Field Name  Bits  Description
ximport     1:0   Specifies whether the x component is enabled for the vertex buffer element specified by elem for this declaration. Also specifies the default value if the component is enabled. Can be any value of the enumerated type ILImportComponent. See Table 6.9 on page 6-4.
yimport     3:2   As ximport, for the y component.
zimport     5:4   As ximport, for the z component.
wimport     7:6   As ximport, for the w component.
reserved    31:8  Must be zero.
Token 3: IL_Dst token (dst) with register_type set to IL_REGTYPE_TEMP or IL_REGTYPE_VERTEX. The modifier_present and relative_address fields must be set to 0.
Related
None.
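The IL_PrimaryDCLV_Mod token holds four 2-bit ILImportComponent fields in its low byte, with bits 31:8 reserved. The sketch below packs and unpacks those fields using the bit positions from the table above; the numeric ILImportComponent values come from Table 6.9 and are not assumed, so the inputs are placeholders, and the helper names are hypothetical. The xexport through wexport fields of the vertex output modifier use the same bit positions.

```c
#include <stdint.h>

/* Packs ximport (1:0), yimport (3:2), zimport (5:4), wimport (7:6);
   bits 31:8 are reserved and stay zero. Each argument is a 2-bit
   ILImportComponent value (numeric values not assumed here). */
uint32_t pack_import_mod(uint32_t x, uint32_t y, uint32_t z, uint32_t w)
{
    return (x & 3u) | ((y & 3u) << 2) | ((z & 3u) << 4) | ((w & 3u) << 6);
}

/* Extracts one component's field: 0 = x, 1 = y, 2 = z, 3 = w. */
uint32_t import_field(uint32_t token, unsigned comp)
{
    return (token >> (2u * (comp & 3u))) & 3u;
}
```

For example, the placeholder values (1, 2, 3, 0) pack to 57 (0x39), and any value written through pack_import_mod leaves the reserved bits clear.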
usageIndex  28:21
reserved    30:29
Primary vertex shader output register declaration modifier: the IL_PrimaryDCLVOUT_Mod token1, described below.
Field Name  Bits  Description
xexport     1:0   Specifies whether the x component is enabled for the usage and usageIndex of this declaration. Also specifies the default value if the component is enabled. Can be any value of the enumerated type ILImportComponent. See Table 6.9 on page 6-4.
yexport     3:2   As xexport, for the y component.
zexport     5:4   As xexport, for the z component.
wexport     7:6   As xexport, for the w component.
reserved    31:8  Must be zero.
IL_Dst token (dst) where the register_type field is set to IL_REGTYPE_VOUTPUT. The modifier_present and relative_address fields must be set to 0.
1. The IL_PrimaryDCLVOUT_Mod token is present only if the pri_modifier_present field is 1 in the previous IL_Opcode token.
Token 2: IL_Dst token (dst) where the register_type field is set to IL_REGTYPE_CONST_INT or IL_REGTYPE_CONST_FLOAT. The modifier_present and relative_address fields must be set to zero.
Token 3: x-component, 32-bit integer or float.
Token 4: y-component, 32-bit integer or float.
Token 5: z-component, 32-bit integer or float.
Token 6: w-component, 32-bit float only (not used for a CONST_INT register).
Related: DEFB.
IL_Dst token (dst) where the register_type field is set to IL_REGTYPE_CONST_BOOL. The modifier_present and relative_address fields must be set to zero.
Boolean value, 32-bit unsigned integer:
0: FALSE
not 0: TRUE
Related
DEF.
Format Opcode
Initialize Vertex
Instructions: INITV
Syntax: initv dst, src
Description: Initializes a vertex shader input to the value of src. This instruction provides another way to initialize the vertex shader input registers; see DCLV (page 60) for normal operation. This instruction typically is used for higher-order surface shaders. Do not use this instruction in a pixel shader. The INITV and DCLV instructions are mutually exclusive for a given VERTEX register, and there can be at most one INITV per VERTEX register. The INITV instruction for a given VERTEX register must occur before that register is used as a source register. This instruction cannot be used within a flow-control block; thus, loop relative addressing cannot be used with this instruction. VERTEX registers cannot be used as a source until a DCLV or INITV instruction is used on them. Default values for VERTEX registers must be set explicitly in the shader before this instruction: the register used to initialize the VERTEX register must contain any default values required by the client. Valid for all GPUs.
Format: 0-input, 0-output, 1 additional token.
Tokens:
Token  Description
1      IL_Opcode token with code set to IL_OP_INITV.
2      IL_Dst token (dst) where register_type is set to IL_REGTYPE_VERTEX. The relative_address field must be set to 0.
3      IL_Dst_Mod token1.
4      IL_Src token (src).
5      IL_Src_Mod token2.
6      IL_Rel_Addr token3 where loop_relative is set to 0.
Related
1. IL_Dst_Mod token only present if modifier_present field is 1 in previous IL_Dst token. 2. IL_Src_Mod token only present if modifier_present field is 1 in previous IL_Src token. 3. IL_Rel_Addr token only present if relative_address field is 1 in the preceding IL_Src or IL_Dst token.
Description
Input/Output Instructions
Copyright 2011 by Advanced Micro Devices, Inc. All rights reserved.
discard_logicalnz src0.{x|y|z|w}  Discard results if any bit in src0.select_component is non-zero (src0.select_component != 0).
discard_logicalz  src0.{x|y|z|w}  Discard results if all bits in src0.select_component are zero (src0.select_component == 0).
IL_OP_DISCARD_LOGICALZ
Description
Conditionally flags the results of the pixel shader to be discarded when the end of the program is reached. This instruction flags the current pixel as terminated while execution continues, so that other pixels executing in parallel can obtain gradients if necessary. Although execution continues, all pixel shader output writes before or after the discard_* instruction are discarded. The discard_* instruction can be present inside any flow control construct. Multiple discard instructions can be present in a pixel shader; if any of them is executed, the pixel is terminated. Can be used only in a pixel shader. Valid for R600 GPUs and later.
Format: 1-input, 0-output.
Opcode:
Field Name            Bits   Description
code                  15:0   See Opcode part of Syntax, above.
control               29:16  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related
KILL.
Emit a Vertex
Instructions: EMIT
Syntax: emit
Description: Causes all declared o# registers to be read out of a geometry shader to generate a vertex. Multiple EMIT instructions are used to generate a primitive. Any number of EMIT instructions can appear in a geometry shader, including within flow control. This instruction can be used only in a geometry shader. Valid for R600 GPUs and later.
Format: 0-input, 0-output.
Opcode:
Field Name  Bits   Description
code        15:0   IL_OP_EMIT
control     31:16  Must be zero.
EVAL_SAMPLE_INDEX, EVAL_SNAPPED.
EVAL_CENTROID, EVAL_SNAPPED.
eval_snapped r0.xyzw, v[0].xyzw, l(0xf, 0x7, 0, 0); // -0.0625, 0.4375
r1.xy = l(-1.0, 7.0, 0, 0) / l(16.0, 16.0, 16.0, 16.0);
eval_snapped r0.xyzw, v[0].xyzw, r1.xy;
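The comment in the example pairs the integer offsets 0xf and 0x7 with -0.0625 and 0.4375. That is consistent with reading each offset as a 4-bit two's-complement value in units of 1/16 pixel, and the second form of the example computes the same values as -1/16 and 7/16. The sketch below decodes offsets under that interpretation. Both the 4-bit width and the helper name snapped_offset are assumptions inferred from the example, not a statement of the hardware encoding.

```c
#include <stdint.h>

/* Decodes the low 4 bits of an eval_snapped integer offset as a signed
   value in units of 1/16 pixel (interpretation inferred from the
   example: 0xf -> -0.0625, 0x7 -> 0.4375). */
float snapped_offset(uint32_t raw)
{
    int32_t v = (int32_t)(raw & 0xFu);
    if (v >= 8)
        v -= 16;               /* sign-extend the 4-bit field */
    return (float)v / 16.0f;
}
```

Under this reading the usable range is -8/16 (0x8) through 7/16 (0x7).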
EVAL_CENTROID, EVAL_SAMPLE_INDEX
Fence for Synchronization Work-Items, and/or LDS and/or GDS and/or Global Memory or UAV
Instructions: FENCE
Syntax: fence[_threads][_lds][_gds][_memory][_sr]
Description: The instruction must include at least one option; it is an error if none is specified. All selected options must complete before the work-item continues.
_threads - synchronizes work-items in a work-group so that all work-items must reach this point before any work-item can go further. The instruction cannot be used inside any control flow. It can be used only in a compute shader (except for bit 18; see below).
_lds - shared memory fence. It ensures that (1) no LDS read/write instructions can be reordered or moved across this fence instruction, and (2) all LDS write instructions are complete (the data has been written to LDS memory, not held in internal buffers) and are visible to other work-items.
_memory - global/scatter memory fence. It ensures that (1) no memory import/export instructions can be reordered or moved across this fence instruction, and (2) all memory export instructions are complete (the data has been written to physical memory, not held in the cache) and are visible to other work-items.
_sr - shared register write/read fence. No shared register writes/reads can be reordered or moved across this fence instruction. Also, all prior writes to shared registers done by this work-item become visible to all other work-items.
_mem_write_only - same as _memory, except that memory import (load) instructions can move across this fence instruction. If this happens, the load instruction must be disambiguated as not an alias of store instructions before the fence.
_mem_read_only - same as _memory, except that memory export (store) instructions can move across this fence instruction. If this happens, the store instruction must be disambiguated as not an alias of load instructions before the fence.
_gds - shared memory fence. It ensures that (1) no GDS read/write instructions can be reordered or moved across this fence instruction, and (2) all GDS write instructions are complete in the sense that the data has been written to GDS memory (not to internal buffers). Note that this option does not cause synchronization between members of a work-group; that requires adding the _threads option.
Pixel kernels can use fence instructions only for global memory. Compute kernels can use all options. In pixel kernels, use of discard instructions implies a fence_memory. Use of discard with a fence_threads instruction is undefined in IL, although specific implementations can select an interpretation. DX11 has sync_uglobal and sync_ulocal; both are mapped to fence_memory in IL. Valid for R7XX GPUs and later.
Format: 0-input, 0-output, no additional token.
Example:
lds_write_vec mem._y__, r0.yyyy
fence_threads_lds
lds_read_vec r1, r0.xy
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_FENCE
_threads              16     1 = this is a fence for work-item synchronization.
_lds                  17     1 = this is a fence for the LDS.
_memory               18     1 = this is a fence for global memory or UAV.
_sr                   19     1 = this is a fence for sr.
_mem_write_only       20     1 = this is a global memory or UAV write-only fence.
_mem_read_only        21     1 = this is a global memory or UAV read-only fence.
_gds                  22     1 = this is a fence for gds.
control               29:20  Must be zero.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related
None.
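Each FENCE option is a single bit of the opcode token, so an instruction such as fence_threads_lds sets bits 16 and 17 alongside the opcode code. The sketch below builds such a token from the bit positions in the table above. The numeric value of IL_OP_FENCE is not given in this section, so fence_token takes it as a parameter; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Option bits of the FENCE opcode token, taken from the field table. */
#define FENCE_THREADS        (1u << 16)
#define FENCE_LDS            (1u << 17)
#define FENCE_MEMORY         (1u << 18)
#define FENCE_SR             (1u << 19)
#define FENCE_MEM_WRITE_ONLY (1u << 20)
#define FENCE_MEM_READ_ONLY  (1u << 21)
#define FENCE_GDS            (1u << 22)
#define FENCE_ALL_OPTIONS    (FENCE_THREADS | FENCE_LDS | FENCE_MEMORY | FENCE_SR | \
                              FENCE_MEM_WRITE_ONLY | FENCE_MEM_READ_ONLY | FENCE_GDS)

/* Builds a FENCE opcode token. `opcode` stands in for the numeric value
   of IL_OP_FENCE (not given here). Returns 0 when no option is selected,
   because a fence without options is an error. */
uint32_t fence_token(uint32_t opcode, uint32_t options)
{
    options &= FENCE_ALL_OPTIONS;
    if (options == 0u)
        return 0u;
    return (opcode & 0xFFFFu) | options;
}
```

For instance, with a placeholder opcode of 0x1234, the fence_threads_lds example corresponds to the token 0x00031234.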
Description
IL_OP_FETCH4
Field Name  Bits   Description
resource    23:16  resource_id, 0 to 255.
sampler     27:24  sampler_id, 0 to 15.
arguments   28     indexed_args.
aoffimmi    29     0 = aoffimmi does not exist; 1 = aoffimmi exists.
30 31
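The same resource/sampler layout recurs in the FETCH4 and SAMPLE* opcode tokens: an 8-bit resource ID, a 4-bit sampler ID, and two flag bits. The sketch below packs these fields and rejects IDs outside the documented ranges. The opcode value is passed in as a placeholder because the numeric IL_OP_* codes are not listed here, and pack_tex_token is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packs the control fields of a FETCH4-style opcode token.
   `opcode` stands in for the IL_OP_* value (not given numerically here).
   Returns false, leaving *out untouched, if an ID is out of range. */
bool pack_tex_token(uint32_t opcode, uint32_t resource_id, uint32_t sampler_id,
                    bool indexed_args, bool has_aoffimmi, uint32_t *out)
{
    if (resource_id > 255u || sampler_id > 15u)
        return false;

    *out = (opcode & 0xFFFFu)               /* code,      bits 15:0  */
         | (resource_id << 16)              /* resource,  bits 23:16 */
         | (sampler_id << 24)               /* sampler,   bits 27:24 */
         | ((uint32_t)indexed_args << 28)   /* arguments, bit 28     */
         | ((uint32_t)has_aoffimmi << 29);  /* aoffimmi,  bit 29     */
    return true;                            /* bits 30 and 31 stay zero */
}
```

For example, resource 3, sampler 2, and an aoffimmi token with a placeholder opcode of 1 give 0x22030001. Resource 256 or sampler 16 is rejected.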
IL_OP_FETCH4_PO_C
Field Name  Bits   Description
resource    23:16  resource_id, 0 to 255.
sampler     27:24  sampler_id, 0 to 15.
arguments   28     indexed_args.
aoffimmi    29     0 = aoffimmi does not exist; 1 = aoffimmi exists.
IL_OP_FETCH4_C
Field Name  Bits   Description
resource    23:16  resource_id, 0 to 255.
sampler     27:24  sampler_id, 0 to 15.
arguments   28     indexed_args.
aoffimmi    29     0 = aoffimmi does not exist; 1 = aoffimmi exists.
IL_OP_FETCH4_PO
Field Name  Bits   Description
resource    23:16  resource_id, 0 to 255.
sampler     27:24  sampler_id, 0 to 15.
arguments   28     indexed_args.
aoffimmi    29     0 = aoffimmi does not exist; 1 = aoffimmi exists.
Pixel Kill
Instructions: KILL
Syntax: kill[_stage(n)_sample] src0
Description: Component-wise test to terminate current pixel shader execution and discard all results. If sample is set to 1, KILL does not use the actual value of src0 for the test. Instead, the shader performs a KILL based on a texture sample at the coordinate specified by src0 on the texture stage/unit specified by stage. It is an error to use this instruction in a vertex or geometry shader. To kill based on a subset of the four components, set the swizzle for each component not used in the test to IL_COMPSEL_1. Valid for all GPUs.
Operation:
VECTOR v;
if (sample == 1) {
    v = Sample(src0, stage);
} else {
    v = EvalSource(src0);
}
if ((v[0] < 0.0) || (v[1] < 0.0) || (v[2] < 0.0) || (v[3] < 0.0)) {
    Discard outputs;
    Terminate pixel shader;
}
Format: 1-input, 0-output.
Opcode:
Field Name  Bits   Description
code        15:0   IL_OP_KILL
control     29:16  Contains the following subfields:
stage       23:16  Texture stage or unit number, 0 to 255, if sample is 1; otherwise, must be zero.
sample      24     0: src0 is the test value. 1: The test value is sampled from texture stage or unit stage; src0 is the sample coordinate, and the resulting sample is used as the test value.
            31:25  Must be zero.
IL_Src token (src0).
IL_Src_Mod token: only present if the modifier_present field is 1 in the previous IL_Src token.
IL_Rel_Addr token: only present if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
Related: DISCARD_LOGICALNZ, DISCARD_LOGICALZ.
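The KILL test and the IL_COMPSEL_1 trick for testing only some components can be modeled directly. The sketch below follows the Operation pseudocode above. Selecting IL_COMPSEL_1 is emulated by substituting 1.0 for the lanes that are left out, and the test_mask parameter (bit i selects component i) is a hypothetical convenience, not an IL field. The comparisons are ordinary floating-point compares, as written in the pseudocode.

```c
#include <stdbool.h>

/* Component-wise KILL test from the Operation pseudocode: the pixel is
   killed if any component of v is less than 0.0. */
bool kill_test(const float v[4])
{
    return v[0] < 0.0f || v[1] < 0.0f || v[2] < 0.0f || v[3] < 0.0f;
}

/* Emulates routing IL_COMPSEL_1 to the components left out of the test:
   those lanes read as 1.0 and can never trigger the kill. Bit i of the
   hypothetical test_mask selects component i (x = bit 0). */
bool kill_test_masked(const float v[4], unsigned test_mask)
{
    float s[4];
    for (int i = 0; i < 4; ++i)
        s[i] = (test_mask & (1u << i)) ? v[i] : 1.0f;
    return kill_test(s);
}
```

Under these float compares, -0.0 is not less than 0.0, so a component holding -0.0 does not kill the pixel. This differs from the integer-based DISCARD_LOGICAL* tests.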
lds_read_vec r1, r0.xy
lds_read_vec_neighborExch r1, r0.xy
lds_read_vec_sharingMode(rel) r1, r0.xy
lds_read_vec_neighborExch_sharingMode(abs) r1, r0.xy
Related
LDS_WRITE_VEC
lds_write_vec mem._y__, r0.yyyy
lds_write_vec_lOffset(4) mem.xy__, r0.xyzw
lds_write_vec_sharingMode(rel) mem.__zw, r0.xyzw
lds_write_vec_lOffset(4)_sharingMode(abs) mem.x_z_w, r0.xyzw
Related
LDS_READ_VEC.
Description
Let N be the number of samples of the given resource.
if (N > 1) {
    For each i < N:  result[4*i+3 : 4*i] = fragment pointer for sample i; valid values are 0 to N - 1.
    For each i >= N: result[4*i+3 : 4*i] = 0.
} else {
    result is undefined.
}
Valid for Evergreen GPUs and later.
Format: 1-input, 1-output, no additional token.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_LOAD_FPTR
control               29:16  Contains the following subfields:
resource              23:16  resource_id, 0 to 255.
sampler               27:24  sampler_id, 0 to 15.
arguments             28     indexed_args.
aoffimmi              29     0 = aoffimmi does not exist; 1 = aoffimmi exists.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related: LOAD.
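The LOAD_FPTR result packs one 4-bit fragment pointer per sample, with sample i in bits 4*i+3 through 4*i. The sketch below decodes and builds such a value. It assumes that the result is a single 32-bit word, which leaves room for at most 8 samples. That width and the helper names fptr_for_sample and make_fptr_result are assumptions, not part of the specification text.

```c
#include <stdint.h>

/* Returns the 4-bit fragment pointer for sample i, i.e.
   result[4*i+3 : 4*i]. Assumes a single 32-bit result (max 8 samples). */
uint32_t fptr_for_sample(uint32_t result, unsigned i)
{
    if (i >= 8u)
        return 0u;
    return (result >> (4u * i)) & 0xFu;
}

/* Builds such a result from n per-sample pointers; nibbles for samples
   i >= n stay zero, as the operation requires. */
uint32_t make_fptr_result(const uint32_t *ptrs, unsigned n)
{
    uint32_t r = 0u;
    for (unsigned i = 0; i < n && i < 8u; ++i)
        r |= (ptrs[i] & 0xFu) << (4u * i);
    return r;
}
```

For a 4-sample resource whose samples point at fragments 0, 1, 1, and 3, the packed result is 0x3110, and every nibble above sample 3 reads as 0.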
MEMIMPORT.
21:16 31:22
IL_Dst token (dst) where register_type is set to IL_REGTYPE_TEMP.
IL_Dst_Mod token: present only if the modifier_present field is 1 in the previous IL_Dst token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
IL_Src token (src0).
IL_Src_Mod token: present only if the modifier_present field is 1 in the previous IL_Src token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
MEMEXPORT.
Description
Description
SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Description
SAMPLE, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Description
SAMPLE, SAMPLE_B, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Sample Data From Resource (Not Buffer) With Filter and Comparison
Instructions: SAMPLE_C_B
Syntax:
sample_c_b_resource(n)_sampler(m)[_coordtype(ILTexCoordMode)][_addroffimmi(u,v,w)] dst, src0, src1, src2
sample_c_b_ext_resource(n)_sampler(m)[_resourcetype(pixtexusage)][_coordtype(ILTexCoordMode)][_addroffimmi(u,v,w)] dst, src0, src1, src2, src3, src4
Description: Samples data from the specified element/texture using the filtering mode identified by the given sampler. The source data can come from any non-array resource type other than buffers. This instruction behaves exactly like the SAMPLE_C instruction, except that src2.x contains an LOD bias value. This instruction produces undefined results if used with texture arrays. This instruction has the same restrictions as SAMPLE_C_L. The first syntax is valid for R600 GPUs and later. The second is valid for Evergreen GPUs and later and also supports indexing.
Format: 3-input, 1-output (first syntax); 5-input, 1-output (second syntax).
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_SAMPLE_C_B
control               29:16  Contains the following subfields:
resource              23:16  resource_id, 0 to 255.
sampler               27:24  sampler_id, 0 to 15.
arguments             28     indexed_args.
aoffimmi              29     0 = aoffimmi does not exist; 1 = aoffimmi exists.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related: SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Description
SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Description
SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Sample Data from Resource with Filter and Comparison Level Zero
Instructions: SAMPLE_C_LZ
Syntax:
sample_c_lz_resource(n)_sampler(m)[_coordtype(ILTexCoordMode)][_addroffimmi(u,v,w)] dst, src0, src1
sample_c_lz_ext_resource(n)_sampler(m)[_resourcetype(pixtexusage)][_coordtype(ILTexCoordMode)][_addroffimmi(u,v,w)] dst, src0, src1, src2, src3
Description: Performs a comparison filter. SAMPLE_C mainly provides a building block for Percentage-Closer Depth filtering; the 'c' in SAMPLE_C stands for comparison. src0 is the index. src1.x contains the reference value. SAMPLE_C_LZ is the same as SAMPLE_C, except that the LOD is 0 and derivatives are ignored (as if they were 0); LZ stands for level-zero. Because derivatives are ignored, this instruction is available in vertex and geometry shaders. It can also be used inside control flow. The first syntax is valid for R600 GPUs and later. The second is valid for Evergreen GPUs and later and also supports indexing.
Format: 2-input, 1-output or 4-input, 1-output.
Opcode:
Field Name            Bits   Description
code                  15:0   IL_OP_SAMPLE_C_LZ
control               29:16  Contains the following subfields:
resource              23:16  resource_id, 0 to 255.
sampler               27:24  sampler_id, 0 to 15.
arguments             28     indexed_args.
aoffimmi              29     0 = aoffimmi does not exist; 1 = aoffimmi exists.
sec_modifier_present  30     Must be zero.
pri_modifier_present  31     Must be zero.
Related: SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_G, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Related
SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_L, SAMPLEINFO, SAMPLEPOS.
Sample Data from Specified Memory Using Given Sampler. Source Data can come from any Resource Type
Instructions
SAMPLE_L
Syntax
sample_l_resource(n)_sampler(m)[_coordtype(ILTexCoordMode)] dst, src0, src1
sample_l_ext_resource(n)_sampler(m)[_resourcetype(pixtexusage)][_coordtype(ILTexCoordMode)][_addroffimmi(u,v,w)] dst, src0, src1, src2, src3
Description
This instruction is identical to the SAMPLE instruction (page 7-90), except that the level of detail (LOD) is provided directly by the application as a scalar value, representing no anisotropy. Unlike SAMPLE, which is limited to the pixel shader, this instruction is available in all programmable shader stages.
It samples the texture using src1.x to choose the LOD. If the LOD value is negative, the 0th (biggest) map is chosen with MAGFILTER applied. Because src1.x is a floating-point value, its fractional part is used to interpolate between two mip levels (if MIPFILTER is linear). This instruction ignores address gradients (filtering is isotropic). See the description of the SAMPLE instruction for operational details other than the LOD calculation.
When used in the pixel kernel, sample_l makes the choice of LOD per-pixel, with no effect from neighboring pixels.
The first syntax is valid for R600 GPUs and later. The second is valid for Evergreen GPUs and later, and also supports indexing.
Format
2-input, 1-output (first syntax) or 4-input, 1-output (second syntax).
Opcode
Field Name              Bits    Description
code                    15:0    IL_OP_SAMPLE_L
control                 29:16
  resource              23:16   resource_id, 0 to 255.
  sampler               27:24   sampler_id, 0 to 15.
  arguments             28      indexed_args.
  aoffimmi              29      0 = aoffimmi does not exist. 1 = aoffimmi exists.
sec_modifier_present    30      Must be zero.
pri_modifier_present    31      Must be zero.
Related
SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLEINFO, SAMPLEPOS.
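The LOD rules for SAMPLE_L (a negative LOD selects level 0 with MAGFILTER; the fractional part blends two levels when MIPFILTER is linear) can be sketched in C as follows. The `MipSelection` type, the `num_levels` parameter, and the clamp at the last mip level are illustrative assumptions, not part of the IL.

```c
/* Illustrative result of choosing mip levels for a sample_l fetch. */
typedef struct {
    int   level0;   /* finer mip level (0 = biggest map) */
    int   level1;   /* coarser mip level blended with level0 */
    float weight1;  /* weight of level1; level0 gets 1 - weight1 */
    int   magnify;  /* 1 when a negative LOD selects level 0 with MAGFILTER */
} MipSelection;

/* Sketch for a linear MIPFILTER. Assumptions: the resource has 'num_levels'
 * mip levels, and an LOD past the last level is clamped to that level. */
static MipSelection select_mips_linear(float lod, int num_levels)
{
    MipSelection s = {0, 0, 0.0f, 0};
    int last = num_levels - 1;

    if (lod < 0.0f) {                  /* negative LOD: level 0, MAGFILTER */
        s.magnify = 1;
        return s;
    }
    if (lod >= (float)last) {          /* assumed clamp to the smallest map */
        s.level0 = s.level1 = last;
        return s;
    }
    s.level0  = (int)lod;              /* lod >= 0, so truncation == floor */
    s.level1  = s.level0 + 1;
    s.weight1 = lod - (float)s.level0; /* fractional part blends the levels */
    return s;
}
```

For example, an LOD of 1.25 blends level 1 (weight 0.75) with level 2 (weight 0.25).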
Related
SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEPOS.
Related
SAMPLE, SAMPLE_B, SAMPLE_C, SAMPLE_C_B, SAMPLE_C_G, SAMPLE_C_L, SAMPLE_C_LZ, SAMPLE_G, SAMPLE_L, SAMPLEINFO.
Related
None.
Texture LOAD
Instructions
TEXLD
Syntax
texld_stage(n)_centroid_shadowmode(op)_mag(op)_min(op)_volmag(op)_volmin(op)_mip(op)_aniso(op)_lodbias(f)_xoffset(x)_yoffset(y)_zoffset(z) dst, src0
Description
Samples a texture specified by stage at the coordinates specified by the src0 register. This instruction cannot be used on a stage that has not been declared with a DCLPT instruction, nor on a stage set to IL_USAGE_PIXTEX_2DMSAA by a DCLPT instruction. The value of stage corresponds to the stage/unit.
The coordinates can be projected using the divComp source modifier on src0. If the divComp source modifier is used and the component to divide by is negative, the result of this instruction is undefined. For example, if IL_DIVCOMP_Y is used and the second component of src0 is negative, the result is undefined.
If centroid is 1, sampling is based on the pixel centroid rather than the pixel center.
The lodbias value specifies a constant value that biases the mipmap from which this instruction loads. This value is added to the bias value set in the state (the value set through AS_TEX_LODBIAS_N(stage)).
The following quantities determine the mipmap level(s) from which to sample:
- The computed LOD is the mipmap level of detail determined from the ratio of texels in the base texture to the pixel.
- The instruction LOD is the value specified by the lodbias parameter in the IL_PrimaryTEXLD_Mod token.
- minLOD is the state-based floating-point minimum mipmap LOD value.
- maxLOD is the state-based floating-point maximum mipmap LOD value.
- minLevel is the smallest mipmap level specified by state to use.
- maxLevel is the largest mipmap level specified by state to use.
The mipmap level(s) to sample from are determined by:
1. Adding the state-based LOD bias to the computed LOD.
2. If LOD clamping is enabled in state, clamping the resulting value to minLOD and maxLOD.
3. Adding the instruction LOD.
4. Clamping the resulting value to minLevel and maxLevel.
The following pseudocode demonstrates the algorithm used to determine the mipmap level(s) to sample from:
(initial bias LOD)   = (state based bias) + (computed LOD)
(clamped LOD)        = min(max((initial bias LOD), (minLOD)), (maxLOD))
(secondary bias LOD) = (clamped LOD) + (instruction LOD)
(final LOD)          = min(max((secondary bias LOD), (minLevel)), (maxLevel))
The mag, min, volmag, volmin, mip, and aniso parameters specify whether (and how) to override filter settings. If the IL_PrimaryTEXLD_Mod token is not present, the filters set through external state are used.
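A minimal C sketch of the four-step TEXLD LOD computation, assuming every quantity is a plain float and that LOD clamping is toggled by a flag; the function and parameter names are hypothetical, not IL API.

```c
/* Limits x to the range [lo, hi]. */
static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Sketch of the TEXLD mip-level selection: state bias, optional
 * minLOD/maxLOD clamp, instruction lodbias, then minLevel/maxLevel clamp. */
static float texld_final_lod(float computed_lod, float state_bias,
                             int lod_clamp_enabled, float min_lod, float max_lod,
                             float instruction_lod, float min_level, float max_level)
{
    float initial   = state_bias + computed_lod;            /* step 1 */
    float clamped   = lod_clamp_enabled
                        ? clampf(initial, min_lod, max_lod)  /* step 2 */
                        : initial;
    float secondary = clamped + instruction_lod;             /* step 3 */
    return clampf(secondary, min_level, max_level);          /* step 4 */
}
```

Because the instruction LOD is added after the minLOD/maxLOD clamp, it can push the result past maxLOD; only the final minLevel/maxLevel clamp bounds it.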
Must be zero.
Any value of the enumerated type ILTexShadowMode. See Table 6.28 on page 6-21.
Must be zero.
sec_modifier_present    30    0: No secondary modifier token is present.
                              1: An IL_SecondaryTEXLD_Mod token immediately follows this token, or follows the IL_PrimaryTEXLD_Mod token if bit 31 is set.
pri_modifier_present    31    0: No primary modifier token is present.
                              1: An IL_PrimaryTEXLD_Mod token immediately follows this token.
Primary Texture Load Instruction Modifier token (present only if the pri_modifier_present field is 1 in the previous IL_Opcode token).
Field Name  Bits    Description
mag         2:0     Specifies how to filter texture values in the S and T directions when a single texel maps to multiple pixels. Can be any value of the enumerated type ILTexFilterMode. See Table 6.27 on page 6-20.
min         5:3     Specifies how to filter texture values in the S and T directions when multiple texels map to a single pixel. Can be any value of the enumerated type ILTexFilterMode.
volmag      8:6     Specifies how to filter texture values in the R direction when the pixel maps to an area less than one texel. Can be any value of the enumerated type ILTexFilterMode.
mip         14:12
aniso       17:15
lodbias     24:18
reserved 3  31:25
Secondary Texture Load Instruction Modifier token (present only if the sec_modifier_present field is 1 in the previous IL_Opcode token).
Field Name  Bits    Description
xoffset     7:0     Unnormalized (texel space) value added to the X texture coordinate before sampling. Signed fixed point with one bit of fraction (7.1), allowing the range [-64, 63.5].
yoffset     15:8    Unnormalized (texel space) value added to the Y texture coordinate before sampling. Signed fixed point with one bit of fraction (7.1), allowing the range [-64, 63.5].
zoffset     23:16   Unnormalized (texel space) value added to the Z texture coordinate before sampling. Signed fixed point with one bit of fraction (7.1), allowing the range [-64, 63.5].
reserved    31:24   Must be zero.
IL_Dst_Mod token: present only if the modifier_present field is 1 in the previous IL_Dst token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
IL_Src token (src0).
IL_Src_Mod token: present only if the modifier_present field is 1 in the previous IL_Src token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
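To make the 7.1 offset encoding concrete, the sketch below extracts one 8-bit field from the secondary modifier token and converts it to texels. It assumes the signed value is stored in two's complement (the table says only "signed fixed point"); the function name is hypothetical.

```c
#include <stdint.h>

/* Decodes one 8-bit offset field of the secondary modifier token into texels.
 * shift = 0 (xoffset), 8 (yoffset), or 16 (zoffset).
 * Assumption: the signed 7.1 value is stored in two's complement. */
static float decode_offset_7_1(uint32_t token, unsigned shift)
{
    int raw = (int)((token >> shift) & 0xFFu);  /* 0..255 */
    if (raw >= 128)
        raw -= 256;                             /* sign-extend to -128..127 */
    return (float)raw / 2.0f;                   /* one fractional bit */
}
```

The extreme raw values -128 and 127 map to -64.0 and 63.5, which matches the documented range.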
Description
The following pseudocode demonstrates the algorithm used to determine the mipmap level(s) to sample from:
(initial bias LOD)   = (fourth component of src1) + (state based bias) + (computed LOD)
(clamped LOD)        = min(max((initial bias LOD), (minLOD)), (maxLOD))
(secondary bias LOD) = (clamped LOD) + (instruction LOD)
(final LOD)          = min(max((secondary bias LOD), (minLevel)), (maxLevel))
The following pseudocode demonstrates the algorithm used to determine the mipmap level(s) to sample from:
(initial bias LOD)   = (fourth component of src2) + (state based bias)
(clamped LOD)        = min(max((initial bias LOD), (minLOD)), (maxLOD))
(secondary bias LOD) = (clamped LOD) + (instruction LOD)
(final LOD)          = min(max((secondary bias LOD), (minLevel)), (maxLevel))
The mag, min, volmag, volmin, mip, and aniso parameters specify whether (and how) to override external filter settings. If the IL_PrimaryTEXLD_Mod token is not present, the external filter settings are used.
The values of xoffset, yoffset, and zoffset are added to the unnormalized values of the first, second, and third components of src0, respectively, within the sampled mipmap. These values are applied whether or not normalized texture coordinates are used. The clamping policy is obeyed when these offsets cause sampling outside the texture's dimensions. If the IL_SecondaryTEXLD_Mod token is not present, xoffset, yoffset, and zoffset default to 0.0.
The shadowmode parameter specifies whether this instruction performs a shadow map load (comparing the texture value to the z-component of src0); it indicates whether a shadow load never occurs or always occurs. See the shadow texture load appendix for the texture load algorithm. If a shadow load occurs with this instruction, the mag, min, volmag, volmin, aniso, xoffset, yoffset, and zoffset parameters are ignored.
Valid for all GPUs.
Format
Token 1
Field Name            Bits    Description
code                  15:0    IL_OP_TEXLDB
stage                 23:16   Stage or unit number.
centroid              24      0: Sample on pixel center.
                              1: Sample on pixel centroid.
absolute              25      0: The fourth component of src2 is a relative mipmap.
                              1: The fourth component of src2 is an absolute mipmap.
                      27:26   Any value of the enumerated type ILTexShadowMode. See Table 6.28 on page 6-21.
                      29:28   Must be zero.
sec_modifier_present  30      0: No secondary modifier token is present.
                              1: An IL_SecondaryTEXLD_Mod token immediately follows the IL_OP_TEXLDB token, or follows the IL_PrimaryTEXLD_Mod token if bit 31 is set.
pri_modifier_present  31      0: No primary modifier token is present.
                              1: An IL_PrimaryTEXLD_Mod token immediately follows the IL_OP_TEXLDB token.
min          5:3
volmag       8:6
volmin       11:9
mip          14:12
aniso        17:15
lodbias      24:18
qualitybias  25
Secondary Texture Load Instruction Modifier token (present only if the sec_modifier_present field is 1 in the previous IL_Opcode token).
Field Name  Bits    Description
xoffset     7:0     Unnormalized (texel space) value added to the X texture coordinate before sampling. Signed fixed point with 1 bit of fraction (7.1), allowing the range [-64, 63.5].
yoffset     15:8    Unnormalized (texel space) value added to the Y texture coordinate before sampling. Signed fixed point with 1 bit of fraction (7.1), allowing the range [-64, 63.5].
zoffset     23:16   Unnormalized (texel space) value added to the Z texture coordinate before sampling. Signed fixed point with 1 bit of fraction (7.1), allowing the range [-64, 63.5].
reserved    31:24   Must be zero.
Description
The following pseudocode demonstrates the algorithm used to determine the mipmap level(s) to sample from:
(initial bias LOD)   = (state based bias) + (computed LOD)
(clamped LOD)        = min(max((initial bias LOD), (minLOD)), (maxLOD))
(secondary bias LOD) = (clamped LOD) + (instruction LOD)
(final LOD)          = min(max((secondary bias LOD), (minLevel)), (maxLevel))
The mag, min, volmag, volmin, mip, and aniso parameters specify whether (and how) to override filter settings. If the IL_PrimaryTEXLD_Mod token is not present, the external filter settings are used.
Must be zero.
Any value of the enumerated type ILTexShadowMode. See Table 6.28 on page 6-21.
Must be zero.
sec_modifier_present  30    0: No secondary modifier token is present.
                            1: An IL_SecondaryTEXLD_Mod token is present.
pri_modifier_present  31    0: No primary modifier token is present.
                            1: An IL_PrimaryTEXLD_Mod token is present.
Primary Texture Load Instruction Modifier token (present only if the pri_modifier_present field is 1 in the previous IL_Opcode token).
Field Name  Bits    Description
mag         2:0     Specifies how to filter texture values in the S and T directions when the pixel maps to an area less than one texel. Can be any value of the enumerated type ILTexFilterMode.
min         5:3     Specifies how to filter texture values in the S and T directions when the pixel maps to an area greater than one texel. Can be any value of the enumerated type ILTexFilterMode.
volmag      8:6     Specifies how to filter texture values in the R direction when the pixel maps to an area less than one texel. Can be any value of the enumerated type ILTexFilterMode.
volmin      11:9    Specifies how to filter texture values in the R direction when the pixel maps to an area greater than one texel. Can be any value of the enumerated type ILTexFilterMode.
mip         14:12   Specifies how to filter values of multiple mipmaps when the pixel maps to an area greater than one texel of the base map. Can be any value of the enumerated type ILMipFilterMode.
lodbias     24:18
reserved 3  31:25
Secondary Texture Load Instruction Modifier token (present only if the sec_modifier_present field is 1 in the previous IL_Opcode token).
Field Name  Bits    Description
xoffset     7:0     Unnormalized (texel space) value added to the X texture coordinate before sampling. Signed fixed point with 1 bit of fraction (7.1), allowing the range [-64, 63.5].
yoffset     15:8    Unnormalized (texel space) value added to the Y texture coordinate before sampling. Signed fixed point with 1 bit of fraction (7.1), allowing the range [-64, 63.5].
zoffset     23:16   Unnormalized (texel space) value added to the Z texture coordinate before sampling. Signed fixed point with 1 bit of fraction (7.1), allowing the range [-64, 63.5].
reserved    31:24   Must be zero.
IL_Dst_Mod token: present only if the modifier_present field is 1 in the previous IL_Dst token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
IL_Src token (src1).
IL_Src_Mod token: present only if the modifier_present field is 1 in the previous IL_Src token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
IL_Src token (src2).
IL_Src_Mod token: present only if the modifier_present field is 1 in the previous IL_Src token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
IL_Src token (src3).
IL_Src_Mod token: present only if the modifier_present field is 1 in the previous IL_Src token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
sec_modifier_present 30
pri_modifier_present 31
0:
Xoffset
15:8
2816 31:29
IL_Dst_Mod token: present only if the modifier_present field is 1 in the previous IL_Dst token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
IL_Src token (src0).
IL_Src_Mod token: present only if the modifier_present field is 1 in the previous IL_Src token.
IL_Rel_Addr token: present only if the relative_address field is 1 in the preceding IL_Src or IL_Dst token.
<
Description
Component-wise compares two vectors using an integer comparison. If src0.{xy|zw} and the corresponding component in src1 satisfy the comparison condition, the corresponding component of dst is set to TRUE; otherwise, it is set to FALSE. Primary and secondary opcode modifiers are not permitted, and no additional control options are supported. Thus, the control field, pri_modifier_present, and sec_modifier_present fields must be zero. All these instructions are valid for Evergreen GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name  Bits    Description
code        15:0    See Syntax, above.
control     31:16   Must be zero.
Related
None.
Opcode          Syntax                   Function
IL_OP_I64_MAX   i64max dst, src0, src1   dst = src0 > src1 ? src0 : src1
IL_OP_I64_MIN   i64min dst, src0, src1   dst = src0 < src1 ? src0 : src1
Component-wise integer maximum (I64_MAX) and minimum (I64_MIN). Compares each component of src0 with the corresponding component of src1. If the comparison evaluates true, src0 is returned in the corresponding component of dst; otherwise, src1 is returned. For the MAX and MIN instructions, if src0.{xy|zw} or src1.{xy|zw} is NaN, then the other component, src1.{xy|zw} or src0.{xy|zw}, respectively, is returned. Denorms are flushed before the comparison is performed. If a flushed denorm is the maximum value (for the MAX instruction) or minimum value (for the MIN instruction), then the flushed denorm is returned. The MAX instruction uses a greater-than-or-equal-to comparison. Thus, if min(src0, src1) = src0, then max(src0, src1) = src1, including cases with +0 and -0, such as when denorms are flushed to sign-preserving zero. If the _ieee flag is included, then MIN and MAX follow IEEE-754r rules for minnum and maxnum (except that denorms are flushed before the comparison is made). Also, MIN returns -0, and MAX returns +0, for comparisons between -0 and +0. Both instructions are valid for Evergreen GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_I64_MAX, IL_OP_I64_MIN
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
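The sketch below mirrors the Function column for I64_MAX and I64_MIN on one 64-bit lane. It assumes a 64-bit operand is stored in a component pair with the low dword in the first component (x or z) and the high dword in the second (y or w); that layout is an assumption here, not something this section states. The function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: a 64-bit operand occupies a component pair, low dword in the
 * first component (x or z), high dword in the second (y or w). */
static int64_t pair_to_i64(uint32_t lo, uint32_t hi)
{
    uint64_t u = ((uint64_t)hi << 32) | lo;
    int64_t s;
    memcpy(&s, &u, sizeof s);   /* reinterpret the bits as signed */
    return s;
}

/* Per the Function column: dst = src0 > src1 ? src0 : src1 */
static int64_t i64max(int64_t src0, int64_t src1) { return src0 > src1 ? src0 : src1; }

/* Per the Function column: dst = src0 < src1 ? src0 : src1 */
static int64_t i64min(int64_t src0, int64_t src1) { return src0 < src1 ? src0 : src1; }
```

Because the comparison is signed, the pair (0xFFFFFFFF, 0xFFFFFFFF) reads as -1 and loses to 1 under i64max.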
Integer AND
Instructions
IAND
Syntax
iand dst, src0, src1
Description
Component-wise logical AND of each pair of 32-bit values from src0 and src1. The 32-bit result is placed in dst.
Valid for R600 GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name  Bits    Description
code        15:0    IL_OP_I_AND
control     31:16   Must be zero.
Related
None.
Integer Subtract-Borrow
Instructions
IBORROW
Syntax
iborrow dst, src0, src1
Description
Forms the borrow bit after subtracting two integer vectors. First, src1 is subtracted from src0. Then, dst is set to 1 if a borrow is produced; otherwise, it is set to 0. Operates per component. The neg modifier can be used on any of the inputs to this instruction.
Valid for Evergreen GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name  Bits    Description
code        15:0    IL_OP_I_BORROW
control     31:16   Must be zero.
Related
None.
Integer Add-Carry
Instructions
ICARRY
Syntax
icarry dst, src0, src1
Description
Forms the carry bit after adding two integer vectors:
if (src0 + src1 > 0xFFFFFFFF) { dst = 1; } else { dst = 0; }
Operates per component. The neg modifier can be used on any of the inputs to this instruction.
Valid for Evergreen GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name  Bits    Description
code        15:0    IL_OP_I_CARRY
control     31:16   Must be zero.
Related
None.
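Both IBORROW and ICARRY reduce to simple unsigned tests, sketched below in C assuming 32-bit unsigned components. The `add64_via_carry` helper is a hypothetical usage example showing why the carry flag is useful: it propagates the carry from a low dword into a high dword.

```c
#include <stdint.h>

/* Per-component borrow of src0 - src1: 1 when the unsigned subtraction wraps. */
static uint32_t iborrow(uint32_t src0, uint32_t src1)
{
    return src0 < src1 ? 1u : 0u;
}

/* Per-component carry of src0 + src1: 1 when the true sum exceeds 0xFFFFFFFF. */
static uint32_t icarry(uint32_t src0, uint32_t src1)
{
    return ((uint64_t)src0 + src1 > 0xFFFFFFFFull) ? 1u : 0u;
}

/* Hypothetical example: 64-bit addition built from 32-bit halves. */
static uint64_t add64_via_carry(uint32_t a_lo, uint32_t a_hi,
                                uint32_t b_lo, uint32_t b_hi)
{
    uint32_t lo = a_lo + b_lo;                       /* wraps modulo 2^32 */
    uint32_t hi = a_hi + b_hi + icarry(a_lo, b_lo);  /* add the carry-out */
    return ((uint64_t)hi << 32) | lo;
}
```

Widening to 64 bits in icarry avoids relying on the wrapped 32-bit sum to detect overflow.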
Integer Compare
Instructions
IEQ, IGE, ILT, INE
Function  Opcode       Syntax               Description
==        IL_OP_I_EQ   ieq dst, src0, src1  Compares whether the integer vector in src0 is equal to the one in src1.
>=        IL_OP_I_GE   ige dst, src0, src1  Compares whether the integer vector in src0 is greater than or equal to the one in src1.
<         IL_OP_I_LT   ilt dst, src0, src1  Compares whether the integer vector in src0 is less than the one in src1.
!=        IL_OP_I_NE   ine dst, src0, src1  Compares whether the integer vectors in src0 and src1 are not equal.
For each instruction, if the comparison is TRUE, 0xFFFFFFFF is returned for that component; otherwise, 0x00000000 is returned.
Description
Component-wise compares two vectors using an integer comparison. If src0.{x|y|z|w} and the corresponding component in src1 satisfy the comparison condition, the corresponding component of dst is set to TRUE; otherwise, it is set to FALSE. Primary and secondary opcode modifiers are not permitted, and no additional control options are supported. Thus, the control field, pri_modifier_present, and sec_modifier_present fields must be zero. All these instructions are valid for R600 GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name  Bits    Description
code        15:0    See Syntax, above.
control     31:16   Must be zero.
Related
None.
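Because TRUE is all ones rather than 1, a comparison result works directly as a bit mask. The C sketch below models the per-component behavior, assuming signed 32-bit operands (the unsigned forms are UGE and ULT); `select_bits` is a hypothetical helper showing the mask in use.

```c
#include <stdint.h>

/* TRUE -> 0xFFFFFFFF, FALSE -> 0x00000000, as the table specifies. */
static uint32_t cmp_mask(int cond) { return cond ? 0xFFFFFFFFu : 0x00000000u; }

static uint32_t ieq(int32_t a, int32_t b) { return cmp_mask(a == b); }
static uint32_t ige(int32_t a, int32_t b) { return cmp_mask(a >= b); }
static uint32_t ilt(int32_t a, int32_t b) { return cmp_mask(a < b); }
static uint32_t ine(int32_t a, int32_t b) { return cmp_mask(a != b); }

/* Hypothetical helper: the all-ones mask picks between two values
 * with bitwise AND/OR only, without a branch. */
static uint32_t select_bits(uint32_t m, uint32_t if_true, uint32_t if_false)
{
    return (if_true & m) | (if_false & ~m);
}
```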
Component-wise integer maximum (I_MAX) and minimum (I_MIN). Compares each component of src0 with the corresponding component of src1. If the comparison evaluates true, src0 is returned in the corresponding component of dst; otherwise, src1 is returned. For the MAX and MIN instructions, if src0.{x|y|z|w} or src1.{x|y|z|w} is NaN, then the other component, src1.{x|y|z|w} or src0.{x|y|z|w}, respectively, is returned. Denorms are flushed before the comparison is performed. If a flushed denorm is the maximum value (for the MAX instruction) or minimum value (for the MIN instruction), then the flushed denorm is returned. The MAX instruction uses a greater-than-or-equal-to comparison. Thus, if min(src0, src1) = src0, then max(src0, src1) = src1, including cases with +0 and -0, such as when denorms are flushed to sign preserve zero. If the _ieee flag is included, then MIN and MAX follow IEEE-754r rules for minnum and maxnum (except denorms are flushed before the comparison is made). Also, MIN returns -0, and MAX returns +0, for comparisons between -0 and +0. Both instructions are valid for R600 GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_I_MAX, IL_OP_I_MIN
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
Integer Negate
Instructions
INOT
Syntax
inot dst, src0
Description
Performs a bitwise one's complement on each component of src0. The 32-bit results are placed in the corresponding components of dst.
Valid for R600 GPUs and later.
Format
1-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_I_NOT
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
IXOR
Description
Performs a bit-wise logical operation of each component of src0 with the corresponding component of src1. The 32-bit results are placed in the corresponding components of dst. These instructions are valid for R600 GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    See Syntax, above.
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
IL_OP_U64_MAX   u64max dst, src0, src1
IL_OP_U64_MIN   u64min dst, src0, src1
Description
Compares each component of src0 with the corresponding component of src1. If the comparison is TRUE, src0 is returned in the corresponding component of dst; otherwise, src1 is returned. For the MAX and MIN instructions, if src0.{xy|zw} or src1.{xy|zw} is NaN, then the other component, src1.{xy|zw} or src0.{xy|zw}, respectively, is returned. Denorms are flushed before the comparison is performed. If a flushed denorm is the maximum value (for the MAX instruction) or minimum value (for the MIN instruction), then the flushed denorm is returned. The MAX instruction uses a greater-than-or-equal-to comparison. Thus, if min(src0, src1) = src0, then max(src0, src1) = src1, including cases with +0 and -0 such as when denorms are flushed to sign preserve zero. If the _ieee flag is included, MIN and MAX follow IEEE-754r rules for minnum and maxnum (except denorms are flushed before the comparison is made). In addition, MIN returns -0 and MAX returns +0 for comparisons between -0 and +0. Both instructions are valid for Evergreen GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    See Syntax, above.
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
IL_OP_U64_GE   u64ge dst, src0, src1
IL_OP_U64_LT   u64lt dst, src0, src1
Component-wise compares two vectors using an unsigned integer comparison. For U64_GE: (src0 >= src1); for U64_LT: (src0 < src1). If src0.{xy|zw} and the corresponding component in src1 satisfy the comparison condition, the corresponding component of dst is set to TRUE (0xFFFFFFFF); otherwise, it is set to FALSE (0x00000000). Primary and secondary opcode modifiers are not permitted, and no additional control options are supported. Thus, the control, pri_modifier_present, and sec_modifier_present fields must be zero. These instructions are valid for Evergreen GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_U64_GE, IL_OP_U64_LT
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
Component-wise compares two vectors using an unsigned integer comparison. For UGE: (src0 >= src1); for ULT: (src0 < src1). If src0.{x|y|z|w} and the corresponding component in src1 satisfy the comparison condition, the corresponding component of dst is set to TRUE (0xFFFFFFFF); otherwise, it is set to FALSE (0x00000000). Primary and secondary opcode modifiers are not permitted, and no additional control options are supported. Thus, the control, pri_modifier_present, and sec_modifier_present fields must be zero. These instructions are valid for R600 GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_U_GE, IL_OP_U_LT
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
IL_OP_U_MIN
Description
Compares each component of src0 with the corresponding component of src1. If the comparison is TRUE, src0 is returned in the corresponding component of dst; otherwise, src1 is returned. For the MAX and MIN instructions, if src0.{x|y|z|w} or src1.{x|y|z|w} is NaN, then the other component, src1.{x|y|z|w} or src0.{x|y|z|w}, respectively, is returned. Denorms are flushed before the comparison is performed. If a flushed denorm is the maximum value (for the MAX instruction) or minimum value (for the MIN instruction), then the flushed denorm is returned. The MAX instruction uses a greater-than-or-equal-to comparison. Thus, if min(src0, src1) = src0, then max(src0, src1) = src1, including cases with +0 and -0 such as when denorms are flushed to sign preserve zero. If the _ieee flag is included, MIN and MAX follow IEEE-754r rules for minnum and maxnum (except denorms are flushed before the comparison is made). In addition, MIN returns -0 and MAX returns +0 for comparisons between -0 and +0. Both instructions are valid for R600 GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    See Syntax, above.
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
Description
Component-wise multiply of 32-bit unsigned operands src0 and src1. The upper 32 bits of the 64-bit result (per component) are placed in the corresponding component of dst.
Valid for R600 GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_U_MUL_HIGH
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
UMUL.
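In C, the per-component result of IL_OP_U_MUL_HIGH amounts to widening both operands to 64 bits, multiplying, and keeping the upper half; the function name below is illustrative.

```c
#include <stdint.h>

/* Upper 32 bits of the full 64-bit product of two unsigned 32-bit values. */
static uint32_t umul_high(uint32_t src0, uint32_t src1)
{
    return (uint32_t)(((uint64_t)src0 * src1) >> 32);
}
```

Paired with UMUL, which yields the low 32 bits, this recovers the complete 64-bit product.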
Bit Operations
UBIT_INSERT, UBIT_REVERSE.
Bit Insert
Instructions
UBIT_INSERT
Syntax
ubit_insert dst, src0, src1, src2, src3
Description
Replaces a range of bits. Component-wise, the five lsbs of src0 specify a bitfield width (0-31), the five lsbs of src1 specify the bit offset from bit 0, src2 supplies the replacement bits, and src3 specifies the Dword in which the bits are replaced:
dst = src3
if (width != 0) {
    bitmask = (((1 << width) - 1) << offset) & 0xFFFFFFFF
    dst = ((src2 << offset) & bitmask) | (src3 & ~bitmask)
}
Valid for Evergreen GPUs and later.
Format
4-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_U_BIT_INSERT
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
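A C transcription of the UBIT_INSERT pseudocode for one component. Building the mask in 64 bits is a choice made here so that `((1 << width) - 1) << offset` cannot overflow before the `& 0xFFFFFFFF` truncation; the function name is hypothetical.

```c
#include <stdint.h>

/* One component of UBIT_INSERT: insert 'width' low bits of src2 into src3
 * at bit 'offset'. Only the five lsbs of src0 and src1 are used. */
static uint32_t ubit_insert(uint32_t src0, uint32_t src1,
                            uint32_t src2, uint32_t src3)
{
    uint32_t width  = src0 & 0x1Fu;
    uint32_t offset = src1 & 0x1Fu;

    if (width == 0)
        return src3;                          /* dst = src3 */

    uint64_t mask64  = ((1ull << width) - 1ull) << offset;
    uint32_t bitmask = (uint32_t)(mask64 & 0xFFFFFFFFull);
    return (uint32_t)(((uint64_t)src2 << offset) & bitmask) | (src3 & ~bitmask);
}
```

Bits of the field that would land above bit 31 are discarded by the truncated mask.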
Valid for all GPUs that support double precision.
Format
1-input, 1-output.
Opcode
Field Name  Bits    Description
code        15:0    IL_OP_D_2_F
control     31:16   Must be zero.
Related
F2D.
Conversion Instructions
Valid for all GPUs that support double precision.
Format
1-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_F_2_D
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
Float Instructions
Logical-AND
Instructions
AND
Syntax
and dst, src0, src1
Description
Performs a component-wise logical AND of each pair of 32-bit values from src0 and src1. The 32-bit result is placed in dst.
Valid for R6XX GPUs and later.
Format
2-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_AND
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
None.
reserved  20:19
cmpval    23:21
29:24
Cosine (cos)
Instructions
COS
Syntax
cos dst, src0
Description
Computes the cosine of src0.w. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. The fourth component of src0 must be in the range [-π, π]; otherwise, the result of this instruction is undefined. The maximum absolute error is 0.002.
Valid for all GPUs.
Example: dst = cos(src0)
Format
1-input, 1-output.
Opcode
Field Name            Bits    Description
code                  15:0    IL_OP_COS
control               29:16   Must be zero.
sec_modifier_present  30      Must be zero.
pri_modifier_present  31      Must be zero.
Related
COS_VEC.
Component-Wise Cosine
Instructions: COS_VEC
Syntax: cos_vec dst, src0
Description: Computes the cosine of each component of src0, in radians. The maximum absolute error is 0.0008 in the range [-100π, 100π]. Valid for R600 GPUs and later.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_COS_VEC
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: COS.
Cross Product
Instructions: CRS
Syntax: crs dst, src0, src1
Description: Computes the cross product of the x, y, and z components of src0 and src1. This instruction does not write to dst.w. However, if the component_w_a field of the IL_Dst_Mod token is set to IL_MODCOMP_0 or IL_MODCOMP_1, dst.w is written with 0.0 or 1.0, respectively. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v2 = EvalSource(src1);
  VECTOR v;
  v[0] = v1[1] * v2[2] - v1[2] * v2[1];
  v[1] = v1[2] * v2[0] - v1[0] * v2[2];
  v[2] = v1[0] * v2[1] - v1[1] * v2[0];
  WriteResult(v, dst);
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_CRS
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
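The following is a sketch of CRS, including its special handling of dst.w. The WMod enum is a hypothetical stand-in for the component_w_a field of IL_Dst_Mod. W_ZERO and W_ONE model IL_MODCOMP_0 and IL_MODCOMP_1. W_KEEP means dst.w keeps whatever value it already held.

```c
typedef struct { float c[4]; } ILRegF32;

/* Hypothetical encoding of the component_w_a destination modifier. */
enum WMod { W_KEEP, W_ZERO, W_ONE };

/* Cross product of src0.xyz and src1.xyz written into a copy of the old dst. */
static ILRegF32 il_crs(ILRegF32 src0, ILRegF32 src1, ILRegF32 dst, enum WMod wmod)
{
    const float *a = src0.c, *b = src1.c;
    dst.c[0] = a[1] * b[2] - a[2] * b[1];
    dst.c[1] = a[2] * b[0] - a[0] * b[2];
    dst.c[2] = a[0] * b[1] - a[1] * b[0];
    if (wmod == W_ZERO)
        dst.c[3] = 0.0f;          /* models IL_MODCOMP_0 */
    else if (wmod == W_ONE)
        dst.c[3] = 1.0f;          /* models IL_MODCOMP_1 */
    /* W_KEEP: dst.w is not written */
    return dst;
}
```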
Instructions: DET
Operation:
  f = v0[0]*(v1[1]*(v2[2]*v3[3]-v2[3]*v3[2])
            -v1[2]*(v2[1]*v3[3]-v3[1]*v2[3])
            +v1[3]*(v2[1]*v3[2]-v3[1]*v2[2]))
     -v0[1]*(v1[0]*(v2[2]*v3[3]-v2[3]*v3[2])
            -v1[2]*(v2[0]*v3[3]-v3[0]*v2[3])
            +v1[3]*(v2[0]*v3[2]-v3[0]*v2[2]))
     +v0[2]*(v1[0]*(v2[1]*v3[3]-v2[3]*v3[1])
            -v1[1]*(v2[0]*v3[3]-v3[0]*v2[3])
            +v1[3]*(v2[0]*v3[1]-v3[0]*v2[1]))
     -v0[3]*(v1[0]*(v2[1]*v3[2]-v2[2]*v3[1])
            -v1[1]*(v2[0]*v3[2]-v3[0]*v2[2])
            +v1[2]*(v2[0]*v3[1]-v3[0]*v2[1]));
  v[0] = v[1] = v[2] = v[3] = f;
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_DET
  control               29:16  Must be the enumerated type ILMatrix(IL_MATRIX_4X4). See Table 6.16 on page 6-8.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
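The DET expression is a cofactor expansion of a 4x4 determinant along v0. A direct C transcription makes it easy to sanity-check on known matrices. In this sketch, il_det4 is a hypothetical name, and v0 through v3 are assumed to be the four rows (registers) of the matrix.

```c
/* Cofactor expansion along v0, transcribed from the DET Operation. */
static float il_det4(const float v0[4], const float v1[4],
                     const float v2[4], const float v3[4])
{
    return v0[0] * (v1[1] * (v2[2] * v3[3] - v2[3] * v3[2])
                  - v1[2] * (v2[1] * v3[3] - v3[1] * v2[3])
                  + v1[3] * (v2[1] * v3[2] - v3[1] * v2[2]))
         - v0[1] * (v1[0] * (v2[2] * v3[3] - v2[3] * v3[2])
                  - v1[2] * (v2[0] * v3[3] - v3[0] * v2[3])
                  + v1[3] * (v2[0] * v3[2] - v3[0] * v2[2]))
         + v0[2] * (v1[0] * (v2[1] * v3[3] - v2[3] * v3[1])
                  - v1[1] * (v2[0] * v3[3] - v3[0] * v2[3])
                  + v1[3] * (v2[0] * v3[1] - v3[0] * v2[1]))
         - v0[3] * (v1[0] * (v2[1] * v3[2] - v2[2] * v3[1])
                  - v1[1] * (v2[0] * v3[2] - v3[0] * v2[2])
                  + v1[2] * (v2[0] * v3[1] - v3[0] * v2[1]));
}
```

For a diagonal matrix the result is the product of the diagonal. Swapping two rows negates it.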
Vector Distance
Instructions: DIST
Syntax: dist dst, src0, src1
Description: Computes the vector distance from src0.[xyz] to src1.[xyz]. The 32-bit scalar result is placed in all four components of dst. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v2 = EvalSource(src1);
  VECTOR v;
  v[0] = v[1] = v[2] = v[3] = sqrt((v1[0]-v2[0])*(v1[0]-v2[0]) +
                                   (v1[1]-v2[1])*(v1[1]-v2[1]) +
                                   (v1[2]-v2[2])*(v1[2]-v2[2]) +
                                   (v1[3]-v2[3])*(v1[3]-v2[3]));
  WriteResult(v, dst);
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_DIST
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Vector Distance
Instructions: DST
Syntax: dst dst, src0, src1
Description: Computes a distance-related vector from src0 and src1 and places the result in dst. If the control field is 0, then 0*n = 0, even if n = NaN. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v2 = EvalSource(src1);
  VECTOR v;
  v[0] = 1.0;
  v[1] = v1[1] * v2[1];
  v[2] = v1[2];
  v[3] = v2[3];
  WriteResult(v, dst);
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_DST
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Instantaneous Derivative in X
Instructions: DSX
Syntax: dsx[_fine] dst, src0
Description: Computes the instantaneous derivative in the x direction, that is, the rate of change of each float32 component of src0 (post-swizzle) in the RenderTarget x direction. The results are undefined if this instruction is used in a vertex or geometry shader. When _fine is not specified, the data in the current pixel shader invocation may or may not participate in the calculation of the requested derivative, because the derivative is calculated only once per 2x2 quad. GPUs before the Evergreen series ignore the _fine setting. Valid for all GPUs.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_DSX
  control               29:16  Subfields:
    reserved            22:16    Must be zero.
    _fine               23       0: Gradients can be computed once per quad.
                                 1: Gradients are computed for each pixel.
    reserved            29:24    Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: DSY, DXSINCOS.
Instantaneous Derivative in Y
Instructions: DSY
Syntax: dsy[_fine] dst, src0
Description: Computes the instantaneous derivative in the y direction, that is, the rate of change of each float32 component of src0 (post-swizzle) in the RenderTarget y direction. The results are undefined if this instruction is used in a vertex or geometry shader. If _fine is zero, only one x,y derivative pair is computed for each 2x2 stamp of pixels. When _fine is not specified, the data in the current pixel shader invocation may or may not participate in the calculation of the requested derivative, because the derivative is calculated only once per 2x2 quad. GPUs before the Evergreen series ignore the _fine setting. Valid for all GPUs.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_DSY
  control               29:16  Subfields:
    reserved            22:16    Must be zero.
    _fine               23       0: Gradients can be computed once per quad.
                                 1: Gradients are computed for each pixel.
    reserved            29:24    Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: DSX, DXSINCOS.
2 Raised to a Power
Instructions: EXP
Syntax: exp dst, src0
Description: Full-precision base-2 power: computes two raised to the power src0.w (2^src0.w). The floating-point result is placed in all four components of dst. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. Valid for all GPUs.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_EXP
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: EXP_VEC, EXPP, EXN.
Face Forward
Instructions: FACEFORWARD
Syntax: faceforward dst, src0, src1, src2
Description: Performs the following calculation: dst = src2 * sign(dot(src0, src1)). Valid for all GPUs.
Format: 3-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_FACEFORWARD
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Floor
Instructions: FLR
Syntax: flr dst, src0
Description: Performs a component-wise floor operation: calculates the floor of each component of src0 and places the 32-bit result in the corresponding component of dst. The floor of a component is the largest integer less than or equal to the value in the component. For example, the floor of 2.3 is 2.0, and the floor of -3.6 is -4.0. The operation is identical to ROUND_NEG_INF. Valid for all GPUs.
Example: floor(2.3) = 2.0, floor(-3.6) = -4.0
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = floor(v1[i]);
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_FLR
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Fraction
Instructions: FRC
Syntax: frc dst, src0
Description: Extracts the fractional portion of each component of src0 and returns it in the corresponding component of dst. The fractional portion of a component is defined as the component minus its floor (see FLR), so the result is always in the range [0.0, 1.0). For negative values, the fractional portion is not the digits to the right of the decimal point. For example, the fractional portion of -1.7 is 0.3, not 0.7: it is produced by subtracting the floor of -1.7 (-2.0) from -1.7, giving -1.7 - (-2.0) = 0.3. Valid for all GPUs.
Example: dst = src0 - floor(src0); frc(-1.7) = 0.3
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = v1[i] - (float)floor(v1[i]);
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_FRC
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Filter Width
Instructions: FWIDTH
Syntax: fwidth dst, src0
Description: Computes, for each component of src0, the sum of the absolute derivatives in x and y using local differencing. The result is returned in the corresponding component of dst. If this instruction is used in a vertex shader, the results are undefined. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = abs(dPdx(v1[i])) + abs(dPdy(v1[i]));
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_FWIDTH
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Move Data
Instructions: INV_MOV
Syntax: invariant_move dst, src0
Summary: Move data between registers.
Description: Moves the value from src0 to dst. When this instruction is used, the compiler ensures that any other shader that computes this source using the same instructions gets an identical result. Because this prevents many compiler optimizations, using INV_MOV generally lowers performance. Valid for R600 GPUs and later.
Operation:
  VECTOR v = EvalSource(src0);
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_INVARIANT_MOV
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Vector Length
Instructions: LEN
Syntax: len dst, src0
Description: Computes the length of the three-component vector in src0.[xyz]. The scalar 32-bit floating-point result is placed in all four components of dst. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  v[0] = v[1] = v[2] = v[3] = sqrt(v1[0]*v1[0] + v1[1]*v1[1] + v1[2]*v1[2] + v1[3]*v1[3]);
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_LEN
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Lighting Coefficient
Instructions: LIT
Syntax: lit dst, src0
Description: Calculates lighting coefficients for the ambient, diffuse, and specular light contributions. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  float epsilon = 1.192092896e-07F;
  float g = v1[3];
  if (g < -(128.0-epsilon))
      g = -(128.0-epsilon);
  else if (g > (128.0-epsilon))
      g = 128.0-epsilon;
  v[0] = 1.0;
  v[1] = (v1[0] > 0.0) ? v1[0] : 0.0;
  v[2] = ((v1[0] > 0.0) && (v1[1] > 0)) ? EXP2(g * LOG2(v1[1])) : 0.0;
  v[3] = 1.0;
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_LIT
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Natural Logarithm
Instructions: LN
Syntax: ln_zeroop(op) dst, src0
Description: Computes the natural logarithm of src0.w. The result of the operation is accurate to at least bits. The 32-bit floating-point result is placed in all four components of dst. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. zeroop can be any value of the enumerated type ILZeroOp(zeroop) except IL_ZEROOP_0. If no zeroop value is provided, zeroop(fltmax) is used. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  float f;
  if (v1[3] == 0.0) {
      if (zeroop == IL_ZEROOP_FLT_MAX) f = -FLT_MAX;
      else if (zeroop == IL_ZEROOP_INFINITY) f = -INFINITY;
      else if (zeroop == IL_ZEROOP_INF_ELSE_MAX) f = -INFINITY or -FLT_MAX; # Depends on IL Implementation
  } else if (v1[3] < 0.0) {
      f = undefined;
  } else {
      f = (float)(log10(v1[3])/log10(e));
  }
  v[0] = v[1] = v[2] = v[3] = f;
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_LN
  control               29:16  Any value of the enumerated type ILZeroOp(zeroop). See Table 6.33 on page 6-22.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Base-2 Logarithm
Instructions: LOG
Syntax: log_zeroop(op) dst, src0
Description: Computes the base-2 logarithm of src0.w. The result of the operation is accurate to at least 21 bits. The 32-bit floating-point result is placed in all four components of dst. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. zeroop can be any value of the enumerated type ILZeroOp(zeroop) except IL_ZEROOP_0. If no zeroop value is provided, zeroop(fltmax) is used. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  float f;
  if (v1[3] == 0) {
      if (zeroop == IL_ZEROOP_FLT_MAX) f = -FLT_MAX;
      else if (zeroop == IL_ZEROOP_INFINITY) f = -INFINITY;
      else if (zeroop == IL_ZEROOP_INF_ELSE_MAX) f = -INFINITY or -FLT_MAX; # Depends on IL Implementation
  } else if (v1[3] < 0.0) {
      f = undefined;
  } else {
      f = (float)(log10(v1[3])/log10(2));
  }
  v[0] = v[1] = v[2] = v[3] = f;
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_LOG
  control               29:16  Any value of the enumerated type ILZeroOp(zeroop). See Table 6.33 on page 6-22.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Linear Interpolation
Instructions: LRP
Syntax: lrp dst, src0, src1, src2
Description: Computes the linear interpolation between src1 and src2, using src0 as the weight. Valid for all GPUs.
Example: dst = src1 * src0 + src2 * (1.0 - src0)
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v2 = EvalSource(src1);
  VECTOR v3 = EvalSource(src2);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = v2[i] * v1[i] + v3[i] * (1.0 - v1[i]);
  WriteResult(v, dst);
Format: 3-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_LRP
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
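Here is a one-component sketch of LRP; il_lrp is a hypothetical name. The argument order follows the instruction: the weight src0 comes first. A weight of 1.0 selects src1 and a weight of 0.0 selects src2.

```c
/* dst = src1 * src0 + src2 * (1.0 - src0), for one component. */
static float il_lrp(float s0, float s1, float s2)
{
    return s1 * s0 + s2 * (1.0f - s0);
}
```

With src0 = 0.25, src1 = 4 and src2 = 8, the result is 4 * 0.25 + 8 * 0.75 = 7.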
Less Than
Instructions: LT
Syntax: lt dst, src0, src1
Description: Compares two float vectors, src0 and src1, component by component, and writes the result to dst. For each component, 0xFFFFFFFF is returned if the comparison is true; otherwise, 0x00000000 is returned. This instruction follows the DX10 floating-point rules: denorms are flushed before comparison (the original source registers are untouched), +0 equals -0, and comparison with NaN returns false. Primary and secondary opcode modifiers are not permitted, and no additional control options are supported; thus, the pri_modifier_present and sec_modifier_present fields must be zero. Valid for R6XX GPUs and later.
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_LT
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
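The DX10 comparison rules listed for LT can be modeled on one component. Denorms are flushed to a zero of the same sign, and the comparison yields a full-width mask. This is a sketch under that reading of the rules; flush_denorm and il_lt are hypothetical names. C's < operator already returns false for NaN and treats -0 and +0 as equal, so only the flush needs explicit code.

```c
#include <math.h>
#include <stdint.h>

/* Flush a denormal to a zero of the same sign; the caller's value is untouched. */
static float flush_denorm(float x)
{
    return (fpclassify(x) == FP_SUBNORMAL) ? copysignf(0.0f, x) : x;
}

/* One component of LT: true -> 0xFFFFFFFF, false -> 0x00000000. */
static uint32_t il_lt(float a, float b)
{
    a = flush_denorm(a);
    b = flush_denorm(b);
    return (a < b) ? 0xFFFFFFFFu : 0x00000000u;
}
```

The last test case shows why flushing matters: 1e-40f is a single-precision denormal, so after flushing, 0.0 < 1e-40 is false.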
Instructions: MAX
Description: Computes the maximum value per component. Both max(+0, -0) and max(-0, +0) return +0. NaN has special handling: if one source operand is NaN, the other source operand is returned (chosen per component); if both are NaN, any NaN representation is returned. Denorms are flushed to sign-preserved zeros before comparison; if a flushed denorm is the maximum, the flushed value is written to dst. If the _ieee flag is included, MIN and MAX follow the IEEE-754r rules for minnum and maxnum, except that denorms are flushed before the comparison is made. In addition, MIN returns -0 and MAX returns +0 for comparisons between -0 and +0. Valid for R600 GPUs and later that support the IEEE flag.
Example: dst = max(src0, src1)
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v2 = EvalSource(src1);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = (v1[i] > v2[i]) ? v1[i] : v2[i];
  WriteResult(v, dst);
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_MAX
  control               29:16  0: No special NaN rules.
                               1: IEEE-style NaN rules.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
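The NaN, signed-zero and denorm rules in the MAX description go well beyond the simple Operation loop. The sketch below models those rules for one component. flush_denorm and il_max are hypothetical names, and the sketch does not attempt to distinguish the two control-field settings.

```c
#include <math.h>

/* Flush a denormal to a zero of the same sign. */
static float flush_denorm(float x)
{
    return (fpclassify(x) == FP_SUBNORMAL) ? copysignf(0.0f, x) : x;
}

static float il_max(float a, float b)
{
    a = flush_denorm(a);
    b = flush_denorm(b);
    if (isnan(a)) return b;          /* if both are NaN, b is itself a NaN */
    if (isnan(b)) return a;
    if (a == 0.0f && b == 0.0f)      /* max(+0,-0) and max(-0,+0) give +0 */
        return 0.0f;
    return (a > b) ? a : b;          /* a flushed denorm may be returned as -0/+0 */
}
```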
Not Equal
Instructions: NE
Syntax: ne dst, src0, src1
Description: Compares two float vectors, src0 and src1, component by component, and writes the result to dst. For each component, 0xFFFFFFFF is returned if the comparison is true; otherwise, 0x00000000 is returned. This instruction follows the DX10 floating-point rules: denorms are flushed before comparison (the original source registers are untouched), +0 equals -0, and comparison with NaN returns false. Primary and secondary opcode modifiers are not permitted, and no additional control options are supported; thus, the pri_modifier_present and sec_modifier_present fields must be zero. Valid for R600 GPUs and later.
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_NE
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
  nrm4      18
            31:19
Reduce Vector to [-π, π]
Instructions: PIREDUCE
Syntax: pireduce dst, src0
Description: All four components of the vector in src0 are reduced to the range [-π, π]. Valid for all GPUs.
Example: dst = (fract(src0 / (2π) + 0.5) × 2π) - π
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = (frac((v1[i] / (2 * PI)) + 0.5) * 2 * PI) - PI;
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_PIREDUCE
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Project Vector
Instructions: PROJECT
Syntax: project_stage(n) dst, src_divComp(unknown)
Description: Moves a value from src to dst. An IL_Src_Mod token is required in this instruction, and the modifier_present field of the IL_Src token must be set to 1. This instruction cannot be used on a stage that has not been declared with a DCLPT instruction. The divComp source modifier must be set to IL_DIVCOMP_UNKNOWN, so the component used to divide is specified by AS_TEX_PROJECTED_N(stage). If the component to divide by is negative, the result of this instruction is undefined. Valid for all GPUs.
Operation:
  VECTOR v = EvalSource(src);
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_PROJECT
  control               31:16  Specifies the texture/stage unit.
Related: None.
X to the Power of Y
Instructions: POW
Syntax: pow dst, src0, src1
Description: Computes src0.w raised to the power src1.w (src0.w^src1.w). By default, this instruction operates on src0.w and src1.w, but it can operate on any component of either operand by swizzling that component into the fourth position. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v2 = EvalSource(src1);
  VECTOR v;
  if (v1[3] < 0.0) {
      v[0] = v[1] = v[2] = v[3] = undefined;
  } else {
      v[0] = v[1] = v[2] = v[3] = exp2(v2[3] * log2(v1[3]));
  }
  WriteResult(v, dst);
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_POW
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Reciprocal
Instructions: RCP
Syntax: rcp_zeroop(op) dst, src0
Description: Computes the reciprocal of src0.w. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. If no zeroop value is provided, zeroop(fltmax) is used. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  float f = v1[3];
  if (f == 0.0) {
      if (zeroop == IL_ZEROOP_0) f = 0.0;
      else if (zeroop == IL_ZEROOP_FLT_MAX) f = FLT_MAX;
      else if (zeroop == IL_ZEROOP_INFINITY) f = INFINITY;
      else if (zeroop == IL_ZEROOP_INF_ELSE_MAX) f = INFINITY or FLT_MAX; # Depends on IL Implementation
  } else if (f != 1.0)
      f = 1/f;
  v[0] = v[1] = v[2] = v[3] = f;
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_RCP
  control               17:16  The value of the enumerated type ILZeroOp(zeroop), which controls how this instruction behaves when the value of src0 is 0.0. See Table 6.33 on page 6-22.
  reserved              29:18  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
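The zeroop handling of RCP follows the pattern of LOG and LN, but a zero input yields positive results. This sketch transcribes the pseudocode under two assumptions: the ZeroOp enum is a hypothetical mirror of ILZeroOp (see Table 6.33 for the real values), and IL_ZEROOP_INF_ELSE_MAX is modeled as INFINITY.

```c
#include <float.h>
#include <math.h>

/* Hypothetical mirror of ILZeroOp. */
enum ZeroOp { ZEROOP_0, ZEROOP_FLT_MAX, ZEROOP_INFINITY, ZEROOP_INF_ELSE_MAX };

static float il_rcp(float x, enum ZeroOp zeroop)
{
    if (x == 0.0f) {                 /* also true for -0.0f */
        switch (zeroop) {
        case ZEROOP_0:       return 0.0f;
        case ZEROOP_FLT_MAX: return FLT_MAX;
        default:             return INFINITY;  /* INFINITY; INF_ELSE_MAX as modeled */
        }
    }
    return 1.0f / x;
}
```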
Round
Instructions: RND
Syntax: rnd dst, src0
Description: Rounds the float value in each component of a floating-point vector (src0) to the nearest integer. Valid for R600 GPUs and later.
Example: dst = floor(src0 + 0.5)
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = floor(v1[i] + 0.5);
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_RND
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Float Round to -∞
Instructions: ROUND_NEG_INF
Syntax: round_neginf dst, src0
Description: Rounds the float value in each component of src0 toward -∞. This is sometimes called a floor() instruction. Valid for R600 GPUs and later.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_ROUND_NEG_INF
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Float Round to +∞
Instructions: ROUND_PLUS_INF
Syntax: round_plusinf dst, src0
Description: Rounds the float value in each component of src0 toward +∞. This is sometimes called a ceil() instruction. Valid for R600 GPUs and later.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_ROUND_PLUS_INF
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Set on Comparison
Instructions: SET
Syntax: set_relop(op) dst, src0, src1
Description: Compares each component of src0 with the corresponding component of src1. The type of comparison performed is dictated by relop(op). If the comparison src0.{x|y|z|w} relop(op) src1.{x|y|z|w} evaluates to TRUE, the result is 1.0; otherwise, the result is 0.0. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v2 = EvalSource(src1);
  VECTOR v;
  for (i=0; i < 4; i++)
      v[i] = (v1[i] relop v2[i]) ? 1.0 : 0.0;
  WriteResult(v, dst);
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_SET
  control               29:16  Any value of the enumerated type ILRelOp(relop). See Table 6.23 on page 6-18.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
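SET differs from LT and NE in its result encoding: it produces the floats 1.0 and 0.0 rather than 0xFFFFFFFF/0x00000000 masks. This is a one-component sketch. The RelOp enum is a hypothetical mirror of common ILRelOp values; the authoritative list is Table 6.23. NaN handling for SET is not specified above, so this sketch simply inherits C's comparison semantics.

```c
/* Hypothetical mirror of ILRelOp values (see Table 6.23 for the real list). */
enum RelOp { RELOP_EQ, RELOP_NE, RELOP_GT, RELOP_GE, RELOP_LT, RELOP_LE };

/* One component of SET: 1.0 if (a relop b) holds, else 0.0. */
static float il_set(float a, float b, enum RelOp op)
{
    int r = 0;
    switch (op) {
    case RELOP_EQ: r = (a == b); break;
    case RELOP_NE: r = (a != b); break;
    case RELOP_GT: r = (a >  b); break;
    case RELOP_GE: r = (a >= b); break;
    case RELOP_LT: r = (a <  b); break;
    case RELOP_LE: r = (a <= b); break;
    }
    return r ? 1.0f : 0.0f;
}
```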
Compute Sign
Instructions: SGN
Syntax: sgn dst, src0
Description: Computes the sign of each component of src0. Valid for all GPUs.
Operation:
  VECTOR v = EvalSource(src0);
  for (i=0; i < 4; i++) {
      if (v[i] < 0.0)
          v[i] = -1.0;
      else if (v[i] == 0.0)
          v[i] = 0.0;
      else
          v[i] = 1.0;
  }
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_SGN
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
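A direct C transcription of the SGN pseudocode (il_sgn is a hypothetical name) makes its edge cases visible. Because the tests run "< 0.0" and then "== 0.0", -0.0 maps to 0.0. A NaN input fails both tests and falls through to 1.0 in this transcription; whether hardware behaves the same way is not stated above.

```c
/* One component of SGN, following the pseudocode order of tests. */
static float il_sgn(float x)
{
    if (x < 0.0f)
        return -1.0f;
    if (x == 0.0f)     /* true for both +0.0 and -0.0 */
        return 0.0f;
    return 1.0f;       /* positive values (and NaN, in this transcription) */
}
```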
Sine (sin)
Instructions: SIN
Syntax: sin dst, src0
Description: Computes the sine of src0.w. src0.w is in radians and must be within the range [-π, π]; otherwise, the results are undefined. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. The maximum absolute error is 0.002. Valid for R600 GPUs and later.
Example: dst = sin(src0)
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_SIN
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Component-Wise Sine
Instructions: SIN_VEC
Syntax: sin_vec dst, src0
Description: Computes the sine of each component of src0, in radians. The maximum absolute error is 0.0008 in the range [-100π, 100π]. Valid for R600 GPUs and later.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_SIN_VEC
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Square Root
Instructions: SQRT
Syntax: sqrt dst, src0
Description: Computes the square root of src0.w. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. If src0.w is less than zero, the result is undefined. The result is approximate. Valid for all GPUs.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_SQRT
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: SQRT_VEC.
Tangent (tan)
Instructions: TAN
Syntax: tan dst, src0
Description: Computes the tangent of src0.w. src0.w is in radians and must be within the range [-π, π]; otherwise, the results are undefined. By default, this instruction operates on src0.w, but it can operate on any component by swizzling that component into the fourth position. The maximum absolute error is 0.002. Valid for all GPUs.
Operation:
  VECTOR v1 = EvalSource(src0);
  VECTOR v;
  v[0] = v[1] = v[2] = v[3] = tan(v1[3]);
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_TAN
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Valid for all GPUs that support double-precision floating-point operations.
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_D_ADD
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Double-Precision Instructions
d+0 +0 +0
[+0.0,+1.0)* NAN
Valid for all GPUs that support double-precision floating point.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_D_FRAC
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
output wz NAN
Valid for all GPUs that support double-precision floating point.
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_D_FREXP
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_D_GE
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_D_LT
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Valid for all GPUs that support double-precision floating point.
Format: 2-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_D_MUL
  control               29:16  Must be zero.
  sec_modifier_present  30     Must be zero.
  pri_modifier_present  31     Must be zero.
Related: None.
Reciprocal
Instructions: D_RCP
Syntax: drcp_zeroop(op) dst, src
Description: Computes the reciprocal of the value in the fourth channel of src. By default, this instruction operates on src.w, but it can operate on any component by swizzling that component into the fourth channel. The first two bits of control are set to a value of the enumerated type ILZeroOp(zeroop), which controls how this instruction behaves when the value of src is 0.0. If no zeroop value is provided, zeroop(fltmax) is used. Valid for Evergreen GPUs and later that support double-precision floating point.
Operation:
  VECTOR v1 = EvalSource(src);
  VECTOR v;
  Double d = v1[2,1];
  if (d == 0.0) {
      if (zeroop == IL_ZEROOP_0) d = 0.0;
      else if (zeroop == IL_ZEROOP_FLT_MAX) d = DBL_MAX;
      else if (zeroop == IL_ZEROOP_INFINITY) d = INFINITY;
      else if (zeroop == IL_ZEROOP_INF_ELSE_MAX) d = INFINITY or DBL_MAX; # Depends on IL Implementation
  } else if (d != 1.0)
      d = 1/d;
  v[1,0] = d;
  WriteResult(v, dst);
Format: 1-input, 1-output.
Opcode:
  Field Name            Bits   Description
  code                  15:0   IL_OP_D_RCP
  control               31:16  Must be zero.
Related: None.
Related
None.
Square Root
Instructions
D_SQRT
Syntax
dsqrt dst, src0
Description
Computes the double-precision square root of src0.xy. The result goes into either dst.xy or dst.zw.
dst = sqrt(src0.xy)
dst = sqrt(|src0.xy|)
The result is approximate. This opcode is not part of DX11.
Valid for Evergreen GPUs and later that support double-precision floating point.
Format
1-input, 1-output.
Opcode
Field Name   Bits    Description
code         15:0    IL_OP_D_SQRT
control      31:16   Must be zero.
Related
None.
If r1.y holds 0x01234567 and r1.x holds 0x89ABCDEF, then:
    dcl_literal l10, 1, 11, 24, 0
    bitalign r0.x, r1.y, r1.x, l10.x results in r0.x having 0xC4D5E6F7
    bitalign r0.x, r1.y, r1.x, l10.y results in r0.x having 0xACF13579
    bitalign r0.x, r1.y, r1.x, l10.z results in r0.x having 0x23456789
Related
None.
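The bitalign results above can be checked with a short C model. It is a sketch inferred from the example, not a normative definition: src0 supplies the high 32 bits and src1 the low 32 bits of a 64-bit value, which is shifted right by src2, and the low 32 bits are kept. Masking the shift amount to five bits is an assumption of this sketch. It reproduces all three results listed in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of BITALIGN as implied by the example above. */
static uint32_t bitalign_model(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint64_t wide = ((uint64_t)src0 << 32) | src1; /* src0:src1 concatenated */
    return (uint32_t)(wide >> (src2 & 31u));       /* keep the low 32 bits */
}
```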
Multi-Media Instructions
Null Operation
Instructions
NOP
Syntax
nop
Description
No operation is performed.
Valid for all GPUs.
Format
0-input, 0-output.
Opcode
Field Name   Bits    Description
code         15:0    IL_OP_NOP
control      31:16   Must be zero.
Related
None.
Consume an Existing Data Slot in an Append Buffer and Return the Address of the Slot
Instructions
APPEND_BUF_CONSUME
Syntax
append_buf_consume_id(n) dst.x
Description
An append buffer is a UAV buffer (which must be raw or structured) with the Append flag set by the driver at binding time. It lets a shader write data to memory in a compacted, unordered way. append_buf_consume_id(n) consumes an existing data slot in the append buffer with ID n for each active work-item and returns the position (index) of that slot in dst. (Internally, this instruction decrements a hidden counter associated with the append buffer and returns the original value of the counter.) Subsequent instructions can use the returned index to compute the address for UAV instructions. For example, the returned value can be multiplied by 4 to get the byte address for UAV_RAW_LOAD.
dst must have a mask of .x, .y, .z, or .w.
A single append buffer cannot be used for both APPEND_BUF_ALLOC and APPEND_BUF_CONSUME.
This instruction is only for pixel or compute shaders.
Valid for Evergreen GPUs and later.
Format
0-input, 1-output, 0 additional tokens.
Opcode
Token  Field Name             Bits    Description
1      code                   15:0    IL_OP_APPEND_BUF_CONSUME
       ID                     29:16   Resource ID.
       sec_modifier_present   30      Must be zero.
       pri_modifier_present   31      Must be zero.
Example
append_buf_consume_id(1) r3.x
Related
None.
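The hidden-counter behavior described above can be sketched with C11 atomics. This is a hypothetical host-side model, not the hardware mechanism: the `append_counter` type and function names are invented for illustration, the model follows the text literally (decrement, return the original value), and the byte-address helper assumes one-Dword elements, as in the UAV_RAW_LOAD example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the hidden counter behind an append buffer. */
typedef struct {
    atomic_uint counter;
} append_counter;

/* Decrements the counter and returns the value it held before the
   decrement, as the description states. */
static unsigned append_buf_consume_model(append_counter *buf)
{
    unsigned old = atomic_fetch_sub(&buf->counter, 1u);
    assert(old != 0u); /* consuming from an empty buffer is not modeled */
    return old;
}

/* Byte address for UAV_RAW_LOAD, assuming 4-byte (one-Dword) elements. */
static unsigned consume_byte_address(unsigned index)
{
    return index * 4u;
}
```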
Example
dcl_struct_srv_id(1) 32
Related
None.
Example
dcl_struct_uav_id(1) 32
Related
None.
Example
dcl_uav_id(1)_type(1d)_fmtx(float)
Related
None.
LDS Instructions
Compare src1 With Read LDS Memory and, if Equal, Replace With src2
Instructions
LDS_READ_CMP_XCHG
Syntax
lds_read_cmp_xchg_resource(id) dst, src0, src1, src2
Description
The address is in bytes, but its two least significant bits must be zero. The data is a Dword.
If the LDS is declared typeless, src0.x (after swizzle) specifies a byte address relative to the work-group:
    dst.x = lds[src0.x]
    if (lds[src0.x] == src1.x) { lds[src0.x] = src2.x; }
If the LDS is declared as a struct, src0.x (after swizzle) specifies the index into the array, and src0.y (after swizzle) specifies the offset, in bytes, into the struct:
    dst.x = lds[(src0.x*lds_stride + src0.y)/4]
    if (lds[(src0.x*lds_stride + src0.y)/4] == src1.x) { lds[(src0.x*lds_stride + src0.y)/4] = src2.x; }
Valid for Evergreen GPUs and later.
Format
3-input, 1-output.
Opcode
Field Name   Bits    Description
code         15:0    IL_OP_LDS_CMP_XCHG
control      19:16   Resource ID.
reserved     31:20   Must be zero.
Related
None.
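The struct-addressed form above can be sketched in C. This is a hypothetical single-threaded model: the LDS is an array of Dwords, `lds_stride` is the struct size in bytes from the LDS declaration, and the function name is invented. The hardware performs the read, compare, and write atomically; this sketch shows only the addressing and the data flow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LDS_READ_CMP_XCHG on a struct-declared LDS. */
static uint32_t lds_read_cmp_xchg_struct(uint32_t *lds, uint32_t lds_stride,
                                         uint32_t index,  /* src0.x */
                                         uint32_t offset, /* src0.y, bytes */
                                         uint32_t cmp,    /* src1.x */
                                         uint32_t value)  /* src2.x */
{
    uint32_t byte_addr = index * lds_stride + offset;
    assert(byte_addr % 4u == 0u); /* the two low address bits must be zero */
    uint32_t a = byte_addr / 4u;  /* Dword index */
    uint32_t old = lds[a];        /* returned in dst.x */
    if (old == cmp)
        lds[a] = value;
    return old;
}
```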
Write LDS
Instructions
LDS_STORE
Syntax
lds_store_id(id) src0, src1
Description
The address is in bytes, but its two least significant bits must be zero. The store data is a Dword.
If the LDS is declared typeless, src0.x (after swizzle) specifies a byte address relative to the work-group:
    lds[src0.x] = src1.x
If the LDS is declared as a struct, src0.x (after swizzle) specifies the index into the array, and src0.y (after swizzle) specifies the offset, in bytes, into the struct:
    lds[(src0.x*lds_stride + src0.y)/4] = src1.x
Valid for R700 GPUs and later.
Format
2-input, 0-output.
Opcode
Field Name   Bits    Description
code         15:0    IL_OP_LDS_STORE
control      19:16   Resource ID.
reserved     31:20   Must be zero.
Related
None.
Related
Random Access Read From a Structured SRV Buffer (Returns up to Four Dwords)
Instructions
SRV_STRUCT_LOAD
Syntax
srv_struct_load_id(n) dst, src0.xy
Description
Reads from a structured SRV. The SRV with ID n must have been declared as a structured SRV buffer. src0.xy (post-swizzle) specifies the index of the structure and the offset within the structure, respectively. The offset is in bytes and must be four-byte aligned. Four consecutive 32-bit components are read from SRV(n) at the address specified by src0.xy (post-swizzle). One to four Dwords are written to dst, depending on the dst mask. Output swizzle is not allowed.
An instruction with an out-of-range address returns 0, unless the offset is out-of-bounds, in which case the return value is undefined.
The SRV is a read-only input buffer that can be used by all shader types. The number of SRV buffers exposed to a shader is limited (currently 128).
Indexed resource IDs are supported on Evergreen GPUs and later; this is specified by the ext keyword. With this option on, an extra input is needed as the indexed input (see example below).
Valid for R7XX GPUs and later.
Format
1-input, 1-output, 0 additional tokens.
Opcode
Token  Field Name   Bits    Description
1      code         15:0    IL_OP_SRV_STRUCT_LOAD
       control      23:16   Resource ID.
       reserved     27:24   Must be zero.
       flag         28      Flag for indexed resource ID.
       reserved     31:29   Must be zero.
Example
Related
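The structured addressing used by SRV_STRUCT_LOAD (byte address = index * stride + offset) can be sketched in C. This is a hypothetical host-side model: the `struct_srv` type and function name are invented, the stride of 32 bytes matches the `dcl_struct_srv_id(1) 32` declaration example, out-of-range indices return zeros as described, an out-of-bounds offset (undefined in IL) is rejected with an assert, and returning 0 for Dwords past the end of the buffer is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a structured SRV: num_structs elements of
   stride bytes each, stored as Dwords. */
typedef struct {
    const uint32_t *data;
    uint32_t stride;      /* bytes per structure */
    uint32_t num_structs;
} struct_srv;

/* Reads four consecutive Dwords starting at (index, offset) into out[0..3]. */
static void srv_struct_load_model(const struct_srv *srv, uint32_t index,
                                  uint32_t offset, uint32_t out[4])
{
    assert(offset % 4u == 0u && offset < srv->stride);
    size_t total = (size_t)srv->num_structs * srv->stride / 4u; /* Dwords */
    size_t base = ((size_t)index * srv->stride + offset) / 4u;
    for (int i = 0; i < 4; i++) {
        size_t d = base + (size_t)i;
        out[i] = (index < srv->num_structs && d < total) ? srv->data[d] : 0u;
    }
}
```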
reserved_arena_atomic 31 Example
Related
Atomic Bitwise Compare and Write to UAV, Return Old Value in UAV
Instructions
UAV_READ_CMP_XCHG
Syntax
uav_read_cmp_xchg_id(n) dst.x, src0, src1.x, src2.x
Description
Atomic single-component bitwise compare and write to a UAV: if (uav[src0] == src2.x) uav[src0] = src1.x. Whether or not the compared values are identical, the old value of uav[src0] is always returned in dst.
The UAV with ID n must have been declared as raw, structured, or typed. If typed, it must be declared as 32-bit UINT or SINT.
src0 provides the address. If the UAV is raw, src0.x (post-swizzle) provides the address in bytes; if structured, src0.xy provides the struct index and the offset in bytes. If typed, the number of components used for the address depends on the UAV dimension. For example, for texture1D arrays, src0.x (post-swizzle) provides the buffer address, and src0.y (post-swizzle) provides the index/offset of the array.
src1.x (post-swizzle) provides the 32-bit Dword to be written to the UAV.
src2.x (post-swizzle) provides the 32-bit Dword to be compared (bitwise) with uav[src0].
The 32-bit UAV memory at the address in src0 is overwritten by src1.x if the compared values are identical. The 32-bit value held in the UAV before the operation is always returned to dst, which must have a mask of .x, .y, .z, or .w.
An instruction with an out-of-range address writes nothing to the UAV surface; however, for a structured UAV, if the offset is out-of-bounds, an undefined value is written to the UAV. An undefined value is returned to dst.
If the shader invocation is inactive, nothing is written to the UAV surface, and an undefined value is returned to dst.
Valid for Evergreen GPUs and later.
Format
3-input, 1-output, 0 additional tokens.
Opcode
Token  Field Name             Bits    Description
1      code                   15:0    IL_OP_UAV_READ_CMP_XCHG
       control                29:16   Resource ID.
       sec_modifier_present   30      Must be zero.
       pri_modifier_present   31      Must be zero.
Example
Related
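The "always return the old value" behavior maps naturally onto C11 `atomic_compare_exchange_strong`, which writes the current value back into its `expected` argument on failure. The following is a hypothetical host-side model of one 32-bit UAV element, not the hardware path. Note the operand order taken from the description: src2.x is the comparand and src1.x is the value written, the reverse of LDS_READ_CMP_XCHG and GDS_READ_CMP_XCHG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of UAV_READ_CMP_XCHG on one 32-bit UAV element. */
static uint32_t uav_read_cmp_xchg_model(_Atomic uint32_t *elem,
                                        uint32_t src1, /* value written */
                                        uint32_t src2) /* comparand */
{
    uint32_t expected = src2;
    /* On success expected still equals the old value (src2);
       on failure it is overwritten with the current value. */
    atomic_compare_exchange_strong(elem, &expected, src1);
    return expected; /* old value in either case */
}
```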
GDS Instructions
Compare src1 With Read GDS Memory and, if Equal, Exchange With src2
Instructions
GDS_READ_CMP_XCHG
Syntax
gds_read_cmp_xchg_id(id) dst, src0, src1, src2
Description
Reads global data share (GDS) memory, compares the value with src1, and, if they are equal, exchanges it with src2. Compare-and-exchange is the IL version of a classic lock. Using the same addressing rules as the other GDS atomics (see the preceding opcode), it performs the following operation:
    a = ... as above
    dst.x = gds[a]
    if (dst.x == src1.x) { gds[a] = src2.x; }
Valid for Evergreen GPUs and later.
Format
3-input, 1-output.
Opcode
Token  Field Name             Bits    Description
1      code                   15:0    IL_OP_GDS_READ_CMP_XCHG
       ID                     29:16   Resource ID.
       sec_modifier_present   30      Must be zero.
       pri_modifier_present   31      Must be zero.
Atomic GDS Memory Read and Integer Increment of Src1 in GDS Memory
Instructions
GDS_READ_INC
Syntax
gds_read_inc_id(id) dst, src0, src1
Description
Atomic read of global data share (GDS) memory and integer increment.
If GDS is declared typeless, src0.x specifies a byte address relative to the work-group. The address must be Dword-aligned: its lower two bits must be zero.
    a = src0.x/4
    dst.x = gds[a]
    gds[a] = (gds[a] >= src1.x ? 0 : gds[a] + 1)
If GDS is declared as a struct, src0.x specifies the index into the array, and src0.y specifies the offset, in bytes, into the struct.
    a = (src0.x * gds_stride + src0.y)/4
    dst.x = gds[a]
    gds[a] = (gds[a] >= src1.x ? 0 : gds[a] + 1)
Valid for Evergreen GPUs and later.
Format
2-input, 1-output.
Opcode
Token  Field Name             Bits    Description
1      code                   15:0    IL_OP_GDS_READ_INC
       ID                     29:16   Resource ID.
       sec_modifier_present   30      Must be zero.
       pri_modifier_present   31      Must be zero.
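The increment above wraps to zero once the stored value reaches src1.x, which makes it usable as a ring-buffer index. Here is a hypothetical single-threaded C model of the typeless case (the hardware performs the read-modify-write atomically; the function name is invented):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of GDS_READ_INC on a typeless GDS: gds is an array
   of Dwords, byte_addr is src0.x, and limit is src1.x. */
static uint32_t gds_read_inc_model(uint32_t *gds, uint32_t byte_addr,
                                   uint32_t limit)
{
    assert((byte_addr & 3u) == 0u);          /* Dword-aligned address */
    uint32_t a = byte_addr / 4u;
    uint32_t old = gds[a];                   /* returned in dst.x */
    gds[a] = (old >= limit) ? 0u : old + 1u; /* wrap to 0 at the limit */
    return old;
}
```

With a limit of 2, successive calls on the same address return 0, 1, 2 and leave the stored value back at 0.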
Valid for Evergreen GPUs and later.
Format
2-input, 0-output.
Opcode
Token  Field Name             Bits    Description
1      code                   15:0    IL_OP_GDS_SUB
       ID                     29:16   Resource ID.
       sec_modifier_present   30      Must be zero.
       pri_modifier_present   31      Must be zero.
Valid for Evergreen GPUs and later.
Format
2-input, 0-output.
Opcode
Token  Field Name             Bits    Description
1      code                   15:0    IL_OP_GDS_UMAX
       ID                     29:16   Resource ID.
       sec_modifier_present   30      Must be zero.
       pri_modifier_present   31      Must be zero.
0-input format.
Ordinal  Token
1        IL_Opcode token with code set to IL_OP_DCL_FUNCTION_BODY (control field ignored).
2        Unsigned integer representing the label of the function body.
Related
Defines the virtual functions/interfaces that can be reached by a set of calls on the same object type. The statement establishes the ID for an object type. This is analogous to a C++ virtual function table. The opcode token is followed by a Dword containing an unsigned integer that specifies the number of table entries (function body IDs), followed by a list of Dwords that contain the function body ID numbers.
Valid for Evergreen GPUs and later.
Format
0-input format, followed by n function body labels.
Opcode
Token  Field Name   Bits    Description
1      code         15:0    IL_OP_DCL_FUNCTION_TABLE
       control      29:16   Table ID.
                    30      Must be zero.
                    31      Must be zero.
2                           Unsigned integer specifying the number of table entries (function body IDs).
3 onward                    List of Dwords containing the function body ID numbers.
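The token stream for DCL_FUNCTION_TABLE can be sketched as a small C encoder. This is a hypothetical helper, not part of any AMD toolchain: it assumes the layout in the table above (code in bits 15:0, table ID in bits 29:16, bits 30 and 31 zero, then a count token and one token per function body ID), and it takes the numeric opcode value as a parameter because that value comes from the ILOpCode enumeration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for the DCL_FUNCTION_TABLE token stream.
   Returns the number of tokens written to out. */
static size_t encode_dcl_function_table(uint32_t opcode_value, uint32_t table_id,
                                        const uint32_t *body_ids, uint32_t count,
                                        uint32_t *out, size_t out_cap)
{
    assert(opcode_value <= 0xFFFFu && table_id <= 0x3FFFu);
    assert(out_cap >= (size_t)count + 2u);
    out[0] = opcode_value | (table_id << 16); /* bits 30-31 left zero */
    out[1] = count;                           /* number of table entries */
    for (uint32_t i = 0; i < count; i++)
        out[2 + i] = body_ids[i];             /* function body IDs */
    return (size_t)count + 2u;
}
```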
array_size: the number of interface pointers declared, which equals the number of available instances that can be used.
length: the number of times this interface is called, which equals the size of the associated function tables.
Unsigned integer specifying the number of table entries (function table IDs).
List of Dwords containing the function table ID numbers.
sec_modifier_present 30 pri_modifier_present 31
Related
Define a Macro
Instructions
MACRODEF
Syntax
mdef(k)_out(m)_in(n)[_outline]
Description
The macro identifier, k, must be a unique numerical ID. The tokens that follow, up to the MACROEND token, form the macro body. The mdef instruction cannot appear within the body of a macro.
The special IL register types IN and OUT can be used within the macro body. All temporary registers and all literals are scoped to the macro. The IN registers are zero-based and run up to n-1, as specified in the macro definition. The OUT registers are zero-based and run up to m-1, as specified in the macro definition.
If the outline option is not specified, the macro is inlined into the calling function.
Valid for all GPUs.
Format
0-input, 0-output.
Opcode
Token  Field Name   Bits    Description
1      code         15:0    IL_OP_MACRODEF
       control      29:16   Macro k.
                    30      Must be zero.
                    31      Must be zero.
                            Number of inputs.
                            Number of outputs.
                            Outline option use.
                            Reserved.
The following macro computes the overflow of adding in0 and in1 as unsigned 32-bit integers. It returns 1 if the addition overflows and 0 if the result fits in 32 bits, so the result can be used as a carry flag for a multi-word add.
    mdef(0)_out(1)_in(2)
    INOT r1, in0
    ULT r2, r1, in1
    dcl_literal l1, 1, 1, 1, 1
    IAND out0, r2, l1
    mend
To outline this macro as a function:
    mdef(0)_out(1)_in(2)_outline
    INOT r1, in0
    ULT r2, r1, in1
    dcl_literal l1, 1, 1, 1, 1
    IAND out0, r2, l1
    mend
Related
MACROEND, MCALL.
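The overflow test in the MACRODEF example relies on the identity that a + b overflows 32 bits exactly when b > ~a. Here is a C sketch of the same three steps. It assumes, as the IAND with 1 implies, that ULT writes all ones for true and zero for false, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the overflow macro: INOT, ULT, then IAND with 1. */
static uint32_t add_overflow_model(uint32_t in0, uint32_t in1)
{
    uint32_t r1 = ~in0;                          /* INOT r1, in0 */
    uint32_t r2 = (r1 < in1) ? 0xFFFFFFFFu : 0u; /* ULT r2, r1, in1 */
    return r2 & 1u;                              /* IAND out0, r2, l1 */
}
```

Used as a carry flag, the high word of a 64-bit sum would be computed as `a_hi + b_hi + add_overflow_model(a_lo, b_lo)`.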
End a Macro
Instructions
MACROEND
Syntax
mend
Description
Closes the lexical scope of a macro started by MACRODEF. It can be followed by another macro definition or an IL language token. It can appear only at the end of a macro body.
Valid for all GPUs.
Format
0-input, 0-output.
Opcode
Field Name             Bits    Description
code                   15:0    IL_OP_MACROEND
control                29:16   Must be zero.
sec_modifier_present   30      Must be zero.
pri_modifier_present   31      Must be zero.
Related
MACRODEF, MCALL.
Call a Macro
Instructions
MCALL
Syntax
mcall(k) (list of outputs), (list of inputs)
Description
Calls the macro identified by k, the macro's unique integer identifier, and expands it following the macro expansion rules below, replacing the call with the text of the expanded macro. The parentheses are required in the syntax, even if there are no inputs or outputs. Each actual parameter can be either a TEMP or a LITERAL. Macro calls can be nested. Each output must be a destination token; each input must be a source token.
Valid for all GPUs.
Macro expansion consists of the following steps.
1. The macro expander inserts a prolog consisting of a set of moves from the actual input parameters to the formal input arguments.
2. A copy of the macro body is inserted.
3. An epilog consisting of moves from the formal output arguments to the actual output parameters is added.
4. The macro registers are renamed. Renaming consists of:
   a. All temporary registers in the macro body are replaced by new, unique temporary registers.
   b. All literal registers in the macro body are replaced by new, unique literal registers.
   c. All input and output argument registers are replaced by new temporary registers.
For example, given mcall(5), (r1), (r2, l1), macro expansion generates the following.
Macro
    mdef(5)_out(1)_in(2)
Expansion
    ; prolog
    mov in0, r2
    mov in1, l1
    mov r1, in1
    iadd out0, in0, r1
    ; epilog
    mov r1, out0
Rename
    mov rm0, r2
    mov rm1, l1
    mov rm2, rm1
    iadd rm3, rm0, rm2
    mov r1, rm3
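The renaming step of the MCALL example can be modeled in a few lines of C. This is a hypothetical sketch, not the IL compiler's implementation: it takes the expanded instruction list, treats prolog destinations, epilog sources, and every body register as macro-local, and assigns fresh rmN names in order of first appearance, leaving actual parameters unchanged. Literal renaming (step 4b) is not modeled. Given the five expanded instructions of the example, it reproduces the Rename listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum part { PROLOG, BODY, EPILOG };

struct ins {
    enum part part;
    const char *op;
    const char *reg[3]; /* reg[0] is the destination */
    int nreg;
};

#define MAX_NAMES 16
static const char *old_names[MAX_NAMES];
static char new_names[MAX_NAMES][8];
static int n_names;

/* Returns the fresh rmN name for a macro-local register, creating it on first use. */
static const char *rename_reg(const char *r)
{
    for (int i = 0; i < n_names; i++)
        if (strcmp(old_names[i], r) == 0)
            return new_names[i];
    assert(n_names < MAX_NAMES);
    old_names[n_names] = r;
    snprintf(new_names[n_names], sizeof new_names[0], "rm%d", n_names);
    return new_names[n_names++];
}

/* Writes one renamed instruction, e.g. "mov rm0, r2", into buf. */
static void rename_ins(const struct ins *in, char *buf, size_t cap)
{
    int len = snprintf(buf, cap, "%s", in->op);
    for (int k = 0; k < in->nreg; k++) {
        int is_dst = (k == 0);
        int local = in->part == BODY
                 || (in->part == PROLOG && is_dst)
                 || (in->part == EPILOG && !is_dst);
        const char *r = local ? rename_reg(in->reg[k]) : in->reg[k];
        assert(len >= 0 && (size_t)len < cap);
        len += snprintf(buf + len, cap - (size_t)len, "%s%s", k ? ", " : " ", r);
    }
    assert(len >= 0 && (size_t)len < cap);
}

/* Renames a whole expanded macro; names restart at rm0 for each call. */
static void rename_all(const struct ins *prog, int n, char out[][32])
{
    n_names = 0;
    for (int i = 0; i < n; i++)
        rename_ins(&prog[i], out[i], 32);
}
```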
Conditionals in macros can be expressed with ordinary conditional statements.
Format
0-input, 0-output.
Opcode
Token  Field Name   Bits    Description
1      code         15:0    IL_OP_MCALL
       control      29:16   Macro k.
                    30      Must be zero.
                    31      Must be zero.
The following example shows a macro that generates a greater-than-or-equal comparison sequence for different types.
    mdef(2)_out(1)_in(3)
    // 1 for unsigned
    if in2.x eq 1
        uge out0, in0, in1
    else
        // 2 for signed
        if in2.x eq 2
            ige out0, in0, in1
        else
            // 3 for double
            if in2.x eq 3
                dge out0, in0, in1
            endif
        endif
    endif
    mend
    def l1, 1,2,3,4
    mcall(2), (r1),(r2,r3,l1.x)   // unsigned compare (l1.x = 1)
    mcall(2), (r1),(r2,r3,l1.z)   // double compare (l1.z = 3)
Related
MACROEND, MACRODEF.
The following algorithm is used when a shadow texture fetch is required by the value of the shadowmode field in the TEXLD, TEXLDB, and TEXLDD instructions and by the shadow enable value set through STATE. While the POINT shadow filter must be the fastest, the WEIGHTED_QUAD filter produces a smoother edge in the shadow. The BEST shadow filter is provided for implementations that cannot accelerate the WEIGHTED_QUAD filter.
    VECTOR vsrc = EvalSource(src);
    VECTOR v;
    VECTOR temp;
    VECTOR weights;
    float failVal = Eval(AS_TEX_SHADOW_COMPARE_FAIL_VALUE_N(stage));
    float topTexel;
    float bottomTexel;
    float texel;
    int i;

    if (AS_TEX_SHADOW_FILTER_N(stage) == WEIGHTED_QUAD) {
        temp[0] = tfetch(stage, vsrc[0] - xoffset(0.5), vsrc[1] - yoffset(0.5), 0.0, 1.0);
        temp[1] = tfetch(stage, vsrc[0] + xoffset(0.5), vsrc[1] - yoffset(0.5), 0.0, 1.0);
        temp[2] = tfetch(stage, vsrc[0] - xoffset(0.5), vsrc[1] + yoffset(0.5), 0.0, 1.0);
        temp[3] = tfetch(stage, vsrc[0] + xoffset(0.5), vsrc[1] + yoffset(0.5), 0.0, 1.0);
        if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == GEQUAL) {
            for (i = 0; i < 4; i++) temp[i] = (vsrc[2] >= temp[i]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == LEQUAL) {
            for (i = 0; i < 4; i++) temp[i] = (vsrc[2] <= temp[i]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == GREATER) {
            for (i = 0; i < 4; i++) temp[i] = (vsrc[2] > temp[i]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == LESS) {
            for (i = 0; i < 4; i++) temp[i] = (vsrc[2] < temp[i]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == EQUAL) {
            for (i = 0; i < 4; i++) temp[i] = (vsrc[2] == temp[i]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == NOTEQUAL) {
            for (i = 0; i < 4; i++) temp[i] = (vsrc[2] != temp[i]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == ALWAYS) {
            for (i = 0; i < 4; i++) temp[i] = 1.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == NEVER) {
            for (i = 0; i < 4; i++) temp[i] = 0.0;
        }
        # max temp.xyzw temp.xyzw failval
        for (i = 0; i < 4; i++) temp[i] = (temp[i] > failVal) ? temp[i] : failVal;
        # perform a weighted bilinear filter with lerps
        weights = GetWeights(vsrc);
        topTexel = temp[0] * weights[0] + temp[1] * (1.0 - weights[0]);
        bottomTexel = temp[2] * weights[0] + temp[3] * (1.0 - weights[0]);
        texel = bottomTexel * weights[1] + topTexel * (1.0 - weights[1]);
    } else if (AS_TEX_SHADOW_FILTER_N(stage) == POINT) {
        temp[0] = tfetch(stage, vsrc[0], vsrc[1], vsrc[2], vsrc[3]);
        if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == GEQUAL) {
            temp[0] = (vsrc[2] >= temp[0]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == LEQUAL) {
            temp[0] = (vsrc[2] <= temp[0]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == GREATER) {
            temp[0] = (vsrc[2] > temp[0]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == LESS) {
            temp[0] = (vsrc[2] < temp[0]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == EQUAL) {
            temp[0] = (vsrc[2] == temp[0]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == NOTEQUAL) {
            temp[0] = (vsrc[2] != temp[0]) ? 1.0 : 0.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == ALWAYS) {
            temp[0] = 1.0;
        } else if (AS_TEX_SHADOW_COMPARE_FUNC_N(stage) == NEVER) {
            temp[0] = 0.0;
        }
        # max texel temp.x failval
        texel = (temp[0] > failVal) ? temp[0] : failVal;
    } else if (AS_TEX_SHADOW_FILTER_N(stage) == BEST) {
        # use the largest number of samples that the implementation can support
    }

    if (AS_TEX_SHADOW_USAGE_N(stage) == ALPHA) {
        v[0] = 0.0; v[1] = 0.0; v[2] = 0.0; v[3] = texel;
    } else if (AS_TEX_SHADOW_USAGE_N(stage) == LUMINANCE) {
        v[0] = texel; v[1] = texel; v[2] = texel; v[3] = 1.0;
    } else if (AS_TEX_SHADOW_USAGE_N(stage) == INTENSITY) {
        v[0] = texel; v[1] = texel; v[2] = texel; v[3] = texel;
    } else if (AS_TEX_SHADOW_USAGE_N(stage) == NONE) {
        v[0] = 0.0; v[1] = 0.0; v[2] = 0.0; v[3] = 1.0;
    }
    WriteResult(v, dst);
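The WEIGHTED_QUAD path of the algorithm above can be reduced to a small C function for one lookup. This is a hypothetical sketch: the texture fetch is not modeled, so the four fetched texels (t[0] and t[1] for the top row, t[2] and t[3] for the bottom row), the reference value (vsrc[2]), the two weights, and failVal are passed in directly. The `cmp_func` enumerators and their order are invented names for the GEQUAL, LEQUAL, and other compare functions.

```c
#include <assert.h>

/* Hypothetical names for the shadow compare functions. */
typedef enum { CMP_NEVER, CMP_LESS, CMP_EQUAL, CMP_LEQUAL,
               CMP_GREATER, CMP_NOTEQUAL, CMP_GEQUAL, CMP_ALWAYS } cmp_func;

/* Per-texel shadow comparison: 1.0 if the test passes, else 0.0. */
static float shadow_compare(cmp_func f, float ref, float texel)
{
    switch (f) {
    case CMP_LESS:     return ref <  texel ? 1.0f : 0.0f;
    case CMP_EQUAL:    return ref == texel ? 1.0f : 0.0f;
    case CMP_LEQUAL:   return ref <= texel ? 1.0f : 0.0f;
    case CMP_GREATER:  return ref >  texel ? 1.0f : 0.0f;
    case CMP_NOTEQUAL: return ref != texel ? 1.0f : 0.0f;
    case CMP_GEQUAL:   return ref >= texel ? 1.0f : 0.0f;
    case CMP_ALWAYS:   return 1.0f;
    case CMP_NEVER:
    default:           return 0.0f;
    }
}

/* Compare, clamp to failVal from below, then bilinearly blend with lerps. */
static float weighted_quad_shadow(cmp_func f, float ref, const float t[4],
                                  float w0, float w1, float fail_val)
{
    float c[4];
    for (int i = 0; i < 4; i++) {
        c[i] = shadow_compare(f, ref, t[i]);
        if (c[i] < fail_val)
            c[i] = fail_val; /* max(result, failVal) */
    }
    float top    = c[0] * w0 + c[1] * (1.0f - w0);
    float bottom = c[2] * w0 + c[3] * (1.0f - w0);
    return bottom * w1 + top * (1.0f - w1);
}
```

Because the comparison happens before filtering, the result is a fractional coverage value rather than a filtered depth, which is what gives the smoother shadow edge mentioned above.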
Description
This instruction, which declares properties of a texture stage, must be issued on each resource that is used in a ld or sample instruction. The value of stage corresponds to the stage/unit defined in state. A shader can issue at most one DCLRES per stage value; it cannot use this instruction on the same stage more than once.
If type is IL_USAGE_PIXTEX_UNKNOWN, the texture type of the texture on the stage/unit indicated by stage is not known at shader create time; it is determined at shader run time based on a setting in STATE.
If coordmode is set to IL_TEXCOORDMODE_NORMALIZED, the texture coordinate is normalized in any subsequent TEXLD, TEXLDB, or LOD instruction for the texture on stage.
If coordmode is set to IL_TEXCOORDMODE_UNNORMALIZED, the texture coordinate is not normalized in any subsequent TEXLD, TEXLDD, TEXLDB, TEXLDMS, TEXWEIGHT, PROJECT, or LOD instruction for the texture on stage when the wrap mode set in state for the stage/unit is clamp-to-border, clamp-half-way-to-border, or clamp-to-edge. In this case, the x texture coordinate ranges from 0.0 to the width of the texture, the y texture coordinate from 0.0 to the height of the texture, and the z texture coordinate from 0.0 to the depth of the texture.
If coordmode is set to IL_TEXCOORDMODE_UNKNOWN, external state determines at shader run time whether the texture coordinates used in any subsequent TEXLD, TEXLDD, TEXLDB, TEXLDMS, TEXWEIGHT, PROJECT, or LOD instruction are normalized.
This instruction must occur before the first executable instruction of a shader. It can only occur after a COMMENT, DCLDEF, DCLPP, DCLPI, DCLPIN, DCLVOUT, DCLV, DCLARRAY, DEF, DEFB, NOP, or another DCLPT instruction.
Ordinal  Token
1        IL_Opcode token with code set to IL_OP_DCLRES. The five bits of the control field must be set to a valid value.
Opcode
This appendix specifies the order of the IL enumerations as expected by the compiler. Copy the following code, and paste it into a header file that is to be included in your project.
#define IL_OP_DMAX IL_OP_D_MAX
#define IL_OP_DMIN IL_OP_D_MIN
#define IL_MINOR_VERSION 0
#define IL_MAJOR_VERSION 2

// comments in this enum will be added to the il spec
enum IL_Shader_Type
{
    IL_SHADER_VERTEX,   // This code describes a vertex shader
    IL_SHADER_PIXEL,    // This code describes a pixel shader
    IL_SHADER_GEOMETRY, // This code describes a geometry shader (R6xx and later)
    IL_SHADER_COMPUTE,  // This code describes a compute shader (R7xx and later)
    IL_SHADER_HULL,     // This code describes a hull shader (Rxx and later)
    IL_SHADER_DOMAIN,   // This code describes a domain shader (Rxx and later)
    IL_SHADER_LAST      /* dimension the enumeration */
};

// comments in this enum will be added to the il spec
enum IL_Language_Type
{
    IL_LANG_GENERIC,  // allows for language specific overrides
    IL_LANG_OPENGL,   // any flavor of opengl
    IL_LANG_DX8_PS,   // direct x 1.x pixel shader
    IL_LANG_DX8_VS,   // direct x 1.x vertex shader
    IL_LANG_DX9_PS,   // direct x 2.x and 3.x pixel shader
    IL_LANG_DX9_VS,   // direct x 2.x and 3.x vertex shader
    IL_LANG_DX10_PS,  // direct x 4.x pixel shader
    IL_LANG_DX10_VS,  // direct x 4.x vertex shader
    IL_LANG_DX10_GS,  // direct x 4.x geometry shader
    IL_LANG_DX11_PS,  // direct x 5.x pixel shader
    IL_LANG_DX11_VS,  // direct x 5.x vertex shader
    IL_LANG_DX11_GS,  // direct x 5.x Geometry shader
    IL_LANG_DX11_CS,  // direct x 5.x Compute shader
    IL_LANG_DX11_HS,  // direct x 5.x Hull shader
    IL_LANG_DX11_DS,  // direct x 5.x Domain shader
    IL_LANG_LAST      /* dimension the enumeration */
};

enum ILOpCode
{
    IL_OP_UNKNOWN, IL_OP_ABS, IL_OP_ACOS, IL_OP_ADD, IL_OP_ASIN, IL_OP_ATAN,
    IL_OP_BREAK, IL_OP_BREAKC,
    IL_OP_CALL, IL_OP_CALLNZ, IL_OP_CLAMP, IL_OP_CLG, IL_OP_CMOV, IL_OP_CMP,
    IL_OP_COLORCLAMP, IL_OP_COMMENT, IL_OP_CONTINUE, IL_OP_CONTINUEC, IL_OP_COS,
    IL_OP_CRS, IL_OP_DCLARRAY, IL_OP_DCLDEF, IL_OP_DCLPI, IL_OP_DCLPIN, IL_OP_DCLPP,
    IL_OP_DCLPT, IL_OP_DCLV, IL_OP_DCLVOUT, IL_OP_DEF, IL_OP_DEFB, IL_OP_DET,
    IL_OP_DIST, IL_OP_DIV, IL_OP_DP2ADD, IL_OP_DP3, IL_OP_DP4, IL_OP_DST, IL_OP_DSX,
    IL_OP_DSY, IL_OP_ELSE, IL_OP_END, IL_OP_ENDIF, IL_OP_ENDLOOP, IL_OP_ENDMAIN,
    IL_OP_EXN, IL_OP_EXP, IL_OP_EXPP, IL_OP_FACEFORWARD, IL_OP_FLR, IL_OP_FRC,
    IL_OP_FUNC, IL_OP_FWIDTH, IL_OP_IFC, IL_OP_IFNZ, IL_OP_INITV, IL_OP_KILL,
    IL_OP_LEN, IL_OP_LIT, IL_OP_LN, IL_OP_LOD, IL_OP_LOG, IL_OP_LOGP, IL_OP_LOOP,
    IL_OP_LRP, IL_OP_MAD, IL_OP_MAX, IL_OP_MEMEXPORT, IL_OP_MEMIMPORT, IL_OP_MIN,
    IL_OP_MMUL, IL_OP_MOD, IL_OP_MOV, IL_OP_MUL, IL_OP_NOISE, IL_OP_NOP, IL_OP_NRM,
    IL_OP_PIREDUCE, IL_OP_POW, IL_OP_PRECOMP,
    IL_OP_PROJECT, IL_OP_RCP, IL_OP_REFLECT, IL_OP_RET, IL_OP_RND, IL_OP_RSQ,
    IL_OP_SET, IL_OP_SGN, IL_OP_SIN, IL_OP_SINCOS, IL_OP_SQRT, IL_OP_SUB, IL_OP_TAN,
    IL_OP_TEXLD, IL_OP_TEXLDB, IL_OP_TEXLDD, IL_OP_TEXLDMS, IL_OP_TEXWEIGHT,
    IL_OP_TRANSPOSE, IL_OP_TRC, IL_OP_DXSINCOS,
    IL_OP_BREAK_LOGICALZ, IL_OP_BREAK_LOGICALNZ, IL_OP_CALL_LOGICALZ,
    IL_OP_CALL_LOGICALNZ, IL_OP_CASE, IL_OP_CONTINUE_LOGICALZ,
    IL_OP_CONTINUE_LOGICALNZ, IL_OP_DEFAULT, IL_OP_ENDSWITCH, IL_OP_ENDFUNC,
    IL_OP_IF_LOGICALZ, IL_OP_IF_LOGICALNZ, IL_OP_WHILE, IL_OP_SWITCH,
    IL_OP_RET_DYN, IL_OP_RET_LOGICALZ, IL_OP_RET_LOGICALNZ,
    IL_DCL_CONST_BUFFER, IL_DCL_INDEXED_TEMP_ARRAY, IL_DCL_INPUT_PRIMITIVE,
    IL_DCL_LITERAL, IL_DCL_MAX_OUTPUT_VERTEX_COUNT, IL_DCL_ODEPTH,
    IL_DCL_OUTPUT_TOPOLOGY, IL_DCL_OUTPUT, IL_DCL_INPUT, IL_DCL_VPRIM,
    IL_DCL_RESOURCE, IL_OP_CUT, IL_OP_DISCARD_LOGICALZ, IL_OP_DISCARD_LOGICALNZ,
    IL_OP_EMIT, IL_OP_EMIT_THEN_CUT, IL_OP_LOAD, IL_OP_RESINFO, IL_OP_SAMPLE,
    IL_OP_SAMPLE_B, IL_OP_SAMPLE_G, IL_OP_SAMPLE_L, IL_OP_SAMPLE_C,
    IL_OP_SAMPLE_C_LZ, IL_OP_I_NOT, IL_OP_I_OR, IL_OP_I_XOR, IL_OP_I_ADD,
    IL_OP_I_MAD, IL_OP_I_MAX, IL_OP_I_MIN, IL_OP_I_MUL, IL_OP_I_MUL_HIGH,
    IL_OP_I_EQ, IL_OP_I_GE, IL_OP_I_LT, IL_OP_I_NEGATE, IL_OP_I_NE, IL_OP_I_SHL,
    IL_OP_I_SHR, IL_OP_U_SHR, IL_OP_U_DIV, IL_OP_U_MOD, IL_OP_U_MAD, IL_OP_U_MAX,
    IL_OP_U_MIN, IL_OP_U_LT, IL_OP_U_GE, IL_OP_U_MUL, IL_OP_U_MUL_HIGH,
    IL_OP_FTOI, IL_OP_FTOU, IL_OP_ITOF, IL_OP_UTOF, IL_OP_AND, IL_OP_CMOV_LOGICAL,
    IL_OP_EQ, IL_OP_EXP_VEC, IL_OP_GE, IL_OP_LOG_VEC, IL_OP_LT, IL_OP_NE,
    IL_OP_ROUND_NEAR, IL_OP_ROUND_NEG_INF, IL_OP_ROUND_PLUS_INF, IL_OP_ROUND_ZERO,
    IL_OP_RSQ_VEC, IL_OP_SIN_VEC, IL_OP_COS_VEC, IL_OP_SQRT_VEC, IL_OP_DP2,
    IL_OP_INV_MOV, IL_OP_SCATTER, IL_OP_D_FREXP, IL_OP_D_ADD, IL_OP_D_MUL,
    IL_OP_D_2_F, IL_OP_F_2_D, IL_OP_D_LDEXP, IL_OP_D_FRAC, IL_OP_D_MULADD,
    IL_OP_FETCH4, IL_OP_SAMPLEINFO,
    IL_OP_GETLOD,                    // the dx10.1 version of lod
    IL_DCL_PERSIST, IL_OP_DNE, IL_OP_DEQ, IL_OP_DGE, IL_OP_DLT, IL_OP_SAMPLEPOS,
    IL_OP_D_DIV, IL_OP_DCL_SHARED_TEMP,
    IL_OP_INIT_SR,                   // a special init inst for 7xx CS
    IL_OP_INIT_SR_HELPER,            // an internal IL OP, only used by SC
    IL_OP_DCL_NUM_THREAD_PER_GROUP,  // thread group = thread block
    IL_OP_DCL_TOTAL_NUM_THREAD_GROUP, IL_OP_DCL_LDS_SIZE_PER_THREAD,
    IL_OP_DCL_LDS_SHARING_MODE,
    IL_OP_LDS_READ_VEC,              // R7xx LDS
    IL_OP_LDS_WRITE_VEC,
    IL_OP_FENCE,                     // R7xx/Evergreen fence
    IL_OP_LDS_LOAD_VEC,              // DX10_CS: DX11 style LDS instruction in vector (4 dwords)
    IL_OP_LDS_STORE_VEC,             // DX10_CS: DX11 style LDS instruction in vector (4 dwords)
    IL_OP_DCL_UAV,                   // Evergreen UAV
    IL_OP_DCL_RAW_UAV, IL_OP_DCL_STRUCT_UAV, IL_OP_UAV_LOAD, IL_OP_UAV_RAW_LOAD,
    IL_OP_UAV_STRUCT_LOAD, IL_OP_UAV_STORE, IL_OP_UAV_RAW_STORE,
    IL_OP_UAV_STRUCT_STORE,
    IL_OP_DCL_ARENA_UAV,             // 8xx/OpenCL Arena UAV
    IL_OP_UAV_ARENA_LOAD, IL_OP_UAV_ARENA_STORE, IL_OP_UAV_ADD, IL_OP_UAV_SUB,
    IL_OP_UAV_RSUB, IL_OP_UAV_MIN, IL_OP_UAV_MAX, IL_OP_UAV_UMIN, IL_OP_UAV_UMAX,
    IL_OP_UAV_AND, IL_OP_UAV_OR, IL_OP_UAV_XOR, IL_OP_UAV_CMP, IL_OP_UAV_READ_ADD,
    IL_OP_UAV_READ_SUB, IL_OP_UAV_READ_RSUB, IL_OP_UAV_READ_MIN, IL_OP_UAV_READ_MAX,
    IL_OP_UAV_READ_UMIN, IL_OP_UAV_READ_UMAX, IL_OP_UAV_READ_AND, IL_OP_UAV_READ_OR,
    IL_OP_UAV_READ_XOR, IL_OP_UAV_READ_XCHG, IL_OP_UAV_READ_CMP_XCHG,
    IL_OP_APPEND_BUF_ALLOC,          // Evergreen append buffer alloc/consume
    IL_OP_APPEND_BUF_CONSUME,
    IL_OP_DCL_RAW_SRV,               // Evergreen SRV
    IL_OP_DCL_STRUCT_SRV, IL_OP_SRV_RAW_LOAD, IL_OP_SRV_STRUCT_LOAD,
    IL_DCL_LDS,                      // Evergreen LDS
    IL_DCL_STRUCT_LDS, IL_OP_LDS_LOAD, IL_OP_LDS_STORE, IL_OP_LDS_ADD, IL_OP_LDS_SUB,
    IL_OP_LDS_RSUB, IL_OP_LDS_MIN, IL_OP_LDS_MAX, IL_OP_LDS_UMIN, IL_OP_LDS_UMAX,
    IL_OP_LDS_AND, IL_OP_LDS_OR, IL_OP_LDS_XOR, IL_OP_LDS_CMP, IL_OP_LDS_READ_ADD,
    IL_OP_LDS_READ_SUB, IL_OP_LDS_READ_RSUB, IL_OP_LDS_READ_MIN, IL_OP_LDS_READ_MAX,
    IL_OP_LDS_READ_UMIN, IL_OP_LDS_READ_UMAX, IL_OP_LDS_READ_AND, IL_OP_LDS_READ_OR,
    IL_OP_LDS_READ_XOR, IL_OP_LDS_READ_XCHG, IL_OP_LDS_READ_CMP_XCHG,
    IL_OP_CUT_STREAM, IL_OP_EMIT_STREAM,
    IL_OP_EMIT_THEN_CUT_STREAM, IL_OP_SAMPLE_C_L, IL_OP_SAMPLE_C_G,
    IL_OP_SAMPLE_C_B, IL_OP_I_COUNTBITS, IL_OP_I_FIRSTBIT, IL_OP_I_CARRY,
    IL_OP_I_BORROW, IL_OP_I_BIT_EXTRACT, IL_OP_U_BIT_EXTRACT, IL_OP_U_BIT_REVERSE,
    IL_DCL_NUM_ICP, IL_DCL_NUM_OCP, IL_DCL_NUM_INSTANCES, IL_OP_HS_CP_PHASE,
    IL_OP_HS_FORK_PHASE, IL_OP_HS_JOIN_PHASE, IL_OP_ENDPHASE, IL_DCL_TS_DOMAIN,
    IL_DCL_TS_PARTITION, IL_DCL_TS_OUTPUT_PRIMITIVE, IL_DCL_MAX_TESSFACTOR,
    IL_OP_DCL_FUNCTION_BODY, IL_OP_DCL_FUNCTION_TABLE, IL_OP_DCL_INTERFACE_PTR,
    IL_OP_FCALL, IL_OP_U_BIT_INSERT, IL_OP_BUFINFO, IL_OP_FETCH4_C,
    IL_OP_FETCH4_PO, IL_OP_FETCH4_PO_C, IL_OP_D_MAX, IL_OP_D_MIN, IL_OP_F_2_F16,
    IL_OP_F16_2_F, IL_OP_UNPACK0, IL_OP_UNPACK1, IL_OP_UNPACK2, IL_OP_UNPACK3,
    IL_OP_BIT_ALIGN, IL_OP_BYTE_ALIGN, IL_OP_U4LERP, IL_OP_SAD, IL_OP_SAD_HI,
    IL_OP_SAD_4, IL_OP_F_2_U4, IL_OP_EVAL_SNAPPED, IL_OP_EVAL_SAMPLE_INDEX,
    IL_OP_EVAL_CENTROID, IL_OP_D_MOV, IL_OP_D_MOVC, IL_OP_D_SQRT, IL_OP_D_RCP,
    IL_OP_D_RSQ, IL_OP_MACRODEF, IL_OP_MACROEND, IL_OP_MACROCALL, IL_DCL_STREAM,
    IL_DCL_GLOBAL_FLAGS, IL_OP_RCP_VEC, IL_OP_LOAD_FPTR,
    IL_DCL_MAX_THREAD_PER_GROUP, IL_OP_PREFIX,
    // GDS
    IL_DCL_GDS, IL_DCL_STRUCT_GDS, IL_OP_GDS_LOAD, IL_OP_GDS_STORE, IL_OP_GDS_ADD,
    IL_OP_GDS_SUB, IL_OP_GDS_RSUB,
    IL_OP_GDS_INC, IL_OP_GDS_DEC, IL_OP_GDS_MIN, IL_OP_GDS_MAX, IL_OP_GDS_UMIN,
    IL_OP_GDS_UMAX, IL_OP_GDS_AND, IL_OP_GDS_OR, IL_OP_GDS_XOR, IL_OP_GDS_MSKOR,
    IL_OP_GDS_CMP_STORE, IL_OP_GDS_READ_ADD, IL_OP_GDS_READ_SUB,
    IL_OP_GDS_READ_RSUB, IL_OP_GDS_READ_INC, IL_OP_GDS_READ_DEC,
    IL_OP_GDS_READ_MIN, IL_OP_GDS_READ_MAX, IL_OP_GDS_READ_UMIN,
    IL_OP_GDS_READ_UMAX, IL_OP_GDS_READ_AND, IL_OP_GDS_READ_OR,
    IL_OP_GDS_READ_XOR, IL_OP_GDS_READ_MSKOR, IL_OP_GDS_READ_XCHG,
    IL_OP_GDS_READ_CMP_XCHG, IL_OP_U_MAD24, IL_OP_U_MUL24, IL_OP_FMA,
    IL_OP_UAV_UINC, IL_OP_UAV_UDEC, IL_OP_I_MAD24, IL_OP_I_MUL24,
    IL_OP_UAV_READ_UINC, IL_OP_UAV_READ_UDEC, IL_OP_LDS_LOAD_BYTE,
    IL_OP_LDS_LOAD_SHORT, IL_OP_LDS_LOAD_UBYTE, IL_OP_LDS_LOAD_USHORT,
    IL_OP_LDS_STORE_BYTE, IL_OP_LDS_STORE_SHORT, IL_OP_UAV_BYTE_LOAD,
    IL_OP_UAV_SHORT_LOAD, IL_OP_UAV_UBYTE_LOAD, IL_OP_UAV_USHORT_LOAD,
    IL_OP_UAV_BYTE_STORE, IL_OP_UAV_SHORT_STORE, IL_OP_I64_ADD, IL_OP_I64_EQ,
    IL_OP_I64_GE, IL_OP_I64_LT, IL_OP_I64_MAX, IL_OP_I64_MIN, IL_OP_I64_NE,
    IL_OP_I64_NEGATE, IL_OP_I64_SHL, IL_OP_I64_SHR, IL_OP_U64_GE, IL_OP_U64_LT,
    IL_OP_U64_MAX, IL_OP_U64_MIN, IL_OP_U64_SHR, IL_OP_DCL_TYPED_UAV,
    IL_OP_DCL_TYPELESS_UAV, IL_OP_I_MUL24_HIGH, IL_OP_U_MUL24_HIGH,
    IL_OP_LDS_INC, IL_OP_LDS_DEC, IL_OP_LDS_READ_INC, IL_OP_LDS_READ_DEC,
    IL_OP_LDS_MSKOR,
    IL_OP_LDS_READ_MSKOR, IL_OP_F_2_F16_NEAR, IL_OP_F_2_F16_NEG_INF,
    IL_OP_F_2_F16_PLUS_INF, IL_OP_I64_MUL, IL_OP_U64_MUL, IL_OP_LDEXP,
    IL_OP_FREXP_EXP, IL_OP_FREXP_MANT, IL_OP_D_FREXP_EXP, IL_OP_D_FREXP_MANT,
    IL_OP_DTOI, IL_OP_DTOU, IL_OP_ITOD, IL_OP_UTOD, IL_OP_FTOI_RPI, IL_OP_FTOI_FLR,
    IL_OP_MIN3, IL_OP_MED3, IL_OP_MAX3, IL_OP_I_MIN3, IL_OP_I_MED3, IL_OP_I_MAX3,
    IL_OP_U_MIN3, IL_OP_U_MED3, IL_OP_U_MAX3, IL_OP_CLASS, IL_OP_D_CLASS,
    IL_OP_SAMPLE_RETURN_CODE, IL_OP_CU_ID, IL_OP_WAVE_ID, IL_OP_I64_SUB,
    IL_OP_I64_DIV, IL_OP_U64_DIV, IL_OP_I64_MOD, IL_OP_U64_MOD,
    IL_DCL_GWS_THREAD_COUNT, IL_DCL_SEMAPHORE, IL_OP_SEMAPHORE_INIT,
    IL_OP_SEMAPHORE_SIGNAL, IL_OP_SEMAPHORE_WAIT, IL_OP_DIV_SCALE,
    IL_OP_DIV_FMAS, IL_OP_DIV_FIXUP, IL_OP_D_DIV_SCALE, IL_OP_D_DIV_FMAS,
    IL_OP_D_DIV_FIXUP, IL_OP_D_TRIG_PREOP, IL_OP_MSAD_U8, IL_OP_QSAD_U8,
    IL_OP_MQSAD_U8,
    IL_OP_LAST /* dimension the enumeration */
};

// \todo remove this define once sc is promoted to dxx and dx=>il has been updated.
#define IL_OP_I_BIT_INSERT IL_OP_U_BIT_INSERT

// comments in this enum will be added to the il spec
enum ILRegType
{
    IL_REGTYPE_CONST_BOOL, // single bit boolean constant
    IL_REGTYPE_CONST_FLOAT, IL_REGTYPE_CONST_INT, IL_REGTYPE_ADDR, IL_REGTYPE_TEMP,
    IL_REGTYPE_VERTEX, IL_REGTYPE_INDEX, IL_REGTYPE_OBJECT_INDEX,
    IL_REGTYPE_BARYCENTRIC_COORD, IL_REGTYPE_PRIMITIVE_INDEX,
    IL_REGTYPE_QUAD_INDEX, IL_REGTYPE_VOUTPUT, IL_REGTYPE_PINPUT, IL_REGTYPE_SPRITE,
    IL_REGTYPE_POS, IL_REGTYPE_INTERP, IL_REGTYPE_FOG, IL_REGTYPE_TEXCOORD,
    IL_REGTYPE_PRICOLOR, IL_REGTYPE_SECCOLOR, IL_REGTYPE_SPRITECOORD,
    IL_REGTYPE_FACE, IL_REGTYPE_WINCOORD, IL_REGTYPE_PRIMCOORD, IL_REGTYPE_PRIMTYPE,
    IL_REGTYPE_PCOLOR, IL_REGTYPE_DEPTH, IL_REGTYPE_STENCIL, IL_REGTYPE_CLIP,
    IL_REGTYPE_VPRIM, IL_REGTYPE_ITEMP, IL_REGTYPE_CONST_BUFF, IL_REGTYPE_LITERAL,
    IL_REGTYPE_INPUT, IL_REGTYPE_OUTPUT, IL_REGTYPE_IMMED_CONST_BUFF,
    IL_REGTYPE_OMASK, IL_REGTYPE_PERSIST, IL_REGTYPE_GLOBAL, IL_REGTYPE_PS_OUT_FOG,
    IL_REGTYPE_SHARED_TEMP,
    IL_REGTYPE_THREAD_ID_IN_GROUP,       // 3-dim
    IL_REGTYPE_THREAD_ID_IN_GROUP_FLAT,  // 1-dim
    IL_REGTYPE_ABSOLUTE_THREAD_ID,       // 3-dim
    IL_REGTYPE_ABSOLUTE_THREAD_ID_FLAT,  // 1-dim
    IL_REGTYPE_THREAD_GROUP_ID,          // 3-dim
    IL_REGTYPE_THREAD_GROUP_ID_FLAT,     // 1-dim
    IL_REGTYPE_GENERIC_MEM,              // generic memory type, used w/mask of lds_write
    IL_REGTYPE_INPUTCP,                  // tessellation, input control-point register
    IL_REGTYPE_PATCHCONST,               // tessellation, patch constants
    IL_REGTYPE_DOMAINLOCATION,           // domain shader, domain location
    IL_REGTYPE_OUTPUTCP,                 // tessellation, output control-point register
    IL_REGTYPE_OCP_ID,                   // tessellation, output control-point id
    IL_REGTYPE_SHADER_INSTANCE_ID,       // tessellation hs fork/join instance id or gs instance id
    IL_REGTYPE_THIS,
    IL_REGTYPE_EDGEFLAG,                 // edge flag
    IL_REGTYPE_DEPTH_LE,                 // dx11 conservative depth guaranteed to be <= raster depth
    IL_REGTYPE_DEPTH_GE,                 // dx11 conservative depth guaranteed to be >= raster depth
    IL_REGTYPE_INPUT_COVERAGE_MASK,      // dx11 ps input coverage mask
    IL_REGTYPE_TIMER,
    IL_REGTYPE_LINE_STIPPLE,             // Evergreen+, anti-aliased line stipple
    IL_REGTYPE_INPUT_ARG,                // macro processor input
    IL_REGTYPE_OUTPUT_ARG,               // macro processor output
    IL_REGTYPE_LAST,                     // Must be < 64, we only have 6 bits for encoding the reg type
                                         // SoftIL requires IL_REGTYPE_LAST <= 63
                                         // We cannot add any more IL_REGTYPE without using extended field
};

enum ILMatrix
{
    IL_MATRIX_4X4, IL_MATRIX_4X3, IL_MATRIX_3X4, IL_MATRIX_3X3, IL_MATRIX_3X2,
    IL_MATRIX_LAST
};

/* None */
/* Divide the x component by y */
/* Divide the x and y components by z */
/* Divide the x, y, and z components by w */
/* Divide each component by the value of AS_**** */
enum ILZeroOp
{
    IL_ZEROOP_FLTMAX, IL_ZEROOP_0, IL_ZEROOP_INFINITY, IL_ZEROOP_INF_ELSE_MAX,
    IL_ZEROOP_LAST /* dimension the enumeration */
};

enum ILModDstComponent
{
    IL_MODCOMP_NOWRITE, // do not write this component
    IL_MODCOMP_WRITE,   // write the result to this component
    IL_MODCOMP_0,       // force the component to float 0.0
    IL_MODCOMP_1,       // force the component to float 1.0
    IL_MODCOMP_LAST     /* dimension the enumeration */
};

enum ILComponentSelect
{
    IL_COMPSEL_X_R, // select the 1st component (x/red) for the channel.
    IL_COMPSEL_Y_G, // select the 2nd component (y/green) for the channel.
    IL_COMPSEL_Z_B, // select the 3rd component (z/blue) for the channel.
    IL_COMPSEL_W_A, // select the 4th component (w/alpha) for the channel.
    IL_COMPSEL_0,   // Force this channel to 0.0
    IL_COMPSEL_1,   // Force this channel to 1.0
    IL_COMPSEL_LAST /* dimension the enumeration */
};

enum ILShiftScale
{
    IL_SHIFT_NONE = 0, IL_SHIFT_X2, IL_SHIFT_X4, IL_SHIFT_X8,
    IL_SHIFT_D2, IL_SHIFT_D4, IL_SHIFT_D8, IL_SHIFT_LAST
};

enum ILRelOp
{
    IL_RELOP_NE, IL_RELOP_EQ, IL_RELOP_GE, IL_RELOP_GT, IL_RELOP_LE, IL_RELOP_LT,
    IL_RELOP_LAST
};
enum ILDefaultVal
{
    IL_DEFVAL_NONE = 0, IL_DEFVAL_0, IL_DEFVAL_1,
    IL_DEFVAL_LAST /* dimension the enumeration */
};
enum ILImportComponent
{
    IL_IMPORTSEL_UNUSED = 0, IL_IMPORTSEL_DEFAULT0, IL_IMPORTSEL_DEFAULT1,
    IL_IMPORTSEL_UNDEFINED,
    IL_IMPORTSEL_LAST /* dimension the enum */
};

enum ILImportUsage
{
    IL_IMPORTUSAGE_POS = 0, IL_IMPORTUSAGE_POINTSIZE, IL_IMPORTUSAGE_COLOR,
    IL_IMPORTUSAGE_BACKCOLOR, IL_IMPORTUSAGE_FOG,
    IL_IMPORTUSAGE_PIXEL_SAMPLE_COVERAGE, IL_IMPORTUSAGE_GENERIC,
    IL_IMPORTUSAGE_CLIPDISTANCE, IL_IMPORTUSAGE_CULLDISTANCE,
    IL_IMPORTUSAGE_PRIMITIVEID, IL_IMPORTUSAGE_VERTEXID, IL_IMPORTUSAGE_INSTANCEID,
    IL_IMPORTUSAGE_ISFRONTFACE, IL_IMPORTUSAGE_LOD, IL_IMPORTUSAGE_COLORING,
    IL_IMPORTUSAGE_NODE_COLORING, IL_IMPORTUSAGE_NORMAL,
    IL_IMPORTUSAGE_RENDERTARGET_ARRAY_INDEX, IL_IMPORTUSAGE_VIEWPORT_ARRAY_INDEX,
    IL_IMPORTUSAGE_UNDEFINED, IL_IMPORTUSAGE_SAMPLE_INDEX,
    IL_IMPORTUSAGE_EDGE_TESSFACTOR, IL_IMPORTUSAGE_INSIDE_TESSFACTOR,
    IL_IMPORTUSAGE_DETAIL_TESSFACTOR, IL_IMPORTUSAGE_DENSITY_TESSFACTOR,
    IL_IMPORTUSAGE_LAST /* dimension the enum */
};

enum ILCmpVal
{
    IL_CMPVAL_0_0 = 0,  /* compare vs. 0.0 */
    IL_CMPVAL_0_5,      /* compare vs. 0.5 */
    IL_CMPVAL_1_0,      /* compare vs. 1.0 */
    IL_CMPVAL_NEG_0_5,  /* compare vs. -0.5 */
    IL_CMPVAL_NEG_1_0,  /* compare vs. -1.0 */
    IL_CMPVAL_LAST      /* dimension the enumeration */
};
// dependent upon this table:
// - dx10interpreter.cpp - table that maps dx names to il
// - iltables.cpp - il_pixtex_props_table
enum ILPixTexUsage
{
    // dx9 only il allows 0-7 values for dclpt
    IL_USAGE_PIXTEX_UNKNOWN = 0, IL_USAGE_PIXTEX_1D, IL_USAGE_PIXTEX_2D,
    IL_USAGE_PIXTEX_3D, IL_USAGE_PIXTEX_CUBEMAP, IL_USAGE_PIXTEX_2DMSAA,
    // dx10 formats after this point
    IL_USAGE_PIXTEX_4COMP, IL_USAGE_PIXTEX_BUFFER, IL_USAGE_PIXTEX_1DARRAY,
    IL_USAGE_PIXTEX_2DARRAY, IL_USAGE_PIXTEX_2DARRAYMSAA,
    IL_USAGE_PIXTEX_2D_PLUS_W,       // Pele hardware feature
    IL_USAGE_PIXTEX_CUBEMAP_PLUS_W,  // Pele hardware feature
    // dx 10.1
    IL_USAGE_PIXTEX_CUBEMAP_ARRAY,
    IL_USAGE_PIXTEX_LAST /* dimension the enumeration */
};

enum ILTexCoordMode
{
    IL_TEXCOORDMODE_UNKNOWN = 0, IL_TEXCOORDMODE_NORMALIZED,
    IL_TEXCOORDMODE_UNNORMALIZED,
    IL_TEXCOORDMODE_LAST /* dimension the enumeration */
};

enum ILElementFormat
{
    IL_ELEMENTFORMAT_UNKNOWN = 0, IL_ELEMENTFORMAT_SNORM, IL_ELEMENTFORMAT_UNORM,
    IL_ELEMENTFORMAT_SINT, IL_ELEMENTFORMAT_UINT, IL_ELEMENTFORMAT_FLOAT,
    IL_ELEMENTFORMAT_SRGB, IL_ELEMENTFORMAT_MIXED, IL_ELEMENTFORMAT_LAST
};

enum ILTexShadowMode
{
    IL_TEXSHADOWMODE_NEVER = 0, IL_TEXSHADOWMODE_Z, IL_TEXSHADOWMODE_UNKNOWN,
    IL_TEXSHADOWMODE_LAST /* dimension the enumeration */
};

enum ILTexFilterMode
{
    IL_TEXFILTER_UNKNOWN = 0, IL_TEXFILTER_POINT, IL_TEXFILTER_LINEAR,
    IL_TEXFILTER_ANISO,
    IL_TEXFILTER_LAST /* dimension the enumeration */
};

enum ILMipFilterMode
{
    IL_MIPFILTER_UNKNOWN = 0, IL_MIPFILTER_POINT, IL_MIPFILTER_LINEAR,
    IL_MIPFILTER_BASE,
    IL_MIPFILTER_LAST /* dimension the enumeration */
};

enum ILAnisoFilterMode
{
    IL_ANISOFILTER_UNKNOWN = 0, IL_ANISOFILTER_DISABLED,
    IL_ANISOFILTER_MAX_1_TO_1, IL_ANISOFILTER_MAX_2_TO_1,
    IL_ANISOFILTER_MAX_4_TO_1, IL_ANISOFILTER_MAX_8_TO_1,
    IL_ANISOFILTER_MAX_16_TO_1,
    IL_ANISOFILTER_LAST /* dimension the enumeration */
};

enum ILNoiseType
{
    IL_NOISETYPE_PERLIN1D = 0, IL_NOISETYPE_PERLIN2D, IL_NOISETYPE_PERLIN3D,
    IL_NOISETYPE_PERLIN4D,
    IL_NOISETYPE_LAST /* dimension the enumeration */
};

enum ILAddressing
{
    IL_ADDR_ABSOLUTE = 0, IL_ADDR_RELATIVE, IL_ADDR_REG_RELATIVE,
    IL_ADDR_LAST /* dimension the enumeration */
};

enum ILInterpMode
{
    IL_INTERPMODE_NOTUSED = 0, IL_INTERPMODE_CONSTANT, IL_INTERPMODE_LINEAR,
    IL_INTERPMODE_LINEAR_CENTROID, IL_INTERPMODE_LINEAR_NOPERSPECTIVE,
    IL_INTERPMODE_LINEAR_NOPERSPECTIVE_CENTROID, IL_INTERPMODE_LINEAR_SAMPLE,
    IL_INTERPMODE_LINEAR_NOPERSPECTIVE_SAMPLE,
    IL_INTERPMODE_LAST /* dimension the enumeration */
};

// types for scatter
enum IL_SCATTER
{
    IL_SCATTER_BY_PIXEL, IL_SCATTER_BY_QUAD
};

// types that can be input to a geometry shader
enum IL_TOPOLOGY
{
    IL_TOPOLOGY_POINT, IL_TOPOLOGY_LINE, IL_TOPOLOGY_TRIANGLE,
    IL_TOPOLOGY_LINE_ADJ, IL_TOPOLOGY_TRIANGLE_ADJ,
    IL_TOPOLOGY_PATCH1, IL_TOPOLOGY_PATCH2, IL_TOPOLOGY_PATCH3, IL_TOPOLOGY_PATCH4,
    IL_TOPOLOGY_PATCH5, IL_TOPOLOGY_PATCH6, IL_TOPOLOGY_PATCH7, IL_TOPOLOGY_PATCH8,
    IL_TOPOLOGY_PATCH9, IL_TOPOLOGY_PATCH10, IL_TOPOLOGY_PATCH11, IL_TOPOLOGY_PATCH12,
    IL_TOPOLOGY_PATCH13, IL_TOPOLOGY_PATCH14, IL_TOPOLOGY_PATCH15, IL_TOPOLOGY_PATCH16,
    IL_TOPOLOGY_PATCH17, IL_TOPOLOGY_PATCH18, IL_TOPOLOGY_PATCH19, IL_TOPOLOGY_PATCH20,
    IL_TOPOLOGY_PATCH21, IL_TOPOLOGY_PATCH22, IL_TOPOLOGY_PATCH23, IL_TOPOLOGY_PATCH24,
    IL_TOPOLOGY_PATCH25, IL_TOPOLOGY_PATCH26, IL_TOPOLOGY_PATCH27, IL_TOPOLOGY_PATCH28,
    IL_TOPOLOGY_PATCH29, IL_TOPOLOGY_PATCH30, IL_TOPOLOGY_PATCH31, IL_TOPOLOGY_PATCH32,
    IL_TOPOLOGY_LAST /* dimension the enumeration */
};

enum IL_OUTPUT_TOPOLOGY
{
    IL_OUTPUT_TOPOLOGY_POINT_LIST, IL_OUTPUT_TOPOLOGY_LINE_STRIP,
    IL_OUTPUT_TOPOLOGY_TRIANGLE_STRIP,
    IL_OUTPUT_TOPOLOGY_LAST /* dimension the enumeration */
};

// for R7xx Compute shader
enum IL_LDS_SHARING_MODE
{
    IL_LDS_SHARING_MODE_RELATIVE = 0, // for wavefront_rel
    IL_LDS_SHARING_MODE_ABSOLUTE,     // for wavefront_abs
    IL_LDS_SHARING_MODE_LAST
};

// for OpenCL Arena UAV load/store and others
enum IL_LOAD_STORE_DATA_SIZE
{
    IL_LOAD_STORE_DATA_SIZE_DWORD = 0, // dword, 32 bits
    IL_LOAD_STORE_DATA_SIZE_SHORT,     // short, 16 bits
    IL_LOAD_STORE_DATA_SIZE_BYTE,      // byte, 8 bits
    IL_LOAD_STORE_DATA_SIZE_LAST
};

enum IL_UAV_ACCESS_TYPE
{
    IL_UAV_ACCESS_TYPE_RW = 0, IL_UAV_ACCESS_TYPE_RO, IL_UAV_ACCESS_TYPE_WO,
    IL_UAV_ACCESS_TYPE_PRIVATE, IL_UAV_ACCESS_TYPE_LAST
};

// type of firstbit
enum IL_FIRSTBIT_TYPE
{
    IL_FIRSTBIT_TYPE_LOW_UINT, IL_FIRSTBIT_TYPE_HIGH_UINT, IL_FIRSTBIT_TYPE_HIGH_INT
};

enum ILTsDomain
{
    IL_TS_DOMAIN_ISOLINE = 0, IL_TS_DOMAIN_TRI = 1, IL_TS_DOMAIN_QUAD = 2,
    IL_TS_DOMAIN_LAST,
};

enum ILTsPartition
{
    IL_TS_PARTITION_INTEGER, IL_TS_PARTITION_POW2, IL_TS_PARTITION_FRACTIONAL_ODD,
    IL_TS_PARTITION_FRACTIONAL_EVEN, IL_TS_PARTITION_LAST,
};

enum ILTsOutputPrimitive
{
    IL_TS_OUTPUT_POINT, IL_TS_OUTPUT_LINE, IL_TS_OUTPUT_TRIANGLE_CW,
    IL_TS_OUTPUT_TRIANGLE_CCW, IL_TS_OUTPUT_LAST,
};
// Granularity of gradient
enum IL_DERIV_GRANULARITY
{
    IL_DERIVE_COARSE = 0,
    IL_DERIVE_FINE = 0x80
};

enum IL_IEEE_CONTROL
{
    IL_IEEE_IGNORE = 0,
    IL_IEEE_PRECISE = 0x1
};

enum IL_UAV_READ_TYPE
{
    IL_UAV_SC_DECIDE = 0,
    IL_UAV_FORCE_CACHED = 1,
    IL_UAV_FORCE_UNCACHED = 2
};

#ifdef __cplusplus
}
#endif
IL_OP_CONTINUE IL_OP_LOD All integer opcodes. Fetch opcodes are not supported in vertex shaders.
Table C-1
IL_DCL_CONST_BUFFER IL_DCL_GDS IL_DCL_GLOBAL_FLAGS IL_DCL_INDEXED_TEMP_ARRAY IL_DCL_INPUT IL_DCL_INPUT_PRIMITIVE IL_DCL_LDS IL_DCL_LITERAL IL_DCL_MAX_OUTPUT_VERTEX_COUNT IL_DCL_MAX_TESSFACTOR IL_DCL_NUM_ICP IL_DCL_NUM_INSTANCES IL_DCL_NUM_OCP IL_DCL_ODEPTH IL_DCL_OUTPUT IL_DCL_OUTPUT_TOPOLOGY IL_DCL_RESOURCE IL_DCL_STREAM IL_DCL_STRUCT_GDS IL_DCL_STRUCT_LDS IL_DCL_TS_DOMAIN IL_DCL_TS_OUTPUT_PRIMITIVE
IL_OP_RESINFO IL_OP_SAMPLE IL_OP_SAMPLE_B IL_OP_SAMPLE_C IL_OP_SAMPLE_C_B IL_OP_SAMPLE_C_G IL_OP_SAMPLE_C_L IL_OP_SAMPLE_C_LZ IL_OP_SAMPLE_G IL_OP_SAMPLE_L
IL_OP_FETCH4_PO_C
IL_OP_GDS_ADD
IL_OP_GDS_AND IL_OP_GDS_CMP_STORE IL_OP_GDS_DEC IL_OP_GDS_INC IL_OP_GDS_MAX IL_OP_GDS_MIN IL_OP_GDS_MSKOR IL_OP_GDS_OR IL_OP_GDS_READ_ADD IL_OP_GDS_READ_AND IL_OP_GDS_READ_CMP_XCHG IL_OP_GDS_READ_DEC IL_OP_GDS_READ_INC IL_OP_GDS_READ_MAX IL_OP_GDS_READ_MIN IL_OP_GDS_READ_MSKOR IL_OP_GDS_READ_OR IL_OP_GDS_READ_RSUB IL_OP_GDS_READ_SUB
ShaderModel Restrictions
IL_OP_LDS_ADD IL_OP_LDS_AND IL_OP_LDS_CMP IL_OP_LDS_DEC IL_OP_LDS_INC IL_DCL_TS_PARTITION IL_OP_LDS_LOAD IL_OP_LDS_LOAD_BYTE IL_OP_LDS_LOAD_SHORT IL_OP_LDS_LOAD_UBYTE IL_OP_LDS_LOAD_USHORT IL_DCL_VPRIM IL_OP_APPEND_BUF_ALLOC IL_OP_APPEND_BUF_CONSUME IL_OP_BUFINFO IL_OP_CUT IL_OP_CUT_STREAM IL_OP_LDS_LOAD_VEC IL_OP_LDS_MAX IL_OP_LDS_MIN IL_OP_LDS_OR IL_OP_LDS_READ_ADD IL_OP_LDS_READ_AND
IL_OP_UAV_LOAD
IL_OP_DCL_ARENA_UAV
IL_OP_DCL_FUNCTION_BODY IL_OP_DCL_FUNCTION_TABLE IL_OP_DCL_INTERFACE_PTR IL_OP_DCL_LDS_SHARING_MODE IL_OP_DCL_LDS_SIZE_PER_THREAD IL_OP_DCL_NUM_THREAD_PER_GROUP IL_OP_DCL_RAW_SRV IL_OP_DCL_RAW_UAV IL_OP_DCL_SHARED_TEMP IL_OP_DCL_STRUCT_SRV IL_OP_DCL_STRUCT_UAV
IL_OP_LDS_READ_MAX IL_OP_LDS_READ_MIN IL_OP_LDS_READ_OR IL_OP_LDS_READ_RSUB IL_OP_LDS_READ_SUB IL_OP_LDS_READ_UMAX IL_OP_LDS_READ_UMIN IL_OP_LDS_READ_VEC IL_OP_LDS_READ_XCHG IL_OP_LDS_READ_XOR IL_OP_LDS_RSUB
IL_OP_UAV_READ_CMP_XCHG IL_OP_UAV_READ_MAX IL_OP_UAV_READ_MIN IL_OP_UAV_READ_OR IL_OP_UAV_READ_RSUB IL_OP_UAV_READ_SUB IL_OP_UAV_READ_UMAX IL_OP_UAV_READ_UMIN IL_OP_UAV_READ_XCHG IL_OP_UAV_READ_XOR IL_OP_UAV_RSUB IL_OP_UAV_STORE
IL_OP_DCL_TOTAL_NUM_THREAD_GROUP IL_OP_LDS_STORE IL_OP_LDS_STORE_BYTE IL_OP_LDS_STORE_SHORT IL_OP_DCL_UAV IL_OP_DISCARD_LOGICALNZ IL_OP_DISCARD_LOGICALZ IL_OP_LDS_STORE_VEC IL_OP_LDS_SUB IL_OP_LDS_UMAX
Table C-2
IL_REGTYPE_ABSOLUTE_THREAD_ID_FLAT IL_REGTYPE_OMASK IL_REGTYPE_CONST_BUFF IL_REGTYPE_DEPTH_GE IL_REGTYPE_DEPTH_LE IL_REGTYPE_DOMAINLOCATION IL_REGTYPE_GENERIC_MEM IL_REGTYPE_IMMED_CONST_BUFF IL_REGTYPE_INPUT IL_REGTYPE_INPUT_COVERAGE_MASK IL_REGTYPE_INPUTCP IL_REGTYPE_ITEMP IL_REGTYPE_LITERAL IL_REGTYPE_OUTPUT IL_REGTYPE_OUTPUTCP IL_REGTYPE_PATCHCONST IL_REGTYPE_SHADER_INSTANCE_ID IL_REGTYPE_SHARED_TEMP IL_REGTYPE_THIS IL_REGTYPE_THREAD_GROUP_ID IL_REGTYPE_THREAD_GROUP_ID_FLAT IL_REGTYPE_THREAD_ID_IN_GROUP IL_REGTYPE_THREAD_ID_IN_GROUP_FLAT IL_REGTYPE_TIMER
Table C-3
IL_OP_COLORCLAMP IL_OP_DCLVOUT IL_OP_DCLDEF IL_OP_DCLPI IL_OP_DCLPIN IL_OP_DCLPP IL_OP_DCLPT IL_OP_DCLV IL_OP_DEF IL_OP_DEFB IL_OP_DET IL_OP_DIST IL_OP_MEMEXPORT IL_OP_MEMIMPORT
Table C-4
DCL_NUM_THREAD_PER_GROUP and DCL_MAX_THREAD_PER_GROUP cannot be used in the same program.
DCL_STREAM, EMIT_STREAM, CUT_STREAM, and EMIT_THEN_CUT_STREAM are not compatible with EMIT, CUT, and EMIT_THEN_CUT.
Scatter operations and UAV operations cannot be used in the same program.
DCLPI and DCLV are not compatible with DCLPIN and DCLVOUT.
Instruction Restrictions
Instruction Name | Valid Devices | Page Number
BREAK | Valid for all GPUs. | page 7-13
BREAKC | Valid for all GPUs. | page 7-13
BREAK_LOGICALNZ, BREAK_LOGICALZ | Valid for all GPUs. | page 7-14
CALL | Valid for all GPUs. | page 7-14
CALLNZ | Valid for all GPUs. | page 7-15
CALL_LOGICALNZ | Valid for all GPUs. | page 7-16
CALL_LOGICALZ | Valid for all GPUs. | page 7-17
CASE | Valid for R6XX GPUs and later. | page 7-18
CONTINUE | Valid for R6XX GPUs and later. | page 7-18
CONTINUEC | Valid for R6XX GPUs and later. | page 7-19
CONTINUE_LOGICALNZ, CONTINUE_LOGICALZ | Valid for R6XX GPUs and later. | page 7-19
DEFAULT | Valid for R6XX GPUs and later. | page 7-19
ELSE | Valid for all GPUs. | page 7-20
END | Valid for all GPUs. | page 7-21
ENDFUNC | Valid for all GPUs. | page 7-21
ENDIF | Valid for all GPUs. | page 7-22
ENDLOOP | Valid for all GPUs. | page 7-22
ENDMAIN | Valid for all GPUs. | page 7-23
ENDPHASE | Valid for 8XX GPUs and later. | page 7-23
ENDSWITCH | Valid for R6XX GPUs and later. | page 7-24
FUNC | Valid for all GPUs. | page 7-24
HS_CP_PHASE | Valid for Evergreen GPUs and later. | page 7-25
HS_FORK_PHASE | Valid for Evergreen GPUs and later. | page 7-25
HS_JOIN_PHASE | Valid for Evergreen GPUs and later. | page 7-26
IF_LOGICALNZ, IF_LOGICALZ | Valid for R6XX GPUs and later. | page 7-26
IFC | Valid for all GPUs. | page 7-27
IFNZ | Valid for all GPUs. | page 7-27
LOOP | Valid for all GPUs. | page 7-28
RET | Valid for all GPUs. | page 7-29
Instruction Name
Valid Devices Valid for R600 GPUs and later. Valid for R6XX GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later.
Page Number
page 7-29 page 7-30 page 7-30 page 7-31 page 7-32 page 7-33
DCL_GLOBAL_FLAGS flag1 | flag2 | ...: The refactoring_allowed parameter is valid for R6XX GPUs and later.
The force_early_depth_stencil parameter is valid for Evergreen GPUs and later. The enable_raw_structured_buffers parameter is valid for Evergreen GPUs and later. The enable_double_precision_float_ops parameter is valid for Evergreen GPUs and later.
Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid only for R7XX GPUs. Valid only for R7XX GPUs. Valid for R600 GPUs and later.
page 7-34 page 7-35 page 7-37 page 7-37 page 7-38 page 7-39 page 7-40 page 7-41 page 7-41 page 7-42 page 7-42 page 7-43 page 7-43 page 7-44 page 7-45 page 7-46 page 7-46 page 7-47 page 7-48 page 7-49 page 7-50 page 7-50 page 7-51 page 7-51 page 7-52 page 7-52 page 7-53
DCL_MAX_OUTPUT_VERTEX_COUNT | Valid for R600 GPUs and later.
DCL_MAX_TESSFACTOR | Valid for Evergreen GPUs and later.
DCL_NUM_ICP | Valid for Evergreen GPUs and later.
DCL_NUM_INSTANCES | Valid for Evergreen GPUs and later.
DCL_NUM_OCP | Valid for Evergreen GPUs and later.
DCL_ODEPTH | Valid for R600 GPUs and later.
DCL_OUTPUT | Valid for R600 GPUs and later.
DCL_OUTPUT_TOPOLOGY | Valid for R600 GPUs and later.
DCL_PERSISTENT | Valid for R670 GPUs only.
DCL_RESOURCE | Valid for R600 and later; valid for shader model 4 (SM4) and later.
DCL_SHARED_TEMP | Valid for R700 GPUs and later.
DCL_STREAM | Valid for Evergreen GPUs and later.
DCL_TOTAL_NUM_THREAD_GROUP | Valid for R7XX GPUs and later.
DCL_TS_DOMAIN | Valid for Evergreen GPUs and later.
DCL_TS_OUTPUT_PRIMITIVE | Valid for RXX GPUs and later.
DCL_TS_PARTITION | Valid for Evergreen GPUs and later.
DCL_VPRIM | Valid for R600 GPUs and later.
DCLARRAY | Valid for all GPUs.
DCLDEF | Valid for all GPUs.
Instruction Name
Valid Devices Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for R7XX GPUs and later. Valid for all GPUs. Valid for Evergreen GPUs and later. Valid for R600 GPUs and later. Valid for Evergreen GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for Evergreen GPUs and later. Valid for R600 GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R7XX GPUs and later.
Page Number
DCLPI DCLPIN DCLPP DCLPT DCLV DCLVOUT DEF DEFB INIT_SHARED_REGISTERS INITV
Input/Output Instructions
page 7-54 page 7-56 page 7-58 page 7-59 page 7-60 page 7-62 page 7-64 page 7-65 page 7-66 page 7-67 page 7-68 page 7-68 page 7-69 page 7-69 page 7-70 page 7-70 page 7-71 page 7-71 page 7-72 page 7-72 page 7-73 page 7-74
BUFINFO CUT CUT_STREAM DISCARD_LOGICALNZ, DISCARD_LOGICALZ EMIT EMIT_STREAM EMIT_THEN_CUT EMIT_THEN_CUT_STREAM EVAL_CENTROID EVAL_SAMPLE_INDEX EVAL_SNAPPED FENCE FETCH4 FETCH4_PO_C FETCH4C FETCH4po KILL LDS_READ_VEC LDS_WRITE_VEC LOAD LOAD_FPTR LOD MEMEXPORT MEMIMPORT
The first syntax example is valid for R5XX GPUs and later (page 7-76). The second is valid for Evergreen GPUs and later. The second also supports indexing. Valid for Evergreen GPUs and later. The second syntax example supports indexing. Valid for Evergreen GPUs and later. The second syntax example supports indexing. Valid for Evergreen GPUs and later. The second syntax example supports indexing. Valid for all GPUs. Valid only for R7XX GPUs. Valid for R7XX GPUs only.
page 7-77 page 7-78 page 7-79 page 7-80 page 7-81 page 7-82
The first syntax example is valid for R600 GPUs and later (page 7-83). The second is valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R600 GPUs and later. Valid for R670 GPUs and later. Valid for R670 GPUs and later.
Instruction Name
Valid Devices
Page Number
RESINFO SAMPLE
The first syntax example is valid for R600 GPUs and later (page 7-89). The second is valid for Evergreen GPUs and later. The first syntax example above is valid for R600 GPUs and later. The second is valid for Evergreen GPUs and later. The second syntax example supports indexing. The first syntax example above is valid for R600 GPUs and later. The second is valid for Evergreen GPUs and later. The second one also supports indexing.
page 7-90
SAMPLE_B
page 7-92
SAMPLE_C
The first syntax example is valid for R600 GPUs and later (page 7-93). The second is valid for Evergreen GPUs and later. The second also supports indexing. Source src0 is the index; source src1.x has the reference value. The first syntax example is valid for R600 GPUs and later (page 7-94). The second is valid for Evergreen GPUs and later. The second also supports indexing. The first syntax example is valid for R600 GPUs and later (page 7-95). The second is valid for Evergreen GPUs and later. The second also supports indexing. For R7xx and later GPUs, this instruction works on all resource types other than buffers. For R6xx GPUs, this instruction works on non-array resource types other than buffers. This instruction produces undefined results if it is used on an unsupported format. The first syntax example is valid for R600 GPUs and later (page 7-96). The second is valid for Evergreen GPUs and later. The second also supports indexing. The first syntax example is valid for R600 GPUs and later (page 7-97). The second is valid for Evergreen GPUs and later. The second also supports indexing. The first syntax example (not indexed) is valid for R600 GPUs and later (page 7-98). The second is valid for Evergreen GPUs and later. The second also supports indexing. The first syntax example is valid for R600 GPUs and later (page 7-99). The second is valid for Evergreen GPUs and later. The second also supports indexing. The first syntax example above is valid for R670 GPUs and later (page 7-100). The second syntax example above is valid for Evergreen GPUs and later. The first syntax example above is valid for R670 GPUs and later (page 7-101). The second syntax example above is valid for Evergreen GPUs and later. Valid for R520 GPUs only. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for R6XX GPUs and later.
SAMPLE_C_B SAMPLE_C_G
SAMPLE_L SAMPLEINFO
SAMPLEPOS
page 7-102 page 7-103 page 7-106 page 7-110 page 7-113
Instruction Name
Valid Devices Valid for all GPUs. Valid for Evergreen GPUs and later.
Page Number
TEXWEIGHT
Integer Arithmetic Instructions
I64_ADD I64EQ, I64GE, I64LT, I64NE I64MAX, I64MIN I64NEGATE I64SHL, I64SHR IADD IAND IBORROW ICARRY IEQ, IGE, ILT, INE IMAD IMAD24 IMAX, IMIN IMUL IMUL_HIGH IMUL24 IMUL24_HIGH INEGATE INOT IOR, IXOR ISHL, ISHR
Unsigned Integer Operations
All these instructions are valid for Evergreen GPUs and later (page 7-117). Both instructions are valid for Evergreen GPUs and later (page 7-118). Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. All these instructions are valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for Northern Islands GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for Northern Islands GPUs and later. Valid for Northern Islands GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. These instructions are valid for R600 GPUs and later. Valid for R600 GPUs and later.
page 7-119 page 7-120 page 7-121 page 7-121 page 7-122 page 7-122 page 7-123 page 7-124 page 7-124 page 7-126 page 7-126 page 7-127 page 7-127 page 7-128 page 7-128 page 7-129 page 7-130
Both instructions are valid for R600 GPUs and later (page 7-125).
U64MAX, U64MIN UDIV U64GE, U64LT U64SHR UGE, ULT UMAD UMAD24 UMAX, UMIN UMOD UMUL UMUL_HIGH UMUL24
Both instructions are valid for Evergreen GPUs and later (page 7-131). Valid for R600 GPUs and later.
page 7-132
These instructions are valid for Evergreen GPUs and later (page 7-132). Valid for Evergreen GPUs and later. These instructions are valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for Evergreen GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for Evergreen GPUs and later.
page 7-133 page 7-133 page 7-134 page 7-134 page 7-136 page 7-136 page 7-137 page 7-137
Both instructions are valid for R600 GPUs and later (page 7-135).
Instruction Name
Valid Devices Valid for Evergreen GPUs and later. Valid for R600 GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for all GPUs that support double precision. Valid for all GPUs that support double precision.
Page Number
UMUL24_high USHR
Bit Operations
page 7-138 page 7-138 page 7-139 page 7-140 page 7-140 page 7-141 page 7-142 page 7-142 page 7-143 page 7-144
Valid for Evergreen GPUs and later with double-precision floating-point (page 7-144). Valid for Evergreen GPUs and later with double-precision floating-point (page 7-145). Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for R6XX GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs.
page 7-145 page 7-146 page 7-146 page 7-147 page 7-148 page 7-148 page 7-149 page 7-149 page 7-150 page 7-151 page 7-152 page 7-153 page 7-154 page 7-155 page 7-156 page 7-157 page 7-157 page 7-158 page 7-159 page 7-160 page 7-161 page 7-162 page 7-163 page 7-164 page 7-165
ABS ACOS ADD AND ASIN ATAN CLAMP CMOV CMOV_LOGICAL CMP COLORCLAMP COS COS_VEC CRS DET DIST DIV DP2 DP2ADD DP3 DP4
Instruction Name
Valid Devices Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs.
Page Number
DST DSX DSY DXSINCOS EXN EQ EXP EXP_VEC EXPP FACEFORWARD FLR FMA FRC FWIDTH GE INV_MOV LEN LIT LN LOG LOG_VEC LOGP LRP LT MAD MAX MIN MMUL MOD MOV MUL NE NOISE NRM PIREDUCE POW POWER RCP
page 7-166 page 7-167 page 7-168 page 7-169 page 7-170 page 7-170 page 7-171 page 7-171 page 7-172 page 7-172 page 7-173
Valid for Evergreen GPUs and later with double-precision capability (page 7-173). Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for R6XX GPUs and later. Valid for R600 GPUs and later.
page 7-174 page 7-175 page 7-175 page 7-176 page 7-176 page 7-177 page 7-178 page 7-179 page 7-180 page 7-181 page 7-182 page 7-183 page 7-184
Valid for R600 GPUs and later that support the IEEE flag (page 7-185). Valid for R600 GPUs and later that support the IEEE flag (page 7-186). Valid for all GPUs. Valid for all GPUs. Valid for all GPUs.
Valid for all GPUs. For R600 GPUs, the IEEE flag was added (page 7-189). Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs.
page 7-190 page 7-191 page 7-192 page 7-193 page 7-194 page 7-195 page 7-196
Instruction Name
Valid Devices Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for R600 GPUs and later. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for R600 GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs.
Page Number
REFLECT RND ROUND_NEAREST ROUND_NEG_INF ROUND_PLUS_INF ROUND_ZERO RSQ RSQ_VEC SET SGN SIN SINCOS SIN_VEC SQRT SQRT_VEC SUB TAN TRANSPOSE
Double-Precision Instructions
page 7-197 page 7-198 page 7-198 page 7-199 page 7-199 page 7-200 page 7-201 page 7-202 page 7-202 page 7-203 page 7-204 page 7-205 page 7-205 page 7-206 page 7-206 page 7-207 page 7-208 page 7-209
DADD D_DIV D_EQ D_FRAC D_FREXP D_GE D_LDEXP D_LT D_MAX D_MIN D_MOV D_MOVC D_MUL D_MULADD D_NE D_RCP
Valid for all GPUs that support double-precision floating-point operations (page 7-210). Valid for all GPUs that support double-precision floating-point (page 7-211). Valid for R670 GPUs and later that support double-precision floating-point.
page 7-211
Valid for all GPUs that support double-precision floating-point (page 7-212). Valid for all GPUs that support double-precision floating-point (page 7-212). Valid for R670 GPUs and later.
page 7-213
Valid for all GPUs that support double-precision floating-point (page 7-214). Valid for all GPUs from the R6XX series that support double-precision floating-point (page 7-215). Valid for Evergreen GPUs and later that support double-precision floating-point (page 7-216). Valid for Evergreen GPUs and later that support double-precision floating-point (page 7-216). Valid for Evergreen GPUs and later that support double-precision floating-point (page 7-217). Valid for Evergreen GPUs and later that support double-precision floating-point (page 7-217). Valid for all GPUs that support double-precision floating-point (page 7-218). Valid for all GPUs that support double-precision floating-point (page 7-219). Valid for R670 GPUs and later that support double-precision floating-point.
page 7-219
Valid for Evergreen GPUs and later that support double-precision floating-point (page 7-220).
Instruction Name
Valid Devices
Page Number
D_RSQ D_SQRT
Multi-Media Instructions
Valid for Evergreen GPUs and later that support double-precision floating-point (page 7-221). Valid for Evergreen GPUs and later that support double-precision floating-point (page 7-221). Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R7XX GPUs and later. Valid for R7XX GPUs and later. Valid for R700 GPUs and later. Valid for R7XX GPUs and later. Valid for R7XX GPUs and later. Valid for R700 GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R700 GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R700 GPUs and later.
BitAlign ByteAlign F_2_u4 SAD SAD_HI SAD4 U4LERP Unpack0 Unpack1 Unpack2 Unpack3
Miscellaneous Special Instructions
page 7-222 page 7-222 page 7-223 page 7-223 page 7-224 page 7-224 page 7-225 page 7-225 page 7-226 page 7-226 page 7-227 page 7-228 page 7-228 page 7-229 page 7-230 page 7-232 page 7-232 page 7-233 page 7-233 page 7-234 page 7-235 page 7-236 page 7-237 page 7-238 page 7-238 page 7-239 page 7-239 page 7-240 page 7-240 page 7-241 page 7-241 page 7-242 page 7-243
COMMENT NOP APPEND_BUF_ALLOC APPEND_BUF_CONSUME DCL_ARENA_UAV DCL_LDS DCL_RAW_SRV DCL_RAW_UAV DCL_STRUCT_LDS DCL_STRUCT_SRV DCL_STRUCT_UAV DCL_UAV
LDS Instructions
Valid only for Evergreen and Northern Islands GPUs (page 7-231).
LDS_ADD LDS_AND LDS_CMP LDS_DEC LDS_INC LDS_LOAD LDS_LOAD_BYTE LDS_LOAD_SHORT LDS_LOAD_UBYTE LDS_LOAD_USHORT LDS_LOAD_VEC
Instruction Name
Valid Devices Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R700 GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R700 GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R7XX GPUs and later. Valid for R7XX GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later.
Page Number
LDS_MAX LDS_MIN LDS_MSKOR LDS_OR LDS_READ_ADD LDS_READ_AND LDS_READ_CMP_XCHG LDS_READ_MAX LDS_READ_MIN LDS_READ_OR LDS_READ_RSUB LDS_READ_SUB LDS_READ_UMAX LDS_READ_UMIN LDS_READ_XCHG LDS_READ_XOR LDS_RSUB LDS_STORE LDS_STORE_BYTE LDS_STORE_SHORT LDS_STORE_VEC LDS_SUB LDS_UMAX LDS_UMIN LDS_XOR SRV_RAW_LOAD SRV_STRUCT_LOAD UAV_ADD UAV_AND UAV_ARENA_LOAD UAV_ARENA_STORE UAV_CMP UAV_LOAD UAV_MAX UAV_MIN UAV_OR UAV_RAW_LOAD UAV_RAW_STORE UAV_READ_ADD UAV_READ_AND
page 7-244 page 7-244 page 7-245 page 7-245 page 7-246 page 7-247 page 7-248 page 7-249 page 7-250 page 7-251 page 7-252 page 7-253 page 7-254 page 7-255 page 7-256 page 7-257 page 7-257 page 7-258 page 7-258 page 7-259 page 7-260 page 7-261 page 7-261 page 7-262 page 7-262 page 7-263 page 7-264 page 7-265 page 7-266
Valid only for Evergreen and Northern Islands GPUs (page 7-267). Valid only for Evergreen and Northern Islands GPUs (page 7-268). Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R700 GPUs and later. Valid for R700 GPUs and later. For R7XX GPUs, only a single UAV is allowed. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later.
page 7-269 page 7-270 page 7-271 page 7-272 page 7-273 page 7-274 page 7-275 page 7-276 page 7-277
Instruction Name
Valid Devices Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for R700 GPUs and later. Valid for R7XX GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later.
Page Number
UAV_READ_CMP_XCHG UAV_READ_MAX UAV_READ_MIN UAV_READ_OR UAV_READ_RSUB UAV_READ_SUB UAV_READ_UDEC UAV_READ_UINC UAV_READ_UMAX UAV_READ_UMIN UAV_READ_XCHG UAV_READ_XOR UAV_RSUB UAV_STORE UAV_STRUCT_LOAD UAV_STRUCT_STORE UAV_SUB UAV_UDEC UAV_UINC UAV_UMAX UAV_UMIN UAV_XOR
GDS Instructions
page 7-278 page 7-279 page 7-280 page 7-281 page 7-282 page 7-283 page 7-284 page 7-285 page 7-286 page 7-287 page 7-288 page 7-289 page 7-290 page 7-291 page 7-292 page 7-293 page 7-294 page 7-295 page 7-296 page 7-297 page 7-298 page 7-299 page 7-300 page 7-301 page 7-302 page 7-303 page 7-304 page 7-305 page 7-306 page 7-307 page 7-307 page 7-308 page 7-309 page 7-310 page 7-311 page 7-312 page 7-313 page 7-314 page 7-315 page 7-316
DCL_GDS DCL_STRUCT_GDS GDS_ADD GDS_AND GDS_CMP_STORE GDS_DEC GDS_INC GDS_LOAD GDS_MAX GDS_MIN GDS_MSKOR GDS_OR GDS_READ_ADD GDS_READ_AND GDS_READ_CMP_XCHG GDS_READ_DEC GDS_READ_INC GDS_READ_MAX
Instruction Name
Valid Devices Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for Evergreen GPUs and later. Valid for all GPUs. Valid for all GPUs. Valid for all GPUs.
Page Number
GDS_READ_MIN GDS_READ_MSKOR GDS_READ_OR GDS_READ_RSUB GDS_READ_SUB GDS_READ_UMAX GDS_READ_UMIN GDS_READ_XOR GDS_RSUB GDS_STORE GDS_SUB GDS_UMAX GDS_UMIN GDS_XOR
Virtual Function/Interface Support
page 7-317 page 7-318 page 7-319 page 7-320 page 7-321 page 7-322 page 7-323 page 7-324 page 7-325 page 7-326 page 7-327 page 7-328 page 7-329 page 7-330 page 7-331 page 7-331 page 7-332 page 7-333 page 7-336 page 7-337 page 7-338
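The Valid Devices column relies on an ordering convention: "Valid for X GPUs and later" admits family X and every newer family, while "Valid only for X and Y GPUs" is a closed range. The C sketch below shows one way a tool might encode that rule. The enumeration, its ordering, and all function names are assumptions for illustration, not part of AMD IL or CAL; R6XX entries are treated as R600, and double-precision support is modeled as a separate per-device flag because the table states it as an extra condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical list of the GPU families named in the table, oldest first,
   so that "X and later" becomes a >= comparison. R6XX maps to GPU_R600. */
typedef enum {
    GPU_R520,
    GPU_R600,
    GPU_R670,
    GPU_R7XX,
    GPU_EVERGREEN,
    GPU_NORTHERN_ISLANDS
} gpu_family;

/* "Valid for <min> GPUs and later." */
static bool valid_from(gpu_family gpu, gpu_family min)
{
    return gpu >= min;
}

/* "Valid only for <first> and <last> GPUs." (inclusive range) */
static bool valid_between(gpu_family gpu, gpu_family first, gpu_family last)
{
    assert(first <= last);
    return gpu >= first && gpu <= last;
}

/* "Valid for <min> GPUs and later that support double precision."
   Double-precision support varies per device, so the caller supplies it. */
static bool valid_from_with_fp64(gpu_family gpu, gpu_family min, bool has_fp64)
{
    return valid_from(gpu, min) && has_fp64;
}
```

For example, "Valid for R520 GPUs only" corresponds to `valid_between(gpu, GPU_R520, GPU_R520)`.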
Glossary of Terms
Term * <> [1,2) [1,2] {BUF, SWIZ} {x | y} 0.0 0x 1011b 29b0 7:4 ABI absolute active mask
Description Any number of alphanumeric characters in the name of a microcode format, microcode parameter, or instruction. Angle brackets denote streams. A range that includes the left-most value (in this case, 1) but excludes the right-most value (in this case, 2). A range that includes both the left-most and right-most values (in this case, 1 and 2). One of the multiple options listed. In this case, the string BUF or the string SWIZ. One of the multiple options listed. In this case, x or y. A single-precision (32-bit) floating-point value. Indicates that the following is a hexadecimal number. A binary value, in this example a 4-bit value. 29 bits with the value 0. A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first. Application Binary Interface. A displacement that references the base of a code segment, rather than an instruction pointer. See relative. A 1-bit-per-pixel mask that controls which pixels in a quad are really running. Some pixels might not be running if the current primitive does not cover the whole quad. A mask can be updated with a PRED_SET* ALU instruction, but updates do not take effect until the end of the ALU clause. A stack that contains only addresses (no other state). Used for flow control. Popping the address stack overrides the instruction address field of a flow control instruction. The address stack is only modified if the flow control instruction decides to jump. AMD Core Math Library. Includes implementations of the full BLAS and LAPACK routines, FFT, Math transcendental and Random Number Generator routines, stream processing backend for load balancing of computations between the CPU and GPU compute device. Loop register. A three-component vector (x, y and z) used to count iterations of a loop. 
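The bit-range, binary, hexadecimal, and interval notations above can be illustrated with a short C sketch. The function names are hypothetical. `bit_range` returns bits hi through lo inclusive, with the high-order bit named first as in 7:4, so `bit_range(0xB0, 7, 4)` yields 0xB, which this document would write as 1011b. C99 has no binary literal, so 1011b appears only in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the inclusive bit range hi:lo; for example, 7:4 selects
   bits 7, 6, 5 and 4 and returns them right-aligned. */
static uint32_t bit_range(uint32_t value, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi < 32);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

/* [lo, hi): includes the left-most value, excludes the right-most. */
static bool in_half_open(double x, double lo, double hi)
{
    return x >= lo && x < hi;
}

/* [lo, hi]: includes both the left-most and right-most values. */
static bool in_closed(double x, double lo, double hi)
{
    return x >= lo && x <= hi;
}
```

With these definitions, 2.0 lies in [1,2] but not in [1,2).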
To reserve storage space for data in an output buffer (scratch buffer, ring buffer, stream buffer, or reduction buffer) or for data in an input buffer (scratch buffer or ring buffer) before exporting (writing) or importing (reading) data or addresses to, or from, that buffer. Space is allocated only for data, not for addresses. After allocating space in a buffer, an export operation can be done.
address stack
ACML
Term ALU
Description Arithmetic Logic Unit. Responsible for arithmetic operations like addition, subtraction, multiplication, division, and bit manipulation on integer and floating-point values. In stream computing, these are known as stream cores. ALU.[X,Y,Z,W] - an ALU that can perform four vector operations in which the four operands (integers or single-precision floating-point values) do not have to be related. It performs SIMD operations. Thus, although the four operands need not be related, all four operations execute the same instruction. ALU.Trans - (not relevant on HD 6900 and later devices) An ALU that can perform one ALU.Trans (transcendental, scalar) operation, or advanced integer operation, on one integer or single-precision floating-point value, and replicate the result. A single instruction can co-issue four operations to an ALU.[X,Y,Z,W] unit and one (possibly complex) operation to an ALU.Trans unit, which can then replicate its result across all four components being operated on in the associated ALU.[X,Y,Z,W] unit. A performance profiling tool for developing, debugging, and profiling stream kernels using high-level stream computing languages. Address register. A complete software development suite for developing applications for AMD Accelerated Parallel Processing compute devices. Currently, the ATI Stream SDK includes OpenCL and CAL. Absolute work-item ID (formerly thread ID). It is the ordinal count of all work-items being executed (in a draw call). A bit, as in 1Mb for one megabit, or lsb for least-significant bit. A byte, as in 1MB for one megabyte, or LSB for least-significant byte. Basic Linear Algebra Subroutines. Four 32-bit floating-point numbers (XYZW) specifying the border color. The number of work-items executed during a branch. For AMD GPUs, branch granularity is equal to wavefront granularity. The limited write combining ability. See write combining. Eight bits. A read-only or write-only on-chip or off-chip storage space.
Compute Abstraction Layer. A device-driver library that provides a forward-compatible interface to AMD Accelerated Parallel Processing compute devices. This lower-level API gives users direct control over the hardware: they can directly open devices, allocate memory resources, transfer data and initiate kernel execution. CAL also provides a JIT compiler for AMD IL. Control Flow. Constant file or constant register. A component in a vector. To hold within a stated range. A group of instructions that are of the same type (all stream core, all fetch, etc.) executed as a group. A clause is part of a CAL program written using the compute device ISA. Executed without pre-emption.
aTid b B BLAS border color branch granularity burst mode byte cache CAL
Description The total number of slots required for a stream core clause. Temporary values stored in GPRs that do not need to be preserved past the end of a clause. To write a bit-value of 0. Compare set. A value written by the host processor directly to the GPU compute device. The commands contain information that is not typically part of an application program, such as setting configuration registers, specifying the data domain on which to operate, and initiating the start of data processing. A logic block in the R700 (HD4000-family of devices) that receives host commands, interprets them, and performs the operations they indicate. (1) A 32-bit piece of data in a vector. (2) A 32-bit piece of data in an array. (3) One of four data items in a 4-component register. A parallel processor capable of executing multiple work-items of a kernel in order to process streams of data. Similar to a pixel shader, but exposes data sharing and synchronization. Similar to a pixel shader, but exposes data sharing and synchronization.
compute unit pipeline A hardware block consisting of five stream cores, one stream core instruction decoder and issuer, one stream core constant fetcher, and support logic. All parts of a compute unit pipeline receive the same instruction and operate on different data elements. Also known as slice. constant buffer Off-chip memory that contains constants. A constant buffer can hold up to 1024 four-component vectors. There are fifteen constant buffers, referenced as cb0 to cb14. An immediate constant buffer is similar to a constant buffer. However, an immediate constant buffer is defined within a kernel using special instructions. There are fifteen immediate constant buffers, referenced as icb0 to icb14. A constant cache is a hardware object (off-chip memory) used to hold data that remains unchanged for the duration of a kernel (constants). Constant cache is a general term used to describe constant registers, constant buffers, or immediate constant buffers. Same as constant register. Same as AR register. On-chip registers that contain constants. The registers are organized as four 32-bit components of a vector. There are 256 such registers, each one 128 bits wide. Relative addressing of a constant file. See waterfalling. A representation of the state of a CAL device. See engine clock. The clock at which the GPU compute device stream core runs. Central Processing Unit. Also called host. Responsible for executing the operating system and the main part of the application. The CPU provides data and instructions to the GPU compute device. Constant registers. There are 512 CRs, each one 128 bits wide, organized as four 32-bit values.
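As a worked example of the sizes stated above: assuming constant-buffer components are 32 bits wide, as stated for the constant registers, one constant buffer of 1024 four-component vectors occupies 1024 x 16 = 16384 bytes, and fifteen such buffers total 245760 bytes. The C sketch below encodes that arithmetic; all identifiers are hypothetical. The glossary gives both 256 (constant registers) and 512 (CRs) as register counts, so the register helper takes the count as a parameter rather than choosing one.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes from the glossary; 32-bit components are an assumption for
   constant buffers, carried over from the constant-register entry. */
enum {
    COMPONENT_BYTES    = 4,                                    /* 32 bits     */
    COMPONENTS_PER_VEC = 4,                                    /* x, y, z, w  */
    VEC_BYTES          = COMPONENT_BYTES * COMPONENTS_PER_VEC, /* 128 bits    */
    CB_MAX_VECTORS     = 1024,                                 /* per buffer  */
    CB_COUNT           = 15                                    /* cb0 .. cb14 */
};

/* Capacity of one full constant buffer in bytes. */
static size_t constant_buffer_bytes(void)
{
    return (size_t)CB_MAX_VECTORS * VEC_BYTES;
}

/* Byte offset of component `comp` (0 = x .. 3 = w) of vector `vec`
   within a constant buffer. */
static size_t constant_buffer_offset(unsigned vec, unsigned comp)
{
    assert(vec < (unsigned)CB_MAX_VECTORS && comp < (unsigned)COMPONENTS_PER_VEC);
    return (size_t)vec * VEC_BYTES + (size_t)comp * COMPONENT_BYTES;
}

/* Bytes occupied by `count` 128-bit constant registers. */
static size_t constant_register_bytes(size_t count)
{
    return count * VEC_BYTES;
}
```

The last component of the last vector, `constant_buffer_offset(1023, 3)`, starts at byte 16380, so its four bytes end exactly at the 16384-byte limit.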
constant cache
constant file constant index register constant registers constant waterfalling context core clock CPU
CRs
Description Compute shader; commonly referred to as a compute kernel. A shader type, analogous to VS/PS/GS/ES. Close-to-Metal. A thin, HW/SW interface layer. This was the predecessor of the AMD CAL. Data Copy Shader. A device is an entire AMD Accelerated Parallel Processing compute device. Direct-memory access. Also called DMA engine. Responsible for independently transferring data to, and from, the GPU compute device's local memory. This allows other computations to occur in parallel, increasing overall system performance. Dword. Two words, or four bytes, or 32 bits. Eight words, or 16 bytes, or 128 bits. Also called octword. A specified rectangular region of the output buffer to which work-items are mapped. Data-Parallel Processor. The X slot of a destination operand. Double word. Two words, or four bytes, or 32 bits. A component in a vector. The clock driving the stream core and memory fetch units on the GPU compute device. A seven-bit field that specifies an enumerated set of decimal values (in this case, a set of up to 2^7 = 128 values). The valid values can begin at a value greater than, or equal to, zero, and the number of valid values can be less than, or equal to, the maximum supported by the field. A token sent through a pipeline that can be used to enforce synchronization, flush caches, and report status back to the host application. To write data from GPRs to an output buffer (scratch, ring, stream, frame or global buffer, or to a register), or to read data from an input buffer (a scratch buffer or ring buffer) to GPRs. The term export is a partial misnomer because it performs both input and output functions. Prior to exporting, an allocation operation must be performed to reserve space in the associated buffer. Flow control. Fast Fourier Transform. A bit that is modified by a CF or stream core operation and that can affect subsequent operations. Floating Point Operation. To write back and invalidate cache data. Fused multiply add.
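The width terms above follow from a 16-bit word (a dword is two words, a double quad word is eight), and an n-bit enumerated field can encode at most 2^n values, so enum(7) allows up to 128. The C sketch below restates those relations; the identifiers, and the field position passed to `read_enum7`, are hypothetical and only illustrate how such a field would be read from a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Widths used in the glossary, in bytes. */
enum {
    WORD_BYTES             = 2,
    DWORD_BYTES            = 2 * WORD_BYTES, /* double word: 32 bits         */
    DOUBLE_QUAD_WORD_BYTES = 8 * WORD_BYTES  /* double quad word: 128 bits   */
};

/* Maximum number of distinct values an n-bit enumerated field can hold. */
static uint32_t enum_field_capacity(unsigned bits)
{
    assert(bits < 32);
    return 1u << bits;
}

/* Read a 7-bit enumerated field starting at bit `shift` of a dword. */
static uint32_t read_enum7(uint32_t word, unsigned shift)
{
    assert(shift <= 25); /* the field must fit inside 32 bits */
    return (word >> shift) & 0x7Fu;
}
```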
A single two-dimensional screenful of data, or the storage space required for it. Off-chip memory that stores a frame. Sometimes refers to all of the GPU memory (excluding local memory and caches).
double word double quad word domain of execution DPP dst.X dword element engine clock enum(7)
event export
Term FS
Description Fetch subroutine. A global program for fetching vertex data. It can be called by a vertex shader (VS), and it runs in the same work-item context as the vertex program, and thus is treated for execution purposes as part of the vertex program. The FS provides driver independence between the process of fetching data required by a VS, and the VS itself. This includes having a semantic connection between the outputs of the fetch process and the inputs of the VS. A subprogram called by the main program or another function within an AMD IL stream. Functions are delineated by FUNC and ENDFUNC. Reading from arbitrary memory locations by a work-item.
Input streams are treated as a memory array, and data elements are addressed directly.
GPU memory space containing the arbitrary address locations to which uncached kernel outputs are written. Can be read either cached or uncached. When read in uncached mode, it is known as mem-import. Allows applications the flexibility to read from and write to arbitrary locations in input buffers and output buffers, respectively. Memory for reads/writes between work-items. On HD Radeon 5XXX series devices and later, atomic operations can be used to synchronize memory operations. General-purpose compute device. A GPU compute device that performs general-purpose calculations. General-purpose register. GPRs hold vectors of either four 32-bit IEEE floating-point, or four 8-, 16-, or 32-bit signed or unsigned integer or two 64-bit IEEE double precision data components (values). These registers can be indexed, and consist of an on-chip part and an off-chip part, called the scratch buffer, in memory. Graphics Processing Unit. An integrated circuit that renders and displays graphical images on a monitor. Also called Graphics Hardware, Compute Device, and Data Parallel Processor. Also called 3D engine speed.
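The GPR description lists several ways the same 128 bits can be interpreted. A C union is a convenient way to picture this. The type name `gpr128` and the accessor are hypothetical; the sketch assumes IEEE-754 `float` (32-bit) and `double` (64-bit), as on mainstream C platforms, and omits the 8- and 16-bit integer cases because the glossary does not say how those are laid out within a component.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one 128-bit GPR in the layouts the glossary lists. */
typedef union {
    float    f32[4];  /* four 32-bit IEEE floating-point values  */
    int32_t  i32[4];  /* four 32-bit signed integers             */
    uint32_t u32[4];  /* four 32-bit unsigned integers           */
    double   f64[2];  /* two 64-bit IEEE double-precision values */
} gpr128;

/* Read one of the two double-precision halves of a GPR. */
static double gpr_get_f64(const gpr128 *r, unsigned half)
{
    assert(half < 2);
    return r->f64[half];
}
```

Writing `u32[0] = 0x3F800000` and reading `f32[0]` gives 1.0f, showing that the views share storage rather than converting values.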
GPU
GPU compute device A parallel processor capable of executing multiple work-items of a kernel in order to process streams of data. GS HAL host iff IL Geometry Shader. Hardware Abstraction Layer. Also called CPU. If and only if. Intermediate Language. In this manual, the AMD version: AMD IL. A pseudo-assembly language that can be used to describe kernels for GPU compute devices. AMD IL is designed for efficient generalization of GPU compute device instructions so that programs can run on a variety of platforms without having to be rewritten for each platform. A work-item currently being processed. A computing function specified by the code field of an IL_OpCode token. Compare opcode, operation, and instruction packet. A group of tokens starting with an IL_OpCode token that represent a single AMD IL instruction.
Description A 2-bit field that specifies an integer value. Instruction Set Architecture. The complete specification of the interface between computer programs and the underlying computer hardware. A memory area containing waterfall (off-chip) constants. The cache lines of these constants can be locked. The constant registers are the 256 on-chip constants. A user-developed program that is run repeatedly on a stream of data. A parallel function that operates on every element of input streams. A device program is one type of kernel. Unless otherwise specified, an AMD Accelerated Parallel Processing compute device program is a kernel composed of a main program and zero or more functions. Also called Shader Program. This is not to be confused with an OS kernel, which controls hardware. Linear Algebra Package. Local Data Share. Part of local memory. These are read/write registers that support sharing between all work-items in a work-group. Synchronization is required. Linear Interpolation. Dedicated hardware that a) processes fetch instructions, b) requests data from the memory controller, and c) loads registers with data returned from the cache. They are run at stream core or engine clock speeds. Formerly called texture units. Level Of Detail. A register initialized by software and incremented by hardware on each iteration of a loop. Least-significant bit. Least-significant byte. Multiply-Add. A fused instruction that both multiplies and adds. (1) To prevent from being seen or acted upon. (2) A field of bits used for a control purpose. Must be zero. An AMD IL term for random writes to the global buffer. Uncached reads from the global buffer. The clock driving the memory chips on the GPU compute device. An encoding format whose fields specify instructions and associated parameters. Microcode formats are used in sets of two or four. For example, the mnemonic CF_WORD[0,1] indicates a microcode-format pair, CF_WORD0 and CF_WORD1. Multiple Instruction Multiple Data.
Multiple SIMD units operating in parallel (Multi-Processor System) Distributed or shared memory Multiple Render Target. One of multiple areas of local GPU compute device memory, such as a frame buffer, to which a graphics pipeline writes data. Multi-Sample Anti-Aliasing. Most-significant bit.
LAPACK LDS LERP local memory fetch units LOD loop index lsb LSB MAD mask MBZ mem-export mem-import memory clock microcode format
MIMD
Description Most-significant byte. A group of four work-items in the same wavefront that have consecutive work-item IDs (Tid). The first Tid must be a multiple of four. For example, work-items with Tid = 0, 1, 2, and 3 form a neighborhood, as do work-items with Tid = 12, 13, 14, and 15. A numeric value in the range [a, b] that has been converted to a range of 0.0 to 1.0 using the formula: normalized value = value / (b - a + 1). Eight words, or 16 bytes, or 128 bits. Same as double quad word. Also referred to as octa word. The numeric value of the code field of an instruction. A 32-bit value that describes the operation of an instruction. The function performed by an instruction. Parameter Cache. A high-speed computer expansion card interface used by modern graphics cards, GPU compute devices and other peripherals needing high data transfer rates. Unlike previous expansion interfaces, PCI Express is structured around point-to-point links. Also called PCIe. Position Cache. Write stack entries to their associated hardware-maintained control-flow state. The POP_COUNT field of the CF_WORD1 microcode format specifies the number of stack entries to pop for instructions that pop the stack. Compare push. The act of temporarily interrupting a task being carried out on a computer system, without requiring its cooperation, with the intention of resuming the task at a later time. Unless otherwise stated, the AMD Accelerated Parallel Processing compute device. Unless otherwise specified, a program is a set of instructions that can run on the AMD Accelerated Parallel Processing compute device. A device program is a type of kernel. Pixel Shader, aka pixel kernel. Read hardware-maintained control-flow state and write their contents onto the stack. Compare pop. Previous vector register. It contains the previous four-component vector result from an ALU.[X,Y,Z,W] unit within a given clause. For a compute kernel, this consists of four consecutive work-items.
For pixel and other shaders, this is a group of 2x2 work-items in the NDRange. Always processed together. The process of mapping work-items from the domain of execution to the SIMD engine. This term is a carryover from graphics, where it refers to the process of turning geometry, such as triangles, into pixels. The order of the work-item mapping generated by rasterization. Random Access Target. Same as UAV. Allows, on DX11 hardware, writes to, and reads from, any arbitrary location in a buffer. Ring Buffer.
normalized oct word opcode opcode token operation PaC PCI Express
PoC pop
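The neighborhood rule (four consecutive Tids, the first a multiple of four) amounts to rounding a Tid down to the nearest multiple of four. The Python sketch below is illustrative only: it assumes non-negative Tids within a single wavefront, and the function names are hypothetical, not IL instructions or driver APIs.

```python
# Hypothetical helpers illustrating the "neighborhood" grouping of work-items.
# Assumes Tids are non-negative integers within one wavefront.

NEIGHBORHOOD_SIZE = 4  # four consecutive Tids per neighborhood


def neighborhood_base(tid: int) -> int:
    """Return the first Tid (a multiple of four) of the neighborhood containing tid."""
    if tid < 0:
        raise ValueError("Tid must be non-negative")
    return tid - tid % NEIGHBORHOOD_SIZE


def neighborhood_members(tid: int) -> list[int]:
    """Return the four Tids that share a neighborhood with tid."""
    base = neighborhood_base(tid)
    return list(range(base, base + NEIGHBORHOOD_SIZE))


def same_neighborhood(tid_a: int, tid_b: int) -> bool:
    """True if both work-items fall in the same neighborhood."""
    return neighborhood_base(tid_a) == neighborhood_base(tid_b)


if __name__ == "__main__":
    print(neighborhood_members(13))  # [12, 13, 14, 15]
    print(same_neighborhood(3, 4))   # False: Tid 3 is in 0-3, Tid 4 is in 4-7
```

For example, `neighborhood_members(13)` returns Tids 12 through 15, matching the example given in the neighborhood entry.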
register: For a GPU, a 128-bit address-mapped memory space consisting of four 32-bit components.
relative: Referencing with a displacement (also called offset) from an index register or the loop index, rather than from the base address of a program (the first control flow [CF] instruction).
render backend unit: The hardware units in a processing element responsible for writing the results of a kernel to output streams by writing the results to an output cache and transferring the cache data to memory.
resource: A block of memory used for input to, or output from, a kernel.
ring buffer: An on-chip buffer that indexes itself automatically in a circle.
Rsvd: Reserved.
sampler: A structure that contains information necessary to access data in a resource. Also called Fetch Unit.
SC: Shader Compiler.
scalar: A single data component, unlike a vector, which contains a set of two or more data elements.
scatter: Writes (by uncached memory) to arbitrary locations.
scatter write: Kernel outputs to arbitrary address locations. Must be uncached. Must be made to a memory space known as the global buffer.
scratch buffer: A variable-sized space in off-chip memory that stores some of the GPRs.
set: To write a bit-value of 1. Compare clear.
shader processor: Pre-OpenCL term that is now deprecated. Also called thread processor.
shader program: User-developed program. Also called kernel.
SIMD: Pre-OpenCL term that is now deprecated. Single instruction multiple data unit. Each SIMD receives independent stream core instructions and applies them to multiple data elements. Now called a compute unit.
SIMD engine: Pre-OpenCL term that is now deprecated. A collection of thread processors, each of which executes the same instruction each cycle.
SIMD pipeline: In OpenCL terminology, compute unit pipeline. Pre-OpenCL term that is now deprecated. A hardware block consisting of five stream cores, one stream core instruction decoder and issuer, one stream core constant fetcher, and support logic. All parts of a SIMD pipeline receive the same instruction and operate on different data elements. Also known as slice.
simultaneous instruction issue: Input, output, fetch, stream core, and control flow per SIMD engine.
slot: A position, in an instruction group, for an instruction or an associated literal constant. An ALU instruction group consists of one to seven slots, each 64 bits wide. All ALU instructions occupy one slot, except double-precision floating-point instructions, which occupy either two or four slots. The size of an ALU clause is the total number of slots required for the clause.
SPU: Shader processing unit.
SR: Globally shared registers. These are read/write registers that support sharing between all wavefronts in a SIMD (not a work-group). The sharing is column sharing, so work-items that have the same work-item ID within their respective wavefronts can share data. All operations on SR are atomic.
src0, src1, etc.: In floating-point operation syntax, a 32-bit source operand. Src0_64 is a 64-bit source operand.
stage: A sampler and resource pair.
stream: A collection of data elements of the same type that can be operated on in parallel.
stream buffer: A variable-sized space in off-chip memory that stores an instruction stream. It is an output-only buffer, configured by the host processor. It does not store inputs from off-chip memory to the processor.
stream core: The fundamental, programmable computational units, responsible for performing integer, single-precision floating-point, double-precision floating-point, and transcendental operations. They execute VLIW instructions for a particular work-item. Each processing element handles a single instruction within the VLIW instruction.
stream operator: A node that can restructure data.
swizzling: To copy or move any component in a source vector to any element-position in a destination vector. Accessing elements in any combination.
thread: Pre-OpenCL term that is now deprecated. One invocation of a kernel corresponding to a single element in the domain of execution. An instance of execution of a shader program on an ALU. Each thread has its own data; multiple threads can share a single program counter. Generally, in OpenCL terms, there is a one-to-one mapping of work-items to threads.
thread group: Pre-OpenCL term that is now deprecated. It contains one or more thread blocks. Threads in the same thread group but different thread blocks might communicate with each other through global per-SIMD shared memory. This is a concept mainly for global data share (GDS). A thread group can contain one or more wavefronts, the last of which can be a partial wavefront. All wavefronts in a thread group can run on only one SIMD engine; however, multiple thread groups can share a SIMD engine if there are enough resources. Generally, in OpenCL terms, there is a one-to-one mapping of work-groups to thread groups.
thread processor: Pre-OpenCL term that is now deprecated. The hardware units in a SIMD engine responsible for executing the threads of a kernel. It executes the same instruction per cycle. Each thread processor contains multiple stream cores. Also called shader processor.
thread-block: Pre-OpenCL term that is now deprecated. A group of threads that might communicate with each other through local per-SIMD shared memory. It can contain one or more wavefronts (the last wavefront can be a partial wavefront). A thread-block (all its wavefronts) can only run on one SIMD engine. However, multiple thread blocks can share a SIMD engine if there are enough resources to fit them in.
Tid: Work-item ID (previously called a thread ID) within a thread block. An integer from 0 to Num_threads_per_block-1.
token: A 32-bit value that represents an independent part of a stream or instruction.
UAV: Unordered Access View. Same as random access target (RAT). UAVs allow compute shaders to store results in (or write results to) a buffer at any arbitrary location. On DX11 hardware, UAVs can be created from buffers and textures. On DX10 hardware, UAVs cannot be created from typed resources (textures).
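Swizzling, as defined above, lets any source component land in any destination position, including repeats. A minimal Python sketch, assuming a four-component register modelled as a tuple in x, y, z, w order (the x/r, y/g, z/b, w/a pairing follows IL field names such as swizzle_x_r); the `swizzle` function and its pattern-string syntax are hypothetical and are not IL assembly syntax.

```python
# Hypothetical helper: models a four-component register as an (x, y, z, w)
# tuple. The pattern string is an illustration, not IL assembly syntax.

COMPONENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3,
                   "r": 0, "g": 1, "b": 2, "a": 3}


def swizzle(vec: tuple, pattern: str) -> tuple:
    """Return a new 4-vector whose i-th element is the source component named by pattern[i].

    Any source component may be routed to any destination position, and a
    component may be repeated (for example, "xxxx" broadcasts x).
    """
    if len(vec) != 4 or len(pattern) != 4:
        raise ValueError("expected a 4-component vector and a 4-character pattern")
    return tuple(vec[COMPONENT_INDEX[c]] for c in pattern.lower())


if __name__ == "__main__":
    print(swizzle((1.0, 2.0, 3.0, 4.0), "wzyx"))  # (4.0, 3.0, 2.0, 1.0)
    print(swizzle((1, 2, 3, 4), "xxxx"))          # (1, 1, 1, 1)
```

Because the destination is built from a fresh tuple, the source vector is never modified, which mirrors how a swizzled source operand is read without changing the register it comes from.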
uncached read/write unit: The hardware units in a GPU compute device responsible for handling uncached read or write requests from local memory on the GPU compute device.
vector: (1) A set of up to four related values of the same data type, each of which is an element. For example, a vector with four elements is known as a 4-vector, and a vector with three elements is known as a 3-vector. (2) See AR. (3) See ALU.[X,Y,Z,W].
VLIW design: Very Long Instruction Word. Co-issues up to six operations (five stream cores plus one FC, where FC = flow control); 1.25 machine scalar operations per clock for each of 64 data elements; independent scalar source and destination addressing.
vTid: Work-item ID (formerly thread ID) within a work-group.
waterfall: To use the address register (AR) for indexing the GPRs. Waterfall behavior is determined by configuration registers.
wavefront: A group of work-items executed together on a single SIMD engine; it is composed of quads. A full wavefront contains 64 work-items; for the HD4000 family of devices, a full wavefront contains 64, 32, or 16 work-items, depending on the device. A wavefront with fewer than a full set of work-items is called a partial wavefront. Work-items within a wavefront execute in lockstep.
write combining: Combining several smaller writes to memory into a single larger write to minimize any overhead associated with write commands.
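Taken together, the wavefront and thread group entries say that a work-group is split into full wavefronts plus, when its size is not a multiple of the wavefront size, one trailing partial wavefront. The Python sketch below assumes a default of 64 work-items per full wavefront and lets the caller pass 32 or 16 for HD4000-family devices with smaller wavefronts; `wavefront_layout` is a hypothetical helper, not a runtime API.

```python
# Hypothetical helper: splits a work-group into wavefronts.

FULL_WAVEFRONT = 64  # default full-wavefront size; pass 32 or 16 where applicable


def wavefront_layout(work_group_size: int, wavefront_size: int = FULL_WAVEFRONT) -> list[int]:
    """Return the number of work-items in each wavefront of a work-group.

    All wavefronts are full except possibly the last, which is partial when
    work_group_size is not a multiple of wavefront_size.
    """
    if work_group_size <= 0 or wavefront_size <= 0:
        raise ValueError("sizes must be positive")
    full, remainder = divmod(work_group_size, wavefront_size)
    sizes = [wavefront_size] * full
    if remainder:
        sizes.append(remainder)  # trailing partial wavefront
    return sizes


if __name__ == "__main__":
    print(wavefront_layout(256))      # [64, 64, 64, 64]
    print(wavefront_layout(100))      # [64, 36]
    print(wavefront_layout(100, 16))  # [16, 16, 16, 16, 16, 16, 4]
```

A work-group smaller than one wavefront, such as 10 work-items, yields a single partial wavefront (`[10]`).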
Index
Symbols
_abs
  definition, 3-4
  source modifier, 3-4
_bias
  definition, 3-4
  source modifier, 3-4
_bx2
  definition, 3-4
  source modifier, 3-4
_ctrlspec, 3-2
_divcomp
  definition, 3-4
  source modifier, 3-4
_invert
  definition, 3-4
  source modifier, 3-4
_neg
  definition, 3-4
  source modifier, 3-4
_prec format, 7-12
_precmask format, 7-12
_rst, 7-2, 7-6
_sign
  definition, 3-4
  source modifier, 3-4
_x2
  definition, 3-4
  source modifier, 3-4
Numerics
32-bit token, 2-1
A
abs
  definition, 2-9
  IL_Src_Mod, 2-9
  modifier, 2-7
ABS instruction, 7-148
absolute address mode, 5-22
absolute work-item ID, 5-7
ABSOLUTE_THREAD_ID
  register type, 5-7
ABSOLUTE_THREAD_ID_FLATTENED
  register type, 5-7
access mode
  wavefront, 1-2
ACOS instruction, 7-148
ADD instruction, 7-149
ADDR, 5-4
address
  base relative, 4-1
  loop relative, 4-1, 5-24, 5-28
  out-of-range
    instruction, 7-10
  persistent memory, 5-18
  work-group, 7-7
address mode
  absolute, 5-22
AND instruction, 7-149
APPEND_BUF_ALLOC instruction, 7-229
APPEND_BUF_CONSUME instruction, 7-230
application code
  initializing, 7-7
arena
  modifier, 7-10
  UAV, 7-10
argument
  resource-index, 7-3
  sampler-index, 7-3
arithmetic instruction notes, 7-5
arithmetic operations
  IL, 7-6
ASIN instruction, 7-150
ATAN instruction, 7-151
aTid, 1-3
atomic operations
  UAV, 7-9
B
barrier
  memory, 5-28
BARYCENTRIC_COORD
  register type, 5-7
base relative address, 4-1
BEST shadow filter, A-1
bias
  definition, 2-9
  IL_Src_Mod, 2-9
binary atomic
  GDS operations, 7-9
  LDS operations, 7-8
binary control specifier, 3-2
bit
  index_args, 7-3
  operation notes, 7-6
  pri_modifier_present, 7-3
bit strings
  packing and unpacking, 7-6
BitAlign instruction, 7-222
BREAK instruction, 7-13
BREAK_LOGICALNZ instruction, 7-14
BREAK_LOGICALZ instruction, 7-14
BREAKC instruction, 7-13
BUFINFO instruction, 7-68
byte
  convert to float, 7-10
ByteAlign instruction, 7-222
C
CALL instruction, 7-14
CALL nesting, 4-2
CALL_LOGICALNZ instruction, 7-16
CALL_LOGICALZ instruction, 7-17
CALLNZ instruction, 7-15
CALLNZ nesting, 4-2
CASE instruction, 7-18
centroid interpolation mode, 5-9
clamp
  definition, 2-5, 2-9
  IL_Dst_Mod, 2-5
  IL_Src_Mod, 2-9
CLAMP instruction, 7-152
client language, 2-1
client_type, 2-2
CMOV instruction, 7-153
CMOV_LOGICAL instruction, 7-154
CMP instruction, 7-155
code
  definition, 2-3
  flow-control-block, 7-2
  GDS
    initializing, 7-8
  IL_Opcode, 2-3
  initializing, 7-7
  LDS
    initializing, 7-11
  pseudo, 7-1
color data, 5-18
COLORCLAMP instruction, 7-156
column sharing, 1-3
comment delimiter, 3-4
COMMENT instruction, 7-228
common register, 3-2
comparison instruction notes, 7-1
component
  first source token, 2-5
  forced to zero, 3-3
  fourth source token, 2-5
  second source token, 2-5
  third source token, 2-5
  vector written, 3-3
component_w_a
  definition, 2-5
  IL_Dst_Mod, 2-5
component_x_r
  definition, 2-5
  IL_Dst_Mod, 2-5
component_y_g
  definition, 2-5
  IL_Dst_Mod, 2-5
component_z_b
  definition, 2-5
  IL_Dst_Mod, 2-5
component-wise write mask, 3-3
compute delta, 5-27
compute shader, 5-7, 5-25, 5-26, 5-27, 7-7, 7-8, 7-11
  available index values, 1-3
CONST_BOOL, 5-4
CONST_BUFF
  register type, 5-8
CONST_FLOAT, 5-4
CONST_INT, 5-4
constant buffers, 1-3
CONTINUE instruction, 7-18
CONTINUE_LOGICALNZ instruction, 7-19
CONTINUE_LOGICALZ instruction, 7-19
CONTINUEC instruction, 7-19
control
  definition, 2-3
  IL_Opcode, 2-3
  specifier, 3-2
control flow instructions
  restrictions, 7-3
control_sampler field, 7-3
controls_resource field, 7-3
conversion instruction notes, 7-5
COS instruction, 7-157
COS_VEC instruction, 7-157
coverage mask, 5-16
CRS instruction, 7-158
CUT instruction, 7-68
CUT_STREAM instruction, 7-69
cycle timer value, 5-27
D
D_DIV instruction, 7-211
D_EQ instruction, 7-211
D_FRAC instruction, 7-212
D_FREXP instruction, 7-212
D_GE instruction, 7-213
D_LDEXP instruction, 7-214
D_LT instruction, 7-215
D_MAX instruction, 7-216
D_MIN instruction, 7-216
D_MOV instruction, 7-217
D_MOVC instruction, 7-217
D_MUL instruction, 7-218
D_MULADD instruction, 7-219
D_NE instruction, 7-219
D_RCP instruction, 7-220
D_RSQ instruction, 7-221
D_SQRT instruction, 7-221
D2F instruction, 7-143
DADD instruction, 7-210
data
  interpolated, 5-14
  sharing between all work-items, 1-3
  texture coordinate, 5-24
DCL_ARENA_UAV instruction, 7-231
DCL_CB instruction, 7-32
DCL_FUNCTION_BODY instruction, 7-331
DCL_FUNCTION_TABLE instruction, 7-331
DCL_GDS instruction, 7-300
DCL_GLOBAL_FLAGS instruction, 7-33
DCL_INDEXED_TEMP_ARRAY instruction, 7-34
DCL_INPUT instruction, 7-35
DCL_INPUTPRIMITIVE instruction, 7-37
DCL_INTERFACE_PTR instruction, 7-332
DCL_LDS instruction, 7-232
DCL_LDS_SHARING_MODE instruction, 7-37
DCL_LDS_SIZE_PER_THREAD instruction, 7-38
DCL_LITERAL instruction, 7-39
dcl_literal statement, 2-6
DCL_MAX_OUTPUT_VERTEX_COUNT instruction, 7-40
DCL_MAX_TESSFACTOR instruction, 7-41
DCL_MAX_THREAD_PER_GROUP instruction, 7-41
DCL_NUM_ICP instruction, 7-42
DCL_NUM_INSTANCES instruction, 7-42
DCL_NUM_OCP instruction, 7-43
DCL_NUM_THREAD_PER_GROUP instruction, 7-43
dcl_num_threads_per_group, 7-7, 7-11
DCL_ODEPTH instruction, 7-44
DCL_OUTPUT instruction, 7-45
DCL_OUTPUT_TOPOLOGY instruction, 7-46
DCL_PERSISTENT instruction, 7-46
DCL_RAW_SRV instruction, 7-232
DCL_RAW_UAV instruction, 7-233
DCL_RESOURCE instruction, 7-47
DCL_SHARED_TEMP instruction, 7-48
DCL_STREAM instruction, 7-49
DCL_STRUCT_GDS instruction, 7-301
DCL_STRUCT_LDS instruction, 7-233
DCL_STRUCT_SRV instruction, 7-234
DCL_STRUCT_UAV instruction, 7-235
DCL_TOTAL_NUM_THREAD_GROUP instruction, 7-50
DCL_TS_DOMAIN instruction, 7-50
DCL_TS_OUTPUT_PRIMITIVE instruction, 7-51
DCL_TS_PARTITION instruction, 7-51
DCL_UAV instruction, 7-236
DCL_VPRIM instruction, 7-52
DCLARRAY instruction, 4-1, 7-52
DCLDEF instruction, 7-53
DCLPI instruction, 4-1, 4-2, 5-11, 5-14, 5-19, 5-22, 5-24, 5-29, 7-54
DCLPIN instruction, 4-2, 5-11, 5-18, 5-19, 5-22, 5-24, 7-56
DCLPP instruction, 4-2, 4-3, 5-18, 7-58
DCLPT instruction, 4-2, 7-59
DCLV instruction, 4-1, 7-60
DCLVOUT instruction, 4-2, 5-11, 5-14, 5-19, 5-22, 5-23, 5-24, 5-28, 7-62
declaration information
  DX10 to IL, 5-2
declare an object, 2-1
declare resources
  register, 2-1
  sampler, 2-1
DEF instruction, 7-64
DEFAULT instruction, 7-19
DEFB instruction, 7-65
definition
  _abs, 3-4
  _bias, 3-4
  _bx2, 3-4
  _divcomp, 3-4
  _invert, 3-4
  _neg, 3-4
  _sign, 3-4
  _x2, 3-4
  abs, 2-9
  bias, 2-9
  clamp, 2-5, 2-9
  code, 2-3
  component_w_a, 2-5
  component_x_r, 2-5
  component_y_g, 2-5
  component_z_b, 2-5
  control, 2-3
  dimension, 2-4, 2-6
  divComp, 2-9
  extended, 2-4, 2-6
  immediate_present, 2-4, 2-6
  Intermediate Language (IL), 1-1
  invert, 2-9
  kernel, 1-2
  modifier_present, 2-3, 2-6
  negate_w_a, 2-9
  negate_x_r, 2-8
  negate_y_g, 2-8
  negate_z_b, 2-9
  pri_modifier_present, 2-3
  register_num, 2-3, 2-6
  register_type, 2-3, 2-6
  relative_address, 2-4, 2-6
  sec_modifier_present, 2-3
  shift_scale, 2-5
  sign, 2-9
  swizzle, 3-3
  swizzle_w_a, 2-9
  swizzle_x_r, 2-8
  swizzle_y_g, 2-8
  swizzle_z_b, 2-9
  x2, 2-9
delimiter
  comment, 3-4
delta
  computing, 5-27
DEPTH
  register, 4-1
  register type, 5-8
depth
  rasterizer, 5-9
DEPTH_GE register type, 5-8
DEPTH_LE register type, 5-9
destination
  index, 2-3
  modifier, 3-2
  modifier token, 2-4
  operand, 2-4
    IL_Dst, 2-3
  token, 2-3
destination information
  order, 7-1
destination modification
  IL_Dst_Mod, 2-5
DET instruction, 7-159
dimension
  definition, 2-4, 2-6
  IL_Dst, 2-4
  IL_Src, 2-6
dimension type
  UAV, 7-9, 7-10
DirectX
  multiple form instructions, 1-1
    Boolean, 1-1
    conditional, 1-1
    logical, 1-1
    unconditional, 1-1
DISCARD_LOGICALNZ instruction, 7-69
DISCARD_LOGICALZ instruction, 7-69
DIST instruction, 7-160
DIV instruction, 7-161
divComp
  definition, 2-9
  IL_Src_Mod, 2-9
domain shader, 5-9, 5-14, 5-17
DOMAINLOCATION
  register type, 5-9
double comparison instruction, 7-1
double precision instruction notes, 7-5
DP2 instruction, 7-162
DP2ADD instruction, 7-163
DP3 instruction, 7-164
DP4 instruction, 7-165
DST instruction, 7-166
DSX instruction, 7-167
DSY instruction, 7-168
Dword, 7-7, 7-8, 7-10, 7-11
DX instruction, 7-2
DX10
  declaration information to IL, 5-2
  mapping to IL, 5-1
  support, 1-1
DX10 compute shader (CS)
  register mapping to IL, 5-3
DX11, 7-7, 7-11
  compute shader (CS) register mapping to IL, 5-3
DX9 registers (2.0)
  mapping to IL types, 5-3
DX9 registers (3.0)
  mapping to IL types, 5-2
DXSINCOS instruction, 7-169
E
EDGEFLAG
  register type, 5-10
ELSE instruction, 7-20
EMIT instruction, 7-70
EMIT_STREAM instruction, 7-70
EMIT_THEN_CUT instruction, 7-71
EMIT_THEN_CUT_STREAM instruction, 7-71
END instruction, 4-1, 7-21
ENDFUNC instruction, 7-21
ENDIF instruction, 7-22
ENDLOOP instruction, 7-22
ENDMAIN instruction, 4-1, 7-23
ENDPHASE instruction, 7-23
ENDSWITCH instruction, 7-24
engine
  tessellation, 5-13, 5-20, 5-21
enumeration sequence order
  IL, B-1
enumeration type
  IL LDS Sharing Mode, 6-7
  IL_FIRSTBIT, 6-3
  IL_LOAD_STORE_DATA_SIZE, 6-8
  IL_OUTPUT_TOPOLOGY, 6-17
  IL_TOPOLOGY, 6-21
  ILAddressing, 6-1
  ILAnisoFilterMode, 6-1
  ILCmpValue, 6-2
  ILComponentSelect, 6-2
  ILDefaultVal, 6-2
  ILDivComp, 6-3
  ILElementFormat, 6-3
  ILImportComponent, 6-4
  ILImportUsage, 6-5
  ILInterpolation, 6-7
  ILLanguageType, 6-7
  ILLogicOp, 6-8
  ILMatrix, 6-8
  ILMipFilterMode, 6-8
  ILModDstComp, 6-9
  ILNoiseType, 6-9
  ILOpcode, 6-9
  ILPixTexUsage, 6-18
  ILRegType, 6-18
  ILRelOp, 6-18
  ILShader, 6-19
  ILShiftScale, 6-19
  ILTexCoordMode, 6-20
  ILTexFilterMode, 6-20
  ILTexShadowMode, 6-21
  ILTopologyType, 6-21
  ILTsPartition, 6-22
  ILZeroOp, 6-22
EQ instruction, 7-170
EVAL_CENTROID instruction, 7-72
EVAL_SAMPLE_INDEX instruction, 7-72
EVAL_SNAPPED instruction, 7-73
Evergreen
  memory control notes, 7-10
  programming model, 7-10
example
  source token, 2-10
EXN instruction, 7-170
EXP instruction, 7-171
EXP_VEC instruction, 7-171
EXPP instruction, 7-172
extended
  definition, 2-4, 2-6
  IL_Dst, 2-4
  IL_Src, 2-6
F
F_2_F16 instruction, 7-144
F_2_u4 instruction, 7-223
F16_2_F instruction, 7-145
F2D instruction, 7-144
FACE register type, 5-10
FACEFORWARD instruction, 7-172
FCALL instruction, 7-333
FENCE instruction, 7-74
FETCH4 instruction, 7-76
FETCH4_PO_C instruction, 7-77
FETCH4C instruction, 7-78
FETCH4po instruction, 7-79
filter
  WEIGHTED_QUAD, A-1
flattened absolute work-item ID, 5-7
flattened work-group ID, 5-26
flattened work-item ID, 5-27
float
  comparison instruction, 7-1
  signed arithmetic, 7-2
  converting byte to, 7-10
floating point
  arithmetic instructions, 2-7
  format, 5-15
flow control instruction notes, 7-2
flow-control-block, 4-1, 7-3
  code, 7-2
FLR instruction, 7-173
FMA instruction, 7-173
FOG
  register, 5-11, 5-18, 5-28
  register type, 5-10
Copyright 2011 by Advanced Micro Devices, Inc. All rights reserved.
fog factor pixel shader output . . . . . . . . . . . . . . . . . 5-21 forced to zero component . . . . . . . . . . . . . . . . . . . . . . . . 3-3 format instruction packet . . . . . . . . . . . . . . . . . . . 7-1 kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 four-byte hex value format . . . . . . . . . . . . . 5-15 FRC instruction . . . . . . . . . . . . . . . . . . . . . 7-174 FTOI instruction. . . . . . . . . . . . . . . . . . . . . 7-145 FTOU instruction . . . . . . . . . . . . . . . . . . . . 7-146 FUNC instruction . . . . . . . . . . . . . . . . . . . . . 7-24 function/interface virtual. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25 FWIDTH instruction . . . . . . . . . . . . . . . . . . 7-175 G gather4_comp_sel . . . . . . . . . . . . . . . . . . . . . 7-4 GDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 application code initializing . . . . . . . . . . . . 7-8 binary atomic operation . . . . . . . . . . . . . . 7-9 memory operation notes . . . . . . . . . . . . . . 7-8 memory operations programming model . 7-8 GDS_ADD instruction . . . . . . . . . . . . . . . . 7-302 GDS_AND instruction . . . . . . . . . . . . . . . . 7-303 GDS_CMP_STORE instruction. . . . . . . . . 7-304 GDS_DEC instruction . . . . . . . . . . . . . . . . 7-305 GDS_INC instruction . . . . . . . . . . . . . . . . . 7-306 GDS_LOAD instruction . . . . . . . . . . . . . . . 7-307 GDS_MAX instruction . . . . . . . . . . . . . . . . 7-307 GDS_MIN instruction. . . . . . . . . . . . . . . . . 7-308 GDS_MSKOR instruction . . . . . . . . . . . . . 7-309 GDS_OR instruction . . . . . . . . . . . . . . . . . 7-310 GDS_READ_ADD instruction . . . . . . . . . . 7-311 GDS_READ_AND instruction . . . . . . . . . . 7-312 GDS_READ_CMP_XCHG instruction. . . . 7-313 GDS_READ_DEC instruction . . . . . . . . . . 7-314 GDS_READ_INC instruction . . . . . . . . . . . 7-315 GDS_READ_MAX instruction . . . . . . . . . . 7-316 GDS_READ_MIN instruction. . . . . . . . 
. . . 7-317 GDS_READ_MSKOR instruction . . . . . . . 7-318 GDS_READ_OR instruction . . . . . . . . . . . 7-319 GDS_READ_RSUB instruction . . . . . . . . . 7-320 GDS_READ_SUB instruction . . . . . . . . . . 7-321 GDS_READ_UMAX instruction. . . . . . . . . 7-322 GDS_READ_UMIN instruction . . . . . . . . . 7-323 GDS_READ_XOR instruction . . . . . . . . . . 7-324 GDS_RSUB instruction . . . . . . . . . . . . . . . 7-325 GDS_STORE instruction. . . . . . . . . . . . . . 7-326 GDS_SUB instruction . . . . . . . . . . . . . . . . 7-327 GDS_UMAX instruction. . . . . . . . . . . . . . . 7-328 GDS_UMIN instruction . . . . . . . . . . . . . . . 7-329 GDS_XOR instruction . . . . . . . . . . . . . . . . 7-330
GE instruction . . . . . . . . . . . . . . . . . . . . . . 7-175 general read write memory access model . . . . . . . . . . . . . . . 1-2 generic token. . . . . . . . . . . . . . . . . . . . . . . . . 2-2 GENERIC_MEMORY register type. . . . . . . 5-11 geometry shader . . . . . . . . . . . . . . . . . . . . . 5-22 global data share (GDS). . . . . . . . . . . . . . . . 1-3 GLOBAL register type. . . . . . . . . . . . . . . . . 5-12 globally shared register SR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 graphic clients instructions . . . . . . . . . . . . . . . . . . . . . . . . 7-3 guidelines shader model . . . . . . . . . . . . . . . . . . . . . C-6 H hierarchical threading model. . . . . . . . . . . . . 1-2 HOS rendering . . . . . . . . 5-8, 5-12, 5-20, 5-21 HOS-related fields. . . . . . . . . . . . . . . . . . . . . 5-3 HS_CP_PHASE instruction. . . . . . . . . . . . . 7-25 HS_FORK_PHASE instruction . . . . . . . . . . 7-25 HS_JOIN_PHASE instruction . . . . . . . . . . . 7-26 hull shader. . . . . . . . . . . . . . . . . . . . . 5-14, 5-22 I I64_ADD instruction . . . . . . . . . . . . . . . . . 7-116 I64EQ instruction. . . . . . . . . . . . . . . . . . . . 7-117 I64GE instruction. . . . . . . . . . . . . . . . . . . . 7-117 I64LT instruction . . . . . . . . . . . . . . . . . . . . 7-117 I64MAX instruction . . . . . . . . . . . . . . . . . . 7-118 I64MIN instruction . . . . . . . . . . . . . . . . . . . 7-118 I64NE instruction . . . . . . . . . . . . . . . . . . . . 7-117 I64NEGATE instruction . . . . . . . . . . . . . . . 7-119 I64SHL instruction. . . . . . . . . . . . . . . . . . . 7-120 I64SHR instruction . . . . . . . . . . . . . . . . . . 7-120 IADD instruction . . . . . . . . . . . . . . . . . . . . 7-121 IAND instruction . . . . . . . . . . . . . . . . . . . . 7-121 IBIT_EXTRACT instruction . . . . . . . . . . . . 7-139 IBORROW instruction . . . . . . . . . . . . . . . . 7-122 ICARRY instruction . . . . . . . . . . . . . . . . . . 
7-122 ICOUNTBITS instruction . . . . . . . . . . . . . . 7-140 IEQ instruction. . . . . . . . . . . . . . . . . . . . . . 7-123 IF_LOGICALNZ instruction . . . . . . . . . . . . . 7-26 IF_LOGICALZ instruction . . . . . . . . . . . . . . 7-26 IFC instruction . . . . . . . . . . . . . . . . . . . . . . . 7-27 IFC-ELSE-ENDIF nesting . . . . . . . . . . . . . . . 4-2 IFIRSTBIT instruction . . . . . . . . . . . . . . . . 7-140 IFNZ instruction . . . . . . . . . . . . . . . . . . . . . . 7-27 IFNZ-ELSE-ENDIF nesting . . . . . . . . . . . . . . 4-2 IGE instruction. . . . . . . . . . . . . . . . . . . . . . 7-123 IL 64-bit integer instructions . . . . . . . . . . . . . 7-6 binary stream . . . . . . . . . . . . . . . . . . . . . . 3-1
compiler . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 control flow . . . . . . . . . . . . . . . . . . . . . . . . 7-2 enumeration sequence order . . . . . . . . . . B-1 instruction device validity. . . . . . . . . . . . . . . . . . . . D-1 restriction . . . . . . . . . . . . . . . . . . . . . . . C-1 kernels written passed to compiler . . . . . 2-1 macro processor support . . . . . . . . . . . 7-335 mapping from DX9 registers (2.0) . . . . . . 5-3 mapping from DX9 registers (3.0) . . . . . . 5-2 memory access model . . . . . . . . . . . . . . . 1-2 opcode . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 open design . . . . . . . . . . . . . . . . . . . . . . . 1-1 register mapping. . . . . . . . . . . . . . . . . . . . 5-1 register restrictions . . . . . . . . . . . . . . . . . . C-1 register types overview . . . . . . . . . . . . . . 5-5 restriction . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 shader. . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 shift operations . . . . . . . . . . . . . . . . . . . . . 7-6 simple arithmetic operations . . . . . . . . . . 7-6 statements used in any shader . . . . . . . . 2-1 stream. . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1 packet ordering . . . . . . . . . . . . . . . . . . 2-1 text syntax . . . . . . . . . . . . . . . . . . . . . . . . 3-1 token descriptions . . . . . . . . . . . . . . . . . . 2-2 translator. . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 types used in any shader. . . . . . . . . . . . . 2-1 valid register types . . . . . . . . . . . . . . . . . . 5-1 IL LDS Sharing Mode enumeration type . . . . . . . . . . . . . . . . . . . 6-7 IL PrimarySample Mod . . . . . . . . . . . . . . . . . 7-3 IL_Dst destination operand . . . . . . . . . . . . . . . . . 2-3 dimension . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 extended . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 immediate_present . . . . . . . . . . . . . . . . . . 
2-4 modifier_present . . . . . . . . . . . . . . . . . . . . 2-3 register_num. . . . . . . . . . . . . . . . . . . . . . . 2-3 register_type . . . . . . . . . . . . . . . . . . . . . . . 2-3 relative_address . . . . . . . . . . . . . . . . . . . . 2-4 IL_Dst token . . . . . . . . . . . . . . . . . . . . . . . . . 7-1 IL_Dst_Mod clamp . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 component_w_a . . . . . . . . . . . . . . . . . . . . 2-5 component_x_r . . . . . . . . . . . . . . . . . . . . . 2-5 component_y_g . . . . . . . . . . . . . . . . . . . . 2-5 component_z_b . . . . . . . . . . . . . . . . . . . . 2-5 destination modification . . . . . . . . . . . . . . 2-5 shift_scale. . . . . . . . . . . . . . . . . . . . . . . . . 2-5 IL_FIRSTBIT enumeration type . . . . . . . . . . . . . . . . . . . 6-3 IL_IMPORTUSAGE_BACKCOLOR . 5-19, 5-22 IL_IMPORTUSAGE_COLOR . . . . . . 5-19, 5-22
IL_IMPORTUSAGE_FOG. . . . . . . . . . . . . . 5-11 IL_IMPORTUSAGE_GENERIC . . . . 5-14, 5-24 IL_IMPORTUSAGE_POINTSIZE . . . . . . . . 5-23 IL_IMPORTUSAGE_POS . . . . . . . . . . . . . . 5-19 IL_Lang token . . . . . . . . . . . . . . . . . . . . . . . . 2-2 IL_LOAD_STORE_DATA_SIZE enumeration type . . . . . . . . . . . . . . . . . . . 6-8 IL_Opcode code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 instruction details . . . . . . . . . . . . . . . . . . . 2-3 pri_modifier_present . . . . . . . . . . . . . . . . . 2-3 restrictions . . . . . . . . . . . . . . . . . . . . . . . . C-1 sec_modifier_present . . . . . . . . . . . . . . . . 2-3 token . . . 2-1, 2-3, 2-5, 7-1, 7-14, 7-17, 7-57, 7-61, 7-63, 7-86, 7-104, 7-105, 7-108, 7-111, 7-112, 7-113, 7-114, 7-115 IL_OUTPUT . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 IL_OUTPUT_TOPOLOGY enumeration type . . . . . . . . . . . . . . . . . . 6-17 IL_REG_TYPES . . . . . . . . . . . . . . . . . . . . 7-335 IL_REGTYPE_ABSOLUTE_THREAD_ID . . 5-7 IL_REGTYPE_ABSOLUTE_THREAD_ID_FLAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7 IL_REGTYPE_BARYCENTRIC_COORD. . . 5-7 IL_REGTYPE_CONST_BUFF . . . . . . . . . . . 5-8 IL_REGTYPE_DEPTH . . . . . . . . . . . . . . . . . 5-8 IL_REGTYPE_DEPTH_GE . . . . . . . . . . . . . 5-8 IL_REGTYPE_DEPTH_LE . . . . . . . . . . . . . . 5-9 IL_REGTYPE_DOMAINLOCATION . . . . . . . 5-9 IL_REGTYPE_EDGEFLAG . . . . . . . . . . . . 5-10 IL_REGTYPE_FACE . . . . . . . . . . . . . . . . . 5-10 IL_REGTYPE_FOG . . . . . . . . . . . . . . . . . . 5-10 IL_REGTYPE_GENERIC_MEM . . . . . . . . . 5-11 IL_REGTYPE_GLOBAL . . . . . . . . . . . . . . . 5-12 IL_REGTYPE_IMMED_CONST_BUFF . . . 5-12 IL_REGTYPE_INDEX . . . . . . . . . . . . . . . . . 5-12 IL_REGTYPE_INDEXED_TEMP . . . . . . . . 5-15 IL_REGTYPE_INPUT . . . . . . . . . . . . . . . . . 5-13 IL_REGTYPE_INPUT_ARG . . . . . . . . . . . . 
5-13 IL_REGTYPE_INPUT_COVERAGE_MASK 5-13 IL_REGTYPE_INPUTCP . . . . . . . . . . . . . . 5-14 IL_REGTYPE_INTERP. . . . . . . . . . . 5-14, 5-15 IL_REGTYPE_LITERAL . . . . . . . . . . . . . . . 5-15 IL_REGTYPE_OBJECT_INDEX. . . . . . . . . 5-16 IL_REGTYPE_OCP_ID . . . . . . . . . . 5-16, 5-17 IL_REGTYPE_OMASK . . . . . . . . . . . . . . . . 5-16 IL_REGTYPE_OUTPUT_ARG . . . . . . . . . . 5-17 IL_REGTYPE_PATCHCONST . . . . . . . . . . 5-17 IL_REGTYPE_PCOLOR. . . . . . . . . . . . . . . 5-18 IL_REGTYPE_PERSIST . . . . . . . . . . . . . . 5-18 IL_REGTYPE_PINPUT. . . . . . . . . . . . . . . . 5-18 IL_REGTYPE_POS . . . . . . . . . . . . . . . . . . 5-19
IL_REGTYPE_PRICOLOR . . . . . . . . . . . . . 5-19 IL_REGTYPE_PRIMCOORD . . . . . . . . . . . 5-20 IL_REGTYPE_PRIMITIVE_INDEX . . . . . . . 5-20 IL_REGTYPE_PRIMTYPE . . . . . . . . . . . . . 5-20 IL_REGTYPE_PS_OUT_FOG . . . . . . . . . . 5-21 IL_REGTYPE_QUAD_INDEX . . . . . . . . . . . 5-21 IL_REGTYPE_SECCOLOR . . . . . . . . . . . . 5-21 IL_REGTYPE_SHADER_INSTANCE_ID . . 5-22 IL_REGTYPE_SHARED_TEMP . . . . . . . . . 5-22 IL_REGTYPE_SPRITE . . . . . . . . . . . . . . . . 5-23 IL_REGTYPE_SPRITECOORD . . . . . . . . . 5-23 IL_REGTYPE_STENCIL . . . . . . . . . . . . . . . 5-23 IL_REGTYPE_TEMP . . . . . . . . . . . . . . . . . 5-24 IL_REGTYPE_TEXCOORD . . . . . . . . . . . . 5-24 IL_REGTYPE_THIS . . . . . . . . . . . . . . . . . . 5-25 IL_REGTYPE_THREAD_GROUP_ID. . . . . 5-25 IL_REGTYPE_THREAD_GROUP_ID_FLAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-26 IL_REGTYPE_THREAD_ID_IN_GROUP . . 5-26 IL_REGTYPE_THREAD_ID_IN_GROUP_FLAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27 IL_REGTYPE_TIMER . . . . . . . . . . . . . . . . . 5-27 IL_REGTYPE_VERTEX . . . . . . . . . . . . . . . 5-28 IL_REGTYPE_VOUTPUT . . . . . . . . . . . . . . 5-28 IL_REGTYPE_VPRIM . . . . . . . . . . . . . . . . . 5-29 IL_REGTYPE_WINCOORD . . . . . . . . . . . . 5-29 IL_Rel_Addr token conjunction with source modifier token . . 2-7 source modifier token precedes . . . . . . . . . . . . . . . . . . . . . . . . 2-7 IL_Src dimension . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 extended . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 immediate_present . . . . . . . . . . . . . . . . . . 2-6 modifier_present . . . . . . . . . . . . . . . . . . . . 2-6 register_num . . . . . . . . . . . . . . . . . . . . . . . 2-6 register_type . . . . . . . . . . . . . . . . . . . . . . . 2-6 relative_address . . . . . . . . . . . . . . . . . . . . 2-6 source operand . . . . . . . . . . . . . . . . . . . . . 2-6 token . . . . . . . . . . . . . . . . . . . . . . . . 
. 2-7, 7-1 IL_Src_Mod abs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 clamp. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 divComp . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 invert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 negate_w_a. . . . . . . . . . . . . . . . . . . . . . . . 2-9 negate_x_r . . . . . . . . . . . . . . . . . . . . . . . . 2-8 negate_y_g . . . . . . . . . . . . . . . . . . . . . . . . 2-8 negate_z_b . . . . . . . . . . . . . . . . . . . . . . . . 2-9 sign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 source operand modification . . . . . . . . . . 2-8 swizzle_w_a . . . . . . . . . . . . . . . . . . . . . . . 2-9
swizzle_x_r . . . . . . . . . . . . . . . . . . . . . . . . 2-8 swizzle_y_g. . . . . . . . . . . . . . . . . . . . . . . . 2-8 swizzle_z_b. . . . . . . . . . . . . . . . . . . . . . . . 2-9 x2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9 IL_TEMP_ARRAY . . . . . . . . . . . . . . . . . . . . . 2-3 IL_TOPOLOGY enumeration type . . . . . . . . . . . . . . . . . . 6-21 IL_Version token . . . . . . . . . . . . . . . . . . . . . . 2-2 ILAddressing enumeration type . . . . . . . . . . . . . . . . . . . 6-1 ILAnisoFilterMode enumeration type . . . . . . . . . . . . . . . . . . . 6-1 ILCmpValue enumeration type . . . . . . . . . . . . . . . . . . . 6-2 ILComponentSelect enumeration type . . . . . . . . . . . . . . . . . . . 6-2 ILDefaultVal enumeration type . . . . . . . . . . . . . . . . . . . 6-2 ILDivComp enumeration type . . . . . . . . . . . . . . . . . . . 6-3 ILElementFormat enumeration type . . . . . . . . . . . . . . . . . . . 6-3 ILImportComponent enumeration type . . . . . . . . . . . . . . . . . . . 6-4 ILImportUsage enumeration type . . . . . . . . . . . . . . . . . . . 6-5 ILInterpolation enumeration type . . . . . . . . . . . . . . . . . . . 6-7 ILLanguageType enumeration type . . . . . . . . . . . . . . . . . . . 6-7 ILLogicOp enumeration type . . . . . . . . . . . . . . . . . . . 6-8 ILMatrix enumeration type . . . . . . . . . . . . . . . . . . . 6-8 ILMipFilterMode enumeration type . . . . . . . . . . . . . . . . . . . 6-8 ILModDstComp enumeration type . . . . . . . . . . . . . . . . . . . 6-9 ILNoiseType enumeration type . . . . . . . . . . . . . . . . . . . 6-9 ILOpcode enumeration type . . . . . . . . . . . . . . . . . . . 6-9 ILPixTexUsage enumeration type . . . . . . . . . . . . . . . . . . 6-18 ILRegType enumeration type . . . . . . . . . . . . . . . . . . 6-18 ILRelOp enumeration type . . . . . . . . . . . . . . . . . . 6-18 ILShader enumeration type . . . . . . . . . . . . . . . . . . 6-19 ILShiftScale enumeration type . . . . 
. . . . . . . . . . . . . . 6-19 ILT instruction . . . . . . . . . . . . . . . . . . . . . . 7-123
ILTexCoordMode enumeration type . . . . . . . . . . . . . . . . . . 6-20 ILTexFilterMode enumeration type . . . . . . . . . . . . . . . . . . 6-20 ILTexShadowMode enumeration type . . . . . . . . . . . . . . . . . . 6-21 ILTopologyType enumeration type . . . . . . . . . . . . . . . . . . 6-21 ILTsPartition enumeration type . . . . . . . . . . . . . . . . . . 6-22 ILZeroOp enumeration type . . . . . . . . . . . . . . . . . . 6-22 IMAD instruction . . . . . . . . . . . . . . . . . . . . 7-124 IMAD24 instruction . . . . . . . . . . . . . . . . . . 7-124 IMAX instruction . . . . . . . . . . . . . . . . . . . . 7-125 IMIN instruction. . . . . . . . . . . . . . . . . . . . . 7-125 IMMED_CONST_BUFFER register type . . 5-12 immediate_present definition . . . . . . . . . . . . . . . . . . . . . . 2-4, 2-6 field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 IL_Dst . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4 IL_Src . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 IMUL instruction . . . . . . . . . . . . . . . . . . . . 7-126 IMUL_HIGH instruction. . . . . . . . . . . . . . . 7-126 IMUL24 instruction . . . . . . . . . . . . . . . . . . 7-127 IMUL24_HIGH instruction. . . . . . . . . . . . . 7-127 INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 register type . . . . . . . . . . . . . . . . . . . . . . 5-12 index destination . . . . . . . . . . . . . . . . . . . . . . . . 2-3 register relative . . . . . . . . . . . . . . . . . . . . . 2-6 source . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 values compute shader . . . . . . . . . . . . . . . . . . 1-3 index_args bit . . . . . . . . . . . . . . . . . . . . . . . . 7-3 INE instruction . . . . . . . . . . . . . . . . . . . . . 7-123 INEGATE instruction. . . . . . . . . . . . . . . . . 7-128 INIT_SHARED_REGISTERS instruction . . 7-66 INITV instruction . . . . . . . . . . . . . . . . . 4-1, 7-67 INOT instruction . . . . . . . . . . . . . . . . . . . . 
7-128 INPUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4 register type . . . . . . . . . . . . . . . . . . . . . . 5-13 input coverage mask . . . . . . . . . . . . . . . . . 5-13 input/output instruction notes . . . . . . . . . . . . 7-3 INPUT_ARG register type . . . . . . . . . . . . . 5-13 INPUT_COVERAGE_MASK register type . 5-13 INPUTCP register type . . . . . . . . . . . . . . . . 5-14 instruction ABS. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-148 ACOS . . . . . . . . . . . . . . . . . . . . . . . . . . 7-148 ADD . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-149 AND . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-149 APPEND_BUF_ALLOC . . . . . . . . . . . . 7-229
APPEND_BUF_CONSUME . . . . . . . . . 7-230 arithmetic notes . . . . . . . . . . . . . . . . . . . . 7-5 ASIN . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-150 ATAN . . . . . . . . . . . . . . . . . . . . . . . . . . 7-151 BitAlign . . . . . . . . . . . . . . . . . . . . . . . . . 7-222 BREAK . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 BREAK_LOGICALNZ . . . . . . . . . . . . . . . 7-14 BREAK_LOGICALZ . . . . . . . . . . . . . . . . 7-14 BREAKC . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 BUFINFO . . . . . . . . . . . . . . . . . . . . . . . . 7-68 ByteAligncomparison notes . . . . . . . . . . . . . . . . . . . 7-1 CONTINUE . . . . . . . . . . . . . . . . . . . . . . . 7-18 CONTINUE_LOGICALNZ. . . . . . . . . . . . 7-19 CONTINUE_LOGICALZ . . . . . . . . . . . . . 7-19 CONTINUEC . . . . . . . . . . . . . . . . . . . . . 7-19 control flow restrictions . . . . . . . . . . . . . . . . . . . . . . 7-3 conversion notes
DADD . . . . . . . . . . . . . . . . . . . . . . . . . . 7-210 DCL_ARENA_UAV . . . . . . . . . . . . . . . . 7-231 DCL_CB . . . . . . . . . . . . . . . . . . . . . . . . . 7-32 DCL_FUNCTION_BODY . . . . . . . . . . . 7-331 DCL_FUNCTION_TABLE . . . . . . . . . . . 7-331 DCL_GDS . . . . . . . . . . . . . . . . . . . . . . . 7-300 DCL_GLOBAL_FLAGS . . . . . . . . . . . . . . 7-33 DCL_INDEXED_TEMP_ARRAY. . . . . . . 7-34 DCL_INPUT . . . . . . . . . . . . . . . . . . . . . . 7-35 DCL_INPUTPRIMITIVE . . . . . . . . . . . . . 7-37 DCL_INTERFACE_PTR . . . . . . . . . . . . 7-332 DCL_LDS . . . . . . . . . . . . . . . . . . . . . . . 7-232 DCL_LDS_SHARING_MODE . . . . . . . . . 7-37 DCL_LDS_SIZE_PER_THREAD . . . . . . 7-38 DCL_LITERAL . . . . . . . . . . . . . . . . . . . . 7-39 DCL_MAX_OUTPUT_VERTEX_COUNT 7-40 DCL_MAX_TESSFACTOR . . . . . . . . . . . 7-41 DCL_MAX_THREAD_PER_GROUP . . . 7-41 DCL_NUM_ICP . . . . . . . . . . . . . . . . . . . . 7-42 DCL_NUM_INSTANCES . . . . . . . . . . . . 7-42 DCL_NUM_OCP . . . . . . . . . . . . . . . . . . . 7-43 DCL_NUM_THREAD_PER_GROUP . . . 7-43 DCL_ODEPTH . . . . . . . . . . . . . . . . . . . . 7-44 DCL_OUTPUT . . . . . . . . . . . . . . . . . . . . 7-45 DCL_OUTPUT_TOPOLOGY . . . . . . . . . 7-46 DCL_PERSISTENT . . . . . . . . . . . . . . . . 7-46 DCL_RAW_SRV . . . . . . . . . . . . . . . . . . 7-232 DCL_RAW_UAV . . . . . . . . . . . . . . . . . . 7-233 DCL_RESOURCE. . . . . . . . . . . . . . . . . . 7-47 DCL_SHARED_TEMP . . . . . . . . . . . . . . 7-48 DCL_STREAM . . . . . . . . . . . . . . . . . . . . 7-49 DCL_STRUCT_GDS. . . . . . . . . . . . . . . 7-301 DCL_STRUCT_LDS . . . . . . . . . . . . . . . 7-233 DCL_STRUCT_SRV . . . . . . . . . . . . . . . 7-234 DCL_STRUCT_UAV . . . . . . . . . . . . . . . 7-235 DCL_TOTAL_NUM_THREAD_GROUP . 7-50 DCL_TS_DOMAIN . . . . . . . . . . . . . . . . . 7-50 DCL_TS_OUTPUT_PRIMITIVE . . . . . . . 7-51 DCL_TS_PARTITION . . . . . . . . . . . . . . . 7-51 DCL_UAV . . . . . . . . . . . . . . . . . . . . . . 
. 7-236 DCL_VPRIM . . . . . . . . . . . . . . . . . . . . . . 7-52 DCLARRAY. . . . . . . . . . . . . . . . . . . 4-1, 7-52 DCLDEF . . . . . . . . . . . . . . . . . . . . . . . . . 7-53 DCLlPIN . . . . . . . . . . . . . . . . . . . . . . . . . 5-19 DCLPI 4-1, 4-2, 5-11, 5-14, 5-19, 5-22, 5-24, 5-29, 7-54 DCLPIN . . . 4-2, 5-11, 5-18, 5-22, 5-24, 7-56 DCLPP . . . . . . . . . . . . . 4-2, 4-3, 5-18, 7-58 DCLPT . . . . . . . . . . . . . . . . . . . . . . 4-2, 7-59 DCLV. . . . . . . . . . . . . . . . . . . . . . . . 4-1, 7-60 DCLVOUT 4-2, 5-11, 5-14, 5-19, 5-22, 5-23, 5-24, 5-28, 7-62 DEF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-64
DEFAULT . . . . . . . . . . . . . . . . . . . . . . . . 7-19 DEFB. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-65 DET . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-159 DISCARD_LOGICALNZ . . . . . . . . . . . . . 7-69 DISCARD_LOGICALZ . . . . . . . . . . . . . . 7-69 DIST . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-160 DIV . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-161 double comparison . . . . . . . . . . . . . . . . . . 7-1 double precision notesu4 . . . . . . . . . . . . . . . . . . . . . . . . . 7-223 F2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-144 FACEFORWARD . . . . . . . . . . . . . . . . . 7-172 FCALL . . . . . . . . . . . . . . . . . . . . . . . . . . 7-333 FENCE . . . . . . . . . . . . . . . . . . . . . . . . . . 7-74 FETCH4 . . . . . . . . . . . . . . . . . . . . . . . . . 7-76 FETCH4_PO_C . . . . . . . . . . . . . . . . . . . 7-77 FETCH4C . . . . . . . . . . . . . . . . . . . . . . . . 7-78 FETCH4po . . . . . . . . . . . . . . . . . . . . . . . 7-79 float comparison . . . . . . . . . . . . . . . . . . . . 7-1 signed arithmetic . . . . . . . . . . . . . . . . . 7-2 flow control notes . . . . . . . . . . . . . . . . . . . 7-2 FLR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-173
FMA . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-173 for graphic clients . . . . . . . . . . . . . . . . . . . 7-3 FRC . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-174 FTOI . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-145 FTOU . . . . . . . . . . . . . . . . . . . . . . . . . . 7-146 FUNC . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-24 FWIDTH . . . . . . . . . . . . . . . . . . . . . . . . 7-175 GDS_ADD . . . . . . . . . . . . . . . . . . . . . . 7-302 GDS_AND . . . . . . . . . . . . . . . . . . . . . . 7-303 GDS_CMP_STORE . . . . . . . . . . . . . . . 7-304 GDS_DEC . . . . . . . . . . . . . . . . . . . . . . 7-305 GDS_INC . . . . . . . . . . . . . . . . . . . . . . . 7-306 GDS_LOAD . . . . . . . . . . . . . . . . . . . . . 7-307 GDS_MAX . . . . . . . . . . . . . . . . . . . . . . 7-307 GDS_MIN . . . . . . . . . . . . . . . . . . . . . . . 7-308 GDS_MSKOR. . . . . . . . . . . . . . . . . . . . 7-309 GDS_OR. . . . . . . . . . . . . . . . . . . . . . . . 7-310 GDS_READ_ADD . . . . . . . . . . . . . . . . 7-311 GDS_READ_AND . . . . . . . . . . . . . . . . 7-312 GDS_READ_CMP_XCHG . . . . . . . . . . 7-313 GDS_READ_DEC . . . . . . . . . . . . . . . . 7-314 GDS_READ_INC . . . . . . . . . . . . . . . . . 7-315 GDS_READ_MAX . . . . . . . . . . . . . . . . 7-316 GDS_READ_MIN . . . . . . . . . . . . . . . . . 7-317 GDS_READ_MSKOR. . . . . . . . . . . . . . 7-318 GDS_READ_OR. . . . . . . . . . . . . . . . . . 7-319 GDS_READ_RSUB . . . . . . . . . . . . . . . 7-320 GDS_READ_SUB. . . . . . . . . . . . . . . . . 7-321 GDS_READ_UMAX . . . . . . . . . . . . . . . 7-322 GDS_READ_UMIN. . . . . . . . . . . . . . . . 7-323 GDS_READ_XOR . . . . . . . . . . . . . . . . 7-324 GDS_RSUB . . . . . . . . . . . . . . . . . . . . . 7-325 GDS_STORE . . . . . . . . . . . . . . . . . . . . 7-326 GDS_SUB. . . . . . . . . . . . . . . . . . . . . . . 7-327 GDS_UMAX . . . . . . . . . . . . . . . . . . . . . 7-328 GDS_UMIN. . . . . . . . . . . . . . . . . . 
. . . . 7-329 GDS_XOR . . . . . . . . . . . . . . . . . . . . . . 7-330 GE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-175 HS_CP_PHASE . . . . . . . . . . . . . . . . . . . 7-25 HS_FORK_PHASE. . . . . . . . . . . . . . . . . 7-25 HS_JOIN_PHASE . . . . . . . . . . . . . . . . . 7-26 I64_ADD . . . . . . . . . . . . . . . . . . . . . . . . 7-116 I64EQ . . . . . . . . . . . . . . . . . . . . . . . . . . 7-117 I64GE . . . . . . . . . . . . . . . . . . . . . . . . . . 7-117 I64LT. . . . . . . . . . . . . . . . . . . . . . . . . . . 7-117 I64MAX . . . . . . . . . . . . . . . . . . . . . . . . . 7-118 I64MIN . . . . . . . . . . . . . . . . . . . . . . . . . 7-118 I64NE . . . . . . . . . . . . . . . . . . . . . . . . . . 7-117 I64NEGATE . . . . . . . . . . . . . . . . . . . . . 7-119 I64SHL . . . . . . . . . . . . . . . . . . . . . . . . . 7-120 I64SHR . . . . . . . . . . . . . . . . . . . . . . . . . 7-120 IADD . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-121 IAND . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-121
input/output notes. . . . . . . . . . . . . . . . . . . 7-3 integer. . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7 integer 64-bit . . . . . . . . . . . . . . . . . . . . . . 7-6 integer comparison . . . . . . . . . . . . . . . . . . 7-1 signed arithmetic . . . . . . . . . . . . . . . . . 7-2 unsigned arithmetic . . . . . . . . . . . . . . . 7-2 INV_MOV . . . . . . . . . . . . . . . . . . . . . . . 7-176 IOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-129 ISHL . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-130 ISHR . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-130 ITOF . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-146 IXOR . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-129 KILL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-80 LDS_ADD . . . . . . . . . . . . . . . . . . . . . . . 7-237 LDS_AND . . . . . . . . . . . . . . . . . . . . . . . 7-238 LDS_CMP. . . . . . . . . . . . . . . . . . . . . . . 7-238 LDS_DEC . . . . . . . . . . . . . . . . . . . . . . . 7-239 LDS_INC. . . . . . . . . . . . . . . . . . . . . . . . 7-239 LDS_LOAD . . . . . . . . . . . . . . . . . . . . . . 7-240 LDS_LOAD_BYTE . . . . . . . . . . . . . . . . 7-240 LDS_LOAD_SHORT . . . . . . . . . . . . . . 7-241 LDS_LOAD_UBYTE . . . . . . . . . . . . . . . 7-241 LDS_LOAD_USHORT . . . . . . . . . . . . . 7-242 LDS_LOAD_VEC . . . . . . . . . . . . . . . . . 7-243 LDS_MAX . . . . . . . . . . . . . . . . . . . . . . . 7-244 LDS_MIN . . . . . . . . . . . . . . . . . . . . . . . 7-244 LDS_MSKOR . . . . . . . . . . . . . . . . . . . . 7-245 LDS_OR . . . . . . . . . . . . . . . . . . . . . . . . 7-245
LDS_READ_ADD . . . . . . . . . . . . . . . . . 7-246 LDS_READ_AND . . . . . . . . . . . . . . . . . 7-247 LDS_READ_CMP_XCHG . . . . . . . . . . . 7-248 LDS_READ_MAX . . . . . . . . . . . . . . . . . 7-249 LDS_READ_MIN. . . . . . . . . . . . . . . . . . 7-250 LDS_READ_OR . . . . . . . . . . . . . . . . . . 7-251 LDS_READ_RSUB . . . . . . . . . . . . . . . . 7-252 LDS_READ_SUB . . . . . . . . . . . . . . . . . 7-253 LDS_READ_UMAX . . . . . . . . . . . . . . . . 7-254 LDS_READ_UMIN . . . . . . . . . . . . . . . . 7-255 LDS_READ_VEC . . . . . . . . . . . . . . . . . . 7-81 LDS_READ_XCHG . . . . . . . . . . . . . . . . 7-256 LDS_READ_XOR . . . . . . . . . . . . . . . . . 7-257 LDS_RSUB . . . . . . . . . . . . . . . . . . . . . . 7-257 LDS_STORE . . . . . . . . . . . . . . . . . . . . . 7-258 LDS_STORE_BYTE . . . . . . . . . . . . . . . 7-258 LDS_STORE_SHORT . . . . . . . . . . . . . 7-259 LDS_STORE_VEC . . . . . . . . . . . . . . . . 7-260 LDS_SUB . . . . . . . . . . . . . . . . . . . . . . . 7-261 LDS_UMAX . . . . . . . . . . . . . . . . . . . . . . 7-261 LDS_UMIN . . . . . . . . . . . . . . . . . . . . . . 7-262 LDS_WRITE_VEC . . . . . . . . . . . . . . . . . 7-82 LDS_XOR . . . . . . . . . . . . . . . . . . . . . . . 7-262 LEN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-176 LIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-177 LN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-178 LOAD . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-83 LOAD_FPTR . . . . . . . . . . . . . . . . . . . . . . 7-85 LOD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-86 LOG. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-179 LOG_VEC . . . . . . . . . . . . . . . . . . . . . . . 7-180 LOGP . . . . . . . . . . . . . . . . . . . . . . . . . . 7-181 LOOP . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-28 LRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-182 LT . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . 7-183 MACRODEF . . . . . . . . . . . . . . . . . . . . . 7-336 MACROEND . . . . . . . . . . . . . . . . . . . . . 7-337 MAD . . . . . . . . . . . . . . . . . . . . . . 7-11, 7-184 MAX . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-185 MCALL . . . . . . . . . . . . . . . . . . . . . . . . . 7-338 MEMEXPORT . . . . . . . . . . . . . . . . . . . . . 7-87 MEMIMPORT . . . . . . . . . . . . . . . . . . . . . 7-88 MIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-186 MMUL . . . . . . . . . . . . . . . . . . . . . . . . . . 7-187 MOD . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-188 modifiers . . . . . . . . . . . . . . . . . . . . . . 2-4, 3-2 MOV . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-189 MUL. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-189 multi-media notes . . . . . . . . . . . . . . . . . . 7-10 NE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-190 NOISE . . . . . . . . . . . . . . . . . . . . . . . . . . 7-191 NOP . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-228 NRM . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-192
  one unique constant, 4-1
  out-of-range addresses, 7-10
  packet, 2-1
  PIREDUCE, 7-193
  POW, 7-194
  POWER, 7-195
  prefix, 7-11
  RCP, 7-196
  REFLECT, 7-197
  RESINFO, 7-89
  restrictions
  rsq
  shift, 7-6
    notes, 7-6
  simple 64-bit integer notes
  SWITCH, 7-30
  TAN, 7-208
  TEXLD, 7-103, A-1
  TEXLDB, 7-106, A-1
  TEXLDD, 7-110, A-1
  TEXLDMS, 7-113
  TEXWEIGHT, 7-115
  TRANSPOSE, 7-209
  U4LERP, 7-225
  U64GE, 7-132
  U64LT, 7-132
  U64MAX, 7-131
  U64MIN, 7-131
  U64SHR, 7-133
  UAV_ADD, 7-265
  UAV_AND, 7-266
  UAV_ARENA_LOAD, 7-267
  UAV_ARENA_STORE, 7-268
  UAV_CMP, 7-269
  UAV_LOAD, 7-270
  UAV_MAX, 7-271
  UAV_MIN, 7-272
  UAV_OR, 7-273
  UAV_RAW_LOAD, 7-274
  UAV_RAW_STORE, 7-275
  UAV_READ_ADD, 7-276
  UAV_READ_AND, 7-277
  UAV_READ_CMP_XCHG, 7-278
  UAV_READ_MAX, 7-279
  UAV_READ_MIN, 7-280
  UAV_READ_OR, 7-281
  UAV_READ_RSUB, 7-282
  UAV_READ_SUB, 7-283
  UAV_READ_UDEC, 7-284
  UAV_READ_UINC, 7-285
  UAV_READ_UMAX, 7-286
  UAV_READ_UMIN, 7-287
  UAV_READ_XCHG, 7-288
  UAV_READ_XOR, 7-289
  UAV_RSUB, 7-290
  UAV_STORE, 7-291
  UAV_STRUCT_LOAD, 7-292
  UAV_STRUCT_STORE, 7-293
  UAV_SUB, 7-294
  UAV_UDEC, 7-295
  UAV_UINC, 7-296
  UAV_UMAX, 7-297
  UAV_UMIN, 7-298
  UAV_XOR, 7-299
  UBIT_EXTRACT, 7-141
  UBIT_INSERT, 7-142
  UBIT_REVERSE, 7-142
  UDIV, 7-132
  Unpack0, 7-225
  Unpack1, 7-226
  Unpack2, 7-226
  Unpack3, 7-227
  unsigned integer comparison, 7-1
  USHR, 7-138
  UTOF, 7-147
  WHILELOOP, 7-31
instruction packet, 7-1
  format, 7-1
instructions
  device validity, D-1
instructions in multiple forms
  DirectX, 1-1
  Boolean, 1-1
  conditional, 1-1
  logical, 1-1
  unconditional, 1-1
integer
  comparison instruction, 7-1
  signed arithmetic, 7-2
  unsigned arithmetic, 7-2
  format, 5-15
  instruction, 2-7
  64-bit, 7-6
Intermediate Language (IL) definition, 1-1
INTERP
  register, 4-1, 5-11, 5-18, 5-28
  register type, 5-14
interpolated data, 5-14
interpolated fog data, 5-10
interpolated primary color data, 5-19
interpolated secondary color data, 5-21
interpolation mode, 5-9
inter-work-item communication supports, 1-2
INV_MOV instruction, 7-176
invert
  definition, 2-9
  IL_Src_Mod, 2-9
IOR instruction, 7-129
is_uav, 7-4
ISHL instruction, 7-130
ISHR instruction, 7-130
ITOF instruction, 7-146
IXOR instruction, 7-129
K
kernel
  definition, 1-2
  format, 2-1
  pass to compiler, 2-1
KILL instruction, 7-80
L
Language token, 2-2, 4-1
LDS, 1-2, 1-3
  access, 7-7
  application code initializing, 7-11
  binary atomic operations, 7-8
  memory allocation, 7-7
  memory operation notes, 7-6
  memory operations, 7-6, 7-7
  programming model, 7-7
  shared memory, 1-2
LDS_ADD instruction, 7-237
LDS_AND instruction, 7-238
LDS_CMP instruction, 7-238
LDS_DEC instruction, 7-239
LDS_INC instruction, 7-239
LDS_LOAD instruction, 7-240
LDS_LOAD_BYTE instruction, 7-240
LDS_LOAD_SHORT instruction, 7-241
LDS_LOAD_UBYTE instruction, 7-241
LDS_LOAD_USHORT instruction, 7-242
LDS_LOAD_VEC instruction, 7-243
LDS_MAX instruction, 7-244
LDS_MIN instruction, 7-244
LDS_MSKOR instruction, 7-245
LDS_OR instruction, 7-245
LDS_READ_ADD instruction, 7-246
LDS_READ_AND instruction, 7-247
LDS_READ_CMP_XCHG instruction, 7-248
LDS_READ_MAX instruction, 7-249
LDS_READ_MIN instruction, 7-250
LDS_READ_OR instruction, 7-251
LDS_READ_RSUB instruction, 7-252
LDS_READ_SUB instruction, 7-253
LDS_READ_UMAX instruction, 7-254
LDS_READ_UMIN instruction, 7-255
LDS_READ_VEC instruction, 7-81
LDS_READ_XCHG instruction, 7-256
LDS_READ_XOR instruction, 7-257
LDS_RSUB instruction, 7-257
LDS_STORE instruction, 7-258
LDS_STORE_BYTE instruction, 7-258
LDS_STORE_SHORT instruction, 7-259
LDS_STORE_VEC instruction, 7-260
LDS_SUB instruction, 7-261
LDS_UMAX instruction, 7-261
LDS_UMIN instruction, 7-262
LDS_WRITE_VEC instruction, 7-82
LDS_XOR instruction, 7-262
LEN instruction, 7-176
line-aa texture coordinate, 5-20
link restrictions
  vertex and pixel shader, 4-2
LIT instruction, 7-177
LITERAL register type, 5-15
LN instruction, 7-178
LOAD instruction, 7-83
LOAD_FPTR instruction, 7-85
local data share (LDS), 1-2, 1-3
local memory, 1-2
LOD instruction, 7-86
LOG instruction, 7-179
LOG_VEC instruction, 7-180
LOGP instruction, 7-181
loop, 7-3
LOOP instruction, 7-28
loop relative address, 4-1, 5-15, 5-18, 5-24, 5-28
LOOP-ENDLOOP nesting, 4-2
LRP instruction, 7-182
LT instruction, 7-183
M
macro definition, 5-13, 5-17, 7-335
  IL_REG_TYPES, 7-335
  input registers, 5-13
  output registers, 5-17
  processor language, 7-335
MACRODEF instruction, 7-336
MACROEND instruction, 7-337
MAD instruction, 7-11, 7-184
major_version, 2-2
mask, 5-11
  coverage, 5-16
  input coverage, 5-13
  write, 3-3
MAX instruction, 7-185
MCALL instruction, 7-338
MEMEXPORT instruction, 7-87
MEMIMPORT instruction, 7-88
memory
  barriers, 5-28
  read/write, 1-3
  UAV, 7-7, 7-11
memory access
  owner-computes, 7-6
  read/write, 7-7
memory access model
  IL, 1-2
  general read write, 1-2
  owner-computes, 1-2
memory allocation, 7-7
  owner-compute, 7-7
  random access, 7-7
MIN instruction, 7-186
minor_version, 2-2
MMUL instruction, 7-187
MOD instruction, 7-188
modifier
  abs, 2-7
  instruction, 2-4, 3-2
  register relative, 2-3
  relationship with types, 2-10
modifier_present
  definition, 2-3, 2-6
  field, 2-7, 7-1
  IL_Dst, 2-3
  IL_Src, 2-6
modify destination, 3-2
MOV instruction, 7-189
MUL instruction, 7-189
multi-media instruction notes, 7-10
multipass, 2-2
  shader, 3-2, 4-2
  vertex shader, 4-2
N
NE instruction, 7-190
negate modifier, 2-7
negate_w_a
  definition, 2-9
  IL_Src_Mod, 2-9
negate_x_r
  definition, 2-8
  IL_Src_Mod, 2-8
negate_y_g
  definition, 2-8
  IL_Src_Mod, 2-8
negate_z_b
  definition, 2-9
  IL_Src_Mod, 2-9
NOISE instruction, 7-191
non-HOS rendering, 5-8, 5-12, 5-20, 5-21
NOP instruction, 7-228
notes
  arithmetic instruction, 7-5
  bit operation, 7-6
  comparison instruction, 7-1
  conversion instruction, 7-5
  double precision instruction, 7-5
  Evergreen memory controls, 7-10
  flow control instruction, 7-2
  GDS memory operations, 7-8
  input/output instruction, 7-3
  LDS memory operations, 7-6
  multi-media instruction, 7-10
  shift instruction, 7-6
  simple 64-bit integer instruction, 7-6
  UAV memory operations, 7-9
NRM instruction, 7-192
O
o prefix, 3-2
object declaration, 2-1
OBJECT_INDEX register type, 5-16
OCP_ID register type, 5-16
OMASK register type, 5-16
opcode
  IL, 1-1
  token, 2-3
  order, 7-1
open design for IL, 1-1
operand destination, 2-4
OUTPUT, 5-4
  register type, 5-17
Output Merger, 5-14
OUTPUT_ARG register type, 5-17
OUTPUTCP register type, 5-17
owner-compute, 7-7
  memory access, 7-6
  memory access model, 1-2
  memory allocation, 7-7
P
packet
  instruction, 2-1
  ordering, 2-1
PATCH_CONST register type, 5-17
PCOLOR
  register, 4-1
  register type, 5-18
PERSIST register type, 5-18
persistent memory
  addressing, 5-18
persistent registers, 1-3
perspective correct interpolation, 5-11, 5-19
PINPUT
  register, 4-1, 4-2, 5-11, 5-15, 5-19, 5-22, 5-24
  register type, 5-18
PIREDUCE instruction, 7-193
pixel shader, 3-2, 4-1, 4-2, 4-3, 5-13, 7-8, 7-11
  export, 5-8, 5-9, 5-18
  import, 5-10, 5-14, 5-16, 5-19, 5-21, 5-24, 5-29
  input, 5-23
  primitive type, 5-20
  input data, 5-18
  link restrictions, 4-2
  output fog factor, 5-21
  real-time shader, 4-3
POINT shadow filter, A-1
point size, 5-23
point-aa texture coordinate, 5-20
POS
  register, 4-1, 5-28
  register type, 5-19
POW instruction, 7-194
POWER instruction, 7-195
PRECISE tag, 7-11, 7-12
prefix
  instruction, 7-11
  o, 3-2
  v, 3-2
pri_modifier_present
  definition, 2-3
  IL_Opcode, 2-3
PRICOLOR
  register, 5-11, 5-18, 5-28
  register type, 5-19
PRIMCOORD register type, 5-20
primitive facing, 5-10
PRIMITIVE_INDEX register type, 5-20
pri-modifier_present bit, 7-3
PRIMTYPE register type, 5-20
programming model
  Evergreen, 7-10
  GDS, 7-8
  LDS, 7-7
pseudo code, 7-1
PSOUTFOG register type, 5-21
Q
QUAD_INDEX register type, 5-21
R
random access memory allocation, 7-7
rasterizer depth, 5-9
raw UAV, 7-9
RCP instruction, 7-196
read/write memory access, 7-7
realtime, 2-2
  shader, 3-2
real-time pixel shader, 4-2, 4-3
  exceptions, 4-2, 4-3
REFACTORING_ALLOWED, 7-12
REFLECT instruction, 7-197
register
  common, 3-2
  declare resources, 2-1
  DEPTH, 4-1
  DX10 compute shader (CS) mapping, 5-3
  DX11 compute shader (CS) mapping, 5-3
  DX9 (2.0) mapping to IL, 5-3
  DX9 (3.0) mapping to IL, 5-2
  FOG, 5-11, 5-18, 5-28
  INTERP, 4-1, 5-11, 5-18, 5-28
  macro input, 5-13
  macro output, 5-17
  mapping DX10 to IL, 5-1
  overview IL, 5-5
  PCOLOR, 4-1
  PINPUT, 4-1, 4-2, 5-11, 5-15, 5-19, 5-22, 5-24
  POS, 4-1, 5-28
  PRICOLOR, 5-11, 5-18, 5-28
  relative indexing, 2-6
  relative modifier, 2-3
  restrictions, 5-4, C-1
  scalar, 5-16, 5-23
  SECCOLOR, 5-11, 5-18, 5-28
  SM30, C-5
  SM40, C-5
  SPRITE, 5-28
  TEXCOORD, 4-1, 5-11, 5-18, 5-28
  valid types, 5-1
  VERTEX, 4-1
  VOUTPUT, 4-1, 4-2, 5-11, 5-14, 5-19, 5-22, 5-23, 5-24
register type
  vWINCOORD, 5-29
register_num
  definition, 2-3, 2-6
  IL_Dst, 2-3
  IL_Src, 2-6
register_type
  definition, 2-3, 2-6
  IL_Dst, 2-3
  IL_Src, 2-6
relative_address
  definition, 2-4, 2-6
  IL_Dst, 2-4
  IL_Src, 2-6
RenderTarget, 5-14
RESINFO instruction, 7-89
resource-index argument, 7-3
restriction
  IL, 1-1
  register, 5-4
RET instruction, 7-29
RET_DYN instruction, 7-29
RET_LOGICALNZ instruction, 7-30
RET_LOGICALZ instruction, 7-30
RND instruction, 7-198
ROUND_NEAREST instruction, 7-198
ROUND_NEG_INF instruction, 7-199
ROUND_PLUS_INF instruction, 7-199
ROUND_ZERO instruction, 7-200
RSQ instruction, 7-201
rsq instruction, 1-1
rsq_vec, 1-2
RSQ_VEC instruction, 7-202
S
SAD instruction, 7-223
SAD_HI instruction, 7-224
SAD4 instruction, 7-224
SAMPLE instruction, 7-90
SAMPLE_B instruction, 7-92
SAMPLE_C instruction, 7-93
SAMPLE_C_B instruction, 7-94
SAMPLE_C_G instruction, 7-95
SAMPLE_C_L instruction, 7-96
SAMPLE_C_LZ instruction, 7-97
SAMPLE_G instruction, 7-98
SAMPLE_L instruction, 7-99
SAMPLEINFO instruction, 7-100
SampleMask Rasterizer state, 5-14
SAMPLEPOS instruction, 7-101
sampler
  declare resources, 2-1
sampler-index argument, 7-3
scalar register, 5-16, 5-23
SCATTER instruction, 7-102
sec_modifier_present
  definition, 2-3
  IL_Opcode, 2-3
SECCOLOR
  register, 5-11, 5-18, 5-28
  register type, 5-21
secondary color data
  interpolated, 5-21
SET instruction, 7-202
SGN instruction, 7-203
shader
  compute, 5-7, 5-25, 5-26, 5-27, 7-7, 7-8, 7-11
  domain, 5-9, 5-14, 5-17
  geometry, 5-22
  hull, 5-14, 5-22
  IL, 1-1
    types used in any, 2-1
  model guidelines, C-6
    SM40, C-1
  multipass, 3-2, 4-2
    vertex, 4-2
  pixel, 3-2, 4-1, 4-2, 4-3, 5-13, 7-8, 7-11
    export, 5-8, 5-9, 5-18
    fog factor, 5-21
    import, 5-10, 5-14, 5-16, 5-19, 5-21, 5-24, 5-29
    input, 5-23
    input data, 5-18
    primitive type, 5-20
  real-time, 3-2, 4-3
    pixel, 4-2
  requirements, 4-1
  type, 2-1
  vertex, 3-2, 4-1, 4-2, 4-3
    data, 5-28
    export, 5-10, 5-14, 5-19, 5-21, 5-23, 5-24
    import, 5-8, 5-12, 5-16
shader model
  restrictions, C-1
shader operations
  description, 4-1
SHADER_INSTANCE_ID register type, 5-22
shader_type, 2-2
  field, 4-3
shading
  flat, 5-19, 5-21
  smooth, 5-19, 5-22
shadow filter
  BEST, A-1
  POINT, A-1
shadow texture fetch, A-1
shared memory
  LDS, 1-2
shared register, 1-3
SHARED_TEMP register type, 5-22
Sharing-Mode, 7-7
shift instruction, 7-6
  notes, 7-6
shift_scale
  definition, 2-5
  IL_Dst_Mod, 2-5
sign
  definition, 2-9
  IL_Src_Mod, 2-9
simple 64-bit integer instruction notes, 7-6
SIN instruction, 7-204
SIN_VEC instruction, 7-205
SINCOS instruction, 7-205
SM30
  instructions, C-5
  registers, C-5
SM40
  instructions, C-2
  preferred shader model, C-1
  registers, C-5
source
  index, 2-6
  information order, 7-1
source modifier
  _abs, 3-4
  _bias, 3-4
  _bx2, 3-4
  _divcomp, 3-4
  _invert, 3-4
  _neg, 3-4
  _sign, 3-4
  _x2, 3-4
  swizzle, 3-3
source modifier token
  conjunction with IL_Rel_Addr token, 2-7
  precedes the IL_Rel_Addr token, 2-7
source operand
  IL_Src, 2-6
source operand modification
  IL_Src_Mod, 2-8
source token, 2-5
  example, 2-10
  first component, 2-5
  fourth component, 2-5
  second component, 2-5
  third component, 2-5
specifier
  binary control, 3-2
  control, 3-2
SPRITE
  register, 5-28
  register type, 5-23
sprite texture coordinate, 5-23
SpriteCoord, 5-20
SPRITECOORD register type, 5-23
SQRT instruction, 7-206
SQRT_VEC instruction, 7-206
SR globally shared register, 1-3
SRV_RAW_LOAD instruction, 7-263
SRV_STRUCT_LOAD instruction, 7-264
statement
  dcl_literal, 2-6
  used in any shader, 2-1
STENCIL register type, 5-23
stencil value, 5-24
stream
  IL, 2-1
  binary, 3-1
structured UAV, 7-9
SUB instruction, 7-207
subroutine, 4-1
SWITCH instruction, 7-30
swizzle, 2-7, 5-11
  definition, 3-3
  source modifier, 3-3
swizzle_w_a
  definition, 2-9
  IL_Src_Mod, 2-9
swizzle_x_r
  definition, 2-8
  IL_Src_Mod, 2-8
swizzle_y_g
  definition, 2-8
  IL_Src_Mod, 2-8
swizzle_z_b
  definition, 2-9
  IL_Src_Mod, 2-9
synchronization, 1-3
syntax
  IL Text, 3-1
T
tag
  PRECISE, 7-11, 7-12
TAN instruction, 7-208
TEMP register type, 5-15, 5-24
tessellation engine, 5-13, 5-20, 5-21
tex_coord_type, 7-4
TEXCOORD
  register, 5-11, 5-18, 5-28
  register type, 5-24
TEXCOORD register, 4-1
TEXLD instruction, 7-103, A-1
TEXLDB instruction, 7-106, A-1
TEXLDD instruction, 7-110, A-1
TEXLDMS instruction, 7-113
texture cache, 1-3
texture coordinate
  data, 5-24
  line-aa, 5-20
  point-aa, 5-20
  sprite, 5-23
TEXWEIGHT instruction, 7-115
THIS register type, 5-25
THREAD_GROUP_ID register type, 5-25
THREAD_GROUP_ID_FLATTENED register type, 5-26
THREAD_ID_IN_GROUP register type, 5-26
THREAD_ID_IN_GROUP_FLATTENED register type, 5-27
timer
  cycle value, 5-27
TIMER register type, 5-27
token
  32-bit, 2-1
  description
    IL, 2-2
  destination, 2-3
    modifier, 2-4
  generic, 2-2
  IL_Dst, 7-1
  IL_Lang, 2-2
  IL_Opcode, 2-1, 2-3, 2-5, 7-1, 7-14, 7-17, 7-57, 7-61, 7-63, 7-86, 7-104, 7-105, 7-108, 7-111, 7-112, 7-113, 7-114, 7-115
  IL_Src, 2-7, 7-1
  IL_Version, 2-2
  Language, 2-2, 4-1
  opcode, 2-3
  order, 7-1
    destination information, 7-1
    opcode, 7-1
    source information, 7-1
  source, 2-5
    example . . . . . . . .
. . . . . . . . . . . . . . . 2-10 modifier. . . . . . . . . . . . . . . . . . . . . . . . . 2-7 Version . . . . . . . . . . . . . . . . . . . . . . . 2-2, 4-1 translator IL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1 TRANSPOSE instruction . . . . . . . . . . . . . 7-209 typed UAV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9 types relationship with modifiers . . . . . . . . . . . 2-10 used in any shader . . . . . . . . . . . . . . . . . 2-1 U U4LERP instruction . . . . . . . . . . . . . . . . . U64GE instruction. . . . . . . . . . . . . . . . . . . U64LT instruction . . . . . . . . . . . . . . . . . . . U64MAX instruction . . . . . . . . . . . . . . . . . 7-225 7-132 7-132 7-131
Copyright 2011 by Advanced Micro Devices, Inc. All rights reserved.
U64MIN instruction, 7-131
U64SHR instruction, 7-133
UAV
  arena, 7-10
  atomic operations, 7-9
  dimension type, 7-9, 7-10
  raw, 7-9
  structured, 7-9
  typed, 7-9
UAV memory, 7-7, 7-11
  operations notes, 7-9
UAV_ADD instruction, 7-265
UAV_AND instruction, 7-266
UAV_ARENA_LOAD instruction, 7-267
UAV_ARENA_STORE instruction, 7-268
UAV_CMP instruction, 7-269
UAV_LOAD instruction, 7-270
UAV_MAX instruction, 7-271
UAV_MIN instruction, 7-272
UAV_OR instruction, 7-273
UAV_RAW_LOAD instruction, 7-274
UAV_RAW_STORE instruction, 7-275
UAV_READ_ADD instruction, 7-276
UAV_READ_AND instruction, 7-277
UAV_READ_CMP_XCHG instruction, 7-278
UAV_READ_MAX instruction, 7-279
UAV_READ_MIN instruction, 7-280
UAV_READ_OR instruction, 7-281
UAV_READ_RSUB instruction, 7-282
UAV_READ_SUB instruction, 7-283
UAV_READ_UDEC instruction, 7-284
UAV_READ_UINC instruction, 7-285
UAV_READ_UMAX instruction, 7-286
UAV_READ_UMIN instruction, 7-287
UAV_READ_XCHG instruction, 7-288
UAV_READ_XOR instruction, 7-289
UAV_RSUB instruction, 7-290
UAV_STORE instruction, 7-291
UAV_STRUCT_LOAD instruction, 7-292
UAV_STRUCT_STORE instruction, 7-293
UAV_SUB instruction, 7-294
UAV_UDEC instruction, 7-295
UAV_UINC instruction, 7-296
UAV_UMAX instruction, 7-297
UAV_UMIN instruction, 7-298
UAV_XOR instruction, 7-299
UBIT_EXTRACT instruction, 7-141
UBIT_INSERT instruction, 7-142
UBIT_REVERSE instruction, 7-142
UDIV instruction, 7-132
UGE instruction, 7-133
ULT instruction, 7-133
UMAD instruction, 7-134
UMAD24 instruction, 7-134
UMAX instruction, 7-135
UMIN instruction, 7-135
UMOD instruction, 7-136
UMUL instruction, 7-136
UMUL_HIGH instruction, 7-137
UMUL24 instruction, 7-137
UMUL24_HIGH instruction, 7-138
Unpack0 instruction, 7-225
Unpack1 instruction, 7-226
Unpack2 instruction, 7-226
Unpack3 instruction, 7-227
unsigned integer comparison instruction, 7-1
USHR instruction, 7-138
UTOF instruction, 7-147
V
v prefix, 3-2
vector component written, 3-3
Version token, 2-2, 4-1
VERTEX, 5-4
  register, 4-1
  register type, 5-28
vertex shader, 3-2, 4-1, 4-2, 4-3
  data, 5-28
  export, 5-10, 5-14, 5-19, 5-21, 5-23, 5-24
  import, 5-8, 5-12, 5-16
  link restrictions, 4-2
virtual function/interface, 5-25
VOUTPUT
  register, 4-1, 4-2, 5-11, 5-14, 5-19, 5-22, 5-24
  register type, 5-28
VOUTPUT register, 5-23
VPRIM, 5-4
  register type, 5-29
vTid, 1-3
vWINCOORD register type, 5-29
W
wavefront, 1-2, 7-7
  access mode, 1-2
  sharing, 1-3
  switching, 5-28
WEIGHTED_QUAD filter, A-1
WHILELOOP instruction, 7-31
work-group, 1-2, 5-27, 7-7, 7-10
  addresses, 7-7
  ID, 5-25
    flattened, 5-26
work-item, 1-2, 1-3, 5-28, 7-6, 7-7, 7-9, 7-10
  absolute ID, 5-7
  flattened absolute ID, 5-7
  ID flattened, 5-27
write mask, 3-3
  component-wise, 3-3
written vector component, 3-3
X
x2
  definition, 2-9
  IL_Src_Mod, 2-9
Z
zero force component, 3-3