5.306. Transpose
5.306.1. cnnlCreateTransposeDescriptor
- cnnlStatus_t cnnlCreateTransposeDescriptor(cnnlTransposeDescriptor_t *desc)
Creates a descriptor pointed to by desc for a transpose operation, and allocates memory for holding the information about the transpose operation. The information is defined in cnnlTransposeDescriptor_t. For more information about descriptors, see "Cambricon CNNL User Guide".
- Parameters
[out] desc: Output. A host pointer to the transpose descriptor that holds information about the transpose operation.
- Return
- API Dependency
After calling this function, you can call the cnnlSetTransposeDescriptor function to initialize and set information to the transpose descriptor.
You need to call the cnnlDestroyTransposeDescriptor function to destroy the descriptor.
- Note
None.
- Requirements
None.
- Example
None.
5.306.2. cnnlDestroyTransposeDescriptor
- cnnlStatus_t cnnlDestroyTransposeDescriptor(cnnlTransposeDescriptor_t desc)
Destroys a transpose descriptor desc that was previously created with the cnnlCreateTransposeDescriptor function. The transpose descriptor is defined in cnnlTransposeDescriptor_t and holds the information about the transpose operation.
- Parameters
[in] desc: Input. The transpose descriptor to be destroyed. For detailed information, see cnnlTransposeDescriptor_t.
- Return
- Note
None.
- Requirements
None.
- Example
None.
5.306.3. cnnlGetTransposeWorkspaceSize
- cnnlStatus_t cnnlGetTransposeWorkspaceSize(cnnlHandle_t handle, const cnnlTensorDescriptor_t x_desc, const cnnlTransposeDescriptor_t desc, size_t *size)
Returns in size the size of the MLU memory that is used as an extra workspace to optimize the transpose operation. The size of the extra workspace is based on the given information of the transpose operation, including the input tensor descriptor x_desc and the transpose descriptor desc. For more information about the workspace, see "Cambricon CNNL User Guide".
- Parameters
[in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.
[in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.
[in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[out] size: Output. A host pointer to the returned size of the extra workspace in bytes that is used in the transpose operation.
- Return
- API Dependency
This function must be called after the cnnlCreateTensorDescriptor and cnnlSetTensorDescriptor functions, which create and set the tensor descriptor x_desc.
The allocated extra workspace should be passed to the cnnlTranspose_v2 function to perform the transpose operation.
- Note
None.
- Requirements
None.
- Example
None.
5.306.4. cnnlSetTransposeDescriptor
- cnnlStatus_t cnnlSetTransposeDescriptor(cnnlTransposeDescriptor_t desc, const int dims, const int permute[])
Initializes the transpose descriptor desc that was previously created with the cnnlCreateTransposeDescriptor function, and sets the information about the transpose operation in the transpose descriptor desc. The information includes the number of permute dimensions dims and the permute rules permute.
- Parameters
[inout] desc: Input/output. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[in] dims: Input. The number of dimensions in the permute tensor of the transpose operation. Currently, the value of this parameter should be less than or equal to 8.
[in] permute: Input. The order of transpose. Currently, each value of permute should be in the range [0, dims - 1], and no value may appear more than once.
- Return
- Note
None.
- Requirements
None.
- Example
None.
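The constraints on dims and permute listed in the parameters above can be checked on the host before the descriptor is set. The following is a minimal sketch under stated assumptions: is_valid_permute and MAX_TRANSPOSE_DIMS are hypothetical names that are not part of Cambricon CNNL, and treating dims <= 0 as invalid is an assumption, since this section only states the upper limit.

```c
#include <stdbool.h>

#define MAX_TRANSPOSE_DIMS 8 /* upper limit for dims stated in this section */

/* Hypothetical helper, not part of CNNL.
 * Returns true when dims is within the documented limit and permute holds
 * each value in [0, dims - 1] exactly once. */
bool is_valid_permute(int dims, const int permute[]) {
  bool seen[MAX_TRANSPOSE_DIMS] = {false};

  /* Assumption: a non-positive dims is treated as invalid. */
  if (dims <= 0 || dims > MAX_TRANSPOSE_DIMS) {
    return false;
  }
  for (int i = 0; i < dims; ++i) {
    int p = permute[i];
    if (p < 0 || p >= dims) {
      return false; /* value out of range */
    }
    if (seen[p]) {
      return false; /* the same dimension appears twice */
    }
    seen[p] = true;
  }
  return true;
}
```

With dims set to 3, the order (1, 0, 2) passes this check, while (0, 0, 2) is rejected as a duplicate and (0, 3, 1) is rejected as out of range.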
5.306.5. cnnlTranspose
- cnnlStatus_t cnnlTranspose(cnnlHandle_t handle, const cnnlTransposeDescriptor_t desc, const cnnlTensorDescriptor_t x_desc, const void *x, const cnnlTensorDescriptor_t y_desc, void *y)
Reorders the dimensions of the input tensor according to the value of permute. For better performance on large-scale transposes with more than 4 dimensions, call the cnnlTranspose_v2 function.
- Parameters
[in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.
[in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.
[in] x: Input. Pointer to the MLU memory that stores the input tensor.
[in] y_desc: Input. The descriptor of the output tensor. For detailed information, see cnnlTensorDescriptor_t.
[out] y: Output. Pointer to the MLU memory that stores the output tensor.
- Deprecated
cnnlTranspose is deprecated and will be removed in a future release. It is recommended to use cnnlTranspose_v2 instead.
- Return
- Data Type
This function supports the following data types for input tensor x and output tensor y. Note that the data type of the input tensor and the output tensor must be the same.
input tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.
output tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.
- Data Layout
The input tensor should have at most 8 dimensions.
- Scale Limitation
The number of dimensions of x and y and the length of permute must be the same.
The number of dimensions of x, y and permute should be less than or equal to CNNL_DIM_MAX.
The i-th element of permute is in the range [0, n-1], where n is the rank of x.
The i-th dimension of y corresponds to the permute[i]-th dimension of x.
During computation, the number of memcpy copy operations should be less than 65536.
- API Dependency
Before calling this function to implement transpose, you need to prepare all the parameters passed to this function. See each parameter description for details.
- Note
None.
- Example
The example of the transpose operation is as follows:
input array by 3 * 2 -->
  input: [[1, 4],
          [2, 5],
          [3, 6]]
param:
  dims: 2, permute: (1, 0)
output array by 2 * 3 -->
  output: [[1, 2, 3],
           [4, 5, 6]]
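The mapping in the example above, where the i-th dimension of y corresponds to the permute[i]-th dimension of x, can be reproduced on the host with a plain reference loop. This is only an illustrative sketch, not how Cambricon CNNL computes the result on the MLU: ref_transpose_int and REF_MAX_DIMS are hypothetical names, the data is assumed to be densely packed in row-major order, and permute is assumed to be valid.

```c
#include <stddef.h>

#define REF_MAX_DIMS 8 /* matches the 8-dimension limit stated above */

/* Hypothetical host-side reference, not a CNNL function.
 * x has shape x_dims[0..dims-1] stored in row-major order. y receives the
 * transposed data, with shape y_dims[i] = x_dims[permute[i]].
 * Assumes 1 <= dims <= REF_MAX_DIMS and a valid permute. */
void ref_transpose_int(int dims, const int x_dims[], const int permute[],
                       const int *x, int *y) {
  size_t x_stride[REF_MAX_DIMS];
  int y_dims[REF_MAX_DIMS];
  int idx[REF_MAX_DIMS] = {0}; /* current multi-index into y */
  size_t total = 1;

  /* Row-major strides of x: the innermost dimension has stride 1. */
  for (int i = dims - 1; i >= 0; --i) {
    x_stride[i] = total;
    total *= (size_t)x_dims[i];
  }
  for (int i = 0; i < dims; ++i) {
    y_dims[i] = x_dims[permute[i]];
  }

  /* Visit the elements of y in row-major order. */
  for (size_t out = 0; out < total; ++out) {
    size_t in = 0;
    for (int i = 0; i < dims; ++i) {
      /* Index i of y addresses dimension permute[i] of x. */
      in += (size_t)idx[i] * x_stride[permute[i]];
    }
    y[out] = x[in];

    /* Advance idx like an odometer, innermost dimension first. */
    for (int i = dims - 1; i >= 0; --i) {
      if (++idx[i] < y_dims[i]) {
        break;
      }
      idx[i] = 0;
    }
  }
}
```

Called with x_dims = {3, 2}, permute = {1, 0} and x = {1, 4, 2, 5, 3, 6}, the loop fills y with {1, 2, 3, 4, 5, 6}, which is the 2 * 3 output shown in the example.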
- Reference
5.306.6. cnnlTranspose_v2
- cnnlStatus_t cnnlTranspose_v2(cnnlHandle_t handle, const cnnlTransposeDescriptor_t desc, const cnnlTensorDescriptor_t x_desc, const void *x, const cnnlTensorDescriptor_t y_desc, void *y, void *workspace, size_t workspace_size)
Reorders the dimensions of the input tensor according to the value of permute. Compared with cnnlTranspose, cnnlTranspose_v2 provides better performance for transposes with more than 4 dimensions by using extra workspace.
This function needs extra MLU memory as the workspace. You can get the size of the workspace workspace_size with the cnnlGetTransposeWorkspaceSize function.
- Parameters
[in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.
[in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.
[in] x: Input. Pointer to the MLU memory that stores the input tensor.
[in] y_desc: Input. The descriptor of the output tensor. For detailed information, see cnnlTensorDescriptor_t.
[out] y: Output. Pointer to the MLU memory that stores the output tensor.
[in] workspace: Input. Pointer to the MLU memory that is used as an extra workspace for the transpose operation. For more information about workspace, see "Cambricon CNNL User Guide".
[in] workspace_size: Input. The size of the extra workspace in bytes that needs to be used in the transpose operation. You can get the size of the workspace with the cnnlGetTransposeWorkspaceSize function.
- Return
- Scale Limitation
The number of dimensions of x and y and the length of permute must be the same.
The number of dimensions of x, y and permute should be less than or equal to CNNL_DIM_MAX.
The i-th element of permute is in the range [0, n-1], where n is the rank of x.
The i-th dimension of y corresponds to the permute[i]-th dimension of x.
During computation, the number of memcpy copy operations should be less than 65536.
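Because the i-th dimension of y corresponds to the permute[i]-th dimension of x, the shape that the output tensor descriptor should describe follows directly from the input shape. The helper below is a small sketch of that rule for the higher-rank cases this function targets. transposed_shape is a hypothetical name and is not part of Cambricon CNNL, and permute is assumed to be valid.

```c
#include <stddef.h>

/* Hypothetical helper, not part of CNNL.
 * Fills y_dims with the output shape for the given input shape and permute,
 * and returns the total number of elements, which is the same for x and y. */
size_t transposed_shape(int dims, const int x_dims[], const int permute[],
                        int y_dims[]) {
  size_t count = 1;
  for (int i = 0; i < dims; ++i) {
    y_dims[i] = x_dims[permute[i]]; /* y dimension i comes from x dimension permute[i] */
    count *= (size_t)y_dims[i];
  }
  return count;
}
```

For a 5-dimensional input of shape (2, 3, 4, 5, 6) and permute (4, 0, 3, 1, 2), this yields an output shape of (6, 2, 5, 3, 4) and an element count of 720.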
- Formula
See "Transpose Operator" section in "Cambricon CNNL User Guide" for details.
- Data Type
This function supports the following data types for input tensor x and output tensor y. Note that the data type of the input tensor and the output tensor must be the same.
input tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.
output tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.
- Data Layout
The input tensor should have at most 8 dimensions.
- API Dependency
Before calling this function to implement transpose, you need to prepare all the parameters passed to this function. See each parameter description for details.
- Note
None.
- Requirements
None.
- Example
The example of the transpose operation is as follows:
input array by 3 * 2 -->
  input: [[1, 4],
          [2, 5],
          [3, 6]]
param:
  dims: 2, permute: (1, 0)
output array by 2 * 3 -->
  output: [[1, 2, 3],
           [4, 5, 6]]
- Reference