5.306. Transpose

5.306.1. cnnlCreateTransposeDescriptor

cnnlStatus_t cnnlCreateTransposeDescriptor(cnnlTransposeDescriptor_t *desc)

Creates a descriptor pointed to by desc for a transpose operation, and allocates memory for holding the information about the transpose operation.

The information is defined in cnnlTransposeDescriptor_t. For more information about descriptors, see "Cambricon CNNL User Guide".

Parameters

[out] desc: Output. A host pointer to the transpose descriptor that holds information about the transpose operation.

Return

CNNL_STATUS_SUCCESS, CNNL_STATUS_ALLOC_FAILED

API Dependency

After calling this function, you can call the cnnlSetTransposeDescriptor function to initialize the transpose descriptor and set its information.
When the descriptor is no longer needed, call the cnnlDestroyTransposeDescriptor function to destroy it.

Note

None.

Requirements

None.

Example

None.

5.306.2. cnnlDestroyTransposeDescriptor

cnnlStatus_t cnnlDestroyTransposeDescriptor(cnnlTransposeDescriptor_t desc)

Destroys a transpose descriptor desc that was previously created with the cnnlCreateTransposeDescriptor function.

The transpose descriptor is defined in cnnlTransposeDescriptor_t and holds the information about the transpose operation.

Parameters

[in] desc: Input. The transpose descriptor to be destroyed. For detailed information, see cnnlTransposeDescriptor_t.

Return

CNNL_STATUS_SUCCESS, CNNL_STATUS_BAD_PARAM

Note

None.

Requirements

None.

Example

None.

5.306.3. cnnlGetTransposeWorkspaceSize

cnnlStatus_t cnnlGetTransposeWorkspaceSize(cnnlHandle_t handle, const cnnlTensorDescriptor_t x_desc, const cnnlTransposeDescriptor_t desc, size_t *size)

Returns in size the size of the MLU memory that is used as an extra workspace to optimize the transpose operation.

The size of the extra workspace is based on the given information about the transpose operation, including the input tensor descriptor x_desc and the transpose descriptor desc. For more information about the workspace, see "Cambricon CNNL User Guide".

Parameters

[in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.
[in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.
[in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[out] size: Output. A host pointer to the returned size of the extra workspace in bytes that is used in the transpose operation.

Return

CNNL_STATUS_SUCCESS, CNNL_STATUS_BAD_PARAM

API Dependency

This function must be called after the cnnlCreateTensorDescriptor and cnnlSetTensorDescriptor functions are called to create and set the tensor descriptor x_desc.
The allocated extra workspace should be passed to the cnnlTranspose_v2 function to perform the transpose operation.

Note

None.

Requirements

None.

Example

None.

5.306.4. cnnlSetTransposeDescriptor

cnnlStatus_t cnnlSetTransposeDescriptor(cnnlTransposeDescriptor_t desc, const int dims, const int permute[])

Initializes the transpose descriptor desc that was previously created with the cnnlCreateTransposeDescriptor function, and sets the information about the transpose operation to the transpose descriptor desc. The information includes the number of dimensions dims and the permutation order permute.

Parameters

[inout] desc: Input/output. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[in] dims: Input. The number of dimensions of the tensor to be transposed, which is also the number of elements in permute. Currently, the value of this parameter must be less than or equal to 8.
[in] permute: Input. The order in which the dimensions are transposed. Currently, each element of permute must be in the range [0, dims - 1], and no two elements can have the same value.
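
The constraints on dims and permute can be checked on the host before calling cnnlSetTransposeDescriptor. The following is a minimal sketch in plain C; is_valid_permute is a hypothetical helper, not a CNNL function, and the lower bound dims >= 1 is an assumption because only the upper bound of 8 is documented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Upper bound on dims documented for cnnlSetTransposeDescriptor. */
#define TRANSPOSE_MAX_DIMS 8

/* Hypothetical host-side helper, not part of CNNL: returns true if
 * permute[0..dims-1] satisfies the documented constraints, that is,
 * every element lies in [0, dims - 1] and no element repeats. */
bool is_valid_permute(int dims, const int permute[]) {
    bool seen[TRANSPOSE_MAX_DIMS] = {false};

    assert(permute != NULL); /* caller must pass an array */
    /* dims >= 1 is an assumption; only the upper bound is documented. */
    if (dims < 1 || dims > TRANSPOSE_MAX_DIMS) {
        return false;
    }
    for (int i = 0; i < dims; ++i) {
        int axis = permute[i];
        if (axis < 0 || axis >= dims || seen[axis]) {
            return false; /* out of range or repeated axis */
        }
        seen[axis] = true;
    }
    return true;
}
```

For example, permute (1, 0) with dims 2 passes the check, whereas (0, 0) and (0, 2) are rejected.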

Return

CNNL_STATUS_SUCCESS, CNNL_STATUS_BAD_PARAM

Note

None.

Requirements

None.

Example

None.

5.306.5. cnnlTranspose

cnnlStatus_t cnnlTranspose(cnnlHandle_t handle, const cnnlTransposeDescriptor_t desc, const cnnlTensorDescriptor_t x_desc, const void *x, const cnnlTensorDescriptor_t y_desc, void *y)

Reorders the dimensions of the input tensor according to the value of permute. For better performance when transposing large tensors with more than four dimensions, call the cnnlTranspose_v2 function.

Parameters

[in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.
[in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.
[in] x: Input. Pointer to the MLU memory that stores the input tensor.
[in] y_desc: Input. The descriptor of the output tensor. For detailed information, see cnnlTensorDescriptor_t.
[out] y: Output. Pointer to the MLU memory that stores the output tensor.

Deprecated

cnnlTranspose is deprecated and will be removed in a future release. It is recommended to use cnnlTranspose_v2 instead.

Return

CNNL_STATUS_SUCCESS, CNNL_STATUS_BAD_PARAM

Data Type

This function supports the following data types for input tensor x and output tensor y. Note that the input tensor and output tensor must have the same data type.
- input tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.
- output tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.

Data Layout

The input tensor must have at most 8 dimensions.

Scale Limitation

x and y must have the same rank, and permute must contain one element for each dimension of x.
The rank of x and y, and the number of elements in permute, must be less than or equal to CNNL_DIM_MAX.
Each element permute[i] must be in the range [0, n - 1], where n is the rank of x.
The i-th dimension of y corresponds to the permute[i]-th dimension of x.
During computation, the number of memcpy operations must be less than 65536.
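
The mapping rule above also determines the shape that y_desc should describe: y_dims[i] = x_dims[permute[i]]. The sketch below is a hypothetical host-side helper (transpose_output_dims is not a CNNL function). It assumes the documented limit of 8 dimensions and a rank of at least 1, and it does not model the CNNL_DIM_MAX or memcpy-count limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of CNNL: computes the dimensions that
 * y_desc should describe, using the rule y_dims[i] = x_dims[permute[i]].
 * Returns false (leaving y_dims unspecified) if rank is outside the
 * assumed range [1, 8] or permute is not a permutation of 0..rank-1. */
bool transpose_output_dims(int rank, const int x_dims[],
                           const int permute[], int y_dims[]) {
    unsigned used = 0; /* bit k is set once axis k has appeared */

    assert(x_dims != NULL && permute != NULL && y_dims != NULL);
    if (rank < 1 || rank > 8) {
        return false;
    }
    for (int i = 0; i < rank; ++i) {
        int axis = permute[i];
        if (axis < 0 || axis >= rank || (used & (1u << axis)) != 0) {
            return false; /* out of range or repeated axis */
        }
        used |= 1u << axis;
        y_dims[i] = x_dims[axis]; /* dimension i of y is dimension permute[i] of x */
    }
    return true;
}
```

For a 3 * 2 input with permute (1, 0), the helper yields a 2 * 3 output shape; for a 2 * 3 * 4 input with permute (2, 0, 1), it yields 4 * 2 * 3.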

API Dependency

Before calling this function to perform the transpose operation, you need to prepare all the parameters passed to this function. See each parameter description for details.

Note

None.

Example

The example of the transpose operation is as follows:

   input array by 3 * 2 -->
       input: [[1, 4],
               [2, 5],
               [3, 6]]
   param:
     dims: 2, permute: (1, 0),

   output array by 2 * 3 --> output: [[1, 2, 3],
                                      [4, 5, 6]]
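
The example can be reproduced with a naive CPU reference. This sketch assumes a dense row-major layout and float data; reference_transpose is a hypothetical helper for checking results on the host and is not how CNNL implements the operation on the MLU.

```c
#include <assert.h>
#include <stddef.h>

#define REF_MAX_DIMS 8

/* Hypothetical CPU reference, not the CNNL implementation: transposes a
 * dense row-major tensor x with shape x_dims[0..rank-1] into y so that
 * dimension i of y is dimension permute[i] of x. */
void reference_transpose(int rank, const int x_dims[], const int permute[],
                         const float *x, float *y) {
    size_t x_strides[REF_MAX_DIMS];
    int y_dims[REF_MAX_DIMS];
    int idx[REF_MAX_DIMS] = {0}; /* current multi-index into y */
    size_t total = 1;

    assert(rank >= 1 && rank <= REF_MAX_DIMS);
    /* Row-major strides of x: the last dimension is contiguous. */
    for (int d = rank - 1; d >= 0; --d) {
        x_strides[d] = total;
        total *= (size_t)x_dims[d];
    }
    for (int i = 0; i < rank; ++i) {
        y_dims[i] = x_dims[permute[i]];
    }
    /* Walk y in row-major order; index i of y moves along dimension permute[i] of x. */
    for (size_t out = 0; out < total; ++out) {
        size_t src = 0;
        for (int i = 0; i < rank; ++i) {
            src += (size_t)idx[i] * x_strides[permute[i]];
        }
        y[out] = x[src];
        /* Advance the multi-index like an odometer. */
        for (int i = rank - 1; i >= 0; --i) {
            if (++idx[i] < y_dims[i]) {
                break;
            }
            idx[i] = 0;
        }
    }
}
```

Calling it with x = {1, 4, 2, 5, 3, 6}, x_dims = {3, 2} and permute = {1, 0} fills y with {1, 2, 3, 4, 5, 6}, which matches the output above.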

Reference

https://www.tensorflow.org/api_docs/python/tf/transpose

5.306.6. cnnlTranspose_v2

cnnlStatus_t cnnlTranspose_v2(cnnlHandle_t handle, const cnnlTransposeDescriptor_t desc, const cnnlTensorDescriptor_t x_desc, const void *x, const cnnlTensorDescriptor_t y_desc, void *y, void *workspace, size_t workspace_size)

Reorders the dimensions of the input tensor according to the value of permute. Compared with cnnlTranspose, cnnlTranspose_v2 provides better performance for transposes of more than four dimensions by using an extra workspace.

This function needs extra MLU memory as the workspace to work. You can get the workspace size workspace_size with the cnnlGetTransposeWorkspaceSize function.

Parameters

[in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.
[in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.
[in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.
[in] x: Input. Pointer to the MLU memory that stores the input tensor.
[in] y_desc: Input. The descriptor of the output tensor. For detailed information, see cnnlTensorDescriptor_t.
[out] y: Output. Pointer to the MLU memory that stores the output tensor.
[in] workspace: Input. Pointer to the MLU memory that is used as an extra workspace for the transpose operation. For more information about workspace, see "Cambricon CNNL User Guide".
[in] workspace_size: Input. The size of the extra workspace in bytes that needs to be used in the transpose operation. You can get the size of the workspace with the cnnlGetTransposeWorkspaceSize function.

Return

CNNL_STATUS_SUCCESS, CNNL_STATUS_BAD_PARAM

Scale Limitation

x and y must have the same rank, and permute must contain one element for each dimension of x.
The rank of x and y, and the number of elements in permute, must be less than or equal to CNNL_DIM_MAX.
Each element permute[i] must be in the range [0, n - 1], where n is the rank of x.
The i-th dimension of y corresponds to the permute[i]-th dimension of x.
During computation, the number of memcpy operations must be less than 65536.

Formula

See "Transpose Operator" section in "Cambricon CNNL User Guide" for details.

Data Type

This function supports the following data types for input tensor x and output tensor y. Note that the input tensor and output tensor must have the same data type.
- input tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.
- output tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.

Data Layout

The input tensor must have at most 8 dimensions.

API Dependency

Before calling this function to perform the transpose operation, you need to prepare all the parameters passed to this function. See each parameter description for details.

Note

None.

Requirements

None.

Example

The example of the transpose operation is as follows:

   input array by 3 * 2 -->
       input: [[1, 4],
               [2, 5],
               [3, 6]]
   param:
     dims: 2, permute: (1, 0),

   output array by 2 * 3 --> output: [[1, 2, 3],
                                      [4, 5, 6]]
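
Because permute is a permutation of 0, ..., n - 1, a transpose can be undone by a second transpose whose order is the inverse permutation inv, defined by inv[permute[i]] = i. The sketch below computes it on the host; inverse_permute is a hypothetical helper, not part of CNNL, and it assumes the documented limit of 8 dimensions and a rank of at least 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of CNNL: fills inv so that
 * inv[permute[i]] = i. Transposing with permute and then with inv
 * restores the original dimension order. Returns false if dims is outside
 * the assumed range [1, 8] or permute is not a permutation of 0..dims-1. */
bool inverse_permute(int dims, const int permute[], int inv[]) {
    assert(permute != NULL && inv != NULL);
    if (dims < 1 || dims > 8) {
        return false;
    }
    for (int i = 0; i < dims; ++i) {
        inv[i] = -1; /* -1 marks an axis that has not been seen yet */
    }
    for (int i = 0; i < dims; ++i) {
        int axis = permute[i];
        if (axis < 0 || axis >= dims || inv[axis] != -1) {
            return false; /* out of range or repeated axis */
        }
        inv[axis] = i;
    }
    return true;
}
```

In the example above, permute (1, 0) is its own inverse, so transposing the 2 * 3 output with (1, 0) restores the 3 * 2 input. For permute (2, 0, 1), the inverse is (1, 2, 0).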

Reference

https://www.tensorflow.org/api_docs/python/tf/transpose