5.306. Transpose

5.306.1. cnnlCreateTransposeDescriptor

cnnlStatus_t cnnlCreateTransposeDescriptor(cnnlTransposeDescriptor_t *desc)

Creates a descriptor pointed to by desc for a transpose operation, and allocates memory for holding the information about the transpose operation.

The information is defined in cnnlTransposeDescriptor_t. For more information about descriptors, see "Cambricon CNNL User Guide".

Parameters
  • [out] desc: Output. A host pointer to the transpose descriptor that holds information about the transpose operation.

Return

CNNL_STATUS_SUCCESS, CNNL_STATUS_ALLOC_FAILED

API Dependency

Note

  • None.

Requirements

  • None.

Example

  • None.

5.306.2. cnnlDestroyTransposeDescriptor

cnnlStatus_t cnnlDestroyTransposeDescriptor(cnnlTransposeDescriptor_t desc)

Destroys a transpose descriptor desc that was previously created with the cnnlCreateTransposeDescriptor function.

The transpose descriptor is defined in cnnlTransposeDescriptor_t and holds the information about the transpose operation.

Parameters
Return

Note

  • None.

Requirements

  • None.

Example

  • None.

5.306.3. cnnlGetTransposeWorkspaceSize

cnnlStatus_t cnnlGetTransposeWorkspaceSize(cnnlHandle_t handle, const cnnlTensorDescriptor_t x_desc, const cnnlTransposeDescriptor_t desc, size_t *size)

Returns in size the size of the MLU memory that is used as an extra workspace to optimize the transpose operation.

The size of extra workspace is based on the given information of the transpose operation, including the input tensor descriptor x_desc and transpose descriptor desc. For more information about the workspace, see "Cambricon CNNL User Guide".

Parameters
  • [in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.

  • [in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.

  • [in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.

  • [out] size: Output. A host pointer to the returned size of the extra workspace in bytes that is used in the transpose operation.

Return

API Dependency

Note

  • None.

Requirements

  • None.

Example

  • None.

5.306.4. cnnlSetTransposeDescriptor

cnnlStatus_t cnnlSetTransposeDescriptor(cnnlTransposeDescriptor_t desc, const int dims, const int permute[])

Initializes the transpose descriptor desc that was previously created with the cnnlCreateTransposeDescriptor function, and sets the information about the transpose operation in the transpose descriptor desc. The information includes the number of permute dimensions dims and the permute rules permute.

Parameters
  • [inout] desc: Input/output. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.

  • [in] dims: Input. The number of dimensions in the permute tensor of the transpose operation. Currently, the value of this parameter should be less than or equal to 8.

  • [in] permute: Input. The order of transpose. Currently, each value of permute should be in the range [0, dims - 1], and no value may appear more than once.
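The constraints on dims and permute described above can be checked on the host before the descriptor is set. The following is a minimal sketch in plain C, not part of Cambricon CNNL: the names sketch_is_valid_permute and SKETCH_MAX_DIMS are hypothetical, and rejecting dims smaller than 1 is an assumption that the text does not state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant mirroring the documented limit: dims <= 8. */
#define SKETCH_MAX_DIMS 8

/* Returns true if permute[0..dims-1] contains every value in
 * [0, dims - 1] exactly once. Hypothetical helper, not a CNNL API. */
static bool sketch_is_valid_permute(int dims, const int permute[]) {
    /* Assumption: dims < 1 is treated as invalid. */
    if (dims < 1 || dims > SKETCH_MAX_DIMS || permute == NULL) {
        return false;
    }
    bool seen[SKETCH_MAX_DIMS] = {false};
    for (int i = 0; i < dims; ++i) {
        int p = permute[i];
        if (p < 0 || p >= dims) {
            return false; /* value out of range */
        }
        if (seen[p]) {
            return false; /* value repeated */
        }
        seen[p] = true;
    }
    return true;
}
```

For example, with dims equal to 3 this check accepts (2, 0, 1) and rejects (0, 0, 1), because the value 0 appears twice.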

Return

Note

  • None.

Requirements

  • None.

Example

  • None.

5.306.5. cnnlTranspose

cnnlStatus_t cnnlTranspose(cnnlHandle_t handle, const cnnlTransposeDescriptor_t desc, const cnnlTensorDescriptor_t x_desc, const void *x, const cnnlTensorDescriptor_t y_desc, void *y)

Reorders the dimensions of the input tensor according to the value of permute. For better performance on large-scale transposes of more than 4 dimensions, call the cnnlTranspose_v2 function.

Parameters
  • [in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.

  • [in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.

  • [in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.

  • [in] x: Input. Pointer to the MLU memory that stores the input tensor.

  • [in] y_desc: Input. The descriptor of the output tensor. For detailed information, see cnnlTensorDescriptor_t.

  • [out] y: Output. Pointer to the MLU memory that stores the output tensor.

Deprecated

Return

Data Type

  • This function supports the following data types for input tensor x and output tensor y. Note that the data type of the input tensor and the output tensor should be the same.

    • input tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.

    • output tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.

Data Layout

  • The input tensor should have at most 8 dimensions.

Scale Limitation

  • The x, y and permute have the same number of dimensions.

  • The dimension size of x, y and permute should be less than or equal to CNNL_DIM_MAX.

  • The i-th value of permute is in the range [0, n-1], where n is the rank of x.

  • The i-th dimension of y corresponds to the permute[i]-th dimension of x.

  • During computation, the number of memcpy copies should be less than 65536.
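The rule that the i-th dimension of y corresponds to the permute[i]-th dimension of x determines the shape that y_desc must describe. The following is a small host-side sketch in plain C of that shape calculation; the function name sketch_transposed_shape is hypothetical and is not a CNNL API.

```c
/* Fills y_shape so that y_shape[i] == x_shape[permute[i]].
 * Hypothetical helper; assumes permute is a valid permutation of
 * [0, dims - 1] and that all arrays hold at least dims entries. */
static void sketch_transposed_shape(int dims, const int x_shape[],
                                    const int permute[], int y_shape[]) {
    for (int i = 0; i < dims; ++i) {
        y_shape[i] = x_shape[permute[i]];
    }
}
```

For instance, an x of shape (2, 3, 4, 5) with permute (0, 2, 3, 1) gives a y of shape (2, 4, 5, 3).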

API Dependency

  • Before calling this function to implement transpose, you need to prepare all the parameters passed to this function. See each parameter description for details.

Note

  • None.

Example

  • The example of the transpose operation is as follows:

       input array by 3 * 2 -->
           input: [[1, 4],
                   [2, 5],
                   [3, 6]]
       param:
         dims: 2, permute: (1, 0),
    
       output array by 2 * 3 --> output: [[1, 2, 3],
                                          [4, 5, 6]]
    

Reference

5.306.6. cnnlTranspose_v2

cnnlStatus_t cnnlTranspose_v2(cnnlHandle_t handle, const cnnlTransposeDescriptor_t desc, const cnnlTensorDescriptor_t x_desc, const void *x, const cnnlTensorDescriptor_t y_desc, void *y, void *workspace, size_t workspace_size)

Reorders the dimensions of the input tensor according to the value of permute. Compared with cnnlTranspose, cnnlTranspose_v2 provides better performance for transposes of more than 4 dimensions by using an extra workspace.

This function needs extra MLU memory as the workspace to work. You can get the size of the workspace workspace_size with the cnnlGetTransposeWorkspaceSize function.

Parameters
  • [in] handle: Input. Handle to a Cambricon CNNL context that is used to manage MLU devices and queues in the transpose operation. For detailed information, see cnnlHandle_t.

  • [in] desc: Input. The descriptor of the transpose operation. For detailed information, see cnnlTransposeDescriptor_t.

  • [in] x_desc: Input. The descriptor of the input tensor. For detailed information, see cnnlTensorDescriptor_t.

  • [in] x: Input. Pointer to the MLU memory that stores the input tensor.

  • [in] y_desc: Input. The descriptor of the output tensor. For detailed information, see cnnlTensorDescriptor_t.

  • [out] y: Output. Pointer to the MLU memory that stores the output tensor.

  • [in] workspace: Input. Pointer to the MLU memory that is used as an extra workspace for the transpose operation. For more information about workspace, see "Cambricon CNNL User Guide".

  • [in] workspace_size: Input. The size of the extra workspace in bytes that needs to be used in the transpose operation. You can get the size of the workspace with the cnnlGetTransposeWorkspaceSize function.

Return

Scale Limitation

  • The x, y and permute have the same number of dimensions.

  • The dimension size of x, y and permute should be less than or equal to CNNL_DIM_MAX.

  • The i-th value of permute is in the range [0, n-1], where n is the rank of x.

  • The i-th dimension of y corresponds to the permute[i]-th dimension of x.

  • During computation, the number of memcpy copies should be less than 65536.

Formula

  • See "Transpose Operator" section in "Cambricon CNNL User Guide" for details.

Data Type

  • This function supports the following data types for input tensor x and output tensor y. Note that the data type of the input tensor and the output tensor should be the same.

    • input tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.

    • output tensor: uint8, int8, uint16, int16, uint32, int31, int32, uint64, int64, bool, half, float, complex_half, complex_float.

Data Layout

  • The input tensor should have at most 8 dimensions.

API Dependency

  • Before calling this function to implement transpose, you need to prepare all the parameters passed to this function. See each parameter description for details.

Note

  • None.

Requirements

  • None.

Example

  • The example of the transpose operation is as follows:

       input array by 3 * 2 -->
           input: [[1, 4],
                   [2, 5],
                   [3, 6]]
       param:
         dims: 2, permute: (1, 0),

       output array by 2 * 3 --> output: [[1, 2, 3],
                                          [4, 5, 6]]
    

Reference