Releases: ROCm/rocBLAS
rocBLAS 5.2.0 for ROCm 7.2.0
Added
- Level 3 syrk_ex function for both C and FORTRAN, but without API support for the ILP64 format.
Optimized
- Level 2 tpmv and sbmv functions.
Resolved issues
- Corrected client memory use counts for the ROCBLAS_CLIENT_RAM_GB_LIMIT environment variable.
- Fix to avoid false Clang static analysis warnings.
rocBLAS 5.1.0 for ROCm 7.1.1
rocBLAS code for ROCm 7.1.1 did not change. The library was rebuilt for the updated ROCm 7.1.1 stack.
rocBLAS 5.1.0 for ROCm 7.1.0
Added
- Sample for clients using OpenMP threads calling rocBLAS functions.
- gfx1103, gfx1150, and gfx1151 enabled.
Changed
- By default, the Tensile build is no longer based on tensile_tag.txt but uses the same commit from shared/tensile in the rocm-libraries repository. The rmake or install -t option can build from another local path with a different commit.
Optimized
- Improved the performance of Level 2 gemv transposed (TransA != N) for problem sizes where m is small and n is large on gfx90a and gfx942.
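For context, the transposed case computes y = alpha * A^T * x + beta * y, where A is m-by-n in column-major storage. The output y has n entries, and each one is a dot product over a column of length m, which is why "small m, large n" means many short dot products. The following is a plain CPU reference sketch of that operation, not rocBLAS code; the function name gemv_transposed_ref is hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference sketch of y = alpha * A^T * x + beta * y.
// A is m x n, column-major, with leading dimension lda >= m.
// x has m entries; y has n entries.
// Hypothetical helper for illustration; not part of the rocBLAS API.
std::vector<float> gemv_transposed_ref(std::size_t m, std::size_t n, float alpha,
                                       const std::vector<float>& A, std::size_t lda,
                                       const std::vector<float>& x, float beta,
                                       std::vector<float> y)
{
    assert(lda >= m);
    assert(A.size() >= lda * n);
    assert(x.size() >= m && y.size() >= n);

    for(std::size_t j = 0; j < n; ++j)
    {
        // Each output element is a dot product of column j of A with x.
        float dot = 0.0f;
        for(std::size_t i = 0; i < m; ++i)
            dot += A[i + j * lda] * x[i];
        y[j] = alpha * dot + beta * y[j];
    }
    return y;
}
```

With small m, the inner loop is short and the outer loop over n carries most of the work.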
rocBLAS 5.0.2 for ROCm 7.0.2
Added
- Enabled gfx1150 and gfx1151.
- The ROCBLAS_USE_HIPBLASLT_BATCHED variable to independently control the batched hipblaslt backend. Set ROCBLAS_USE_HIPBLASLT_BATCHED=0 to disable batched GEMM use of the hipblaslt backend.
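As a minimal sketch, the variable can also be set from inside a process with POSIX setenv. This assumes the variable must be in place before rocBLAS is first used; the release note does not say when it is read. The helper names set_env_flag and disable_hipblaslt_batched are hypothetical and not part of the rocBLAS API.

```cpp
#include <cassert>
#include <stdlib.h> // setenv and getenv (setenv is POSIX)
#include <string>

// Sets an environment variable for the current process.
// Returns true on success. Hypothetical helper for illustration.
bool set_env_flag(const char* name, const char* value)
{
    assert(name != nullptr && value != nullptr);
    return setenv(name, value, /*overwrite=*/1) == 0;
}

// Requests that batched GEMM not use the hipblaslt backend, per the
// release note above. Assumption: call this before any rocBLAS use.
bool disable_hipblaslt_batched()
{
    return set_env_flag("ROCBLAS_USE_HIPBLASLT_BATCHED", "0");
}

// Returns the current value of the variable, or an empty string if unset.
std::string hipblaslt_batched_setting()
{
    const char* v = getenv("ROCBLAS_USE_HIPBLASLT_BATCHED");
    return v ? std::string(v) : std::string();
}
```

Exporting the variable in the shell before launching the application has the same effect and avoids any question of ordering inside the process.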
rocBLAS 4.4.1 for ROCm 6.4.4
rocBLAS code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.
rocBLAS 5.0.0 for ROCm 7.0.1
rocBLAS code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.
rocBLAS 5.0.0 for ROCm 7.0.0
Added
- gfx950 support
- ROCBLAS_LAYER = 8 internal API logging for gemm debugging
- Support for AOCL 5.0 gcc build as a client reference library
- Allow PkgConfig for client reference library fallback detection
Changed
- CMAKE_CXX_COMPILER is now passed on during compilation for a Tensile build
- Change default atomics mode from allowed to not allowed
Removed
- Support code for non-production gfx targets
- rocblas_hgemm_kernel_name, rocblas_sgemm_kernel_name, and rocblas_dgemm_kernel_name API functions
- Use of warpSize as a constexpr
- Use of deprecated behavior of hipPeekLastError
- rocblas_float8.h and rocblas_hip_f8_impl.h files
- rocblas_gemm_ex3, rocblas_gemm_batched_ex3, rocblas_gemm_strided_batched_ex3 API functions
Optimized
- Optimized gemm by using gemv kernels when applicable
- Optimized gemv for small m and n with a large batch count on gfx942
- Improved the performance of Level 1 dot for all precisions and variants when N > 100000000 on gfx942
- Improved the performance of Level 1 asum and nrm2 for all precisions and variants on gfx942
- Improved the performance of Level 2 sger (single precision) on gfx942
- Improved the performance of Level 3 dgmm for all precisions and variants on gfx942
Resolved issues
- Fixed environment variable path-based logging to append multiple handle output to the same file
- Support numerics when trsm is running with rocblas_status_perf_degraded
- Fixed the build dependency installation of joblib on some operating systems
- Return rocblas_status_internal_error when rocblas_[set,get]_[matrix,vector] is called with a host pointer in place of a device pointer
- Reduced the default verbosity level for internal GEMM backend information
- Updated from the deprecated rocm-cmake to ROCmCMakeBuildTools
- Corrected AlmaLinux gfortran package dependencies
Upcoming changes
- Deprecated the use of negative indices to indicate the default solution is being used for gemm_ex with rocblas_gemm_algo_solution_index
rocBLAS 4.4.1 for ROCm 6.4.3
rocBLAS code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.
rocBLAS 4.4.1 for ROCm 6.4.2
Resolved issues
- Zero imaginary portion of diagonal of C matrix for cherk/zherk for gfx90a/gfx942 with problem sizes k > 500
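To illustrate the property this fix restores: herk computes C = alpha * A * A^H + beta * C with real alpha and beta, and the result is Hermitian, so its diagonal must be real. The following CPU reference sketch (upper triangle only, column-major) makes the diagonal handling explicit. It is not the rocBLAS implementation, and the name herk_upper_ref is hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// CPU reference sketch of C = alpha * A * A^H + beta * C, upper triangle only.
// A is n x k and C is n x n, both column-major; alpha and beta are real.
// Hypothetical helper for illustration; not part of the rocBLAS API.
void herk_upper_ref(std::size_t n, std::size_t k, float alpha,
                    const std::vector<cfloat>& A, std::size_t lda,
                    float beta, std::vector<cfloat>& C, std::size_t ldc)
{
    assert(lda >= n && ldc >= n);
    assert(A.size() >= lda * k && C.size() >= ldc * n);

    for(std::size_t j = 0; j < n; ++j)
    {
        for(std::size_t i = 0; i <= j; ++i)
        {
            // (A * A^H)(i, j) = sum over p of A(i, p) * conj(A(j, p))
            cfloat sum(0.0f, 0.0f);
            for(std::size_t p = 0; p < k; ++p)
                sum += A[i + p * lda] * std::conj(A[j + p * lda]);

            cfloat value = alpha * sum + beta * C[i + j * ldc];

            // Diagonal entries of a Hermitian matrix are real, so any
            // imaginary part (from the input C or from rounding) is zeroed.
            if(i == j)
                value = cfloat(value.real(), 0.0f);

            C[i + j * ldc] = value;
        }
    }
}
```

The strictly lower triangle of C is left untouched here, matching the usual convention when the upper triangle is selected.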
rocBLAS 4.4.0 for ROCm 6.4.1
rocBLAS code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.