This project uses the Java Microbenchmark Harness (JMH) framework to measure the performance of various mathematical and array operations. The benchmarks compare traditional loop-based implementations with vectorized implementations using the Java Vector API. Additionally, some benchmarks compare the impact of the SuperWord optimization.
The toolset used in this project is consistent across all benchmarks, and its specifications are described in the following sections.
- Benchmark Tool: JMH
- JMH Version: 1.36
- JMH Mode: Average Time
- JMH Time Unit: Nanoseconds
- `@State(Scope.Thread)`: Indicates that the benchmark state is maintained per thread.
- `@BenchmarkMode(Mode.AverageTime)`: Specifies that the benchmark measures the average time taken by the methods.
- `@OutputTimeUnit(TimeUnit.NANOSECONDS)`: Sets the time unit for the benchmark results.
- `@Fork`: Configures the JVM options for the benchmark runs.
- Fields like `matrixSize`, `matrix`, `kernel`, and `result` store the input data and results for the benchmarks.
- `@Param`: Allows parameterization of the benchmarks to run with different input sizes.
- Methods annotated with `@Benchmark` contain the code to be benchmarked. These methods perform operations like dot product, matrix multiplication, etc., using both traditional loops and vectorized operations.
- `@Setup(Level.Trial)`: Initializes the input data before the benchmark runs.
- A main method configures and runs the benchmarks using JMH (see the sketch after this list).
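As a minimal, illustrative sketch (not taken from the repository; the class, field, and parameter names below are hypothetical), a benchmark class wired up with these annotations might look roughly like this:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
public class DotProductSketch {

    @Param({"1024", "4096"}) // hypothetical input sizes
    int size;

    float[] a;
    float[] b;

    @Setup(Level.Trial)
    public void setup() {
        // Initialize the input data once per trial.
        a = new float[size];
        b = new float[size];
        for (int i = 0; i < size; i++) {
            a[i] = i;
            b[i] = size - i;
        }
    }

    @Benchmark
    public float dotProductLoop() {
        // Traditional scalar loop; a vectorized counterpart would use the Vector API.
        float sum = 0f;
        for (int i = 0; i < size; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```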
Each benchmark compares a traditional loop-based implementation with a vectorized implementation using the Java Vector API; the NoSuperWord variants additionally measure the impact of disabling the SuperWord optimization. A sketch of a vectorized kernel follows the list.
- `ArrayStats`: Benchmarks statistical operations on arrays using traditional loops vs. vectorized operations.
- `ArrayStatsNoSuperWord`: Similar to `ArrayStats`, but without the SuperWord optimization.
- `ComplexExpression`: Benchmarks the evaluation of complex mathematical expressions using traditional loops vs. vectorized operations.
- `ComplexExpressionNoSuperWord`: Similar to `ComplexExpression`, but without the SuperWord optimization.
- `DotProduct`: Benchmarks the calculation of the dot product of two vectors using traditional loops vs. vectorized operations.
- `DotProductNoSuperWord`: Similar to `DotProduct`, but without the SuperWord optimization.
- `ElementWiseMultiplication`: Benchmarks element-wise multiplication of two arrays using traditional loops vs. vectorized operations.
- `ElementWiseMultiplicationNoSuperWord`: Similar to `ElementWiseMultiplication`, but without the SuperWord optimization.
- `MatrixMultiplication`: Benchmarks matrix multiplication using traditional loops vs. vectorized operations.
- `MatrixMultiplicationNoSuperWord`: Similar to `MatrixMultiplication`, but without the SuperWord optimization.
- `SimpleSum`: Benchmarks summing all elements in an array using traditional loops vs. vectorized operations.
- `SimpleSumNoSuperWord`: Similar to `SimpleSum`, but without the SuperWord optimization.
- `VectorAddition`: Benchmarks adding two vectors element-wise using traditional loops vs. vectorized operations.
- `VectorAdditionNoSuperWord`: Similar to `VectorAddition`, but without the SuperWord optimization.
- `Sorting`: Benchmarks sorting an array using traditional loops vs. vectorized operations.
- `SortingNoSuperWord`: Similar to `Sorting`, but without the SuperWord optimization.
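For illustration only, a vectorized dot-product kernel using the incubating `jdk.incubator.vector` API could look roughly like the sketch below (class and method names are hypothetical; the project's actual implementations may differ). Compiling and running it requires `--add-modules jdk.incubator.vector`:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class VectorizedDotProductSketch {

    // Use the widest vector shape the hardware prefers.
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Illustrative Vector API kernel; not the repository's code.
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            // Multiply lane-wise, then reduce the lanes with a horizontal add.
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for the leftover elements.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```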
NOTE:
- SuperWord optimization enhances performance by leveraging SIMD instructions to process multiple data elements in parallel. This results in significant performance gains for operations that can be vectorized, as observed in the benchmark results for operations like dot product, element-wise multiplication, and matrix multiplication. However, the effectiveness of this optimization can vary depending on the complexity and nature of the operations.
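For context, HotSpot's SuperWord auto-vectorization can be switched off with the `-XX:-UseSuperWord` flag; the NoSuperWord variants presumably pass something along these lines through `@Fork`. The following is a minimal sketch under that assumption, not the project's actual classes:

```java
import org.openjdk.jmh.annotations.*;

// Sketch only: a forked benchmark JVM with SuperWord auto-vectorization disabled.
// The repository's *NoSuperWord benchmark classes may configure this differently.
@State(Scope.Thread)
@Fork(value = 1, jvmArgsAppend = {"-XX:-UseSuperWord"})
public class NoSuperWordExample {

    int[] data = new int[4096];

    @Benchmark
    public int sumLoop() {
        int sum = 0;
        for (int v : data) {
            sum += v;
        }
        return sum;
    }
}
```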
Below is a list of benchmark runs. Each benchmark run targets specific hardware and setup, as detailed in the dedicated subfolders for each run: