
Strassen

This document describes the implementation and testing of the Strassen matrix multiplication algorithm. The algorithm was implemented recursively with a minimum block size threshold to determine when to switch to regular matrix multiplication. Testing showed that Strassen outperformed naive and blocking implementations for larger matrix sizes, with performance peaking between 300-400 MFLOPS for 512x512 matrices depending on the minimum block size used.

Uploaded by

Hamid Aslani
Copyright
© Attribution Non-Commercial (BY-NC)

Brandon Merkl

Strassen Implementation
This algorithm was implemented as follows:
inline void strassen(int n,
                     double *c, int ldc, int rc, int cc,
                     double *a, int lda, int ra, int ca,
                     double *b, int ldb, int rb, int cb,
                     int alpha, int min_block_size) {
    // C(i,j) = alpha*C(i,j) + A(i,k) * B(k,j)
    // alpha is 0 or 1

Because the algorithm is recursive, a flag named alpha indicates whether each result is
accumulated into C (alpha = 1) or simply written to C (alpha = 0). A recursion cutoff is
set by a variable called min_block_size: if a recursion level produces an n such that
n < min_block_size, ordinary matrix multiplication is used instead. The results are shown
below for min_block_size = 32 and min_block_size = 64. Also included are IJK_Blocking_20
(block size = 20) and IJK_Blocking_21 (block size = 21), along with a naïve
implementation and the default DGEMM for comparison.
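
The recursion with a cutoff and a 0/1 accumulate flag might look roughly like the sketch below. It is not the author's code: the name strassen_sketch and the helpers combine and accumulate are hypothetical, the row/column offset parameters (rc, cc, ra, ...) are dropped in favor of pointers plus leading dimensions, and temporaries are simply allocated at every level. It also assumes C does not overlap A or B, and falls back to the ordinary multiply whenever n is odd.

```c
#include <assert.h>
#include <stdlib.h>

/* out = x + sign*y for h-by-h blocks; out is packed (leading dimension h). */
static void combine(int h, double *out, const double *x, int ldx,
                    const double *y, int ldy, double sign) {
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++)
            out[i * h + j] = x[i * ldx + j] + sign * y[i * ldy + j];
}

/* c += sign*m, where m is a packed h-by-h block. */
static void accumulate(int h, double *c, int ldc, const double *m, double sign) {
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++)
            c[i * ldc + j] += sign * m[i * h + j];
}

/* C = alpha*C + A*B for n-by-n row-major blocks; alpha is 0 (set) or 1 (accumulate). */
void strassen_sketch(int n, double *c, int ldc, const double *a, int lda,
                     const double *b, int ldb, int alpha, int min_block_size) {
    assert(alpha == 0 || alpha == 1);
    if (alpha == 0)                      /* "set" mode: clear C, then accumulate */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                c[i * ldc + j] = 0.0;

    /* Cutoff: below min_block_size (or when n cannot be halved) use the ordinary multiply. */
    if (n < min_block_size || n < 2 || n % 2 != 0) {
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i * ldc + j] += a[i * lda + k] * b[k * ldb + j];
        return;
    }

    int h = n / 2;
    const double *a11 = a, *a12 = a + h, *a21 = a + h * lda, *a22 = a21 + h;
    const double *b11 = b, *b12 = b + h, *b21 = b + h * ldb, *b22 = b21 + h;
    double *c11 = c, *c12 = c + h, *c21 = c + h * ldc, *c22 = c21 + h;

    /* Three packed h-by-h temporaries: s and t hold operand sums, m holds a product. */
    double *s = malloc(3 * (size_t)h * h * sizeof *s);
    if (s == NULL) abort();
    double *t = s + (size_t)h * h, *m = t + (size_t)h * h;

    /* M1 = (A11 + A22)(B11 + B22): contributes to C11 and C22 */
    combine(h, s, a11, lda, a22, lda, 1.0);
    combine(h, t, b11, ldb, b22, ldb, 1.0);
    strassen_sketch(h, m, h, s, h, t, h, 0, min_block_size);
    accumulate(h, c11, ldc, m, 1.0);  accumulate(h, c22, ldc, m, 1.0);

    /* M2 = (A21 + A22) B11: contributes to C21 and -C22 */
    combine(h, s, a21, lda, a22, lda, 1.0);
    strassen_sketch(h, m, h, s, h, b11, ldb, 0, min_block_size);
    accumulate(h, c21, ldc, m, 1.0);  accumulate(h, c22, ldc, m, -1.0);

    /* M3 = A11 (B12 - B22): contributes to C12 and C22 */
    combine(h, t, b12, ldb, b22, ldb, -1.0);
    strassen_sketch(h, m, h, a11, lda, t, h, 0, min_block_size);
    accumulate(h, c12, ldc, m, 1.0);  accumulate(h, c22, ldc, m, 1.0);

    /* M4 = A22 (B21 - B11): contributes to C11 and C21 */
    combine(h, t, b21, ldb, b11, ldb, -1.0);
    strassen_sketch(h, m, h, a22, lda, t, h, 0, min_block_size);
    accumulate(h, c11, ldc, m, 1.0);  accumulate(h, c21, ldc, m, 1.0);

    /* M5 = (A11 + A12) B22: contributes to -C11 and C12 */
    combine(h, s, a11, lda, a12, lda, 1.0);
    strassen_sketch(h, m, h, s, h, b22, ldb, 0, min_block_size);
    accumulate(h, c11, ldc, m, -1.0); accumulate(h, c12, ldc, m, 1.0);

    /* M6 = (A21 - A11)(B11 + B12): contributes to C22 */
    combine(h, s, a21, lda, a11, lda, -1.0);
    combine(h, t, b11, ldb, b12, ldb, 1.0);
    strassen_sketch(h, m, h, s, h, t, h, 0, min_block_size);
    accumulate(h, c22, ldc, m, 1.0);

    /* M7 = (A12 - A22)(B21 + B22): contributes to C11 */
    combine(h, s, a12, lda, a22, lda, -1.0);
    combine(h, t, b21, ldb, b22, ldb, 1.0);
    strassen_sketch(h, m, h, s, h, t, h, 0, min_block_size);
    accumulate(h, c11, ldc, m, 1.0);

    free(s);
}
```

For packed n-by-n arrays, a call such as strassen_sketch(n, c, n, a, n, b, n, 0, 32) would overwrite c with the product, while passing alpha = 1 adds the product to whatever c already holds. Clearing C up front and then accumulating signed products keeps the two modes on one code path, at the cost of extra passes over C that a tuned implementation would likely avoid.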

[Figure: Performance comparison of Selected Algorithms. Performance (MFLOP/s, 0 to 450)
versus N (0 to 700) for strassen32, strassen64, DGEMM, IJK_BLOCKING_20, IJK_BLOCKING_21,
and IJK_NAIVE.]
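
The report does not say how the MFLOP/s figures were derived. A common convention, assumed here rather than taken from the source, is to charge every algorithm the 2*n^3 floating-point operations of the ordinary multiply and divide by the measured wall-clock time. The helper name mflops is hypothetical.

```c
#include <assert.h>

/* MFLOP/s for one n-by-n multiply, ASSUMING the conventional 2*n^3 flop count. */
double mflops(int n, double seconds) {
    assert(n > 0 && seconds > 0.0);
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / seconds / 1.0e6;
}
```

Under this convention a Strassen run is credited with more operations than it actually performs, so its figure is an effective rate that can be compared directly with the other algorithms.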
APPENDIX A – Results Strassen
F = Frobenius norm

  N   B  DGEMM  STRASSEN     F  IJK_blocking     F  IJK_Naive  F

  4  20  16.53      21.1  0.18          24.8  0.18       28.9  0
  8  20  40.68      78.6  0.49          91.0  0.49       75.0  0
 16  20  60.85     166.3  0.61         174.7  0.61      116.9  0
 32  20  68.25     198.1  0.97         251.7  0.76      153.9  0
 64  20  72.87     229.8   1.5         334.0     1      177.7  0
128  20  74.53     213.2   2.4         326.8   1.4      146.8  0
256  20  33.92     186.3   4.1         194.9   1.9       44.2  0
512  20  33.91     175.8   6.8         240.4   2.7       44.5  0

  4  21  19.60      21.9  0.36          27.7  0.36       30.8  0
  8  21  41.19      76.6  0.47          87.3  0.47       66.4  0
 16  21  58.02     165.9  0.58         173.7  0.58      117.8  0
 32  21  68.32     184.3  0.94         253.2  0.77      153.1  0
 64  21  72.91     231.2   1.6         342.1  0.99      177.9  0
128  21  73.00     210.7   2.6         346.2   1.4      147.0  0
256  21  33.91     185.7   4.2         277.9   1.9       43.9  0
512  21  34.00     176.0   6.8         262.4   2.6       44.3  0

  4  64  20.67      22.7  0.25          29.0  0.25       25.5  0
  8  64  42.27      84.9  0.46          88.0  0.46       68.7  0
 16  64  58.18     168.0  0.59         172.2  0.59      118.1  0
 32  64  68.22     260.2  0.76         260.4  0.76      158.4  0
 64  64  72.94     318.8     1         348.6  0.94      177.8  0
128  64  72.30     296.1   1.5         378.2   1.4      147.4  0
256  64  34.98     247.6   2.2         282.2   1.9       43.9  0
512  64  34.05     225.9   3.3         124.8   2.6       44.2  0
