# matrix multiplication speed calculation

Codefighter on 8 Jun 2018
Hello, I want to measure the time of a matrix multiplication in MATLAB. I have two matrices, P and Q, of size [A x B] and [B x 48]. I wrote the following code:
--------
tic
R = P*Q;
time_1 = toc
---------
Then I measured the time again after splitting the columns of Q into three matrices (Q1, Q2, Q3), each of size [B x 16]. I wrote the following code:
----------
tic
R1 = P*Q1;
a = toc
tic
R2 = P*Q2;
b = toc
tic
R3 = P*Q3;
c = toc
time_2 = a+b+c;
-----------
I thought that time_2 would be equal to time_1, because both involve the same number of multiplications and additions. However, the results are different: time_1 is much shorter than time_2. I think this is because of the time it takes to load some math libraries. Do you know why this happens, and how can I measure the exact matrix multiplication time?

Razvan Carbunescu on 8 Jun 2018
Matrix multiplication (GEMM) is one of the most heavily optimized operations, and larger inputs allow more optimizations such as blocking and cache reuse.
The two extremes are the BLAS Level 2 approach, where each column of the result is computed with a separate matrix-vector multiply (GEMV), and the BLAS Level 3 approach, a single matrix-matrix multiply (GEMM).
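A minimal MATLAB sketch of the two extremes, assuming arbitrary illustrative sizes (the question does not give A and B, so the values below are assumptions). The loop performs one matrix-vector product per column of Q (Level 2 style), while `P * Q` is a single matrix-matrix product (Level 3 style). The arithmetic is the same, and the results agree up to floating-point rounding. The difference is data reuse: each matrix-vector product has to read all of P again for just one column, whereas one matrix-matrix product can reuse blocks of P across many columns.

```matlab
% Illustrative sizes (assumed; the question leaves A and B unspecified)
A = 1000;
B = 800;
P = rand(A, B);
Q = rand(B, 48);

% Level 3 style: one matrix-matrix product covering all 48 columns
R_gemm = P * Q;

% Level 2 style: 48 matrix-vector products, one per column of Q
R_gemv = zeros(A, 48);
for k = 1:48
    R_gemv(:, k) = P * Q(:, k);
end

% Same result up to floating-point rounding
relErr = norm(R_gemm - R_gemv, 'fro') / norm(R_gemm, 'fro');
fprintf('relative difference: %g\n', relErr);
```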
A naive GEMM (3 nested for loops) usually reaches around 3-5% of the processor's peak performance. A blocked GEMM without any other optimization (6 nested for loops) reaches around 20% of peak. MATLAB uses Intel MKL's GEMM, which is tuned for different processors and can reach around 80-90% of peak performance.
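To make the loop counts concrete, here is a sketch of a naive 3-loop GEMM and a blocked 6-loop GEMM in MATLAB, checked against the built-in `P * Q`. The matrix sizes and the block size `bs = 16` are arbitrary assumptions. Interpreted MATLAB loops carry so much overhead that this sketch illustrates only the loop structure and correctness; it will not reproduce the percentages quoted above, which presumably refer to compiled code.

```matlab
% Illustrative sizes and block size (assumptions, kept small because
% interpreted loops are slow)
m = 50; kdim = 40; n = 30;
bs = 16;
P = rand(m, kdim);
Q = rand(kdim, n);

% Naive GEMM: 3 nested loops, one dot product per output entry
R_naive = zeros(m, n);
for i = 1:m
    for j = 1:n
        s = 0;
        for t = 1:kdim
            s = s + P(i, t) * Q(t, j);
        end
        R_naive(i, j) = s;
    end
end

% Blocked GEMM: the 3 outer loops walk over bs-by-bs tiles, the 3 inner
% loops multiply one pair of tiles and accumulate into the output tile
R_blocked = zeros(m, n);
for ii = 1:bs:m
    for jj = 1:bs:n
        for tt = 1:bs:kdim
            for i = ii:min(ii + bs - 1, m)
                for j = jj:min(jj + bs - 1, n)
                    s = R_blocked(i, j);
                    for t = tt:min(tt + bs - 1, kdim)
                        s = s + P(i, t) * Q(t, j);
                    end
                    R_blocked(i, j) = s;
                end
            end
        end
    end
end

% Both should match the built-in product up to rounding
fprintf('naive error:   %g\n', norm(R_naive - P * Q, 'fro'));
fprintf('blocked error: %g\n', norm(R_blocked - P * Q, 'fro'));
```

The `min` calls handle edge tiles, so the sizes do not need to be multiples of `bs`.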
All of these numbers apply to large matrices, because cache reuse and SIMD need large enough inputs to overcome their overheads. This is the reason for the difference you are seeing: the larger the matrices, the more optimization MKL can extract, so one multiplication by the [B x 48] matrix is more efficient than three multiplications by [B x 16] matrices.
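For the second part of the question (measuring the time more reliably), a minimal sketch assuming MATLAB R2013b or later, where `timeit` is available. It performs one untimed warm-up product so one-time costs are excluded, then uses `timeit`, which runs a function handle several times and returns the median, instead of a single `tic`/`toc` pair. The sizes are illustrative assumptions. If the split version is still slower after the warm-up, that points to the efficiency difference described above rather than to library loading.

```matlab
% Illustrative sizes (assumed; the question leaves A and B unspecified)
A = 2000;
B = 1000;
P = rand(A, B);
Q = rand(B, 48);
Q1 = Q(:, 1:16);
Q2 = Q(:, 17:32);
Q3 = Q(:, 33:48);

% Untimed warm-up so one-time costs are not counted
R = P * Q;

% Sanity check: the three partial products reassemble the full product
assert(norm([P * Q1, P * Q2, P * Q3] - R, 'fro') <= 1e-10 * norm(R, 'fro'));

% timeit runs each function handle several times and returns the median
time_1 = timeit(@() P * Q);
time_2 = timeit(@() P * Q1) + timeit(@() P * Q2) + timeit(@() P * Q3);
fprintf('time_1 (one [B x 48] product):    %.3g s\n', time_1);
fprintf('time_2 (three [B x 16] products): %.3g s\n', time_2);
```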