Running Code on GPU Seems much Slower than Doing so on CPU
Hi there,
I am using a ThinkPad W550, and my GPU is a Quadro K620M. When I ran the following simple code, the profiler showed that the GPU version was much slower.
function Test_GPU()
a = [10^8, 18^8];
h = a;
c = conv2(h, a, 'full');
% Running in double precision gave a similar result
aa = single(gpuArray([10^8, 18^8]));
hh = aa;
cc = conv2(hh, aa, 'full');
end

So I ran the official gpuBench().
The result is astonishing: running on the GPU is much, much slower.
The first picture shows the result from the GPU, and the second from the CPU.


I will be very grateful if anyone could tell me why. Many thanks
2 Comments
Theron FARRELL
on 27 May 2019
Jan
on 27 May 2019
a = [10^8, 18^8] is a [1x2] vector. For a speed comparison, this job is too tiny.
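To illustrate the point about problem size, here is a minimal sketch of a comparison on inputs large enough to measure. The sizes `n` and `k` are hypothetical values chosen for illustration, and the GPU branch assumes Parallel Computing Toolbox and a supported GPU are available.

```matlab
% Hypothetical sizes for illustration; adjust to fit the available GPU memory.
n = 2048;                          % side length of the input matrix (assumption)
k = 15;                            % side length of the kernel (assumption)
a = rand(n, 'single');             % n-by-n input on the CPU
h = rand(k, 'single');             % k-by-k kernel on the CPU

% timeit calls the function several times and returns a typical run time.
tCPU = timeit(@() conv2(a, h, 'full'));
fprintf('CPU: %.4f s\n', tCPU);

% GPU branch: assumes Parallel Computing Toolbox and a supported GPU.
if gpuDeviceCount > 0
    aG = gpuArray(a);              % copy the data to the GPU once, outside the timing
    hG = gpuArray(h);
    % gputimeit waits for the GPU to finish before stopping the clock.
    tGPU = gputimeit(@() conv2(aG, hG, 'full'));
    fprintf('GPU: %.4f s\n', tGPU);
end
```

The data transfer is kept outside the timed call on purpose, so the numbers reflect the convolution itself rather than the copy to the device.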
Accepted Answer
More Answers (2)
Walter Roberson
on 27 May 2019
0 votes
The Quadro K620M is a Maxwell-architecture part based on the GM108 chip. That architecture runs double precision at 1/32 of the single-precision rate.
MTIMES operations are delegated by MATLAB to the BLAS/LAPACK libraries for sufficiently large arrays, and those libraries automatically use all available CPU cores.
On my system, the CPU measures as faster than my GTX 780M for double-precision MTIMES and backslash, but the GPU was much faster for single precision, and it is also faster than the CPU for double-precision FFT.
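One way to check the single versus double gap on a particular GPU is sketched below. The matrix size `n` is a hypothetical value, and the code assumes Parallel Computing Toolbox and a supported NVIDIA GPU; the measured ratio will depend on the card and need not match the theoretical figure.

```matlab
% Assumes Parallel Computing Toolbox and a supported NVIDIA GPU.
n  = 2000;                              % hypothetical matrix size
As = rand(n, 'single', 'gpuArray');     % single-precision data created on the GPU
Ad = double(As);                        % same values in double precision, still on the GPU

tSingle = gputimeit(@() As * As);       % matrix multiply in single precision
tDouble = gputimeit(@() Ad * Ad);       % matrix multiply in double precision

fprintf('single: %.4f s, double: %.4f s, double/single: %.1f\n', ...
        tSingle, tDouble, tDouble / tSingle);
```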
8 Comments
Jan
on 27 May 2019
The screenshot posted by the OP seems to show that his GPU works slightly faster on double than on single. Strange.
Theron FARRELL
on 27 May 2019
Edited: Theron FARRELL
on 27 May 2019
Andrea Picciau
on 28 May 2019
Edited: Walter Roberson
on 29 May 2019
Hi Theron,
I ran your code on my workstation, which has an NVIDIA K40c and an Intel Xeon E-1650 CPU. I wasn't able to reproduce your results, which seems to suggest that your GPU might be the "limiting factor".
What version of MATLAB are you using?
Jan
on 28 May 2019
@Andrea: This is not my code.
@Theron FARRELL: Using the profiler disables the JIT acceleration. Comparing timings that are displayed as "0.000s" is very fragile. You cannot expect to get a realistic view of the efficiency of the code from such comparisons.
"And now, it seems that 'single' is the fastest. So strange...." - I still think that this is the expected effect. If you observe anything else, there is either a problem in the code, or the transfer of the data to the GPU takes longer than the actual processing, or the total times are too short to be measured reliably by the profiler. Using some hundred calls in a loop with tic/toc is more accurate, but timeit is even better.
Theron FARRELL
on 29 May 2019
Theron FARRELL
on 29 May 2019
Moved: Walter Roberson
on 27 Oct 2024
Andrea Picciau
on 29 May 2019
Edited: Walter Roberson
on 29 May 2019
@Jan: Sorry, I meant to say "Theron". I changed my previous comment to fix that.
Jan
on 29 May 2019
@Theron: I do not understand why you expect arrayfun to have a positive effect on the processing speed. The opposite is expected.
Starting the profiler disables the JIT acceleration automatically, because the JIT can reorder the commands if that improves the speed, and then there is no longer any relation between the timings and the code lines. This means that running the profiler can affect the run time massively, especially for loops. Of course this sounds counter-productive for the job of a profiler, and in fact it is. Therefore the profiler and tic/toc should both be used, because they have different advantages and disadvantages. For measuring the speed of single commands or elementary loops, the profiler is not a good choice.
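The tic/toc loop and timeit approaches mentioned in this thread can be sketched as follows. The input `x`, the operation `f`, and the repeat count `nRuns` are hypothetical placeholders for whatever single command is being measured.

```matlab
x = rand(500);                     % hypothetical test input
f = @() x * x;                     % hypothetical operation under test

% Option 1: repeat the call many times and average with tic/toc.
nRuns = 200;                       % "some hundred calls", as suggested above
y = f();                           % warm-up call, excluded from the timing
tStart = tic;
for i = 1:nRuns
    y = f();
end
tLoop = toc(tStart) / nRuns;       % average time per call

% Option 2: timeit handles warm-up and repetition by itself.
tTimeit = timeit(f);

fprintf('tic/toc: %.3g s per call, timeit: %.3g s per call\n', tLoop, tTimeit);
```

For gpuArray inputs, gputimeit plays the role of timeit. If tic/toc is used with GPU code, calling wait(gpuDevice) before toc is advisable, because GPU operations run asynchronously and toc can otherwise return before the work has finished.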
Miguel
on 27 Oct 2024
0 votes
I am running a vehicle simulation on GPU vs. CPU, and it takes a huge amount of time, even though I have a gaming PC. Why?