Why does my GPU not outperform my CPU/another GPU? Troubleshooting Steps

Why does my GPU not outperform my CPU / another GPU?
Here are some troubleshooting steps for understanding the factors that affect performance.

Accepted Answer

MathWorks Support Team on 2 Nov 2023
Edited: MathWorks Support Team on 2 Nov 2023
There are multiple factors which determine a GPU's performance. The headline number of cores a GPU has is not enough to accurately gauge performance.
To determine whether the code or the GPU is the primary issue, try the following (details for each step are below):
  1. Run a standard benchmark test.
  2. If the benchmark shows different behavior between the precision types, calculate the expected ratio between single and double precision performance for the GPU.
  3. If the card is a laptop (mobile) card, adjust performance expectations accordingly.
  4. If performance in the benchmarks is good, the problem most likely originates in the code.
 
Standard GPU Benchmark:
There is a benchmarking test written by the MathWorks Parallel Computing Team and available on the File Exchange:
It runs a variety of memory-intensive and compute-intensive tasks in both single and double precision. It also compares the results against a relatively typical display card and a reasonable compute card. The reference results are matched to the version of MATLAB being used.
>> gpuBench
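
If gpuBench is not available, a rough first check can be sketched with timeit and gputimeit. This is a minimal sketch and not what gpuBench itself does: the matrix size N and the choice of matrix multiplication as the workload are assumptions made for illustration. It requires Parallel Computing Toolbox and a supported GPU.

```matlab
% Rough sanity check only; not a replacement for gpuBench.
% N is an arbitrary illustrative size; reduce it if the GPU runs out of memory.
N = 2048;

types = {'single', 'double'};
for k = 1:numel(types)
    t = types{k};

    % Host (CPU) matrices and their copies in GPU memory
    A = rand(N, t);
    B = rand(N, t);
    Ag = gpuArray(A);
    Bg = gpuArray(B);

    % timeit and gputimeit call the function several times and
    % return a representative execution time in seconds
    tCpu = timeit(@() A * B);
    tGpu = gputimeit(@() Ag * Bg);

    % A dense N-by-N matrix multiply costs roughly 2*N^3 floating-point operations
    gflopsCpu = 2 * N^3 / tCpu / 1e9;
    gflopsGpu = 2 * N^3 / tGpu / 1e9;

    fprintf('%s: CPU %.1f GFLOPS, GPU %.1f GFLOPS, GPU speed-up %.1fx\n', ...
        t, gflopsCpu, gflopsGpu, tCpu / tGpu);
end
```

A large gap between the single and double precision GPU figures points to the precision ratio of the card rather than to a problem in the code.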
 
Comparing GPU Devices:
To answer this question you will need the GPU device specifications; for completeness, the CPU specifications can help as well. There are three key topics to consider when making the comparison.
1. Double vs Single Precision
Double precision and single precision performance can differ widely between graphics cards with the same total number of cores. The variation depends on whether the cores are mostly FP32 (single precision) or FP64 (double precision). Most GPUs are designed primarily for single precision performance, since this is what graphics display demands. By comparison, CPUs typically show a much smaller drop in performance for double precision. Below is an easy reference guide for information about a graphics card:
If NVIDIA has declared the double precision performance of a card, it will be listed. If it is not listed, then even though the compute capability may be at least 1.3 (needed for double precision), double precision performance is significantly lower: on the order of 24-32x slower than single precision.
To calculate the ratio between Single Precision and Double Precision:
  1. Find the GPU on the wiki page above.
  2. Get the stated single precision and double precision performance values from the table. (If there is no double precision GFLOPS value, assume double precision is 24-32x slower.)
  3. Divide the stated single precision GFLOPS by the double precision GFLOPS to get the ratio of how much slower double precision is than single precision.
At the time of writing, a high-end compute card can get this ratio as low as 3x.
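
The ratio calculation can be sketched as follows. The GFLOPS figures are illustrative placeholders, not the specification of any real card; substitute the values from the specification table. Using NaN to mark a missing double precision figure is an assumption of this sketch.

```matlab
% Illustrative placeholder values, NOT the specification of any real card.
singleGflops = 8000;   % stated single precision GFLOPS
doubleGflops = 250;    % stated double precision GFLOPS; set to NaN if not listed

if isnan(doubleGflops)
    % No double precision figure listed: fall back to the 24-32x rule of thumb
    ratioRange = [24 32];
    estDouble = singleGflops ./ ratioRange;
    fprintf('Assumed ratio %d-%dx, estimated double precision about %.0f-%.0f GFLOPS\n', ...
        ratioRange(1), ratioRange(2), estDouble(2), estDouble(1));
else
    % Ratio of single to double precision throughput
    ratio = singleGflops / doubleGflops;
    fprintf('Double precision is about %.0fx slower than single precision\n', ratio);
end
```

With the placeholder values above, the ratio is 8000 / 250 = 32, so double precision code would be expected to run about 32x slower than the same code in single precision.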
2. Mobile vs Desktop
Is the graphics card inside a laptop? If yes, then it is highly likely the card is a mobile graphics card. In many cases the name of such a card is suffixed with an M. Not every M means mobile, so cross-reference with the wiki page above to check definitively.
Mobile graphics cards are smaller and less powerful due to the heat and power restrictions their environment imposes on them. If using a mobile GPU for computation, speed expectations should be adjusted down.
3. Display vs Compute
Is the graphics card also acting as the display card? If there is only one graphics card in the machine and no on-board graphics, then this is likely the case. In this situation the operating system will commonly impose a kernel execution timeout. This is shown in the 'gpuDevice' output as:
KernelExecutionTimeout: 1
The purpose of a kernel execution timeout is to make sure the OS is always able to update the screen. If a computation on the GPU takes too long, the operation is killed. This tends to disrupt the CUDA environment for MATLAB, and further use of the GPU by MATLAB (for either OpenGL or GPU computation) will require a restart of MATLAB. The following article has instructions on how to extend or disable this timeout period:
However, note that performance may be lowered even without hitting this timeout due to the need to share the resource with other programs.
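
The timeout setting can also be checked programmatically. This is a minimal sketch that reads properties of the currently selected device; the wording of the messages is illustrative, and it requires Parallel Computing Toolbox and a supported GPU.

```matlab
% Query the currently selected GPU (selects the default device if none is selected yet)
gpu = gpuDevice;

fprintf('Device: %s (compute capability %s)\n', gpu.Name, gpu.ComputeCapability);

% KernelExecutionTimeout is true when the OS can kill long-running kernels,
% which is typical when the card is also driving a display
if gpu.KernelExecutionTimeout
    disp('Kernel execution timeout is enabled: long-running GPU operations may be killed.');
else
    disp('Kernel execution timeout is disabled.');
end
```

If the timeout is enabled, it is worth keeping individual GPU operations short while benchmarking.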
 
Refer to the following link for more information:
