NVIDIA NVLink in MATLAB using multiple GeForce RTX 2080 Ti cards

NVLink provides high bidirectional bandwidth between the cards (the GeForce RTX 2080 Ti offers 100 GB/s, while the GeForce RTX 2080 offers 50 GB/s). When fully utilized, NVLink minimizes inter-GPU traffic over the PCI Express interface and also allows the memory on each card to behave more like a single, shared resource. This opens up the possibility of new multi-GPU modes for scientific analysis and big data workloads as well.
Will MATLAB take advantage of NVLink's ability to share resources between cards in future releases (R2019a/R2019b)?

1 Comment

Questions about future products need to be asked of your Sales Representative. The people who volunteer to answer questions here are either outside volunteers or inside volunteers who are seldom authorized to speak about future products without a Non-Disclosure Agreement.


 Accepted Answer

I understand that you want to use NVIDIA NVLink with MATLAB to leverage multiple NVIDIA GeForce RTX 2080 Ti graphics cards.
Currently, there is no straightforward way to utilize the high bidirectional bandwidth to distribute computations across the available GPUs. This would become possible with the introduction of distributed GPU arrays, which could use all GPUs and gather the results back automatically. Distributed GPU arrays are currently not supported in MATLAB.
Our development teams are aware of this and are currently considering "distributed GPU Arrays" for future releases of MATLAB.
Currently, there are no commands to enable NVLink in MDCS. As a workaround to leverage NVLink, you can use the "gop" function, as in the example below, which performs the computation on each GPU and gathers the result back:
>> result = gop(@plus, gpuArray(X), 'gpuArray');
Open a parpool and issue local and/or communicating instructions via spmd to perform distributed computation between the GPUs.
Refer to the following link for using gop with spmd to distribute operations to each GPU, and in turn effectively use all of the GPU memory for the same function using chunks of data:
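To make the workflow above concrete, here is a minimal sketch of opening a pool with one worker per GPU and performing a gop reduction inside spmd. This is a hedged illustration, not official guidance: it assumes the Parallel Computing Toolbox, one visible GPU per worker, and the array size is purely illustrative.

```matlab
% Open a pool with one worker per available GPU (assumption: local profile)
nGPUs = gpuDeviceCount;
parpool('local', nGPUs);

spmd
    gpuDevice(labindex);                   % bind each worker to its own GPU
    x = gpuArray.randn(1000);              % per-worker chunk of data
    total = gop(@plus, x, 'gpuArray');     % elementwise reduction across all GPUs
end
```

Each worker computes on its own chunk entirely on its own GPU; only the final reduction crosses the interconnect, which is where NVLink can help.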

10 Comments

Thank you for your fast reply :-)
If you want to see the full behaviour of gop(..., 'gpuArray'), then look at the help text by typing help gpuArray/gop - this behaviour is not documented in the Help browser:
help gpuArray/gop
gop Global collective operation on gpuArray
RES = gop(FUN,X)
RES = gop(FUN,X,LABTARGET)
RES = gop(FUN,X,CLASSNAME)
RES = gop(FUN,X,LABTARGET,CLASSNAME)
gop has been optimized for gpuArrays under certain conditions:
- You specify the CLASSNAME 'gpuArray'.
- Your parallel pool is running under Linux.
- All GPUs in the pool have compute capability 3.5 or above.
- All data are non-empty full real gpuArrays of type single, double, uint8,
int32, int64 or uint64.
- All data are the same size in every dimension (no dimension expansion).
- FUN is one of @plus, @times, @min or @max.
Under these circumstances, peer-to-peer communication is used to
improve performance.
If these conditions are not met, the behaviour of gop for gpuArrays is
unchanged.
Example:
spmd
    N = 1000;
    x = gpuArray.randn(N);
    SumX = gop(@plus,x,'gpuArray');        % Matrix addition of each x
    MaxX = gop(@max,max(x(:)),'gpuArray'); % Maximum value over all workers
end
This is the only function (other than trainNetwork) which is explicitly optimized for NVLink, using the NCCL library. However, the functionality that gop supports, namely reductions, is usually the main communications bottleneck for distributed computation, so in combination with purely parallel computation (operations that take place entirely on one worker) it is possible to get highly optimized multi-GPU behaviour.
You can also emulate point-to-point and one-to-all communication using gop, thus gaining access to other useful behaviours.
function data = gpuPointToPoint(sender, receiver, data)
% gpuPointToPoint Send data from labindex==sender to labindex==receiver
% Set receiver to 0 to get broadcast behaviour
if labindex ~= sender
    data = zeros(size(data), 'like', data); % zero out copies on non-senders
end
data = gop(@plus, data, receiver, 'gpuArray');
end
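For illustration, a sketch of how this helper might be called inside an spmd block. This is a hedged example, not documented usage: it assumes a pool is already open, gpuPointToPoint is on the path, and the worker indices and matrix size are illustrative.

```matlab
spmd
    data = gpuArray.rand(4);           % each worker holds its own matrix
    % Send the copy held by worker 1 to worker 2 via the gop reduction
    data = gpuPointToPoint(1, 2, data);
end
```

Because every worker except the sender contributes zeros, the @plus reduction delivered to the receiver is exactly the sender's data.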
Thank you very much for your feedback, and Merry Christmas and Happy New Year :-)
I will test your proposal asap.
However, I have experienced problems with NVIDIA 2080 Ti FE cards using deep learning and MATLAB R2019a (prerelease). MATLAB must first compile libraries before you can train the network, whereas the 1080 Ti cards and Titans work without problems. I have used the latest NVIDIA drivers. This can cause problems if you train multiple networks (over days) and wish to flush the GPU memory to ensure maximum performance.
The warning about compiling libraries is a bug which will be fixed in the general release.
Thank you for your fast reply :-)
Hi, it's been more than a year since this answer - do you have any update on the subject?
We are actively working on improving MATLAB's multi-GPU support. Is there something specific you need that you cannot do using the answers in these comments?
Hi there, I am wondering if there is any update on improving MATLAB's multi-GPU support?
We have two NVIDIA RTX 8000 GPUs that are linked together with NVLink. We'd like to be able to use the two cards as a single card and pool their memory, similar to what is described here. Based on the response above, I understand that I should be able to work around this issue effectively by using the "gop" function, but before spending time writing code to do so, I wanted to see if there was any update from MathWorks.
Thank you for your time and help!
Everything that I know on this topic, that I have not posted in the past, is under Non-Disclosure Agreement.
You need to contact MathWorks Sales and see if they are willing to give you the relevant information under a Non-Disclosure Agreement.
Joss and most other MathWorks employees are not permitted to discuss the matter until 10 days before any hypothetical official release... and MathWorks is not even running any Beta at the moment, so any hypothetical official release is not going to be any time "soon".
MATLAB's basic point-to-point comms via labSend and labReceive, as well as gop and labBroadcast now all use NVLink or NVSwitch (or other peer-to-peer comms) where available on both platforms.
MATLAB still does not have support for distributed gpuArrays. The most useful distributed array functionality is dependent on ScaLAPACK, for which there is no GPU library equivalent. That said, more multi-GPU features are planned for future releases.


