NVIDIA NVLink in MATLAB using multiple GeForce RTX 2080 Ti cards

NVLink provides high bidirectional bandwidth between the cards (the GeForce RTX 2080 Ti offers 100 GB/s, while the GeForce RTX 2080 offers 50 GB/s). When fully utilized, NVLink minimizes inter-GPU traffic over the PCI Express interface and also allows the memory on each card to behave more like a single, shared resource. This opens up the possibility of new multi-GPU modes for scientific analysis and big data workloads as well.
Will MATLAB take advantage of NVLink's ability to share resources between cards in future releases (R2019a/R2019b)?

1 Comment

Questions about future products need to be asked of your Sales Representative. The people who volunteer to answer questions here are either outside volunteers or inside volunteers who are seldom authorized to speak about future products without a Non-Disclosure Agreement.


 Accepted Answer

I understand that you want to use NVIDIA NVLink with MATLAB to leverage multiple NVIDIA GeForce RTX 2080 Ti graphics cards.
Currently, there is no straightforward way to utilize the high bidirectional bandwidth to distribute computations across the available GPUs. This would become possible with the introduction of distributed GPU arrays, which could use all GPUs and gather the results back automatically. Distributed GPU arrays are currently not supported in MATLAB.
Our development teams are aware of this and are currently considering "distributed GPU Arrays" for future releases of MATLAB.
Currently, there are no commands to enable NVLink in MDCS. As a workaround to leverage NVLink, you can use the "gop" function, as in the example below, which performs the computation on each GPU and gathers the result back:
>> result = gop(@plus, gpuArray(X), 'gpuArray');
Open a parpool and issue local and/or communicating instructions via spmd to perform distributed computation between the GPUs.
Refer to the following link for using gop with spmd to distribute operations to each GPU, and in turn effectively use all of the GPU memory for the same function using chunks of data:
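To make the workflow above concrete, here is a minimal sketch of opening a pool with one worker per GPU and performing a gop reduction inside spmd. This is a hedged illustration, not official guidance: it assumes the Parallel Computing Toolbox, one visible GPU per worker, and the array size is purely illustrative.

```matlab
% Open a pool with one worker per available GPU (assumption: local profile)
nGPUs = gpuDeviceCount;
parpool('local', nGPUs);

spmd
    gpuDevice(labindex);                   % bind each worker to its own GPU
    x = gpuArray.randn(1000);              % per-worker chunk of data
    total = gop(@plus, x, 'gpuArray');     % elementwise reduction across all GPUs
end
```

Each worker computes on its own chunk entirely on its own GPU; only the final reduction crosses the interconnect, which is where NVLink can help.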

10 Comments

Thank you for your fast reply :-)
If you want to see the full behaviour of gop(..., 'gpuArray'), then look at the help text by typing help gpuArray/gop - this behaviour is not documented in the Help browser:
help gpuArray/gop
gop Global collective operation on gpuArray
RES = gop(FUN,X)
RES = gop(FUN,X,LABTARGET)
RES = gop(FUN,X,CLASSNAME)
RES = gop(FUN,X,LABTARGET,CLASSNAME)
gop has been optimized for gpuArrays under certain conditions:
- You specify the CLASSNAME 'gpuArray'.
- Your parallel pool is running under Linux.
- All GPUs in the pool have compute capability 3.5 or above.
- All data are non-empty full real gpuArrays of type single, double, uint8,
int32, int64 or uint64.
- All data are the same size in every dimension (no dimension expansion).
- FUN is one of @plus, @times, @min or @max.
Under these circumstances, peer-to-peer communication is used to
improve performance.
If these conditions are not met, the behaviour of gop for gpuArrays is
unchanged.
Example:
spmd
    N = 1000;
    x = gpuArray.randn(N);
    SumX = gop(@plus,x,'gpuArray');        % Matrix addition of each x
    MaxX = gop(@max,max(x(:)),'gpuArray'); % Maximum value over all workers
end
This is the only function (other than trainNetwork) which is explicitly optimized for NVLink, using the NCCL library. However, the functionality that gop supports, namely reductions, is usually the main communications bottleneck for distributed computation, so in combination with purely parallel computation (operations that take place entirely on one worker) it is possible to get highly optimized multi-GPU behaviour.
You can also emulate point-to-point and one-to-all communication using gop, thus gaining access to other useful behaviours.
function data = gpuPointToPoint(sender, receiver, data)
% gpuPointToPoint Send data from labindex==sender to labindex==receiver
% Set receiver to 0 to get broadcast behaviour
if labindex ~= sender
    data = zeros(size(data), 'like', data); % zero out copies on non-senders
end
data = gop(@plus, data, receiver, 'gpuArray');
end
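For illustration, a sketch of how this helper might be called inside an spmd block. This is a hedged example, not documented usage: it assumes a pool is already open, gpuPointToPoint is on the path, and the worker indices and matrix size are illustrative.

```matlab
spmd
    data = gpuArray.rand(4);           % each worker holds its own matrix
    % Send the copy held by worker 1 to worker 2 via the gop reduction
    data = gpuPointToPoint(1, 2, data);
end
```

Because every worker except the sender contributes zeros, the @plus reduction delivered to the receiver is exactly the sender's data.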
Thank you very much for your feedback, and Merry Christmas and Happy New Year :-)
I will test your proposal asap.
However, I have experienced problems with NVIDIA 2080 Ti FE cards using deep learning and MATLAB R2019a (prerelease). MATLAB must first compile libraries before you can train the network, whereas the 1080 Ti cards and Titans work without problems. I have used the latest NVIDIA drivers. This can cause problems if you train multiple networks (over days) and wish to flush the GPU memory to ensure maximum performance.
The warning about compiling libraries is a bug which will be fixed in the general release.
Thank you for your fast reply :-)
Hi, it's been more than a year since this answer - do you have any update on the subject?
We are actively working on improving MATLAB's multi-GPU support. Is there something specific you need that you cannot do using the answers in these comments?
Hi there, I am wondering if there is any update on improving MATLAB's multi-GPU support?
We have two NVIDIA RTX 8000 GPUs that are linked together with NVLink. We'd like to be able to use the two cards as a single card and pool their memory, similar to what is described here. Based on the response above, I understand that I should be able to work around this issue effectively by using the "gop" function, but before spending time writing code to do so, I wanted to see if there was any update from MathWorks.
Thank you for your time and help!
Everything that I know on this topic, that I have not posted in the past, is under Non-Disclosure Agreement.
You need to contact MathWorks Sales and see if they are willing to give you the relevant information under a Non-Disclosure Agreement.
Joss and most other MathWorks employees are not permitted to discuss the matter until 10 days before any hypothetical official release... and MathWorks is not even running any Beta at the moment, so any hypothetical official release is not going to be any time "soon".
MATLAB's basic point-to-point comms via labSend and labReceive, as well as gop and labBroadcast now all use NVLink or NVSwitch (or other peer-to-peer comms) where available on both platforms.
MATLAB still does not have support for distributed gpuArrays. The most useful distributed array functionality is dependent on ScaLAPACK, for which there is no GPU library equivalent. That said, more multi-GPU features are planned for future releases.


