gpucoder.atomicExch

Atomically exchange variable in global or shared memory with value

Since R2021b

    Description

    The gpucoder.atomicExch function reads from a global or shared GPU memory location, then writes a value into the memory location. In generated GPU code, the operation is atomic, which means that a GPU thread performs the read-modify-write operation without interference from other threads.

    A = gpucoder.atomicExch(A,B) atomically writes the value from B into the shared or global GPU memory location, A.

    Call the gpucoder.atomicExch function directly inside a for-loop that you want to execute on the GPU in the generated CUDA® code. Each iteration of the loop must be able to read and write to A. Use the coder.gpu.kernel pragma before the loop that contains gpucoder.atomicExch.

    [A,oldA] = gpucoder.atomicExch(A,B) returns the previous value of A as oldA.
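    The two-output form has the same semantics as an atomic exchange in C11: the new value is stored and the value it replaced is returned in one indivisible step. A minimal CPU-side sketch using C11 `atomic_exchange` (an analogue for illustration, not GPU Coder's runtime implementation):

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    int main(void)
    {
        /* A starts at 5; the exchange stores 9 and returns the value
           it replaced, mirroring [A,oldA] = gpucoder.atomicExch(A,B). */
        _Atomic int A = 5;
        int oldA = atomic_exchange(&A, 9);

        assert(atomic_load(&A) == 9);
        assert(oldA == 5);
        return 0;
    }
    ```

    Because the read and the write happen as one operation, no other thread can observe or modify A between them.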

    Examples

    Verify whether the columns of a matrix contain nonzero elements by using gpucoder.atomicExch.

    Write an entry-point function, anyCols, that accepts an M-by-N matrix, A. The function returns a 1-by-N vector, B, where each element of B is equal to 1 if the corresponding column of A contains a nonzero element, and the element is equal to 0 otherwise.

    function B = anyCols(A) %#codegen
        coder.gpu.kernelfun();
        rows = size(A,1);
        cols = size(A,2);
        B = zeros([1 cols]);
        coder.gpu.kernel();
        for i=1:cols
            coder.gpu.kernel();
            for j=1:rows
                if (A(j,i) ~= 0)
                    B(i) = 1;
                end
            end
        end
    end

    If you parallelize the for-loops across multiple threads, multiple threads can read from and write to B(i) at the same time. To write to B(i) without interference from other threads, use gpucoder.atomicExch to exchange the value of B(i) with 1. Once the loop sets B(i) to 1, the value of B(i) does not change for the rest of the loop execution, so parallelizing the loop across multiple threads does not change the result.

    function B = anyCols(A) %#codegen
        coder.gpu.kernelfun();
        rows = size(A,1);
        cols = size(A,2);
        B = zeros([1 cols]);
        coder.gpu.kernel();
        for i=1:cols
            coder.gpu.kernel();
            for j=1:rows
                if (A(j,i) ~= 0)
                    B(i) = gpucoder.atomicExch(B(i),1);
                end
            end
        end
    end

    Create a GPU code configuration object and generate code for anyCols.

    A = zeros(1024);
    cfg = coder.gpuConfig("mex");
    codegen anyCols -config cfg -args {A}

    The generated code contains the kernel anyCols_kernel2, which calculates each column of B by using the gpu_atomicExch function.

    static __global__
        __launch_bounds__(256, 1) void anyCols_kernel2(const real_T A[1048576],
                                                       real_T B[1024])
    {
      uint64_T gThreadId;
      int32_T i;
      int32_T j;
      gThreadId = mwGetGlobalThreadIndex();
      j = static_cast<int32_T>(gThreadId % 1024ULL);
      i = static_cast<int32_T>((gThreadId - static_cast<uint64_T>(j)) / 1024ULL);
      if ((i < 1024) && (A[j + (i << 10)] != 0.0)) {
        gpu_atomicExch(&B[i], 1.0);
      }
    }
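    You can check the reasoning behind this kernel on the CPU: when several threads exchange 1.0 into the same flag concurrently, the flag ends up at 1.0 regardless of how the writes interleave. A hypothetical host-side analogue of anyCols_kernel2 using POSIX threads and C11 atomics (a sketch for illustration, not generated code; the names scan_rows, ROWS, COLS, and NTHREADS are invented here):

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>

    #define ROWS 64
    #define COLS 4
    #define NTHREADS 8

    static double A[ROWS][COLS];
    static _Atomic(double) B[COLS];   /* column flags, all 0.0 initially */

    /* Each thread scans an interleaved slice of rows over every column,
       so several threads may write to the same B[i] concurrently -- the
       situation the atomic exchange makes safe. */
    static void *scan_rows(void *arg)
    {
        int tid = *(int *)arg;
        for (int j = tid; j < ROWS; j += NTHREADS)
            for (int i = 0; i < COLS; i++)
                if (A[j][i] != 0.0)
                    atomic_exchange(&B[i], 1.0);  /* old value discarded */
        return NULL;
    }

    int main(void)
    {
        A[10][0] = 3.0;   /* column 0 contains a nonzero element */
        A[63][2] = -1.0;  /* column 2 contains a nonzero element */

        pthread_t t[NTHREADS];
        int tid[NTHREADS];
        for (int k = 0; k < NTHREADS; k++) {
            tid[k] = k;
            pthread_create(&t[k], NULL, scan_rows, &tid[k]);
        }
        for (int k = 0; k < NTHREADS; k++)
            pthread_join(t[k], NULL);

        /* Only the columns with nonzero entries are flagged. */
        assert(atomic_load(&B[0]) == 1.0);
        assert(atomic_load(&B[1]) == 0.0);
        assert(atomic_load(&B[2]) == 1.0);
        assert(atomic_load(&B[3]) == 0.0);
        return 0;
    }
    ```

    Because the exchanged value is always 1.0, the operation is idempotent: any number of concurrent exchanges leaves B[i] in the same final state.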

    Input Arguments

    A — Reference to a shared or global GPU memory location, specified as a scalar. Use this reference as both an input and an output argument of gpucoder.atomicExch.

    Data Types: double | single | int32 | uint32 | uint64

    B — Operand, specified as a scalar. gpucoder.atomicExch writes this value into the memory location A.

    Data Types: double | single | int32 | uint32 | uint64

    Extended Capabilities

    C/C++ Code Generation
    Generate C and C++ code using MATLAB® Coder™.

    GPU Code Generation
    Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

    Version History

    Introduced in R2021b
