gpucoder.atomicExch

Atomically exchange variable in global or shared memory with value

Since R2021b

    Description

    The gpucoder.atomicExch function reads from a global or shared GPU memory location, then writes a value into the memory location. In generated GPU code, the operation is atomic, which means that a GPU thread performs the read-modify-write operation without interference from other threads.

    A = gpucoder.atomicExch(A,B) atomically writes the value from B into the shared or global GPU memory location, A.

    Call the gpucoder.atomicExch function directly inside a for-loop that you want to execute on the GPU in the generated CUDA® code. Each iteration of the loop must be able to read and write to A. Use the coder.gpu.kernel pragma before the loop that contains gpucoder.atomicExch.

    [A,oldA] = gpucoder.atomicExch(A,B) returns the previous value of A as oldA.
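    The two-output form has the same semantics as an atomic exchange in C11: the new value is stored and the value it replaced is returned in one indivisible step. A minimal CPU-side sketch using C11 `atomic_exchange` (an analogue for illustration, not GPU Coder's runtime implementation):

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    int main(void)
    {
        /* A starts at 5; the exchange stores 9 and returns the value
           it replaced, mirroring [A,oldA] = gpucoder.atomicExch(A,B). */
        _Atomic int A = 5;
        int oldA = atomic_exchange(&A, 9);

        assert(atomic_load(&A) == 9);
        assert(oldA == 5);
        return 0;
    }
    ```

    Because the read and the write happen as one operation, no other thread can observe or modify A between them.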

    Examples

    Verify whether the columns of a matrix contain nonzero elements by using gpucoder.atomicExch.

    Write an entry-point function, anyCols, that accepts an M-by-N matrix, A. The function returns a 1-by-N vector, B, where each element of B is equal to 1 if the corresponding column of A contains a nonzero element, and the element is equal to 0 otherwise.

    function B = anyCols(A) %#codegen
        coder.gpu.kernelfun();
        rows = size(A,1);
        cols = size(A,2);
        B = zeros([1 cols]);
        coder.gpu.kernel();
        for i=1:cols
            coder.gpu.kernel();
            for j=1:rows
                if (A(j,i) ~= 0)
                    B(i) = 1;
                end
            end
        end
    end

    If you parallelize the for-loops across multiple threads, multiple threads can read from and write to B(i) at the same time. To write to B(i) without interference from other threads, use gpucoder.atomicExch to exchange the value of B(i) with 1. Once the loop sets B(i) to 1, the value of B(i) does not change for the rest of the loop execution, so parallelizing the loop across multiple threads does not change the result.

    function B = anyCols(A) %#codegen
        coder.gpu.kernelfun();
        rows = size(A,1);
        cols = size(A,2);
        B = zeros([1 cols]);
        coder.gpu.kernel();
        for i=1:cols
            coder.gpu.kernel();
            for j=1:rows
                if (A(j,i) ~= 0)
                    B(i) = gpucoder.atomicExch(B(i),1);
                end
            end
        end
    end

    Create a GPU code configuration object and generate code for anyCols.

    A = zeros(1024);
    cfg = coder.gpuConfig("mex");
    codegen anyCols -config cfg -args {A}

    The generated code contains the kernel anyCols_kernel2, which calculates each column of B by using the gpu_atomicExch function.

    static __global__
        __launch_bounds__(256, 1) void anyCols_kernel2(const real_T A[1048576],
                                                       real_T B[1024])
    {
      uint64_T gThreadId;
      int32_T i;
      int32_T j;
      gThreadId = mwGetGlobalThreadIndex();
      j = static_cast<int32_T>(gThreadId % 1024ULL);
      i = static_cast<int32_T>((gThreadId - static_cast<uint64_T>(j)) / 1024ULL);
      if ((i < 1024) && (A[j + (i << 10)] != 0.0)) {
        gpu_atomicExch(&B[i], 1.0);
      }
    }
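    You can check the reasoning behind this kernel on the CPU: when several threads exchange 1.0 into the same flag concurrently, the flag ends up at 1.0 regardless of how the writes interleave. A hypothetical host-side analogue of anyCols_kernel2 using POSIX threads and C11 atomics (a sketch for illustration, not generated code; the names scan_rows, ROWS, COLS, and NTHREADS are invented here):

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>

    #define ROWS 64
    #define COLS 4
    #define NTHREADS 8

    static double A[ROWS][COLS];
    static _Atomic(double) B[COLS];   /* column flags, all 0.0 initially */

    /* Each thread scans an interleaved slice of rows over every column,
       so several threads may write to the same B[i] concurrently -- the
       situation the atomic exchange makes safe. */
    static void *scan_rows(void *arg)
    {
        int tid = *(int *)arg;
        for (int j = tid; j < ROWS; j += NTHREADS)
            for (int i = 0; i < COLS; i++)
                if (A[j][i] != 0.0)
                    atomic_exchange(&B[i], 1.0);  /* old value discarded */
        return NULL;
    }

    int main(void)
    {
        A[10][0] = 3.0;   /* column 0 contains a nonzero element */
        A[63][2] = -1.0;  /* column 2 contains a nonzero element */

        pthread_t t[NTHREADS];
        int tid[NTHREADS];
        for (int k = 0; k < NTHREADS; k++) {
            tid[k] = k;
            pthread_create(&t[k], NULL, scan_rows, &tid[k]);
        }
        for (int k = 0; k < NTHREADS; k++)
            pthread_join(t[k], NULL);

        /* Only the columns with nonzero entries are flagged. */
        assert(atomic_load(&B[0]) == 1.0);
        assert(atomic_load(&B[1]) == 0.0);
        assert(atomic_load(&B[2]) == 1.0);
        assert(atomic_load(&B[3]) == 0.0);
        return 0;
    }
    ```

    Because the exchanged value is always 1.0, the operation is idempotent: any number of concurrent exchanges leaves B[i] in the same final state.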

    Input Arguments

    A — Reference to a shared or global GPU memory location, specified as a scalar. Use this reference as both an input and an output argument of gpucoder.atomicExch.

    Data Types: double | single | int32 | uint32 | uint64

    B — Operand, specified as a scalar. gpucoder.atomicExch writes this value into the memory location A.

    Data Types: double | single | int32 | uint32 | uint64

    Extended Capabilities

    C/C++ Code Generation
    Generate C and C++ code using MATLAB® Coder™.

    GPU Code Generation
    Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

    Version History

    Introduced in R2021b
