slow imwarp with large arrays

Nic Bac on 5 Nov 2024 at 16:27
Commented: Umar on 7 Nov 2024 at 15:12
Hello,
I have a piece of code that uses imwarp to transform images of varying size (up to 20k x 20k pixels), which works fine with input images up to around 500x500px:
tform = fitgeotform2d(input,spatial,"lwm",lwmZone);
[Imagenew, Rnew] = imwarp(refimg,Rimage,tform,"cubic",FillValues=0);
What I noticed is that the function seems to use only one of the 72 cores available on my machine, even when a parallel pool has been started with parpool('threads').
This seems odd to me, since the imwarp documentation mentions that parallel options are supported.
What am I doing wrong? Is there a way to accelerate the execution time of imwarp with large arrays?
Thank you.

Accepted Answer

Joss Knight on 6 Nov 2024 at 12:11
I think the documentation is just referring to using the GPU or the ability to process in the background using backgroundPool to free up your MATLAB to do other things.
You could look into blockproc, which can process images on a parallel pool, but it's mainly intended for filters rather than geometric transforms.
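To illustrate the backgroundPool route mentioned above, here is a minimal sketch. It assumes R2022b or later (for rigidtform2d; backgroundPool itself needs R2021b), and it assumes the transform type you use is supported in a thread-based environment, which is worth verifying for an "lwm" transform. The random image and the rigid transform are hypothetical stand-ins. Note that this does not make a single warp faster; it only moves the work off the MATLAB prompt.

```matlab
% Stand-in data (hypothetical); replace with your own refimg, Rimage, tform.
refimg = rand(2000, "single");
Rimage = imref2d(size(refimg));
tform  = rigidtform2d(5, [10 -4]);

% Queue imwarp on the background pool and request its 2 outputs.
F = parfeval(backgroundPool, @imwarp, 2, ...
    refimg, Rimage, tform, "cubic", "FillValues", 0);

% ... the MATLAB prompt is free for other work here ...

% Block until the warp has finished, then collect both outputs.
[Imagenew, Rnew] = fetchOutputs(F);
```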
  4 Comments
Nic Bac on 7 Nov 2024 at 9:42
Understood, so I did misinterpret that sentence. I guess the only option I have is to split the array into smaller ones and run the code on each of them.
Thanks
Joss Knight on 7 Nov 2024 at 10:01
I think if that option is available to you, then it means you can use blockproc to do it automatically. blockproc has automatic parallel support.
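One way to apply the blockproc suggestion to a geometric transform is sketched below; this is an assumption-laden sketch, not an official recipe. blockproc partitions the array it is given, but for a warp the input pixels needed by an output block are not known in advance. So the sketch lets blockproc walk over a placeholder of the output size and, for each block, warps the full input image with OutputView restricted to that block. Because imwarp maps each output pixel back into the input, the blocks can be computed independently and should fit together without seams. Assumptions: R2022b or later, Rimage is an imref2d, the output grid equals the input grid, and blockRows, blockRef and the stand-in data are hypothetical. With a process-based pool, refimg (3.2 GB for a 20k x 20k double) may be copied to the workers, and whether blockproc's UseParallel uses a thread-based pool is not something this sketch establishes.

```matlab
% Stand-in data (hypothetical); replace with your own refimg, Rimage, tform.
refimg = rand(2000, "single");
Rimage = imref2d(size(refimg));
tform  = rigidtform2d(5, [10 -4]);

% Output grid: assumed here to be the input grid; any imref2d would do.
Rout = Rimage;
blockRows = 250;                 % hypothetical strip height (rows per block)

% Corner and pixel size of the output grid in world coordinates.
x0 = Rout.XWorldLimits(1);  dx = Rout.PixelExtentInWorldX;
y0 = Rout.YWorldLimits(1);  dy = Rout.PixelExtentInWorldY;

% World-coordinate referencing of one output block, built from the
% block's first pixel (bs.location = [row col]) and its size [rows cols].
blockRef = @(bs) imref2d(bs.blockSize, ...
    x0 + (bs.location(2) - 1 + [0, bs.blockSize(2)]) * dx, ...
    y0 + (bs.location(1) - 1 + [0, bs.blockSize(1)]) * dy);

% Each block is warped from the FULL input image, so no border is needed.
fun = @(bs) imwarp(refimg, Rimage, tform, "cubic", ...
    "FillValues", 0, "OutputView", blockRef(bs));

% blockproc walks over a placeholder of the output size (1 byte/pixel);
% only the block positions matter, the placeholder values are never used.
placeholder = false(Rout.ImageSize);
Imagenew = blockproc(placeholder, [blockRows, Rout.ImageSize(2)], fun, ...
    'UseParallel', true);        % requires Parallel Computing Toolbox
Rnew = Rout;
```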


More Answers (3)

Umar on 5 Nov 2024 at 18:29

Hi @Nic Bac,

As mentioned in the documentation,

https://www.mathworks.com/help/images/ref/imwarp.html

specifying the OutputView parameter defines the size and location of the output image in world coordinates. For large images, restricting the output view to the region you actually need reduces the number of output pixels imwarp has to compute and can speed up processing:

   % affineOutputView accepts affine transforms only; for an "lwm" tform, build an imref2d instead
   outputView = affineOutputView(size(refimg), tform);
   [Imagenew, Rnew] = imwarp(refimg, Rimage, tform, "cubic", ...
       OutputView=outputView, FillValues=0);

If performance is still an issue, consider using other functions or techniques that may better utilize multi-core capabilities. For example, if you have access to a compatible GPU, leveraging GPU computing could significantly speed up your image transformations:

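     refimgGPU = gpuArray(refimg);
     % Only the image data goes to the GPU; Rimage stays an ordinary imref2d (not every tform type is GPU-supported)
     [ImagenewGPU, Rnew] = imwarp(refimgGPU, Rimage, tform, "cubic", FillValues=0);
     Imagenew = gather(ImagenewGPU); % Transfer back to CPU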

For extremely large images or when dealing with multiple images, consider breaking down the image into smaller tiles or batches and processing them individually in parallel:

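   % Tiling sketch: split the OUTPUT grid (assumed = Rimage) into numTiles row strips, each warped from the full input
   edges = round(linspace(0, Rimage.ImageSize(1), numTiles + 1));
   parfor i = 1:numTiles
       yLim = Rimage.YWorldLimits(1) + edges(i:i+1) * Rimage.PixelExtentInWorldY;
       Rtile = imref2d([edges(i+1) - edges(i), Rimage.ImageSize(2)], Rimage.XWorldLimits, yLim);
       tiles{i} = imwarp(refimg, Rimage, tform, "cubic", FillValues=0, OutputView=Rtile);
   end
   Imagenew = vertcat(tiles{:}); Rnew = Rimage;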

Please bear in mind that large images require significant memory: a single 20,000 × 20,000 image occupies 3.2 GB in double precision (1.6 GB as single), and imwarp needs additional memory for the output. Ensure your system has enough RAM for the image sizes you are working with, and that you are using a recent version of MATLAB that supports the parallel computing and GPU features you need; releases after R2021a have improved support for these capabilities. Also, use MATLAB's built-in profiler (profile on; ...; profile viewer;) to identify bottlenecks in your code and determine whether imwarp is indeed the limiting factor.
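A minimal sketch of the profiler suggestion, using a hypothetical random image and rigid transform as stand-ins for your own refimg, Rimage and tform (rigidtform2d assumes R2022b or later). It reads the same data that profile viewer displays and lists the five entries with the largest total time, which include time spent in callees.

```matlab
% Stand-in data (hypothetical); replace with your own refimg, Rimage, tform.
refimg = rand(2000, "single");
Rimage = imref2d(size(refimg));
tform  = rigidtform2d(5, [10 -4]);

profile on
[Imagenew, Rnew] = imwarp(refimg, Rimage, tform, "cubic", FillValues=0);
profile off
% profile viewer               % opens the interactive report

% Same data as the report, accessed from code: top 5 entries by total time.
stats = profile("info");
[~, order] = sort([stats.FunctionTable.TotalTime], "descend");
top = stats.FunctionTable(order(1:min(5, end)));
disp(table(string({top.FunctionName}'), [top.TotalTime]', ...
    'VariableNames', ["Function", "TotalTime_s"]))
```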

Hope this helps.

  2 Comments
Nic Bac on 6 Nov 2024 at 8:47
Hello Umar, thank you for the detailed suggestions. I did try using gpuArray, but unfortunately this option is not supported for the “lwm” case.
Perhaps I wasn’t fully clear about the main issue I’m facing: imwarp uses only 1 core out of the many available, which shouldn’t be the case (at least as far as I understand it), since the MATLAB documentation does say that CPU acceleration is supported.
I initially thought the issue was that the parallel pool was not enabled, but enabling it doesn’t change the fact that imwarp uses only 1 core, and this is independent of the size of the array.
Umar on 7 Nov 2024 at 15:12
Hi @Nic Bac,
Completely understand. In my opinion @Joss Knight provided some good suggestions.



埃博拉酱 on 6 Nov 2024 at 14:35
Edited: 埃博拉酱 on 6 Nov 2024 at 14:44
There are two different levels of parallel acceleration in MATLAB, and it is worth checking whether you are confusing them.
  1. Parallel pools rely on the Parallel Computing Toolbox and must be requested explicitly with constructs such as parfor. Each worker computes independently, and workers cannot directly share variables.
  2. Automatic multithreading does not depend on any toolbox and does not need to be requested. When you run vectorized operations on large arrays, many built-in functions automatically use multiple threads in the main MATLAB process. However, code running on the workers of a parallel pool (option 1) does not benefit from this by default, because each worker typically runs with a single computational thread.
This means that if you choose option 1, you have to split the data manually into independent chunks and distribute them across the workers, each of which uses one CPU core by default. Conversely, if you want to take advantage of the automatic multithreading of option 2, run the computation on the main MATLAB process rather than inside parfor or other pool constructs; MATLAB then multithreads it automatically wherever the function you call supports it.
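A quick way to test option 2 for your own transform is to time imwarp on the main process twice: once with the default number of computational threads and once with maxNumCompThreads(1). If both times are about the same, that imwarp code path is not multithreaded. This is a minimal sketch assuming R2022b or later; the random image and rigid transform are hypothetical stand-ins, so substitute your "lwm" tform and run it outside parfor.

```matlab
% Stand-in data (hypothetical); replace with your own refimg, Rimage, tform.
refimg = rand(2000, "single");
Rimage = imref2d(size(refimg));
tform  = rigidtform2d(5, [10 -4]);

warpOnce = @() imwarp(refimg, Rimage, tform, "cubic", FillValues=0);

nThreads = maxNumCompThreads;          % threads the client may use now
tMulti = timeit(warpOnce);             % median of several runs, in seconds

previous = maxNumCompThreads(1);       % force one thread; returns the old value
tSingle = timeit(warpOnce);
maxNumCompThreads(previous);           % restore the previous setting

fprintf("%d threads: %.3f s | 1 thread: %.3f s | speed-up: %.2fx\n", ...
    nThreads, tMulti, tSingle, tSingle / tMulti);
```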
If you make sure the calculation runs only on the main process and MATLAB still does not use multiple threads, the most likely explanation is that the implementation you are calling is single-threaded. You can check how the LWM algorithm works in the documentation. Given that you have already mentioned that your transform cannot be executed on a GPU, this code path probably has no parallel implementation at all. As far as I know, very few algorithms that can be executed in parallel cannot be accelerated by a GPU. If this is the case, the most direct way to speed things up is to process multiple different images simultaneously on different CPU cores. While each image is still computed on a single core, you can make effective use of a multi-core CPU if you have a large number of images.
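The multi-image approach described above can be sketched with parfor, one image per iteration. This assumes all images share the same Rimage and tform and that they are held in a cell array; the stand-in images, the rigid transform (R2022b or later) and the names imgs, warped and Rout are hypothetical. On a process-based pool each image is copied to the worker that processes it.

```matlab
% Stand-in batch (hypothetical): 8 images on the same spatial grid.
imgs   = arrayfun(@(k) rand(1000, "single"), 1:8, "UniformOutput", false);
Rimage = imref2d([1000 1000]);
tform  = rigidtform2d(5, [10 -4]);

% Each iteration warps one whole image; iterations run concurrently on
% the pool workers, each of which computes its image on one core.
warped = cell(size(imgs));
Rout   = cell(size(imgs));
parfor k = 1:numel(imgs)
    [warped{k}, Rout{k}] = imwarp(imgs{k}, Rimage, tform, "cubic", FillValues=0);
end
```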

Matt J on 5 Nov 2024 at 20:17
20k x 20k is an incredibly high resolution. Do you really need it, and if so, do you really need cubic interpolation, as opposed to computationally simpler linear interpolation? Also, what data type are these images? Are they integer types (uint8, uint16, ...) or floats?
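The interpolation and data-type questions can be answered empirically by timing the four combinations. This is a minimal sketch with a hypothetical random image and a rigid stand-in transform (rigidtform2d assumes R2022b or later); absolute times for an "lwm" transform will differ, so rerun it with your own tform.

```matlab
% Stand-in data (hypothetical); replace with your own image and tform.
A = rand(2000);                                    % double precision
Rimage = imref2d(size(A));
tform  = rigidtform2d(5, [10 -4]);

dataClass    = ["double"; "double"; "single"; "single"];
interpMethod = ["cubic";  "linear"; "cubic";  "linear"];
timeSec = zeros(4, 1);
for k = 1:4
    img = cast(A, char(dataClass(k)));             % convert the test image
    method = interpMethod(k);
    timeSec(k) = timeit(@() imwarp(img, Rimage, tform, method, FillValues=0));
end
results = table(dataClass, interpMethod, timeSec)
```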
  1 Comment
Nic Bac on 6 Nov 2024 at 8:48
Hello Matt J. Eventually I will need to scale to that size and use cubic interpolation; the images are either single or double. The size of the array, though, is not my main problem. The main problem is that imwarp uses only 1 core out of the many available, and this is true even for small arrays.
In my case GPU acceleration is not possible. Nevertheless, the machine does have plenty of cores (72) and RAM (512 GB) to do the job, but so far I have not managed to make imwarp use more cores efficiently.
