Effectiveness of implicit and explicit parallelization of fft2
Hi,
I have many images on which I want to compute FFTs, and I want to speed up this processing with multithreading. I am wondering why explicit parallelization with parfor, as shown in the following example, is not much more efficient than the implicit parallelization built into the MATLAB function fft2:
clear
%% parameters & variable declaration
N_slice = 40 ;
N_img=80;
a = rand(1024,1024,N_slice);
for ii=1:N_img
img{ii}=rand(1024,1024);
out{ii}=0;
end
%% multithreading inside the loop
maxNumCompThreads(4);
% 39 s with 4 threads
% 88 s with 1 thread
% parallelization efficiency of 88/(39*4)=56%
tic
for ii=1:N_img
out{ii}=sum(ifft2(a.*fft2(img{ii})),'all');
end
toc
%% parfor version
% 33 s with parpool(4)
% parallelization efficiency of 88/(32*4)=68%
maxNumCompThreads(1); %avoid the relative inefficiency of fft2 parallelisation
pool = parpool('Processes',4,'SpmdEnabled',false);
b=parallel.pool.Constant(a);
tic
parfor ii=1:N_img
% out{ii}=sum(ifft2(b.Value.*fft2(img{ii})),'all');
out{ii} = fun(b.Value,img{ii});
end
toc
function out=fun(x,y)
% Just in case: I read somewhere that parpool might be faster when
% calling functions rather than script code. It changes nothing here.
out = sum(ifft2(x.*fft2(y)),'all');
end
fft2 supports multithreading by default, and on my PC with 4 cores I get a 56% multithreading efficiency (defined as T1/(T2*N), with T1 the time to do a task with one thread and T2 the time with N threads).
I expected a higher efficiency from creating a pool of 4 workers and having them process my images in parallel. Because there is so little data transfer and so much computation time, I expected an efficiency close to one. But I only get around 68%, as you can see in my example. Why is that?
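For reference, the efficiency figures can be reproduced from the timings quoted in the code comments. This is only a small sketch; the names `efficiency`, `eff_implicit` and `eff_parfor` are made up for illustration, and since the parfor comments mention both 33 s and 32 s, both values are evaluated.

```matlab
% Efficiency as defined in the question: T1 / (TN * N).
efficiency = @(T1, TN, N) T1 ./ (TN .* N);

T1 = 88;                                     % s, one thread (from the code comments)
eff_implicit = efficiency(T1, 39, 4);        % fft2 with 4 computational threads
eff_parfor   = efficiency(T1, [32 33], 4);   % parfor with 4 workers (32 s and 33 s are both quoted)

fprintf('implicit: %.1f%%\n', 100 * eff_implicit);
fprintf('parfor:   %.1f%% to %.1f%%\n', 100 * min(eff_parfor), 100 * max(eff_parfor));
```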
Matthieu
Accepted Answer
More Answers (1)
Pratyush
on 18 Mar 2024
Hi Matt,
The gap between the roughly 68% efficiency you measure with "parfor" and the near-perfect efficiency you expected can be attributed to several factors:
- Setting up parallel pools, transferring data, and scheduling tasks introduce overhead that reduces overall efficiency.
- MATLAB's "fft2" is already optimized for multithreading, making explicit parallelization less beneficial.
- By Amdahl's Law, the speedup from adding workers plateaus because of the non-parallelizable portion of the task; overhead that grows with the number of workers compounds this.
- Competition for CPU time and memory bandwidth among workers and the main MATLAB process can hinder performance.
- While "parfor" is efficient for many tasks, its overhead can impact tasks that are already optimized, like FFT operations.
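To put a rough number on the Amdahl's Law point above, let $p$ be the parallelizable fraction of the work and $N$ the number of workers (symbols introduced here for illustration only). Plugging in the reported 68% efficiency on 4 workers gives:

```latex
S(N) = \frac{1}{(1-p) + \dfrac{p}{N}}, \qquad E(N) = \frac{S(N)}{N}

E(4) = 0.68 \;\Rightarrow\; S(4) = 4 \times 0.68 = 2.72

(1-p) + \frac{p}{4} = \frac{1}{2.72} \approx 0.368
\;\Rightarrow\; p \approx \frac{1 - 0.368}{0.75} \approx 0.84
```

Read this way, the measured efficiency would correspond to roughly 16% of the work behaving as if it were serial. This is only an illustrative fit: Amdahl's model does not distinguish a truly serial portion from effects such as memory-bandwidth contention between workers.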
In short, near-linear scaling is hard to achieve in parallel computing, especially for operations that are already optimized. The 68% efficiency you observe is reasonably good given these factors. To improve it, you can try minimizing data transfer, experimenting with the number of workers, and checking your hardware and MATLAB configuration.
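One way to run such an experiment is to time the same kernel at several thread counts and see where the efficiency drops. The sketch below is not a tuned benchmark: the array sizes are reduced so it runs quickly, and the names `kernel`, `threads` and `efficiency` are hypothetical.

```matlab
% Time the question's kernel at several thread counts (reduced problem size).
a   = rand(512, 512, 8);
img = rand(512, 512);
kernel = @() sum(ifft2(a .* fft2(img)), 'all');

threads = [1 2 4];
t = zeros(size(threads));
oldN = maxNumCompThreads;             % remember the current setting
for k = 1:numel(threads)
    maxNumCompThreads(threads(k));    % limit the computational threads
    t(k) = timeit(kernel);            % timeit calls the kernel several times
end
maxNumCompThreads(oldN);              % restore the original setting

% Efficiency relative to the single-thread time: T1 / (TN * N).
efficiency = t(1) ./ (t .* threads);
disp(table(threads(:), t(:), efficiency(:), ...
    'VariableNames', {'Threads', 'Seconds', 'Efficiency'}));
```

If the efficiency falls quickly as threads are added even at this small size, that would point toward memory bandwidth rather than pool overhead, since no parallel pool is involved here.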