Parfor reports error which does not exist when running as a for-loop

Hi,
To speed up some calculations I am using a parfor-loop. I have to run calculations on many files and I made a simple parfor-loop which runs a function on all these files. When analysis of one file is finished, the results are saved on disk. So, in principle, there is no communication between the different workers.
I have 12 workers (local) and for each worker the first run goes without problems. Then however I always get an error message like this (where this happens exactly can vary, but the type of message is always the same):
Error using parallel_function (line 598)
In an assignment A(:) = B, the number of elements in A and B
must be the same.
Error stack:
myfunc.m at 162
func>(parfor body) at 45
Error in func (line 14)
parfor ii=151:303
When I run the code in a for-loop, there is no error-message.
I have tried several things, but did not find a solution. The problem is that I can't debug this error, because it does not happen when I don't use parfor.
The only thing that works is to reduce the number of workers. When I choose 6 workers, the error doesn't show up.
My temporary solution was to start 2 Matlab sessions, give them each a pool of 6 workers and divide the work manually between the 2 Matlab sessions.
This solution however does not work. In the 2nd Matlab session, the old error appears again after a short while. I really don't understand what the problem is...

10 Comments

This is going to be difficult for us to debug without knowing more about the code.
Well, I don't think it's the code, since everything works fine without parfor and with parfor having 6 workers. Also debugging doesn't seem to work, because there are no errors normally.
If it would be a problem with the code, what am I looking for?
Just in case, this is the code where the error occurs:
for ii=2:nlevels
    %now make unique numbers in each stack
    prevmax=max(max(max(C{ii-1})));
    C{ii}(C{ii}>0)=C{ii}(C{ii}>0)+prevmax;
end
C is a cell array containing 20 three-dimensional arrays.
Is the above for-loop the thing you're trying to convert to a parfor loop? It doesn't look possible. C can't be classified as any of the legal kinds of variables that parfor allows
No no, that is not the problem. As I explained in the original post, I am running one function on a large series of images. Results are saved to separate files on disk during each iteration of the loop.
The images are all independent of each other (at least for what the function does). So I run that complete function in a parfor loop and the only input that changes in this loop is the filename. There is no information transfer between the iterations of the parfor-loop.
Simplified it looks like this:
parfor ii=1:2000
    currfile=[filepattern num2str(ii) '.tif'];
    runanalysis(currfile);
end
And as I asked in a comment to your answer that you deleted, could it have to do with the large size of the cell arrays I use? Especially since the error does not happen anymore when I reduce the number of workers from 12 to 6.
C is for example 20x1 cell array containing 20 times a 700x700x100 uint16 array
I don't think it would be due to the large size (not the error message you've shown at any rate). Walter's suggestion would be the prime candidate in my mind. Somehow, some of the C{ii} (and hence prevmax) ends up empty, contrary to what you intend. You need to verify the dimensions of the C{ii}.
Apparently I am not very good at explaining what is going wrong.
I have 2000 images.
On each of them I want to run the same function. There is no transfer of information from one iteration to the next.
This error happens somewhere inside this function (which is very long, but contains the piece of code I showed several posts up).
Since the only thing that changes every iteration is the filename, there is no difference between using a for-loop or a parfor-loop, except that in the case of the parfor-loop, several files are analyzed (independently) at the same time.
The error never happens when I run the function (which, again, has only the filename as input) in the for-loop. Therefore I strongly believe that the error has something to do with how MATLAB deals with running parallel computations, especially since changing the number of 'labs' from 12 to 6 stops the error from happening. It can't have anything to do with this C{ii}. Prevmax does not end up empty, because C{1} to C{nlevels} always exist.
"Therefore I strongly believe that the error has something to do with how matlab deals with running parallel computations... It can't have anything to do with this C{ii}."
It's still conceivable that both of the above are true simultaneously, i.e., a difference between parallel and serial modes of computation is causing the C{ii} to be read in corrupted in some cases.
We have to start by examining the C{ii} because we have nowhere else to start, and because ample evidence you provided points to it. The error message you posted says there is a dimension mismatch error. Furthermore, you insisted that this error is occurring in the line
C{ii}(C{ii}>0)=C{ii}(C{ii}>0)+prevmax;
That has to mean that prevmax is for some reason either empty or non-scalar some of the time. We must seek ways to trap that condition.
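One way to trap that condition is a small guard called just before the renumbering loop. This is only a sketch: `check_stack` is a hypothetical helper, not part of the original code, and it assumes that every level of C is supposed to be non-empty and the same size as C{1} (as the 700x700x100 description suggests).

```matlab
function check_stack(C, currfile)
% CHECK_STACK  Hypothetical guard: fail early with a descriptive error
% if any level of the cell array C is empty or differs in size from C{1}.
% currfile is only used to label the error message.
refsize = size(C{1});
for ii = 1:numel(C)
    if isempty(C{ii})
        error('check_stack:emptyLevel', ...
              'File %s: C{%d} is empty.', currfile, ii);
    end
    % Assumption: all levels are meant to share the size of C{1}.
    if ~isequal(size(C{ii}), refsize)
        error('check_stack:sizeMismatch', ...
              'File %s: C{%d} has size [%s], expected [%s].', ...
              currfile, ii, num2str(size(C{ii})), num2str(refsize));
    end
end
end
```

Because the error message names the file and the level, a failure on a worker would at least say which image to re-run serially.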
Thanks for all your suggestions. In the end I decided to put everything in a try - catch - continue sequence, while storing the IDs of the failed files.
In this way an error does not stop the parfor loop and when it's finished I can just restart it on the failed files. Normally it runs perfectly fine the second time on these failed files.
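For anyone finding this later, a minimal sketch of what such a wrapper could look like. The names `run_all_files`, `analysisFcn` and `ids` are made up for illustration; in the case above `analysisFcn` would be `@runanalysis` and `ids` would be `1:2000`. Failures are recorded in sliced variables, which parfor allows.

```matlab
function [failedIDs, messages] = run_all_files(analysisFcn, filepattern, ids)
% RUN_ALL_FILES  Hypothetical wrapper: run analysisFcn on every file and
% collect the IDs (and error messages) of the files that failed, instead
% of letting one error abort the whole parfor loop.
n = numel(ids);
failed = false(1, n);      % sliced output: one flag per iteration
messages = cell(1, n);     % sliced output: one message per iteration
parfor k = 1:n
    currfile = [filepattern num2str(ids(k)) '.tif'];
    try
        % feval avoids parfor treating the handle call as indexing
        feval(analysisFcn, currfile);
    catch err
        failed(k) = true;
        messages{k} = err.message;
    end
end
failedIDs = ids(failed);
messages = messages(failed);
end
```

A second pass could then be started with `run_all_files(@runanalysis, filepattern, failedIDs)`. The stored messages also give a record of where each failure occurred.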


Answers (2)

You would get that problem if C{ii-1} was empty, leading to prevmax being empty.
Remember, when you have a parfor loop, the iteration for any particular value (e.g., #9) might be done at any time relative to the iteration for the previous value (#8 in this example), so the assignment to C{8}(C{8}>0) might not have been performed before iteration #9, which calls upon C{8}. Indeed, parfor usually starts from the end. This differs from a regular for loop.

3 Comments

Sorry, but this for-loop is not the one that I changed into parfor, and I never claimed that I did that.
Put in a try/catch that reports the size of prevmax when the problem is triggered
In parallel mode, you'll probably need to do
disp(prevmax)
to report prevmax.
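A rough sketch of that idea applied to the loop posted above. The function name `make_labels_unique` is made up here; the loop body is the one from the question, with `max(C{ii-1}(:))` used as an equivalent of the triple max.

```matlab
function C = make_labels_unique(C)
% MAKE_LABELS_UNIQUE  Hypothetical wrapper around the posted loop that
% reports the size of prevmax when the assignment fails, then rethrows.
nlevels = numel(C);
for ii = 2:nlevels
    % same value as max(max(max(C{ii-1}))) for a 3-D array
    prevmax = max(C{ii-1}(:));
    try
        mask = C{ii} > 0;
        C{ii}(mask) = C{ii}(mask) + prevmax;
    catch err
        fprintf('level %d: size(prevmax) = [%s], size(C{ii}) = [%s]\n', ...
                ii, num2str(size(prevmax)), num2str(size(C{ii})));
        disp(prevmax)
        rethrow(err);   % keep the original error and stack
    end
end
end
```

The rethrow keeps the original behaviour (the loop still stops), so this only adds diagnostics.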


You might also consider using PMODE to troubleshoot. This will allow you to step through different commands and see their results in the parallel command window.

Asked: on 25 Aug 2013

Commented: on 27 Jul 2023