Avoid sending a large array to all workers in a parfor loop

Hello --
I'm processing some fairly large point clouds which are stored as 8-column tables with billions of rows (stored as datastores/tall arrays, but that's probably not important here).
In my workflow I load a reasonably sized chunk of the point cloud (~3.5 GB) into memory and ultimately generate an image from the data. The image is 37x44 pixels, which means that I'm indexing into this point cloud 37 times row-wise and 44 times column-wise. This is a very parallelizable task, so I'm running the outer loop with parfor. However, I frequently get errors because workers abort. The workers seem to abort when memory use hits my limit (32 GB).
I think my problem is obvious from the code below, but I'm not sure of the best way to fix it. Note where pcl_line is subset from pcl. I'm assuming that pcl (my 3.5 GB variable) is still being sent to every worker, which seems bad. How can I avoid this? Is this a job for C = parallel.pool.Constant(pcl)? That seems promising, but my knowledge here is a bit shaky. If not, any other thoughts? -- Thanks much, Mike
%set up holders for image outputs
tot_skew = NaN(n_num_px, e_num_px); %just two example output images of many
tot_kurt = NaN(n_num_px, e_num_px);
parfor ii = 1:length(n_chunk_bounds)-1
    %temporary vars by row
    temp_skew = NaN(1,e_num_px);
    temp_kurt = NaN(1,e_num_px);
    %slice pcl by line (1/37th the size of pcl, since there are 37 rows in the output image)
    sub_idx_n = find(pcl.n<n_chunk_bounds(ii) & pcl.n>=n_chunk_bounds(ii+1));
    pcl_line = pcl(sub_idx_n,:); %<--guessing this is the problem since pcl is still inside the parfor loop?
    %run code for each output pixel for a given image line
    for j = 1:length(e_chunk_bounds)-1
        sub_idx = find(pcl_line.e>=e_chunk_bounds(j) & pcl_line.e<e_chunk_bounds(j+1));
        temp_skew(j) = skewness(pcl_line.h(sub_idx));
        temp_kurt(j) = kurtosis(pcl_line.h(sub_idx));
    end
    %final assignment
    tot_skew(ii,:) = temp_skew;
    tot_kurt(ii,:) = temp_kurt;
end
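Some quick arithmetic shows why the workers can run out of memory. Because pcl is indexed with sub_idx_n rather than with the loop variable ii, parfor cannot slice it: pcl is a broadcast variable, so a full copy goes to every worker. The sketch below assumes a pool of 8 workers (the post does not state the pool size) and ignores pcl_line and other temporaries, so it is only a lower bound.

```matlab
% Rough lower bound on memory used by copies of pcl during the parfor loop.
% A broadcast variable is copied in full to every worker, and the client
% keeps its own copy as well.
pclGB    = 3.5;   % in-memory size of pcl (from the post)
limitGB  = 32;    % machine memory limit (from the post)
nWorkers = 8;     % ASSUMED pool size; check with p = gcp('nocreate'); p.NumWorkers

broadcastGB = pclGB * (nWorkers + 1);   % client copy + one copy per worker
fprintf('Copies of pcl alone: %.1f GB of %d GB available\n', broadcastGB, limitGB);
```

Wrapping pcl in parallel.pool.Constant would not change this figure for a single loop. A Constant is transferred to each worker once and can be reused across several parfor loops, but each worker still holds its own full copy.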
Walter Roberson on 27 Jan 2019
You are currently breaking up the area based on the values stored in the vector n_chunk_bounds, indexed at ii. MATLAB does not look back and analyze how those bounds were created, but you can. For example, it could hypothetically be the case that n_chunk_bounds(ii) = some_minimum + some_integer_stride * ii, a linear equation. If so, then even though computing n_chunk_bounds ahead of time would seem more efficient, MATLAB would find it easier to analyze which portions of pcl are needed if n_chunk_bounds(ii) were replaced with some_minimum + some_integer_stride * ii inside the parfor: it could then determine that it should send a block of width some_integer_stride to each worker, given the proper formulation.
This can only work if the chunks to be extracted are a consistent size.
Otherwise, you should break them up ahead of time into cell arrays, as parfor knows to send only the memory associated with the contents of the indexed cell to the worker.
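For the consistent-size case, one way to let parfor slice the data is sketched below. It swaps the stride arithmetic for a reshape and assumes the points have already been sorted by image row, with every row holding the same hypothetical count nPerRow. Each row's points become one column of a matrix H, so the loop variable indexes a column directly; under MATLAB's sliced-variable rules H(:, ii) is then sliced, and each worker receives only the columns for its own iterations. mean stands in for skewness/kurtosis so the example needs no extra toolbox.

```matlab
% Hypothetical example: 37 image rows, each with exactly nPerRow points.
rng(0);                               % reproducible synthetic data
nRows   = 37;
nPerRow = 1000;                       % ASSUMED equal chunk size per image row
h       = randn(nPerRow * nRows, 1);  % stand-in for pcl.h, already sorted by row

% Column ii holds the points that belong to image row ii.
H = reshape(h, nPerRow, nRows);

rowMean = NaN(1, nRows);
parfor ii = 1:nRows
    % H(:, ii) is a sliced variable: only column ii is sent for iteration ii.
    rowMean(ii) = mean(H(:, ii));
end
```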
Michael Alonzo on 27 Jan 2019
OK, I get that and think it's likely possible. For the moment, I went ahead and tried the cell suggestion, and it seems to be working well. I'd consider this answered. Thanks for the ideas.
Mike


Accepted Answer

Walter Roberson on 27 Jan 2019
[...] extract the chunks outside the parfor loop, into a cell array, and index the cell array inside the loop.
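A minimal sketch of that approach on synthetic data follows. The table layout (columns n, e, h), the point count, and the bounds are assumptions chosen to mimic the 37x44 image in the question, and pcl_lines is a hypothetical name. The rows are split into a cell array before the loop; inside parfor, pcl_lines{ii} is a sliced variable, so each worker receives only the rows for its own image lines rather than all of pcl. mean stands in for skewness/kurtosis so the example needs no extra toolbox.

```matlab
% Synthetic stand-in for pcl: a table with northing n, easting e, height h.
rng(0);
nPts = 1e5;
pcl = table(rand(nPts,1)*100, rand(nPts,1)*100, randn(nPts,1), ...
    'VariableNames', {'n','e','h'});

% Hypothetical bounds: descending northing for rows, ascending easting for columns.
n_chunk_bounds = linspace(100, 0, 38);   % 37 image rows
e_chunk_bounds = linspace(0, 100, 45);   % 44 image columns
n_num_px = numel(n_chunk_bounds) - 1;
e_num_px = numel(e_chunk_bounds) - 1;

% 1) Split pcl into one cell per image row OUTSIDE the parfor loop.
pcl_lines = cell(n_num_px, 1);
for ii = 1:n_num_px
    in_line = pcl.n < n_chunk_bounds(ii) & pcl.n >= n_chunk_bounds(ii+1);
    pcl_lines{ii} = pcl(in_line, {'e','h'});  % keep only the columns the loop needs
end
clear pcl   % optional: release the full table on the client

% 2) Inside parfor, pcl_lines{ii} is sliced: each worker gets only its rows.
tot_mean = NaN(n_num_px, e_num_px);
parfor ii = 1:n_num_px
    pcl_line = pcl_lines{ii};
    temp = NaN(1, e_num_px);
    for j = 1:e_num_px
        in_px = pcl_line.e >= e_chunk_bounds(j) & pcl_line.e < e_chunk_bounds(j+1);
        temp(j) = mean(pcl_line.h(in_px));    % swap in skewness/kurtosis here
    end
    tot_mean(ii, :) = temp;
end
```

The client still holds every cell, which together are about the size of the original table; that is why the sketch clears pcl after splitting.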



Release

R2018a
