I have some code, part of which a structure of the form below. for a = 1:A for b = 1:B for c = 1:C for d = 1:D for j = 1:2 cellArray = manipulate(cellArray,a,b,c,d,j,params); end end end end end A,B,C and D are constant. cellArray is a large cell array, each entry of which is a large, multidimensional matrix. The function 'manipulate' needs to know the loop parameters, a, b, c, d and j, as well as some other fixed parameters, 'params', and modifies only cellArray, simply by adding numbers to elements of its constituent matrices. The loop parameters, a, b, c, d and j specify which subset of the data is used and which subset is to have numbers added to it, with j having a slightly different status to a, b, c and d in that it partitions the cells into two distinct subsets, one of which is read and the other modified for j=1 and vice versa for j=2. I've squeezed every last bit of performance that I can out of my code, and now I want to parallelize it. My question is, how can I distribute access to cellArray and the other parameters across parallel workers and run 'manipulate' in parallel? I have a mental picture of cellArray as a large pool of data, with the parallel workers as little robots buzzing around the pool, reading what they need and then going in and adding something to the appropriate bit of the pool. I feel like this ought to be possible...or am I just crazy??

Struggling to distribute a cell array across several parallel workers

Adam on 4 Sep 2014

As a start you would need your expression to be accessing specific parts of cellArray that are independent rather than the entire cell array.

As it is right now you would have to pass the whole cell array to each worker, but even then they would potentially be writing over the top of each other even if you were allowed to do it.

for example, something like

for a = 1:A
  ...
     parfor j = 1:2
         cellArray(j,:) = manipulate( cellArray(j,:), a, b, c, d, j, params);
      end
   ...
end

can work because the parallel workers each access distinct parts of the cell array.

Obviously in this case with only 2 times round the for loop a parfor at that level would not make much sense though.

John Billingham on 4 Sep 2014

Open in MATLAB Online

Unfortunately the structure of the array and the way the parts of it are accessed depends on a,b,c,d and j in a way that makes it impossible to do this, even in principle.

How about if I were to change the function 'manipulate' so that its output was, say, dcellArray, instead, so that the structure became

for a = 1:A
     .....
     parfor j = 1:2
       dcellArray(j) = manipulate(cellarray,a,b,c,d,j)
     end
     for j = 1:2
        cellArray(j) = cellArray(j)+dcellArray(j);
      end
     ......

end

or some variant on this. Is that possible? It's really not what I'm after, but it would be something.

Adam on 4 Sep 2014

That would at least allow you to call a parfor loop, but whether it is helpful or not I'm not sure.

Have you read the following Matlab Help page?:

http://www.mathworks.co.uk/help/distcomp/programming-considerations.html

It may help you to understand exactly what you can and can't do with a parfor loop to restructure your algorithm and loops. In particular, slicing variables is of importance in your case.

Guillaume on 4 Sep 2014

At first glance, your algorithm does not appear to be parallelisable since you're modifying the cell array that is going to be used on the next iteration. Usually, if you write code like this, it means that the order of operations matters.

If it truly does not, then I don't see why you can't split your manipulate function in two. One bit calculate the destination (different for each iteration), one bit calculate the source.

Matt J on 4 Sep 2014

Edited: Matt J on 4 Sep 2014

I agree with the others. So far, there's no obvious parallel structure in your code. One remark though, is that there is no apparent logic to having your data split into multi-dimensional arrays in different cells. If all you are doing is incrementing individual elements of the arrays, you could just as well concatenate all the data into one long vector. Furthermore, you could be using linear indexing instead of the complicated multi-dimensional subscripting (a,b,c,d,j). This should make data access faster.

It also seems to me that at least a part of what manipulate() is doing is just an accumarray operation. Even if you can't parallelize the additive incrementing of your array data, you might still be able to use parfor to pre-assemble input data (subs,val) for a call to accumarray. Without a clearer picture of what's going on inside manipulate(), though, this is just speculation.

John Billingham on 4 Sep 2014

I'm starting to realize that my code, and I think the problem I'm solving, indeed doesn't have an obvious parallel structure. I have considered using linear indexing, and if you think that this will make my code faster I should give it a try.

I've written a MATLAB code to solve two player, postflop, holdem poker. The indices a,b,c, and d are the flop holdings and turn and river cards. The index j is over the two players. The function 'manipulate' does counterfactual regret minimisation on the players' strategies. It really does flicker about across the data structure in a way that isn't obviously parallelizable......although the literature suggests that it is. Looks like I have some more reading to do. Thanks for your help everyone.

John Billingham on 10 Sep 2014

Final update: I've modified the algorithm that I'm using, and now have no problem parallelising it. Thanks again.

Struggling to distribute a cell array across several parallel workers

7 Comments
Show 5 older comments Hide 5 older comments

Answers (0)

Categories

Products

Tags

Community Treasure Hunt

Struggling to distribute a cell array across several parallel workers

7 Comments Show 5 older comments Hide 5 older comments

Answers (0)

Categories

Products

Tags

See Also

Community Treasure Hunt

7 Comments
Show 5 older comments Hide 5 older comments