Parpool slow with chol operation

Question

0 votes

Hi, I found a bottleneck in my code and I can't understand what is happening. I tried this on several computers and MATLAB versions (2014 and 2016), and I found more or less the same pattern.

Suppose you have a large matrix operation, like a Cholesky decomposition of a large matrix. If I parallelize across several cores (4, 8, 16, regardless of the number), each single operation runs much slower. Example: create these objects

z=randn(5200); Z=z*z'; zm=randn(1,5200);

Open your parpool and run a Cholesky decomposition in a parfor; you get, for instance, 3 seconds.

tic; parfor j=1:16; chol(Z,'lower'); end; toc

Now if you do the same in a standard sequential loop, you get the same result in a similar time (or even a bit less)!

tic; for j=1:16; chol(Z,'lower'); end; toc

Why does this happen? What can I do to parallelize this code (my package does a lot more than just a Cholesky decomposition) without penalizing the performance of a single task so much?

Many thanks in advance.

2 Comments

José-Luis on 18 Aug 2016

Edited: José-Luis on 18 Aug 2016

I am not sure I follow. How do you think you are parallelizing the decomposition? To me it looks like each parfor iteration computes the full decomposition: you are just doing the same operation multiple times instead of dividing one operation between multiple threads.

I don't think you can achieve what you want with parfor in this case.

Francesco C on 18 Aug 2016

Yes José, I have to perform this operation multiple times; that's exactly what I need to do. The point is, if there are 16 processes working in parallel, I can do 100 different Cholesky decompositions much faster than doing them in sequence. The problem is that a single operation takes much more time when I use the parallel pool!


Answer 1

Edric Ellis on 23 Aug 2016

1 vote

MATLAB's chol implementation is intrinsically multi-threaded. Therefore, chol is already fully utilising all the cores on your machine. If you have only local workers available, then you can't do any better than that. (Also note that Parallel Computing Toolbox workers run in a single-threaded mode by default to avoid oversubscribing the cores on your machine - which is why a single invocation of chol is slower inside parfor).

The only way to go faster is to use additional hardware - in the form of MATLAB Distributed Computing Server workers on additional machines.
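A minimal sketch of the effect described above, runnable on the client without a pool: it times chol with the default thread limit and again with computation restricted to one thread via maxNumCompThreads. The matrix size (2000) and the diagonal shift are illustrative choices, not taken from the question, and the size of any speed-up depends on the machine.

```matlab
% Compare multi-threaded and single-threaded chol on the client.
n = 2000;                    % illustrative size, smaller than the 5200 in the question
z = randn(n);
Z = z*z' + n*eye(n);         % diagonal shift keeps Z safely positive definite

% Default setting: chol may use all available computational threads.
tMulti = timeit(@() chol(Z, 'lower'));

% Restrict to one thread; the call returns the previous limit.
oldN = maxNumCompThreads(1);
tSingle = timeit(@() chol(Z, 'lower'));
maxNumCompThreads(oldN);     % restore the original limit

fprintf('%d thread(s): %.3f s, 1 thread: %.3f s\n', oldN, tMulti, tSingle);
```

If the single-threaded time is noticeably longer, that is roughly the per-call penalty each chol pays on a worker that runs in single-threaded mode.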

1 Comment

Francesco C on 25 Aug 2016

Yes, Edric, this was also my guess (see my comment below on Matt J's answer), and your detailed reasoning clears up the picture once and for all. Many thanks!

Answer 2

Matt J on 18 Aug 2016

Edited: Matt J on 18 Aug 2016

0 votes

The problem is that your Z-matrices need to be cloned and broadcast to each of the workers, which carries considerable overhead given their 5200x5200 size. If you create the Z matrices on the workers themselves, you get a more favorable comparison, e.g.,

>> tic; for j=1:16;  z=randn(5200); Z=z*z'; chol(Z,'lower');  end; toc
Elapsed time is 29.711892 seconds.
>> tic; parfor j=1:16;  z=randn(5200); Z=z*z'; chol(Z,'lower');  end; toc
Elapsed time is 18.471961 seconds.
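If the same Z has to be reused across several parfor loops, one way to pay the transfer cost only once is parallel.pool.Constant (introduced in R2015b, so not available in the 2014 release mentioned in the question). A sketch, assuming Parallel Computing Toolbox is installed and a pool is open or can be created automatically; the size and the diagonal shift are illustrative. This only addresses the data transfer; it does not change how fast chol runs on each worker.

```matlab
% Send Z to the workers once and reuse it inside parfor.
n = 2000;                          % illustrative size
z = randn(n);
Z = z*z' + n*eye(n);               % diagonal shift keeps Z positive definite

C = parallel.pool.Constant(Z);     % one copy of Z per worker

ok = false(1, 16);
parfor j = 1:16
    L = chol(C.Value, 'lower');    % each worker reads its local copy
    ok(j) = istril(L);             % record that a lower-triangular factor came back
end
```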

1 Comment

Francesco C on 18 Aug 2016

Hi Matt, thank you for your answer. Yes, in my original code I create "Z" on each worker separately, as you indicate, so that the communication overhead does not involve large objects. But even in this case, you can see that you are not getting substantial benefits from massive parallelization: a single chol execution is slower. So if you use a pool of 16, or even just 4, you don't run in 1/16 or 1/4 of the time. I suspect that chol, which is a built-in function, is somehow optimized to exploit multiple cores (hyper-threading?), and that these efficiencies are not in place within parfor. But it is just a suspicion.
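One way to check this suspicion directly is to compare the computational thread limit on the client with the limit on each worker. A sketch, assuming Parallel Computing Toolbox and that gcp is allowed to start the default pool; going by Edric Ellis's answer, the workers would be expected to report 1 by default.

```matlab
% Compare the thread limit on the client with the limit on each worker.
pool = gcp();                                    % current pool (may start the default one)
clientThreads = maxNumCompThreads;               % limit on the client session

f = parfevalOnAll(pool, @maxNumCompThreads, 1);  % run on every worker, one output each
workerThreads = fetchOutputs(f);                 % one value per worker

fprintf('client: %d thread(s), workers: %s\n', ...
    clientThreads, mat2str(workerThreads(:)'));
```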
