parfor and ticBytes/tocBytes

Ben Ward on 27 Oct 2017
Edited: Ben Ward on 8 Nov 2017
I am trying to use parfor on a 40-core machine. At the moment I see no improvement beyond using 16 cores in the loop, and I am wondering if this is related to communication overhead.
I have used ticBytes/tocBytes to establish that the data transferred per core is going down with the number of CPUs, but the total transferred is going up.
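For readers unfamiliar with the measurement described above, here is a minimal sketch of how ticBytes/tocBytes can report both per-worker and total transfer for a parfor loop. The matrix size, loop length, and variable names are illustrative assumptions, not the original poster's code.

```matlab
% Minimal sketch, assuming Parallel Computing Toolbox is installed.
% Sizes and names below are hypothetical, chosen only for illustration.
pool = gcp;                      % current pool (started if none exists and auto-create is on)
n = 200;
A = rand(500);                   % broadcast variable: the whole matrix goes to every worker
result = zeros(1, n);

ticBytes(pool);                  % start counting bytes transferred
parfor i = 1:n
    result(i) = sum(A(:, mod(i, 500) + 1));
end
bytes = tocBytes(pool);          % numWorkers-by-2: [sent to worker, received from worker]

totalBytes = sum(bytes, 1);      % totals over all workers, one value per direction
meanPerWorker = mean(bytes, 1);  % average per worker, one value per direction
```

Running this with different pool sizes gives the two statistics being compared: the rows of `bytes` (per worker) and `totalBytes` (all workers).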
My question is, which of these statistics is most relevant to performance? Or, to oversimplify, is most of the message-passing effort done in serial or in parallel?
Thanks, Ben

Answers (1)

Ankitha Kollegal Arjun on 2 Nov 2017
Hi Ben,
I understand you want to know why your program that uses 'parfor' stops speeding up as the number of workers increases.
There are a lot of factors that need to be taken into account when measuring the performance of parallel programs. Here are some troubleshooting steps which can be used to determine the bottlenecks and improve the performance:
1. Check the CPU utilization of the client MATLAB. Since each task is independent when using parfor, the client should typically use close to 0% of CPU cycles and the workers close to 100%. If the client MATLAB shows a lot of CPU utilization during parfor execution, it is quite possible that data is being exchanged continuously between the client and the worker MATLAB sessions. Such constant communication is overhead and should be avoided.
2. Profile the parallel code using the serial profiler, in order to obtain the profiling information on the client as well as the workers.
In general, the following suggestions can be adopted in order to improve the performance of a program that uses "parfor":
1. For a variable created outside "parfor", make a local copy inside "parfor" and use that copy in subsequent calculations. For smaller variables, this prevents continuous communication between the client MATLAB and the workers, thereby avoiding the communication overhead.
2. If the data is too large, it is better to save the data in a MAT file and load the MAT file directly on the workers by specifying the complete path to the MAT file. This prevents the overhead of a large data transfer between the client and the workers.
3. It is recommended not to place "parfor" inside another loop (for example, a for loop).
4. The following documentation provides some tips on improving parfor performance:
Hope this helps.
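Suggestions 1 and 2 above can be sketched roughly as follows. The file name, variable names, and sizes are hypothetical, and the sketch assumes the workers can see the same file path as the client (true for a local pool).

```matlab
% Minimal sketch; dataFile, bigData and scale are hypothetical names.
dataFile = fullfile(tempdir, 'parfor_demo_data.mat');   % complete path to the MAT file
bigData = rand(1000, 50);
save(dataFile, 'bigData');

scale = 2;                       % small variable created outside parfor
n = 50;
out = zeros(1, n);

parfor i = 1:n
    localScale = scale;                  % suggestion 1: local copy used inside the loop
    s = load(dataFile, 'bigData');       % suggestion 2: load on the worker (load needs an output inside parfor)
    out(i) = localScale * sum(s.bigData(:, i));
end
```

Note that this sketch reloads the file on every iteration, which keeps the example short but has its own cost; loading once per worker is usually preferable when the loop is long.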
  1 Comment
Ben Ward on 8 Nov 2017
Thanks for your help. I've looked at the client vs worker CPU use, and it seems reasonable (given that I have to do a fair amount of work on the client).
I am also using parallel.pool.Constant to get my fixed variables onto the workers.
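For context, a minimal sketch of how parallel.pool.Constant is typically used to place fixed data on the workers once. The data and names here are hypothetical and are not taken from the poster's program.

```matlab
% Minimal sketch, assuming Parallel Computing Toolbox; lookup is hypothetical fixed data.
lookup = rand(1, 1000);
C = parallel.pool.Constant(lookup);   % copied to each worker once, reused across parfor loops

n = 100;
out = zeros(1, n);
parfor i = 1:n
    out(i) = 2 * C.Value(i);          % workers read their own copy through C.Value
end
```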
My question really was whether the data passed to each worker, or the total data passed to all workers, was more relevant to performance.
That is, if you double the number of workers and the data transferred to and from each worker goes down by 10%, are you better or worse off (bearing in mind that the total amount of data transferred has nearly doubled)?
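The arithmetic behind that scenario can be written out as a small sketch. The worker count and byte figure are made-up placeholders; only the doubling and the 10% reduction come from the question.

```matlab
% Hypothetical starting point; only the ratios matter.
workersBefore = 16;
bytesPerWorkerBefore = 100e6;

workersAfter = 2 * workersBefore;                   % double the workers
bytesPerWorkerAfter = 0.9 * bytesPerWorkerBefore;   % 10% less data per worker

totalBefore = workersBefore * bytesPerWorkerBefore;
totalAfter = workersAfter * bytesPerWorkerAfter;
ratio = totalAfter / totalBefore;                   % 2 * 0.9 = 1.8

fprintf('Total transfer grows by a factor of %.2f\n', ratio);
```

So the total grows by a factor of about 1.8 while each worker handles 10% less. Which of the two dominates run time depends on whether the transfers are handled mostly one after another or concurrently, which is the question being asked.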
I guess that the answer is probably problem-specific, but I thought I'd ask anyway.
Thanks again for your response. It was helpful.
Ben
