How to read files sequentially in a parfor loop?

I am running a parfor loop to analyze experiment data files on a local cluster with Intel Xeon quad-core CPUs (8 dual-CPU computers, 64 physical cores in total).
Each cluster node runs Windows Server 2008 R2 Datacenter and MATLAB R2013a with Parallel Computing Toolbox and MATLAB Distributed Computing Server.
The files were written to the hard drive sequentially as wfm1.bin, wfm2.bin, wfm3.bin, ..., up to wfm42000.bin.
Each file is 7.00 MB, and analyzing one file takes 11.7 seconds on a single core.
I built a parfor loop in which each core in the cluster:
(1) reads one file from the shared directory on storage server #1 (node00);
(2) analyzes the data extracted from that file and saves the result (a 13 kB file) to another shared directory on storage server #2 (node16).
But when I open a matlabpool larger than 32, the network traffic from the storage server jams easily (at most 13 MB/s output across the entire cluster on a 1 Gbps network interface, even though all nodes have SATA3 6.0 Gb/s HDDs and point-to-point file transfer can reach 100 MB/s in Windows Explorer). I believe this contention is caused by many workers simultaneously reading non-consecutive files stored at different physical locations on the same hard drive.
Is there any method to make the parfor session read the files one after another, in order to avoid the network traffic jam?
Other parallel solutions would also be appreciated.
Thank you!
Here is the skeleton of my code:
cluster_size = 32;
Bin_Folder_Name = '\\node00\New_RawData\';
Dat_Folder_Name = '\\node16\Fitted_Data_Storage\';
matlabpool('Cluster', cluster_size)       % open a matlabpool of cluster_size workers
parfor j = 1:42000
    Bin_File_name = strcat(Bin_Folder_Name, sprintf('wfm%d.bin', j));
    Dat_File_name = strcat(Dat_Folder_Name, sprintf('result%d.dat', j));
    API_Mul_5_1_Sub(Bin_File_name, Dat_File_name)
end
matlabpool close
  2 Comments
Kirby Fears on 16 Sep 2015
Reading files sequentially goes against the entire idea of simultaneous parallel computing with parfor. Have you benchmarked the speed of this with a regular for loop?
I'm not sure if it would help, but you could break your parfor into fewer iterations with a non-parallel for loop inside to give you sequential file reading.
Below is an example with only 4 parallel workers, each reading a consecutive subset of your files.
parfor j = 1:4
    for k = ((j - 1)*10500 + 1):(j*10500)   % each worker reads one consecutive block of 10500 files
        Bin_File_name = strcat(Bin_Folder_Name, sprintf('wfm%d.bin', k));
        Dat_File_name = strcat(Dat_Folder_Name, sprintf('result%d.dat', k));
        API_Mul_5_1_Sub(Bin_File_name, Dat_File_name)
    end
end
You could play around with the range of j (try j=1:2, j=1:4, etc.) to see whether a smaller number of parallel jobs helps.
Haoyu Wang on 16 Sep 2015
Thank you Kirby, but I thought the method you proposed would still make the workers in the matlabpool read non-sequential files while they are running.
Inspired by your comment, I built a for-[parfor-end]-end double-layer structure, and it works slightly better: I can now have 40 cores running in parallel without any traffic jam on the network.
Again, thank you for your help!
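A minimal sketch of such a for-[parfor-end]-end structure (the chunk size of 1000 is an arbitrary choice for illustration; variable names follow the original skeleton). Only the files inside one chunk are read in parallel, so the file indices in flight at any moment stay close together on disk:
chunk_size = 1000;                          % files per outer iteration (arbitrary choice)
num_chunks = ceil(42000 / chunk_size);
for c = 1:num_chunks                        % sequential outer loop over consecutive chunks
    first = (c - 1)*chunk_size + 1;
    last  = min(c*chunk_size, 42000);
    parfor j = first:last                   % parallel inner loop within one chunk
        Bin_File_name = strcat(Bin_Folder_Name, sprintf('wfm%d.bin', j));
        Dat_File_name = strcat(Dat_Folder_Name, sprintf('result%d.dat', j));
        API_Mul_5_1_Sub(Bin_File_name, Dat_File_name)
    end
end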


Accepted Answer

Walter Roberson on 16 Sep 2015
Unless the hard drive has been optimized to place sequentially named files near each other, it does not matter for disk utilization whether a group of similarly named files is read per parfor iteration or if only one file is read per parfor iteration.
For some operating systems, there are disk optimizers that will group files by name.
But with it taking 11 seconds to process each file, if you read-process-read-process in a single parfor iteration then that is a lot of "dead time" on disk I/O between processing sequentially named files; you might as well not bother. It could make a difference, though, if you used read-read-read-read-process-process-process-process.
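A minimal sketch of that read-read-read-read-process-process-process-process pattern, staging a consecutive batch into local temporary space before processing (the batch size of 4 and the use of tempname for staging are assumptions for illustration, not part of the original answer):
batch_size = 4;                                  % assumed value; chosen to divide 42000 evenly
parfor b = 1:(42000 / batch_size)
    local_dir = tempname;                        % per-iteration scratch space on the worker
    mkdir(local_dir);
    for k = 1:batch_size                         % read phase: consecutive files, no processing between
        f = sprintf('wfm%d.bin', (b - 1)*batch_size + k);
        copyfile(strcat(Bin_Folder_Name, f), fullfile(local_dir, f));
    end
    for k = 1:batch_size                         % process phase: all reads now hit the local disk
        j = (b - 1)*batch_size + k;
        API_Mul_5_1_Sub(fullfile(local_dir, sprintf('wfm%d.bin', j)), ...
                        strcat(Dat_Folder_Name, sprintf('result%d.dat', j)));
    end
    rmdir(local_dir, 's');                       % clean up the scratch space
end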
To improve disk I/O, consider writing the files grouped together, so that reading one metafile grabs several original files (with each metafile being a single consecutive block on the disk). If nothing else comes to mind, ZIP them together and use one of the methods to unzip to memory: https://www.mathworks.com/matlabcentral/newsreader/view_thread/290857 , https://www.mathworks.com/matlabcentral/newsreader/view_thread/240060 , or http://www.mathworks.com/matlabcentral/newsreader/view_thread/290817 . And if your compute nodes have temporary disk space, you could copy a large .zip file from the storage server, unzip it to the local temp space, and read the individual files from there. Combine this with a randomized initial delay so that the workers are not all trying to hit the storage server at the same time.
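A minimal sketch of that zip-and-stage variant with a randomized delay. It assumes the raw files have already been repackaged on the storage server as batch1.zip ... batch42.zip of 1000 files each (the batch layout, archive names, and delay length are all assumptions for illustration):
parfor b = 1:42
    pause(30 * rand());                          % random delay so workers do not all hit the
                                                 % storage server at once (per-iteration
                                                 % approximation of the initial stagger)
    local_dir = tempname;                        % scratch space on the compute node
    mkdir(local_dir);
    zip_name = sprintf('batch%d.zip', b);
    copyfile(strcat(Bin_Folder_Name, zip_name), fullfile(local_dir, zip_name));
    names = unzip(fullfile(local_dir, zip_name), local_dir);   % extract the wfm*.bin files
    for k = 1:numel(names)
        [~, base] = fileparts(names{k});         % e.g. 'wfm123'
        Dat_File_name = strcat(Dat_Folder_Name, strrep(base, 'wfm', 'result'), '.dat');
        API_Mul_5_1_Sub(names{k}, Dat_File_name);
    end
    rmdir(local_dir, 's');                       % remove the staged copies
end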

More Answers (1)

Edric Ellis on 17 Sep 2015
To get finer-grained control over the ordering of parallel operations, you can use spmd blocks instead of parfor loops. The basic pattern would be to do something like this:
spmd
    for idx = 1:numlabs:(numFiles + numlabs)
        myFileIdx = idx + labindex - 1;
        if myFileIdx <= numFiles
            % process file with index myFileIdx
        else
            % skip - we've passed the end
        end
        % The "labBarrier" call here forces all workers to
        % wait until they all reach this call. This stops
        % workers from racing ahead.
        labBarrier();
    end
end
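In this sketch, numFiles would be set before the spmd block (numFiles = 42000 for the data set above). numlabs is the number of workers in the pool and labindex identifies the current worker, so on each pass through the loop the pool reads a block of numlabs consecutively numbered files, and the barrier keeps any single worker from running ahead to a distant part of the disk.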
