Is it possible to set a remote folder on an HPC cluster as the current folder?

Hi everyone!
I would like to know if there is any way to set the current folder to a remote location. I am working with MATLAB Parallel Server, and since my data is huge I put the data and functions inside a folder DATA located on the cluster and submit the job with batch as follows:
c = parcluster('short');
job = batch(c, 'forHPC', 'Pool', 27, 'CurrentFolder', '~/DATA', 'AutoAddClientPath', false, 'AutoAttachFiles', false);
This works perfectly without transferring data between my computer and the workers, and it saves the outputs in the DATA folder on the cluster.
I want to do the same thing in interactive mode inside MATLAB. Put differently, I would like to work with MATLAB interactively on my computer while my workers and current folder are remote, to avoid transferring data. Is it possible?
Any idea is highly appreciated.

Answers (1)

Can you provide a bit more info on how you'd like this to work? Given that you're using parcluster, batch, etc., I suspect you're also familiar with parpool -- is that what you mean by running MATLAB in interactive mode? Would the following work?
c = parcluster('short');
p = parpool(c,28);
pctRunOnAll cd ~/DATA
forHpc
forHpc runs locally, except when it hits parfor, spmd, etc. That code (e.g., the body of the parfor) will run on the cluster, and those workers will run in ~/DATA. Any code in forHpc that is not inside a parfor or spmd block (or operating on a distributed array) will run on your local machine.
Does this address what you're asking? You didn't mention whether your local machine and the cluster share a mounted disk. Is ~/DATA reachable from your local machine?

7 Comments

Thank you, Raymond, for the explanation.
My problem is exactly what you mentioned in the last line. My local machine and the cluster do not share a mounted disk, and I am looking for a way to reach ~/DATA (which is not reachable simply by cd ~/DATA) from my local machine, to set that remote folder as the current folder.
Your code gives me an error on the 3rd line, as follows:
Cannot CD to /Users/mostafa/DATA (Name is nonexistent or not a directory).
I can only reach ~/DATA when I send the job by batch (as you can see in my code in the first post), but that is not as convenient as parpool in my case.
I hope I have been clear in what I am looking for.
Rather than pctRunOnAll, try the following:
c = parcluster('short');
p = parpool(c,28);
% Change worker directory (but not on MATLAB client)
p.parfeval(@cd,0,'~/DATA');
forHpc
Thank you Raymond for modifying the code.
It seems it cannot find small.mat, even though it is in the cluster folder ~/DATA.
Here is the error I received:
>> p.parfeval(@cd,0,'~/DATA')
forHpc
ans =
FevalFuture with properties:
ID: 25
Function: @cd
CreateDateTime: 18-Dec-2020 15:01:15
StartDateTime: 18-Dec-2020 15:01:15
Running Duration: 0 days 0h 0m 0s
State: running
Error: none
Error using load
Unable to read file 'small.mat'. No such file or directory.
Error in forHpc (line 1)
load('small.mat');
Note: I added a semicolon to the call to parfeval to suppress the ans output.
p.parfeval(@cd,0,'~/DATA');
Is small.mat in the same directory where you're running the local MATLAB client? Because if the error is being thrown on line 1, I suspect the first thing forHpc does is call
load('small.mat');
But this isn't happening on the short cluster. The only code that runs on the short cluster is code inside a parfor, spmd, etc. I don't have forHpc, but imagine the following:
load('small.mat'); % load the variables 'small' and 'S'
parfor idx = 1:small
a(idx) = rand * S;
end
plot(a)
The calls to load and plot happen on your machine, but the body of the parfor happens on the cluster (hence the interactiveness). To run this code, MATLAB sends the values of small and S to each of the workers running on the cluster.
This is different from using batch (which is what I think you've been using up to this point), where everything is offloaded and run on the cluster. With batch, MATLAB would have transferred small.mat as part of the job.
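For contrast, a minimal sketch of that fully offloaded batch route (adapted from the batch call in the original question; wait and diary are standard job methods, and the exact option values here are assumptions):
```matlab
% With batch, the whole script -- load, computation, save -- executes on
% the cluster, so files that exist only in the cluster's ~/DATA are visible.
c   = parcluster('short');
job = batch(c, 'forHpc', 'Pool', 27, ...
    'CurrentFolder', '~/DATA', ...   % workers cd here before the script runs
    'AutoAttachFiles', false);       % nothing is copied up from the client
wait(job);    % block until the job finishes
diary(job)    % print the job's command-window output locally
```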
As you correctly guessed, small.mat is not located in my local directory but in the remote folder ~/DATA. I was expecting p.parfeval(@cd, 0, '~/DATA') to change my current directory to that remote folder, to avoid transferring data between my local machine and the remote workers. But if load is happening on my machine, that is not what I was looking for. My problem is exactly loading all the data on my local machine and sending it to the cluster.
In short, is there any way to work with remote workers and remote data from my local machine interactively?
Can you post forHpc? I have a thought or two, but would need to see what the code looks like.
I won't be checking this forum till the new year. Others may be able to provide some guidance as well.
The original forHpc uses several functions that I created myself, but I shortened it to the following code and still get the same error:
load('small.mat');
mdlSVMsmallOpt = fitcsvm(smallTT, 'o_member', 'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', struct('AcquisitionFunctionName', ...
    'expected-improvement-plus', 'ShowPlots', false, 'UseParallel', true));
save('mdlSVMsmallOpt.mat', 'mdlSVMsmallOpt')
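Since small.mat exists only in the cluster's ~/DATA, one possible workaround for this shortened script (sketched from the batch call in the first post; untested, so treat the details as assumptions) is to offload it entirely, so that both load and save resolve against ~/DATA on the cluster:
```matlab
c   = parcluster('short');
job = batch(c, 'forHpc', 'Pool', 27, ...
    'CurrentFolder', '~/DATA', ...     % load('small.mat') resolves here
    'AutoAddClientPath', false, ...
    'AutoAttachFiles', false);
% fitcsvm's 'UseParallel', true then draws on the job's 27-worker pool,
% and mdlSVMsmallOpt.mat is written to ~/DATA on the cluster.
```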


Release: R2020b

Asked: 13 Dec 2020

Commented: 19 Dec 2020
