Error using gop -> Error detected on worker N -> Error during serialization

I'm receiving an error when running distributed code on a cluster:
Error using gop (line 75)
Error detected on worker 5.
Error during serialization
Error in gplus (line 24)
y = gop(@plus, x, labTarget);
Error in hamiltonian>(spmd body) (line 654)
H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
Error in hamiltonian>(spmd) (line 611)
spmd
Error in hamiltonian (line 611)
spmd
Error in relaxation (line 111)
[L0,Q]=hamiltonian(assume(spin_system,'labframe'));
Error in decoherence_naphthalene (line 53)
R=relaxation(spin_system);
The code within the spmd block is:
spmd
    % Localize the problem at the nodes
    partition=codistributor1d.defaultPartition(nterms);
    codistrib=codistributor1d(1,partition,[nterms 1]);
    local_terms=getLocalPart(codistributed((1:nterms)',codistrib));
    % Preallocate the local Hamiltonian
    H=mprealloc(spin_system,1);
    if build_aniso
        Q=cell(5,5);
        for m=1:5
            for k=1:5
                Q{k,m}=mprealloc(spin_system,1);
            end
        end
    end
    % Build the local Hamiltonian
    for n=local_terms'
        % Compute operator from the specification
        if descr.S(n)==0
            oper=operator(spin_system,descr.opL(n),{descr.L(n)},operator_type);
        else
            oper=operator(spin_system,[descr.opL(n),descr.opS(n)],{descr.L(n),descr.S(n)},operator_type);
        end
        % Add to relevant local arrays
        H=H+descr.H(n)*oper;
        if build_aniso
            for m=1:5
                for k=1:5
                    if abs(descr.T(n,k)*descr.phi(n,m))>spin_system.tols.inter_cutoff
                        Q{k,m}=Q{k,m}+descr.T(n,k)*descr.phi(n,m)*oper;
                    end
                end
            end
        end
    end
    % Collect the result
    H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
end
MATLAB seems to have no problems starting and connecting to the pool of workers:
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
Starting...
Starting parallel pool (parpool) using the 'local' profile ... connected to 12 workers.
[decoherence_naphthalene > create ] Spinach root directory determined to be /home/******/kec30/spinach_1.4.2114
Some background: I'm running this through an SGE queuing system (although I get the same problem running interactively). The error always names "worker N", where N is less than the number of cores in the parpool.
I'm running these simulations on MATLAB R2014a (8.3.0.532) 64-bit (glnxa64) using code provided in Spinach 1.4.2114. It's a 10-spin system (4^n matrix elements), and the code appears to start on my laptop but quickly maxes out my RAM (8 GB, about 4 GB available for MATLAB). On the cluster, I've tried reducing the memory dramatically by going from the full system (10-spin coherences) to a greatly reduced system (3-spin coherences, with a distance cutoff), which is probably too few, but I still encounter the same problem. Since this seems to be more of a MATLAB error than a Spinach error, I thought I'd ask here for advice.
  2 Comments
Edric Ellis on 8 Aug 2014
What is the type of 'H'? Can it be saved to a MAT file successfully (this is required as the communication system uses the same mechanism when transferring data)? Also, I note that Q is a cell array - I would not expect gplus(Q,1) to succeed as that's trying to add together each worker's version of 'Q' (but I think that would give you a different error message).
Kevin Claytor on 8 Aug 2014
Edric,
Thanks for your reply. To run whos 'H' and try to save H, I had to comment out the problematic line:
H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
I was only able to do both of these after the spmd block.
Running whos 'H' yields:
  Name      Size      Bytes  Class        Attributes
  H         1x12       2937  Composite
And attempting to save H gives:
Warning: Saving Composites is not supported.
> In BaseRemote>BaseRemote.saveobj at 44
  In hamiltonian at 658
  In relaxation at 111
  In decoherence_naphthalene at 53
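After the spmd block, H on the client is a Composite that refers to the workers' copies, so these results describe the Composite rather than the data each worker holds. A minimal sketch of running the same two checks on the worker-local value is given here. It assumes a hypothetical helper, try_save (not part of MATLAB or Spinach), kept in its own file because calling save directly in an spmd body is typically rejected as a transparency violation.

```matlab
function ok = try_save(x)
% TRY_SAVE  Hypothetical helper (not part of MATLAB or Spinach): reports
% whether the value x can be written to a temporary MAT-file without
% raising an error or a warning.
%
% Assumed usage inside the spmd block, before the gplus call:
%     fprintf('lab %d: class %s, saved OK = %d\n', ...
%             labindex, class(H), try_save(H));
fname = [tempname '.mat'];   % unique temporary file name
lastwarn('');                % clear the last warning so a new one is detectable
try
    save(fname, 'x');
    ok = isempty(lastwarn);  % some save failures surface as warnings, not errors
catch
    ok = false;              % save raised an error
end
if exist(fname, 'file')
    delete(fname);           % clean up the temporary file
end
end
```

If a worker reports a class other than the expected matrix type, or try_save returns false on one of them, that would point at the data on that worker rather than at gplus itself.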
Regarding your comment about the cell array - Spinach has a few overloads (plus, minus, times) implemented for cell arrays. Here is the one for plus:
% Adds cell arrays element-by-element.
function A=plus(A,B)
    if iscell(A)&&iscell(B)&&all(size(A)==size(B))
        for n=1:numel(A), A{n}=A{n}+B{n}; end
    else
        error('cell array sizes must match.');
    end
end
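As the stack trace shows, gplus simply calls gop with @plus, so the reduction over Q relies on every worker resolving plus to this overload. One way to take that dependency out of the picture, purely as a sketch, is to pass an explicit function handle to gop. The helper name cell_plus is hypothetical and not part of Spinach.

```matlab
function C = cell_plus(A, B)
% CELL_PLUS  Hypothetical helper (not part of Spinach): adds two cell
% arrays of equal size element by element.
%
% Assumed usage inside the spmd block, in place of Q=gplus(Q,1):
%     Q = gop(@cell_plus, Q, 1);
if ~(iscell(A) && iscell(B) && isequal(size(A), size(B)))
    error('cell_plus:sizeMismatch', ...
          'Inputs must be cell arrays of matching size.');
end
% Apply plus to each pair of cells and keep the result as a cell array
C = cellfun(@plus, A, B, 'UniformOutput', false);
end
```

If the error persists with an explicit handle, the overload lookup can probably be ruled out as the cause.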


Answers (0)
