MATLAB Answers

Why do I receive an error when attempting to start a worker for the MATLAB Parallel Server?

When starting a worker with MATLAB Parallel Server, either the call to the STARTWORKER script hangs or I receive an error similar to the following:
The mjs service on the host hostname returned the following error:
The MATLAB worker exited unexpectedly while starting.
The cause of this problem is:
============================================================================
Most likely, the MATLAB worker failed to start due to a licensing problem,
or MATLAB crashed during startup. Check the worker log file
C:\TEMP\MJS\Log\hostname_workername_05-12-15_14-29-00_578.log
for more detailed information. The mdce log file
C:\TEMP\MJS\Log\mjs-service.log
may also contain some additional information.
============================================================================
Script startworker unable to complete successfully - exiting
I would like to resolve this issue so that I can start a worker.

Accepted Answer

MathWorks Support Team on 6 Dec 2019
Edited: MathWorks Support Team on 6 Dec 2019
These errors generally occur when one of the following is true:
1. The worker failed to check out a license from the license server.
2. The worker process crashed during startup.
3. There is a problem with the MJS service configuration.
The sections below describe possible causes and solutions for each of these problems.
When you start a worker on a worker node, a worker log is written locally on that machine. Generally, this log file is stored in:
C:\TEMP\MJS\log (Windows)
/var/log/mjs (Linux/Unix/Mac)
The log is usually named <hostname>_<workername>_<date>.log.
If this log exists, it usually contains an error message explaining why the worker did not start correctly. If the log contains a license manager error, or if you received a license manager error when you tried to start the worker, see the "Licensing Issues" section below.
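On a Linux, Unix, or Mac worker node, this check can be scripted. The following is a minimal Bash sketch, not a MathWorks-supplied tool: it assumes the default /var/log/mjs folder and the <hostname>_<workername>_<date>.log naming described above, so its default pattern *_*.log is intended to skip mjs-service.log. The function names latest_worker_log and show_log_errors are hypothetical.

```shell
# Bash functions; source this file or paste it into a shell on the worker node.

# Print the path of the most recently modified worker log.
# $1: log folder (default /var/log/mjs, an assumption)
# $2: glob pattern (default *_*.log, assumed to match only
#     <hostname>_<workername>_<date>.log style names)
latest_worker_log() {
  local log_dir="${1:-/var/log/mjs}"
  local pattern="${2:-*_*.log}"
  local newest="" f
  for f in "$log_dir"/$pattern; do
    [ -e "$f" ] || continue            # no match: the glob stays literal
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"                      # keep the newer of the two files
    fi
  done
  [ -n "$newest" ] || return 1         # no worker log found
  printf '%s\n' "$newest"
}

# Print, with line numbers, every line that mentions "error" or "license"
# (case-insensitive). This is a broad heuristic, not an official parser.
show_log_errors() {
  grep -n -i -E 'error|license' "$1"
}
```

For example, show_log_errors "$(latest_worker_log)" prints the suspicious lines from the newest worker log. Because the match is deliberately broad, read the surrounding lines in the log before deciding which section below applies.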
If there is no clear error message, try resetting the MJS service configuration on all of your nodes and then restarting the worker, as described in the "Resetting MJS Service Configuration" section below.
Starting a Worker Outside of MJS
==========
If the worker log does not contain an error message, or if the log does not exist, check whether you can start a worker outside of the MJS service. To do so, run the following commands from a Terminal or Windows Command Prompt:
cd $MATLAB\bin
matlab -logfile worker.txt -dmlworker -nodisplay -r exit
(where $MATLAB is the installation folder for MATLAB on your machine. On Linux, Unix, and Mac, use forward slashes in the path, as in $MATLAB/bin, and you may need to prefix the matlab command with "./".)
Check the worker.txt file that is created in $MATLAB/bin. If the worker is able to start correctly, you should see the following in the file:
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
If you see a license manager error, or if the file is blank, see the "Licensing Issues" section below. If you see a different error message, search the MathWorks support site for assistance or contact MathWorks Installation and Licensing Support.
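The decision above can be expressed as a small Bash function that inspects worker.txt after the command finishes. This is a sketch under stated assumptions: it looks for the "To get started" banner quoted above and for the text "License Manager Error", either of which other releases may word differently. The name classify_worker_log is hypothetical.

```shell
# Classify the worker.txt file written by
#   matlab -logfile worker.txt -dmlworker -nodisplay -r exit
# Prints one of: blank, license, started, other.
# Returns 0 only when the startup banner is found.
classify_worker_log() {
  local log_file="$1"
  if [ ! -s "$log_file" ]; then
    echo "blank"      # missing or empty file: see "Licensing Issues"
    return 1
  fi
  if grep -qi 'License Manager Error' "$log_file"; then
    echo "license"    # license manager error: see "Licensing Issues"
    return 1
  fi
  if grep -q 'To get started' "$log_file"; then
    echo "started"    # banner present: the worker started outside MJS
    return 0
  fi
  echo "other"        # some other error: search the support site
  return 1
}
```

For example, after running the matlab command above from $MATLAB/bin, classify_worker_log worker.txt prints started if the worker came up; blank or license points to the "Licensing Issues" section.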
Licensing Issues
==========
If you received a license manager error when starting the worker, look it up in the list of license manager errors on the MathWorks support site.
If you started the worker outside of the MJS service and the log file was blank, check the license manager log file on the license server machine for a recorded error. This log is generally located at:
$MATLAB\etc\win{32|64}\lmlog.txt (Windows; in $MATLAB\flexlm for releases prior to R2010b)
/var/tmp/lm_TMW.log (Linux/Unix/Mac)
(where $MATLAB is the installation folder for MATLAB on the license server machine)
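When the license manager log is long, a filter can help locate refused checkouts. The sketch below assumes the log uses FlexNet-style entries in which a refused request contains the word DENIED or UNSUPPORTED; that format detail is an assumption, and find_license_denials is a hypothetical name. It defaults to the Linux/Unix/Mac path above.

```shell
# Print license manager log lines that appear to record a refused checkout.
# $1: log path (default /var/tmp/lm_TMW.log)
# Returns 0 if matches were found, 1 if none, 2 if the log is unreadable.
find_license_denials() {
  local lm_log="${1:-/var/tmp/lm_TMW.log}"
  if [ ! -r "$lm_log" ]; then
    echo "cannot read $lm_log" >&2
    return 2
  fi
  # Assumed FlexNet keywords for refused requests.
  grep -E 'DENIED|UNSUPPORTED' "$lm_log"
}
```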
If the license manager log file contains no relevant information, check the connectivity between the worker node and the license server. Make sure that the worker node can ping the license server by its hostname; if it cannot, this must be resolved before the worker can start. Contact your network administrator for assistance.
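One way to run this connectivity check is to read the license server hostnames from the license file used by the worker node and ping each of them. This sketch assumes a FlexNet-style network license file with lines of the form SERVER <hostname> <hostid> <port> (for example, a hypothetical $MATLAB/licenses/network.lic); the function names license_servers and check_license_hosts are hypothetical.

```shell
# Print the hostname from every SERVER line of a license file.
license_servers() {
  awk '$1 == "SERVER" { print $2 }' "$1"
}

# Ping each license server host once and report the result.
# Returns 0 if all hosts answered, 1 if any did not,
# 2 if the file is unreadable or has no SERVER lines.
check_license_hosts() {
  local lic_file="$1"
  local hosts host status=0
  hosts=$(license_servers "$lic_file") || return 2
  if [ -z "$hosts" ]; then
    echo "no SERVER lines in $lic_file" >&2
    return 2
  fi
  for host in $hosts; do
    if ping -c 1 "$host" > /dev/null 2>&1; then
      echo "reachable: $host"
    else
      echo "UNREACHABLE: $host"
      status=1
    fi
  done
  return "$status"
}
```

Some networks block ping (ICMP) traffic, so a failed ping is a hint rather than proof that the license server is unreachable; confirm the result with your network administrator.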
Alternatively, try starting a worker on another machine (if possible) to verify that the license server is up and running and your license file is correct.
After resolving any licensing issues, follow the steps in the "Resetting MJS Service Configuration" section below.
Resetting MJS Service Configuration
==========
As long as you can start a worker outside of the MJS service (see above), you may be able to resolve the issue simply by resetting the MJS service configuration. To do so, perform the following steps:
1. Stop the MJS service. On Windows, use the Windows Services console; on non-Windows machines, run the following commands in a Terminal:
cd $MATLAB/toolbox/parallel/bin
./mjs stop
(where $MATLAB is the installation folder for MATLAB on your machine)
2. Delete all the log files from /var/log/mjs (Linux/Unix/Mac) or C:\TEMP\MJS\log (Windows).
3. Restart the MJS service. On Windows, use the Windows Services console; on non-Windows machines, run the following commands in a Terminal:
cd $MATLAB/toolbox/parallel/bin
./mjs start
4. Restart the Job Manager
5. Restart the workers
For instructions on steps 4 and 5, refer to Stage 2 of the Distributed Computing installation instructions.
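Steps 1 to 3 can be combined into one Bash function for a Linux, Unix, or Mac node. This is a sketch, not an official MathWorks script: it assumes the $MATLAB/toolbox/parallel/bin layout used in steps 1 and 3 and the default /var/log/mjs log folder, and reset_mjs_service is a hypothetical name. It must run as a user allowed to control the MJS service. It continues if mjs stop reports an error, on the assumption that the service may already be stopped.

```shell
# Steps 1-3: stop MJS, delete its log files, start MJS again.
# $1: MATLAB installation folder ($MATLAB in the text)
# $2: MJS log folder (default /var/log/mjs, an assumption)
reset_mjs_service() {
  local matlab_root="$1"
  local log_dir="${2:-/var/log/mjs}"
  local bin_dir="$matlab_root/toolbox/parallel/bin"

  if [ ! -x "$bin_dir/mjs" ]; then
    echo "mjs script not found in $bin_dir" >&2
    return 1
  fi

  # Step 1: stop the service, mirroring "cd ...; ./mjs stop".
  ( cd "$bin_dir" && ./mjs stop ) ||
    echo "warning: mjs stop reported an error; continuing" >&2

  # Step 2: delete the old MJS log files.
  rm -f "$log_dir"/*.log

  # Step 3: start the service again; its exit status is returned.
  ( cd "$bin_dir" && ./mjs start )
}
```

Steps 4 and 5 (restarting the job manager and the workers) are left out because they depend on your cluster's names and hosts; follow the installation instructions for those.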
If you have tried all of the above and still cannot start the worker, contact the MathWorks Installation Team.
NOTE: Starting in R2019a, the following changes occurred:
• MATLAB Distributed Computing Server was renamed to MATLAB Parallel Server
• mdce_def was renamed to mjs_def
• mdce binary was renamed to mjs
• mjs scripts are in $MATLAB/R20XXx/toolbox/distcomp/bin for R2019a and earlier
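If you are not sure which layout your release uses, a small Bash function can probe for the control script. This sketch checks the locations implied by the note above, newest layout first; the probing order is an assumption, find_mjs_script is a hypothetical name, and it looks only for the Linux/Unix/Mac script names, not Windows batch files.

```shell
# Print the path of the MJS (or older mdce) control script under a MATLAB
# installation folder. Returns 1 if none of the known layouts is present.
find_mjs_script() {
  local matlab_root="$1"
  local candidate
  for candidate in \
      "$matlab_root/toolbox/parallel/bin/mjs" \
      "$matlab_root/toolbox/distcomp/bin/mjs" \
      "$matlab_root/toolbox/distcomp/bin/mdce"; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```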
