Unable to Start MatlabEngine Through Celery
We have a Python application that runs several MATLAB tasks using the MATLAB Engine API. Our Python scripts can successfully start the engine, call computational functions in MATLAB and receive the results. For purposes of distributed computing, we need to run our Python scripts via Celery, but when we do so, calls to start_matlab or connect_matlab do not work; instead, they hang without producing any error message or logging.
We’re aware that the MATLAB engine cannot be started or connected to on a remote machine, but that is not at issue here because the Python script connecting to MATLAB is running on the same machine MATLAB needs to run on. We have also demonstrated that this MATLAB-launching Python script can be run successfully remotely via SSH, so this problem is specific to Celery.
Also, our attempts to increase the log level for MATLAB seem to have no effect. I have not been able to get any logging from MATLAB.
I launch Celery on the remote machine like this:
celery -A pipelines.sitarsight_pipelines worker -l info
Here is our top-level Python program launching Celery tasks:
from celery import Celery
import pipelines.celeryconfig as celeryconfig
import time
from pipelines.python_tasks.t_lightapp import run_task_lightapp
from pipelines.python_tasks.t_runBEND import run_task_runBEND
app = Celery(
    'sitarsight_pipelines'
    ,broker= celeryconfig.BROKER_URL
)
app.conf.update(
    CELERY_RESULT_BACKEND = celeryconfig.CELERY_RESULT_BACKEND
)
@app.task(name='pipelines.sitarsight_pipelines.distribute_task_lightapp')
def distribute_task_lightapp(inFile, fullOutPath):
    print("Running task distribute_task_lightapp...")
    var1, var2, var3, var4, var5, var6 = run_task_lightapp(inFile, fullOutPath)
    return "success"
    # return var1, var2, var3, var4, var5, var6
@app.task(name='pipelines.sitarsight_pipelines.distribute_task_runBEND')
def distribute_task_runBEND(parameters, resultPath, smDeepImagePath, testMode=True):
    print("Running task distribute_task_runBEND...")
    return run_task_runBEND(parameters, resultPath, smDeepImagePath, testMode=True)
@app.task(name='pipelines.sitarsight_pipelines.add')
def add(x, y):
    print("Running add...")
    time.sleep(2)
    return x + y
if __name__ == '__main__':
    print("Calling add...")
    result = add.delay(4, 4)
    print( result.get() )
    print("Calling distribute_task_lightapp...")
    result = distribute_task_lightapp.delay('/first/argument',
                                            '/second/argument')
    print( result )
Here is our Python script calling MATLAB:
import io, os, shutil, sys
sys.path.append('../pipelines')
from pipelines import matlab_session
from pipelines import module_settings
from pipelines import past_math as pm
def run_task_lightapp(inFile, fullOutPath, matlabSession='', arg4='Arg_4.nii'):
    print("run_task_lightapp checkpoint 1")
    if (matlabSession == ''):
        matlabSession = matlab_session.getMatlabSessionForThisProcess()
        print("run_task_lightapp checkpoint 1.5")
    workDir = fullOutPath
    if not os.path.exists(workDir): os.makedirs(workDir)
    cmd = 'gunzip -c %s > %s/in.nii' % (inFile, workDir)
    os.system(cmd)
    DeepImgPath = os.path.join(fullOutPath,'mwc1.nii')
    print("run_task_lightapp checkpoint 2")
    # Checkpoint 2 is never reached. Omitting code below this point.
    return var1, var2, var3, var4, var5, var6
# Use this main program to run the task in standalone mode, without using Celery.
if __name__ == '__main__':
    run_task_lightapp(sys.argv[1], sys.argv[2])
Here is our script for starting the MATLAB engine:
import matlab.engine
import os
import socket
import importlib
from billiard import current_process
matlabSessions = {}
def openMatlabSession():
    print('starting a new matlab instance')
    try:
        print("importing matlab module...")
        matlab = importlib.import_module('matlab')
        print("finished importing module")
    except ImportError:
        # install the matlab package if it is not found
        print("ImportError")
        os.system('pip install matlab')
        matlab = importlib.import_module('matlab')
        print("finished resolving ImportError")
    print(socket.gethostname()) 
    print(os.environ)
    print("Starting matlab...") 
    matlabSession = matlab.engine.start_matlab(option=" -nodisplay -nojvm -nodesktop")
    # In the same directory as this source file 'pipelines/__init__.py' is a 'matlab-tasks' folder with all of the MATLAB scripts.
    # These lines of code add this folder to the MATLAB path in the newly created instance.
    matlabScriptsPath, _ = os.path.split(os.path.realpath(__file__))
    matlabScriptsPath += '/matlab-tasks'
    print('matlabscripts are hopefully at ', matlabScriptsPath)
    try:
        print("adding paths...")
        matlabSession.eval("addpath(genpath('"+matlabScriptsPath+"'));")
        print("added path")
    except RuntimeError:
        print("RuntimeError occurred")
        raise SystemExit("Can't start MATLAB engine!")    # kills worker, which will hopefully restart
        remoteDebugger() # this didn't result in any solutions after spending a full day poking around ctypes ....
    return matlabSession
# each worker process in the Celery pool has its own pid ... so we create MATLAB sessions mapped to those pids
def getMatlabSessionForThisProcess():
    global matlabSessions
    print("starting getMatlabSessionForThisProcess")
    process = current_process()
    pid = process.pid
    if pid not in matlabSessions:
        matlabSession = openMatlabSession()
        print("Successfully opened matlab session.")
        matlabSessions[pid] = matlabSession
    matlabSession = matlabSessions[pid]
    # clear the workspace for the next job
    # if this fails, the MATLAB instance is dead, and we open a new one
    try:
        print("Clearing classes...")
        matlabSession.eval('clear classes', nargout=0)
        print("Initializing worker...")
        matlabSession.workerInit(nargout=0)
        print("Completed worker initialization.")
    except RuntimeError as e:
        print(e)
        matlabSession = openMatlabSession()
        matlabSessions[pid] = matlabSession
    print("finishing getMatlabSessionForThisProcess")
    return matlabSession
This is the output I see on the head node when I run the top-level program:
(sitar) service_account@headnode:/sitarsight/dsitar$ python pipelines/sitarsight_pipelines.py
Calling add...
8
Calling distribute_task_lightapp...
20fa53e0-3a90-432b-9a8f-144e6442888a
This is the output I see in Celery:
(sitar) service_account@workernode:/sitarsight/dsitar/pipelines$ celery -A pipelines.sitarsight_pipelines worker -l DEBUG
[2023-04-27 05:06:49,859: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2023-04-27 05:06:49,861: DEBUG/MainProcess] | Worker: Building graph...
[2023-04-27 05:06:49,861: DEBUG/MainProcess] | Worker: New boot order: {Timer, Hub, Pool, Autoscaler, StateDB, Beat, Consumer}
[2023-04-27 05:06:49,866: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2023-04-27 05:06:49,866: DEBUG/MainProcess] | Consumer: Building graph...
[2023-04-27 05:06:49,872: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Events, Mingle, Gossip, Agent, Heart, Tasks, Control, event loop}
-------------- celery@SERVER_NAME v5.2.2 (dawn-chorus)
--- ***** ----- 
-- ******* ---- Linux-5.4.0-146-generic-x86_64-with-glibc2.31 2023-04-27 05:06:49
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         sitarsight_pipelines:0x7f64f459e140
- ** ---------- .> transport:   redis://IP_ADDRESS:6379/0
- ** ---------- .> results:     redis://IP_ADDRESS:6379/1
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
-------------- [queues]
.> celery           exchange=celery(direct) key=celery
[tasks]
. celery.accumulate
. celery.backend_cleanup
. celery.chain
. celery.chord
. celery.chord_unlock
. celery.chunks
. celery.group
. celery.map
. celery.starmap
. pipelines.sitarsight_pipelines.add
. pipelines.sitarsight_pipelines.distribute_task_runBEND
. pipelines.sitarsight_pipelines.distribute_task_lightapp
[2023-04-27 05:06:49,878: WARNING/MainProcess] /opt/anaconda3/envs/sitar/lib/python3.10/site-packages/celery/app/utils.py:204: CDeprecationWarning: 
The 'CELERY_RESULT_BACKEND' setting is deprecated and scheduled for removal in
version 6.0.0. Use the result_backend instead
deprecated.warn(description=f'The {setting!r} setting',
[2023-04-27 05:06:49,878: WARNING/MainProcess] Please run `celery upgrade settings path/to/settings.py` to avoid these warnings and to allow a smoother upgrade to Celery 6.0.
[2023-04-27 05:06:49,878: DEBUG/MainProcess] | Worker: Starting Hub
[2023-04-27 05:06:49,878: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:49,878: DEBUG/MainProcess] | Worker: Starting Pool
[2023-04-27 05:06:51,757: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:51,759: DEBUG/MainProcess] | Worker: Starting Consumer
[2023-04-27 05:06:51,760: DEBUG/MainProcess] | Consumer: Starting Connection
[2023-04-27 05:06:51,781: INFO/MainProcess] Connected to redis://IP_ADDRESS/0
[2023-04-27 05:06:51,781: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:51,781: DEBUG/MainProcess] | Consumer: Starting Events
[2023-04-27 05:06:51,785: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:51,785: DEBUG/MainProcess] | Consumer: Starting Mingle
[2023-04-27 05:06:51,785: INFO/MainProcess] mingle: searching for neighbors
[2023-04-27 05:06:52,803: INFO/MainProcess] mingle: all alone
[2023-04-27 05:06:52,803: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:52,803: DEBUG/MainProcess] | Consumer: Starting Gossip
[2023-04-27 05:06:52,810: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:52,811: DEBUG/MainProcess] | Consumer: Starting Heart
[2023-04-27 05:06:52,814: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:52,814: DEBUG/MainProcess] | Consumer: Starting Tasks
[2023-04-27 05:06:52,820: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:52,820: DEBUG/MainProcess] | Consumer: Starting Control
[2023-04-27 05:06:52,824: DEBUG/MainProcess] ^-- substep ok
[2023-04-27 05:06:52,824: DEBUG/MainProcess] | Consumer: Starting event loop
[2023-04-27 05:06:52,825: DEBUG/MainProcess] | Worker: Hub.register Pool...
    [2023-04-27 05:06:52,827: WARNING/MainProcess] /opt/anaconda3/envs/sitar/lib/python3.10/site-packages/celery/fixups/django.py:203: UserWarning: Using settings.DEBUG leads to a memory
leak, never use this setting in production environments!
warnings.warn('''Using settings.DEBUG leads to a memory
[2023-04-27 05:06:52,827: INFO/MainProcess] celery@nsmaclbswrk63.mac-internal.ucsf.edu ready.
[2023-04-27 05:06:52,827: DEBUG/MainProcess] basic.qos: prefetch_count->64
[2023-04-27 05:06:52,875: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:06:52,876: INFO/MainProcess] Events of group {task} enabled by remote.
[2023-04-27 05:06:57,876: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:02,875: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:07,876: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:12,875: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:17,876: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:22,875: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:27,877: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:32,877: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:37,876: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
>>> The first task received is a simple task to add two numbers. It succeeds.
[2023-04-27 05:07:41,000: INFO/MainProcess] Task pipelines.sitarsight_pipelines.add[5b08a3be-e1f0-4293-b58d-8f686938f112] received
[2023-04-27 05:07:41,001: DEBUG/MainProcess] TaskPool: Apply <function fast_trace_task at 0x7f64f503f490> (args:('pipelines.sitarsight_pipelines.add', '5b08a3be-e1f0-4293-b58d-8f686938f112', {'lang': 'py', 'task': 'pipelines.sitarsight_pipelines.add', 'id': '5b08a3be-e1f0-4293-b58d-8f686938f112', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '5b08a3be-e1f0-4293-b58d-8f686938f112', 'parent_id': None, 'argsrepr': '(4, 4)', 'kwargsrepr': '{}', 'origin': 'gen940629@HEAD_NODE_SERVER_NAME', 'ignore_result': False, 'properties': {'correlation_id': '5b08a3be-e1f0-4293-b58d-8f686938f112', 'reply_to': '10697ac8-4b9f-302d-9e76-92cc682aada1', 'delivery_mode': 2, 'delivery_info': {'exchange': '', 'routing_key': 'celery'}, 'priority': 0, 'body_encoding': 'base64', 'delivery_tag': '59bc87f5-3f4b-49e5-b16a-1d8e6692fe5e'}, 'reply_to': '10697ac8-4b9f-302d-9e76-92cc682aada1', 'correlation_id': '5b08a3be-e1f0-4293-b58d-8f686938f112', 'hostname': 'celery@WORKER_NODE_SERVER_NAME', 'delivery_info': {'exchange': '',... kwargs:{})
[2023-04-27 05:07:41,006: WARNING/ForkPoolWorker-14] Running add...
[2023-04-27 05:07:42,876: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:43,015: INFO/ForkPoolWorker-14] Task pipelines.sitarsight_pipelines.add[5b08a3be-e1f0-4293-b58d-8f686938f112] succeeded in 2.0094936878886074s: 8
>>> Now it starts work on the computational task that needs to start the MATLAB engine.
[2023-04-27 05:07:45,034: INFO/MainProcess] Task pipelines.sitarsight_pipelines.distribute_task_lightapp[20fa53e0-3a90-432b-9a8f-144e6442888a] received
[2023-04-27 05:07:45,034: DEBUG/MainProcess] TaskPool: Apply <function fast_trace_task at 0x7f64f503f490> (args:('pipelines.sitarsight_pipelines.distribute_task_lightapp', '20fa53e0-3a90-432b-9a8f-144e6442888a', {'lang': 'py', 'task': 'pipelines.sitarsight_pipelines.distribute_task_lightapp', 'id': '20fa53e0-3a90-432b-9a8f-144e6442888a', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '20fa53e0-3a90-432b-9a8f-144e6442888a', 'parent_id': None, 'argsrepr': "('/first/argument', '/second/argument')", 'kwargsrepr': '{}', 'origin': 'gen940629@HEAD_NODE_SERVER_NAME', 'ignore_result': False, 'properties': {'correlation_id': '20fa53e0-3a90-432b-9a8f-144e6442888a', 'reply_to': '10697ac8-4b9f-302d-9e76-92cc682aada1', 'delivery_mode': 2, 'delivery_info': {'exchange': '', 'routing_key': 'celery'}, 'priority': 0, 'body_encoding': 'base64', 'delivery_tag': '8b8236ef-fe92-4b1f-baea-6ba0e81a60bb'},... kwargs:{})
[2023-04-27 05:07:45,037: WARNING/ForkPoolWorker-14] Running task distribute_task_lightapp...
[2023-04-27 05:07:45,037: WARNING/ForkPoolWorker-14] run_task_lightapp checkpoint 1
[2023-04-27 05:07:45,037: WARNING/ForkPoolWorker-14] starting getMatlabSessionForThisProcess
[2023-04-27 05:07:45,038: WARNING/ForkPoolWorker-14] starting a new matlab instance
[2023-04-27 05:07:45,038: WARNING/ForkPoolWorker-14] importing matlab module...
[2023-04-27 05:07:45,038: WARNING/ForkPoolWorker-14] finished importing module
[2023-04-27 05:07:45,038: WARNING/ForkPoolWorker-14] WORKER_NODE_SERVER_NAME
[2023-04-27 05:07:45,038: WARNING/ForkPoolWorker-14] environ({settings redacted})
[2023-04-27 05:07:45,039: WARNING/ForkPoolWorker-14] Starting matlab...
[2023-04-27 05:07:45,039: WARNING/ForkPoolWorker-14] creating Matlab with the following tokens/options
[2023-04-27 05:07:45,039: WARNING/ForkPoolWorker-14] ['-nodisplay', '-nojvm', '-nodesktop']
[2023-04-27 05:07:47,877: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:52,877: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:07:57,877: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:08:02,877: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:08:07,875: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:08:12,878: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:08:17,876: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2023-04-27 05:08:22,878: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
^C
worker: Hitting Ctrl+C again will terminate all running tasks!
>>> I interrupted execution because the task will never continue beyond this point.
Any help is appreciated!
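Since the task hangs without any output, one way to narrow down where it blocks is Python's standard faulthandler module, which can dump the Python stack of every thread after a timeout. The following is only a sketch: the helper names watch_for_hang and stop_watching and the log path are assumptions, not part of the code above. It writes to a real file because faulthandler needs a file descriptor, which a redirected stderr may not provide.

```python
import faulthandler


def watch_for_hang(timeout_s, log_path):
    """Arrange for all Python thread stacks to be written to log_path
    if stop_watching() is not called within timeout_s seconds."""
    # A real file is used because faulthandler writes through a file descriptor.
    log_file = open(log_path, "w")
    faulthandler.dump_traceback_later(timeout_s, file=log_file)
    return log_file


def stop_watching(log_file):
    """Cancel the pending dump and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

In the task, watch_for_hang(30, '/some/writable/path.log') could be called just before getMatlabSessionForThisProcess() and stop_watching() just after it returns. If the call hangs, the file should show which Python line was executing, although it cannot show what the MATLAB process itself is doing.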
Answers (1)
Jesús Chacón on 13 Jul 2023
Hi, I had a similar problem: when I tried to start MATLAB from a Celery task, it got stuck. Running the MATLAB Engine for Python outside Celery works fine, though.
I solved it by adding the option "-P solo" to the Celery command line, e.g.
celery -A pipelines.sitarsight_pipelines worker -P solo -l info
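Note that the solo pool runs every task in the worker's main process rather than in forked child processes, so each worker handles one task at a time; more concurrency would have to come from starting additional workers. One possible explanation for the hang, which is an assumption and not confirmed in this thread, is that the session script imports matlab.engine at module level, so the library is already loaded when the default prefork pool forks its children. If that is the cause, deferring the import until the task actually runs might allow the prefork pool to work as well. A minimal sketch, not tested against MATLAB (get_session and _start_matlab are hypothetical names):

```python
import os

_sessions = {}  # maps process ID -> MATLAB engine session


def _start_matlab():
    # Imported here, not at module level, so the engine library is first
    # loaded in the process that will actually use it (assumed to matter).
    import matlab.engine
    return matlab.engine.start_matlab("-nodisplay -nojvm -nodesktop")


def get_session(start=_start_matlab):
    """Return the session for the current process, starting one if needed."""
    pid = os.getpid()
    if pid not in _sessions:
        _sessions[pid] = start()
    return _sessions[pid]
```

The start parameter exists only so the caching logic can be exercised without MATLAB installed; in a task, get_session() would be called with no arguments.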
