Why is the MATLAB Engine for Python so much slower in R2023b vs. R2021b?
This is an expansion of a comment I made on this MATLAB Answers question on why the MATLAB engine for Python is so slow.
My original issue was that a 30 MB variable took about 15 seconds to transfer from Python to a shared MATLAB engine (an interactive desktop session opened the usual way, by double-clicking the desktop launcher) with the following method:
% MATLAB
matlab.engine.shareEngine('test')
% Python
import matlab.engine
eng = matlab.engine.connect_matlab(name="test")
eng.workspace["variable"] = variable
This led me to the MATLAB Answers question linked above about why the Python engine is so slow. Per Alan Frankel's answer there, I upgraded from R2021b to R2023b, which actually made my Python code at least 3x slower and also introduced a massive memory leak; see the two attached images.
Some pertinent machine specs: a 4-year-old PC with 64 GB RAM running Windows 10 version 22H2, Python 3.9.0, and MATLAB R2021b and R2023b with no other toolboxes installed.
If anyone could shed light on this situation I'd appreciate it. My primary goal is to use my 30 MB Python variable as an input to a MATLAB function in a reasonable time frame, meaning definitely under 1 second rather than the current 15 seconds. Small variables like a scalar number appear in the MATLAB workspace instantly, so I suspect the slowdown is related to the variable's size. But if 30 MB is too large, then the MATLAB engine is not very useful for intensive data processing.
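For reference, the transfer times quoted in this thread can be measured with a small helper around time.perf_counter (the same timer used in the comments below). This is plain Python, not part of the MATLAB Engine API; the timed function and its label argument are illustrative names of mine:

```python
import time

def timed(label, fn):
    """Run fn once and report the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed

# Usage: wrap the engine assignment in a lambda, e.g.
#   timed("send variable", lambda: eng.workspace.update({"variable": variable}))
# Here, a cheap stand-in so the sketch runs without an engine:
_, secs = timed("build 30M-element list", lambda: [1] * 30_000_000)
```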
9 Comments
Alan Frankel
on 12 Mar 2024 (edited)
I can't reproduce the behavior you're seeing.
Here's the Python code I'm using to set up an array of ~30MB and an array of ~1200MB:
>>> import time
>>> import matlab
>>> var_i8_30M = matlab.int8([1 for i in range(30000000)])  # Smaller variable to test with
>>> const_1200M = 1200000000
>>> var_i8_1200M = matlab.int8(vector=[1 for i in range(const_1200M)], size=(1, const_1200M))  # This takes about 4 minutes to execute
Here's how I do the test (against an R2023b release):
>>> before = time.perf_counter(); eng.workspace['var_i8_1200M'] = var_i8_1200M; after = time.perf_counter(); elapsed = after - before; print(elapsed)
Even though I'm testing with a variable 40 times larger than yours, against a MATLAB session on another computer over a network, this only takes about 0.7-0.8 seconds to run (and I've done it a number of times). Watching the memory in Task Manager, I don't see it climbing.
I'm curious whether you will see the same behavior if you try running my code. Also, feel free to share yours.
Mitchell Tillman
on 13 Mar 2024 (edited)
Alan Frankel
on 13 Mar 2024
Great! Unless you object, I'm planning to submit an answer based on our discussion.
Mitchell Tillman
on 14 Mar 2024
Alan Frankel
on 14 Mar 2024
@Mitchell Tillman, did you also see the memory leak disappear?
Mitchell Tillman
on 15 Mar 2024
Alan Frankel
on 18 Mar 2024
As an alternative to np.array(), you can use matlab.double(). (If you were using int8, you'd use matlab.int8(), and so on.) Here, you would change this line:
matrix = [[float('nan'), float('nan'), float('nan')] for i in range(N)]
to this:
matrix = matlab.double([[float('nan'), float('nan'), float('nan')] for i in range(N)])
For me, this decreased the amount of time it took to pass a struct to MATLAB from 30 seconds to 0.09 seconds.
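A minimal sketch of that change follows. The try/except fallback is mine so the snippet also runs where the MATLAB Engine API for Python isn't installed; in a real session you'd have import matlab available and to_matlab would be matlab.double:

```python
# Sketch of the pre-conversion pattern discussed above: wrap the nested
# list in the engine's own array type once, before assigning it to the
# MATLAB workspace, rather than sending a plain Python list.
try:
    import matlab
    to_matlab = matlab.double
except ImportError:
    to_matlab = lambda rows: rows  # stand-in: pass the list through unchanged

N = 1000  # illustrative size; the thread's struct used a much larger N
matrix = to_matlab([[float('nan'), float('nan'), float('nan')] for _ in range(N)])
# eng.workspace['matrix'] = matrix  # then assign to MATLAB as before
```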