Why is MATLAB engine for Python so much slower in R2023b vs. R2021b?

This is an expansion of a comment I made on this MATLAB Answers question on why the MATLAB engine for Python is so slow.
My original issue was that a 30MB variable took about 15 seconds to be sent from Python to a shared MATLAB engine (an interactive desktop session opened the usual way, by double-clicking the desktop launcher) with the following method:
% MATLAB
matlab.engine.shareEngine('test')
# Python
import matlab.engine
eng = matlab.engine.connect_matlab(name="test")
eng.workspace["variable"] = variable
This led me to the above-linked MATLAB Answers question about why the Python engine is so slow. So, per Alan Frankel's answer, I upgraded from R2021b to R2023b, which actually made my Python code at least 3x slower and also introduced a massive memory leak; see the two attached images.
Some pertinent machine specs: 4 year old PC with 64GB RAM running Windows 10 version 22H2, Python version 3.9.0, MATLAB R2021b and R2023b with no other toolboxes installed.
If anyone could shed light on this situation I'd appreciate it. My primary goal is to be able to use my 30MB Python variable as an input to a MATLAB function in a reasonable time frame, meaning definitely < 1 second rather than the current 15 seconds. Small variables like a scalar number appear in the MATLAB workspace instantly, so I guess it's related to the size of my variable. But if 30MB is too large then the MATLAB engine is not very useful for intensive data processing.

9 Comments

I can't reproduce the behavior you're seeing.
Here's the Python code I'm using to set up an array of ~30MB and an array of ~1200MB:
>>> import time
>>> var_i8_30M = matlab.int8([1 for i in range(30000000)]) # Smaller variable to test with
>>> const_1200M = 1200000000; var_i8_1200M = matlab.int8(vector=[1 for i in range(const_1200M)],size=(1,const_1200M)) # This takes about 4 minutes to execute
Here's how I do the test (against an R2023b release):
>>> before = time.perf_counter(); eng.workspace['var_i8_1200M'] = var_i8_1200M; after = time.perf_counter(); elapsed = after - before; print(elapsed)
Even though I'm testing with a variable 40 times larger than yours, on a computer over a network, I see that this only takes about 0.7-0.8 seconds to run (and I've done it a number of times). I watch the memory in the Task Manager, and I don't see it climbing.
I'm curious whether you will see the same behavior if you try running my code. Also, feel free to share yours.
I tried to duplicate your results, so I closed and re-opened MATLAB and shared the engine (as above). I ran the following code in Python (all identical to yours except for the two lines I added):
import time
import matlab.engine  # I added this to make this self-contained example work.
eng = matlab.engine.connect_matlab(name="test")  # I added this
var_i8_30M = matlab.int8([1 for i in range(30000000)])
const_1200M = 1200000000
var_i8_1200M = matlab.int8(vector=[1 for i in range(const_1200M)],size=(1,const_1200M))
before = time.perf_counter()
eng.workspace['var_i8_1200M'] = var_i8_1200M
after = time.perf_counter()
elapsed = after - before
print(elapsed)  # ~0.35-0.40 seconds
Memory usage increased by about 8GB over ~4 minutes while creating the var_i8_1200M variable (which I think is to be expected because it's so large), but then went back down to baseline and stayed there while running the test.
When re-running the last block of code with the smaller variable ~10 times, the elapsed time was ~0.018-0.035 seconds.
In my actual Python code, which is unfortunately too complex/lengthy to post in its entirety here, my variable of interest is a struct of ~50-100 fields where each field is a Nx3 double, max N ~40,000. I just used your benchmarking code to test that variable, constructed as follows:
# Python
N = 43690  # x 3 columns of doubles ≈ 1 MB per matlab.double matrix
dummy_struct = {}
for i in range(1, 101):
    field_name = f'field{i}'
    # Create an Nx3 matrix of matlab doubles
    matrix = eng.randn(N, 3, nargout=1)
    dummy_struct[field_name] = matrix
Similar to the above, it took about 0.035 seconds. I'm really stumped as to why my actual code has this performance issue. I'll dive into that next, report that variable's exact shape/performance, and see if I can share any helpful snippets.
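For readers without an engine session, roughly the same test data can be sketched in pure NumPy (this is my assumption of an equivalent, with np.random.randn standing in for eng.randn; the ~1 MB-per-field estimate comes from 43690 rows x 3 columns x 8 bytes per double):

```python
import numpy as np

N = 43690  # 43690 x 3 x 8 bytes/double = 1,048,560 bytes ≈ 1 MB per field
dummy_struct = {f'field{i}': np.random.randn(N, 3) for i in range(1, 101)}

# Sanity checks on shape and size
one_field = dummy_struct['field1']
print(one_field.shape)   # (43690, 3)
print(one_field.nbytes)  # 1048560
print(len(dummy_struct)) # 100
```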
Ok, I have more results! I think the issue may be NaNs in my data. In my actual variable, up to ~20 of the 100 fields are entirely NaN.
When I changed the struct test from
matrix = eng.randn(N, 3, nargout=1)
to instead generate a matrix entirely of NaN
matrix = [[float('nan'), float('nan'), float('nan')] for i in range(N)]
the code took so long to run that I had to stop it. N = 10,000 also had not finished after 1 minute. With N = 100, the elapsed time was 1.16 seconds; N = 1,000 ran for 11.94 seconds. This also creates a memory leak similar to what I posted in the screenshot, using R2023b.
I think I can get around this issue by just converting each matrix to a NumPy array with np.array(). I was able to run the benchmark with N = 43,690, which took 0.2 seconds!
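A minimal sketch of that np.array() workaround, assuming the all-NaN fields are plain nested Python lists as in the slow test above (the engine assignment line is left commented out since it needs a live session):

```python
import numpy as np

N = 1000
# A field that is entirely NaN, built as plain nested Python lists
nan_field = [[float('nan')] * 3 for _ in range(N)]

# Workaround: convert to a NumPy array before handing it to the engine
nan_array = np.array(nan_field)

print(nan_array.shape)            # (1000, 3)
print(bool(np.isnan(nan_array).all()))  # True
# eng.workspace['field'] = nan_array  # assignment is now fast (requires an engine session)
```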
Great! Unless you object, I'm planning to submit an answer based on our discussion.
Please do. After using the code more today I think the np.array() conversion is a very workable solution.
@Mitchell Tillman, did you also see the memory leak disappear?
Yes I believe so, as I’ve been able to use the code without slowdown, though I didn’t test that after the np.array() fix.
There is, however, still a pretty massive memory leak in the Apple Silicon macOS implementation, regardless of NaNs in the data.
As an alternative to np.array(), you can use matlab.double(). (If you were using int8, you'd use matlab.int8(), and so on.) Here, you would change the line:
matrix = [[float('nan'), float('nan'), float('nan')] for i in range(N)]
to this:
matrix = matlab.double([[float('nan'), float('nan'), float('nan')] for i in range(N)])
For me, this decreased the amount of time it took to pass a struct to MATLAB from 30 seconds to 0.09 seconds.

 Accepted Answer

@Mitchell Tillman and I agree that the slow performance and memory leaks seem to be related to instances of NaN in the data, not to the code. I will look into the NaN issue separately.

Release: R2023b
