Faster linear algebra for Apple Silicon users in the R2025a pre-release (available now!)

Mike Croucher on 16 Jan 2025
Latest activity Reply by Mike Croucher on 21 Jul 2025

So you've downloaded the R2025a pre-release, tried Dark mode and are wondering what else is new. A lot! A lot is new!
One thing I am particularly happy about is the fact that Apple Accelerate is now the default BLAS on Apple Silicon machines. Check it out by doing
>> version -blas
ans =
'Apple Accelerate BLAS (ILP64)'
If you compare this to R2024b, which uses OpenBLAS, you'll see some dramatic speed-ups in some areas. For example, I saw up to a 3.7x speed-up for matrix-matrix multiplication on my M2 MacBook Pro and 2x faster LU factorisation.
Details regarding my experiments are in the blog post "Life in the fast lane: Making MATLAB even faster on Apple Silicon with Apple Accelerate" on The MATLAB Blog. Back then you had to do some trickery to switch to Apple Accelerate; now it's the default.
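If you want to try this sort of comparison yourself, here is a minimal sketch that you could run unchanged in both R2024b and the R2025a pre-release. The matrix size N is an arbitrary choice for illustration and is not the setup used in the blog post, so your speed-up will probably differ from the numbers quoted above.

```matlab
% Rough timing sketch. N is an illustrative size, not the one from the blog post.
N = 2000;
A = rand(N);
B = rand(N);

disp(version('-blas'))            % which BLAS this MATLAB session reports

tMul = timeit(@() A*B);           % matrix-matrix multiplication
tLU  = timeit(@() lu(A));         % LU factorisation

% A dense N-by-N multiply costs roughly 2*N^3 floating point operations.
fprintf('A*B:   %.4f s (%.1f GFLOP/s)\n', tMul, 2*N^3/tMul/1e9);
fprintf('lu(A): %.4f s\n', tLU);
```

Run it in each release and divide the two times to get your own speed-up figure. timeit calls the function several times and reports a typical time, which smooths out some of the noise from a single tic/toc measurement.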
Willi Mutschler on 17 Jul 2025
In MATLAB R2024b on Apple Silicon I was able to switch to using
export BLAS_VERSION=libmwAF_BLAS_ilp64.dylib
in the terminal and then calling matlab from the terminal. In MATLAB I got:
version('-blas')
'OpenBLAS 0.3.24'
In R2025a Apple Accelerate BLAS became the default:
version -blas
'Apple Accelerate BLAS (ILP64)'
Is there a way to switch back to OpenBLAS in R2025a?
Mike Croucher on 21 Jul 2025
To switch back to OpenBLAS in R2025a, run
export BLAS_VERSION=libmwopenblas.dylib
in the terminal and then call MATLAB from there.
Can I ask why you want to switch back to OpenBLAS please?
Willi Mutschler on 21 Jul 2025
Great, thanks! Mostly for benchmarking. I do see performance issues between R2024b and R2025a (not only on Mac but also on Windows and Linux systems) and wonder whether this is because of the new Desktop or just specific to my use of [Dynare](https://dynare.org), which is a MATLAB toolbox that has some routines pre-compiled and added as MEX files using OpenBLAS. Also, in R2024b I had some test cases where OpenBLAS was faster than Apple Accelerate. So either way, it's good to know how to switch back and forth, thanks!
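When switching back and forth for benchmarks, it may help to confirm from inside MATLAB which BLAS the session actually picked up before trusting any timings. A small sketch, assuming MATLAB was launched from the terminal where BLAS_VERSION was (or was not) exported:

```matlab
% BLAS_VERSION is the environment variable exported in the terminal before launch.
requested = getenv('BLAS_VERSION');   % empty if the variable was not set
loaded    = version('-blas');         % what MATLAB reports for this session

if isempty(requested)
    fprintf('BLAS_VERSION not set; MATLAB reports: %s\n', loaded);
else
    fprintf('BLAS_VERSION=%s; MATLAB reports: %s\n', requested, loaded);
end
```

Logging these two lines alongside each benchmark result makes it harder to mix up runs from the two libraries.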
Mike Croucher on 21 Jul 2025
You are welcome. Our development team and I would be very interested in any cases where OpenBLAS is faster than Apple Accelerate, so if there is anything you could share, we'd really appreciate it :)
John D'Errico on 20 Jan 2025
YEAY!!!!! Sort of. Maybe not. So, a qualified yeay.
You should realize this may cost me, and possibly dearly. It now encourages me to replace my older Intel-based iMac. At least it adds fuel to that fire. The Studio model seems interesting.
Does anyone have bench results for an M4 Mac? (The M4 is not yet available in the Studio.)
Steve Eddins on 20 Jan 2025
I know what you mean, John. I have a 2019 Intel-based iMac. I'm thinking that an M4 Mac mini might be in my future.
Mike Croucher on 27 Jan 2025
Fortunately for me, my wife is a Mac fan and in need of a new machine. So, I bought her an M4 MacBook Pro for Christmas. I get to play with a new toy AND it all appears to be her idea :)
Here are the results for bench(5) on the R2025a pre-release for the M4 MacBook Pro. This has 4 performance cores and 6 efficiency cores and is, I think, the weakest M4 available for a MacBook Pro. I'm not crazy-happy about the fact that there are more efficiency cores than performance cores but what can you do?
Mike Croucher on 27 Jan 2025
And here are the results from the R2025a pre-release for a MacBook Pro M4 Max with 64GB RAM. This is from the personal machine of a colleague. Some nice Dark Mode action going on here too.
Steve Eddins on 20 Jan 2025
I don't see anything in the R2025a Prerelease release notes about this. It would be nice to see some examples in the Performance section of the release notes.
Royi Avital on 20 Jan 2025
This is great!
Any updates regarding AMD-based CPUs?
Summer on 18 Jan 2025 (Edited on 20 Jan 2025)
Looking at the comments from your blog post, I'm wondering if I will see any benefit if I'm running code in parallel. Do you know if Apple Accelerate still only uses one thread?
Mike Croucher on 18 Jan 2025
Not sure. I've asked development if they can comment. In the meantime, why not try your code on the pre-release, see what happens and report back?
Mike Croucher on 27 Jan 2025
So the answer is 'It's complicated'. Apple Accelerate uses multiple threads but doesn't use OpenMP; it uses Apple's own threading mechanism. Some details are on the BLAS_THREADING page of the Apple Developer Documentation.
So far so simple. However, Apple Silicon also has what I think of as the 'magic matrix units', which are controlled by Apple's AMX instructions. I'm never sure how many of these each version of the processor has. These are used by Apple Accelerate when appropriate, for example for matrix-matrix multiplication.
The difference is that the magic matrix unit is running independently of the rest of the CPU and one thread/core is in charge of controlling it. So if you call a large matrix-matrix multiplication, Apple Accelerate will appear to use one thread. An alternative BLAS, e.g. OpenBLAS, would use all CPU threads but not the magic matrix unit. Apple Accelerate performance tends to be better at the present time.
The situation seems to have changed again for M4 silicon: see the r/apple thread "Apple appears to have replaced AMX with ARM's SME in M4".
This is all just what I've figured out from a combination of informal internal chats and web searches.
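If you'd like to poke at this on your own machine, one rough experiment is to time the same large multiplication while changing MATLAB's computational thread limit with maxNumCompThreads. This is only a sketch: the size N is arbitrary, and whether Apple Accelerate pays any attention to this setting is an assumption I have not confirmed, so treat a flat result as a hint rather than proof.

```matlab
% Illustrative experiment. N is an arbitrary size chosen for this sketch.
N = 4000;
A = rand(N);
B = rand(N);

nOrig  = maxNumCompThreads;        % remember the current thread limit
counts = unique([1, nOrig]);       % compare one thread against the default

for n = counts
    maxNumCompThreads(n);          % may or may not influence the BLAS in use
    t = timeit(@() A*B);
    fprintf('%2d thread(s): %.4f s\n', n, t);
end

maxNumCompThreads(nOrig);          % restore the original setting
```

If the times barely change between one thread and the default, that would be consistent with a single thread driving the matrix unit. Running the same script with OpenBLAS selected should give a useful contrast.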