Can I tell Matlab not to use contiguous memory?
Matlab eats enormous amounts of memory and rarely or never releases it. As far as I can tell from past questions about this, it's because Matlab stores variables in contiguous physical memory blocks. As the heap becomes fragmented, Matlab asks for new memory when it wants to allocate a variable larger than the largest available contiguous fragment. MathWorks' recommended solution is "exit Matlab and restart". The only defragmentation option at present is "pack", which saves everything to disk, releases everything, and reloads variables from disk. This a) takes a while and b) only saves variables that are 2 GB or smaller.
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
The only reasons I can think of for asking for contiguous physical memory would be to reduce the number of TLB misses (for code running on CPUs) or to allow hardware acceleration from peripherals that work using DMA (such as GPUs) and that don't support mapping fragments. Like most of the other people who were complaining about this issue, I'd rather have TLB misses and give up GPU acceleration for my tasks and not run out of memory. I understand that for large problems on large machines, these features are important, but it's very strange that there's no way to turn it off.
(Even for our big machines, RAM is more scarce than CPU power or time, so "throw more RAM in the machine" is not viable past the first quarter-terabyte or so.)
Edit: Since there is apparently confusion about what I'm asking:
- Matlab certainly stores variables in contiguous virtual memory. As far as user-mode code is concerned, an array is stored as an unbroken block.
- Normal user-mode memory allocation does not guarantee contiguous physical memory. Pages that are mapped to contiguous virtual addresses may be scattered anywhere in RAM (or on disk).
- Previous forum threads about Matlab's memory usage got responses stating that Matlab does ask for contiguous physical memory, requiring variables to be stored as unbroken blocks in physical RAM (pages that are adjacent in virtual address space also being adjacent in physical address space).
- The claim in those past threads was that Matlab's requirement for contiguous physical memory was responsible for its enormous memory use under certain conditions.
- If that is indeed the case, I wanted to know if there was a way to turn that allocation requirement off.
As of 02 April 2021, I've gotten conflicting responses about whether Matlab does this at all, and have been told that if it does do it there's no way to turn it off and/or that turning it off would do horrible things to performance. I am no longer sure that these responses were to my actual question; hence the clarification.
Edit: As of 06 April 2021, consensus appears to be that Matlab does not ask for contiguous physical memory, making this question moot.
8 Comments
Bruno Luong
on 2 Apr 2021
Do you know if any of MATLAB's competitors (e.g., R, Python, Octave, ???) can handle non-contiguous memory for vectors/arrays?
Christopher Thomas
on 2 Apr 2021
Edited: Christopher Thomas
on 2 Apr 2021
Bruno Luong
on 2 Apr 2021
Edited: Bruno Luong
on 3 Apr 2021
Disclaimer: my expertise in memory management is limited. I have decent knowledge of MATLAB, but I don't work for TMW, so some of the things I describe here might not be accurate.
To me, when a user asks to allocate a big array, MATLAB simply calls mxMalloc in its API, which is little more than a single C malloc() with the size of the array (presumably close to an OS heap allocation with contiguous addresses), fills this memory block with zeros, and then may use some tracking management system on top for garbage-collection purposes.
It seems the malloc() used by MATLAB is entirely under OS kernel control, as with 99% of apps, and they don't do anything OS-specific or customized. Swapping, when the requested physical RAM is not available, seems to me to be handled by the OS, not by MATLAB.
TMW has not changed this method in many years, and so many libraries (BLAS, LAPACK), stock MEX functions, and user MEX functions have been built on this assumption for so long that there is zero chance it can be changed, for obvious backward-compatibility reasons.
I think contiguous addressing makes processing much faster. If you claim that only contiguous physical memory can benefit speed, and that contiguity in virtual addresses does not matter, then we clearly have different views, and certainly one of us is wrong.
Walter Roberson
on 3 Apr 2021
It's even possible that Matlab itself isn't asking for contiguous physical memory (which would make this entire thread moot).
I'd seen previous forum responses indicating that it was, but I'm starting to wonder if those were unreliable.
Or if they were perhaps talking about old MATLAB releases. Strategies for 32 bit MATLAB were potentially different.
Joss Knight
on 4 Apr 2021
But I don't even know how to ask for contiguous physical memory, in the sense that the asker has described. I'll admit there's a lot I don't know, but I thought I would have known that. You can ask for pinned memory, which can't be swapped, but MATLAB definitely does use swap.
Walter Roberson
on 4 Apr 2021
Allocating contiguous physical memory:
My searches suggest:
Linux: mmap() with MAP_HUGETLB. Requires that the kernel be built with special flags and that a special filesystem be configured -- in other words, not something that programs such as MATLAB can insist on. The idea seems to be that TLB (Translation Lookaside Buffer) entries are in short supply, so you might want to allocate a large chunk of physical memory covered by a single TLB entry and then manage the memory usage yourself.
Windows: AllocateUserPhysicalPages looks like a possibility https://www.w7forums.com/threads/how-to-allocate-physically-contiguous-memory-in-user-mode-on-windows-7.14183/ and there might be others.
MacOS: ??? I have not managed to find any user-mode resources yet, only kernel-level ones.
Bruno Luong
on 4 Apr 2021
So after much elaboration, the question now becomes "can I tell MATLAB to use contiguous (physical) memory?".
Christopher Thomas
on 5 Apr 2021
Accepted Answer
More Answers (4)
Walter Roberson
on 27 Mar 2021
You are mistaken.
>> clearvars
>> pack

>> foo = ones(1,2^35,'uint8');

>> clearvars

I allocated 32 gigabytes of memory on my 32 gigabyte Mac, it took up physical memory and virtual memory, and when I cleared the variable, MATLAB returned the memory to the operating system.
There has been proof posted in the past (but it might be difficult to locate in the mass of postings) that MATLAB returns physical memory for MS Windows.
I do not have information about Linux memory use at the moment.
MATLAB has two memory pools: the small object pool and the large object pool. I do not recall the upper limit on the small object pool at the moment; I think it is 512 bytes. Scalars get recycled a lot in MATLAB.
Historically, it was at least documented (possibly in blogs) that MATLAB did keep hold of all memory it allocated, and so could end up with fragmented memory. But I have never seen any evidence that that was a physical memory effect: you get exactly the same problem if you ask the operating system for memory and it pulls together a bunch of different physical banks and provides you with the physical memory bundled up as consecutive virtual addresses.
At some point, evidence started accumulating that at least on Windows, at least for larger objects, MATLAB was using per-object virtual memory, and returning the entire object when it was done with it, instead of keeping it in a pool. I have not seen anything from Mathworks describing the circumstances under which objects are returned directly to the operating system instead of being kept for the memory pools.
Side note: I have proven that allocation of zeros is treated differently in MATLAB. I have been able to allocate large arrays, and then when I change a single element of the array, been told that the array is too large.
12 Comments
Christopher Thomas
on 29 Mar 2021
Walter Roberson
on 30 Mar 2021
Did I say that MATLAB does not use swap? Did I use the word "swap" anywhere in my Answer?
What I said is that you are mistaken. In particular, you started your posting by saying,
"Matlab eats enormous amounts of memory and rarely or never releases it."
and I demonstrated that (at least on Mac) that it releases large objects promptly. I have seen posts in the past that show the same thing for Windows. I have not happened to see any relevant posts about the Linux memory handling.
"As far as I can tell from past questions about this, it's because Matlab stores variables in contiguous physical memory blocks."
I just did some process tracing to be sure... it is not impossible that I missed something, but as far as I could tell, MATLAB made no attempt to allocate physical memory, only virtual memory. Do you have some process trace logs showing MATLAB requesting memory that had to be physically contiguous (rather than memory that was given to it with contiguous virtual addresses, with the physical memory being allocated in however many fragments the operating system felt like)?
I just checked my MATLAB installation, and I cannot see any setuid or seteuid in the installation; without that kind of privilege escalation, user-mode processes cannot (specifically) allocate contiguous physical memory.
or to allow hardware acceleration from peripherals that work using DMA (such as GPUs) and that don't support mapping fragments.
MacOS supports mapping contiguous hardware addresses (such as for PCI) to non-contiguous physical addresses.
The other option is to move your data to the GPU's own memory, but my understanding is that that isn't usually done (GPU memory is instead used as a cache for data fetched from main memory).
MATLAB GPU only supports NVIDIA CUDA devices at the moment. NVIDIA's programming model does permit "page locked host memory" (more commonly known as "pinned" memory) to share I/O space with CUDA devices; the details are discussed at
The important part of the discussion there is that the sharing techniques are expected to be limited, a scarce resource. It is clear from the discussion of the different kinds of memory that for CUDA devices, GPU memory is not only used as a "cache" for data from main memory: arranging data properly within the GPU is considered important for performance
64-bit processes use Unified Memory, described at https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-unified-memory-programming-hd . It is designed to make memory access more efficient between host and GPU. If I understand my skimming properly, it does not require contiguous physical pages.
Unified Memory has two basic requirements:
- a GPU with SM architecture 3.0 or higher (Kepler class or newer)
- a 64-bit host application and non-embedded operating system (Linux or Windows)
GPUs with SM architecture 6.x or higher (Pascal class or newer) provide additional Unified Memory features such as on-demand page migration and GPU memory oversubscription that are outlined throughout this document. Note that currently these features are only supported on Linux operating systems. Applications running on Windows (whether in TCC or WDDM mode) will use the basic Unified Memory model as on pre-6.x architectures even when they are running on hardware with compute capability 6.x or higher.
MATLAB's support for cc3.0 was removed as of R2021a, with support for cc3.5 and cc3.7 due to be removed in the next release. I believe we can deduce from that that MATLAB is not using the unified memory interface yet (unless I am misunderstanding the charts and it has been supporting it since R2018a), but perhaps it is on the way. But notice the part about how the advanced unified access is not available for Windows yet.
The document does not mention Mac because Mac is no longer supported by NVIDIA :( The CUDA drivers for Mac did not get further than Kepler.
My point ("and I do have one") is that:
- Shared memory is only one of the ways to communicate with NVIDIA
- NVIDIA never required contiguous physical memory, only that the memory is "host-locked" (pinned) -- in other words, memory that was being prevented from being swapped away
- The unified memory model that is likely coming in (but I don't think is in place yet) will largely remove the need for the pinned memory
- No, the on-device memory is not just acting like a cache
Christopher Thomas
on 30 Mar 2021
Walter Roberson
on 4 Apr 2021
I am unclear as to what problem was originally being encountered?
For everything except possibly some device driver work, or possibly interface to GPU, MATLAB uses malloc() to allocate additional memory. malloc() is OS and library dependent as to exactly how it works, and we do not know at the moment which malloc() is being linked against (but probably the standard one rather than a specialized one.)
On Windows, MacOS, and Linux, if malloc() has to go to the operating system for more space, then the operating system allocates multiple physical memory pages and gathers them into one virtual space and returns the address of the virtual space.
On Windows, MacOS, and Linux, it is not certain that free() will necessarily return the allocated memory to the operating system, or the allocated memory might only be returned under some conditions. Sometimes operating systems provide special forms of allocating and releasing memory that make it easier for the operating system to reclaim the memory; there is at present no evidence that MATLAB is using those special forms.
Does memory released by MATLAB get returned to the operating system? Tests on Windows and MacOS suggest that yes, large enough allocations get returned to the operating system. But that is not necessarily the case for all allocations. Hypothetically there might be a bound below which, instead of getting returned to the operating system, free memory is cached for reuse. That bound might be operating-system dependent, or might depend upon the malloc() being used.
At some point in the past, I found explicit documentation that MATLAB keeps two pools, one for small fixed-sized blocks, and one for larger objects; that when a large block was released, it was returned to the MATLAB-managed pool for potential re-allocation. However, I am having trouble locating that documentation now... and things might have changed since then.
Considering that [IIRC] MATLAB can crash if you malloc() yourself and put the address in a data-pointer field of an object that MATLAB can later release, it seems likely to me that MATLAB does have its own memory manager that can get confused when asked to release something it did not allocate. If MATLAB did not do any of its own memory management, then it would just free() and there would not be any problem.
So... hypothetically, the situation might be:
- very small blocks such as descriptors of variables get allocated. They have available room for a fairly small number of memory elements, so scalars and very small vectors or arrays are written in directly instead of needing a separate memory block. These small blocks get actively managed by MATLAB; it uses them a lot, and it makes sense to keep a pool of them instead of malloc()'ing each of them every time
- mid-sized blocks get allocated out of a MATLAB-managed pool, and get returned to the pool if the pool size is below a high-water mark, and otherwise released to the operating system
- (uncertain) large-sized blocks get allocated, possibly not within any pool, and get returned to the operating system when done
That last hypothetical special handling is uncertain. Much the same behaviour could happen if the managed pool had an upper size limit but all non-small-block variables went through the managed pool: returning the memory for a large enough block would exceed the upper bound, so it would just naturally trigger release back to the operating system, with no special handling needed.
Can such an arrangement lead to the kind of problems that pack() is intended to deal with?
- Yes, the pool of mid-sized blocks can get fragmented
- Even if everything other than the small fixed-sized blocks is handled by the OS instead of managed by MATLAB, virtual address space can get fragmented
MATLAB does not, as far as I know, offer any controls over where in virtual address space that device drivers or DLLs get mapped.
Bruno Luong
on 5 Apr 2021
Edited: Bruno Luong
on 5 Apr 2021
"Considering that [IIRC] MATLAB can crash if you malloc() yourself and put the address in a data-pointer field of an object that MATLAB can later release,"
That was the case. But now (R2021a) MATLAB crashes instantly at the return statement, and NOT later when the array is cleared.
/*************************************************************************
 * mex -R2018a test_AssignDataUsingMalloc.c
 *************************************************************************/
#include "mex.h"
#include "matrix.h"

#define OUT plhs[0]

void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, const mxArray *prhs[]) {
    mxDouble *p;
    OUT = mxCreateDoubleScalar(0);
    mxFree(mxGetDoubles(OUT));
    p = mxMalloc(8);
    /* p = malloc(8); <= doing this makes MATLAB crash at the "return"
     * statement when this MEX file is called */
    *p = 1234;
    mxSetDoubles(OUT, p);
    return;
}
Walter Roberson
on 5 Apr 2021
Bruno Luong
on 5 Apr 2021
My impression is that MATLAB's internal variable data management has changed paradigm lately; one can no longer speak about "copy-on-write", or at least not in the sense that was understood a few years back.
There is a more sophisticated mechanism that handles data, and possibly MATLAB can even bypass the universal mxArray data structure through JIT in its execution engine. Therefore a command such as format debug can return meaningless output.
In my test MEX code above, it is possible that OUT and its data pointer are cleared and freed during the return statement, resulting in the MATLAB crash.
All that is evidently highly speculative.
Walter Roberson
on 5 Apr 2021
If I recall correctly, there are now cases in which portions of an array are passed instead of creating a new array containing just the section, but I did not happen to take note of the circumstances under which that can happen.
Does the same problem occur if you have a notably larger array? The scalar case can be held inside a small block, so the rules might be different for such a small size.
Bruno Luong
on 5 Apr 2021
Edited: Bruno Luong
on 5 Apr 2021
I guess by "the portion" you mean an in-place change, which happens if the array is on the RHS.
In my test code, if the output is not scalar and malloc() is used, it crashes when the variable is cleared.
MATLAB definitely treats scalar data differently.
Attached is the MEX file for you to test. The first argument is the length of the output; if the second argument is provided and > 0, it uses malloc (crashing on purpose) instead of mxMalloc.
Walter Roberson
on 5 Apr 2021
Not inplace change, no: there are places now where if you index an array, then instead of making a copy of the desired section, that MATLAB instead creates a new header that points "inside" the existing data.
In my test code, if the output is not scalar and malloc() is used, it crashes when the variable is cleared.
My memory is claiming that roughly 10 doubles fit inside a small block, and that thus it might not be strictly scalars that the special behaviour is for. Unfortunately I do not recall where I found the information... it might have been in a header file or inside some random .m file.
Bruno Luong
on 5 Apr 2021
Edited: Bruno Luong
on 5 Apr 2021
"Not inplace change, no: there are places now where if you index an array, then instead of making a copy of the desired section, that MATLAB instead creates a new header that points "inside" the existing data."
I think we are speaking about the same thing: I call this "inplace" in the sense that the data (RHS) is in place.
I have submitted such a package on FEX in the past; it worked with some older versions, then broke, since MATLAB prohibits users from doing that and they have constantly changed their data management since. I stopped trying to follow them, and I must admit that I can't, for lack of the published information I would need to make such a package work reliably.
But now that this is integrated in the engine, such a package is no longer relevant.
Christopher Thomas
on 5 Apr 2021
Joss Knight
on 27 Mar 2021
2 votes
You might want to ask yourself why you need so many variables in your workspace at once and whether you couldn't make better use of the file system and of functions and classes to tidy up temporaries. If you actively need your variables, it's probably because you're using them, which means they're going to need to be stored in contiguous address space for any algorithm to operate on them efficiently. If it's data you're processing sequentially, consider storing as files and using one of MATLAB's file iterators (datastore) or a tall array. If it's results you're accumulating, consider writing to a file or using datastore or tall's write capabilities.
Memory storage efficiency is ultimately the job of the operating system, not applications. If you want to store large arrays as a single variable but in a variety of physical memory locations, talk to your OS provider about that. They in turn are bound by the physics and geography of the hardware.
16 Comments
Christopher Thomas
on 29 Mar 2021
Joss Knight
on 29 Mar 2021
Edited: Joss Knight
on 29 Mar 2021
This does seem to get your goat!
What needs to be sequential are addresses. How that maps to physical memory is up to the operating system. Applications that want to process numerical data efficiently can do no better than to ask the operating system for sequential address space, and leave it up to the operating system to decide how to distribute, or redistribute that as it sees fit. If you want this done better, talk to your OS author - or maybe write your own OS!
Christopher Thomas
on 1 Apr 2021
Joss Knight
on 2 Apr 2021
The people in this forum are here voluntarily to help people. If they say things that you think are incorrect they do not do so maliciously. The spirit of this forum should always be one of tolerance and gratitude, even when people are wrong.
I suppose it's worth reiterating before the argument continues that the answer to your question is obviously no. Perhaps that's all you needed confirmed. Some people, when they ask a question like that, really just want to complain about something about MATLAB they don't like. If that's the case here then...duly noted.
I'm interested to know what the C++ standard library function is to allocate non-contiguous memory, and if it's supported by the compilers and operating systems MATLAB supports. Can you tell me? To integrate any new kind of memory allocator obviously has a significant development impact and MathWorks would need to look at the benefit vs the costs.
Christopher Thomas
on 2 Apr 2021
Edited: Christopher Thomas
on 2 Apr 2021
Joss Knight
on 2 Apr 2021
If you want that kind of conversation, raise a Tech Support query, don't come to MATLAB Answers. This isn't a support ticket, and regardless of what it says next to my name, it is just me answering questions to the best of my knowledge in my own personal time. Hold me to a higher standard if you like, be more irritated with me if I'm wrong if you like, but it's not likely to get you where you want to go any faster. Being difficult isn't likely to make me think it's worthwhile pursuing this with greater depth.
Your answer about malloc just makes me more confused. Have you taken the MATLAB documentation regarding 'contiguous memory' to mean MATLAB uses something other than malloc? Because that really is what MATLAB does. MATLAB doesn't take any special steps to prevent memory from being non-contiguous in physical address space if that's what the OS wants to do. All the documentation is trying to do is point out that the elements of a numeric array are adjacent even when the array is multi-dimensional, while for structures and cell arrays they are not. Indeed, it's perfectly clear that MATLAB doesn't force physical adjacency since it's easy to see that MATLAB uses swap space just by checking the Task Manager or top.
Christopher Thomas
on 2 Apr 2021
Joss Knight
on 2 Apr 2021
Edited: Joss Knight
on 3 Apr 2021
MATLAB stores variables in contiguous address space. I don't know what past replies have made you think this somehow means that MATLAB prevents the OS from allocating memory in whatever way it normally does. I don't even know how that's possible. All that matters is that MATLAB does not store one array in multiple, independently allocated blocks. That is what causes MATLAB to exhibit certain behaviour like running out of memory or entering swap when there is still theoretically enough memory left for a new array, or needing to perform copies of significant size when arrays are resized. Which is the sort of thing people tend to ask questions about. I was imagining you were going to tell me there was a way to ask the OS to allocate memory in some way that is more efficient with physical memory at the cost of performance, but it seems like you thought the other way round - that MATLAB was doing something special to prevent that happening.
If I seem stupid to you it's because I don't understand this distinction you make between physical RAM and virtual addressing, and I think my original post makes that pretty clear. Only the OS and probably the BIOS decides how addresses map to physical memory. I assumed when you said physical you were making some sort of distinction between memory from a single allocation and an array made up of multiple allocations. Similarly, you used the terms contiguous and fragmented like that is what you were requesting - an array made up from multiple allocations that can therefore better handle fragmented memory.
So I guess this answers your question? That MATLAB already does what you wanted? I certainly can't rule out both further limitations on my ability to understand what you want and what you mean, my knowledge of the way OSs and computer hardware manage memory, and on my precise knowledge of MATLAB's memory management system. I'll do my best to help though. I'm stubborn that way.
Walter Roberson
on 3 Apr 2021
This was the rationale given when other users asked about Matlab's memory use in the past.
Could you provide a few links to posts so we can review exactly what was said?
Christopher Thomas
on 5 Apr 2021
Walter Roberson
on 5 Apr 2021
My priority now is getting the affected code working
?? We are not clear as to what symptoms you are seeing, that you were suspecting might be due to memory allocation issues ??
Christopher Thomas
on 6 Apr 2021
Bruno Luong
on 6 Apr 2021
Edited: Bruno Luong
on 6 Apr 2021
It doesn't sound like anything related to MATLAB's contiguous memory management.
If the memory footprint doesn't stabilize during a long simulation run, then IMO you probably have memory leaks somewhere (which could be due to a user program bug or a MATLAB bug).
Christopher Thomas
on 6 Apr 2021
Bruno Luong
on 6 Apr 2021
I have worked for more than 20 years with MATLAB, on everything from simple but long simulations (lasting weeks to months) to complex tasks (simulating a full pair of autonomous twin robots working through the day).
AFAIR I have never seen memory increase forever due to memory fragmentation.
In my various experiences, once the simulation is going, the memory state quickly stabilizes.
If it doesn't stabilize, then your simulation must be doing something constantly different over time in the computer's memory.
Anyway, it's up to you to stick with your assumption.
Christopher Thomas
on 6 Apr 2021
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
No, there is absolutely no chance to implement this. All underlying library functions expect the data to be represented as contiguous blocks.
If you need a distributed representation of memory, e.g. storing matrices as list of vectors, you have to develop the operations from scratch. This has severe drawbacks, e.g. processing a row vector is rather expensive (when the matrix is stored as column vectors). But of course it works. You only have to relinquish e.g. BLAS and LAPACK routines.
I wrote my first ODE integrator with 1 kB of RAM, so I know how lean the first 250 GB of RAM are. But the rule remains the same: large problems need large machines.
7 Comments
Christopher Thomas
on 24 Mar 2021
Jan
on 25 Mar 2021
"All operations performed in user-mode (rather than kernel-mode) use virtual addressing"
But then Matlab would use virtual addressing also. If this is true, memory fragmentation would not be a problem. The BLAS/LAPACK/MKL/etc. libraries are optimized with the memory management of the CPUs in mind. For optimal performance, it does matter whether you use virtual or non-virtual memory addressing.
My programs do fit in the RAM of my computers. I would not be happy if all code were remarkably slower because Matlab used virtual addressing for all variables. But it would be useful for a user to be able to decide that some variables are addressed such that they can span non-contiguous RAM pages. Isn't this the case for tall arrays?
It should be easy to implement a class which uses virtual allocation for the data. As far as I can see, all standard operations should work directly, except for allocation and changing the array size. It is not trivial to catch the exceptions for shared data copies.
Christopher Thomas
on 25 Mar 2021
Christopher Thomas
on 26 Mar 2021
[EDITED, original]:
I tried it: My Matlab does not use the pagefile. If the RAM is exhausted, it is exhausted and a huge pagefile does not allow Matlab to create larger arrays.
[EDITED, fixed]: This was a mistake from a test in a virtual machine. Matlab does use the pagefile under Windows. Increasing the size of the pagefile allows Matlab to create larger arrays.
It was your initial point that "Matlab stores variables in contiguous physical memory blocks". Now you claim that virtual addressing is used instead, which would allow automatic paging. This is a contradiction.
"Matlab also makes extensive use of copy-on-write for passing function arguments, which requires virtual addressing."
No, this does not need virtual addressing. Of course you can implement a copy-on-write strategy via an exception on a write-protected page, but this does not match the behaviour of Matlab: you can write to the memory directly inside a C-MEX function, and this destroys the copy-on-write strategy. This way you can poke into variables which share the memory but are not provided as inputs:
x = ones(1, 1e6);
y = x;
yourCMexFunction(y);
% ==> inside the MEX function:
%     *(mxGetPr(prhs[0]) + 1) = 5;
% <==
x(1:3) % [1 5 1] !!!
There is no automatic detection of a write access. This is a severe problem and has been discussed exhaustively in the Matlab forums for 25 years.
"For structures allocated in contiguous physical memory, all pages are copied when the fault occurs rather than just the page that was written to."
This is exactly what happens in Matlab.
Obviously your conceptions about Matlab's memory management do not match the facts. You sound very convinced when you try to tell others about their "misconceptions". Unfortunately you do not understand the topic you are talking about.
By the way, you could control the memory manager of Matlab 6.5 with different startup parameters. As far as I know this was not officially documented. You can still find the undocumented mex functions mxSetAllocFcns and mxSetAllocListeners in the libraries. The licence conditions forbid reverse-engineering of these functions.
You can simply write a MEX function which allocates memory by malloc and VirtualAlloc, and compare the run times when calling optimized BLAS and LAPACK functions. My conclusion: I do not want this for standard variables. If some data exhausts my computer, I buy a larger computer or use tall arrays and distributed processing on a cluster.
Christopher Thomas
on 30 Mar 2021
Jan
on 2 Apr 2021
I've clarified my mistake in my former comment: Matlab does use the pagefile under Windows.
Your test code shows that Matlab uses a copy-on-write method. This is documented. Your assumption that this is done automatically using exceptions on write-protected virtual pages does not match the implementation in Matlab.
Steven Lord
on 25 Mar 2021
1 vote
You know, I really want to read the newest Brandon Sanderson novel. But I don't have room on that shelf in my bookshelf for the volume. Let me store pages 1 through 20 on this bookshelf upstairs. Pages 21 through 40 would fit nicely in that little table downstairs. Pages 41 through 50 can squeeze into the drawer on my nightstand upstairs. Pages 51 through 80 could get stacked on top of the cereal in the kitchen cupboard downstairs. Pages ...
That's going to make it a lot more time consuming to read the new book. And since Sanderson's newest book is over 1200 pages long, I'm going to wear a path in the carpet on the stairs before I'm finished.
So no, there is no setting to tell MATLAB not to use contiguous memory.
The bits may be physically located in different locations on the chip, but to MATLAB and to the libraries we use they have to appear contiguous. Since in actuality I'm more likely to read that Sanderson novel on my tablet, the pages could be stored as a file split across different locations in the physical hardware of the SD card in my tablet, but the reader software handles it so that I see the pages in order, and I would be annoyed if I had to manually switch to reading a different book every chapter to match the physical location of the data.
6 Comments
Christopher Thomas
on 25 Mar 2021
Steven Lord
on 26 Mar 2021
About the closest thing to the functionality you're requesting that MATLAB provides are some of the large file and big data capabilities like datastore and tall arrays. Distributed arrays in Parallel Computing Toolbox may be another option.
Christopher Thomas
on 26 Mar 2021
Jan
on 27 Mar 2021
"in this thread I've been given a rationale that was demonstrably mistaken" - which one?
There is no cheap standard method to store large objects in a limited space. The problems are equivalent for arrays in RAM and parcels in a post van, except for the number of dimensions. Efficient programming includes proper pre-allocation to avoid memory fragmentation. This problem cannot be solved auto-magically by the memory manager. Of course virtual memory or an automatic garbage collector are valid attempts to manage the resources efficiently, but they all have severe disadvantages too.
Obviously the feature you want is not a main concern of other users or of MathWorks, or there is simply no efficient solution.
John D'Errico
on 27 Mar 2021
Remember that knowing those elements are stored contiguously in memory is a hugely important feature, and storing them that way improves the way the BLAS works. And that is a big feature in terms of speed. So while a few people MIGHT want a feature that would slow down MATLAB for everybody else, I doubt most users would be happy to learn that, because one person thinks it important, a matrix multiply is suddenly significantly slower.
It won't happen, nor would I and a lot of other people be happy if it did.
Christopher Thomas
on 29 Mar 2021