Changing contents of Cell Array mex files

I have a mex file where I have a cell array that contains some structure arrays. I want to grow those structure arrays as needed while processing a large dataset, and then finally return the cell array to matlab.
This has been painfully difficult. After many, many MATLAB crashes, I finally have it working. I think the root cause of my trouble has something to do with how cell arrays manage memory. I see this mentioned in the documentation for mxGetCell:
Do not call mxDestroyArray on an mxArray returned by the mxGetCell function
This might explain why, when I tried to resize my structure arrays (obtained from the mxGetCell call) using the method described in this link, things crashed horribly. However, the online documentation for mxSetCell gives this rather contradictory advice:
To free existing memory, call mxDestroyArray on the pointer returned by mxGetCell before you call mxSetCell
So which is it? What I have working is a rather brute-force solution. When I want to grow the structure array, I allocate it at the new size with a call to mxCreateStructArray, and then I manually copy every element and field with calls to mxGetFieldByNumber and mxSetFieldByNumber. I then call mxSetCell, and I do not call mxDestroyArray on the original structure array, contrary to the advice in the mxSetCell documentation.
I guess this works, but it's inefficient, and I'd rather use the more efficient method for growing structure arrays that uses realloc. It seems, though, that this might be incompatible with how cell arrays manage memory. In any case, it would be nice to understand what is going on so I can avoid problems in the future, and to get some clarification on the inconsistencies in the documentation.

2 Comments

James Tursa on 26 Jun 2020
Edited: James Tursa on 26 Jun 2020
Are you passing a cell array into the mex routine, adding on to it in the mex routine, and then passing that expanded result back to the caller? And you want to know how to do that correctly and efficiently?
Or are you creating the cell array from scratch inside the mex routine and expanding it inside the mex routine?
Can we see your actual code so that we can comment on it? If you explain what you are trying to do, we can show you the correct official way to do it. There is a fast unofficial way to do it as well, which may or may not apply in your case depending on what you are trying to do.
I'm creating a cell array inside the mex function. As I process data I first add struct arrays to the cell elements...and as I process more data I wind up having to grow the structure arrays so I allocate new ones and replace them inside the cell array.
I can post code, but it will take me some time to strip out stuff I can't post. I can likely create some simple example that demonstrates my problem more succinctly.

 Accepted Answer

James Tursa on 26 Jun 2020
Edited: James Tursa on 26 Jun 2020
When you mxDestroyArray a cell array or struct array, it does a deep destroy. Meaning all of the cell array or struct array elements are deep destroyed first, then the cell array or struct array itself is destroyed. That is why you should not call mxDestroyArray on the result of a mxGetCell or mxGetField call without cleaning things up, because if you do then you have invalidated the memory that is contained in the cell array or struct array and when the cell array or struct array eventually gets destroyed it will try to free invalid memory and bomb.
Regarding the documentation:
Do not call mxDestroyArray on an mxArray returned by the mxGetCell function
This refers to cell arrays that come from one of the prhs[ ] input variables, where doing so can screw up the workspace if it is a shared data copy of another variable. This sentence does not apply to cell arrays that you create and populate inside the mex routine where you know it is not a shared data copy of another variable, as long as you NULL out the corresponding spot in the cell array. Same thing is true for struct arrays btw.
To free existing memory, call mxDestroyArray on the pointer returned by mxGetCell before you call mxSetCell
This refers to cell array content that you originally created inside the mex routine, i.e. not part of a prhs[ ] variable. As long as you created it inside the mex routine, you can destroy it safely as well. E.g., start with this:
mxArray *mycell, *myvariable, *var;
mycell = mxCreateCellMatrix(1,1); /* 1x1 cell array */
myvariable = mxCreateDoubleScalar(5.0); /* Temporary, on garbage collection list */
mxSetCell( mycell, 0, myvariable ); /* myvariable is now Sub-Element of mycell */
The pointer contained in myvariable is put directly into mycell. Not a copy of the variable, but the actual pointer itself. And the type of the variable is changed from Temporary (on the garbage collection list) to Sub-Element (NOT on the garbage collection list). Its disposition is now entirely dependent on the disposition of the cell array it is part of.
What you cannot do is this:
var = mxGetCell( mycell, 0 ); /* this part is ok */
mxDestroyArray( var ); /* you have just made mycell invalid */
mxDestroyArray( mycell ); /* this will bomb MATLAB */
That last line will bomb MATLAB because mycell still contains the pointer of the variable you previously destroyed, so when MATLAB tries to subsequently destroy that element it will access invalid memory and bomb. That is, destroying var is actually OK since you originally created var in the mex routine ... but not cleaning up mycell properly will lead to a crash.
The correct way to extract and destroy a variable you originally created in a mex routine is:
var = mxGetCell( mycell, 0 ); /* this part is ok */
mxSetCell( mycell, 0, NULL ); /* NULL out the pointer that we just extracted */
/* See NOTE below */
mxDestroyArray( var ); /* OK since we originally created this inside the mex routine */
mxDestroyArray( mycell ); /* this will work OK */
NOTE: The var you extracted from mxGetCell( ) is actually not a Temporary variable anymore, it is a Sub-Element since it came from a cell array. Meaning, it is NOT on the garbage collection list and will NOT get automatically destroyed when the mex routine exits. This is true even though you originally created it inside the mex routine (as soon as you called mxSetCell( ) with this as input the type changed and it was removed from the garbage collection list). At this point you must do one of two things to avoid a memory leak. Either mxDestroyArray( var ) downstream in your code, or attach var to a cell or struct array. There are no API functions to put an mxArray back on the garbage collection list once it has been removed.

12 Comments

But this is not what I am doing. I am calling mxGetCell, then allocating a new structure array and replacing the cell element with a call to mxSetCell. The documentation for mxSetCell instructs me to destroy the original mxArray in this instance, but doing so makes things crash. The deep destroy of the cell array should not care about my destroy of the original struct array, because the original struct array has been replaced in the cell with a new array.
Also, the crash happens before I ever get around to freeing the cell array...so I don't think it has to do with the destroy implementation of the cell array.
That's why I asked to see your code, so that I can see what you are actually trying to do. I think you said it is something like this:
mxArray *var;
var = mxGetCell( some_cell_array, k );
mxSetCell( other_cell_array, m, var );
This will certainly bomb MATLAB. The reason is that you have the same mxArray pointer (var) inside two different cell arrays, but MATLAB doesn't know about it and hasn't bumped up the reference count. So when either of some_cell_array or other_cell_array gets destroyed, it invalidates the other one which will lead to a bomb when it gets destroyed. The only official way to do this (which is not at all efficient) is to deep copy the mxArray as follows:
mxArray *var;
var = mxGetCell( some_cell_array, k );
mxSetCell( other_cell_array, m, var ? mxDuplicateArray(var) : NULL );
This is the only official way to copy elements from one cell array to another cell array (or to another element within the same cell array) inside a mex routine. You have to make a deep copy and then attach that to the other cell array or element spot. The same would be true if you are trying to attach the same mxArray variable to multiple cell arrays or multiple spots in the same cell array ... you would have to make multiple deep copies, one for each mxSetCell call. This is not the way that MATLAB actually does it behind the scenes in their code, btw. MATLAB creates shared data copies or shared reference copies (very efficient). There is a way to duplicate this method inside a mex routine, but you have to use unofficial hacks into the mxArray to do it. This is no longer trivial because MATLAB has obfuscated and/or removed the API library functions that do this, but it can be done.
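As a rough illustration of the reference-count argument above, here is a toy model in plain C. It is only a sketch: ToyVar, ToyCell, toy_destroy_cell and the invalid_accesses counter are made-up names, and the model assumes nothing about MATLAB's real mxArray internals beyond what is described above (a deep destroy that trusts the recorded count). Instead of actually freeing memory, it records when a destroy touches an already-released variable, which is the situation that crashes in a real mex file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT MATLAB's mxArray internals. */
typedef struct {
    int refcount;   /* how many containers the runtime believes hold this variable */
    int freed;      /* set once the variable has been "destroyed" */
} ToyVar;

typedef struct {
    ToyVar *slot;   /* a one-element "cell array" */
} ToyCell;

/* Counts destroy attempts on a variable that was already released. */
int invalid_accesses = 0;

/* Deep destroy: release the element the cell holds, then forget it. */
void toy_destroy_cell(ToyCell *c)
{
    assert(c != NULL);
    ToyVar *v = c->slot;
    if (v != NULL) {
        if (v->freed) {
            invalid_accesses++;        /* a second owner touching dead memory */
        } else if (--v->refcount == 0) {
            v->freed = 1;              /* last known owner: release it */
        }
    }
    c->slot = NULL;
}

/* Put the same pointer in two cells without telling the runtime. */
int share_without_bump(void)
{
    ToyVar v = {1, 0};
    ToyCell a = {&v}, b = {&v};
    invalid_accesses = 0;
    toy_destroy_cell(&a);   /* count goes 1 -> 0, v is released */
    toy_destroy_cell(&b);   /* b still points at v: invalid access */
    return invalid_accesses;
}

/* Same sharing, but the count is bumped for the second owner. */
int share_with_bump(void)
{
    ToyVar v = {1, 0};
    ToyCell a = {&v}, b = {&v};
    v.refcount++;           /* the runtime now knows about both owners */
    invalid_accesses = 0;
    toy_destroy_cell(&a);   /* count goes 2 -> 1, v stays alive */
    toy_destroy_cell(&b);   /* count goes 1 -> 0, released exactly once */
    return invalid_accesses;
}
```

With these definitions, share_without_bump() reports one invalid access and share_with_bump() reports none. In a real mex routine the official way to get the second outcome is the deep copy with mxDuplicateArray shown above, which gives each container its own variable.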
If you are simply trying to transfer the element from one cell array to another, then you would do this:
mxArray *var;
var = mxGetCell( some_cell_array, k );
mxSetCell( some_cell_array, k, NULL );
mxSetCell( other_cell_array, m, var );
In this last bit of code, we haven't changed the number of times that var appears as a cell array element, so we don't have to fuss with the reference count.
And in general you should only destroy cell elements that are not already NULL. When you first create a cell array with mxCreateCellMatrix all of the elements are set to NULL automatically, so you don't need to destroy them before overwriting them. If you are downstream in your code and are not sure, then you should test first. E.g.,
var = mxGetCell( some_cell_array, k );
if( var ) mxDestroyArray(var);
mxSetCell( some_cell_array, k, other_var );
If you are simply trying to modify a cell or struct element, one that you originally created in the mex routine, then it is simpler yet. Simply get the pointer, and modify the variable. E.g.,
var = mxGetCell( some_cell_array, k );
/* code here to modify var, such as realloc its data area etc */
Nothing else needs to be done. As long as var was originally created by you inside the mex routine, you can modify it at will. The first line above simply gets you the pointer to the variable ... it doesn't remove the pointer from the cell array. That pointer is still there. So all of the subsequent lines where you modify var are actually modifying an element of the cell array in-place. No need to subsequently mxSetCell it back into the same spot since it never left that spot in the first place. You certainly would not want to mxDestroyArray it first because that wipes out the variable you are working on.
The caveat to this is if you have shared data copies or reference copies of this variable inside other cell or struct variables. In that case modifying the variable in-place would affect all copies, not just the one you are working on. This may or may not be what you want to have happen, so you may or may not need to deep copy the variable first, destroy one copy of the original, and use mxSetCell on the modified deep copy.
You might also be interested in this link:
Thank you for all the detailed insight, it is helpful...though I am still not seeing what I have wrong. It is entirely possible I have something else wrong, and it only manifests itself as a crash when the Destroy is uncommented...but at the moment I'm not seeing the problem.
Here is a relatively short example that exhibits the problem. It either crashes or not based on whether the Destroy call is uncommented, or commented out.
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mxArray *cell_array = NULL;
    const char * fieldnames[] = {"field1", "field2","field3","field4"};
    int n_fields = sizeof(fieldnames)/sizeof(*fieldnames);
    if (nlhs > 0)
    {
        mwSize pi_dims[2];
        pi_dims[0]=1;
        pi_dims[1]=1;
        cell_array = mxCreateCellArray(2,pi_dims);
        int n_rows = 10000;
        // create structure array
        mxArray *data_struct = mxCreateStructMatrix(n_rows, 1, n_fields, fieldnames);
        // put something in it
        for (int i=0;i<n_rows;i++)
        {
            for (int fn=0;fn<n_fields;fn++)
                mxSetField(data_struct,i,fieldnames[fn],mxCreateDoubleScalar(i));
        }
        // put it in the cell array
        mxSetCell(cell_array,0,data_struct);
        mxArray *old_data_struct;
        old_data_struct = mxGetCell(cell_array,0);
        // create a new structure array 2x as large
        mxArray *new_data_struct = mxCreateStructMatrix(n_rows*2, 1, n_fields, fieldnames);
        // copy old stuff
        for (int i=0;i<n_rows;i++)
        {
            for (int fn=0;fn<n_fields;fn++)
                mxSetField(new_data_struct,i,fieldnames[fn],mxGetField(old_data_struct,i,fieldnames[fn]));
        }
        // put something in the expanded part
        for (int i=n_rows;i<2*n_rows;i++)
        {
            for (int fn=0;fn<n_fields;fn++)
                mxSetField(new_data_struct,i,fieldnames[fn],mxCreateDoubleScalar(i));
        }
        // put it in the cell array
        mxSetCell(cell_array,0,NULL);
        mxSetCell(cell_array,0,new_data_struct);
        // ****** if this line is uncommented, it crashes.
        // If it is commented-out, it does not crash
        mxDestroyArray(old_data_struct);
        // ******
        plhs[0] = cell_array;
    }
}
This line is incorrect and will eventually lead to a crash:
mxSetField(new_data_struct,i,fieldnames[fn],mxGetField(old_data_struct,i,fieldnames[fn]));
The reason is what I told you earlier, you can't copy mxArray pointers from one cell/struct array into another cell/struct array like this. Now the same mxArray variable exists in two different spots but MATLAB doesn't know it and hasn't bumped up the reference count to keep track of this extra copy. So eventually when MATLAB tries to destroy the last copy downstream in your code (maybe outside the mex function itself) it will access invalid memory and crash.
The best way to do this used to be as follows using the unofficial "reference count bump up" function mxCreateReference:
mxSetField(new_data_struct,i,fieldnames[fn],mxCreateReference(mxGetField(old_data_struct,i,fieldnames[fn])));
Now MATLAB has been told that you are putting the same mxArray variable into two different cell/struct array spots via the mxCreateReference call, so it won't destroy the mxArray until the very last copy is destroyed.
But, alas, MATLAB removed the mxCreateReference function from the API library several years ago so this is no longer an option. Your remaining options are
(1) Hack into the mxArray and bump up the reference count manually (not trivial to do but I know how to do it)
mxSetField(new_data_struct,i,fieldnames[fn],myCreateReference(mxGetField(old_data_struct,i,fieldnames[fn])));
(would need to write the hacking code for myCreateReference)
(2) Deep copy the mxArray.
mxSetField(new_data_struct,i,fieldnames[fn],mxDuplicateArray(mxGetField(old_data_struct,i,fieldnames[fn])));
Option (2) is really the only official way to do what you are trying to do with your code as it is currently written.
Having said all that, it looks like you are simply trying to extract the struct from the cell array, expand it, and then put the expanded version back into the cell array. As long as the cell array and struct were created by you in the mex routine, there is no need to copy any of the fields for this operation. Simply get the pointer to the struct with mxGetCell, realloc the data area of this struct, and add more stuff to it (or NULL out the extra unused data pointers). Nothing else needs to be done since you will be modifying the struct that is already inside the cell array in place.
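The grow-in-place step described above can be sketched with a toy data area in plain C. This is a model under stated assumptions, not the mex API: ToyStructArray and toy_grow_rows are hypothetical names, standard realloc stands in for mxRealloc, and the layout (one pointer slot per field, stored element by element) is assumed for illustration. The point it shows is that the old slots keep their pointers and the new slots are set to NULL before anything reads them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct array's data area:
   rows * n_fields pointer slots in one contiguous block. */
typedef struct {
    size_t rows;
    size_t n_fields;
    void **slots;   /* assumed layout: (row i, field f) lives at slots[i * n_fields + f] */
} ToyStructArray;

/* Grow to new_rows. Existing slots keep their pointers; new slots start as NULL.
   Returns 0 on success, -1 if the allocation fails (array left unchanged).
   Size overflow is not checked in this sketch. */
int toy_grow_rows(ToyStructArray *s, size_t new_rows)
{
    assert(s != NULL && s->n_fields > 0);
    if (new_rows <= s->rows)
        return 0;                              /* nothing to do */

    size_t old_count = s->rows * s->n_fields;
    size_t new_count = new_rows * s->n_fields;

    void **p = (void **)realloc(s->slots, new_count * sizeof *p);
    if (p == NULL)
        return -1;

    for (size_t k = old_count; k < new_count; k++)
        p[k] = NULL;                           /* never leave new slots uninitialized */

    s->slots = p;                              /* the block may have moved */
    s->rows = new_rows;
    return 0;
}
```

In the mex version the corresponding steps would be mxRealloc on the pointer from mxGetData, mxSetData with the result, and mxSetM with the new row count. Appending at the end like this only matches the element order for an n-by-1 struct array; growing the rows of a multi-column array would also require moving existing elements.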
NOTE: There is no need to test nlhs > 0 in order to put something in plhs[0]. The plhs[0] spot is always guaranteed to be available even if nlhs == 0. This is often the case when the caller wants the result to go into ans, or maybe the call was part of a function argument list, etc.
Also, in these two lines:
mxSetCell(cell_array,0,NULL);
mxSetCell(cell_array,0,new_data_struct);
there is no point in setting a pointer to NULL if you are going to set it to something else in the very next line. The first line of setting it to NULL accomplishes nothing.
Thank you, that makes sense now...and it does indeed solve the problem to insert a call to mxDuplicateArray.
I originally started out attempting to use realloc, but that was also failing horribly. Probably I had something else wrong with that. I may go back to that and see if I can get it working. I'm happy enough to have something working at the moment.
If you don't have very many variables involved and the data sizes are small, it probably doesn't matter much and using mxDuplicateArray will be adequate. But in the long run it would be best to understand why things are working the way they are so that you can use the realloc method for efficiency.
If you are interested, I can post an example that does not work using the realloc method. Probably something else relatively simple is wrong, but again I am not seeing it.
Sure, go ahead and post it and I will take a look.
Here is an example re-alloc based implementation that crashes. Again, probably something simple but I'm just not seeing what I'm doing wrong.
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mxArray *cell_array = NULL;
    const char * fieldnames[] = {"field1", "field2","field3","field4"};
    int n_fields = sizeof(fieldnames)/sizeof(*fieldnames);
    if (nlhs > 0)
    {
        // create cell array to return
        mwSize pi_dims[2];
        pi_dims[0]=1;
        pi_dims[1]=1;
        cell_array = mxCreateCellArray(2,pi_dims);
        int n_rows = 10000;
        // create structure matrix with n_rows
        mxArray *data_struct = mxCreateStructMatrix(n_rows, 1, n_fields, fieldnames);
        // put something in it
        for (int i=0;i<n_rows;i++)
        {
            for (int fn=0;fn<n_fields;fn++)
                mxSetField(data_struct,i,fieldnames[fn],mxCreateDoubleScalar(i));
        }
        // put it in the cell array
        mxSetCell(cell_array,0,data_struct);
        // now we grow it w/ realloc
        // get the structure from the cell array
        mxArray *old_data_struct;
        old_data_struct = mxGetCell(cell_array,0);
        const mwSize *sz = mxGetDimensions(old_data_struct);
        // double the size
        mwSize new_rows = sz[0]+n_rows;
        // reallocate the struct array to be larger
        mwSize new_size = new_rows * n_fields * sizeof(mxArray *);
        mxArray *re_allocated_data_struct_memory = (mxArray *)mxRealloc(mxGetData(old_data_struct), new_size);
        mxSetData(old_data_struct, re_allocated_data_struct_memory);
        mxSetM(old_data_struct, new_rows);
        // put something in the expanded part
        for (int i=sz[0];i<new_rows;i++)
        {
            for (int fn=0;fn<n_fields;fn++)
                mxSetField(old_data_struct,i,fieldnames[fn],mxCreateDoubleScalar(i));
        }
        plhs[0] = cell_array;
    }
}
I don't have time to look at this in detail yet, but this line
mxArray *re_allocated_data_struct_memory = (mxArray *)mxRealloc(mxGetData(old_data_struct), new_size);
should be this instead
mxArray **re_allocated_data_struct_memory = (mxArray **)mxRealloc(mxGetData(old_data_struct), new_size);
The data pointer should point to pointer_to_mxArray, not directly to mxArray. That is, the type of data that the data area holds is pointers to mxArray, not mxArray.
This seems to work without crashing. It seems you simply have bad loop indexes and weren't filling any extra data. This is because sz points at the actual dimensions of the struct array, not a copy of the dimensions. So calling mxSetM( ) will immediately change the values that sz points to. See my NOTES.
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mxArray *cell_array = NULL;
    const char * fieldnames[] = {"field1", "field2","field3","field4"};
    int n_fields = sizeof(fieldnames)/sizeof(*fieldnames);
    // if (nlhs > 0) // NOTE: Don't need this
    {
        // create cell array to return
        mwSize pi_dims[2];
        pi_dims[0]=1;
        pi_dims[1]=1;
        cell_array = mxCreateCellArray(2,pi_dims);
        int n_rows = 10000;
        // create structure matrix with n_rows
        mxArray *data_struct = mxCreateStructMatrix(n_rows, 1, n_fields, fieldnames);
        // put something in it
        for (int i=0;i<n_rows;i++)
        {
            for (int fn=0;fn<n_fields;fn++)
                mxSetField(data_struct,i,fieldnames[fn],mxCreateDoubleScalar(i));
        }
        // put it in the cell array
        mxSetCell(cell_array,0,data_struct);
        // now we grow it w/ realloc
        // get the structure from the cell array
        mxArray *old_data_struct;
        old_data_struct = mxGetCell(cell_array,0);
        const mwSize *sz = mxGetDimensions(old_data_struct);
        // double the size
        mwSize new_rows = sz[0]+n_rows;
        // reallocate the struct array to be larger
        mwSize new_size = new_rows * n_fields * sizeof(mxArray *);
        mxArray **re_allocated_data_struct_memory = (mxArray **)mxRealloc(mxGetData(old_data_struct), new_size); // NOTE: Changed to (mxArray **)
        mxSetData(old_data_struct, re_allocated_data_struct_memory);
        mxSetM(old_data_struct, new_rows); // NOTE: This changes the value of sz[0] to new_rows !!!
        // put something in the expanded part
        // for (int i=sz[0];i<new_rows;i++) // NOTE: This loop DOES NOTHING
        for (int i=n_rows;i<new_rows;i++) // NOTE: Changed sz[0] to n_rows
        {
            for (int fn=0;fn<n_fields;fn++)
                mxSetField(old_data_struct,i,fieldnames[fn],mxCreateDoubleScalar(i));
        }
        plhs[0] = cell_array;
    }
}
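The aliasing problem flagged in the NOTES above can be reproduced without MATLAB. The following is a minimal sketch in plain C; ToyArray, toy_get_dimensions and toy_set_m are hypothetical stand-ins that only mimic the behaviour described above (the dimensions pointer refers to the array's own storage, and setting M overwrites it in place).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an array header; NOT MATLAB's real mxArray. */
typedef struct {
    size_t dims[2];
} ToyArray;

/* Mimics the behaviour described for mxGetDimensions:
   the returned pointer aliases the array's own storage, it is not a copy. */
const size_t *toy_get_dimensions(const ToyArray *a)
{
    assert(a != NULL);
    return a->dims;
}

/* Mimics mxSetM: overwrites the row count in place. */
void toy_set_m(ToyArray *a, size_t m)
{
    assert(a != NULL);
    a->dims[0] = m;
}

/* Buggy: reads sz[0] after the resize, so the loop starts at new_rows. */
size_t rows_filled_buggy(ToyArray *a, size_t extra)
{
    const size_t *sz = toy_get_dimensions(a);
    size_t new_rows = sz[0] + extra;
    size_t filled = 0;
    toy_set_m(a, new_rows);                     /* sz[0] is now new_rows */
    for (size_t i = sz[0]; i < new_rows; i++)   /* zero iterations */
        filled++;
    return filled;
}

/* Fixed: copy the old row count into a local before resizing. */
size_t rows_filled_fixed(ToyArray *a, size_t extra)
{
    size_t old_rows = toy_get_dimensions(a)[0];
    size_t new_rows = old_rows + extra;
    size_t filled = 0;
    toy_set_m(a, new_rows);
    for (size_t i = old_rows; i < new_rows; i++)
        filled++;
    return filled;
}
```

For an array with 10000 rows grown by 10000, the buggy version fills 0 rows and the fixed version fills 10000. Copying sz[0] into a local before calling mxSetM gives the same protection in the real code.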
THAT'S what's wrong! Yes, it was something stupid.
Thank you!

Release

R2018a
