pulling non-consistent arrays out of a structure

Hello, I am trying to pull arrays out of a structure previously defined in my code and then find the median behavior, so I can plot it alongside my individual objects' behaviors. The overall goal is to track the median behavior of all objects over the time of the experiment.
Let me create an example via words, as I do not know how to recapitulate via code. Objects 1-5 were detected over the whole length of the experiment, so they have all data points. Objects 6-9 had varying degrees of detection, but were detected for more than 70% of the experiment, and so are considered 'tracked well enough to retain'.
The problem is that when I attempt horzcat, or any other way of pulling data out of the structure, MATLAB throws an error because of the inconsistent sizes. The overall idea I have is to use a 'full detection' object as a reference for all other objects, and if an image frame is missing from a certain object's array, to fill the row that should have held it with NaN. Then all arrays would be consistent, and I could pull them out of their individual sections within the structure and take a collective median.
While I don't know how to do most of that, attached is the structure, and what I would like to do with the structure once the NaN is inserted into different objects. If there is a better or more efficient way to do this, perhaps without even inserting the NaN, please let me know!!
load 'DataStruct.mat'
% Note that Data.m was used to construct two different populations, Data.PC3
% and .MDA, disregard Data.m at this point
%once NaN inserted, separate x and y of each object within the structure
%for easier extraction
idxPC3 = [5 6 7 8 9];
for i = 1:length(idxPC3)
for j = 1:length(Data.m{idxPC3(i)}.n)
if i == 1 && j == 1 %initialize only on the very first array, otherwise every j would overwrite
Data.PC3.x = Data.m{idxPC3(i)}.n{1,j}(:,1);
Data.PC3.y = Data.m{idxPC3(i)}.n{1,j}(:,2);
else
Data.PC3.x = horzcat(Data.PC3.x, Data.m{idxPC3(i)}.n{1,j}(:,1));
Data.PC3.y = horzcat(Data.PC3.y, Data.m{idxPC3(i)}.n{1,j}(:,2));
end
end
end
MedPC3= median(Data.PC3.y, 2) %the median,2 is used for median behavior across time points, not at the median time point
plot(Data.PC3.x, MedPC3) %plot the median behavior with respect to time (x data points)
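For anyone without DataStruct.mat, the concatenation failure described above can be reproduced with two made-up column vectors of different heights. This is only a sketch: a and b are hypothetical stand-ins for the tracks of a fully detected and a partially detected object.

```matlab
a = (1:5).';   % hypothetical object detected in all 5 frames
b = (1:3).';   % hypothetical object detected in only 3 frames
failed = false;
try
    c = horzcat(a, b);   % columns of unequal height cannot be placed side by side
catch err
    failed = true;       % horzcat refuses to build a ragged matrix
    disp(err.message)
end
```

Any fix therefore has to bring every track to the same number of rows before concatenating.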

 Accepted Answer

Here's a way you can extract and concatenate all the x,y data. However, I don't understand what you are trying to do with the median operation. Since x(:,t) and y(:,t) both vary with t, taking the median of y(k,t) across t will not give a median value corresponding to a well-defined x-coordinate. You would need to do some sort of interpolation of the x,y data onto a common x-axis.
load DataStruct
idxPC3 =5:9;
data=Data.m(idxPC3);
data=[data{:}];
data=[data.n];
L=max(cellfun(@height,data));
for i=1:numel(data)
data{i}(end+1:L,:)=nan;
end
data=cell2mat(data);
x=data(:,1:2:end);
y=data(:,2:2:end);
whos x y
  Name       Size            Bytes  Class     Attributes

  x         79x133           84056  double
  y         79x133           84056  double
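To see what the padding loop does without the attached file, here is a minimal sketch on made-up data. The cell array tracks and its values are assumptions for illustration only; each cell plays the role of one [x y] array from Data.m.

```matlab
% Hypothetical toy tracks: each cell is [x y]; the second one misses x = 2
tracks = {[1 10; 2 20; 3 30], [1 11; 3 31], [1 12; 2 22; 3 32]};
L = max(cellfun(@(t) size(t,1), tracks));   % height of the tallest track
for i = 1:numel(tracks)
    tracks{i}(end+1:L,:) = nan;             % append NaN rows at the bottom only
end
M = cell2mat(tracks);                       % L-by-(2*numel(tracks)) matrix
x = M(:,1:2:end);                           % odd columns hold x
y = M(:,2:2:end);                           % even columns hold y
```

In this toy case the second track's sample at x = 3 ends up in row 2, next to the other tracks' samples at x = 2. That is why a row-wise median of y would mix different x values unless the rows are aligned or the data are interpolated first.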

8 Comments

"You would need to do some sort of interpolation of the x,y data onto a common x-axis."
One way to do that would be, e.g.,
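xcommon=linspace(min(x(:)), max(x(:)))'; %common x-axis spanning all tracks (min/max ignore NaN)
clear ycommon
for k=width(y):-1:1
valid=~isnan(x(:,k)) & ~isnan(y(:,k)); %drop the NaN padding, interp1 needs finite sample points
ycommon(:,k)=interp1(x(valid,k), y(valid,k), xcommon); %NaN outside the range of track k
end
ymedian=median(ycommon,2);
plot(xcommon, ymedian)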
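A small self-contained check of the interpolation idea, using made-up numbers rather than the attached data. The two toy tracks are assumptions chosen so that both follow y = 2*x but are sampled at different x values. The 'omitnan' flag is my addition, so that rows where a track does not reach still get a median from the remaining tracks.

```matlab
% Hypothetical toy data: columns are tracks, the trailing NaN is padding
x = [0 0.5; 1 1.5; 2 2.5; 3 NaN];
y = [0 1;   2 3;   4 5;   6 NaN];
xcommon = linspace(min(x(:)), max(x(:)), 7).';     % common axis from 0 to 3
ycommon = nan(numel(xcommon), size(y,2));
for k = 1:size(y,2)
    ok = ~isnan(x(:,k)) & ~isnan(y(:,k));          % keep only real samples
    ycommon(:,k) = interp1(x(ok,k), y(ok,k), xcommon);   % NaN outside the track's range
end
ymedian = median(ycommon, 2, 'omitnan');           % row-wise median, ignoring NaN
```

Because both toy tracks lie on the same line, ymedian should follow 2*xcommon, including at the two ends where only the first track has data.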
First, thank you for responding @Matt J. I never knew that a structure's levels could be condensed in such a succinct manner. However, for the loop:
for i=1:numel(data)
data{i}(end+1:L,:)=nan;
end
I was hoping to instead say: using object 1, which has the max cell height, does object 2 match that cell height? If not, figure out which row does not, and put a NaN specifically in that row. That way, when taking the median across the columns, it should (if I understand that function) only consider the columns of a given row that belong to the 'y' variable to create a median of that row. That way you will get a median for every row, and every row corresponds to a different time point, so you get an overall median behavior over time.
If that is correct, putting the NaN at the end would absolutely combine incorrect time points as you suggested, and the only way past that is to interpolate. But we would still need to consider putting the NaN at the relevant time gaps, correct?
Thanks,
Nick
"if not, figure out which row does not, and put a NaN specifically in that row... That way you will get a median for every row, and every row corresponds to a different time point, so you get an overall median behavior over time."
But that seems to assume that the x-values x(:,t) will always be an exact subset of x(:,1). I wouldn't have expected that could be true, but if it's really true, the loop can be modified to,
data=Data.m(idxPC3);
data=[data{:}];
data=[data.n];
x1=data{1}(:,1);
ynan=nan(size(x1));
for i=1:numel(data)
x=data{i}(:,1);
y1=ynan;
I=ismember(x1,x); %possibly you need to use ismembertol ?
assert(sum(I)==length(x),'Assumption failed: x(:,t) is not a perfect subset of x(:,1)')
y1(I)=data{i}(I,2);
data{i}=[x1,y1];
end
data=cell2mat(data);
x=data(:,1:2:end);
y=data(:,2:2:end);
All x(:,t) should* always be a subset of the max(height(x(:,t))), as the maximum height x(:,t) should* be detected during all image frames, and everything else would be all image frames or less. Thank you for teaching me how to do this @Matt J! It was invaluable!
This answer was inspired and modified from an original answer by @Matt J! The original intent was to place NaN where data points were missing with reference to a global time stamp. There were 3 main issues that prevented this with the most recent code (in the comments). For those who may need this in the future, here are the issues, followed by the solution at the end of this breakdown.
The code failed at run time, as the line:
y1(I)=data{i}(I,2);
caused an error when x did not exactly equal x1: length(I) was greater than length(x), so it attempted to pull from rows that did not exist. The built-in assertion did not catch that as written.
Additionally, the first array in data, which generated x1, was not necessarily the longest, which contributed to the previous issue.
Finally, the previous code put the NaNs at the end of the array, which would not tell me which timestep they pertain to. I am not as well versed as Matt J, so I did not know how to fix these elegantly. I am sure there are ways to shorten this and to avoid the nested for and if statements, but they are above my current coding knowledge. The solution to all of those issues, along with a plot based on all non-NaN data (the end goal), is below. Thank you again Matt J for your help with a lot of these steps, I couldn't have done it without you.
celldata_PC3=Data.m(idxPC3);
celldata_PC3=[celldata_PC3{:}];
celldata_PC3=[celldata_PC3.n];
for i = 1:length(celldata_PC3)
a{i} = length(celldata_PC3{1,i});
end
b = max(cell2mat(a));
b_indx = find(b == cell2mat(a));
x1=celldata_PC3{b_indx(1)}(:,1); %this will be the global timestep all other x will be
% compared to to define where NaN should be located.
for i=1:numel(celldata_PC3)
xPC3=celldata_PC3{i}(:,1);
yPC3=celldata_PC3{i}(:,2);
I=ismember(x1,xPC3);
assert(sum(I)==length(xPC3),'Assumption failed: x(:,t) is not a perfect subset of x(:,1)')
NanIdx = find(I == 0);
Fixed_x = xPC3; %initialize so that for loop can detect initial length
Fixed_y = yPC3; %initialize so that for loop can detect initial length
for j = 1:length(NanIdx)
if ismember(1, NanIdx) %Using these nested if statements are necessary for 3 conditions of NaN location
if NanIdx(j) == 1
Fixed_x = [NaN; xPC3];
Fixed_y = [NaN; yPC3];
else
if NanIdx(j)>length(Fixed_x)
Fixed_x = [Fixed_x; NaN];
Fixed_y = [Fixed_y; NaN];
else
Fixed_x = [Fixed_x(1:NanIdx(j)-1); NaN; Fixed_x(NanIdx(j):end)];
Fixed_y = [Fixed_y(1:NanIdx(j)-1); NaN; Fixed_y(NanIdx(j):end)];
end
end
else
if NanIdx(j)>length(Fixed_x)
Fixed_x = [Fixed_x; NaN];
Fixed_y = [Fixed_y; NaN];
else
Fixed_x = [Fixed_x(1:NanIdx(j)-1); NaN; Fixed_x(NanIdx(j):end)];
Fixed_y = [Fixed_y(1:NanIdx(j)-1); NaN; Fixed_y(NanIdx(j):end)];
end
end
end
Fixed_celldata_PC3{i}=[Fixed_x,Fixed_y];
end
Fixed_celldata_PC3 = [Fixed_celldata_PC3{:}];
all_xPC3 =Fixed_celldata_PC3(:,1:2:end);
all_yPC3 =Fixed_celldata_PC3(:,2:2:end);
figure
hold on
for i = 1:size(all_xPC3,2) %one iteration per track (column); length() would return the larger dimension
plot(all_xPC3(:,i), all_yPC3(:,i), 'Color', [1 0 0 0.2], 'HandleVisibility', 'off')
end
medAlexaPC3 = median(all_yPC3,2, 'omitmissing');
plot(x1, medAlexaPC3, 'Color', [0.75 0 0], 'LineWidth', 3, 'LineStyle', ':', 'DisplayName', 'PC3 Median')
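As a possible shortening of the nested for/if logic above, the second output of ismember gives the row of x1 that each detected frame belongs to, so the values can be written straight into a NaN-filled column. This is only a sketch on made-up data: tracks, Y and medY are hypothetical names, it assumes every track's frames are an exact subset of the reference (ismembertol may be needed for non-integer time stamps), and it uses 'omitnan' in place of 'omitmissing' so that it also runs on older releases.

```matlab
% Hypothetical toy tracks: each cell is [frame value]; track 2 misses frames 1 and 3
tracks = {[1 10; 2 20; 3 30; 4 40], [2 21; 4 41], [1 12; 2 22; 3 32; 4 42]};
[~, longest] = max(cellfun(@(t) size(t,1), tracks));
x1 = tracks{longest}(:,1);               % reference frames taken from the tallest track
Y = nan(numel(x1), numel(tracks));       % one column per track, NaN = frame not detected
for i = 1:numel(tracks)
    [found, row] = ismember(tracks{i}(:,1), x1);   % row(k) is where frame k sits in x1
    assert(all(found), 'Track %d has frames that are not in the reference', i)
    Y(row, i) = tracks{i}(:,2);          % each value goes to the row of its own frame
end
medY = median(Y, 2, 'omitnan');          % per-frame median over the detected tracks
```

In the toy data the NaNs land in rows 1 and 3 of the second column, i.e. at the missing frames rather than at the end, and medY can then be plotted against x1 as in the plotting code above.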
@Stephen23 why was this moved and the accepted answer changed? The previous answer did not completely and explicitly answer the question. Those who do not know to look at the comments will not, as the answer was accepted 'as is' due to your intervention.
"why was this moved and the accepted answer changed?"
To give the appropriate credit to Matt J for their effort volunteering their time helping you, because as you stated: "This answer was inspired and modified from an original answer by @Matt J!"
"the previous answer did not completely and explicitly answer the question..."
It is very common that a first attempt at answering a question does not work, and that a suitable resolution is found only after some discussion (e.g. in the comments). In such cases, the answer that triggered the resolution is commonly accepted in recognition and support of the user who initiated that approach.
@Stephen23 ok. That was the purpose of mentioning them and their response (to give credit and thanks), but if that's the standard, onlookers may need to be educated to look at comments. I typically find comments to be long, spurious to the topic, or from random folks who want their own question answered instead of creating a new thread, so after many instances of wasted time on various topics, I have learned to typically disregard comments, especially if there are 5+ comments in the comment section.
Hence burying the completely polished answer does a disservice to folks who need help with specific topics like this, just in an effort to attribute an answer to someone who was already mentioned and accredited for their input. If this comes off as hostile, it is not my intention; I am looking at this from the perspective that helping people find a complete answer quickly and accurately is paramount over redistributing credit when the person was mentioned within the more polished answer.

More Answers (1)

If you're using release R2023b or later, the resize function may be of use. Let's make some sample data.
data = {(1:3).', (4:8).', [9; 10], (11:17).'}
data = 1x4 cell array
{3x1 double} {5x1 double} {2x1 double} {7x1 double}
Let's see what the tallest vector is among those stored in the cells of data.
L=max(cellfun(@height,data))
L = 7
Now we resize each vector in data to that height, filling with NaN. Since the output of resize is not scalar, we need to store them back in a cell array using the UniformOutput name-value argument to cellfun.
data = cellfun(@(x) resize(x, L, FillValue=NaN), data, UniformOutput=false)
data = 1x4 cell array
{7x1 double} {7x1 double} {7x1 double} {7x1 double}
What does each cell look like?
celldisp(data)
data{1} =
     1
     2
     3
   NaN
   NaN
   NaN
   NaN

data{2} =
     4
     5
     6
     7
     8
   NaN
   NaN

data{3} =
     9
    10
   NaN
   NaN
   NaN
   NaN
   NaN

data{4} =
    11
    12
    13
    14
    15
    16
    17
Finally we can concatenate them all together.
D = [data{:}]
D = 7x4
     1     4     9    11
     2     5    10    12
     3     6   NaN    13
   NaN     7   NaN    14
   NaN     8   NaN    15
   NaN   NaN   NaN    16
   NaN   NaN   NaN    17

2 Comments

Matt J
Matt J on 14 May 2024
Edited: Matt J on 14 May 2024
But Steve's solution simply pads the data with trailing NaNs. In your comment to my original answer, you said that this is not what you want.
You are correct, I misunderstood Steve's code: it does not insert the NaNs in the middle of the data if a middle data piece is missing, so it does not address the specific question we wished to answer. Thanks yet again @Matt J :).

Release

R2023a
