Error with splitting table into separate columns.

I am working with a 616x91 table that I am reorganising using:
x = table1(1:48:end,:);
y = x(:)
a = table1(2:48:end,:);
b = a(:)
and so on.... until
c = table1(48:48:end,:);
d = x(:)
Each data point is repeated (not an exact repition of data, just the data point) every 48 rows, hence the 48.
My issue is that each output variable (y,b,d etc.) variable should be a 1183x1 column however, the last 8 output variables output a 1092x1 column. I.e. the last 8 variables are:
m = table1(41:48:end,:); q = table1(48:48:end,:);
n = m(:) ......to..... p = q(:)
Can anyone explain why this is happening?

12 Comments

dpb
dpb on 28 Jul 2021
Edited: dpb on 28 Jul 2021
You don't have enough columns -- 2*48 = 96 and you have only 91...so your second selection is 5 columns fewer than the first.
I'd suggest it's not likely you need to actually create the new variables and certainly not as named-sequentially separate variables...use dynamic addressing to select the columns needed for working at a time or, at worst, use an array or cell array.
I don't really understand what you mean. My MATLAB skills are not good. Would you be able to explain further?
"Would you be able to explain further?"
In general, splitting data up makes it harder to process. It is usually easier to loop over one set of data and use indexing to select the required data for processing, or by using tools specifically designed for operating on groups within data sets:
Creating lots of separate variables is very unlikely to be a good approach.
Thank you. I will try this
I am still able to find a solution. If attach the data as an excel sheet would would it help?
Data generally helps for reading files, etc., but if it's just an array the data itself isn't of much import -- "parts are parts!".
What we really need to know is what your end purpose is in accessing the data in this fashion and what you would be doing with it if you could do what you envision.
The point there is, we can probably (almost certainly?) point you in a way to accomplish the end; there may not be any better way to do what you describe, but that is, as Stephen points out, almost certainly not "the MATLAB way" to implement whatever it is that you're try to code efficiently.
So, data may be of some help in that it would illustrate a particular case, but what we really need is a description of what the end result needs to be, not how to get there.
Thanks for your response.
I am only using MATLAB to rearrange the data in a way that each variable is comparable for each person for each day. I will be using Excel to display the data.
Currently, the data is in an Excel sheet, with the top row (from C1 to CP1) consisting of dates, so 91 in total. Column A is the person's name but as there are 48 data points for each person, each person has their name entered 48 times. There are 13 people in total so 624 rows. Take Person 1 for example, 'Person 1' is entered in A2 to A49. Then column B is the name of each data point, let's call them Variable1......Variable48, so column B consists of the name of each data point entered in B2 to B49.
'Person 2' is entered in A50 to A97 with the data point names used in B2 to B49 repeated in B50 to B97.
I am trying to move to data so that column A will consist of the date, column B will hold the person's name and row 1 will hold the names of the data points, with the data in the rows underneath.
For example, for the first day of data (20/03/2021), this date will be entered 13 times in column A (A2 to A14), in column B the names of each person will be entered (B2 to B14) i.e. Person 1 in B2, Person 2 in B3.......Person13 in B14. Then the data that each person reported for each data point will be in the next column.
This is to be repeated for each of the 91 days. So in the end I should have an 1183x48 table of data (excluding the names, dates column, and data point names row)
The code have been using is shown below. I understand that this is probably not a good way to approach this but it has worked for a sheet with only 9 data points per person, it will not work for me now.
I would really appriecate any help as I am totally lost as to why this won't work now.
x1 = table1(1:48:end,:)
y1 = x1(:)
x1 = table1(2:48:end,:)
y2 = x2(:)
x3 = table1(3:48:end,:)
y3 = x3(:)
x4 = table1(4:48:end,:)
y4 = x4(:)
%continue all the way to......
x48 = table1(48:48:end,:)
y48 = x48(:)
combined = [y1 y2 y3 ... y48]
I didn't say to not append a file... :)
What are you going to do in Excel that couldn't be done more simply in MATLAB?
Thanks again for your response
I'm not doing anything in Excel. The document has to be submit as an Excel file so I'll just copy the data across.
I would rather not attach the file as the data is sensitive. Can I offer a more detailed explanation as to what I need?
Just a small(ish) template that outlines the shape/content with randomized/obfuscated names, etc., is enough. It doesn't even have to be all 96 or 48 or whatever, just enough that somebody can load a file of the desired general structure without having to try to make it up for themselves.
We can try from the description alone, but having something that actually is in the actual format is much simpler than trying to figure it out...remember we don't have all the inate knowledge of the problem you have; it's harder to visualize from just the words although I'm sure it could be done; I'm personally not inclined to go try to build a sample spreadsheet from the description, sorry... :)
This is the problem. y2 should be the same size as y1.
I have attached the file, the first sheet is the data now. The seconds sheet is how I need the data do be displayed.

Sign in to comment.

 Accepted Answer

This is an application of unstack and then stack. The code looks a lot more complicated than it might if I hard-coded in sizes, as you might. It's really just a call to unstack, a call to stack, and some bookkeeping. Also, you've used the word "Variable", which is gonna get confused with how that word is used in the tabular lexicon, so I'm using "Measurement".
First, your file is somewhat broken. Easy enough to read, but the Person column is not complete. And there are mostly NaNs in the data. Here's what I did:
t1 = readtable("Example.xlsx", ...
"TextType","string", "ReadVariableNames",true, "PreserveVariableNames",true)
people = "Person " + (1:13)';
t1.("Person (First & Surname)") = repelem(people,48);
t1{:,3:end} = rand(height(t1),91);
But let me illustrate with something smaller.
>> people = ["Person1";"Person2";"Person3";"Person4"];
>> meas = ["Measurement1";"Measurement2";"Measurement3"];
>> dates = datetime(2121,7,29:34);
>> X = rand(length(people)*length(meas),length(dates));
>> t0 = table(repelem(people,length(meas),1), repmat(meas,length(people),1));
>> t1 = [t0 array2table(X)];
>> t1.Properties.VariableNames = ["Person" "Measurement" string(dates)]
t1 =
12×8 table
Person Measurement 29-Jul-2121 30-Jul-2121 31-Jul-2121 01-Aug-2121 02-Aug-2121 03-Aug-2121
_________ ______________ ___________ ___________ ___________ ___________ ___________ ___________
"Person1" "Measurement1" 0.86958 0.56376 0.96026 0.42636 0.75018 0.82838
"Person1" "Measurement2" 0.30749 0.26297 0.43406 0.53727 0.86967 0.39818
"Person1" "Measurement3" 0.30016 0.16764 0.36816 0.09737 0.15583 0.407
"Person2" "Measurement1" 0.26145 0.57806 0.70349 0.013457 0.86016 0.81714
"Person2" "Measurement2" 0.093242 0.33359 0.74725 0.077764 0.444 0.73318
"Person2" "Measurement3" 0.40713 0.54292 0.72929 0.42199 0.32273 0.12627
"Person3" "Measurement1" 0.58666 0.31563 0.25499 0.62665 0.22856 0.21707
"Person3" "Measurement2" 0.54294 0.36551 0.48211 0.11886 0.34332 0.077295
"Person3" "Measurement3" 0.55193 0.34819 0.58987 0.13769 0.81084 0.20005
"Person4" "Measurement1" 0.087758 0.14986 0.076465 0.47683 0.36707 0.25788
"Person4" "Measurement2" 0.13081 0.91179 0.58878 0.10389 0.53224 0.44917
"Person4" "Measurement3" 0.99408 0.81306 0.60691 0.14631 0.43607 0.72003
Person and Measurement are categorical data, they should be stored in categorical variables.
>> t1.("Person") = categorical(t1.("Person"));
>> t1.Measurement = categorical(t1.Measurement);
OK, this has data separately for each date, and data for each measurement stacked up within each date. You want the opposite. So unstack on Variable:
>> meas = unique(t1.Measurement,"stable");
>> nmeas = length(meas);
>> dates = datetime(t1.Properties.VariableNames(3:end));
>> ndates = length(dates);
>> nvarsUnstacked = nmeas * length(dates);
>> t2 = unstack(t1,2+(1:ndates),"Measurement")
t2 =
4×19 table
Person 29-Jul-2121_Measurement1 29-Jul-2121_Measurement2 29-Jul-2121_Measurement3 30-Jul-2121_Measurement1 30-Jul-2121_Measurement2 30-Jul-2121_Measurement3 31-Jul-2121_Measurement1 31-Jul-2121_Measurement2 31-Jul-2121_Measurement3 01-Aug-2121_Measurement1 01-Aug-2121_Measurement2 01-Aug-2121_Measurement3 02-Aug-2121_Measurement1 02-Aug-2121_Measurement2 02-Aug-2121_Measurement3 03-Aug-2121_Measurement1 03-Aug-2121_Measurement2 03-Aug-2121_Measurement3
_______ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________ ________________________
Person1 0.5417 0.10085 0.43765 0.25821 0.2993 0.2469 0.2124 0.66636 0.77015 0.43363 0.46433 0.49833 0.87624 0.49025 0.90014 0.44137 0.33786 0.49158
Person2 0.90169 0.47628 0.84116 0.43142 0.77569 0.59875 0.25473 0.83978 0.99591 0.34608 0.56922 0.33262 0.18738 0.50787 0.070031 0.62675 0.96446 0.50531
Person3 0.93431 0.57748 0.11194 0.51276 0.33596 0.25611 0.4042 0.92685 0.026795 0.82401 0.52511 0.90661 0.23808 0.42686 0.18892 0.28795 0.37948 0.38927
Person4 0.75668 0.79903 0.49847 0.14754 0.95279 0.18124 0.93929 0.72575 0.99809 0.35544 0.080383 0.14538 0.71777 0.13216 0.26415 0.94446 0.40165 0.36561
Now we just need to restack, this time dates within measurements. To do that, we'll need to group the variables in the table by the measurements they correspond to. There are other less arcane ways to do this, but this way is short.
>> varGroups = arrayfun(@(i) {1 + (i:nmeas:nvarsUnstacked)},1:nmeas)
varGroups =
1×3 cell array
{[2 5 8 11 14 17]} {[3 6 9 12 15 18]} {[4 7 10 13 16 19]}
Now we stack those groups of table variables.
>> t3 = stack(t2,varGroups,"NewDataVariableName",string(meas),"IndexVariableName","Date")
t3 =
24×5 table
Person Date Measurement1 Measurement2 Measurement3
_______ ____ ____________ ____________ ____________
Person1 2 0.5417 0.10085 0.43765
Person1 5 0.25821 0.2993 0.2469
Person1 8 0.2124 0.66636 0.77015
Person1 11 0.43363 0.46433 0.49833
Person1 14 0.87624 0.49025 0.90014
Person1 17 0.44137 0.33786 0.49158
Person2 2 0.90169 0.47628 0.84116
Person2 5 0.43142 0.77569 0.59875
Person2 8 0.25473 0.83978 0.99591
Person2 11 0.34608 0.56922 0.33262
Person2 14 0.18738 0.50787 0.070031
Person2 17 0.62675 0.96446 0.50531
Person3 2 0.93431 0.57748 0.11194
Person3 5 0.51276 0.33596 0.25611
Person3 8 0.4042 0.92685 0.026795
Person3 11 0.82401 0.52511 0.90661
Person3 14 0.23808 0.42686 0.18892
Person3 17 0.28795 0.37948 0.38927
Person4 2 0.75668 0.79903 0.49847
Person4 5 0.14754 0.95279 0.18124
Person4 8 0.93929 0.72575 0.99809
Person4 11 0.35544 0.080383 0.14538
Person4 14 0.71777 0.13216 0.26415
Person4 17 0.94446 0.40165 0.36561
Where's the dates go? They are there, or at least their var indices into t2 are. Fix that up, then add the weekday name, and sort on date.
>> dateInds = ceil(t3.Date/nmeas);
t3.Date = dates(dateInds)';
t3 = sortrows(t3,"Date");
t3 = addvars(t3,categorical(day(t3.Date,"name")),'After',"Date",'NewVariableName',"Weekday")
t3 =
24×6 table
Person Date Weekday Measurement1 Measurement2 Measurement3
_______ ___________ _________ ____________ ____________ ____________
Person1 29-Jul-2121 Tuesday 0.5417 0.10085 0.43765
Person2 29-Jul-2121 Tuesday 0.90169 0.47628 0.84116
Person3 29-Jul-2121 Tuesday 0.93431 0.57748 0.11194
Person4 29-Jul-2121 Tuesday 0.75668 0.79903 0.49847
Person1 30-Jul-2121 Wednesday 0.25821 0.2993 0.2469
Person2 30-Jul-2121 Wednesday 0.43142 0.77569 0.59875
Person3 30-Jul-2121 Wednesday 0.51276 0.33596 0.25611
Person4 30-Jul-2121 Wednesday 0.14754 0.95279 0.18124
Person1 31-Jul-2121 Thursday 0.2124 0.66636 0.77015
Person2 31-Jul-2121 Thursday 0.25473 0.83978 0.99591
Person3 31-Jul-2121 Thursday 0.4042 0.92685 0.026795
Person4 31-Jul-2121 Thursday 0.93929 0.72575 0.99809
Person1 01-Aug-2121 Friday 0.43363 0.46433 0.49833
Person2 01-Aug-2121 Friday 0.34608 0.56922 0.33262
Person3 01-Aug-2121 Friday 0.82401 0.52511 0.90661
Person4 01-Aug-2121 Friday 0.35544 0.080383 0.14538
Person1 02-Aug-2121 Saturday 0.87624 0.49025 0.90014
Person2 02-Aug-2121 Saturday 0.18738 0.50787 0.070031
Person3 02-Aug-2121 Saturday 0.23808 0.42686 0.18892
Person4 02-Aug-2121 Saturday 0.71777 0.13216 0.26415
Person1 03-Aug-2121 Sunday 0.44137 0.33786 0.49158
Person2 03-Aug-2121 Sunday 0.62675 0.96446 0.50531
Person3 03-Aug-2121 Sunday 0.28795 0.37948 0.38927
Person4 03-Aug-2121 Sunday 0.94446 0.40165 0.36561

4 Comments

This is incredibly helpful. I really appreciate the time it took you to help. Thank you
I get this error when I try to use your code?
Another issue I am experiencing, and it would explain the whole problem is that when I import the excel sheet, MATLAB sees a 616x91 array instead of a 624x91 array. I cannot understand why this is happening?
As I said, you have issues in your spreadsheet. I can only assume that you are now using something where "13" and "48" are not the right values.

Sign in to comment.

More Answers (0)

Products

Release

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!