Error with splitting table into separate columns.

Question

0 votes

I am working with a 616x91 table that I am reorganising using:

x = table1(1:48:end,:);
y = x(:)
a = table1(2:48:end,:);
b = a(:)

and so on.... until

c = table1(48:48:end,:);
d = x(:)

Each data point is repeated (not an exact repition of data, just the data point) every 48 rows, hence the 48.

My issue is that each output variable (y,b,d etc.) variable should be a 1183x1 column however, the last 8 output variables output a 1092x1 column. I.e. the last 8 variables are:

m = table1(41:48:end,:);                                q = table1(48:48:end,:);   
n = m(:)                                    ......to.....       p = q(:) 

Can anyone explain why this is happening?

12 Comments
Show 10 older comments Hide 10 older comments

ANDREW Feenan on 29 Jul 2021

Open in MATLAB Online

Thanks for your response.

I am only using MATLAB to rearrange the data in a way that each variable is comparable for each person for each day. I will be using Excel to display the data.

Currently, the data is in an Excel sheet, with the top row (from C1 to CP1) consisting of dates, so 91 in total. Column A is the person's name but as there are 48 data points for each person, each person has their name entered 48 times. There are 13 people in total so 624 rows. Take Person 1 for example, 'Person 1' is entered in A2 to A49. Then column B is the name of each data point, let's call them Variable1......Variable48, so column B consists of the name of each data point entered in B2 to B49.

'Person 2' is entered in A50 to A97 with the data point names used in B2 to B49 repeated in B50 to B97.

I am trying to move to data so that column A will consist of the date, column B will hold the person's name and row 1 will hold the names of the data points, with the data in the rows underneath.

For example, for the first day of data (20/03/2021), this date will be entered 13 times in column A (A2 to A14), in column B the names of each person will be entered (B2 to B14) i.e. Person 1 in B2, Person 2 in B3.......Person13 in B14. Then the data that each person reported for each data point will be in the next column.

This is to be repeated for each of the 91 days. So in the end I should have an 1183x48 table of data (excluding the names, dates column, and data point names row)

The code have been using is shown below. I understand that this is probably not a good way to approach this but it has worked for a sheet with only 9 data points per person, it will not work for me now.

I would really appriecate any help as I am totally lost as to why this won't work now.

x1 = table1(1:48:end,:)
y1 = x1(:)
x1 = table1(2:48:end,:)
y2 = x2(:)
x3 = table1(3:48:end,:)
y3 = x3(:)
x4 = table1(4:48:end,:)
y4 = x4(:)
%continue all the way to......
x48 = table1(48:48:end,:)
y48 = x48(:)
combined = [y1 y2 y3 ... y48]

ANDREW Feenan on 29 Jul 2021

This is the problem. y2 should be the same size as y1.

ANDREW Feenan on 29 Jul 2021

Example.xlsx

I have attached the file, the first sheet is the data now. The seconds sheet is how I need the data do be displayed.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Peter Perkins on 29 Jul 2021

Open in MATLAB Online

0 votes

This is an application of unstack and then stack. The code looks a lot more complicated than it might if I hard-coded in sizes, as you might. It's really just a call to unstack, a call to stack, and some bookkeeping. Also, you've used the word "Variable", which is gonna get confused with how that word is used in the tabular lexicon, so I'm using "Measurement".

First, your file is somewhat broken. Easy enough to read, but the Person column is not complete. And there are mostly NaNs in the data. Here's what I did:

t1 = readtable("Example.xlsx", ...
    "TextType","string", "ReadVariableNames",true, "PreserveVariableNames",true)
people = "Person " + (1:13)';
t1.("Person (First & Surname)") = repelem(people,48);
t1{:,3:end} = rand(height(t1),91);

But let me illustrate with something smaller.

>> people = ["Person1";"Person2";"Person3";"Person4"];
>> meas = ["Measurement1";"Measurement2";"Measurement3"];
>> dates = datetime(2121,7,29:34);
>> X = rand(length(people)*length(meas),length(dates));
>> t0 = table(repelem(people,length(meas),1), repmat(meas,length(people),1));
>> t1 = [t0 array2table(X)];
>> t1.Properties.VariableNames = ["Person" "Measurement" string(dates)]
t1 =
  12×8 table
     Person       Measurement      29-Jul-2121    30-Jul-2121    31-Jul-2121    01-Aug-2121    02-Aug-2121    03-Aug-2121
    _________    ______________    ___________    ___________    ___________    ___________    ___________    ___________
    "Person1"    "Measurement1"      0.86958        0.56376        0.96026        0.42636        0.75018        0.82838  
    "Person1"    "Measurement2"      0.30749        0.26297        0.43406        0.53727        0.86967        0.39818  
    "Person1"    "Measurement3"      0.30016        0.16764        0.36816        0.09737        0.15583          0.407  
    "Person2"    "Measurement1"      0.26145        0.57806        0.70349       0.013457        0.86016        0.81714  
    "Person2"    "Measurement2"     0.093242        0.33359        0.74725       0.077764          0.444        0.73318  
    "Person2"    "Measurement3"      0.40713        0.54292        0.72929        0.42199        0.32273        0.12627  
    "Person3"    "Measurement1"      0.58666        0.31563        0.25499        0.62665        0.22856        0.21707  
    "Person3"    "Measurement2"      0.54294        0.36551        0.48211        0.11886        0.34332       0.077295  
    "Person3"    "Measurement3"      0.55193        0.34819        0.58987        0.13769        0.81084        0.20005  
    "Person4"    "Measurement1"     0.087758        0.14986       0.076465        0.47683        0.36707        0.25788  
    "Person4"    "Measurement2"      0.13081        0.91179        0.58878        0.10389        0.53224        0.44917  
    "Person4"    "Measurement3"      0.99408        0.81306        0.60691        0.14631        0.43607        0.72003  

Person and Measurement are categorical data, they should be stored in categorical variables.

>> t1.("Person") = categorical(t1.("Person"));
>> t1.Measurement = categorical(t1.Measurement);

OK, this has data separately for each date, and data for each measurement stacked up within each date. You want the opposite. So unstack on Variable:

>> meas = unique(t1.Measurement,"stable");
>> nmeas = length(meas);
>> dates = datetime(t1.Properties.VariableNames(3:end));
>> ndates = length(dates);
>> nvarsUnstacked = nmeas * length(dates);
>> t2 = unstack(t1,2+(1:ndates),"Measurement")
t2 =
  4×19 table
    Person     29-Jul-2121_Measurement1    29-Jul-2121_Measurement2    29-Jul-2121_Measurement3    30-Jul-2121_Measurement1    30-Jul-2121_Measurement2    30-Jul-2121_Measurement3    31-Jul-2121_Measurement1    31-Jul-2121_Measurement2    31-Jul-2121_Measurement3    01-Aug-2121_Measurement1    01-Aug-2121_Measurement2    01-Aug-2121_Measurement3    02-Aug-2121_Measurement1    02-Aug-2121_Measurement2    02-Aug-2121_Measurement3    03-Aug-2121_Measurement1    03-Aug-2121_Measurement2    03-Aug-2121_Measurement3
    _______    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________    ________________________
    Person1             0.5417                     0.10085                     0.43765                     0.25821                      0.2993                      0.2469                      0.2124                     0.66636                      0.77015                    0.43363                      0.46433                    0.49833                     0.87624                     0.49025                      0.90014                    0.44137                     0.33786                     0.49158         
    Person2            0.90169                     0.47628                     0.84116                     0.43142                     0.77569                     0.59875                     0.25473                     0.83978                      0.99591                    0.34608                      0.56922                    0.33262                     0.18738                     0.50787                     0.070031                    0.62675                     0.96446                     0.50531         
    Person3            0.93431                     0.57748                     0.11194                     0.51276                     0.33596                     0.25611                      0.4042                     0.92685                     0.026795                    0.82401                      0.52511                    0.90661                     0.23808                     0.42686                      0.18892                    0.28795                     0.37948                     0.38927         
    Person4            0.75668                     0.79903                     0.49847                     0.14754                     0.95279                     0.18124                     0.93929                     0.72575                      0.99809                    0.35544                     0.080383                    0.14538                     0.71777                     0.13216                      0.26415                    0.94446                     0.40165                     0.36561         

Now we just need to restack, this time dates within measurements. To do that, we'll need to group the variables in the table by the measurements they correspond to. There are other less arcane ways to do this, but this way is short.

>> varGroups = arrayfun(@(i) {1 + (i:nmeas:nvarsUnstacked)},1:nmeas)
varGroups =
  1×3 cell array
    {[2 5 8 11 14 17]}    {[3 6 9 12 15 18]}    {[4 7 10 13 16 19]}

Now we stack those groups of table variables.

>> t3 = stack(t2,varGroups,"NewDataVariableName",string(meas),"IndexVariableName","Date")
t3 =
  24×5 table
    Person     Date    Measurement1    Measurement2    Measurement3
    _______    ____    ____________    ____________    ____________
    Person1      2        0.5417          0.10085         0.43765  
    Person1      5       0.25821           0.2993          0.2469  
    Person1      8        0.2124          0.66636         0.77015  
    Person1     11       0.43363          0.46433         0.49833  
    Person1     14       0.87624          0.49025         0.90014  
    Person1     17       0.44137          0.33786         0.49158  
    Person2      2       0.90169          0.47628         0.84116  
    Person2      5       0.43142          0.77569         0.59875  
    Person2      8       0.25473          0.83978         0.99591  
    Person2     11       0.34608          0.56922         0.33262  
    Person2     14       0.18738          0.50787        0.070031  
    Person2     17       0.62675          0.96446         0.50531  
    Person3      2       0.93431          0.57748         0.11194  
    Person3      5       0.51276          0.33596         0.25611  
    Person3      8        0.4042          0.92685        0.026795  
    Person3     11       0.82401          0.52511         0.90661  
    Person3     14       0.23808          0.42686         0.18892  
    Person3     17       0.28795          0.37948         0.38927  
    Person4      2       0.75668          0.79903         0.49847  
    Person4      5       0.14754          0.95279         0.18124  
    Person4      8       0.93929          0.72575         0.99809  
    Person4     11       0.35544         0.080383         0.14538  
    Person4     14       0.71777          0.13216         0.26415  
    Person4     17       0.94446          0.40165         0.36561  

Where's the dates go? They are there, or at least their var indices into t2 are. Fix that up, then add the weekday name, and sort on date.

>> dateInds = ceil(t3.Date/nmeas);
t3.Date = dates(dateInds)';
t3 = sortrows(t3,"Date");
t3 = addvars(t3,categorical(day(t3.Date,"name")),'After',"Date",'NewVariableName',"Weekday")
t3 =
  24×6 table
    Person        Date         Weekday     Measurement1    Measurement2    Measurement3
    _______    ___________    _________    ____________    ____________    ____________
    Person1    29-Jul-2121    Tuesday         0.5417          0.10085         0.43765  
    Person2    29-Jul-2121    Tuesday        0.90169          0.47628         0.84116  
    Person3    29-Jul-2121    Tuesday        0.93431          0.57748         0.11194  
    Person4    29-Jul-2121    Tuesday        0.75668          0.79903         0.49847  
    Person1    30-Jul-2121    Wednesday      0.25821           0.2993          0.2469  
    Person2    30-Jul-2121    Wednesday      0.43142          0.77569         0.59875  
    Person3    30-Jul-2121    Wednesday      0.51276          0.33596         0.25611  
    Person4    30-Jul-2121    Wednesday      0.14754          0.95279         0.18124  
    Person1    31-Jul-2121    Thursday        0.2124          0.66636         0.77015  
    Person2    31-Jul-2121    Thursday       0.25473          0.83978         0.99591  
    Person3    31-Jul-2121    Thursday        0.4042          0.92685        0.026795  
    Person4    31-Jul-2121    Thursday       0.93929          0.72575         0.99809  
    Person1    01-Aug-2121    Friday         0.43363          0.46433         0.49833  
    Person2    01-Aug-2121    Friday         0.34608          0.56922         0.33262  
    Person3    01-Aug-2121    Friday         0.82401          0.52511         0.90661  
    Person4    01-Aug-2121    Friday         0.35544         0.080383         0.14538  
    Person1    02-Aug-2121    Saturday       0.87624          0.49025         0.90014  
    Person2    02-Aug-2121    Saturday       0.18738          0.50787        0.070031  
    Person3    02-Aug-2121    Saturday       0.23808          0.42686         0.18892  
    Person4    02-Aug-2121    Saturday       0.71777          0.13216         0.26415  
    Person1    03-Aug-2121    Sunday         0.44137          0.33786         0.49158  
    Person2    03-Aug-2121    Sunday         0.62675          0.96446         0.50531  
    Person3    03-Aug-2121    Sunday         0.28795          0.37948         0.38927  
    Person4    03-Aug-2121    Sunday         0.94446          0.40165         0.36561  

4 Comments
Show 2 older comments Hide 2 older comments

ANDREW Feenan on 29 Jul 2021

Another issue I am experiencing, and it would explain the whole problem is that when I import the excel sheet, MATLAB sees a 616x91 array instead of a 624x91 array. I cannot understand why this is happening?

Peter Perkins on 30 Jul 2021

As I said, you have issues in your spreadsheet. I can only assume that you are now using something where "13" and "48" are not the right values.

Sign in to comment.

Error with splitting table into separate columns.

12 Comments
Show 10 older comments Hide 10 older comments

Accepted Answer

4 Comments
Show 2 older comments Hide 2 older comments

More Answers (0)

Categories

Products

Release

Tags

Community Treasure Hunt

Error with splitting table into separate columns.

12 Comments Show 10 older comments Hide 10 older comments

Accepted Answer

4 Comments Show 2 older comments Hide 2 older comments

More Answers (0)

Categories

Products

Release

Tags

See Also

Community Treasure Hunt

12 Comments
Show 10 older comments Hide 10 older comments

4 Comments
Show 2 older comments Hide 2 older comments