Excel sheet data upload "Invalid 'DataRange'. The column size must match the number of variables".

30 views (last 30 days)
Error:
"Invalid 'DataRange'. The column size must match the number of variables".
I imported xlsx (excel) data into matlab then generated a script after which I copied into the beginning of a different script i wrote but I get the above error. Can you help me fix this? Why do I get this error?

Accepted Answer

Walter Roberson
Walter Roberson on 26 Jan 2023
opts.DataRange = "A2:AT78";
That requests that column "A" to "AT" be read in. However, M1_New_Macro1.xlsx only has data up to column "AS"
The headers in the xlsx file do not align with the content of the data. Observe that column A is titled "cut" and contains a file name, and column B is titled "Count" and contains integer numbers (plausibly counts of something.) Then look at column L "Slice" which is empty, then column M which named "Count" but contains file names. The Total Area in column C contains fractions but the Total Area in column N contains only integers (looking like a count)
It looks to me as if there are blank columns in the data but that the headers in row 1 do not take that into account.
When you look at the variable named in the code, it is not obvious to me that the variable names match up to the data in the file.
Starting in row 34 the file contains additional tables that are not in the same format as the data above that.
You should probably be using readcell() instead of readtable(), and extracting the data you need from the resulting cells.
  12 Comments
Walter Roberson
Walter Roberson on 29 Jan 2023
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/1274435/M1_New_Macro1.xlsx';
C = readcell(filename);
nmask = cellfun(@isnumeric, C);
spy(nmask)
L = bwlabel(nmask, 4); %4 connected
stats = regionprops(L, 'BoundingBox');
bb = ceil(vertcat(stats.BoundingBox)); %ceil is important here!
blobs = {};
bloblocs = {};
for row = 1 : size(bb,1)
sc = bb(row,1);
sr = bb(row,2);
ec = sc + bb(row,3) - 1;
er = sr + bb(row,4) - 1;
blobcols = sc:ec;
blob = cell2mat(C(sr:er, blobcols));
need4 = blob ~= round(blob,3);
col_needs_4 = any(need4, 1);
extract = blob(:,col_needs_4);
if ~isempty(extract)
blobs{end+1} = extract;
usedcols = blobcols(col_needs_4);
bloblocs{end+1} = [sr er usedcols];
end
end
bloblocs
bloblocs = 1×14 cell array
{[2 13 6 7 11]} {[19 30 6 7 11]} {[2 13 18 19 23]} {[19 30 18 19 23]} {[2 13 30 31 35]} {[19 30 30 31 35]} {[2 13 37 38 39 40 41]} {[19 30 38 39 40 41]} {[67 78 37 38 39 40 41]} {[2 13 44 45]} {[19 30 44 45]} {[35 46 44 45]} {[51 62 44 45]} {[67 78 44 45]}
The code is able to detect 14 different blocks of numeric data, and identify which columns within the blocks need 4 significant digits. The bloblocs output you see above, such as [2 13 6 7 11] means there is a block of numeric data starting from row 2 and ending at row 13, in which columns 6 7 and 11 contained 4+ digit numbers. In terms of the headings of your file those are density, norm/area, and ferret/minferret . You can see from [19 30 6 7 11] that rows 19 to 30 had a similar pattern of numeric values.
When you get to [2 13 37 38 39 40 41] that is again rows 2 to 13, columns AK to AO; the column headers are messed up by that point, but probably those are intended to be density columns.
Then you get the blobs such as [67 78 37 38 39 40 41] which is rows 67 to 78, AK to A0, and probably intended to be ferret/minferret values. You will also see that the end of those rows have blobs for Avg, std, N, where the std and N columns need at least 4 digits.
In each case, the contents of the columns that need 4 digits (only those columns) from each blob are stored into the cell array blobs -- blobs{K} is a numeric array with multiple columns, and bloblocks{K} tells you where that data was originally from inside the spreadsheet.
So we can now automatically identify where in the file there are blocks containing data that needs at least 4 digits, and we have extracted those sections into a cell array, and we have recorded the locations we took them from.
Now what? Should we assume that each blob will have exactly the same number of rows, and just horizontally concatenate them all together and do pca on that??
allblocks = horzcat(blobs{:});
size(allblocks)
ans = 1×2
12 42

Sign in to comment.

More Answers (0)

Categories

Find more on Large Files and Big Data in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!