I know this is probably a novice question, but I am quite a MATLAB novice. The while loop in my script runs ridiculously slowly as the table "nonapattern" increases in size. Is it possible to speed it up somehow? Thank you.

counter=1;
searchsize=254;
patternsize=92378;
j=1;
i=1;
newlist = zeros(100,2);
while counter<patternsize
    while i<searchsize
        i
        if isequal(pinellas{i,3},nonapattern{counter,1})
            newlist(j,1)=pinellas(i,1);
            newlist(j,2)=pinellas(i,2);
            j=j+1;
        end
        i=i+1;
    end
    counter=counter+1;
    i=1;
end
Pattern trajectory is the script which matches the patterns from "Data" with the list in "nonapattern". When "nonapattern" becomes large (e.g. around 90,000 x 2 element table) the script takes days to run. Thanks so much for any suggestions/help to make this run faster.

Accepted Answer

jonas on 29 Jul 2018
Edited: jonas on 31 Jul 2018
It looks like the matrix grows with each new entry. Read about preallocation, and about preallocating matrices of unknown size.
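To illustrate the preallocation advice, here is a minimal sketch. The sizes and the match condition are made up for the example and are not taken from the original script. It contrasts growing an array row by row with reserving an upper bound once and trimming at the end, which is one common way to handle a result of unknown size.

```matlab
% Hypothetical problem size and match condition, for illustration only.
n = 1000;
isMatch = @(k) mod(k, 3) == 0;

% Growing: the array is resized every time a row is appended.
grown = zeros(0, 2);
for k = 1:n
    if isMatch(k)
        grown(end+1, :) = [k, 2*k]; %#ok<SAGROW>
    end
end

% Preallocating: reserve an upper bound, fill in place, then trim.
prealloc = zeros(n, 2);        % n is an assumed upper bound on the matches
count = 0;
for k = 1:n
    if isMatch(k)
        count = count + 1;
        prealloc(count, :) = [k, 2*k];
    end
end
prealloc = prealloc(1:count, :);   % drop the unused rows
```

Both loops produce the same rows. The second avoids repeated resizing, which tends to matter more as the result gets larger.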
Other than that, the original script loops through one cell array, nonapattern, and finds matching strings in a second cell array, data, including duplicates. Some data is then extracted from the matched rows of data. Faster code given below:
Load data
[~,nonapattern]=xlsread('nonapattern.xlsx');
[numdata,data]=xlsread('Data.xlsx');
Find the strings that occur in both cell arrays
[C,ia,ib] = intersect(nonapattern,data)
C =
3×1 cell array
{'SO5 SO6 SOA SOB SOC SOD SOE SOG SOO'}
{'SO5 SO6 SOA SOB SOD SOE SOG SOH SOO'}
{'SO5 SO6 SOD SOE SOF SOG SOK SOM SON'}
Next, find duplicates
index=cellfun(@(x)find(ismember(data,x)==1),C,'uniformoutput',false)
index =
3×1 cell array
{5×1 double}
{4×1 double}
{2×1 double}
Grab corresponding numerical data from numdata, columns 1 and 2
out=cellfun(@(x)numdata(x,1:2),index,'uniformoutput',false);
out =
3×1 cell array
{5×2 double}
{4×2 double}
{2×2 double}
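The steps above can be tried without the spreadsheets. The sketch below uses small made-up arrays in place of the xlsread output, so all values are hypothetical. It uses strcmp where the answer uses ismember(...)==1; for a single string the two give the same mask.

```matlab
% Toy stand-ins for the spreadsheet contents (hypothetical values).
nonapattern = {'A B'; 'C D'; 'E F'};
data        = {'C D'; 'A B'; 'X Y'; 'C D'};
numdata     = [10 11; 20 21; 30 31; 40 41];

% Unique strings present in both lists, in sorted order.
C = intersect(nonapattern, data);

% For each common string, the rows of data where it occurs (duplicates included).
index = cellfun(@(x) find(strcmp(data, x)), C, 'UniformOutput', false);

% Columns 1 and 2 of numdata for those rows, one cell per common string.
out = cellfun(@(x) numdata(x, 1:2), index, 'UniformOutput', false);

% Alternative: one flat list of all matching rows, in the row order of data.
mask    = ismember(data, nonapattern);
newlist = numdata(mask, 1:2);
```

If a flat two-column result like the original newlist is enough, the last two lines may be all that is needed. Note that they list rows in the order of data rather than grouped by pattern.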
  11 Comments
Mark Bodner on 4 Aug 2018
That solved the problem, and the code works great for the data files you wrote it for. I am having a problem, though, with the indexing part of the code when applying it to other Excel files with different dimensions. The first three lines work great, but the last two formatting lines fail when I adapt them for my files ("state.xlsx", 41351 x 26; "Pinellas.xlsx", 757107 x 17). The code finds the intersections between state and Pinellas and puts them in "D" just fine. But when I try to find the repeats, "indexP" seems fine except for the first cell, which ends up as a 6315992 x 1 double. The last line of code (where I try to format everything, grabbing columns 8 and 18) fails with "Index in position 1 exceeds array bounds (must not exceed 41351)", where 41351 is the number of rows in "state". I guess I just don't understand how these last two formatting lines work. The code as I currently tried to adapt it is:
[numdataP,patternsizeP]=xlsread('state.xlsx');
[~,dataP]=xlsread('pinellas.xlsx');
[D,ic,id] = intersect(patternsizeP,dataP)
indexP=cellfun(@(x)find(ismember(dataP,x)==1),D,'uniformoutput',false)
outP=cellfun(@(x)numdataP(x,8:18),indexP,'uniformoutput',false)
Could you shed any light on how this formatting works so that I can generalize the script? Thanks so much once again.
jonas on 4 Aug 2018
Edited: jonas on 4 Aug 2018
I am a bit confused because I don't understand the structure of your new data. Now you are working with 2D cell arrays, which is fine, but what are the dimensions of numdata? Feel free to upload the new data if you want me to take a look.
Anyway, let's break the code down line by line, using my original notation.
[C,ia,ib] = intersect(nonapattern,data)
You said this works fine, but I suspect there is a problem with the input here. I would take a look at the content of C{1} to make sure it looks OK. The next line of code:
index=cellfun(@(x)find(ismember(data,x)==1),C,'uniformoutput',false)
goes over each unique cell in C, cell by cell, and finds matches in data. The function ismember outputs a logical matrix of the same size as data, containing ones where you have matches and zeros otherwise. The find function then takes this matrix and outputs the linear indices of the matches, i.e. the ones. It seems C{1} matches 6315992 times, which is not necessarily wrong, but makes me believe there is something sketchy going on with the content of that cell.
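One possible cause of such a large match count, offered as an assumption rather than something confirmed in this thread, is empty strings. When a text array read from a spreadsheet contains '' in cells that held no text, '' can end up as the first (sorted) element of the intersection and then match every empty cell. The sketch below reproduces that effect on made-up 2-D cell arrays.

```matlab
% Toy 2-D text arrays (hypothetical); '' marks cells that held no text.
patternTxt = {'a', ''; 'c', ''};
dataTxt    = {'a', ''; '', 'b'; '', 'a'};

% intersect returns sorted strings, so '' comes first when both inputs contain it.
C = intersect(patternTxt, dataTxt);
firstIsEmpty = isempty(C{1});

% Every empty cell of dataTxt counts as a match for ''.
emptyHits = find(strcmp(dataTxt, ''));

% Possible remedy: drop empty strings before looking for matches.
C = C(~cellfun(@isempty, C));
```

Checking isempty(C{1}) on the real data would show whether this is what is happening.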
out=cellfun(@(x)numdata(x,1:2),index,'uniformoutput',false);
The problem is in this line of code, which only works if both C and data are single-column cell arrays. The reason is that the previous line of code outputs linear indices, as opposed to subscripts.
What are linear indices? Assume you have a matrix:
A =
0 0
0 0
1 1
find(A==1)
3
6
The linear indices basically describe the position in the 2D-array if you stack each column on top of one another to a long 1D-array.
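As a small sketch of the same example, the built-in ind2sub converts linear indices back to row and column subscripts. The variable names are chosen for illustration only.

```matlab
A = [0 0; 0 0; 1 1];

% Linear indices, counted down the first column and then the second.
lin = find(A == 1);

% Convert the linear indices to row and column subscripts.
[row, col] = ind2sub(size(A), lin);

% A linear index can exceed the number of rows, which is why using it
% as a row subscript, e.g. A(lin, :), can fail.
exceedsRows = any(lin > size(A, 1));
```

Here both matches sit in row 3, while the second linear index is 6, larger than the three rows of A.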
The next line
out=cellfun(@(x)numdata(x,1:2),index,'uniformoutput',false);
breaks down because we are using linear indices to refer to rows.
This can easily be fixed. In fact, the find function can output row and column subscripts instead of linear indices if you request two outputs:
[row,col]=find()
However, I don't understand the structure of your new numdata so I cannot write the new code for you.
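For what it is worth, here is a rough sketch of how row subscripts could be used with a 2-D text array. It rests on assumptions the thread does not confirm: all arrays below are made up, and numdataP is assumed to come from the same sheet as dataP with one numeric row per text row. If the numeric data came from a different file than the text being searched, the row numbers would not line up, and an out-of-bounds error would be one possible symptom.

```matlab
% Hypothetical 2-D text array and a numeric array with matching rows.
dataP    = {'a', 'x'; 'b', 'a'; 'c', 'y'};
numdataP = [1 2; 3 4; 5 6];
D        = {'a'; 'y'};            % strings to look up (assumed already found)

outP = cell(numel(D), 1);
for k = 1:numel(D)
    % Request row subscripts from find instead of linear indices.
    [row, ~] = find(strcmp(dataP, D{k}));
    % unique keeps each row once even if it matches in several columns.
    outP{k} = numdataP(unique(row), :);
end
```

A plain loop is used here because an anonymous function cannot easily capture the multiple outputs of find.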
