Sort cell strings according to specific subsets of those cell strings

Let's say I have a cell string with values:
filename = {'2009.272.17.57.23.8445.AZ.SMER..BHE.R.SAC';...
'2009.272.17.57.24.5500.AZ.FRD..BHN.R.SAC';...
'2009.272.17.57.27.5445.AZ.SMER..BHN.R.SAC';...
'2009.272.17.57.27.8000.AZ.SND..BHZ.R.SAC';...
'2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC';...
'2009.272.17.57.28.7000.AZ.SND..BHN.R.SAC';...
'2009.272.17.57.29.1250.AZ.FRD..BHZ.R.SAC';...
'2009.272.17.57.29.2250.AZ.PFO..BHE.R.SAC';...
'2009.272.17.57.29.3695.AZ.SMER..BHZ.R.SAC';...
'2009.272.17.57.29.9445.AZ.BZN..BHN.R.SAC';...
'2009.272.17.57.30.0000.AZ.RDM..BHN.R.SAC';...
'2009.272.17.57.30.8000.AZ.RDM..BHZ.R.SAC';...
'2009.272.17.57.31.8250.AZ.LVA2..BHZ.R.SAC';...
'2009.272.17.57.31.8500.AZ.LVA2..BHE.R.SAC';...
'2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC';...
'2009.272.17.57.32.0000.AZ.WMC..BHZ.R.SAC';...
'2009.272.17.57.32.6750.AZ.WMC..BHN.R.SAC';...
'2009.272.17.57.33.3195.AZ.KNW..BHZ.R.SAC';...
'2009.272.17.57.33.4750.AZ.TRO..BHN.R.SAC';...
'2009.272.17.57.33.7750.AZ.PFO..BHN.R.SAC';...
'2009.272.17.57.33.9000.AZ.PFO..BHZ.R.SAC';...
'2009.272.17.57.34.1750.AZ.LVA2..BHN.R.SAC';...
'2009.272.17.57.34.8000.AZ.TRO..BHZ.R.SAC';...
'2009.272.17.57.35.0000.AZ.WMC..BHE.R.SAC';...
'2009.272.17.57.35.0750.AZ.RDM..BHE.R.SAC';...
'2009.272.17.57.35.8945.AZ.KNW..BHE.R.SAC';...
'2009.272.17.57.36.0250.AZ.FRD..BHE.R.SAC';...
'2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC';...
'2009.272.17.57.36.3500.AZ.CRY..BHN.R.SAC';...
'2009.272.17.57.36.4500.AZ.SND..BHE.R.SAC';...
'2009.272.17.57.36.5000.AZ.TRO..BHE.R.SAC';...
'2009.272.17.57.36.5195.AZ.KNW..BHN.R.SAC';...
'2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC'};
I want to be able to assume that I do not know what character the station name (e.g., CRY) or component name (e.g., BHE) starts and ends on. Though, the number of periods (".") will be consistent.
I have something fairly clunky to do this, but I am wondering if anyone can suggest a quick one/two-liner that would assume a string format of the general form:
YYYY.DDD.HH.MM.SS.ssss.$1.$2..$3.R.SAC
where:
$1 = Array name $2 = Station name $3 = Component name
And then sort the list with the primary and secondary sort order according to $2 and $3, respectively, so that the first 6 rows in the cell string would be:
2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC
2009.272.17.57.29.9445.AZ.BZN..BHN.R.SAC
2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC
2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC
2009.272.17.57.36.3500.AZ.CRY..BHN.R.SAC
2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC
...

4 Comments

In the example you show, the number of characters for each component is exactly the same for each line. Is that a rule for your situation? If so then the sort becomes quite simple.
Yes, they are the same. I would definitely like to see the simple approach, but I would also like to know the more general approach for future applications if say, in my example, $2 and $3 were to be different lengths inside the specified periods.
It looks like the parts do *not* have the same length:
'2009.272.17.57.33.9000.AZ.PFO..BHZ.R.SAC'
'2009.272.17.57.34.1750.AZ.LVA2..BHN.R.SAC'
Oh, his question was related to the "component" name, which are all the same number of characters (i.e., 3). The "station" names are not the same - they range from 3 to 4 characters.

Sign in to comment.

 Accepted Answer

% Split using |'.'| as the delimiter
splt = regexpi(filename,'\.','split');
% Sort according to the 8th and 10th column
[sorted,idx] = sortrows(cat(1,splt{:}),[8,10])
Now you can use the sorted split array or apply idx to filename

More Answers (2)

filename = {'2009.272.17.57.23.8445.AZ.SMER..BHE.R.SAC';...
'2009.272.17.57.24.5500.AZ.FRD..BHN.R.SAC';...
'2009.272.17.57.27.5445.AZ.SMER..BHN.R.SAC';...
'2009.272.17.57.27.8000.AZ.SND..BHZ.R.SAC';...
'2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC';...
'2009.272.17.57.28.7000.AZ.SND..BHN.R.SAC';...
'2009.272.17.57.29.1250.AZ.FRD..BHZ.R.SAC';...
'2009.272.17.57.29.2250.AZ.PFO..BHE.R.SAC'};
n = numel(filename);
C2 = cell(1, n);
C3 = cell(1, n);
for iC = 1:n
D = textscan(filename{iC}(27:end), '%s', 'Delimiter', '.');
C2{iC} = D{1}{1};
C3{iC} = D{1}{3};
end
% A kind of SORTROWS:
[dummy, ind3] = sort(C3);
[dummy, ind2] = sort(C2(ind3));
index = ind3(ind2);
filename = filename(index);

3 Comments

Thanks, Jan! In the end, I was looking for something more general in the event that the station name and component name were not the same number of characters. I also was looking for something that could mix and match the primary and secondary sort ordering... like in Oleg's response.
If I want the primary and secondary sort order to be the station name and component name (as in the example I gave) then I would use his "idx" from "sortrows(cat(1,splt{:}),[8,10])"... but if I wanted to flip the sort order, I could do this easily be using the "idx" created from "sortrows(cat(1,splt{:}),[10,8])" where the 10 and the 8 are now flipped.
Thanks for the updated code... +1!
While Oleg's REGEXP is much nicer than calling TEXTSCAN in a loop, SORTROWS does exactly the same as my sorting method, but with a lot of overhead.

Sign in to comment.

Here is the clunky version I have been using:
numFiles = numel(filename);
sortcell = {''};
sortind = zeros(numFiles,4);
for i = 1 : numFiles
sortind(i,2)=strfind(filename{i},'..')-1;
for j = sortind(i,2):-1:1
if isequal(filename{i}(j),'.')
break;
end
sortind(i,1)=j;
end
sortind(i,3)=sortind(i,2)+3;
for j = sortind(i,3):length(filename{i})
if isequal(filename{i}(j),'.')
break;
end
sortind(i,4)=j;
end
sortcell(i,1)=cellstr(filename{i}(sortind(i,1):sortind(i,2)));
sortcell(i,2)=cellstr(filename{i}(sortind(i,3):sortind(i,4)));
end
[tempcell,tempind1]=sort(sortcell(:,2));
[tempcell,tempind2]=sort(sortcell(tempind1,1));
filename = filename(tempind1(tempind2));

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!