How to search for multiple words anywhere in a sentence?
I want to search for three words: "battery", "power" and "failure". All three must appear in the sentence, in any order, for the cell to be copied.
I tried:
j=1;
k=1;
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');            
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ; 
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);  % keep the rows that did not match
but this matches any cell that contains at least one of the three words.
How can I find the cells that contain all three words, in any order?
Answers (3)
the cyclist on 19 Sep 2015
The most straightforward way, it seems to me, is to run the regexp search three times, once for each word, and then copy the cells where all three searches match. I am not sure there is a way to do an "and" match in the same way one can do an "or" match, as you have done.
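A minimal sketch of that idea, assuming a small made-up cell array D in place of alldata(:,126:130) (the sample contents and the names isText, hasAll and rowsWithAll are illustrative only): run one case-insensitive search per word and AND the results together.

```matlab
% Hypothetical sample data standing in for alldata(:,126:130)
D = {'Battery power failure', 42; ...
     'power failure only',    'failure of battery power'; ...
     [],                      'battery low'};
words = {'battery', 'power', 'failure'};

isText = cellfun(@ischar, D);   % only char cells can be searched
hasAll = isText;                % start by assuming every text cell matches
for k = 1:numel(words)
    % One case-insensitive search per word, ANDed into the running result
    found = ~cellfun('isempty', regexpi(D(isText), words{k}, 'once'));
    hasAll(isText) = hasAll(isText) & found;
end
rowsWithAll = any(hasAll, 2);   % rows where at least one cell holds all three words
```

With the original data, alldata(rowsWithAll,:) and alldata(~rowsWithAll,:) would then play the roles of data and Notdata.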
per isakson on 19 Sep 2015
Edited: per isakson on 20 Sep 2015

Try this:
sentence_1  = 'abc battery def power ghi failure';
typo_str_1  = 'abc battery def power ghi faiXure';
sentence_2  = 'Battery def power ghi failure.';
typo_str_2  = 'abc Xbattery def power ghi failure';
words    = {'battery','power','failure'};
is1 = cellfun( @(str) not(isempty(regexpi( sentence_1, ['\<',str,'\>'] ))), words );
is2 = cellfun( @(str) not(isempty(regexpi( typo_str_1, ['\<',str,'\>'] ))), words );
is3 = cellfun( @(str) not(isempty(regexpi( sentence_2, ['\<',str,'\>'] ))), words );
is4 = cellfun( @(str) not(isempty(regexpi( typo_str_2, ['\<',str,'\>'] ))), words );
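
Each of is1 to is4 holds one flag per word, so a sentence qualifies only when all of its flags are true. A short sketch of that reduction, using the same test strings (hasAllWords is a hypothetical helper name, not part of the code above):

```matlab
words = {'battery','power','failure'};
% hasAllWords (hypothetical name): true only if every word occurs as a whole word
hasAllWords = @(s) all(cellfun( ...
    @(w) ~isempty(regexpi(s, ['\<', w, '\>'], 'once')), words));

r1 = hasAllWords('abc battery def power ghi failure');    % all three present
r2 = hasAllWords('abc battery def power ghi faiXure');    % "failure" misspelled
r3 = hasAllWords('Battery def power ghi failure.');       % capitalised, still matches
r4 = hasAllWords('abc Xbattery def power ghi failure');   % "battery" is not a whole word
```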
 
A different approach
>> cssm(1)
Elapsed time is 0.001078 seconds.
ans =
   1     0     0     1     0     0
>> cssm(1e3);
Elapsed time is 0.791887 seconds.
where
function   has_all_three = cssm( N )
    sentence_1  = 'Abc battery def power ghi failure.';
    typo_str_1  = 'Abc battery def power ghi faiXure.';
    multistr_1  = 'Abc battery def power ghi battery.';
    sentence_2  = 'Battery def failure ghi power jkl.';
    typo_str_2  = 'Abc Xbattery def power ghi failure';
    multistr_2  = 'Abc power def power ghi power jkl.';
%    
    test_sentences = {sentence_1,typo_str_1,multistr_1,sentence_2,typo_str_2,multistr_2};
%    
    text_corp = repmat( test_sentences, [N,1] );
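    tic
    % Group the alternation so that \< and \> apply to every keyword
    cac = regexpi( text_corp, '\<(battery|power|failure)\>', 'match' );
    % A sentence qualifies when it contains three distinct keywords
    has_all_three = cellfun( @(c) length(unique(lower(c)))==3, cac );
    toc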
end
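
As a possible variation on the single-regexp idea, lookahead assertions can express the "and" condition directly. This is only a sketch, assuming MATLAB's (?=...) lookahead syntax; the trailing .* is there so the overall match is not empty, since empty matches are ignored by default.

```matlab
% One lookahead per keyword; each must succeed from the start of the string
pattern = '^(?=.*\<battery\>)(?=.*\<power\>)(?=.*\<failure\>).*';

test_sentences = {'Abc battery def power ghi failure.', ...
                  'Abc battery def power ghi faiXure.', ...
                  'Abc battery def power ghi battery.', ...
                  'Battery def failure ghi power jkl.', ...
                  'Abc Xbattery def power ghi failure', ...
                  'Abc power def power ghi power jkl.'};

% A sentence qualifies when the pattern matches at all
has_all_three = ~cellfun('isempty', regexpi(test_sentences, pattern, 'once'));
```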
Amr Hashem on 20 Sep 2015
1 Comment
Cedric on 22 Sep 2015
This can be simplified, as developed in my answer. I am moving it below as a comment:
Here is an alternate solution:
 keywords = {'battery', 'power', 'failure'} ;
 allCells = {'V_batterypowerfailure', 'I_batterypwerfailure'; ...
             'V_batterypowerfailure', 'I_atterypowerfailure'; ...
             'I_batterypowerfailre',  'V_batterypowerfailure'} ;
 ids = 1 : numel( allCells ) ;
 for k = 1 : numel( keywords )
    isFound = ~cellfun( 'isempty', strfind( allCells(ids), keywords{k} )) ;
    ids = ids(isFound) ;
 end
 validCells = allCells(ids) ;
You'll notice that it works on a pool of cells that shrinks with each keyword: once a keyword is not found in a cell, there is no point in testing that cell for the others. I prefixed the valid entries of the dummy data set with V_ and the invalid ones with I_ to simplify the final check.
If you need a case-insensitive solution, replace
 strfind( allCells(ids), keywords{k} )
with
 regexpi( allCells(ids), keywords{k}, 'once' )
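
For the original goal of copying whole rows, the surviving linear indices can be mapped back to row numbers. A rough sketch with the case-insensitive variant, assuming every cell holds text; D is a made-up stand-in for the searched columns, and keepRow is an illustrative name:

```matlab
keywords = {'battery', 'power', 'failure'};
% Hypothetical stand-in for the text columns of alldata (all cells are char)
D = {'BATTERY Power failure', 'n/a'; ...
     'power failure',         'battery'; ...
     'n/a',                   'Failure: battery power'};

ids = 1:numel(D);                      % linear indices of candidate cells
for k = 1:numel(keywords)
    isFound = ~cellfun('isempty', regexpi(D(ids), keywords{k}, 'once'));
    ids = ids(isFound);                % keep only cells that still qualify
end

[rowIdx, ~] = ind2sub(size(D), ids);   % rows that own the surviving cells
keepRow = false(size(D, 1), 1);
keepRow(rowIdx) = true;
data    = D(keepRow, :);               % rows with a cell containing all three words
Notdata = D(~keepRow, :);              % all remaining rows
```

Note that the second row is not kept: its words are spread over two cells, and the test requires all three in the same cell.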