How does one create an array of strings in a loop? In a better way.

So I have an array of characters separated by whitespace, and MATLAB treats whitespace as a character too. I wanted to separate the words into strings and put them into one array so that indexing a given element would return the entire word. This was my code to do so:
File1 = fopen('History2.txt');
words = fscanf(File1 , '%c');
num_words = 1;
i = 1;
new_words = char(100,1);
while(1)
    if words(i) ==' '
        num_words = num_words+1;
    else
        new_words(num_words, i) = words(i);
    end
    if words(i+1)=='9'
        num_words = num_words - 2;
        break;
    end
    i=i+1;
end
Now, this does the job in the sense that I can refer to a word by saying new_words(1, :). But this is how it displays:
OUTPUT: Hello, history tells us that history ... ... and so on.
Moreover, when I want to compare two of the strings using strfind, like this:
if ~(isempty (strfind ( new_words(2,:) , new_words(6,:) ) ) )
    disp('yoyo');
else
    disp('nono');
end
It always displays 'nono'. What is a better way to accomplish this task so that I can perform the comparison between the strings to find the unique words in the paragraph?
Maybe using cell arrays? But how would I do that, given the words?
I even tried using strsplit, but this wouldn't work on the array words.

9 Comments

Nevermind. I just copied the source code for strsplit and used it. I have MATLAB 2012a so strsplit wasn't on the MATLAB path.
I ran into this when I went from 2012b to 2013a. When I tried copying the source code to make it work in 2012b, I got an error saying that "isString" is not a function. Did you get this error? If so, what was your fix?
Using a Mac 64bit configuration.
Thanks
ISSTRING is not a regular MATLAB function. Either it is present in a folder from your file system which is not in MATLAB path (which should therefore be added using ADDPATH), or you are mistaken on the function name and you want to use ISCHAR instead.
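As a rough illustration of the ISCHAR suggestion, here is a minimal sketch of stand-in checks that could be defined when a copied file refers to helpers that are not on the path. The names isStringLike and isCellStringLike are hypothetical, and the sketch assumes that "string" simply means a char row vector (or an empty char); it is not the code used inside any MathWorks function.

```matlab
% Hypothetical stand-ins, NOT the actual helpers used inside strsplit.
% Assumption: a "string" is a char row vector or an empty char array.
isStringLike = @(x) ischar(x) && (isrow(x) || isempty(x));

% Assumption: a "cell string" is a cell array whose elements all pass
% the test above.
isCellStringLike = @(c) iscell(c) && all(cellfun(isStringLike, c(:)));

disp(isStringLike('abc'));            % char row vector
disp(isStringLike(['ab'; 'cd']));     % two-row char matrix
disp(isCellStringLike({'a', 'bc'}));  % cell array of char rows
```

Whether these checks match what the copied code expects would need to be confirmed against the file itself.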
I am still learning regular expressions but this line seems to work with my needs for older versions of Matlab. The key is I am simply using the default strsplit call...
sampleLine = regexp(string, '\s+', 'split')
Your REGEXP expression matches series of 1 or more white spaces and splits string at these locations. Some implementations of string splitting functions will trim the string on both sides before making the split, but not your REGEXP expression.
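To make the trimming point concrete, here is a small sketch. The sample text and variable names are made up for illustration. When the input has leading or trailing whitespace, the split leaves an empty string at each end of the result; calling strtrim first avoids that.

```matlab
% Hypothetical sample line with leading and trailing blanks.
sampleText = '  Hello history  tells us ';

% Splitting directly leaves an empty string at each end of the result.
rawParts = regexp(sampleText, '\s+', 'split');

% Trimming first yields only the words themselves.
trimmedParts = regexp(strtrim(sampleText), '\s+', 'split');

disp(numel(rawParts));      % 6: two empty strings plus four words
disp(numel(trimmedParts));  % 4
```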
Early results suggest that this is faster: replacing all of the strsplit calls with regexp(....) sped things up by ~30%. I am using this call a lot in this example, though.
STRSPLIT is not a MATLAB regular function either, so you should be able to read its code, wherever it is stored. I'd say that a good implementation of STRSPLIT should be faster than REGEXP, because building the regexp engine/parser is a bit time consuming. You should see roughly the same difference as between STRREP and REGEXPREP.
Cedric
If you type edit strsplit in 2013a/b, check lines 80 and 83. It looks like this:
if ~isString(str)
    error(message('MATLAB:strsplit:InvalidStringType'));
end
if isString(aDelim)
    aDelim = {aDelim};
elseif ~isCellString(aDelim)
    error(message('MATLAB:strsplit:InvalidDelimiterType'));
end
This is straight from the MATLAB toolbox folder (Matlab/toolbox/Matlab/strfun), so I am not sure what you mean with respect to isString. I agree, though, that if I type edit isString I get an error that it is not on any path, so I cannot tell you why this function works. I chalk it up to MATLAB magic.
Lines 119 and 120 call regexp, so I am not sure why you think that strsplit would be faster, since it calls regexp itself. This line is also how I ended up going the regexp route.
% Split.
[c, matches] = regexp(str, aDelim, 'split', 'match');
Ah! My mistake, 2013a/b have STRSPLIT but not the 2012b on my laptop. Yet, ISSTRING doesn't seem to be a built-in. Isn't it an internal function in STRSPLIT? I can't check now but I could check on Monday.
STRFUN is not a regular/base MATLAB toolbox (available separately here).
Your last point about STRSPLIT calling REGEXP is why I wrote "good implementation" in my previous comment. Well, "good" is a matter of point of view, but what I had in mind was that an easy implementation is to write STRSPLIT as a wrapper for REGEXP (which is what you discovered), and that a "good" implementation would be more specific and efficient. I mentioned the difference in efficiency between STRREP and REGEXPREP to illustrate this point: an easy implementation of STRREP would be a wrapper for REGEXPREP, but the MATLAB implementation of STRREP is more specific and efficient:
>> testStr = repmat('AB ', 1, 1e5) ;
>> tic ; strrep(testStr, 'A', 'CC') ; toc
Elapsed time is 0.002794 seconds.
>> tic ; regexprep(testStr, 'A', 'CC') ; toc
Elapsed time is 4.176688 seconds.


Answers (1)

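Once you have identified a word, you can simply store it in an element of a cell array:
wordList{index} = word;
unique(wordList) will then give you all of the unique strings. (The variable is named wordList rather than cell so that it does not shadow the built-in cell function.)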
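Putting this together with the regexp split from the comments, a minimal sketch might look like the following. The sample text is a made-up stand-in for the contents of History2.txt, and wordList and uniqueWords are illustrative names. In the original loop each word is written at column i of its own row, so equal words end up at different column offsets with padding around them, which is why comparing two whole rows with strfind never finds a match. Storing each word in its own cell avoids the padding.

```matlab
% Hypothetical text standing in for the contents of History2.txt.
words = 'Hello, history tells us that history repeats';

% Split on runs of whitespace. regexp with 'split' also works on
% releases that do not ship strsplit.
wordList = regexp(strtrim(words), '\s+', 'split');

% Each cell holds one word with no padding, so a direct comparison
% of the 2nd and 6th words behaves as expected.
if strcmp(wordList{2}, wordList{6})
    disp('yoyo');
else
    disp('nono');
end

% unique returns the distinct words in sorted order.
uniqueWords = unique(wordList);
disp(uniqueWords);
```

Note that punctuation stays attached to a word here ('Hello,' keeps its comma), so it may need to be stripped first if 'history' and 'history.' should count as the same word.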


Asked:

on 7 Jun 2013

Commented:

on 19 Oct 2013
