Convert from cell to string

I have the following problem. I have a sentence as a string, and I want to process some of the words in it. My idea was to isolate the words using textscan and then join the resulting cell array of strings back into a single string to get the modified sentence. However, I cannot do the last step. I wanted to convert the cell array to a matrix first and then convert the matrix to a string, but that fails because the elements of the cell array are strings of different lengths. What can I do?
Example:
sentence = 'I am a MATLAB user.';
extracted = textscan(sentence, '%s');
extracted = extracted{1};
for k = 1:numel(extracted)
extracted{k}(1) = 'a';
end
% Now I should rearrange it to form a single string (the same format as the string "sentence").

 Accepted Answer

Ahmet Cecen on 2 Nov 2014
Edited: Ahmet Cecen on 2 Nov 2014
Do you specifically need it to be a cell array at the end? Why not just build the string up by concatenating in a loop? Something like:
newsentence='';
for i=1:numel(extracted)
newsentence=strcat(newsentence,' ',extracted{i});
end
% The ' ' in the middle is meant as the space between words.

1 Comment

Thank you for your idea. The reason why I accepted your answer can be found in my comment on Image Analyst's answer. However, I must note that strcat drops the white space even when I pass a white-space delimiter between the words. I solved it by using horzcat instead of strcat.
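For reference, here is a minimal sketch of the loop with bracket concatenation (equivalent to horzcat) in place of strcat. strcat removes trailing white space from character array inputs, which is why the ' ' argument disappears. The final strtrim call is an addition in this sketch, used only to drop the leading space that the loop produces.

```matlab
sentence = 'I am a MATLAB user.';
extracted = textscan(sentence, '%s');
extracted = extracted{1};

newsentence = '';
for k = 1:numel(extracted)
    word = extracted{k};
    word(1) = 'a';                           % the processing step from the question
    newsentence = [newsentence, ' ', word];  % brackets keep the space, unlike strcat
end
newsentence = strtrim(newsentence);          % drop the leading space
% newsentence is 'a am a aATLAB aser.'
```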


More Answers (2)

The strsplit function is your friend here:
sentence = 'I am a MATLAB user.';
strvct = strsplit(sentence, '.','CollapseDelimiters',0);
newsent = sprintf('%s',char(strvct)')
produces:
newsent =
I am a MATLAB user

6 Comments

Unfortunately I cannot use strsplit. It is not available in R2011a.
Implementing strsplit and strjoin is fairly trivial. In any case, the following regexp will do just the same:
split = regexp(sentence, ' ', 'split');
Or to split at, and collapse all whitespace delimiters:
split = regexp(sentence, '\s+', 'split');
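As a rough sketch of the full split-and-rejoin round trip without strsplit or strjoin, the split result can be put back together with sprintf. The variable names words and joined are only illustrative, and the input has extra spaces on purpose to show the collapsing.

```matlab
sentence = 'I am  a MATLAB   user.';
words = regexp(sentence, '\s+', 'split');  % 1x5 cell array of words
joined = sprintf('%s ', words{:});         % every word followed by one space
joined = joined(1:end-1);                  % remove the trailing space
% joined is 'I am a MATLAB user.'
```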
I am not accustomed to using regular expressions. Why are they a better solution here than textscan?
In this particular case, splitting at whitespaces, a regex is not any better than textscan. It was just a response to you saying you don't have strsplit.
However, to extract or replace arbitrary content in a string, a regex is far more flexible than textscan. In fact, it's possible that you could do your split/process/rejoin as a single regexprep command.
If, as in your example, all you do is replace the first letter of each word with 'a', it's simply:
newsentence = regexprep(sentence, '\<.', 'a');
I assume your process step is more complex though, but if you give more details about it, it's possible we can come up with a regular expression for it.
In fact, I wanted to solve this problem on Cody. I managed to solve it (see my attached file), but I think there are easier methods. Perhaps you could offer a more elegant way.
As you saw in your code, with textscan, you only split at blank spaces, so punctuation gets included in your word. So to start with, a regular expression would allow you to detect proper word boundaries. The regular expression '[A-Za-z]+' would detect a sequence of one or more letters and letters only. Thus to get the start and end of each word, you could use:
[startword, endword] = regexp(sentence, '[A-Za-z]+');
and use the indices to swap the letters of the word.
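A minimal sketch of that index-based approach, assuming the task is to reverse the interior letters of each word while keeping the first and last letters in place. The names result and inner are hypothetical, chosen only for this example.

```matlab
sentence = 'I am a MATLAB user.';
[startword, endword] = regexp(sentence, '[A-Za-z]+');

result = sentence;
for k = 1:numel(startword)
    inner = (startword(k)+1):(endword(k)-1);  % interior letters; empty for short words
    result(inner) = sentence(fliplr(inner));  % write them back in reverse order
end
% result is 'I am a MALTAB uesr.'
```

Words of one or two letters have an empty interior range, so they pass through unchanged, and the punctuation is never touched because it lies outside every word's index range.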
However, there is also regexprep, which lets you use regular expressions to detect and replace part of a string. Since you can include MATLAB commands in the replacement string, you can do it all with a single regexprep.
The first thing you want to do is detect and capture the first, the inside, and the last letters of a word. You want to reverse the inside letters with fliplr. The captures are done with parentheses (), so the regular expression is '([A-Za-z])([A-Za-z]+)([A-Za-z])', that is one letter, one or more letters, one letter. In your replacement you want the first and third captures intact and the second flipped, so your replacement string is '$1${fliplr($2)}$3', leading to:
regexprep(sIn, '([a-zA-Z])([a-zA-Z]+)([a-zA-Z])', '$1${fliplr($2)}$3');
This is one of the winning solutions for that Cody problem. Instead of using captures you could use look-ahead and look-behind for the first and last letters, but the principle is the same. It's just the syntax of the regex and replacement string that changes.
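The look-around variant might look like the sketch below. It assumes that $0 (the entire match) can be used inside the ${...} dynamic expression the same way $2 is used in the capture version. The sample input and the names sOut and sOutCaptures are made up for illustration.

```matlab
sIn = 'I am a MATLAB user.';
% Match only the interior letters: preceded by a letter and followed by a letter.
sOut = regexprep(sIn, '(?<=[a-zA-Z])[a-zA-Z]+(?=[a-zA-Z])', '${fliplr($0)}');
% Capture version from above, for comparison.
sOutCaptures = regexprep(sIn, '([a-zA-Z])([a-zA-Z]+)([a-zA-Z])', '$1${fliplr($2)}$3');
```

Both calls should leave words shorter than three letters unchanged, since neither pattern can match them.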


sentence = 'I am a MATLAB user.';
theSeparateWords = allwords(sentence)
newSentence = sprintf('%s ', theSeparateWords{:})
In the command window:
theSeparateWords =
'I' 'am' 'a' 'MATLAB' 'user'
newSentence =
I am a MATLAB user
It's a cell array of course because the strings are of different lengths.

1 Comment

Since I have to process all the words in the sentence, I already have a for loop. Within that loop, I can concatenate the strings as Ahmet Cecen described. At first I thought it would save me time, but when I tested it on a 90,000-character string, it turned out that his solution and yours take almost the same time to run. So the only reason I accepted his method is that it needs only built-in functions. Thank you anyway.
