How to ignore punctuation in a user string while scanning for words (textscan())?

It works perfectly now. Thanks to Oleg and Lucas for your help! If you're interested, here's how it looks in the end:
function pushbutton1_Callback(hObject, eventdata, handles)
words = get(handles.editbox, 'string'); % read the user's input string from the edit box
wavdirectory = 'C:\Program Files\MATLAB\R2010b\Recordings\';
wordsstring = regexp(words, '\w+', 'match'); % keep only the words, ignoring punctuation
for m = 1:numel(wordsstring) % loop over every word found in the input
    thisfid = [wavdirectory wordsstring{m} '.wav'];
    try
        [y, fs] = wavread(thisfid);
        sound(y, fs);
    catch err
        % report the file that failed and why, then carry on with the next word
        fprintf(1, 'Failed to process wave file "%s" because: %s\n', thisfid, err.message);
    end
end

1 Comment

How do you get the user-input? http://www.mathworks.com/matlabcentral/answers/6200-tutorial-how-to-ask-a-question-on-answers-and-get-a-fast-answer

 Accepted Answer

This removes the specified punctuation in your word:
regexprep(word, '[-!,.?]', '')
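
Note that deleting the characters joins whatever stood on either side of them, so a hyphenated input collapses into a single token. A minimal sketch, assuming the goal is separate words as in the question, replaces the punctuation with a space instead. The input string and variable names here are made up for illustration:

```matlab
word   = 'kawan-kawan, apa khabar?';          % example input (assumption, not from the thread)
joined = regexprep(word, '[-!,.?]', '');      % deleting punctuation merges 'kawan-kawan' into 'kawankawan'
spaced = regexprep(word, '[-!,.?]', ' ');     % substituting a space keeps the two halves apart
tokens = regexp(spaced, '\S+', 'match');      % split on whitespace into a 1x4 cell array
```

Here tokens holds 'kawan', 'kawan', 'apa' and 'khabar', whereas joined still contains the merged 'kawankawan'.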

9 Comments

Does regexprep return the result in a cell array like textscan?
Yes, it does when the input is a cell array.
Try:
>> word = {'this.-is..!a,test', 'it!!.works???-'};
>> C = regexprep(word, '[-!,.?]', '');
>> class(C)
ans =
cell
OK, I used textscan on the results of regexp so that I didn't have to alter the for statements. It works now, but I run into a problem when a Malay word like 'kawan-kawan' is keyed in: the '-' is not read, and the returned word becomes 'kawankawan'. The wav file then fails to play, because the file is kawan.wav. Any ideas?
word = {'kwan-kwan'};
C = regexp(word, '[\w-]+', 'match')
Hmm, that doesn't seem to work. I've updated the code in the main question. It now successfully plays the .wav files even if the input string has punctuation, but for 'kawan-kawan' it still reads one word, 'kawankawan', instead of two separate 'kawan', so wavread cannot find kawan.wav.
Ah OK, I thought you needed kwan-kwan! Then use the version in my post directly; it strips the '-' and creates 2 words.
Yeah, I did use the code in your post (I updated the code in the question to reflect it). It still returns only 'kawankawan' as one word. I tried a standalone test program to confirm that it did. What do you think?
word = {'this.-is..!a,test', 'it!!.works???-kwan-kwan'};
C = regexp(word, '\w+', 'match')
C{2}
C =
{1x4 cell} {1x4 cell}
ans =
'it' 'works' 'kwan' 'kwan'
Ah, the array dimensions were slightly different. I modified the for loop so it works. Thanks, guys! The updated code is above if you're interested.
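
For anyone puzzled by the dimension difference: regexp with 'match' returns a flat cell array of words for a char input, but a cell array of cell arrays (one per element) for a cell input. A small sketch of the two shapes, with variable names chosen only for illustration:

```matlab
str    = 'it!!.works???-kwan-kwan';
flat   = regexp(str, '\w+', 'match');     % char input: 1x4 cell of words
nested = regexp({str}, '\w+', 'match');   % cell input: 1x1 cell holding that 1x4 cell
for m = 1:numel(flat)                     % numel works whichever way the cell is oriented
    fprintf('%s\n', flat{m});
end
```

Looping with numel rather than one output of size avoids depending on whether the words ended up in a row or a column.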

More Answers (2)

Use regexp.
If you want more details provide some example inputs and required output.
From Lucas' example:
word = {'this.-is..!a,test', 'it!!.works???-'};
C = regexp(word, '\w+', 'match')
Ah, the poster broke this out into a separate question, which I did not see before I answered in the original thread. My answer there was:
Before the textscan:
words = lower(words);
words(~ismember(words, ['a':'z' ' '])) = ' ';
then go ahead with the textscan
On second look, that could be shortened to
words = lower(words);
words(~ismember(words, 'a':'z')) = ' ';
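
Put together with the textscan step, this approach might look like the sketch below. The input string is a made-up example, and the '%s' format is an assumption about how the original textscan call was written:

```matlab
words = 'Kawan-kawan, apa khabar?';        % example input (assumption, not from the thread)
words = lower(words);
words(~ismember(words, 'a':'z')) = ' ';    % every non-letter, including '-', becomes a space
C = textscan(words, '%s');                 % split on whitespace; returns a 1x1 cell
wordlist = C{1};                           % N-by-1 cell array of the individual words
```

Because every non-letter becomes a space, 'kawan-kawan' comes out as two separate words. Digits are blanked out as well, which may or may not be wanted.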