Parsing ⅛ and ⅓ Characters from actxserver Outlook Mail object Body and Converting to Floats

Hi all,
I am parsing Outlook mails in MATLAB using actxserver and regexp.
Some mails contain fraction characters.
The ½, ¼, ¾ characters are read OK, but the eighths (⅛, ⅜, ⅝, ⅞) and thirds (⅓, ⅔) show up in the Body property of the mail object as "?" [char(63)] when the mail body is printed at the command line.
MATLAB recognises only ¼ ½ ¾ [char(188:190)], so I guess I need to access non-ASCII characters. It's not clear whether the issue is MATLAB's 16-bit Unicode or the actxserver object. The characters are available in the Arial font on Windows Vista as U+215C, U+215E, etc.
You can verify this for yourself by emailing yourself a mail with the subject line
⅛¼⅓⅜½⅝⅔¾⅞
and then running the code below in MATLAB to regexp the subject line of that mail in your inbox. Put a breakpoint at the regexp line to inspect what the subject variable looks like; you should see "?" in there.
Two questions here:
1. How could I extend MATLAB's character set beyond ASCII to read these characters?
2. Is there a neat way to convert them into equivalent floats (3¼ --> 3.25) within regexp?
Grateful for any suggestions here.
Mark
% The folder indices below will need adapting to how your Outlook folders are set up:
function myfrac = TestReadFractions
    outlook = actxserver('Outlook.Application');
    mapi = outlook.GetNamespace('mapi');
    folder1 = mapi.Folders(1);
    myaccount = folder1.Item(2);
    inboxmails = myaccount.Folders.Item(2).Folders.Item(9).Items;
    count = inboxmails.Count;
    myfrac = {};
    % Walk the newest mails first; max() keeps the index from dropping below 1
    for i = count:-1:max(1, count-10)
        if strcmp(inboxmails.Item(i).SenderEmailAddress, 'yourname@youraddress.com')
            subject = inboxmails.Item(i).Subject;   % mail subject line
            % Look for U+215C; the braces keep the hex code point unambiguous
            myfrac = regexp(subject, '\x{215C}', 'match');
        end
    end
end

Answers (1)

regexprep('ABC','B','\x215c')

4 Comments

Hi Walter
There are no \x215c characters present in the mail body/subject to replace, only "?" chars.
I think that the solution must involve changing some property of the Mail Item Object or maybe the Outlook object itself.
I've been trying to reset the InternetCodepage property of the mail (a stab in the dark, to be honest), but I don't seem to be able to change it from the command line (it's stuck on 65001). I'm not sure this is even relevant, though.
inboxmails.Item(i).InternetCodepage = 1252;
inboxmails.Item(i).InternetCodepage
>> ans =
65001
Ah, if it is code page 65001 then that is UTF-8. You might have to take "subject" and pass it through native2unicode().
Could you show me the result of
subject + 0
? I do not have MATLAB installed on any MS Windows systems to test with.
Hi Walter,
emailing myself "⅛¼⅓⅜½⅝⅔¾⅞" and reading as per code above gives
K>> subject+0
ans =
63 188 63 63 189 63 63 190 63
Since all the question marks have the same integer value, 63, I think that passing the string through native2unicode will not work.
I can't change the PC locale, as I think this would affect all other applications.
Do you know why changing the InternetCodepage property of the Outlook mail doesn't work? (As above, if I set it to anything other than 65001, it is still 65001 when I check it afterwards.) I guess the property is immutable; perhaps there is a way of setting it in the actxserver constructor? But even if I can do this, I don't know which InternetCodepage value would fix it.
Mark
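The point about every "?" being the same 63 can be illustrated with a small sketch. It assumes (this is not confirmed anywhere in the thread) that the characters are being lost in a conversion to a single-byte code page, taken here to be windows-1252; the variable names are only illustrative.

```matlab
% Build the test string from code points so this file's own encoding does not matter.
% U+215B ⅛, U+00BC ¼, U+2153 ⅓, U+215C ⅜, U+00BD ½, U+215D ⅝, U+2154 ⅔, U+00BE ¾, U+215E ⅞
original = char([8539 188 8531 8540 189 8541 8532 190 8542]);

% Assumption: the lossy step behaves like a conversion to windows-1252,
% which contains ¼ ½ ¾ but none of the thirds or eighths.
bytes = unicode2native(original, 'windows-1252');
roundTrip = native2unicode(bytes, 'windows-1252');

% Which of the original characters are still present after the round trip?
survived = original(ismember(original, roundTrip));
disp(double(survived))                 % code points that made it through
disp(isequal(roundTrip, original))     % false if anything was lost
```

If the real subject string shows the same pattern, the information is already gone by the time MATLAB holds the string, and no decoding call applied to subject can bring it back.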
Sorry, I am not familiar with how InternetCodePage properties work.
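On the second question in the original post (3¼ --> 3.25), here is a minimal sketch that only applies once the fraction characters actually arrive intact in MATLAB. frac2double is a hypothetical helper name, not a built-in, and the lookup table is limited to the nine glyphs mentioned in the question.

```matlab
function val = frac2double(str)
% FRAC2DOUBLE  Hypothetical helper: convert text such as '3¼' or '⅝' to a double.
% Assumes str holds optional integer digits plus fraction glyphs from the table
% below, and that the glyphs were read correctly (not collapsed to char(63)).

    % Code points for ¼ ½ ¾ ⅓ ⅔ ⅛ ⅜ ⅝ ⅞, given numerically to avoid file-encoding issues
    glyphs = char([188 189 190 8531 8532 8539 8540 8541 8542]);
    values = [1/4 1/2 3/4 1/3 2/3 1/8 3/8 5/8 7/8];

    [isFrac, idx] = ismember(str, glyphs);   % per-character table lookup
    whole = str2double(str(~isFrac));        % whatever is left over, e.g. '3'
    if isnan(whole)
        whole = 0;                           % no integer part, e.g. '⅝' on its own
    end
    val = whole + sum(values(idx(isFrac)));  % integer part plus fraction value(s)
end
```

To pull candidates out of a subject line first, something like regexp(subject, ['\d*[' glyphs ']'], 'match') followed by cellfun(@frac2double, ...) should work, with glyphs defined as in the helper. A plain table lookup is used here rather than doing the arithmetic inside regexp itself, since it is easier to check.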


Asked: 31 Jan 2014

Commented: 5 Feb 2014