UTF-16 encoding in writestruct/readstruct or other xml2struct functions

I have a large XML file that stores information and settings for variables used in another program: the variable name, units, description, display name, and so on.
Due to an error in another program, someone's attempt to import this variable set merged it with the existing variables on that system instead of overwriting them. So I now have an XML file with over 2000 entries in total, more than 400 of them duplicates.
I've already worked out a way of finding and removing the duplicates, but the problem is the encoding. The original XML file is in UTF-16, presumably because of the Japanese characters used in the variable descriptions etc.
Presumably, then, I need MATLAB not to convert to UTF-8 on reading, and to save as UTF-16 on writing. Is this possible?
I've been using the community functions xml2struct and struct2xml, but I see there are also the native MATLAB options readstruct and writestruct. It's not clear whether they can handle UTF-16, or whether it's a selectable option.
  3 Comments
Alex Mason on 22 Aug 2024
@Walter Roberson Do I apply this Unicode conversion before exporting with struct2xml, or once the new XML has been generated?
Walter Roberson on 22 Aug 2024
You would call struct2xml() so that it returns the generated XML text, which will be declared as UTF-8. You would "fix up" the header that says encoding utf-8 to say utf-16 instead. You would uint8() that to convert from characters to bytes, and native2unicode() the bytes to convert them into Unicode code points. You would then unicode2native() that char stream, asking for UTF-16, to generate a byte stream. Finally, you would fwrite() the byte stream.
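The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: `xmlText` is a hypothetical stand-in for the text returned by struct2xml(), `outFile` is a made-up file name, and the text is assumed to already hold proper Unicode characters in a MATLAB char vector. Under that assumption the uint8()/native2unicode() step is skipped here; it would only be needed if the text actually held raw UTF-8 bytes. The sketch also picks little-endian UTF-16 with an explicit byte order mark, which is a choice made for this example, not something the original file is known to require.

```matlab
% Stand-in for the char vector returned by struct2xml() (hypothetical content).
% char([28201 24230]) is a two-character Japanese word, built from code points
% so that the script itself stays ASCII-only.
xmlText = ['<?xml version="1.0" encoding="utf-8"?>' newline ...
           '<vars><var desc="' char([28201 24230]) '"/></vars>'];

% Fix up the declaration so it matches the encoding that will be written.
xmlText = regexprep(xmlText, 'encoding="[^"]*"', 'encoding="UTF-16"', 'once');

% Convert the characters to a UTF-16 little-endian byte stream.
bytes = unicode2native(xmlText, 'UTF-16LE');

% Write the byte order mark (FF FE) followed by the encoded bytes.
outFile = fullfile(tempdir, 'variables_utf16.xml');   % hypothetical name
fid = fopen(outFile, 'w');
assert(fid ~= -1, 'Failed to open %s for writing.', outFile);
fwrite(fid, [uint8([255 254]), bytes], 'uint8');
fclose(fid);
```

Writing raw bytes with fwrite() keeps MATLAB from applying any further character conversion on output, so what lands in the file is exactly the byte stream produced by unicode2native().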


Answers (1)

Harsh on 26 Aug 2024
Hi,
Based on my understanding, you've been utilizing community functions to handle the reading and writing of UTF-16 XML files. Now, you're seeking a MATLAB-native solution to achieve the same task.
MATLAB's built-in file I/O functions can do this: fopen accepts an encoding argument, so text written through the resulting file identifier is stored in that encoding. Here is how to write a UTF-16 XML file this way:
% Define the file name
filename = 'example_utf16.xml';
% Open the file for writing with UTF-16 encoding
fileID = fopen(filename, 'w', 'n', 'UTF-16LE');
if fileID == -1
    error('Failed to open file for writing.');
end
% Write some text to the file
fprintf(fileID, '<note>It can contain special characters like: ä, ö, ü, ñ, ç, 𤭢.\n </note>');
% Close the file
fclose(fileID);
disp(['File "', filename, '" has been created with UTF-16 encoding.'])
File "example_utf16.xml" has been created with UTF-16 encoding.
The encoding of the created file can also be confirmed in Notepad.
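The encoding can also be checked from within MATLAB by reading the raw bytes back and decoding them. The following is a minimal sketch, not a definitive reader: the file name `inFile` and the sample text are hypothetical, and falling back to little-endian when no byte order mark is present is an assumption that should be checked against the real file.

```matlab
% Create a small UTF-16LE sample file to read (hypothetical name and content).
inFile = fullfile(tempdir, 'sample_utf16.xml');
sample = ['<note>' char([28201 24230]) '</note>'];
fid = fopen(inFile, 'w');
assert(fid ~= -1, 'Failed to open %s for writing.', inFile);
fwrite(fid, [uint8([255 254]), unicode2native(sample, 'UTF-16LE')], 'uint8');
fclose(fid);

% Read the file back as raw bytes so that no implicit conversion happens.
fid = fopen(inFile, 'r');
assert(fid ~= -1, 'Failed to open %s for reading.', inFile);
raw = fread(fid, [1 Inf], '*uint8');
fclose(fid);

% Pick the byte order from the byte order mark, if there is one.
if numel(raw) >= 2 && isequal(raw(1:2), uint8([255 254]))
    enc = 'UTF-16LE';
    raw = raw(3:end);
elseif numel(raw) >= 2 && isequal(raw(1:2), uint8([254 255]))
    enc = 'UTF-16BE';
    raw = raw(3:end);
else
    enc = 'UTF-16LE';   % assumption: no BOM means little-endian
end

% Decode the bytes into a MATLAB char vector.
xmlText = native2unicode(raw, enc);
```

The decoded `xmlText` can then be edited as ordinary text and written back out with unicode2native() and fwrite().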
I hope this helps, thanks!
  1 Comment
Alex Mason on 2 Sep 2024
Hi @Harsh
Sorry for the late reply, I will give this a try. I've used MATLAB on and off for a long time and have gotten used to functions I need not being in MATLAB natively and just going straight to the community for solutions.
I will give it a try and report back.
Many thanks


Release

R2022b
