How to read 16-bit text with Matlab

I have an Agilent 34970A data logger with their BenchLink software. This puts out a .csv file that Excel reads with no trouble. But Matlab cannot import it because it's 16 bit text. The first two bytes are 0xFF 0xFE, then after that every text byte is followed by a NULL (0x00). I wrote a function to read it, but I was hoping for the ability to read it directly, without having to programmatically skip the NULLs.
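To make the byte layout described above concrete, here is a minimal sketch that writes a tiny UTF-16LE file and inspects its raw bytes. The file name and sample text are made up for illustration and are not taken from BenchLink output.

```matlab
% Write a tiny UTF-16LE sample so the byte layout can be inspected.
% The file name and text below are hypothetical, chosen only for this sketch.
fname = fullfile(tempdir, 'sample_utf16le.csv');
txt = sprintf('Time,Volts\n1,2.5\n');
fid = fopen(fname, 'w');
fwrite(fid, [255 254], 'uint8');             % byte order mark: 0xFF 0xFE
fwrite(fid, uint16(txt), 'uint16', 0, 'l');  % each character as a little-endian 16-bit unit
fclose(fid);

% Read the raw bytes back without any decoding.
fid = fopen(fname, 'r');
raw = fread(fid, [1 inf], '*uint8');
fclose(fid);
disp(raw(1:8))   % 255 254, then 'T' 0 'i' 0 'm' 0

% Manual decoding: combine each low/high byte pair into one 16-bit code unit.
units = uint16(raw(3:2:end)) + 256 * uint16(raw(4:2:end));
decoded = char(units);
```

For plain ASCII text the high byte of every pair is 0, which is why the file looks like text with a NULL after every character.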

 Accepted Answer

MATLAB does not handle UTF-16LE files. You might wish to use the source code I posted in http://uk.mathworks.com/matlabcentral/answers/267176-read-and-seperate-csv-data#answer_209938 which reads CSV files in any of the UTF-8 / UTF-16 / UTF-32 encodings.

9 Comments

After a bunch more programming, the solution turns out to be not bad at all.
If you have a UTF* encoded file you want to read with csvread or dlmread or textscan, you cannot do that directly. Instead:
fopen() the file with 'rt' (read text) and 'n' (native machine format, i.e. the system byte ordering), and pass the UTF encoding as the fourth argument to fopen(). For example,
fid = fopen('test.csv', 'rt', 'n', 'UTF16LE')
If you specify a UTF encoding other than UTF8, you will probably get a warning about the UTF encoding not being supported. It turns out that, from R2006a onward, you can ignore that warning when you are only reading the file.
Now, fread() enough bytes from the file to consume any byte order mark (BOM) at the beginning of the file. For example, UTF16LE starts with the byte pair 255 254 (0xFF 0xFE), so read those two bytes (you can throw them away):
fread(fid, 2, '*uint8'); %adjust the 2 to fit the UTF encoding
Now, read the entire rest of the file as a string:
filecontent = fread(fid, [1 inf], '*char');
and now you can fclose(fid)
This string has already been decoded from the UTF encoding.
With that string that is the entire file in hand, you can proceed to textscan() the string buffer, such as
datacell = textscan(filecontent, '%s%f%f%f', 'Delimiter', ',', 'HeaderLines', 1);
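Put together, the steps above might look like the following sketch. It first writes a small UTF-16LE sample so it can run on its own; the file name, sample contents, and four-column layout are assumptions for illustration, and the format string must be adapted to your own file.

```matlab
% Create a small UTF-16LE sample file (hypothetical name and contents).
fname = fullfile(tempdir, 'utf16le_demo.csv');
sample = sprintf('Time,Ch1,Ch2,Ch3\nt0,1,2,3\nt1,4,5,6\n');
fid = fopen(fname, 'w');
fwrite(fid, [255 254], 'uint8');                % UTF-16LE byte order mark
fwrite(fid, uint16(sample), 'uint16', 0, 'l');  % little-endian 16-bit code units
fclose(fid);

% Open with the encoding named in the fourth argument, as described above.
% This call may warn that the encoding is not supported.
fid = fopen(fname, 'rt', 'n', 'UTF16LE');
if fid < 0
    error('Could not open %s', fname);
end
fread(fid, 2, '*uint8');                        % discard the 2-byte BOM
filecontent = fread(fid, [1 inf], '*char');     % rest of the file as one row of chars
fclose(fid);

% Parse the in-memory string instead of the file.
datacell = textscan(filecontent, '%s%f%f%f', 'Delimiter', ',', 'HeaderLines', 1);
```

textscan() returns one cell per format specifier, so datacell{1} holds the first column as strings and datacell{2} through datacell{4} hold the numeric columns.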
Stephen23 on 2 Mar 2016
Edited: Stephen23 on 2 Mar 2016
Frank's "Answer" moved here:
Thanks, looks like a good solution. Interesting that it was posted a few days after I did my own workaround.
+1, Thanks, Walter.
Thanks !! Works very well.
If you came back here from Walter's answer to the other question and are confused, unhide the older comments!
So just to double check, the number of bytes to skip is the BOM_size, right? So, the final solution would be
[encoding,bytes_per_char,BOM_size] = detect_UTF_encoding(srcFile);
fidr = fopen(srcFile, 'rt', 'n', encoding);
% skip
fread(fidr, BOM_size, '*uint8');
filecontent = fread(fidr, [1 inf], '*char');
And for clarification, the BOM_size is in bytes, right? If so, is it clearer or better to use fseek to skip it?
[encoding,bytes_per_char,BOM_size] = detect_UTF_encoding(srcFile);
fidr = fopen(srcFile, 'rt', 'n', encoding);
% skip
fseek(fidr, BOM_size, 0);
filecontent = fread(fidr, [1 inf], '*char');
Yes, BOM_size is in bytes.
fseek() has more overhead than reading the bytes with fread() and discarding them. fseek() has to check for buffered output and wait for it to complete, reset the end-of-file flag, and re-buffer from the input file, because fseek is the system call used to synchronize I/O between multiple processes.
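For reference, the BOM sizes in bytes for the common UTF encodings can be checked with a sketch like the one below. It is not the detect_UTF_encoding function mentioned above, only an illustration of the standard BOM byte patterns; the variable names are hypothetical.

```matlab
% First bytes of the file. In practice they could be obtained with:
%   fid = fopen(fname, 'r'); head = fread(fid, [1 4], '*uint8'); fclose(fid);
head = uint8([255 254 49 0]);   % example: UTF-16LE BOM followed by the character '1'

% Check the 4-byte patterns first, because the UTF-32LE BOM begins with
% the same two bytes as the UTF-16LE BOM.
if numel(head) >= 4 && isequal(head(1:4), [255 254 0 0])
    encoding = 'UTF-32LE'; BOM_size = 4;
elseif numel(head) >= 4 && isequal(head(1:4), [0 0 254 255])
    encoding = 'UTF-32BE'; BOM_size = 4;
elseif numel(head) >= 3 && isequal(head(1:3), [239 187 191])
    encoding = 'UTF-8';    BOM_size = 3;
elseif numel(head) >= 2 && isequal(head(1:2), [255 254])
    encoding = 'UTF-16LE'; BOM_size = 2;
elseif numel(head) >= 2 && isequal(head(1:2), [254 255])
    encoding = 'UTF-16BE'; BOM_size = 2;
else
    encoding = '';         BOM_size = 0;   % no BOM found
end
```

A UTF-16LE file whose first character after the BOM is a NULL would be misread as UTF-32LE by this check, so treat the result as a guess rather than a guarantee.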
deep...thanks!
Very nice Walter ;)
Does anybody know how to deactivate the warning?
"Warning: The encoding 'XXXXXXX' is not supported.
See the documentation for FOPEN."
I am reading several hundred of these files, and I'd like to remove this output from the console...
To disable the warning, issue the following command after you receive that warning:
[msg, id] = lastwarn;
Then, disable the warning as follows:
warning('off', id)
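When processing many files, it may be cleaner to restore the warning state afterwards. A minimal sketch, assuming the "encoding is not supported" warning was the most recent warning issued when lastwarn is called:

```matlab
% Run this right after the first fopen() call that triggers the warning,
% so that lastwarn refers to it (an assumption of this sketch).
[msg, id] = lastwarn;

saved = warning;          % snapshot of all current warning settings
if ~isempty(id)
    warning('off', id);   % silence only that identifier
end

% ... open and read the remaining files here ...

warning(saved);           % restore the earlier warning settings
```

Restoring the saved settings keeps the warning visible in the rest of the session, in case it ever matters when writing files.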


Asked: on 1 Mar 2016

Edited: on 20 May 2022
