How to turn .txt file into a useful table.

John Jacoby on 27 Aug 2017
Commented: Jeremy Hughes on 31 Aug 2017
This seems like it should be exceedingly simple, but I haven't found anything here or anywhere else that addresses it. I have a text file delimited by periods that should be very easy to import with the readtable function, but readtable seems to import every column as a character array. I've tried using format strings, but I get errors. I would include my code, but it's simply one line, one function: readtable(filepath).
Trying to include a format string gets me:
"Unable to read the entire file. You may need to specify
a different format, delimiter, or number of header
lines.
Note: readtable detected the following parameters:
'HeaderLines', 0, 'ReadVariableNames', true
Error in redditAnalysis (line 4)
allData = readtable('C:\Users\John\Desktop\ChildrensNeurobio\MATLABproject\redditPractice\all.txt', 'Delimiter', '.', 'Format', '%f%f%f%f%f%s');
"
Any idea how to get the columns I need into a useful numeric vector format?
EDIT: the first few lines of the file:
rank.page.upvotes.comments.age.subreddit
1.1.40400.1283.3.OldSchoolCool
2.1.19200.906.4.funny
3.1.31800.1709.5.politics
4.1.40300.780.5.bestof
5.1.5844.1277.3.soccer
6.1.30200.256.5.aww

Answers (2)

Sailesh Sidhwani on 30 Aug 2017
To achieve your workflow, you should pass import options to the readtable() function along with the file. These options define how MATLAB reads the file, and they let you set the variable names, variable types and delimiter. To learn more about import options, see the MATLAB documentation for detectImportOptions.
The following steps show the workflow. "abc.txt" is a subset of the file from your question.
opts = detectImportOptions('abc.txt')
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableTypes: {'char'}
SelectedVariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableOptions: Show all 1 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now change the delimiter and variable names as required. Variable types can be changed with setvartype.
opts.Delimiter = {'.'};
opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'.'}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableTypes: {'char', 'char', 'char' ... and 3 more}
SelectedVariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableOptions: Show all 6 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now pass "opts" to readtable as the import options:
readtable('abc.txt',opts)
ans =
  6×6 table
    rank    page    upvotes    comments    age       subreddit
    ____    ____    _______    ________    ___    _______________
    '1'     '1'     '40400'    '1283'      '3'    'OldSchoolCool'
    '2'     '1'     '19200'    '906'       '4'    'funny'
    '3'     '1'     '31800'    '1709'      '5'    'politics'
    '4'     '1'     '40300'    '780'       '5'    'bestof'
    '5'     '1'     '5844'     '1277'      '3'    'soccer'
    '6'     '1'     '30200'    '256'       '5'    'aww'
  1 Comment
Jeremy Hughes on 31 Aug 2017
Edited: Jeremy Hughes on 31 Aug 2017
You can also set the types with:
>> opts = setvartype(opts,1:5,'double');
See my full answer for a slightly better approach.
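For reference, here is a minimal sketch of where that setvartype call could sit in the workflow. It assumes a file containing the six sample rows from the question. The temporary file name is made up for illustration, and the delimiter is passed directly to detectImportOptions rather than set afterwards.

```matlab
% Write the sample rows from the question to a temporary file
% (hypothetical file name; the thread uses 'abc.txt').
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'rank.page.upvotes.comments.age.subreddit', ...
    '1.1.40400.1283.3.OldSchoolCool', ...
    '2.1.19200.906.4.funny', ...
    '3.1.31800.1709.5.politics', ...
    '4.1.40300.780.5.bestof', ...
    '5.1.5844.1277.3.soccer', ...
    '6.1.30200.256.5.aww');
fclose(fid);

% Detect import options with '.' as the delimiter, then name the columns.
opts = detectImportOptions(fname, 'Delimiter', '.');
opts.VariableNames = {'rank','page','upvotes','comments','age','subreddit'};

% Explicitly request double for the first five columns.
opts = setvartype(opts, 1:5, 'double');

t = readtable(fname, opts);
disp(class(t.upvotes));   % show the type of one numeric column

delete(fname);            % clean up the temporary file
```

Setting the types explicitly means the result does not depend on what type detection decides for each column.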



Jeremy Hughes on 31 Aug 2017
Edited: Jeremy Hughes on 31 Aug 2017
Hi,
This is actually pretty simple:
>> opts = detectImportOptions('abc.txt','Delimiter','.')
>> opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
>> t = readtable('abc.txt',opts);
Without import options, readtable uses a slightly different reading method that scans for numbers and thus pulls the '.' (i.e. the decimal point) along for the ride. Without the 'Delimiter' parameter, detectImportOptions will not choose '.' as the delimiter, since it assumes that character is being used as a decimal separator.
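One way to see the difference described here is to compare what detectImportOptions reports with and without the 'Delimiter' parameter. This is only a sketch: the temporary file name is hypothetical, the sample rows are taken from the question, and the detected values may vary between MATLAB releases, so the code prints them rather than assuming them.

```matlab
% Write a few sample rows from the question to a temporary file
% (hypothetical file name).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'rank.page.upvotes.comments.age.subreddit', ...
    '1.1.40400.1283.3.OldSchoolCool', ...
    '2.1.19200.906.4.funny', ...
    '3.1.31800.1709.5.politics');
fclose(fid);

% Detection without a hint: '.' is not expected to be picked as delimiter.
optsDefault = detectImportOptions(fname);

% Detection with '.' given explicitly as the delimiter.
optsDot = detectImportOptions(fname, 'Delimiter', '.');

% Compare how many columns each set of options describes.
fprintf('Default detection: %d variable(s)\n', numel(optsDefault.VariableNames));
fprintf('With Delimiter ''.'': %d variable(s)\n', numel(optsDot.VariableNames));

% Show the types chosen for each column when '.' is the delimiter.
disp(optsDot.VariableTypes);

delete(fname);   % clean up the temporary file
```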
Hope this helps,
Jeremy
  1 Comment
Jeremy Hughes on 31 Aug 2017
And if the variable names are already there in the file, you might not need that second line.
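A short sketch of that two-line variant, followed by using a column as a numeric vector. It assumes the header row is picked up as the variable names and that the numeric columns are detected as numeric, and it checks the second assumption before computing anything. The temporary file name is hypothetical.

```matlab
% Write the sample rows from the question to a temporary file
% (hypothetical file name).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'rank.page.upvotes.comments.age.subreddit', ...
    '1.1.40400.1283.3.OldSchoolCool', ...
    '2.1.19200.906.4.funny', ...
    '3.1.31800.1709.5.politics', ...
    '4.1.40300.780.5.bestof');
fclose(fid);

% Two-line import: no VariableNames assignment, names come from the file.
opts = detectImportOptions(fname, 'Delimiter', '.');
t = readtable(fname, opts);

% Show the names that were picked up from the header row.
disp(t.Properties.VariableNames);

% Use a column as an ordinary numeric vector, if it was read as numeric.
if isnumeric(t.upvotes)
    fprintf('Mean upvotes: %g\n', mean(t.upvotes));
else
    disp('upvotes was not detected as numeric; consider setvartype.');
end

delete(fname);   % clean up the temporary file
```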
