Simple data extraction from notepad

Mate 2u on 30 Mar 2012
Hi there. I have financial data in a Notepad text file in the form of:
10/21/2002,0609,0.97270,0.97270,0.97260,0.97260,0,0
10/21/2002,0610,0.97260,0.97260,0.97260,0.97260,0,0
10/21/2002,0611,0.97280,0.97280,0.97280,0.97280,0,0
10/21/2002,0612,0.97290,0.97290,0.97290,0.97290,0,0
10/21/2002,0613,0.97290,0.97290,0.97290,0.97290,0,0
10/21/2002,0614,0.97290,0.97290,0.97290,0.97290,0,0
For context, this is 1-minute data covering 24 hours a day, 5 days a week. Each entry is on a new line with no spaces.
I want to transfer this data to MATLAB, but I want an easy method to select certain periods. For instance, let's say I want only the 0600-0800 period across all of the historical data.
Additionally, for anybody very clever: is there a way to select by both date and time, for example 10/28/2003, 0600-0800?
I look forward to some answers.
Thanks

Accepted Answer

Eric on 30 Mar 2012
Here's an approach I would try:
1. Use csvread() to read in only the first two columns, the dates and times.
2. Use the datenum() function to convert these to serial date numbers.
3. Use the datenum() function to convert the desired dates and times to serial date numbers.
4. You should now be able to figure out exactly which rows to read. Use csvread() again to read only those lines of data.
The goal is to read only the data of interest from the file rather than reading in the whole file and then selecting the data of interest. I'm assuming that partially reading in a file using csvread() is faster than reading the whole thing, which I haven't tried. I would hope csvread() is that intelligent, though.
If your data truly are quite repeatable, then step 1 above could be replaced by reading in only the first date/time present in the first row. You could figure out the desired rows from just the first entry if they are absolutely repeatable. That would save you from having to read in all of the first two columns.
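The four steps above could be sketched roughly as follows. This is only an illustration under some assumptions: csvread() handles numeric fields only, so the sketch swaps in textscan() for both reads; the sample rows are the ones from the question, held in a character vector so the example is self-contained; the limits lo and hi are example values; and the rows are assumed to be in chronological order so that the selected rows form one contiguous block.

```matlab
% Sample rows copied from the question; with a real file, pass a file ID
% from fopen() to textscan() instead of the character vector txt.
txt = sprintf('%s\n', ...
    '10/21/2002,0609,0.97270,0.97270,0.97260,0.97260,0,0', ...
    '10/21/2002,0610,0.97260,0.97260,0.97260,0.97260,0,0', ...
    '10/21/2002,0611,0.97280,0.97280,0.97280,0.97280,0,0', ...
    '10/21/2002,0612,0.97290,0.97290,0.97290,0.97290,0,0', ...
    '10/21/2002,0613,0.97290,0.97290,0.97290,0.97290,0,0', ...
    '10/21/2002,0614,0.97290,0.97290,0.97290,0.97290,0,0');

% Step 1: read only the date and time columns; %*f skips a numeric field.
keys = textscan(txt, '%s %s %*f %*f %*f %*f %*f %*f', 'Delimiter', ',');

% Step 2: convert every row's date and time to a serial date number.
stamps = datenum(strcat(keys{1}, keys{2}), 'mm/dd/yyyyHHMM');

% Step 3: convert the desired limits the same way (example limits).
lo = datenum('10/21/20020610', 'mm/dd/yyyyHHMM');
hi = datenum('10/21/20020612', 'mm/dd/yyyyHHMM');

% Step 4: locate the rows (half-minute tolerance guards against rounding),
% then re-read only that contiguous block, skipping the lines before it.
tol = 0.5 / 1440;
rows = find(stamps >= lo - tol & stamps <= hi + tol);
fmt = '%*s %*s %f %f %f %f %f %f';
block = textscan(txt, fmt, numel(rows), 'Delimiter', ',', ...
    'HeaderLines', rows(1) - 1, 'CollectOutput', true);
block = block{1};   % numeric matrix, one row per selected minute
```

Whether the second, partial read is actually faster than one full read is untested here, as in the answer itself.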
Good luck,
Eric

More Answers (3)

Mate 2u on 30 Mar 2012
Additionally, for the selected time frames it would be nice to have a matrix with the selected values.

Andrei Bobrov on 30 Mar 2012
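Try this code:
% Open the file and stop early if it cannot be opened
fid = fopen('yourtxtfile.txt');
if fid == -1, error('Could not open yourtxtfile.txt'); end
% C{1}: N-by-2 cell of date and time strings; C{2}: N-by-6 numeric matrix
C = textscan(fid,'%s %s %f %f %f %f %f %f','Delimiter',',','CollectOutput',1);
fclose(fid);
% Join the date and time strings row by row, e.g. '10/21/20020609'
mdyhm = strcat(C{1}(:,1),C{1}(:,2));
nmdyhm = datenum(mdyhm,'mm/dd/yyyyHHMM');
% input your period
mdy = '10/28/2003';
hm = ['0600';'0800'];
bd = strcat(mdy,hm);
nbd = datenum(bd,'mm/dd/yyyyHHMM');
% Keep the numeric rows whose timestamp falls inside the period
out = C{2}(nmdyhm >= nbd(1) & nmdyhm <= nbd(2),:);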
  2 Comments
Mate 2u on 30 Mar 2012
fid = fopen('eurusd1m.txt');
% This does not import the file data; it just returns what looks like a random number for some reason
Jason Ross on 30 Mar 2012
That's the file ID. The next line is what reads the data from the file ID.
http://www.mathworks.com/help/techdoc/ref/fopen.html
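To illustrate the point, here is a small sketch of what fopen() returns and where the data is actually read. The temporary file name and the two sample rows (taken from the question) are only there to make the example self-contained; with a real file you would pass its name to fopen() directly.

```matlab
% Write two sample rows to a temporary file so the example can run anywhere.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    '10/21/2002,0609,0.97270,0.97270,0.97260,0.97260,0,0', ...
    '10/21/2002,0610,0.97260,0.97260,0.97260,0.97260,0,0');
fclose(fid);

% fopen returns a file identifier: an integer handle, or -1 on failure.
% It does not contain any of the file's contents.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end

% textscan is the call that actually reads the data through that handle.
C = textscan(fid, '%s %s %f %f %f %f %f %f', ...
    'Delimiter', ',', 'CollectOutput', true);
fclose(fid);      % release the handle once reading is done
delete(fname);    % remove the temporary file

values = C{2};    % 2-by-6 numeric matrix of the price and volume columns
```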



Mate 2u on 30 Mar 2012
I am running the code above, but it is taking forever. There are many rows in this file. Are there any faster methods? I have literally been waiting 20 minutes and it is still not done.
Additionally, I would like to be able to choose a custom set of date and time ranges, which may not be sequential.
Any suggestions?
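One possible way to address both points is to parse every field as a number, which avoids building and converting millions of date strings, and then combine several logical masks. The sketch below is only an illustration: it uses sscanf() on the sample rows from the question (for a real file, fscanf(fid, fmt, [10 Inf]) takes the same format), and the windows table is a hypothetical example of non-sequential date and time ranges.

```matlab
% Sample rows from the question, held in a character vector.
txt = sprintf('%s\n', ...
    '10/21/2002,0609,0.97270,0.97270,0.97260,0.97260,0,0', ...
    '10/21/2002,0610,0.97260,0.97260,0.97260,0.97260,0,0', ...
    '10/21/2002,0611,0.97280,0.97280,0.97280,0.97280,0,0', ...
    '10/21/2002,0612,0.97290,0.97290,0.97290,0.97290,0,0', ...
    '10/21/2002,0613,0.97290,0.97290,0.97290,0.97290,0,0', ...
    '10/21/2002,0614,0.97290,0.97290,0.97290,0.97290,0,0');

% Parse month/day/year, HHMM and the six values as numbers in one pass.
fmt = '%d/%d/%d,%d,%f,%f,%f,%f,%f,%f';
M = sscanf(txt, fmt, [10 Inf]).';      % one row per line, ten columns

dayNum = datenum(M(:,3), M(:,1), M(:,2));   % year, month, day
hhmm   = M(:,4);                            % e.g. 609 for '0609'
vals   = M(:,5:10);

% Time-of-day filter applied to every date in the history.
morning = vals(hhmm >= 600 & hhmm <= 800, :);

% Hypothetical list of windows: date, start HHMM, end HHMM.
windows = {'10/21/2002', 609, 610; ...
           '10/21/2002', 613, 614};
keep = false(size(hhmm));
for k = 1:size(windows, 1)
    d = datenum(windows{k,1}, 'mm/dd/yyyy');
    keep = keep | (dayNum == d & hhmm >= windows{k,2} & hhmm <= windows{k,3});
end
out = vals(keep, :);   % matrix with the rows from all requested windows
```

Comparing HHMM as plain integers works because the time column is zero padded, so numeric order matches clock order within a day.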
  1 Comment
Jason Ross on 30 Mar 2012
Rather than importing all the data, then throwing away what you don't want, figure out a way to organize the data into smaller file chunks so you only have to open what you want. For example, you could create five files, one for each day, or you could create files by date and hour. This would give you a well-known pattern you can search against since you can get a directory listing very quickly and discard the files that don't contain the data you need.
The actual scheme for the file naming is up to you. You could use some sort of YYMMDDHH layout, or if it's all relative to now, you could use .0 (today), .1 (yesterday) and on back.
Of course, at some point you are essentially re-implementing a database. If you are getting this data from a database already, you can figure out how to make a query to the database for only the data you want, dump that to a file, and then you don't need to search in MATLAB since you already have narrowed the data set.
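As a rough sketch of the chunking idea, the loop below copies each line of a large file into a smaller file named after its date. Several things here are assumptions made for the demo: the yyyymmdd.txt naming scheme, the temporary folder, the input file name all.txt, and the 10/22/2002 row, which is invented so that the example produces more than one output file. It also assumes the input is in chronological order.

```matlab
% Build a tiny demo input in a temporary folder.
workDir = tempname;
mkdir(workDir);
src = fullfile(workDir, 'all.txt');
fid = fopen(src, 'w');
fprintf(fid, '%s\n', ...
    '10/21/2002,0609,0.97270,0.97270,0.97260,0.97260,0,0', ...
    '10/21/2002,0610,0.97260,0.97260,0.97260,0.97260,0,0', ...
    '10/22/2002,0609,0.97270,0.97270,0.97260,0.97260,0,0');  % invented row
fclose(fid);

% Stream through the input once, switching output file when the date changes.
fin = fopen(src, 'r');
fout = -1;
current = '';
row = fgetl(fin);
while ischar(row)
    if numel(row) >= 10
        d = row(1:10);                               % 'mm/dd/yyyy'
        if ~strcmp(d, current)
            if fout ~= -1, fclose(fout); end
            name = [d(7:10) d(1:2) d(4:5) '.txt'];   % yyyymmdd.txt
            fout = fopen(fullfile(workDir, name), 'a');
            current = d;
        end
        fprintf(fout, '%s\n', row);
    end
    row = fgetl(fin);
end
if fout ~= -1, fclose(fout); end
fclose(fin);

% A directory listing now shows which dates are available.
files = dir(fullfile(workDir, '2002*.txt'));
names = sort({files.name});
rmdir(workDir, 's');   % clean up the demo folder
```

With files laid out this way, a request for one date only needs to open the single matching file.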
