Data manipulation of a financial time series

Hi all I have a 1x1 Struct called "ans". Within this I have a textdata matrix of size 5 million x 4.
Now in this textdata I have 4 columns, 1st column is dates, 2nd column is time, 3rd column is prices and 4th column is symbol.
OBJECTIVE
I need to rearrange the 2nd column times so it gives me the prices in 30 second intervals. Currently it is not regular.
Example of current format of column 2 is:'18:42:20.000' '18:42:22.000' '18:42:30.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:40.000' '18:42:43.000' '18:42:49.000' '18:43:00.000'
I need to do it so that it gives me the times only every 30 seconds and the corresponding price (column 3) at that specific time. The goal is to have regular 30 second intervals alongside the price at that current time. So we only need to work with columns 2 and 3.
Please message me if I never explained properly, and thanks so much for the help in advance.

4 Comments

So you have 5e6 data points. If they are taken every second, that gives around 57 days of data. How do you distinguish between days? How do you want to interpolate when you have several data with the same time stamp?
'18:42:40.000' '18:42:40.000' '18:42:40.000'
I do not need to distinguish between days. In your example the price at 18:43:00 would be the last price at 18:42:40 as it stays the same from the last 42.40 to 43......so we have to think of it as a financial application
Well, you would still need to distinguish between days then. Say the price in Monday at 18:40:02 is 412 bananas and on Tuesday at 18:40:02 it is 24 bananas. That would have an impact in your calculations. Similary, you have several data with the same time stamp:
  • 18:42:00 12 bananas
  • 18:42:00 13 bananas
  • 18:42:00 0 bananas
Which is the valid one? Are the prices guaranteed to be the same? Or how do you calculate it? The average? The minimum? The maximum? The last one? Or maybe I am missing something?
Hi, you are right 1st column is dates in the format of 01/03/2007...
The prices are not guaranteed to be the same.
Let me explain properly: I need the prices in 30 second even intervals. Currently it is uneven. If there are multiple prices for a certain time that corresponds to many trades occuring in a second and the last price is what the asset would remain until the next trade? Does this help?

Sign in to comment.

Answers (1)

José-Luis
José-Luis on 5 Oct 2012
Edited: José-Luis on 5 Oct 2012
Here is what you can do, assuming your_data is a cell matrix containing strings.
your_data{1} = '01/08/2007';
your_data{1,2} = '18:04:01';
your_data{2,1} = '04/08/2007';
your_data{2,2} = '18:05:01';
your_data{1,3} = '12';
your_data{2,3} = '13';
%Getting Matlab's date number
your_dates = cellfun(@(a,b) datenum([a ' ' b],'dd/mm/yy HH:MM:SS'),...
your_data(:,1),your_data(:,2));
%Creating time stamp with 1/2 hour interval
interpDates = (floor(your_dates(1)):1/2880:ceil(your_dates(end)));
interpDates(interpDates < your_dates(1)) = [];
interpDates(interpDates > your_dates(end)) = [];
%Keeping only the last time stamp, from the data
[dummy idx dummy] = unique(your_dates,'last');
your_dates = your_dates(idx);
%Getting data values
your_bananas = cellfun(@(a) str2double(a), your_data(:,3));
your_bananas = your_bananas(idx);
%Function to find the nearest value
find_banana = @(x) find(your_dates>=x,1,'first'); %There is probably a more efficient way
%Interpolating
your_interp_bananas = arrayfun(@(x) your_bananas(find_banana(x)),interpDates);

4 Comments

Mate 2u
Mate 2u on 5 Oct 2012
Edited: Mate 2u on 5 Oct 2012
Hi there, thanks,
It does not seem to work. Could you please see my data sample and desired output below. It is only a sample.
Mate 2u
Mate 2u on 5 Oct 2012
Edited: Mate 2u on 5 Oct 2012
Data sample:
'01/03/2007' '15:30:06.000' '55.90' 'CTH07'
'01/03/2007' '15:30:30.000' '55.75' 'CTH07'
'01/03/2007' '15:30:42.000' '55.80' 'CTH07'
'01/03/2007' '15:30:53.000' '55.85' 'CTH07'
'01/03/2007' '15:30:57.000' '55.75' 'CTH07'
'01/03/2007' '15:31:17.000' '55.70' 'CTH07'
'01/03/2007' '15:31:23.000' '55.65' 'CTH07'
'01/03/2007' '15:31:36.000' '55.55' 'CTH07'
'01/03/2007' '15:31:38.000' '55.60' 'CTH07'
'01/03/2007' '15:31:43.000' '55.55' 'CTH07'
'01/03/2007' '15:31:44.000' '55.60' 'CTH07'
'01/03/2007' '15:31:50.000' '55.70' 'CTH07'
'01/03/2007' '15:32:07.000' '55.55' 'CTH07'
'01/03/2007' '15:32:07.000' '55.90' 'CTH07'
'01/03/2007' '15:40:41.000' '55.30' 'CTH07'
'01/03/2007' '15:40:43.000' '55.40' 'CTH07'
'01/03/2007' '15:40:52.000' '55.30' 'CTH07'
'01/03/2007' '15:40:54.000' '55.50' 'CTH07'
'01/03/2007' '15:41:33.000' '55.15' 'CTH07'
OUTPUT REQUIRED
'01/03/2007' '15:30:30.000' '55.75' 'CTH07'
'01/03/2007' '15:31:00.000' '55.75' 'CTH07'
'01/03/2007' '15:31:30.000' '55.65' 'CTH07'
'01/03/2007' '15:32:00.000' '55.70' 'CTH07'
'01/03/2007' '15:32:30.000' '55.90' 'CTH07'
'01/03/2007' '15:33:00.000' '55.90' 'CTH07'
'01/03/2007' '15:33:30.000' '55.90' 'CTH07'
etc.......
Modified original code. I thought it was 30 min interval. To do a 30 second interval, use:
interpDates = (floor(your_dates(1)):1/2880:ceil(your_dates(end)));
Also I realized an error in the find_bananas function, it should have been written the other way around:
find_banana = @(x) find(your_dates>=x,1,'first');
OK, I am running it on my 5,000,000 x 4 cell array. It is Busy, I will let you know when it has run and if it works.

Sign in to comment.

Asked:

on 5 Oct 2012

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!