Principal Component Analysis - return data (stock market data)

Question

David Schaefer on 15 Oct 2019

https://uk.mathworks.com/matlabcentral/answers/485444-principal-component-analysis-return-data-stock-market-data

Answered: the cyclist on 16 Oct 2019

First, let me explain what I want to do.

I have data on returns of 262 stocks for 299 days in one year.

I want to run a factor model that takes the following form:

r(i,t) = y(0,i) + beta(i)*F(t) + e(i,t)

where t denotes a daily observation (edit: there are 299 days) and i denotes a stock. After the regression I want to calculate the standard deviation of the residual e for every stock.

F(t) in this regression should be the first 5 principal components of the cross section of returns in this year.
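The model above is a separate time-series regression for each stock. Here is a minimal MATLAB sketch of that step, assuming the factor values are already available as a 299 x 5 matrix. The names R, F, X, B, E and residStd are hypothetical. Both R and F are filled with random placeholder numbers, so replace them with your own data:

```matlab
% Sketch: time-series regression of each stock on K factors plus an intercept.
% R and F are random PLACEHOLDERS; replace them with your own data.
rng(0);                        % reproducible placeholder data
T = 299;                       % number of days
N = 262;                       % number of stocks
K = 5;                         % number of factors
R = 0.01 * randn(T, N);        % hypothetical returns r(i,t): one column per stock
F = randn(T, K);               % hypothetical factor values F(t): one row per day

X = [ones(T, 1), F];           % design matrix: intercept column plus the K factors
B = X \ R;                     % (K+1) x N least-squares coefficients, all stocks at once
alpha = B(1, :);               % intercepts y(0,i), 1 x N
beta  = B(2:end, :);           % factor loadings beta(i), K x N
E = R - X * B;                 % T x N residuals e(i,t)
residStd = std(E);             % 1 x N: standard deviation of the residuals of each stock
```

Because the backslash operator solves the least-squares problem for every column of R at once, no loop over stocks is needed. Note that std normalizes by T-1. If you want a degrees-of-freedom correction for the K+1 estimated coefficients, use sqrt(sum(E.^2) / (T-K-1)) instead.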

I spent the last two days reading about principal component analysis. I think I understand the basic idea, but I have difficulty applying it.

So I loaded the data into MATLAB and executed the following code:

coeff = pca(Data,'NumComponents',5)

This returns a 262 x 5 matrix.

So there are 5 columns because I specified the number of components to be 5, right?

But why do I get 5 different components for every stock?

At first I thought I needed only 1 row and 5 columns. But looking at the regression, F has the subscript t, so I need five component values for every day, or am I wrong? And how do I get them?

3 Comments

David Schaefer on 15 Oct 2019

What I forgot to mention: I have 299 daily observations.

So, for my factor model: would it be a solution to transpose the data before applying the code, so that I get a 299 x 5 matrix, and then run the regression? In my factor model, F has the subscript t for daily observations.

Adam on 15 Oct 2019

Edited: Adam on 15 Oct 2019

The number of observations should not be a factor. The observations just determine what the eigenvectors actually are and how accurately they will measure what you want (more observations should give greater accuracy as a model of your data), but the eigenvectors themselves will have the dimensionality of your inputs.

Each of your input observations is in 262-dimensional space, i.e. it has 262 components. These are all 'axis-aligned' along each of those components. The eigenvectors you get simply reorient within that 262-dimensional space to give new axes that follow the multi-dimensional shape of your data rather than following each of the original components.
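Here is a small check that makes the dimensionality point concrete. It runs on random placeholder data with the same shape as in the question (299 days x 262 stocks; the data are hypothetical). Each column of coeff is a 262-element vector in stock space, and the columns are orthonormal:

```matlab
% Placeholder data with the question's shape: rows = days, columns = stocks.
rng(0);
Data = 0.01 * randn(299, 262);           % hypothetical daily returns
coeff = pca(Data, 'NumComponents', 5);   % 262 x 5: each column is an eigenvector
disp(size(coeff))                        % each eigenvector has one entry per stock
G = coeff' * coeff;                      % 5 x 5 matrix of inner products between eigenvectors
disp(max(max(abs(G - eye(5)))))          % close to zero: the columns are orthonormal
```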

You can then project your data onto the eigenvectors and use those projections instead of the original dimensions. Because the eigenvectors follow the principal directions of variation in your data, you can throw away 257 of them (well, you chose to keep just 5 at least) and still describe your data better than if you had thrown away 257 of the original dimensions.
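The projection described above is what pca returns as its second output, score. The following sketch uses random placeholder data and assumes the default centering: pca subtracts the column means (returned as the sixth output, mu) before projecting.

```matlab
rng(0);
Data = 0.01 * randn(299, 262);                       % hypothetical daily returns
[coeff, score, ~, ~, ~, mu] = pca(Data, 'NumComponents', 5);
projected = (Data - mu) * coeff;   % center each stock, then project onto the 5 eigenvectors
disp(size(projected))              % 299 x 5: one row per day, one column per component
disp(max(max(abs(projected - score))))   % close to zero: matches pca's score output
```

Data - mu relies on implicit expansion (MATLAB R2016b or later). On older releases, use bsxfun(@minus, Data, mu).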

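You should also look at the other outputs from the pca function, though. The explained output (the fifth output) gives the percentage of the total variance explained by each principal component, so it tells you how much of the data variation is captured by those first 5 principal components.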
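Here is a sketch of reading the explained output, again on random placeholder data. Calling pca without 'NumComponents' keeps all components, so the percentages cover the full variance:

```matlab
rng(0);
Data = 0.01 * randn(299, 262);           % hypothetical daily returns
[~, ~, ~, ~, explained] = pca(Data);     % fifth output: percent of variance per component
disp(explained(1:5)')                    % share of each of the first 5 components
fprintf('First 5 components explain %.1f%% of the variance\n', sum(explained(1:5)));
```

With purely random placeholder data, the variance is spread thinly across many components. Real return data will give different numbers.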


Answer 1

the cyclist on 16 Oct 2019


https://uk.mathworks.com/matlabcentral/answers/485444-principal-component-analysis-return-data-stock-market-data#answer_396530


You might want to check out my tutorial-style PCA answer here.

I think one helpful way to think about your 262x5 output is that your 262 stocks are the entire "market", and your 5 columns are stock "indices", designed to capture a fraction (ideally a large fraction) of the variability of the market.

Each index -- defined by a column of the coeff output -- is a principal component, defined by a linear combination of all the stocks. So if the first column of coeff is

coeff(:,1) = [0.03;
              0.02;
              0.06;
              ...
              ...]

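that is telling you that the first "index" (i.e. the first principal component) puts a weight of 0.03 on stock 1, 0.02 on stock 2, 0.06 on stock 3, and so on. (Loosely, "3% of stock 1"; strictly, each column of coeff is a unit-length vector, so the weights need not sum to 1 and some can be negative.)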

So, to capture what the market was doing on the 299 market days (as captured by the 5 indices), I believe you just need

Data * coeff

which is a (299x262) * (262x5) = (299x5) matrix. That matrix holds the 299 daily returns of the 5 indices.
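One detail is worth checking. By default pca centers the data, so its score output is (Data - mean(Data)) * coeff rather than Data * coeff. The two differ only by a constant in each column, and the intercept y(0,i) in the question's regression absorbs that constant. The sketch below uses random placeholder data. The names indexReturns, offset, X1, X2, e1 and e2 are hypothetical.

```matlab
rng(0);
T = 299;
Data = 0.01 * randn(T, 262) + 0.001;              % hypothetical returns with a nonzero mean
[coeff, score] = pca(Data, 'NumComponents', 5);   % score: centered data projected, T x 5
indexReturns = Data * coeff;                      % the expression from the answer, T x 5
offset = indexReturns - score;                    % each column is constant: mean(Data) * coeff
disp(max(max(abs(offset - mean(Data) * coeff))))  % close to zero

% Regress every stock on an intercept plus either version of the factors.
X1 = [ones(T, 1), indexReturns];
X2 = [ones(T, 1), score];
e1 = Data - X1 * (X1 \ Data);                     % residuals using Data * coeff
e2 = Data - X2 * (X2 \ Data);                     % residuals using score
disp(max(max(abs(e1 - e2))))                      % close to zero: identical residuals
```

Either matrix therefore gives the same slopes beta(i) and the same residuals. Only the intercepts change.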
