Correlation of two variables over time: can this happen?

I have two variables, x1 and x2, measured between January 1 and December 31. When I calculate the correlation between January and June, it is positive. When I calculate the correlation between July and December, it is positive. But the correlation between January and December is negative. Can this happen?

1 Comment

Yes. However, you probably want to perform a statistical test to determine if the fluctuating correlation is significant.
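In MATLAB, [R,P] = corrcoef(x1,x2) returns p-values alongside the correlation matrix. A minimal sketch of the same idea in Python (using NumPy and SciPy; the data are the accepted answer's toy values with the noise term omitted):

```python
import numpy as np
from scipy import stats

# Toy data: each half is positively correlated, the whole series is not.
x1 = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
x2 = np.array([1, 2, 3, 4, -6, -5, -4, -3], dtype=float)

r_first, p_first = stats.pearsonr(x1[:4], x2[:4])    # first half
r_second, p_second = stats.pearsonr(x1[4:], x2[4:])  # second half
r_full, p_full = stats.pearsonr(x1, x2)              # whole series

print(f"first half:  r = {r_first:+.3f}")
print(f"second half: r = {r_second:+.3f}")
print(f"full series: r = {r_full:+.3f}, p = {p_full:.4f}")
```

With these values both halves are positively correlated while the full series is negatively correlated, and the p-value tells you whether each correlation is distinguishable from zero at your chosen significance level.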


 Accepted Answer

Yes. This is known as Simpson's Paradox. Here is an example:
rng default
x1 = [1 2 3 4 6 7 8 9]';
x2 = [1 2 3 4 -6 -5 -4 -3]' + 0.8*rand(8,1);
% Correlation of first half
corrcoef(x1(1:4),x2(1:4))
% Correlation of second half
corrcoef(x1(5:8),x2(5:8))
% Correlation of entire vector
corrcoef(x1,x2)
% Plot it
figure
scatter(x1,x2)
You can see that x1 and x2 are positively correlated within the first half and within the second half, but the trend over the entire vector is negative.

7 Comments

Thank you. Then what would be an interpretation of the relationship between x and y?
As also mentioned above, this is a clear example of Simpson's paradox: you cannot (or rather, you can, but better not) interpret the association in this model, irrespective of the statistical method you use, without taking the effect of confounding variables into account. As a simple example, you can think of group 1 and group 2 (the two clusters in the scatter plot above) as a confounder you should adjust for.
rng default
x = [1 2 3 4 6 7 8 9]'; % predictor
y = [1 2 3 4 -6 -5 -4 -3]' + 0.8*rand(8,1); % dependent
conf = [ones(4, 1); ones(4, 1) + 1]; % confounding variable
% model 1: without adjusting for this confounder
fitlm(x, y)
Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate      SE        tStat      pValue
                   ________    _______    _______    ________
    (Intercept)     4.6512     2.1124      2.2019    0.069923
    x1             -1.0439     0.37054    -2.8173    0.030462
% model 2: after adjusting for the confounder
fitlm([x, conf], y)
Linear regression model:
    y ~ 1 + x1 + x2

Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    ________    _______    __________
    (Intercept)     12.737     0.38027      33.496    4.4587e-07
    x1              0.97767    0.087819     11.133    0.00010197
    x2             -12.129     0.48101     -25.217    1.8305e-06
Judea Pearl would emphasize that this "paradox" cannot be resolved using only the data. The correct interpretation will rely on understanding the causal mechanism or generative process that led to these data. (I don't expect there is an ELI5 explanation.)
Assuming that the data are meaningful, neither the positive nor the negative correlation is "wrong". They just describe different aspects of the data. For example, suppose in my example the variables represent something like
  • x1 = amount of fertilizer used on a field
  • x2 = crop yield
(Doesn't really work with the negative values I used, but ignore that.)
And maybe the cluster of 4 points on the left is from spring, and the cluster of 4 points on the right is from autumn.
The interpretation could be that greater use of fertilizer yields greater crop yield ... but that there is a factor related to the season that means there is lower yield in autumn.
It is not possible to interpret the data themselves, or determine whether the partitioned or aggregated data are more relevant, without a conceptual model of what is going on.
Valid point. So I must further emphasize adjusting for a confounding factor that is a cause of both x1 and x2. And otherwise the problem should be approached carefully (e.g. in the case of a mediator).
To be clear, my comment was directed more at the OP than as a reply to your comment, @IveIve
It was useful anyway!
Yes, I now understand the confounding factor t.
x1 and x2 have a positive correlation between date 1 and date 2.
x1 and x2 have a positive correlation between date 2 and date 3.
x1 and x2 have a negative correlation between date 1 and date 3.
Regression of x2 on x1 between date 1 and date 2 has a positive, statistically significant coefficient.
Regression of x2 on x1 between date 2 and date 3 has a positive, statistically significant coefficient.
Regression of x2 on x1 between date 1 and date 3 has a negative, statistically significant coefficient.
Regression of x2 on x1 and time between date 1 and date 2 has positive, statistically significant coefficients for both x1 and time.
Regression of x2 on x1 and time between date 2 and date 3 has positive, statistically significant coefficients for both x1 and time.
Regression of x2 on x1 and time between date 1 and date 3 has positive, statistically significant coefficients for both x1 and time.
How can this happen?


More Answers (1)

Matt Gaidica
Matt Gaidica on 17 Dec 2020
Edited: Matt Gaidica on 17 Dec 2020
Sure, it can happen. What's your concern about it?

4 Comments

"Analysis reveals time to be the confounding variable: plotting both price and demand against time reveals the expected negative correlation over various periods, which then reverses to become positive if the influence of time is ignored by simply plotting demand against price." (https://en.wikipedia.org/wiki/Simpson%27s_paradox) Controlling for time resolves the issue.
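To make that concrete, here is a minimal sketch (a Python translation of the thread's toy data, noise omitted, with an assumed period indicator marking the first vs. second half of the year): the pooled slope of x2 on x1 is negative, but it turns positive once the time period is included in the regression.

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
x2 = np.array([1, 2, 3, 4, -6, -5, -4, -3], dtype=float)
period = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # time confounder

# Model 1: x2 ~ 1 + x1 (ordinary least squares via lstsq)
X1 = np.column_stack([np.ones_like(x1), x1])
b1, *_ = np.linalg.lstsq(X1, x2, rcond=None)

# Model 2: x2 ~ 1 + x1 + period
X2 = np.column_stack([np.ones_like(x1), x1, period])
b2, *_ = np.linalg.lstsq(X2, x2, rcond=None)

print("slope of x1, ignoring time:     ", b1[1])  # negative
print("slope of x1, controlling time:  ", b2[1])  # positive
```

This mirrors the fitlm comparison in the accepted-answer thread: the sign of the x1 coefficient flips once the confounder enters the model.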
Thank you. Then, what would be the ELI5 (Explain Like I am Five) explanation?
For a thorough explanation see the Wikipedia link in the Cyclist's answer below. (He beat me to it, posting this fun paradox.)
It would be interesting for us to know what real-world measurements x1 and x2 are, so we can see a real-world situation that gives rise to this paradox. What do x1 and x2 represent?
I am happy to explain in order to get comments, but I am not sure whether I want to discuss the problem on a public bulletin board. So if there is a suggestion for another venue, I will be happy to listen.


Asked: on 17 Dec 2020
Edited: on 17 Dec 2020
