Why is the correlation of two random Markov chains so large?

A very simple Python script shows that the correlation coefficient and the Spearman rank correlation between two datasets of uniform random numbers decrease in proportion to the inverse square root of the number of points in the dataset. However, when one compares Markov chains produced by accumulating those numbers, there is no such behaviour: even for large datasets, the correlation can easily be as large as 0.7. What is the origin of this behaviour? Are two random Markov chains more "related" to each other than two random datasets? Can this effect be corrected?

[Figure: Average correlation and Spearman rank between two random datasets drawn from a normal distribution, as a function of dataset size]

[Figure: Average correlation and Spearman rank between two Markov chains generated from a normal distribution, as a function of dataset size]

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

n = 10000
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)

# Convert the i.i.d. samples into Markov chains (random walks):
# x[j] becomes x[0] + x[1] + ... + x[j], exactly like an accumulating loop
x = np.cumsum(x)
y = np.cumsum(y)

print("Correlation:   ", np.abs(np.corrcoef(x, y)[0, 1]))
print("Spearman rank: ", np.abs(scipy.stats.spearmanr(x, y)[0]))

# Plot both chains
plt.figure()
plt.plot(x)
plt.plot(y)
plt.show()
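
A sketch that reproduces the Pearson curves of the two figures described above (Spearman behaves analogously) by averaging $|r|$ over many independent pairs of series. The helper names `rowwise_corr` and `mean_abs_corr`, the 200 trials per size and the seed are illustrative assumptions; the accumulation is the same construction as the question's code, $X[j] = x[0] + \dots + x[j]$. For i.i.d. normal data $r$ is approximately $N(0, 1/n)$, so the expected $|r|$ is about $\sqrt{2/(\pi n)}$, which the code prints alongside for comparison.

```python
import numpy as np


def rowwise_corr(u, v):
    """Pearson correlation between corresponding rows of u and v."""
    u = u - u.mean(axis=1, keepdims=True)
    v = v - v.mean(axis=1, keepdims=True)
    return (u * v).sum(axis=1) / np.sqrt((u ** 2).sum(axis=1) * (v ** 2).sum(axis=1))


def mean_abs_corr(n, trials=200, accumulate=False, seed=None):
    """Average |r| over `trials` independent pairs of length-n series."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0, 1, (trials, n))
    y = rng.normal(0, 1, (trials, n))
    if accumulate:
        # Turn each row into a random walk: X[j] = x[0] + ... + x[j]
        x = np.cumsum(x, axis=1)
        y = np.cumsum(y, axis=1)
    return np.abs(rowwise_corr(x, y)).mean()


for n in (100, 1000, 10000):
    iid = mean_abs_corr(n, seed=0)
    walk = mean_abs_corr(n, accumulate=True, seed=0)
    theory = np.sqrt(2 / (np.pi * n))  # E|r| for i.i.d. data, using r ~ N(0, 1/n)
    print(f"n={n:6d}  i.i.d.: {iid:.3f} (theory ~{theory:.3f})  random walk: {walk:.3f}")
```

The i.i.d. column should track the $\sqrt{2/(\pi n)}$ column, while the random-walk column stays large for every $n$.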

asked Jul 12 at 2:42 by Aleksejs Fomins

1 Answer

Accepted answer:

The problem is that your Markov chain (a pure accumulator, i.e. a random walk) is not stationary: its variance grows without bound, so it's useless to try to estimate a correlation coefficient by averaging. The variance of the averaged estimate does not tend to zero, even after dividing the sum by $n$.
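
To make the non-stationarity concrete, assume (as in the question's code) zero-mean i.i.d. steps $x[k]$ with variance $\sigma^2$ and $X[n] = x[0] + \dots + x[n]$. Then, for $m \le n$,

$$\operatorname{Var}(X[n]) = \sum_{k=0}^{n} \operatorname{Var}(x[k]) = (n+1)\sigma^2, \qquad \operatorname{Cov}(X[m], X[n]) = (m+1)\sigma^2, \qquad \operatorname{Corr}(X[m], X[n]) = \sqrt{\frac{m+1}{n+1}}.$$

So the variance keeps growing, and even samples far apart in time stay strongly dependent (for example, $X[2499]$ and $X[9999]$ have correlation $0.5$). The $n$ points of a random walk therefore carry far less independent information than $n$ i.i.d. points would.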

You can check this by adding a small "forgetting" factor. Denoting by $X[n]$ the Markov chain and by $x[n]$ the independent process (white noise), you have $X[n] = a X[n-1] + x[n]$; your code is the case $a = 1$. By taking $0<a<1$, $X[n]$ becomes (asymptotically) stationary.
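
A short derivation of why $0<a<1$ helps, assuming the same zero-mean steps with variance $\sigma^2$ and using that $X[n-1]$ is independent of $x[n]$. Taking variances on both sides of $X[n] = aX[n-1] + x[n]$ in the stationary regime, with $v = \operatorname{Var}(X[n])$:

$$v = a^2 v + \sigma^2 \;\Longrightarrow\; v = \frac{\sigma^2}{1-a^2}, \qquad \operatorname{Corr}(X[n], X[n+k]) = a^{k} \quad (k \ge 0).$$

The variance stays bounded and the dependence between samples decays geometrically. For $a = 0.99$, $v \approx 50.3\,\sigma^2$ and the lag-100 autocorrelation is $0.99^{100} \approx 0.37$. For $a = 1$ the first equation has no finite solution, which is exactly the pure accumulator's problem.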

You can check that by taking, say, $a=0.99$: you already get rid of the problem, and the (estimator of the) correlation coefficient is practically zero, shrinking further as $n$ grows.
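
A minimal Monte Carlo sketch of this check, using numpy only. The helper names `ar1`, `rowwise_corr` and `mean_abs_corr`, the 200 trials and the seed are illustrative choices, not from the answer. It generates independent pairs of chains $X[n] = aX[n-1] + x[n]$ for $a = 1$ (the question's accumulator), $a = 0.99$ and $a = 0.9$, and averages $|r|$. For $a<1$ it also prints a reference value from Bartlett's large-sample approximation for two independent AR(1) series, $\operatorname{Var}(r) \approx \frac{1}{n}\,\frac{1+a^2}{1-a^2}$, converted to an expected $|r|$ via $\mathbb{E}|r| \approx \sqrt{2/\pi}\,\operatorname{sd}(r)$ (assuming $r$ is approximately normal).

```python
import numpy as np


def ar1(noise, a):
    """X[t] = a * X[t-1] + noise[t] along axis 1; a = 1 is the pure accumulator."""
    X = np.empty_like(noise)
    X[:, 0] = noise[:, 0]
    for t in range(1, noise.shape[1]):
        X[:, t] = a * X[:, t - 1] + noise[:, t]
    return X


def rowwise_corr(u, v):
    """Pearson correlation between corresponding rows of u and v."""
    u = u - u.mean(axis=1, keepdims=True)
    v = v - v.mean(axis=1, keepdims=True)
    return (u * v).sum(axis=1) / np.sqrt((u ** 2).sum(axis=1) * (v ** 2).sum(axis=1))


def mean_abs_corr(a, n=10000, trials=200, seed=0):
    """Average |r| between independent pairs of AR(1) chains with coefficient a."""
    rng = np.random.default_rng(seed)
    X = ar1(rng.normal(0, 1, (trials, n)), a)
    Y = ar1(rng.normal(0, 1, (trials, n)), a)
    return np.abs(rowwise_corr(X, Y)).mean()


n = 10000
for a in (1.0, 0.99, 0.9):
    report = f"a={a:<5} mean |r| = {mean_abs_corr(a, n):.3f}"
    if a < 1:
        # Bartlett's approximation for the sd of r between two independent AR(1) series
        sd = np.sqrt((1 + a ** 2) / ((1 - a ** 2) * n))
        report += f"   predicted ~{np.sqrt(2 / np.pi) * sd:.3f}"
    print(report)
```

For $a = 0.99$ the predicted mean $|r|$ at $n = 10000$ is roughly $0.08$: small, and shrinking like $1/\sqrt{n}$, but with an effective sample size of only about $n/100$ because $a^k$ decays slowly.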

This is explained in more detail here

BTW: you speak of "uniform random numbers", but then you use a normal distribution. That's not essential, though it's better to use zero-mean variables.
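
A small sketch of why zero-mean steps matter, using uniform(0, 1) numbers as in the question's wording (the seed and variable names are illustrative). With mean $1/2$, each accumulated chain contains a deterministic linear trend of about $j/2$, and two independent chains sharing that trend are almost perfectly correlated; subtracting the known mean before accumulating removes the trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
u = rng.uniform(0, 1, (2, n))  # two independent rows of uniform(0, 1) samples, mean 0.5

drifting = np.cumsum(u, axis=1)       # each row grows like 0.5 * j plus a random walk
centred = np.cumsum(u - 0.5, axis=1)  # subtract the known mean first: zero-mean random walk

# np.corrcoef treats each row as one variable, so [0, 1] is the correlation of the two chains
r_drift = np.corrcoef(drifting)[0, 1]
r_centred = np.corrcoef(centred)[0, 1]
print(f"not centred: r = {r_drift:.4f}")
print(f"centred:     r = {r_centred:.4f}")
```

Centring removes only the common trend: the centred chains are still random walks, so their correlation is still affected by the non-stationarity problem described above.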

edited Aug 25 at 1:54, answered Aug 25 at 1:44 by leonbloy

Yeah, everybody is using correlation and nobody talks about the validity of the (exact or asymptotic) i.i.d. assumption that underlies it. Seems like I'm not the first to fall into this trap, but at least now I am aware of it :D
– Aleksejs Fomins
Aug 27 at 8:31
