Why is the correlation of two random markov chains so large?

A very simple Python script shows that the correlation coefficient and the Spearman rank correlation between two datasets of uniform random numbers decrease in inverse proportion to the square root of the number of points in the dataset. However, when one compares Markov chains produced by accumulating those numbers, there is no such behaviour: even for large datasets, the correlation can easily be around 0.7. What is the origin of this behaviour? Are two random Markov chains more "related" to each other than two random datasets? Can this effect be corrected?


import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

x = np.random.normal(0, 1, 10000)
y = np.random.normal(0, 1, 10000)
# Convert random data into Markov chains (running sums of the noise)
for j in range(1, 10000):
    x[j] += x[j-1]
    y[j] += y[j-1]
# Absolute values of the Pearson and Spearman rank correlations
print("Correlation: ", np.abs(np.corrcoef(x, y)[0, 1]))
print("Spearman rank: ", np.abs(scipy.stats.spearmanr(x, y)[0]))
# Plot both chains on the same axes
plt.figure()
plt.plot(x)
plt.plot(y)
plt.show()
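
For comparison, the independent (non-accumulated) case can be checked with a minimal sketch like the one below. It averages the absolute correlation over several repetitions and prints it next to the same value multiplied by the square root of n, which should stay roughly constant if the 1/sqrt(n) scaling holds. The helper name `mean_abs_corr` and the parameters `trials` and `seed` are made up for this illustration, and normal noise is assumed, as in the script above.

```python
import numpy as np

def mean_abs_corr(n, trials=200, seed=0):
    """Average |Pearson r| between two independent N(0, 1) samples of length n."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(0.0, 1.0, n)
        vals.append(abs(np.corrcoef(x, y)[0, 1]))
    return float(np.mean(vals))

# If |r| scales like 1/sqrt(n), the last column should be roughly constant.
for n in (100, 1000, 10000):
    r = mean_abs_corr(n)
    print(n, r, r * np.sqrt(n))
```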
markov-chains correlation
asked Jul 12 at 2:42
Aleksejs Fomins
1 Answer (accepted)
The problem is that your Markov chain (a pure accumulator) is not stationary: the variance increases so much that it is useless to try to estimate a correlation coefficient by averaging (its variance, even after dividing by $n$, does not tend to zero).
You can check this by adding a small "forgetting" factor. Denoting by $X[n]$ the Markov chain and by $x[n]$ the independent process (white noise), you have $X[n] = a X[n-1] + x[n]$. Taking $0 < a < 1$ makes $X[n]$ (asymptotically) stationary.
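
To make the variance statement concrete, here is a short worked computation. It assumes the noise $x[n]$ has zero mean and variance $\sigma^2$, and that the chain starts at $X[0] = x[0]$, as in the code in the question; the symbol $\sigma^2$ is notation introduced here.

$$X[n] = \sum_{k=0}^{n} a^{\,n-k}\, x[k], \qquad \operatorname{Var}(X[n]) = \sigma^2 \sum_{j=0}^{n} a^{2j} = \begin{cases} (n+1)\,\sigma^2, & a = 1,\\[4pt] \sigma^2\, \dfrac{1 - a^{2(n+1)}}{1 - a^2}, & 0 < a < 1. \end{cases}$$

So for the pure accumulator ($a = 1$) the variance grows linearly with $n$, while for $0 < a < 1$ it converges to $\sigma^2 / (1 - a^2)$, which is about $50\,\sigma^2$ for $a = 0.99$.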
For example, with $a=0.99$ you already get rid of the problem, and the (estimator of the) correlation coefficient is practically zero.
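
A minimal sketch of this check, assuming standard normal noise and the same length as in the question: it builds both chains with the recursion $X[n] = a X[n-1] + x[n]$ and compares the average absolute correlation for $a = 1$ (pure accumulator) and $a = 0.99$. The helper names `ar1` and `mean_abs_corr`, and the parameters `trials` and `seed`, are made up for this illustration.

```python
import numpy as np

def ar1(noise, a):
    """Return X with X[0] = noise[0] and X[n] = a * X[n-1] + noise[n]."""
    out = np.empty_like(noise)
    out[0] = noise[0]
    for n in range(1, len(noise)):
        out[n] = a * out[n - 1] + noise[n]
    return out

def mean_abs_corr(a, n=10000, trials=20, seed=0):
    """Average |Pearson r| between two independent chains with factor a."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        X = ar1(rng.normal(0.0, 1.0, n), a)
        Y = ar1(rng.normal(0.0, 1.0, n), a)
        vals.append(abs(np.corrcoef(X, Y)[0, 1]))
    return float(np.mean(vals))

print("a = 1.00:", mean_abs_corr(1.0))   # pure accumulator (random walk)
print("a = 0.99:", mean_abs_corr(0.99))  # with a small forgetting factor
```

With $a = 0.99$ the estimate should come out markedly smaller than for the accumulator, though not exactly zero, since neighbouring samples of the chain are still strongly correlated; it keeps shrinking as the series gets longer.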
This is explained in more detail here
BTW: you speak of "uniform random numbers" but then use a normal distribution. That's not essential, though (it is better to use zero-mean variables in any case).
edited Aug 25 at 1:54
answered Aug 25 at 1:44
leonbloy
Yeah, everybody uses correlation and nobody talks about the validity of the (exact or asymptotic) i.i.d. assumption underlying its sanity. Seems like I'm not the first to fall into this trap, but at least now I am aware of it :D
– Aleksejs Fomins
Aug 27 at 8:31