Why is the correlation of two random markov chains so large?













A very simple Python script shows that the correlation coefficient and the Spearman rank correlation between two datasets of uniform random numbers drop in inverse proportion to the square root of the number of points in the dataset. However, when one compares Markov chains produced by accumulating those numbers, there is no such behaviour: even for large datasets, the correlation can easily be around 0.7. What is the origin of this behaviour? Are two random Markov chains more "related" to each other than two random datasets? Can this effect be corrected?



Figure: Average correlation and Spearman rank correlation between two random datasets drawn from a normal distribution, as a function of dataset size

Figure: Average correlation and Spearman rank correlation between two Markov chains generated from a normal distribution, as a function of dataset size



import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

x = np.random.normal(0, 1, 10000)
y = np.random.normal(0, 1, 10000)

# Convert random data into Markov chains by accumulating the samples
for j in range(1, 10000):
    x[j] += x[j-1]
    y[j] += y[j-1]

# Absolute values of the Pearson and Spearman correlations between the chains
print("Correlation: ", np.abs(np.corrcoef(x, y)[0, 1]))
print("Spearman rank: ", np.abs(scipy.stats.spearmanr(x, y)[0]))

# Plot both chains
plt.figure()
plt.plot(x)
plt.plot(y)
plt.show()
















      asked Jul 12 at 2:42









      Aleksejs Fomins





















          1 Answer



          accepted










The problem is that your Markov chain (a pure accumulator) is not stationary: its variance grows so much that it is useless to try to estimate a correlation coefficient by averaging (its variance, even after dividing by $n$, does not tend to zero).
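As a rough illustration of this point, the sketch below averages the absolute Pearson correlation over many independent pairs of series, once for plain white noise and once for its running sum. The helper name `mean_abs_corr`, the number of trials and the seed are arbitrary choices made for this example, not part of the question's code.

```python
import numpy as np

def mean_abs_corr(n, trials=200, accumulate=False, seed=0):
    """Average |Pearson correlation| between two independent series of length n."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(0.0, 1.0, n)
        if accumulate:
            # Pure accumulator: X[k] = X[k-1] + x[k]
            x = np.cumsum(x)
            y = np.cumsum(y)
        total += abs(np.corrcoef(x, y)[0, 1])
    return total / trials

for n in (100, 1000, 10000):
    iid = mean_abs_corr(n)
    walk = mean_abs_corr(n, accumulate=True)
    print(f"n={n:6d}  white noise: {iid:.3f}  accumulated: {walk:.3f}")
```

For the white noise the average should shrink as $n$ grows, while for the accumulated series it should stay at roughly the same level for every $n$.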



You can check this by adding a small "forgetting" factor. Denoting by $X[n]$ the Markov chain and by $x[n]$ the independent process (white noise), you have $X[n] = a X[n-1] + x[n]$. By taking $0 < a < 1$, $X[n]$ becomes (asymptotically) stationary.
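To see why the forgetting factor helps, here is a short variance computation. It assumes (these are assumptions of this sketch, not stated above) that the $x[n]$ are uncorrelated with zero mean and common variance $\sigma^2$, and that the chain starts from $X[0] = 0$, so that $X[n] = \sum_{k=1}^{n} a^{n-k} x[k]$:

$$\operatorname{Var}(X[n]) = \sigma^2 \sum_{j=0}^{n-1} a^{2j} = \sigma^2 \, \frac{1 - a^{2n}}{1 - a^2} \;\xrightarrow[n \to \infty]{}\; \frac{\sigma^2}{1 - a^2} \qquad (0 < a < 1)$$

For the pure accumulator, $a = 1$, the same sum gives $\operatorname{Var}(X[n]) = n \sigma^2$, which grows without bound. With $a = 0.99$ the limit is $\sigma^2 / 0.0199 \approx 50.25\,\sigma^2$.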



You can check that with, say, $a = 0.99$ you already get rid of the problem, and the (estimator of the) correlation coefficient is practically zero.
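A minimal sketch of that check, assuming standard normal white noise. The recursion $X[n] = a X[n-1] + x[n]$ is computed here with `scipy.signal.lfilter`; the helper names `ar1` and `mean_abs_corr_ar1`, the trial count and the seed are choices made for this example.

```python
import numpy as np
from scipy.signal import lfilter

def ar1(noise, a):
    """Return X with X[n] = a * X[n-1] + noise[n] (zero initial state)."""
    return lfilter([1.0], [1.0, -a], noise)

def mean_abs_corr_ar1(n, a, trials=50, seed=0):
    """Average |Pearson correlation| between two independent chains of length n."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        x = ar1(rng.normal(0.0, 1.0, n), a)
        y = ar1(rng.normal(0.0, 1.0, n), a)
        vals.append(abs(np.corrcoef(x, y)[0, 1]))
    return float(np.mean(vals))

for a in (1.0, 0.99):
    for n in (1000, 10000, 100000):
        print(f"a={a:4.2f}  n={n:6d}  mean |corr| = {mean_abs_corr_ar1(n, a):.3f}")
```

With $a = 1$ the average should stay large whatever the length, whereas with $a = 0.99$ it should decrease as $n$ grows. Because neighbouring samples are still strongly correlated at $a = 0.99$, the chain needs to be fairly long before the estimate gets close to zero.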



          This is explained in more detail here



BTW: you speak of "uniform random numbers" but then you use a normal distribution. That's not essential, though it's better to use zero-mean variables.
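To illustrate why zero mean matters, the sketch below accumulates uniform samples with and without a nonzero mean. With mean 0.5, both running sums grow roughly linearly, so the shared trend alone pushes the correlation towards 1. The helper name `accumulated_corr`, the interval bounds and the seed are choices made for this example.

```python
import numpy as np

def accumulated_corr(low, high, n=10000, seed=0):
    """Pearson correlation between two independent accumulated uniform series."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.uniform(low, high, n))
    y = np.cumsum(rng.uniform(low, high, n))
    return np.corrcoef(x, y)[0, 1]

# Mean 0.5: both chains drift upwards at the same average rate
print("uniform on [0, 1):      ", accumulated_corr(0.0, 1.0))
# Mean 0: no drift, only the random-walk behaviour discussed above remains
print("uniform on [-0.5, 0.5): ", accumulated_corr(-0.5, 0.5))
```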




























• Yeah, everybody is using correlation and nobody talks about the validity of the (exact or asymptotic) i.i.d. assumption underlying its sanity. Seems like I'm not the first to fall into this trap, but at least now I am aware of it :D
  – Aleksejs Fomins
  Aug 27 at 8:31


















          edited Aug 25 at 1:54

























          answered Aug 25 at 1:44









          leonbloy











