How does Markov Chain coupling work?

I'm trying to understand the coupling argument used in the proof of Theorem 5.2 of Finite Markov Chains and Algorithmic Applications by Häggström. My problem is actually more general than this particular situation, hence the generic question title.



Before asking my questions, I will build up some context.



Theorem 5.2



For any aperiodic and irreducible Markov chain $(X_0, X_1, \ldots)$ with state space $S = \{s_1, s_2, \ldots, s_k\}$, any stationary distribution $\pi$, and any arbitrary initial distribution $\mu^{(0)}$, we have



$$\lim_{n \rightarrow \infty} d_{TV}(\mu^{(n)}, \pi) = 0.$$
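(Here $d_{TV}$ denotes total variation distance; on a finite state space this is $d_{TV}(\mu, \nu) = \frac{1}{2}\sum_{i=1}^{k} |\mu_i - \nu_i|$, which, if I remember correctly, is how Häggström defines it.)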



To prove this, we first define two Markov chains: $X = (X_0, X_1, \ldots)$ with initial distribution $\mu^{(0)}$ and $X' = (X'_0, X'_1, \ldots)$ with initial distribution $\pi$. Since $\pi$ is stationary, $X'_n$ has distribution $\pi$ for every $n$.



We also define $T = \min\{n : X_n = X'_n\}$. The proof also shows that $T$ is finite with probability one, but that part is fine for me, so I'll skip it.



Then the problem begins. We define a third Markov chain $X''$ such that



$X''_n = X_n$ for $n < T$ and $X''_n = X'_n$ for $n \ge T$.
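(For concreteness, here is a minimal Python sketch of this construction on a toy chain; the transition matrix `P`, the initial distribution, and the stationary distribution below are made-up illustrative values, not from the book.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy aperiodic, irreducible chain (made-up numbers for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])   # arbitrary initial distribution mu^(0)
pi = np.array([2/3, 1/3])    # stationary distribution: pi P = pi

def step(state):
    """One step of the chain from `state`, using transition matrix P."""
    return rng.choice(len(P), p=P[state])

N = 50
X = [rng.choice(2, p=mu0)]    # chain started from mu^(0)
Xp = [rng.choice(2, p=pi)]    # independent chain started from pi
for _ in range(N):
    X.append(step(X[-1]))
    Xp.append(step(Xp[-1]))

# T = first time the two chains meet (T > N if they have not met in this window)
T = next((n for n in range(N + 1) if X[n] == Xp[n]), N + 1)

# X'' follows X strictly before T and X' from time T onwards
Xpp = [X[n] if n < T else Xp[n] for n in range(N + 1)]
```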




Question 1:
Why is $X''$ a legitimate Markov chain? On the one hand, I understand that it satisfies the memorylessness property, but for a Markov chain the distribution of the next step, $\mu^{(n+1)}$, is just $\mu^{(n)}P$, where $P$ is the transition matrix. Doesn't this condition break at time $n = T$?



Question 2:
It is said that since $X''_0$ has distribution $\mu^{(0)}$, $X''_n$ has distribution $\mu^{(n)}$ for every $n$. I don't see it. I understand why it works for $n \le T$, but I don't see why it should hold after that time. At $n = T$ we arrive with $\mu^{(T)}$, and the next step is now taken with distribution $\pi$; that just doesn't look like $\mu^{(T+1)}$.




In the proof we then proceed to show that $d_{TV}(\mu^{(n)}, \pi) \le P(T > n)$, which is fine if you trust the claims I asked about in the questions above. This, together with one more step, actually completes the proof.



But I still have one more question:



Why does this prove anything? We artificially crafted a Markov chain with a certain property; how does that prove something about the distributions in general?










probability markov-chains coupling






asked Sep 1 at 9:37 by A.Budziak




















1 Answer






Here is a more "algorithmic" way of describing the coupling.

Define two Markov chains $Y$ and $Y'$ by initializing $Y_0$ from distribution $\mu^{(0)}$ and $Y_0'$ from distribution $\pi$. Then, choose the next states by the following algorithm:



1. If $Y_n \ne Y_n'$, then choose $Y_{n+1}$ and $Y_{n+1}'$ independently, following the transition rules of the Markov chain.

2. If $Y_n = Y_n'$, then choose a single value following the transition rules of the Markov chain, and set both $Y_{n+1}$ and $Y_{n+1}'$ equal to that value.

Then it's clear that if we just look at $Y_n$ and ignore $Y_n'$ entirely, we get a Markov chain, because at each step we follow the transition rules. Similarly, we get a Markov chain if we look at $Y_n'$ and ignore $Y_n$. But we do not get independent Markov chains.
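(Here is a minimal Python sketch of this coupled update rule, assuming a small made-up transition matrix `P`; the numbers are illustrative only, not from the book or from the argument above.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Example aperiodic, irreducible chain (made-up numbers).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])   # initial distribution for Y
pi = np.array([2/3, 1/3])    # stationary distribution, used to start Y'

def coupled_step(y, yp):
    """One coupled step: while the chains differ, move them independently
    according to P; once they agree, give them the same move."""
    if y != yp:
        return rng.choice(len(P), p=P[y]), rng.choice(len(P), p=P[yp])
    z = rng.choice(len(P), p=P[y])
    return z, z

Y = [rng.choice(2, p=mu0)]   # Y_0 ~ mu^(0)
Yp = [rng.choice(2, p=pi)]   # Y'_0 ~ pi
for _ in range(50):
    y, yp = coupled_step(Y[-1], Yp[-1])
    Y.append(y)
    Yp.append(yp)

# T is the first index at which the chains agree; they stay equal from then on.
T = next((n for n, (a, b) in enumerate(zip(Y, Yp)) if a == b), None)
```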



The $X_n, X_n', X_n''$ construction that "crosses over" at time $T$ is just a non-algorithmic way of building this coupling. In the end, $(X'', X')$ has the same joint distribution as the $(Y, Y')$ defined above. In terms of $Y$ and $Y'$, we can set $T$ to be the first $n$ for which $Y_n = Y_n'$.



If we work with $Y$ and $Y'$, then it's easy to answer your questions:



1. $Y$ and $Y'$ are both legitimate Markov chains because, individually, each of them follows the transition rules.

2. We can prove that $Y_n$ has distribution $\mu^{(n)} = \mu^{(0)}P^n$ for all $n$ by induction (if we ignore the existence of $Y'$, there is literally nothing unusual going on). Similarly, $Y_n'$ has distribution $\pi$ for all $n$ by induction.

Why does this prove anything? It proves something about the distributions $\mu^{(n)}$ and $\pi$ because we constructed $Y$ and $Y'$ so that $Y_n$ has distribution $\mu^{(n)}$ while $Y_n'$ has distribution $\pi$.



For every state $i$, the probability $\Pr[Y_n \ne Y_n']$ is at least $|\mu^{(n)}_i - \pi_i|$. We know that $\Pr[Y_n \ne Y_n'] = \Pr[T > n] \to 0$ as $n \to \infty$, so $|\mu^{(n)}_i - \pi_i| \to 0$ as $n \to \infty$.
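(To spell out where that bound comes from, a standard coupling step not stated verbatim above: if $Y_n = s_i$ but $Y_n' \ne s_i$, or vice versa, then certainly $Y_n \ne Y_n'$, so

$$|\mu^{(n)}_i - \pi_i| = \bigl|\Pr[Y_n = s_i] - \Pr[Y_n' = s_i]\bigr| \le \Pr[Y_n \ne Y_n'] = \Pr[T > n].)$$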






answered Sep 1 at 17:16, edited Sep 1 at 21:10, by Misha Lavrov



























                     
