How does Markov Chain coupling work?
I'm trying to understand the coupling argument used in the proof of Theorem 5.2 in Finite Markov Chains and Algorithmic Applications by Häggström. My problem is actually more general than this particular situation, hence the generic question title.
Before asking the question, I will build up some context.
Theorem 5.2
For any aperiodic and irreducible Markov chain $(X_0, X_1, \ldots)$ with state space $S = \{s_1, s_2, \ldots, s_k\}$, any stationary distribution $\pi$, and any arbitrary initial distribution $\mu^{(0)}$, we have
$$\lim_{n \to \infty} d_{TV}(\mu^{(n)}, \pi) = 0.$$
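Here $d_{TV}$ is the total variation distance; on a finite state space, $d_{TV}(\mu, \nu) = \frac{1}{2}\sum_i |\mu(s_i) - \nu(s_i)|$. For concreteness, a minimal sketch of computing it in Python (NumPy assumed; the function name is mine, not the book's):

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance between two distributions on a finite state space."""
    mu, nu = np.asarray(mu, dtype=float), np.asarray(nu, dtype=float)
    return 0.5 * np.abs(mu - nu).sum()

# Example: a point mass vs. the uniform distribution on 3 states.
print(tv_distance([1, 0, 0], [1/3, 1/3, 1/3]))  # 0.666...
```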
To prove this, we first define two Markov chains: $X = (X_0, X_1, \ldots)$ with initial distribution $\mu^{(0)}$, and $X' = (X'_0, X'_1, \ldots)$ with initial distribution $\pi$. Since $\pi$ is stationary, $X'_n$ has distribution $\pi$ for every $n$.
We also define $T = \min\{n : X_n = X'_n\}$ and prove that $T$ is finite with probability one, but this part is fine for me, so I'll skip it.
Then the problem begins. We define a third Markov chain $X''$ by
$X''_n = X_n$ for $n < T$ and $X''_n = X'_n$ for $n \ge T$.
Question 1:
Why is $X''$ a legitimate Markov chain? On the one hand, I understand that it fulfills the memorylessness property, but in Markov chains the distribution of the next step is $\mu^{(n+1)} = \mu^{(n)}P$, where $P$ is the transition matrix. Doesn't this condition break at time $n = T$?
Question 2:
It's said that since $X''_0$ has distribution $\mu^{(0)}$, $X''_n$ has distribution $\mu^{(n)}$ for every $n$. I don't see it. I understand why it works for $n \le T$, but I don't get why it should hold after that time. I mean, we reach $n = T$ with distribution $\mu^{(T)}$, and the next step now comes from the chain distributed according to $\pi$. It just doesn't seem like $\mu^{(T+1)}$.
In the proof we then proceed to show that $d_{TV}(\mu^{(n)}, \pi) \le P(T > n)$, which is fine if you trust the claims I asked about in the questions above. This, plus one more step, actually completes the proof.
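For reference, granting those claims, this bound follows in one line: $X''$ and $X'$ agree on the event $\{T \le n\}$, so for each state $s_i$,
$$|\mu^{(n)}(s_i) - \pi(s_i)| = |P(X''_n = s_i) - P(X'_n = s_i)| \le P(X''_n \ne X'_n) \le P(T > n).$$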
But I still have one more question:
Why does this prove anything? We artificially crafted a Markov chain that has a certain property, and that proves something about the distributions in general? Why is that so?
probability markov-chains coupling
asked Sep 1 at 9:37
A.Budziak
1 Answer
Here is a more "algorithmic" way of describing the coupling.
Define two Markov chains $Y$ and $Y'$ by initializing $Y_0$ from distribution $\mu^{(0)}$ and $Y_0'$ from distribution $\pi$. Then choose the next states by the following algorithm (sketched in code after the list):
- If $Y_n \ne Y_n'$, then choose $Y_{n+1}$ and $Y_{n+1}'$ independently, following the transition rules of the Markov chain.
- If $Y_n = Y_n'$, then choose a single value following the transition rules of the Markov chain, and set both $Y_{n+1}$ and $Y_{n+1}'$ equal to that value.
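A minimal simulation sketch of this update rule (Python with NumPy assumed; `P` is any row-stochastic transition matrix, and all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def step(P, state):
    """One transition from `state`, sampled from row `state` of P."""
    return rng.choice(len(P), p=P[state])

def coupled_step(P, y, y_prime):
    """One step of the coupling: independent moves before meeting, one shared move after."""
    if y != y_prime:
        return step(P, y), step(P, y_prime)  # not yet met: move independently
    move = step(P, y)                        # already met: take a single shared move
    return move, move
```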
Then it's clear that if we just look at $Y_n$ and ignore $Y_n'$ entirely, we get a Markov chain, because at each step we follow the transition rules. Similarly, we get a Markov chain if we look at $Y_n'$ and ignore $Y_n$. But we do not get independent Markov chains.
The $X_n, X_n', X_n''$ thing that "crosses over" at time $T$ is just a non-algorithmic way of building this coupling. In the end, $(X'',X')$ have the same joint distribution as the $(Y,Y')$ defined above. In terms of $Y$ and $Y'$, we can set $T$ to be the first $n$ for which $Y_n = Y_n'$.
If we work with $Y$ and $Y'$ then it's easy to answer your questions:
- $Y$ and $Y'$ are both legitimate Markov chains because, individually, each of them follows the transition rules.
- We can prove that $Y_n$ has distribution $\mu^{(n)} = \mu^{(0)}P^n$ for all $n$ by induction (if we ignore the existence of $Y'$, there is literally nothing unusual going on). Similarly, $Y_n'$ has distribution $\pi$ for all $n$ by induction.
Why does this prove anything? It proves something about the distributions $\mu^{(n)}$ and $\pi$ because we constructed $Y$ and $Y'$ so that $Y_n$ has distribution $\mu^{(n)}$ while $Y_n'$ has distribution $\pi$.
For every state $i$, the probability $\Pr[Y_n \ne Y_n']$ is at least $|\mu^{(n)}_i - \pi_i|$. We know that $\Pr[Y_n \ne Y_n'] = \Pr[T > n] \to 0$ as $n \to \infty$, so $|\mu^{(n)}_i - \pi_i| \to 0$ as $n \to \infty$.
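Continuing the sketch above (same imports and `coupled_step`), here is a toy numerical check of this bound on a hypothetical 2-state chain whose stationary distribution is $(0.6, 0.4)$:

```python
# Hypothetical 2-state chain; pi = (0.6, 0.4) solves pi P = pi.
P = np.array([[0.90, 0.10],
              [0.15, 0.85]])
pi = np.array([0.6, 0.4])
mu0 = np.array([1.0, 0.0])                  # start deterministically in state 0

n, trials = 10, 20000
mu_n = mu0 @ np.linalg.matrix_power(P, n)   # exact mu^(n) = mu^(0) P^n

# Estimate Pr[T > n]: once the coupled chains meet they stay together,
# so they differ at time n exactly when T > n.
not_met = 0
for _ in range(trials):
    y, y2 = rng.choice(2, p=mu0), rng.choice(2, p=pi)
    for _ in range(n):
        y, y2 = coupled_step(P, y, y2)
    not_met += (y != y2)

print(np.abs(mu_n - pi).max(), "<=", not_met / trials)  # bound holds up to sampling noise
```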
edited Sep 1 at 21:10
answered Sep 1 at 17:16
Misha Lavrov