How does Markov Chain coupling work?

I'm trying to understand the coupling argument used in the proof of Theorem 5.2 of Finite Markov Chains and Algorithmic Applications by Häggström. My problem is actually more general than this particular situation, hence the generic question title.



Before asking my questions, I will build up some context.



Theorem 5.2



For any aperiodic and irreducible Markov chain $(X_0, X_1, \ldots)$ with state space $S = \{s_1, s_2, \ldots, s_k\}$, any stationary distribution $\pi$, and any arbitrary initial distribution $\mu^{(0)}$, we have



$$\lim_{n \rightarrow \infty} d_{TV}(\mu^{(n)}, \pi) = 0.$$
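(Here $d_{TV}$ denotes total variation distance; on a finite state space this is $d_{TV}(\mu, \nu) = \frac{1}{2}\sum_{i=1}^{k} |\mu_i - \nu_i|$, which, if I remember correctly, is how Häggström defines it.)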



To prove this, we first define two Markov chains: $X = (X_0, X_1, \ldots)$ with initial distribution $\mu^{(0)}$ and $X' = (X'_0, X'_1, \ldots)$ with initial distribution $\pi$. Since $\pi$ is stationary, $X'_n$ has distribution $\pi$ for every $n$.



We also define $T = \min\{n : X_n = X'_n\}$. The proof also shows that $T$ is finite with probability one, but that part is fine for me, so I'll skip it.



Then the problem begins. We define a third Markov chain $X''$ such that



$X''_n = X_n$ for $n < T$ and $X''_n = X'_n$ for $n \ge T$.
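(For concreteness, here is a minimal Python sketch of this construction on a toy chain; the transition matrix `P`, the initial distribution, and the stationary distribution below are made-up illustrative values, not from the book.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy aperiodic, irreducible chain (made-up numbers for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])   # arbitrary initial distribution mu^(0)
pi = np.array([2/3, 1/3])    # stationary distribution: pi P = pi

def step(state):
    """One step of the chain from `state`, using transition matrix P."""
    return rng.choice(len(P), p=P[state])

N = 50
X = [rng.choice(2, p=mu0)]    # chain started from mu^(0)
Xp = [rng.choice(2, p=pi)]    # independent chain started from pi
for _ in range(N):
    X.append(step(X[-1]))
    Xp.append(step(Xp[-1]))

# T = first time the two chains meet (T > N if they have not met in this window)
T = next((n for n in range(N + 1) if X[n] == Xp[n]), N + 1)

# X'' follows X strictly before T and X' from time T onwards
Xpp = [X[n] if n < T else Xp[n] for n in range(N + 1)]
```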




Question 1:
Why is $X''$ a legitimate Markov chain? On the one hand, I understand that it satisfies the memorylessness property, but for a Markov chain the distribution of the next step, $\mu^{(n+1)}$, is just $\mu^{(n)}P$, where $P$ is the transition matrix. Doesn't this condition break at time $n = T$?



Question 2:
It is said that since $X''_0$ has distribution $\mu^{(0)}$, $X''_n$ has distribution $\mu^{(n)}$ for every $n$. I don't see it. I understand why it works for $n \le T$, but I don't see why it should hold after that time. At $n = T$ we arrive with $\mu^{(T)}$, and the next step is now taken with distribution $\pi$; that just doesn't look like $\mu^{(T+1)}$.




In the proof we then proceed to show that $d_{TV}(\mu^{(n)}, \pi) \le P(T > n)$, which is fine if you trust the claims I asked about in the questions above. This, together with one more step, actually completes the proof.



But I still have one more question:



Why does this prove anything? We artificially crafted a Markov chain with a certain property; how does that prove something about the distributions in general?










probability markov-chains coupling






asked Sep 1 at 9:37 by A.Budziak




















1 Answer






Here is a more "algorithmic" way of describing the coupling.

Define two Markov chains $Y$ and $Y'$ by initializing $Y_0$ from distribution $\mu^{(0)}$ and $Y_0'$ from distribution $\pi$. Then, choose the next states by the following algorithm:



1. If $Y_n \ne Y_n'$, then choose $Y_{n+1}$ and $Y_{n+1}'$ independently, following the transition rules of the Markov chain.

2. If $Y_n = Y_n'$, then choose a single value following the transition rules of the Markov chain, and set both $Y_{n+1}$ and $Y_{n+1}'$ equal to that value.

Then it's clear that if we just look at $Y_n$ and ignore $Y_n'$ entirely, we get a Markov chain, because at each step we follow the transition rules. Similarly, we get a Markov chain if we look at $Y_n'$ and ignore $Y_n$. But we do not get independent Markov chains.
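(Here is a minimal Python sketch of this coupled update rule, assuming a small made-up transition matrix `P`; the numbers are illustrative only, not from the book or from the argument above.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Example aperiodic, irreducible chain (made-up numbers).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])   # initial distribution for Y
pi = np.array([2/3, 1/3])    # stationary distribution, used to start Y'

def coupled_step(y, yp):
    """One coupled step: while the chains differ, move them independently
    according to P; once they agree, give them the same move."""
    if y != yp:
        return rng.choice(len(P), p=P[y]), rng.choice(len(P), p=P[yp])
    z = rng.choice(len(P), p=P[y])
    return z, z

Y = [rng.choice(2, p=mu0)]   # Y_0 ~ mu^(0)
Yp = [rng.choice(2, p=pi)]   # Y'_0 ~ pi
for _ in range(50):
    y, yp = coupled_step(Y[-1], Yp[-1])
    Y.append(y)
    Yp.append(yp)

# T is the first index at which the chains agree; they stay equal from then on.
T = next((n for n, (a, b) in enumerate(zip(Y, Yp)) if a == b), None)
```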



The $X_n, X_n', X_n''$ construction that "crosses over" at time $T$ is just a non-algorithmic way of building this coupling. In the end, $(X'', X')$ has the same joint distribution as the $(Y, Y')$ defined above. In terms of $Y$ and $Y'$, we can set $T$ to be the first $n$ for which $Y_n = Y_n'$.



If we work with $Y$ and $Y'$, then it's easy to answer your questions:



1. $Y$ and $Y'$ are both legitimate Markov chains because, individually, each of them follows the transition rules.

2. We can prove that $Y_n$ has distribution $\mu^{(n)} = \mu^{(0)}P^n$ for all $n$ by induction (if we ignore the existence of $Y'$, there is literally nothing unusual going on). Similarly, $Y_n'$ has distribution $\pi$ for all $n$ by induction.

Why does this prove anything? It proves something about the distributions $\mu^{(n)}$ and $\pi$ because we constructed $Y$ and $Y'$ so that $Y_n$ has distribution $\mu^{(n)}$ while $Y_n'$ has distribution $\pi$.



For every state $i$, the probability $\Pr[Y_n \ne Y_n']$ is at least $|\mu^{(n)}_i - \pi_i|$. We know that $\Pr[Y_n \ne Y_n'] = \Pr[T > n] \to 0$ as $n \to \infty$, so $|\mu^{(n)}_i - \pi_i| \to 0$ as $n \to \infty$.
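(To spell out where that bound comes from, a standard coupling step not stated verbatim above: if $Y_n = s_i$ but $Y_n' \ne s_i$, or vice versa, then certainly $Y_n \ne Y_n'$, so

$$|\mu^{(n)}_i - \pi_i| = \bigl|\Pr[Y_n = s_i] - \Pr[Y_n' = s_i]\bigr| \le \Pr[Y_n \ne Y_n'] = \Pr[T > n].)$$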






answered Sep 1 at 17:16, edited Sep 1 at 21:10, by Misha Lavrov



























                     
