Improving on the data processing inequality for Markov chains

Given a Markov chain $X \rightarrow Y \rightarrow Z$, the data processing inequality bounds the mutual information:

$$I(X;Z) \leq \min \big( I(X;Y),\, I(Y;Z) \big)$$

However, it seems intuitive that we should be able to do better through some combination of these quantities. For example, if $X$ is a uniform binary random variable, $Y$ flips $X$ with probability $1/4$, and $Z$ flips $Y$ with probability $1/4$, then $I(X;Z)$ should be strictly lower than either $I(X;Y)$ or $I(Y;Z)$, right? More broadly, if I have a Markov chain of successive corrupting channels

$$X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_k$$

I would imagine that $I(X_1;X_k)$ is much lower than any one $I(X_j;X_{j+1})$. However, I'm not aware of any such result, nor is it clear to me why one shouldn't exist. Can someone either provide such a result or explain why none exists?
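
For concreteness, here is a quick numerical check of the two-hop example above (a minimal sketch; the helper names h2 and bsc_mi are mine, and cascading two BSC($p$) channels gives an effective crossover of $2p(1-p)$):

    from math import log2

    def h2(p):
        """Binary entropy in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def bsc_mi(p):
        """I(input; output) of a BSC with crossover p and uniform input."""
        return 1 - h2(p)

    p = 1 / 4
    q = 2 * p * (1 - p)  # effective crossover of two cascaded BSC(p) channels

    print(f"I(X;Y) = I(Y;Z) = {bsc_mi(p):.4f} bits")  # ~0.1887
    print(f"I(X;Z)          = {bsc_mi(q):.4f} bits")  # ~0.0456, strictly smaller

So in this example the data processing inequality is far from tight: $I(X;Z) \approx 0.046$ bits, while $\min(I(X;Y), I(Y;Z)) \approx 0.189$ bits.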

asked Aug 8 at 21:12 by Barbot, edited Aug 10 at 3:34 by Rodrigo de Azevedo

  • What do you mean by doing "better"? What does it mean for $Y$ to "flip" $X$? Are the random variables you've specified a Markov chain?
    – Theoretical Economist
    Aug 8 at 21:19

  • By doing "better" I mean that, in the example I sketched, it seems like $I(X;Z) < \min(I(X;Y), I(Y;Z))$, so the inequality $I(X;Z) \leq \min(I(X;Y), I(Y;Z))$ is not tight. By "$Y$ flip[ping] $X$" I mean that $Y$ is a function $f(X)$ that with probability $3/4$ outputs $X$ and with probability $1/4$ outputs its negation (so 0 becomes 1 and vice versa). It is a Markov chain, as $X$ and $Z$ are conditionally independent given $Y$, no?
    – Barbot
    Aug 8 at 21:41

  • Equality in the data processing inequality holds only when a certain condition is met (see here). When it is not, the inequality is strict.
    – Stelios
    Aug 8 at 22:07

  • Right, so I'm wondering if it's possible to show a stronger inequality in that case. More broadly, for a Markov chain $X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_k$, the best upper bound I know of for $I(X_1;X_k)$ is $I(X_1;X_k) \leq \min(I(X_1;X_2), I(X_2;X_3), \ldots, I(X_{k-1};X_k))$. But it seems like something more akin to $I(X_1;X_k) \leq \prod_{i=1}^{k-1} I(X_i;X_{i+1})$ should hold -- not literally this relationship, but something that takes advantage of this kind of "composition".
    – Barbot
    Aug 8 at 22:22

  • This is an active area of research, and is, AFAIK, far from settled in general. Google for the 'strong data processing inequality'. For the particular question you asked with the long Markov chain, Polyanskiy and Wu have a recent paper that might be of interest.
    – stochasticboy321
    Aug 10 at 19:35
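
For reference, the "strong data processing inequality" mentioned in the last comment has, as usually stated, exactly the multiplicative shape asked about above: each channel $P_{X_{i+1} \mid X_i}$ has a contraction coefficient $\eta_i \leq 1$ such that $I(X_1; X_{i+1}) \leq \eta_i \, I(X_1; X_i)$, and chaining these along the Markov chain gives

$$I(X_1;X_k) \leq \Big( \prod_{i=2}^{k-1} \eta_i \Big) I(X_1;X_2).$$

For a binary symmetric channel with crossover $p$, the KL contraction coefficient is $(1-2p)^2$, so in the example above $\eta = 1/4$ and $I(X;Z) \leq \frac{1}{4} I(X;Y) \approx 0.047$ bits, consistent with the exact value $\approx 0.046$ bits. A short self-contained check for a longer chain of BSC($1/4$) steps (a sketch; the variable names are mine):

    from math import log2

    def h2(p):  # binary entropy in bits
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    p, k = 1 / 4, 6
    eta = (1 - 2 * p) ** 2                # KL contraction coefficient of BSC(p)
    q = (1 - (1 - 2 * p) ** (k - 1)) / 2  # effective crossover after k-1 BSC(p) steps
    print(1 - h2(q))                      # exact I(X_1;X_k), ~7.0e-4
    print(eta ** (k - 2) * (1 - h2(p)))   # SDPI product bound, ~7.4e-4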