Improving on the data processing inequality for Markov chains

Given a Markov chain $X \rightarrow Y \rightarrow Z$, the data processing inequality bounds the mutual information:

$$I(X;Z) \leq \min \big( I(X;Y),\, I(Y;Z) \big)$$

However, it seems intuitive that we should be able to do better through some combination of these quantities. For example, if $X$ is a uniform binary random variable, $Y$ flips $X$ with probability $1/4$, and $Z$ flips $Y$ with probability $1/4$, then $I(X;Z)$ should be strictly lower than either $I(X;Y)$ or $I(Y;Z)$, right? More broadly, if I have a Markov chain of successive corrupting channels

$$X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_k$$

I would imagine that $I(X_1;X_k)$ is much lower than any one $I(X_j;X_{j+1})$. However, I'm not aware of any such result, nor is it clear to me why one shouldn't exist. Can someone either provide such a result or explain why none exists?
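
For concreteness, here is a quick numerical check of the two-hop example above (a minimal sketch; the helper names h2 and bsc_mi are mine, and cascading two BSC($p$) channels gives an effective crossover of $2p(1-p)$):

    from math import log2

    def h2(p):
        """Binary entropy in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def bsc_mi(p):
        """I(input; output) of a BSC with crossover p and uniform input."""
        return 1 - h2(p)

    p = 1 / 4
    q = 2 * p * (1 - p)  # effective crossover of two cascaded BSC(p) channels

    print(f"I(X;Y) = I(Y;Z) = {bsc_mi(p):.4f} bits")  # ~0.1887
    print(f"I(X;Z)          = {bsc_mi(q):.4f} bits")  # ~0.0456, strictly smaller

So in this example the data processing inequality is far from tight: $I(X;Z) \approx 0.046$ bits, while $\min(I(X;Y), I(Y;Z)) \approx 0.189$ bits.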

asked Aug 8 at 21:12 by Barbot, edited Aug 10 at 3:34 by Rodrigo de Azevedo

  • What do you mean by doing "better"? What does it mean for $Y$ to "flip" $X$? Are the random variables you've specified a Markov chain?
    – Theoretical Economist
    Aug 8 at 21:19

  • By doing "better" I mean that, in the example I sketched, it seems like $I(X;Z) < \min(I(X;Y), I(Y;Z))$, so the inequality $I(X;Z) \leq \min(I(X;Y), I(Y;Z))$ is not tight. By "$Y$ flip[ping] $X$" I mean that $Y$ is a function $f(X)$ that with probability $3/4$ outputs $X$ and with probability $1/4$ outputs its negation (so 0 becomes 1 and vice versa). It is a Markov chain, as $X$ and $Z$ are conditionally independent given $Y$, no?
    – Barbot
    Aug 8 at 21:41

  • Equality in the data processing inequality holds only when a certain condition is met (see here). When it is not, the inequality is strict.
    – Stelios
    Aug 8 at 22:07

  • Right, so I'm wondering if it's possible to show a stronger inequality in that case. More broadly, for a Markov chain $X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_k$, the best upper bound I know of for $I(X_1;X_k)$ is $I(X_1;X_k) \leq \min(I(X_1;X_2), I(X_2;X_3), \ldots, I(X_{k-1};X_k))$. But it seems like something more akin to $I(X_1;X_k) \leq \prod_{i=1}^{k-1} I(X_i;X_{i+1})$ should hold -- not literally this relationship, but something that takes advantage of this kind of "composition".
    – Barbot
    Aug 8 at 22:22

  • This is an active area of research, and is, AFAIK, far from settled in general. Google for the 'strong data processing inequality'. For the particular question you asked with the long Markov chain, Polyanskiy and Wu have a recent paper that might be of interest.
    – stochasticboy321
    Aug 10 at 19:35
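
For reference, the "strong data processing inequality" mentioned in the last comment has, as usually stated, exactly the multiplicative shape asked about above: each channel $P_{X_{i+1} \mid X_i}$ has a contraction coefficient $\eta_i \leq 1$ such that $I(X_1; X_{i+1}) \leq \eta_i \, I(X_1; X_i)$, and chaining these along the Markov chain gives

$$I(X_1;X_k) \leq \Big( \prod_{i=2}^{k-1} \eta_i \Big) I(X_1;X_2).$$

For a binary symmetric channel with crossover $p$, the KL contraction coefficient is $(1-2p)^2$, so in the example above $\eta = 1/4$ and $I(X;Z) \leq \frac{1}{4} I(X;Y) \approx 0.047$ bits, consistent with the exact value $\approx 0.046$ bits. A short self-contained check for a longer chain of BSC($1/4$) steps (a sketch; the variable names are mine):

    from math import log2

    def h2(p):  # binary entropy in bits
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    p, k = 1 / 4, 6
    eta = (1 - 2 * p) ** 2                # KL contraction coefficient of BSC(p)
    q = (1 - (1 - 2 * p) ** (k - 1)) / 2  # effective crossover after k-1 BSC(p) steps
    print(1 - h2(q))                      # exact I(X_1;X_k), ~7.0e-4
    print(eta ** (k - 2) * (1 - h2(p)))   # SDPI product bound, ~7.4e-4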