Improving on the data processing inequality for Markov chains
Given a Markov chain $X \rightarrow Y \rightarrow Z$, the data processing inequality bounds the mutual information
$$I(X;Z) \leq \min \big( I(X;Y), I(Y;Z) \big).$$
However, it seems intuitive that we should be able to do better through some combination of these quantities. For example, if $X$ is a uniform binary random variable, $Y$ flips $X$ with probability $1/4$, and $Z$ flips $Y$ with probability $1/4$, then $I(X;Z)$ should be strictly lower than either $I(X;Y)$ or $I(Y;Z)$, right? More broadly, if I have a Markov chain of successive corrupting channels
$$X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_k,$$
I would imagine that $I(X_1;X_k)$ is much lower than any one $I(X_j;X_{j+1})$. However, I'm not aware of any such result, nor am I clear on why such a result does not exist. Can someone either provide a similar result or explain why one does not exist?
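As a quick sanity check of the binary example, here is a minimal numerical sketch (in Python, assuming the two flips are independent of each other and of $X$, so that $X \rightarrow Z$ is itself a binary symmetric channel with crossover probability $\tfrac{1}{4}\cdot\tfrac{3}{4} + \tfrac{3}{4}\cdot\tfrac{1}{4} = \tfrac{3}{8}$):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def cascade(p, q):
    """Crossover probability of two independent binary symmetric channels in series."""
    return p * (1 - q) + (1 - p) * q

p = q = 0.25
I_XY = 1 - h2(p)              # X uniform, so I(X;Y) = 1 - h(1/4)
I_YZ = 1 - h2(q)              # Y is also uniform by symmetry
I_XZ = 1 - h2(cascade(p, q))  # X -> Z behaves like a BSC(3/8)

print(I_XY, I_YZ, I_XZ)       # ~0.1887, ~0.1887, ~0.0456
```

So here $I(X;Z) \approx 0.046$ bits while $\min\big(I(X;Y), I(Y;Z)\big) \approx 0.189$ bits, i.e. the plain data processing inequality is far from tight.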
markov-chains information-theory entropy
asked Aug 8 at 21:12 by Barbot
edited Aug 10 at 3:34 by Rodrigo de Azevedo
What do you mean by doing "better"? What does it mean for $Y$ to "flip" $X$? Are the random variables you've specified a Markov chain?
– Theoretical Economist
Aug 8 at 21:19
By doing "better" I mean that, in the example I sketched, it seems like $I(X;Z) < \min(I(X;Y), I(Y;Z))$, so the inequality $I(X;Z) \leq \min(I(X;Y), I(Y;Z))$ is not tight. By "$Y$ flip[ping] $X$" I mean that $Y$ is a function $f(X)$ that with probability $3/4$ outputs $X$ and with probability $1/4$ outputs its negation (so 0 becomes 1 and vice versa). It is a Markov chain, as $X$ and $Z$ are conditionally independent given $Y$, no?
– Barbot
Aug 8 at 21:41
Equality in the data processing inequality may hold, but only when a certain condition holds (see here). When this is not the case, the inequality is strict.
– Stelios
Aug 8 at 22:07
Right, so I'm wondering if it's possible to show a stronger inequality in that case. More broadly, for a Markov chain $X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_k$ the best upper bound I know of for $I(X_1;X_k)$ is $I(X_1;X_k) \leq \min(I(X_1;X_2), I(X_2;X_3), \ldots, I(X_{k-1};X_k))$. But it seems like something more akin to $I(X_1;X_k) \leq \prod_{i=1}^{k-1} I(X_i;X_{i+1})$ should hold -- not literally this relationship, but something that takes advantage of this kind of "composition".
– Barbot
Aug 8 at 22:22
This is an active area of research, and is, AFAIK, far from settled in general. Google for the 'strong data processing inequality'. For the particular question you asked with the long Markov chain, Polyanskiy and Wu have a recent paper that might be of interest.
– stochasticboy321
Aug 10 at 19:35
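To make the strong data processing inequality mentioned in the last comment concrete for the binary example (a sketch, using the classical fact that the contraction coefficient of a binary symmetric channel with crossover probability $\delta$ is $(1-2\delta)^2$): if $Z$ is obtained from $Y$ by a BSC$(\delta)$ in the chain $X \rightarrow Y \rightarrow Z$, then
$$I(X;Z) \leq (1-2\delta)^2 \, I(X;Y),$$
and iterating along a chain $X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_k$ of binary symmetric channels with crossover probabilities $\delta_1, \ldots, \delta_{k-1}$ gives
$$I(X_1;X_k) \leq \left( \prod_{i=2}^{k-1} (1-2\delta_i)^2 \right) I(X_1;X_2).$$
In the example above this yields $I(X;Z) \leq \tfrac{1}{4} I(X;Y) \approx 0.047$ bits, close to the true value of about $0.046$ bits.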