Mutual Information Always Non-negative
























What is the simplest proof that mutual information is always non-negative? That is, $I(X;Y) \ge 0$.

























  • Convexity of the function $t \mapsto t \log t$.
    – Did
    Jun 17 '12 at 15:33










  • In addition, the convexity argument requires the coefficients in the linear combination to sum to 1. Since $p(x,y)$ is a probability distribution, it fulfills this condition.
    – Francisco Javier Delgado Ceped
    Aug 10 at 18:44
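(Not part of the original comments: the two remarks above can be illustrated numerically. The sketch below, using NumPy with randomly generated weights, checks Jensen's inequality for the convex function $f(t) = -\log t$ with coefficients that sum to 1.)

```python
import numpy as np

# Jensen's inequality for the convex function f(t) = -log(t):
# f(sum_i w_i * t_i) <= sum_i w_i * f(t_i), provided the weights w_i sum to 1.
rng = np.random.default_rng(1)
w = rng.random(10)
w /= w.sum()                  # weights summing to 1 (like p(x,y))
t = rng.random(10) + 0.1      # positive arguments for the logarithm

lhs = -np.log(np.dot(w, t))   # f applied to the weighted average
rhs = np.dot(w, -np.log(t))   # weighted average of f

assert lhs <= rhs             # Jensen's inequality holds
```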














up vote
10
down vote

favorite
6












What is the simplest proof that mutual information is always non-negative? i.e., $I(X;Y)ge0$







share|cite|improve this question


















  • 3




    Convexity of the function $tmapsto tlog t$.
    – Did
    Jun 17 '12 at 15:33










  • In addition, the convexity properties require the coefficients in the linear combination sum 1. Then, as p(x,y) is a probability distribution, it fullfits such condition.
    – Francisco Javier Delgado Ceped
    Aug 10 at 18:44












up vote
10
down vote

favorite
6









up vote
10
down vote

favorite
6






6





What is the simplest proof that mutual information is always non-negative? i.e., $I(X;Y)ge0$







share|cite|improve this question














What is the simplest proof that mutual information is always non-negative? i.e., $I(X;Y)ge0$









share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Jun 17 '12 at 15:38









Cameron Buie

83.6k771154




83.6k771154










asked Jun 17 '12 at 15:30









Omri

359313




359313







  • 3




    Convexity of the function $tmapsto tlog t$.
    – Did
    Jun 17 '12 at 15:33










  • In addition, the convexity properties require the coefficients in the linear combination sum 1. Then, as p(x,y) is a probability distribution, it fullfits such condition.
    – Francisco Javier Delgado Ceped
    Aug 10 at 18:44












  • 3




    Convexity of the function $tmapsto tlog t$.
    – Did
    Jun 17 '12 at 15:33










  • In addition, the convexity properties require the coefficients in the linear combination sum 1. Then, as p(x,y) is a probability distribution, it fullfits such condition.
    – Francisco Javier Delgado Ceped
    Aug 10 at 18:44







3




3




Convexity of the function $tmapsto tlog t$.
– Did
Jun 17 '12 at 15:33




Convexity of the function $tmapsto tlog t$.
– Did
Jun 17 '12 at 15:33












In addition, the convexity properties require the coefficients in the linear combination sum 1. Then, as p(x,y) is a probability distribution, it fullfits such condition.
– Francisco Javier Delgado Ceped
Aug 10 at 18:44




In addition, the convexity properties require the coefficients in the linear combination sum 1. Then, as p(x,y) is a probability distribution, it fullfits such condition.
– Francisco Javier Delgado Ceped
Aug 10 at 18:44










1 Answer



























By definition,
$$I(X;Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log\left(\frac{p(x)\,p(y)}{p(x,y)}\right)$$
Now, the negative logarithm is convex and $\sum_{x \in X} \sum_{y \in Y} p(x,y) = 1$; therefore, applying Jensen's inequality gives
$$I(X;Y) \geq -\log\left( \sum_{x \in X} \sum_{y \in Y} p(x,y) \, \frac{p(x)\,p(y)}{p(x,y)} \right) = -\log\left( \sum_{x \in X} \sum_{y \in Y} p(x)\,p(y)\right) = -\log 1 = 0.$$
Q.E.D.
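(Not part of the original answer: a quick numerical sanity check of the result, computing $I(X;Y)$ for a randomly generated joint distribution with NumPy and verifying it is non-negative.)

```python
import numpy as np

# Draw a random joint distribution p(x, y) and verify I(X;Y) >= 0.
rng = np.random.default_rng(0)
p_xy = rng.random((4, 5))
p_xy /= p_xy.sum()                      # normalize so the entries sum to 1

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) )
mi = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))

assert mi >= 0                          # non-negative, as the proof guarantees
```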






















  • What if the variables are continuous?
    – becko
    Jan 13 '16 at 14:06






  • @becko The same argument holds; you just have to replace the summations by integrals.
    – TenaliRaman
    Jan 13 '16 at 19:07











  • Why must the sum of p(x,y) be 1?
    – Peter
    Apr 17 at 14:09






  • @Peter $p(x,y)$ is a probability distribution, so its values sum to 1.
    – Sam
    Jun 12 at 23:29











answered Jun 17 '12 at 16:55 by TenaliRaman