Mutual Information Always Non-negative
What is the simplest proof that mutual information is always non-negative? I.e., $I(X;Y) \ge 0$.
probability-theory
Convexity of the function $t \mapsto t \log t$. – Did, Jun 17 '12 at 15:33
In addition, convexity requires the coefficients in the linear combination to sum to 1; since $p(x,y)$ is a probability distribution, it fulfills this condition. – Francisco Javier Delgado Ceped, Aug 10 at 18:44
edited Jun 17 '12 at 15:38 by Cameron Buie
asked Jun 17 '12 at 15:30 by Omri
1 Answer
By definition,
$$I(X;Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log\left(\frac{p(x)p(y)}{p(x,y)}\right)$$
Now, the negative logarithm is convex and $\sum_{x \in X} \sum_{y \in Y} p(x,y) = 1$, so applying Jensen's inequality gives
$$I(X;Y) \geq -\log\left( \sum_{x \in X} \sum_{y \in Y} p(x,y) \, \frac{p(x)p(y)}{p(x,y)} \right) = -\log\left( \sum_{x \in X} \sum_{y \in Y} p(x)p(y)\right) = -\log(1) = 0$$
Q.E.D.
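The proof can be sanity-checked numerically. The sketch below uses a small made-up joint distribution over two binary variables (the probabilities are illustrative, not from the thread), computes $I(X;Y)$ directly from the definition above, and confirms it is non-negative:

```python
import numpy as np

# Illustrative joint distribution of two binary variables (values are
# made up for this sketch; any valid joint distribution would do).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
assert np.isclose(p_xy.sum(), 1.0)  # the Jensen coefficients sum to 1

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )
mi = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))
print(mi)  # strictly positive here, since X and Y are dependent
```

When $p(x,y) = p(x)p(y)$ for all pairs, every log ratio is zero and the sum collapses to $I(X;Y) = 0$, matching the equality case of Jensen.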
What if the variables are continuous? – becko, Jan 13 '16 at 14:06
@becko The same arguments hold; you just have to replace the summations by integrals. – TenaliRaman, Jan 13 '16 at 19:07
Why must the sum of $p(x,y)$ be 1? – Peter, Apr 17 at 14:09
@Peter $p(x,y)$ is a probability distribution, so its values sum to 1. – Sam, Jun 12 at 23:29
answered Jun 17 '12 at 16:55 by TenaliRaman