Proving the information inequality using measure theory

The information inequality is the theorem stating that the Kullback-Leibler divergence between two probability distributions is always non-negative. It can be proved easily using Jensen's inequality with the $-\log$ function, but the proofs I have read always have to distinguish whether the probability distributions are defined by continuous or discrete random variables.

I was trying to see whether it is possible not to distinguish cases, and I thought about using measure theory, with Jensen's inequality as stated in Rudin's book:

Let $\mu$ be a probability measure on a $\sigma$-algebra $\mathcal{M}$ in a set $\Omega$. If $f$ is a real integrable function with $a < f(x) < b$ for all $x \in \Omega$, and if $\varphi$ is convex on $]a,b[$, then
$$\varphi\left( \int_\Omega f \, d\mu \right) \le \int_\Omega (\varphi \circ f) \, d\mu.$$

So using this I have
$$KL(p\|q) = \int \log\frac{p(x)}{q(x)}\, dp = \int -\log\frac{q(x)}{p(x)}\, dp \ge -\log \int \frac{q(x)}{p(x)}\, dp,$$
but now the only way I can find to continue is to distinguish whether the distributions are discrete or continuous. I haven't studied measure theory and I only have some basic notions, so I'm not sure how to continue. Also, if my notation is wrong, please tell me. Any help will be appreciated.







asked Aug 17 at 8:43 by utbutnut







Your derivation immediately generalizes to the case where $q$ is absolutely continuous w.r.t. $p$, meaning that $dq(x)=f(x)\,dp(x)$ for some measurable function $f\geq 0$.
– Sangchul Lee, Aug 17 at 9:56
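Following the comment, here is a sketch of how the derivation can be closed without any case distinction, assuming $q$ is absolutely continuous with respect to $p$ and reading $\frac{q(x)}{p(x)}$ as the Radon-Nikodym derivative $f = \frac{dq}{dp}$:
$$
-\log \int \frac{q(x)}{p(x)}\, dp = -\log \int f\, dp = -\log q(\Omega) = -\log 1 = 0,
$$
so $KL(p\|q) \ge 0$ without ever referring to densities with respect to Lebesgue or counting measure.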
























1 Answer






























A rigorous definition of information-theoretic quantities can be found in M. S. Pinsker's book "Information and Information Stability of Random Variables and Processes".



Consider two probability measures $\mu$ and $\nu$ defined on the measurable space $(\Omega,\mathcal{M})$. Let $\{E_i\}$ be a measurable partition of $\Omega$. Then the KL-divergence can be defined as
$$
D(\mu\|\nu)=\sup\sum_i \mu(E_i)\log\frac{\mu(E_i)}{\nu(E_i)},
$$
where the supremum is taken over all such partitions of $\Omega$. Since this definition is built from sums of exactly the same form as the discrete KL-divergence, the non-negativity carries over:
$$
D(\mu\|\nu)\geq 0.
$$
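Concretely, for any fixed partition the discrete Jensen argument from the question gives the bound (a quick sketch, summing only over the cells with $\mu(E_i) > 0$, whose $\mu$-masses add up to $1$):
$$
\sum_i \mu(E_i)\log\frac{\mu(E_i)}{\nu(E_i)}
= \sum_i \mu(E_i)\left(-\log\frac{\nu(E_i)}{\mu(E_i)}\right)
\ge -\log \sum_i \nu(E_i) \ge -\log 1 = 0,
$$
and taking the supremum over partitions preserves the bound.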



There is a theorem by Gelfand, Yaglom and Perez stating that if $D(\mu\|\nu)$ is finite, then $\mu$ is absolutely continuous with respect to $\nu$ and
$$
D(\mu\|\nu)=\int_\Omega \log\frac{d\mu}{d\nu}\,\mathrm{d}\mu,
$$
where $\frac{d\mu}{d\nu}$ is the Radon-Nikodym derivative. If you want, you can define the KL-divergence directly by the above equation.
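As a numerical illustration of how the partition sums relate to the integral formula, here is a short sketch in Python (the Gaussian pair, the grid, and the helper name `partition_kl` are arbitrary choices made for this example, using NumPy/SciPy): it evaluates the partition sums on ever finer partitions of the real line and compares them with the closed-form divergence between two Gaussians.

```python
import numpy as np
from scipy import stats

# Two probability measures on the real line (arbitrary choices for the example):
mu_dist = stats.norm(loc=0.0, scale=1.0)  # mu = N(0, 1)
nu_dist = stats.norm(loc=1.0, scale=2.0)  # nu = N(1, 2^2)

def partition_kl(n_cells, lo=-30.0, hi=30.0):
    """sum_i mu(E_i) log(mu(E_i)/nu(E_i)) over an equal-width partition of R.

    The two unbounded tails are folded into the first and last cells so that
    the cells really do partition the whole real line.
    """
    edges = np.linspace(lo, hi, n_cells + 1)
    edges[0], edges[-1] = -np.inf, np.inf
    mu_cells = np.diff(mu_dist.cdf(edges))
    nu_cells = np.diff(nu_dist.cdf(edges))
    mask = mu_cells > 0  # cells with mu(E_i) = 0 contribute nothing to the sum
    return float(np.sum(mu_cells[mask] * np.log(mu_cells[mask] / nu_cells[mask])))

# Closed form: D(N(m1,s1^2) || N(m2,s2^2)) = log(s2/s1) + (s1^2 + (m1-m2)^2)/(2 s2^2) - 1/2
closed_form = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5

for n in (4, 16, 64, 256):
    # Each value is nonnegative; the nested refinements increase toward the closed form.
    print(n, partition_kl(n))
print("closed form:", closed_form)
```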






answered Aug 17 at 12:22 by Arash (edited Aug 17 at 12:37)





















