Bayes' theorem confusion with likelihood [duplicate]

This question already has an answer here:



  • What is the difference between “likelihood” and “probability”?

    9 answers



I learned that Bayes' theorem is defined as follows:



$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$$



But today I came across a definition in terms of the likelihood:



$$p(\theta \mid y) = \frac{L(\theta \mid y)\, p(\theta)}{p(y)} = \frac{L(\theta \mid y)\, p(\theta)}{\int L(\theta \mid y)\, p(\theta)\, d\theta}$$



What is the link between the two?





















marked as duplicate by Xi'an bayesian
Aug 27 at 19:55


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.










  • The likelihood is defined as $L(\theta \mid y) = p(y \mid \theta)$.
    – Robin Ryder
    Aug 26 at 21:18










  • stats.stackexchange.com/questions/2641/…
    – Alex
    Aug 27 at 4:58
















edited Aug 27 at 7:49









The Laconic

777414




asked Aug 26 at 20:30









user1607

1199




2 Answers
$L(\theta \mid y) = p(y \mid \theta)$. I assume that $y$ is the observation here and that we are inferring the value of the parameter $\theta$. Thus $p(y \mid \theta)$ can be viewed as a function $L$ of the (unknown) parameters $\theta$, with $y$ held fixed.
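To make "a function of $\theta$" concrete, here is a minimal Python sketch that assumes a binomial model ($y = 7$ successes in $n = 10$ trials, numbers chosen only for illustration and not part of the original answer). The helper names `binom_pmf`, `likelihood` and `integrate` are hypothetical. For fixed $\theta$ the formula is a distribution over $y$ and sums to 1, but for fixed $y$ the same formula, read as $L(\theta \mid y)$, integrates over $\theta$ to $1/(n+1)$ in this model, so it is not a probability density in $\theta$.

```python
import math

def binom_pmf(y, n, theta):
    """p(y | theta) for a Binomial(n, theta) observation (assumed model)."""
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)

def likelihood(theta, y=7, n=10):
    """L(theta | y): the same formula, read as a function of theta with y held fixed."""
    return binom_pmf(y, n, theta)

def integrate(f, a=0.0, b=1.0, m=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

n, y, theta = 10, 7, 0.6

# For fixed theta, p(y | theta) is a distribution over y: it sums to 1.
total_over_y = sum(binom_pmf(k, n, theta) for k in range(n + 1))

# For fixed y, L(theta | y) is not a density over theta: here its integral is 1/(n+1).
area_over_theta = integrate(lambda t: likelihood(t, y, n))

print(total_over_y, area_over_theta, 1 / (n + 1))
```

The midpoint rule is only a simple stand-in for numerical integration; any quadrature method would serve the same purpose.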



For the denominator, $p(y) = \int p(y, \theta)\, d\theta = \int p(y \mid \theta)\, p(\theta)\, d\theta = \int L(\theta \mid y)\, p(\theta)\, d\theta$.
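A sketch of this denominator computed numerically, assuming a Beta(2, 3) prior and hypothetical binomial data ($y = 7$, $n = 10$), none of which comes from the original answer. Because the Beta prior is conjugate to the binomial likelihood, $p(y)$ also has a closed form (the beta-binomial pmf), which gives an independent check on $\int L(\theta \mid y)\, p(\theta)\, d\theta$.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def likelihood(theta, y, n):
    # L(theta | y) = p(y | theta) for an assumed Binomial(n, theta) model
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)

def prior(theta, a, b):
    # Beta(a, b) prior density p(theta), evaluated in log space for stability
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_beta(a, b))

def integrate(f, m=20_000):
    """Midpoint rule on (0, 1); the nodes never touch 0 or 1, so the logs are safe."""
    h = 1.0 / m
    return h * sum(f((i + 0.5) * h) for i in range(m))

n, y, a, b = 10, 7, 2.0, 3.0

# p(y) = integral of L(theta | y) p(theta) d theta, done numerically
evidence_numeric = integrate(lambda t: likelihood(t, y, n) * prior(t, a, b))

# Closed form for this conjugate pair: C(n, y) B(y + a, n - y + b) / B(a, b)
evidence_exact = math.comb(n, y) * math.exp(log_beta(y + a, n - y + b) - log_beta(a, b))

print(evidence_numeric, evidence_exact)
```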






        answered Aug 26 at 21:41









        Yi Yang

        1179




The second formula is wrong: the outer expressions are equal to each other, but the middle expression is merely proportional to (and not necessarily equal to) them. The likelihood is defined by $L(\theta \mid y) = k(y)\, p(y \mid \theta) \propto p(y \mid \theta)$, where $k(y) > 0$ is a constant of proportionality that does not depend on $\theta$. This means you have:



$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \frac{L(\theta \mid y)\, p(\theta)}{k(y)\, p(y)} \propto \frac{L(\theta \mid y)\, p(\theta)}{p(y)}.$$
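A quick numerical check of this proportionality, assuming a uniform prior, hypothetical binomial data ($y = 7$, $n = 10$) and the illustrative choice $k(y) = 1/\binom{n}{y}$. With $L(\theta \mid y) = k(y)\, p(y \mid \theta)$, the expression $L(\theta \mid y)\, p(\theta) / p(y)$ equals $k(y)\, p(\theta \mid y)$, so it integrates over $\theta$ to $k(y)$ rather than to 1, while the true posterior integrates to 1.

```python
import math

n, y = 10, 7
k = 1 / math.comb(n, y)  # hypothetical k(y): drops the binomial coefficient

def p_y_given_theta(theta):
    # Binomial sampling model p(y | theta), assumed for this sketch
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)

def L(theta):
    # Likelihood defined only up to k(y): L(theta | y) = k(y) * p(y | theta)
    return k * p_y_given_theta(theta)

def prior(theta):
    return 1.0  # uniform prior on (0, 1), an assumption for illustration

def integrate(f, m=20_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / m
    return h * sum(f((i + 0.5) * h) for i in range(m))

p_y = integrate(lambda t: p_y_given_theta(t) * prior(t))

posterior_mass = integrate(lambda t: p_y_given_theta(t) * prior(t) / p_y)  # ≈ 1
middle_mass = integrate(lambda t: L(t) * prior(t) / p_y)                    # ≈ k(y), not 1

print(posterior_mass, middle_mass, k)
```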



Using the law of total probability, you also have $p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta$, which gives:



$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \frac{L(\theta \mid y)\, p(\theta) / k(y)}{\int L(\theta \mid y)\, p(\theta)\, d\theta \,/\, k(y)} = \frac{L(\theta \mid y)\, p(\theta)}{\int L(\theta \mid y)\, p(\theta)\, d\theta}.$$



In the special case $k(y) = 1$, you have $L(\theta \mid y) = p(y \mid \theta)$, and in this case you get the second equation you specified. However, when working with likelihood functions it is common to choose a constant of proportionality that removes multiplicative terms that do not depend on $\theta$.
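The last point can be sketched as follows, again assuming hypothetical binomial data ($y = 7$, $n = 10$) with a uniform prior. Dropping the $\theta$-free factor $\binom{n}{y}$ from the likelihood and then normalizing by $\int L(\theta \mid y)\, p(\theta)\, d\theta$ gives the same posterior as using the full $p(y \mid \theta)$. Under these assumptions the posterior is Beta(8, 4), whose density at $\theta = 0.5$ is $1320/1024$, which serves as a check.

```python
import math

n, y = 10, 7

def p_y_given_theta(theta):
    # Full binomial pmf, i.e. k(y) = 1
    return math.comb(n, y) * theta**y * (1 - theta) ** (n - y)

def L_kernel(theta):
    # Likelihood with the theta-free factor comb(n, y) dropped: k(y) = 1 / comb(n, y)
    return theta**y * (1 - theta) ** (n - y)

def prior(theta):
    return 1.0  # uniform prior on (0, 1), assumed for illustration

def integrate(f, m=20_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / m
    return h * sum(f((i + 0.5) * h) for i in range(m))

def posterior(lik):
    """Return theta -> lik(theta) p(theta) / integral of lik(t) p(t) dt."""
    z = integrate(lambda t: lik(t) * prior(t))
    return lambda theta: lik(theta) * prior(theta) / z

post_full = posterior(p_y_given_theta)
post_kernel = posterior(L_kernel)

for theta in (0.3, 0.5, 0.7):
    # The two columns agree: the constant k(y) cancels in the normalization
    print(theta, post_full(theta), post_kernel(theta))
```

Dropping such constants is mostly a convenience: the normalizing integral absorbs them, so only the $\theta$-dependent part of the likelihood matters for the posterior.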















            edited Aug 27 at 0:50

























            answered Aug 26 at 22:41









            Ben

            14.4k12176




            • It is wrong if they define it as unnormalized, but I've seen it defined as $L(\theta \mid y) = p(y \mid \theta)$ in some texts...
              – Tim♦
              Aug 27 at 9:18
















