How to compare two experiments from the Bayesian perspective?

Imagine that I have two different coin flipping experiments:



  • Experiment 1: uses one coin, observing $M_1$ heads and $N_1-M_1$ tails.

  • Experiment 2: composed of two sub-experiments, each using a different coin. Experiment 2.1: $M_{21}$ heads and $N_{21}-M_{21}$ tails. Experiment 2.2: $M_{22}$ heads and $N_{22}-M_{22}$ tails.

Note that I cannot aggregate $M_{21}+M_{21}$ because the sub-experiments use different coins, and each one could potentially be biased differently.



Obviously, as the number of sub-experiments increases, the probability of the data conditioned on the hypothesis tends to zero, because the domain of the probability increases in dimension. So it cannot be directly compared with Experiment 1.



How can I know which experiment better supports or rejects the hypothesis $H_0$ that the probability of heads for all coin(s) is $0.5$? What is the justification for the proposed solution?



Update: After joriki's comments, I realized that it would be helpful to write down my understanding of the first experiment more formally.



In the first case (multiple throws using the same coin), if I'm not mistaken, the probability of the hypothesis that the coin is fair, given the data, is:
$$
p(H_0\mid M_{11},N_{11}) = \frac{p(M_{11},N_{11}\mid H_0)}{p(M_{11},N_{11})}\,p(H_0)
= \frac{\binom{N_{11}}{M_{11}}\,0.5^{N_{11}}}{\int_0^1 \binom{N_{11}}{M_{11}}\,p^{M_{11}}(1-p)^{N_{11}-M_{11}}\,dp}\,p(H_0)
= (N_{11}+1)\,\frac{1}{2^{N_{11}}}\,p(H_0)
$$



Am I making a mistake? I do not understand why the posterior probability of the coin being fair ($H_0$) decreases as the number of throws increases; or, put differently, how should the prior be related to the number of throws?
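
To make the comments below concrete, here is a small numerical sketch (my own illustration; the data values are hypothetical) of the fix they suggest: put a point mass on $p=\frac12$ (a "spike-and-slab" prior) instead of a purely continuous prior. With such a prior, the posterior of $H_0$ grows rather than shrinks as balanced data accumulates:

    from math import comb

    def posterior_fair(n_heads, n_flips, prior_fair=0.5):
        # Spike-and-slab prior: mass `prior_fair` on p = 1/2,
        # the remaining mass spread uniformly on [0, 1].
        like_fair = comb(n_flips, n_heads) * 0.5 ** n_flips
        # Marginal likelihood under the uniform slab:
        # int_0^1 C(N,M) p^M (1-p)^(N-M) dp = 1 / (N + 1).
        like_slab = 1.0 / (n_flips + 1)
        num = prior_fair * like_fair
        return num / (num + (1.0 - prior_fair) * like_slab)

    for n in (10, 100, 1000):                 # perfectly balanced data
        print(n, posterior_fair(n // 2, n))   # ~0.73, ~0.89, ~0.96: grows with n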










probability-theory bayesian

asked Sep 5 at 7:58, edited Sep 10 at 7:39 – marcmagransdeabril
  • Please see this tutorial and reference on how to typeset math on this site. Also I think where you write $M_{21}+M_{21}$ you mean $M_{21}+M_{22}$?
    – joriki, Sep 9 at 6:56






  • Your expression for $p(M_{11},N_{11})$ in the denominator is missing a prior for $p$. So you're assuming a flat continuous prior for $p$ there, which would imply $p(H_0)=0$ (and thus also $p(H_0\mid M_{11},N_{11})=0$). Two notational issues: in this context I'd suggest using another variable name instead of $p$ to avoid confusion with all the other $p$s. And I wouldn't include $N_{11}$ in the notation, as it's part of the setup of the experiment, not of the data.
    – joriki, Sep 10 at 7:56






  • If I might venture a guess: Some of what you write seems to have a frequentist flavour to it. I get the impression that perhaps you're coming from a frequentist background and are trying to take a Bayesian approach but sometimes "lapsing" into the frequentist approach. In the frequentist framework, you can accept or reject a point hypothesis like $p=\frac12$ without worrying about the fact that it's just a point of measure $0$ in a continuum; in the Bayesian framework this only works if you have non-zero prior probability concentrated at that point.
    – joriki, Sep 10 at 8:01







  • Also, you're missing a binomial coefficient in the result. The ones that are already in the denominator and the numerator cancel, and the integral in the denominator evaluates to $\left((N_{11}+1)\binom{N_{11}}{M_{11}}\right)^{-1}$, so you're left with $(N_{11}+1)\binom{N_{11}}{M_{11}}$.
    – joriki, Sep 10 at 8:16







  • Thanks a lot for the helpful comments!
    – marcmagransdeabril, Sep 10 at 8:27
























1 Answer

















Answer (accepted, +50) – joriki, answered Sep 9 at 7:44, edited Sep 9 at 7:55
Obviously, as the number of sub-experiments increases, the probability of the data conditioned on the hypothesis tends to zero, because the domain of the probability increases in dimension.




That's not true. Conditional on the hypothesis that all coins are fair, the probability of the data is simply $2^{-n}$, where $n$ is the total number of coin flips; the number of subexperiments doesn't enter into it. But anyway, you're interested in the posterior probability of the hypothesis, not in the probability of the data conditioned on the hypothesis.



The title says that you want to approach the problem from a Bayesian perspective, but you never mention that again in the text, and in particular you don't mention your prior. If the prior exhibits the phenomenon that you describe, then the posterior probability will also reflect that. But that just means that, given your prior, the hypothesis is less likely to be true when there are more subexperiments.



To illustrate this, let's say there are two different coin makers, Fairie and Leprecoin. Fairie is known for making state-of-the-art ultra-precise fair coins, and their coins can for all intents and purposes be assumed to be exactly fair. Leprecoin sells random bits of metal superficially made to look like coins; their probability of showing heads is uniformly distributed on the unit interval.



Let's consider three scenarios:



  • 1) You buy all your coins from Leprecoin.

  • 2) You buy each coin from either company with equal probability.

  • 3) For each of your two experiments, you buy all coins for that experiment from either company with equal probability.

In the first case, your prior is continuous, so the probability for all coins to be fair is zero and remains zero, no matter how often you flip them. You can only define a probability that the coins are all approximately fair; say, that their probability to show heads lies in $[0.49,0.51]$. In this case, already before you start flipping any coins, the probability that the coin in Experiment $1$ is approximately fair is $0.02$ whereas the probability that both coins in Experiment $2$ are approximately fair is only $0.0004$. There's nothing wrong with that; it merely expresses the fact that it's much more likely to buy one approximately fair coin from Leprecoin than two. This will continue to be reflected in the posterior distribution when you conduct the experiments.
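
A quick numerical check of these numbers (a sketch of my own; it uses SciPy's Beta distribution, and the flip counts are hypothetical):

    from scipy.stats import beta

    lo, hi = 0.49, 0.51
    prior_one = hi - lo          # uniform prior: P(one coin approx. fair) = 0.02
    prior_two = prior_one ** 2   # two independent coins: 0.0004
    print(prior_one, prior_two)

    # With a uniform prior, M heads in N flips gives a Beta(M+1, N-M+1)
    # posterior; "approximately fair" is its mass on [0.49, 0.51].
    M, N = 501, 1000
    post = beta(M + 1, N - M + 1)
    print(post.cdf(hi) - post.cdf(lo))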



In the second case, you do have a non-zero probability for all coins to be fair, but again, before you start flipping, the probability that all coins are fair is already $\frac12$ in Experiment $1$ and $\frac14$ in Experiment $2$, again merely reflecting the fact that if you buy uniformly randomly from the two companies, it's more likely that one coin is bought from Fairie than it is that two coins are bought from Fairie. Again, nothing wrong with that, and again, this will continue to be reflected in the posterior distribution when you conduct the experiments.



In the third case, you do start out with the same probability $\frac12$ that all coins in an experiment are fair. And that will continue to be reflected in the posterior distribution when you conduct the experiments.
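
With a per-coin spike mass of $\frac12$, the spike-and-slab sketch from under the question is exactly scenario 2; here it is applied to both experiments (again my own sketch, with hypothetical flip counts):

    # Same spike-and-slab helper as in the sketch under the question.
    from math import comb

    def posterior_fair(n_heads, n_flips, prior_fair=0.5):
        like_fair = comb(n_flips, n_heads) * 0.5 ** n_flips
        like_slab = 1.0 / (n_flips + 1)
        num = prior_fair * like_fair
        return num / (num + (1.0 - prior_fair) * like_slab)

    # Experiment 1: one coin.  Prior P(all fair) = 1/2.
    print(posterior_fair(500, 1000))                             # ~0.96
    # Experiment 2: independent coins multiply.  Prior P(all fair) = 1/4.
    print(posterior_fair(250, 500) * posterior_fair(251, 500))   # ~0.90

The lower number for Experiment 2 reflects nothing but its lower prior mass on "all coins fair", exactly as described above.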






  • This answer is very helpful and it is close to my (more than imperfect) question. I have added an update to the original question that hopefully clarifies the point I do not understand. It will be great if you can take a look.
    – marcmagransdeabril
    Sep 10 at 7:41










  • @marcmagransdeabril: I replied underneath the question.
    – joriki
    Sep 10 at 7:54










