How to compare two experiments from the Bayesian perspective?
Imagine that I have two different coin-flipping experiments:
- Experiment 1: uses one coin, with $M_1$ heads and $(N_1-M_1)$ tails.
- Experiment 2: is composed of two sub-experiments, each using a different coin. Experiment 2.1: $M_{21}$ heads and $(N_{21}-M_{21})$ tails. Experiment 2.2: $M_{22}$ heads and $(N_{22}-M_{22})$ tails.
Note that I cannot aggregate $M_{21}+M_{21}$ because the sub-experiments use different coins and each one could potentially be differently biased.
Obviously, as the number of sub-experiments increases, the probability of the data conditioned on the hypothesis tends to zero, because the domain of the probability increases in dimension. So it cannot be compared directly with Experiment 1.
How can I know which experiment better supports/rejects the hypothesis $H_0$ that the probability of heads for all coins is $0.5$? What is the justification for the proposed solution?
Update: After joriki's comments, I realized that it would be helpful to write down my understanding of the first experiment more formally.
In the first case (multiple throws of the same coin), if I am not mistaken, the probability of the hypothesis that the coin is fair, given the data, is:
$$
p(H_0\mid M_{11},N_{11}) = \frac{p(M_{11},N_{11}\mid H_0)}{p(M_{11},N_{11})}\,p(H_0)
= \frac{\binom{N_{11}}{M_{11}}\,0.5^{N_{11}}}{\int_0^1 \binom{N_{11}}{M_{11}}\,p^{M_{11}}(1-p)^{N_{11}-M_{11}}\,dp}\,p(H_0)
= (N_{11}+1)\,\frac{1}{2^{N_{11}}}\,p(H_0)
$$
Am I making a mistake? I do not understand why the posterior probability of the coin being fair ($H_0$) decreases as the number of throws increases; or, alternatively, how should the prior be related to the number of throws?
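A quick numerical check of the ratio above, with the prior factor $p(H_0)$ left out and made-up counts $N_{11}=20$, $M_{11}=10$; the mismatch between the two printed values is what the comments below resolve.

```python
# Quick numerical check of the ratio in the update, with the prior factor p(H0) left out.
# N11 and M11 are made-up illustrative counts.
from math import comb

N11, M11 = 20, 10

# Numerator: probability of the data if the coin is exactly fair.
num = comb(N11, M11) * 0.5 ** N11

# Denominator: probability of the data averaged over a flat density for the bias q,
# approximated by a Riemann sum (the exact value of the integral is 1 / (N11 + 1)).
steps = 100_000
den = sum(comb(N11, M11) * (k / steps) ** M11 * (1 - k / steps) ** (N11 - M11)
          for k in range(steps)) / steps

print(num / den)                 # numerically evaluated ratio
print((N11 + 1) / 2 ** N11)      # the closed form stated in the last step above
```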
probability-theory bayesian
asked Sep 5 at 7:58, edited Sep 10 at 7:39 – marcmagransdeabril
Please see this tutorial and reference on how to typeset math on this site. Also I think where you write $M_{21}+M_{21}$ you mean $M_{21}+M_{22}$? – joriki, Sep 9 at 6:56
Your expression for $p(M_{11},N_{11})$ in the denominator is missing a prior for $p$. So you're assuming a flat continuous prior for $p$ there, which would imply $p(H_0)=0$ (and thus also $p(H_0\mid M_{11},N_{11})=0$). Two notational issues: In this context I'd suggest to use another variable name instead of $p$ to avoid confusion with all the other $p$s. And I wouldn't include $N_{11}$ in the notation, as it's part of the setup of the experiment, not of the data. – joriki, Sep 10 at 7:56
If I might venture a guess: Some of what you write seems to have a frequentist flavour to it. I get the impression that perhaps you're coming from a frequentist background and are trying to take a Bayesian approach but sometimes "lapsing" into the frequentist approach. In the frequentist framework, you can accept or reject a point hypothesis like $p=\frac12$ without worrying about the fact that it's just a point of measure $0$ in a continuum; in the Bayesian framework this only works if you have non-zero prior probability concentrated at that point. – joriki, Sep 10 at 8:01
Also, you're missing a binomial coefficient in the result. The ones that are already in the denominator and the numerator cancel, and the integral in the denominator evaluates to $\left((N_{11}+1)\binom{N_{11}}{M_{11}}\right)^{-1}$, so you're left with $(N_{11}+1)\binom{N_{11}}{M_{11}}$. – joriki, Sep 10 at 8:16
Thanks a lot for the helpful comments! – marcmagransdeabril, Sep 10 at 8:27
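Following the comments above, a minimal sketch of the spike-and-slab idea: put a lump of prior probability on $\theta=\tfrac12$ and spread the rest uniformly; the posterior probability of $H_0$ is then well defined and, for fair-looking data, grows rather than decays with the number of throws. The numbers and the helper name are illustrative only.

```python
# Spike-and-slab prior: probability prior_H0 on theta = 1/2, the rest uniform on (0,1).
from math import comb

def posterior_fair(M, N, prior_H0=0.5):
    """Posterior probability that the coin is exactly fair, given M heads in N flips."""
    like_H0 = comb(N, M) * 0.5 ** N     # P(data | theta = 1/2)
    like_H1 = 1.0 / (N + 1)             # P(data | theta ~ Uniform(0,1)): exact Beta integral
    return prior_H0 * like_H0 / (prior_H0 * like_H0 + (1 - prior_H0) * like_H1)

# For a coin that comes up heads in exactly half the throws, the posterior of H0
# increases with N instead of decaying.
for N in (10, 100, 1000):
    print(N, posterior_fair(N // 2, N))
```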
1 Answer
Accepted answer (answered Sep 9 at 7:44, edited Sep 9 at 7:55) – joriki
> Obviously, as the number of sub-experiments increases, the probability of the data conditioned on the hypothesis tends to zero, because the domain of the probability increases in dimension.

That's not true. Conditional on the hypothesis that all coins are fair, the probability of the data is simply $2^{-n}$, where $n$ is the total number of coin flips; the number of subexperiments doesn't enter into it. But anyway, you're interested in the posterior probability of the hypothesis, not in the probability of the data conditioned on the hypothesis.
The title says that you want to approach the problem from a Bayesian perspective, but you never mention that again in the text, and in particular you don't mention your prior. If the prior exhibits the phenomenon that you describe, then the posterior probability will also reflect that. But that just means that, given your prior, the hypothesis is less likely to be true when there are more subexperiments.
To illustrate this, let's say there are two different coin makers, Fairie and Leprecoin. Fairie is known for making state-of-the-art ultra-precise fair coins, and their coins can for all intents and purposes be assumed to be exactly fair. Leprecoin sells random bits of metal superficially made to look like coins; their probability to show heads is uniformly randomly distributed on the unit interval.
Let's consider three scenarios:
- $1$) You buy all your coins from Leprecoin.
- $2$) You buy each coin from either company with equal probability.
- $3$) For each of your two experiments, you buy all coins for that experiment from either company with equal probability.
In the first case, your prior is continuous, so the probability for all coins to be fair is zero and remains zero, no matter how often you flip them. You can only define a probability that the coins are all approximately fair; say, that their probability to show heads lies in $[0.49,0.51]$. In this case, already before you start flipping any coins, the probability that the coin in Experiment $1$ is approximately fair is $0.02$ whereas the probability that both coins in Experiment $2$ are approximately fair is only $0.0004$. There's nothing wrong with that; it merely expresses the fact that it's much more likely to buy one approximately fair coin from Leprecoin than two. This will continue to be reflected in the posterior distribution when you conduct the experiments.
In the second case, you do have a non-zero probability for all coins to be fair, but again, before you start flipping, the probability that all coins are fair is already $\frac12$ in Experiment $1$ and $\frac14$ in Experiment $2$, again merely reflecting the fact that if you buy uniformly randomly from the two companies, it's more likely that one coin is bought from Fairie than it is that two coins are bought from Fairie. Again, nothing wrong with that, and again, this will continue to be reflected in the posterior distribution when you conduct the experiments.
In the third case, you do start out with the same probability $\frac12$ that all coins in an experiment are fair. And that will continue to be reflected in the posterior distribution when you conduct the experiments.
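To make the scenarios concrete, here is a minimal numerical sketch of scenarios $2$ and $3$ for Experiment $2$ with two coins. Fairie coins are taken to be exactly fair and Leprecoin biases uniform on $(0,1)$, as in the answer; the head counts and function names are made up for illustration.

```python
from math import comb

def like_fair(M, N):      # P(data | coin exactly fair)
    return comb(N, M) * 0.5 ** N

def like_uniform(M, N):   # P(data | bias ~ Uniform(0,1)) = 1 / (N + 1)
    return 1.0 / (N + 1)

data = [(11, 20), (9, 20)]   # (heads, flips) for coins 2.1 and 2.2 (illustrative)

# Scenario 2: each coin independently from Fairie or Leprecoin with probability 1/2.
post_each = [like_fair(M, N) / (like_fair(M, N) + like_uniform(M, N)) for M, N in data]
post_all_fair_s2 = post_each[0] * post_each[1]

# Scenario 3: both coins of the experiment come from the same (randomly chosen) maker.
L_fair = like_fair(*data[0]) * like_fair(*data[1])
L_lep  = like_uniform(*data[0]) * like_uniform(*data[1])
post_all_fair_s3 = L_fair / (L_fair + L_lep)

# With these fair-looking counts, scenario 3 assigns the higher posterior to "all fair",
# mirroring its higher prior (1/2 versus 1/4).
print(post_all_fair_s2, post_all_fair_s3)
```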
This answer is very helpful and it is close to my (more than imperfect) question. I have added an update to the original question that hopefully clarifies the point I do not understand. It would be great if you could take a look. – marcmagransdeabril, Sep 10 at 7:41
@marcmagransdeabril: I replied underneath the question. – joriki, Sep 10 at 7:54