How do we get past how **every** outcome is very unlikely?

Edit: This question is about rejecting the null hypothesis.




Last month my evil twin and I were at a game show.



The rules are as follows: There is a sealed booth with two magic boxes. Box A has a button that generates a real number according to a normal distribution with mean 5 and standard deviation 1. Box B has a button that generates a real number according to some second fixed probability distribution. We know nothing about the second distribution only it is different from Box A.



Edit: If it makes a difference to the answer, we can assume some knowledge about distribution $B$. For example, suppose it is normal and we have bounds on its mean and standard deviation: say the mean lies between $-5$ and $15$ and the standard deviation between $0.5$ and $1.5$.



The host goes into the booth and presses some combination of the two buttons to generate five real numbers $x_1,x_2,x_3,x_4,x_5$. The host then leaves the booth and the contestant enters. There is a camera inside the booth so the audience can see the contestant through an overhead screen, but the contestant cannot see out.



There is a screen showing $x_1\ \_\ \_\ \_\ \_$ and a button that says reveal. The contestant can press the reveal button up to four times to reveal $x_2,x_3,x_4,x_5$. They then write either $A$ or $B$ next to each revealed number and leave the booth.



The contestant wins prize money equal to $displaystyle frac € 1000 times texttextnumbers revealed$.



I went into the booth to see $x_1 = 9.132987$. Since I knew it's very unlikely for a normal distribution to throw out something 4 standard deviations from the mean, I wrote $B$ under $x_1$ and left the booth.



Then the host generated $5$ new numbers and my evil twin entered the booth. I saw on the overhead screen that $x_1=5.134998$. I expected my twin to reveal some more numbers. But instead they wrote $B$ next to $x_1$ and left the booth.



I was surprised. "Why did you do that?" I asked my twin.
They said, "Well the number $x_1$ is in the range $[5.134997,5.134999]$. Since the interval is very small I know a normal distribution is very unlikely to throw out something in that interval. So the number was probably generated by box $B$".



Unsurprisingly I won $€1000$ that day and my evil twin won nothing.



"Damn, bad luck!" they said.



As a result of their bad luck my evil twin defaulted on their house payment and has been sleeping on my couch since then.



I know there is something wrong with my twin's reasoning but don't know how to explain it. After all, their reasoning is very similar to my own reasoning about Box $A$ throwing out something bigger than $9$. It seems like the difference should somehow involve how $[9,\infty)$ is a very large set and $[5.134997,5.134999]$ is very small.



We're going on the same gameshow next week and I want my evil twin to play better next time. What should I say to convince them their strategy last month was flawed?







asked Aug 8 at 18:36 by Daron, edited Aug 8 at 20:45











I don't understand the twin's reasoning. Any number $x$ is in the range $[x-\delta,x+\delta]$ for any $\delta>0$. However, being in the interval $[5-\delta,5+\delta]$ is more likely than being in the interval $[9-\delta,9+\delta]$ for the same $\delta$.
– Sar
Aug 8 at 18:53





3 Answers
Score: 1













Before you go into the booth, you should formulate your strategy (rather than just reacting to what you see in the booth). Then we might be able to evaluate the expected winnings with that strategy (dependent of course on the unknown distribution of box B and the host's probabilities of pressing A or B).



Your evil twin's strategy amounts to "always choose B". Of course, if the host knows the evil twin's strategy and wants them to lose, he will always pick A. But given an unbiased host who presses each button with probability $1/2$, this has expected winnings of $€500$.



On the other hand, we don't know your strategy, but it seems to include "if $|x_1 - 5| > m$, pick B" (for some $m > 4$). But it's entirely possible that box B's distribution gives even less probability than A's to numbers with $|x-5|>m$, in which case this part of your strategy would not be good at all. In fact, without any knowledge of how the distribution of box B was chosen, I don't see any way to formulate a good strategy that depends on $x_1$.



I would propose the strategy "choose A or B at random, each with probability $1/2$". This gives you expected winnings of $€500$ no matter what distribution box B has and no matter what the host does.
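To make the comparison concrete, here is a minimal sketch of the two expected values. It assumes the contestant reveals nothing beyond $x_1$ and wins €1000 for a correct label and nothing otherwise, which matches the story in the question; the function names and the parameter `p_host_a` (the chance the host pressed A for $x_1$) are made up for illustration.

```python
PRIZE = 1000.0  # euros; assumed payout for one revealed number labelled correctly


def expected_always_b(p_host_a):
    """Expected winnings of 'always write B' if the host pressed A with probability p_host_a."""
    return PRIZE * (1.0 - p_host_a)


def expected_coin_flip(p_host_a, p_say_a=0.5):
    """Expected winnings when the contestant writes A with probability p_say_a, ignoring x_1."""
    p_correct = p_say_a * p_host_a + (1.0 - p_say_a) * (1.0 - p_host_a)
    return PRIZE * p_correct


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"P(host pressed A) = {p:.2f}: "
          f"always B -> {expected_always_b(p):6.1f}, "
          f"coin flip -> {expected_coin_flip(p):6.1f}")
```

The coin flip stays at €500 for every value of `p_host_a`, while "always B" ranges from €1000 down to €0 depending on the host.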






    Score: 0













    Suppose you place one million pieces of paper numbered one to a million in a vat. You pull one out, and find the number is $339,207$. Should you be surprised? On the one hand, the probability of the event you just witnessed was one in a million! On the other hand, that is true for any number you could draw, and it would not make sense to be surprised no matter what happens.



    This illustrates the importance of specifying a statistical test before you see the data. Suppose you did things in the other order. Suspicious that the vat is filled only with pieces of paper numbered $498,924$, you decide to test the null hypothesis that the vat was filled fairly against the alternative hypothesis that every piece of paper reads $498,924$. The test you decide to use is to pull out a number and reject the null hypothesis if it equals $498,924$, keeping it otherwise. You pull out a number, and sure enough it is $498,924$! You would be correct in rejecting the null hypothesis.



    So, for one thing, your twin made the mistake of choosing the test "is the number in $[5.134997,5.134999]$?" after already seeing the number. What if they had specified this test beforehand? Would concluding that the number came from Box $B$ whenever it falls in this range be a good test?



    Any statistical test has two associated parameters:



    • The significance level of the test is the probability of incorrectly rejecting the null hypothesis. In this case, given that the number came from Box A, the probability of the test concluding it did not come from $A$ should be small. The significance level is quite small for the $[5.134997,5.134999]$ test, which on its own is good: a false rejection is very unlikely.


    • The power of the test is the probability of rejecting the null hypothesis when the alternative hypothesis is true. You want the power to be high. Here, we cannot compute the power because the distribution of Box B is unknown. However, you can compute the power function $\text{pow}(\mu,\sigma)$, which gives the power of the test for each possible mean and standard deviation of $B$, and you would find that there is only a very small region of values of $\mu$ and $\sigma$ for which the test is powerful.


    A good test should have both a small significance level and high power. Your twin's does not. The problem is that the event $X\in [5.134997,5.134999]$ is not better explained by the alternative hypothesis of Box B than by the null hypothesis of Box A.



    On the other hand, the test where you reject the hypothesis of Box A whenever $|X-5|>2$ has a small significance level and is arguably powerful: whenever the $\mu$ for Box B is greater than $7$ or less than $3$ (which is a large portion of its range of possible values), the power of the test is over $50\%$. Alternatively, you could say that we know nothing about $\mu$ and $\sigma$, assume they are uniformly distributed over their range of values, and calculate the power that way. If you did this, you would conclude the test was powerful. Therefore, this is a good test.
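The significance levels and power functions of the two tests can be checked numerically. The sketch below assumes Box B is normal, as in the question's edit; the helper names and the handful of $(\mu,\sigma)$ values are arbitrary choices for illustration, not part of the original problem.

```python
from statistics import NormalDist

BOX_A = NormalDist(mu=5.0, sigma=1.0)  # null hypothesis, as given in the question


def prob_in_interval(dist, lo, hi):
    """P(lo <= X <= hi) for X drawn from dist."""
    return dist.cdf(hi) - dist.cdf(lo)


def reject_prob_narrow(dist):
    """Chance the twin's test rejects A: X lands in [5.134997, 5.134999]."""
    return prob_in_interval(dist, 5.134997, 5.134999)


def reject_prob_tails(dist, m=2.0):
    """Chance the tail test rejects A: |X - 5| > m."""
    return 1.0 - prob_in_interval(dist, 5.0 - m, 5.0 + m)


# Significance level = rejection probability when the number really came from A.
print(f"significance, narrow test: {reject_prob_narrow(BOX_A):.2e}")
print(f"significance, tail test:   {reject_prob_tails(BOX_A):.4f}")

# Power function pow(mu, sigma) at a few hypothetical Box B parameters.
for mu in (-5.0, 3.0, 5.0, 7.5, 15.0):
    for sigma in (0.5, 1.0, 1.5):
        box_b = NormalDist(mu, sigma)
        print(f"mu={mu:5.1f} sigma={sigma:.1f}  "
              f"narrow={reject_prob_narrow(box_b):.2e}  "
              f"tails={reject_prob_tails(box_b):.4f}")
```

Both tests have a small significance level, but only the tail test has appreciable power. With $\sigma \ge 0.5$, the narrow test's power is capped by the peak density times the interval width, roughly $0.8 \times 2\cdot 10^{-6}$, so it almost never rejects whichever box produced the number.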



    In conclusion, it is not about the size of the sets; it is about how well they match the two possible hypotheses.






      Score: 2
      "We know nothing about the second distribution only it is different from Box A."




      Therein lies the problem. Without some knowledge of how B is generated, the problem has no meaningful solution.



      The game is a basic exercise in Bayes' theorem. If you know both population distributions, and the host is equally likely to press either button, then the probability that a particular result was generated by A is simply Pa/(Pa+Pb). Unfortunately, without some information about B, you can't calculate Pb and the formula is useless. Conversely, if you had a winning strategy, you could place certain bounds on Pb, which contradicts the original premise (that B is completely unknown).
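As a rough illustration of the formula, the sketch below plugs in a specific Box B. That choice (normal with mean 9 and standard deviation 1) is purely an assumption for the example, since the question does not say what B is. Pa and Pb are taken to be the densities at the observed value, and the prior chance that the host pressed A is a parameter defaulting to $1/2$.

```python
from statistics import NormalDist

BOX_A = NormalDist(mu=5.0, sigma=1.0)  # given in the question
BOX_B = NormalDist(mu=9.0, sigma=1.0)  # hypothetical: B is NOT specified in the question


def posterior_a(x, box_a=BOX_A, box_b=BOX_B, prior_a=0.5):
    """Posterior probability that x came from Box A, by Bayes' theorem."""
    pa = box_a.pdf(x) * prior_a          # density under A, weighted by its prior
    pb = box_b.pdf(x) * (1.0 - prior_a)  # density under B, weighted by its prior
    return pa / (pa + pb)


for x in (9.132987, 5.134998):
    print(f"x = {x}: P(A | x) = {posterior_a(x):.6f}")
```

Under this made-up B, the first number points strongly to B and the second strongly to A; a different B would give different answers, which is the point of the paragraph above.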



      This type of problem is pretty common in the real world, but in most cases we know something about B - typically, the type of distribution. If you can express B in terms of some parameter, then your game starts to become interesting. If your original game contained the premise, "B is also normally distributed, but with an unknown mean and standard deviation", it becomes quite an interesting problem.

















      answered Aug 8 at 19:35 by pokep











      • Suppose we know something about $B$ (see edits). How does that change things? Also what are $Pa$ and $Pb$?
        – Daron
        Aug 8 at 20:49










      • Pa and Pb represent the probability of the particular result under distributions A and B, respectively.
        – pokep
        Aug 8 at 22:31










      • Your edits come close to having enough information to make the game "interesting". If, instead of saying that the mean and standard deviation are bounded, you provide distributions describing them (e.g. uniformly distributed over the bounds), then you can devise an optimal strategy and calculate the odds.
        – pokep
        Aug 8 at 22:36
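One way to read the suggestion in the last comment: put a uniform prior on B's mean and standard deviation over the bounds from the question's edit, average B's density over that prior, and apply Bayes' theorem. The sketch below does this with a simple grid average. The grid sizes, the function names and the equal prior on the two boxes are all assumptions for illustration, and only $x_1$ is used.

```python
from statistics import NormalDist

BOX_A = NormalDist(mu=5.0, sigma=1.0)  # given in the question


def marginal_pdf_b(x, n_mu=201, n_sigma=51):
    """Density of x under B, averaged over a uniform grid of (mu, sigma) in the stated bounds."""
    total = 0.0
    for i in range(n_mu):
        mu = -5.0 + 20.0 * i / (n_mu - 1)            # mean between -5 and 15
        for j in range(n_sigma):
            sigma = 0.5 + 1.0 * j / (n_sigma - 1)    # sd between 0.5 and 1.5
            total += NormalDist(mu, sigma).pdf(x)
    return total / (n_mu * n_sigma)


def posterior_b(x, prior_b=0.5):
    """Posterior probability that x came from Box B under the assumed priors."""
    pa = BOX_A.pdf(x) * (1.0 - prior_b)
    pb = marginal_pdf_b(x) * prior_b
    return pb / (pa + pb)


for x in (9.132987, 5.134998):
    print(f"x = {x}: P(B | x) = {posterior_b(x):.4f}")
```

Under these assumptions the optimal single-number rule is to write B exactly when `posterior_b(x)` exceeds $1/2$, which favours B for values far from 5 and A for values near 5.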
















      • Suppose we know something about $B$ (see edits). How does that change things? Also what are $Pa$ and $Pb$?
        – Daron
        Aug 8 at 20:49










      • Pa and Pb represent the probability of the particular result under distributions A and B, respectively.
        – pokep
        Aug 8 at 22:31










      • Your edits come close to having enough information to make the game "interesting". If, instead of saying that the mean and standard deviation are bounded, you provide distributions describing them (e.g. uniformly distributed over the bounds), then you can devise an optimal strategy and calculate the odds.
        – pokep
        Aug 8 at 22:36















      Suppose we know something about $B$ (see edits). How does that change things? Also what are $Pa$ and $Pb$?
      – Daron
      Aug 8 at 20:49




      Suppose we know something about $B$ (see edits). How does that change things? Also what are $Pa$ and $Pb$?
      – Daron
      Aug 8 at 20:49












      Pa and Pb represent the probability of the particular result under distributions A and B, respectively.
      – pokep
      Aug 8 at 22:31




      Pa and Pb represent the probability of the particular result under distributions A and B, respectively.
      – pokep
      Aug 8 at 22:31












      Your edits come close to having enough information to make the game "interesting". If, instead of saying that the mean and standard deviation are bounded, you provide distributions describing them (e.g. uniformly distributed over the bounds), then you can devise an optimal strategy and calculate the odds.
      – pokep
      Aug 8 at 22:36




      Your edits come close to having enough information to make the game "interesting". If, instead of saying that the mean and standard deviation are bounded, you provide distributions describing them (e.g. uniformly distributed over the bounds), then you can devise an optimal strategy and calculate the odds.
      – pokep
      Aug 8 at 22:36










      up vote
      1
      down vote













      Before you go into the booth, you should formulate your strategy (rather than just reacting to what you see in the booth). Then we might be able to evaluate the expected winnings with that strategy (dependent of course on the unknown distribution of box B and the host's probabilities of pressing A or B).



      Your evil twin's strategy amounts to "always choose B". Of course if the host
      knows the ET's strategy and wants the ET to lose, he will always pick A. But given an unbiased host who presses buttons at random, this has expected winnings of $ € 500$.



      On the other hand, we don't know your strategy, but it seems to include "if $|x_1 - 5| > m$, pick B" (for some $m > 4$). But it's entirely possible that box B's distribution gives even less probability than A's to numbers with $|x-5|>m$, in which case this part of your strategy would not be good at all. In fact, without any knowledge of how the distribution of box B was chosen, I don't see any way to formulate a good strategy that depends on $x_1$.



      I would propose the strategy "choose A or B at random, each with probability $1/2$". This gives you expected winnings of $€500$ no matter what distribution box B has and no matter what the host does.
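The two figures above can be checked with a short calculation. This is only a sketch under stated assumptions: the contestant reveals nothing beyond $x_1$, ignores its value, and is paid €1000 for a correct label and nothing otherwise. The names `PRIZE` and `expected_winnings` are made up for this illustration.

```python
from fractions import Fraction

# Assumption: only x_1 is labelled, and a correct label pays 1000 euros.
PRIZE = 1000


def expected_winnings(p_host_a, p_guess_a):
    """Expected payout when the host presses A with probability p_host_a and
    the contestant, ignoring x_1, writes A with probability p_guess_a."""
    p_correct = p_host_a * p_guess_a + (1 - p_host_a) * (1 - p_guess_a)
    return PRIZE * p_correct


# Compare "always B" with the coin-flip strategy for several host behaviours.
for p_host_a in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    always_b = expected_winnings(p_host_a, Fraction(0))
    coin_flip = expected_winnings(p_host_a, Fraction(1, 2))
    print(f"P(host presses A) = {p_host_a}: always B -> {always_b}, coin flip -> {coin_flip}")
```

The coin flip is correct with probability $1/2$ whatever the host does, which is why its expected payout does not move, while "always B" drops to zero against a host who always presses A.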














          edited Aug 8 at 19:23

























          answered Aug 8 at 19:13









          Robert Israel


































              Suppose you place one million pieces of paper numbered one to a million in a vat. You pull one out, and find the number is $339,207$. Should you be surprised? On the one hand, the probability of the event you just witnessed was one in a million! On the other hand, that is true for any number you could draw, and it would not make sense to be surprised no matter what happens.



              This illustrates the importance of specifying a statistical test before you see the data. Suppose you did things in the other order. Suspecting that the vat was filled only with pieces of paper numbered $498,924$, you decide to test the null hypothesis that the vat was filled fairly against the alternative hypothesis that every piece of paper is numbered $498,924$. The test you decide to use is to pull out a number and reject the null hypothesis if it equals $498,924$, keeping it otherwise. You pull out a number, and sure enough it is $498,924$! You would be correct in rejecting the null hypothesis.
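As a rough worked check of this pre-specified test, write $H_0$ for "the vat was filled fairly" and $H_1$ for "every piece of paper is numbered $498,924$" (these labels are introduced here only for illustration). The test rejects exactly when $498,924$ is drawn, so

$$P(\text{reject} \mid H_0) = \frac{1}{1{,}000{,}000} = 10^{-6}, \qquad P(\text{reject} \mid H_1) = 1.$$

The observed draw is therefore a million times more likely under $H_1$ than under $H_0$. Drawing $339,207$ in the first story is just as improbable under $H_0$, but no alternative singled out in advance explains it any better.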



              So, for one thing, your brother made the mistake of choosing the test "is the number in $[5.134997,5.134999]$?" after already seeing the number. What if he had specified this test beforehand? Would the strategy of concluding the distribution was $B$ if the number seen was in this range be a good test?



              Any statistical test has two associated parameters:



              • The significance level of the test is the probability of incorrectly rejecting the null hypothesis. In this case, given that the number came from Box A, the probability of the test concluding it did not come from $A$ should be small. The significance level is quite small for the $[5.134997,5.134999]$ test, which is good; you can be very confident of any rejections you make.


              • The power of the test is the probability of rejecting the null hypothesis when the alternative hypothesis is true. You want the power to be high. Here, we cannot compute the power because the distribution of Box B is unknown. However, you can compute the power function $\text{pow}(\mu,\sigma)$, which gives the power of the test for each possible mean and standard deviation for $B$, and you would find that there is only a very small region of values of $\mu$ and $\sigma$ for which the test is powerful.
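Both quantities in the list above can be evaluated numerically for the interval test. The sketch below assumes Box B is normal with parameters inside the bounds given in the question's edit; the helper names `normal_cdf` and `interval_prob` and the sample parameter pairs are illustrative choices, not part of the original answer.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_prob(lo, hi, mu, sigma):
    """P(lo <= X <= hi) for X ~ Normal(mu, sigma)."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


LO, HI = 5.134997, 5.134999  # the rejection interval from the text

# Significance level: probability of rejecting when the number came from Box A.
alpha = interval_prob(LO, HI, 5.0, 1.0)
print(f"significance level: {alpha:.3e}")

# Power at a few assumed (mu, sigma) pairs for Box B, all within the stated bounds.
candidates = [(5.134998, 0.5), (5.0, 1.5), (9.0, 1.0), (-5.0, 0.5)]
powers = [interval_prob(LO, HI, mu, sigma) for mu, sigma in candidates]
for (mu, sigma), power in zip(candidates, powers):
    print(f"mu={mu}, sigma={sigma}: power = {power:.3e}")
```

The probability of landing in an interval of width $2\times 10^{-6}$ cannot exceed the width times the peak density $1/(\sigma\sqrt{2\pi})$, so as long as $\sigma \ge 0.5$ the power stays below about $1.6\times 10^{-6}$. The test only becomes powerful if $\sigma$ is allowed to shrink towards zero with $\mu$ inside the interval.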


              A good test should be both significant and powerful. Your brother's is not. The problem is that the event $X\in [5.134997,5.134999]$ is not better explained by the alternative hypothesis of Box B than by the null hypothesis of Box A.



              On the other hand, the test where you reject the hypothesis of Box A whenever $|X-5|>2$ would be significant and arguably powerful; whenever the $\mu$ for Box B is greater than $7$ or less than $3$ (which is a significant portion of its range of possible values), the power of the test is over $50\%$. Alternatively, you could say that we know nothing about $\mu$ and $\sigma$, and assume they are uniformly distributed over their range of values, and calculate the power that way. If you did this, you would conclude the test was powerful. Therefore, this is a good test.
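The claims about the $|X-5|>2$ test can be checked the same way. This sketch assumes Box B is normal with $\mu$ uniform on $[-5,15]$ and $\sigma$ uniform on $[0.5,1.5]$, independently; that prior is the assumption suggested in the paragraph above, and the function names and grid sizes are illustrative.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def rejection_prob(mu, sigma, center=5.0, cutoff=2.0):
    """P(|X - center| > cutoff) for X ~ Normal(mu, sigma)."""
    below = normal_cdf(center - cutoff, mu, sigma)
    above = 1.0 - normal_cdf(center + cutoff, mu, sigma)
    return below + above


def average_power(mu_range=(-5.0, 15.0), sigma_range=(0.5, 1.5), n_mu=400, n_sigma=50):
    """Midpoint-rule average of the power over the assumed uniform prior."""
    mu_lo, mu_hi = mu_range
    s_lo, s_hi = sigma_range
    total = 0.0
    for i in range(n_mu):
        mu = mu_lo + (i + 0.5) * (mu_hi - mu_lo) / n_mu
        for j in range(n_sigma):
            sigma = s_lo + (j + 0.5) * (s_hi - s_lo) / n_sigma
            total += rejection_prob(mu, sigma)
    return total / (n_mu * n_sigma)


significance = rejection_prob(5.0, 1.0)  # rejection rate when the number came from Box A
power_at_7 = rejection_prob(7.0, 1.0)    # power when Box B is centred on the cutoff
avg_power = average_power()

print(f"significance level: {significance:.4f}")
print(f"power at mu=7, sigma=1: {power_at_7:.4f}")
print(f"average power under the uniform prior: {avg_power:.4f}")
```

Under these assumptions the significance level is roughly $4.6\%$ and the averaged power comes out near $80\%$, which is consistent with calling the test both significant and powerful.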



              In conclusion, it is not about the size of the sets; it is about how well they match the two possible hypotheses.
















                  answered Aug 8 at 21:26









                  Mike Earnest























                       













































































