Why can t-tests on two samples from the same distribution give me a p-value near zero?
I was expecting a t-test on two sets of data generated by the same distribution to pretty much have a p-value of 1. I was surprised to see it could be much, much lower. Can somebody please explain why?
My Python code to demonstrate looks like this:
import scipy.stats as stats
import numpy as np

Nsamp = 10 ** 6
sigma_x = 2
mean = 100
ps = []
for _ in range(10):  # don't reuse x as the loop variable; it's assigned below
    x = np.random.normal(mean, sigma_x, size=Nsamp)
    y = np.random.normal(mean, sigma_x, size=Nsamp)
    t_value, p_value = stats.ttest_ind(x, y, equal_var=True)
    ps.append(p_value)
p_mean = sum(ps) / len(ps)
print("p-values: Average {0:.3f} lowest {1:.3f}".format(p_mean, min(ps)))
And it's not unusual for me to see something like:
p-values: Average 0.553 lowest 0.088
EDIT: It's also worth mentioning that a chi-square test on the same data consistently returns a p-value of 1.0.
statistics
edited Aug 28 at 10:07
asked Aug 21 at 10:05
PHenry
1 Answer
What you're seeing is what you should expect to see.
When a point-null hypothesis is true and the test statistic has a continuous distribution (and all the assumptions of the procedure hold), p-values have a standard uniform distribution; i.e., they are exactly as likely to be near 0 as to be near 1. That this is true follows immediately from the definition of a p-value.
For example, here are the p-values from ten thousand equal-variance two-sample t-tests where the data came from standard normal distributions ($n_1=n_2=10$):
[As I said above, this uniformity is a direct consequence of the definition of the p-value.]
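The null-hypothesis histogram described above can be reproduced with a short simulation (a sketch of my own, not the exact code behind the figure; the seed is arbitrary). Under the null, the p-values come out roughly uniform: their mean is near 0.5, and about 5% fall below 0.05.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# 10,000 equal-variance two-sample t-tests, both samples of size 10
# drawn from the same standard normal distribution (the null is true).
pvals = np.array([
    stats.ttest_ind(rng.standard_normal(10), rng.standard_normal(10),
                    equal_var=True).pvalue
    for _ in range(10_000)
])

# Uniform p-values: mean near 0.5, and roughly 5% below 0.05.
print(pvals.mean())
print((pvals < 0.05).mean())
```

A histogram of `pvals` (e.g. `plt.hist(pvals, bins=20)`) shows the flat shape in the figure.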
Under the alternative (if your test has power against that alternative), the p-values crowd down toward zero, but you can still observe occasional large p-values:
Here the difference in means was 1/5 of a standard deviation, but otherwise the conditions were as for the first histogram.
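For contrast, here is a similar sketch under an alternative with means 0.2 standard deviations apart. Note this is my own illustration, not the setup behind the second histogram: I use larger groups ($n=100$ rather than 10) so the shift in the p-value distribution is easier to see in the summary numbers.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
n = 100      # per-group sample size (larger than the histogram's n=10)
shift = 0.2  # difference in means, in standard deviations

# 10,000 t-tests where the null is false by a 0.2 sd mean shift.
pvals = np.array([
    stats.ttest_ind(rng.standard_normal(n) + shift, rng.standard_normal(n),
                    equal_var=True).pvalue
    for _ in range(10_000)
])

# Well over 5% of tests now reject at the 0.05 level,
# yet some individual p-values are still large.
print((pvals < 0.05).mean())
print(pvals.max())
```

The point of the comparison: small p-values are routine under the null, too; only their relative frequency changes under the alternative.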
I'm not sure this answers my question.
– PHenry
Aug 28 at 9:56
The question in the title is fully answered by the fact that, in that case, "p-values should have a standard uniform distribution", which is given in my answer and, as I said there, is a consequence of the definition of the p-value. This makes small values exactly as likely as large values. Is there something about that which you would like explained?
– Glen_b
Aug 28 at 11:45
I've made a small edit, which hopefully highlights the central point of my answer more clearly.
– Glen_b
Aug 28 at 12:43
edited Aug 28 at 12:42
answered Aug 25 at 3:39
Glen_b