Why is the sampling distribution not skewed when np < 10 and nq < 10?
My aim is to study sampling distributions via simulation, convincing myself of the outcomes.
After struggling with the questions here, here and here, I finally feel I am getting my head around it.
I simulated a population of a Bernoulli random variable Y with a given probability of Y = 1. I then sampled from it, each sample of size 'n', over 'N' experiments. The sampling distribution of the mean turned out to be approximately normal, as expected.
I then tried to show that the normal approximation would be a poor fit when np < 10 or nq < 10, because the distribution should be skewed. But when I simulate this in Python, I do not observe much skewness: the "extreme" curves look as good as some of the intermediate ones, so I do not get it.
I tried increasing the sample size and the number of experiments, but there was no improvement.
I tried using bar instead of hist, but in vain. I tried varying the bar width, but to no effect (in fact, changing the width interferes with another purpose: keeping the total area at 1 due to density=True).
Kindly check and let me know if there is any issue in the math I am doing.
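One plotting detail that may hide skew (a sketch with synthetic data, not the helpers from the question): with n = 50, the sample means can only take the values k/50, and matplotlib's default of 10 bins over [min, max] lumps several attainable values into each bar, which smooths out asymmetry. Aligning the bin edges to the attainable values makes the shape visible, especially when np is pushed well below the threshold:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, N, p = 50, 2000, 0.02              # np = 1, far below the rule-of-thumb threshold
means = rng.binomial(n, p, size=N) / n  # N sample means of Bernoulli(p) samples

# bin edges centred on the attainable values k/n, instead of the default 10 bins
edges = (np.arange(n + 2) - 0.5) / n
plt.hist(means, bins=edges, density=True)
plt.xlim(-0.01, 0.2)
plt.show()
```

With this binning the long right tail at np = 1 is clearly visible; note also that p_list in the code below only goes down to 0.1, so np never drops below 5.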
Code:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
import numpy as np
from math import sqrt, pi
# user-defined helper module; get_metrics is also needed below
from SDSPSM import create_bernoulli_population, sample_for_SDSP, get_metrics

# control inputs
T = 10000  # population size
p_list = np.arange(0.1, 1, 0.1)  # varying probability
N = 2000   # no. of experiments
n = 50     # sample size

fig, ax = plt.subplots(1, 1, figsize=(12, 4))
plt.close()  # suppress the duplicate static figure in the notebook

def animate(i):
    ax.clear()
    p = round(p_list[i], 4)
    # population has to be re-created every frame because p changes
    pops = create_bernoulli_population(T, p)
    _, Y_mean_list = sample_for_SDSP(pops, N, n)
    # plot the discrete density of the sample means
    _, bins, _ = ax.hist(Y_mean_list, density=True)
    # overlay the normal approximation
    mu, var, sigma = get_metrics(Y_mean_list)
    X = np.linspace(min(bins), max(bins), 10 * len(bins))
    if sigma > 0:
        Cp = 1 / (sigma * sqrt(2 * pi))
        Ep = -1 / 2 * ((X - mu) / sigma) ** 2
        G = Cp * np.exp(Ep)
        ax.plot(X, G, color='red')
    metrics_text = '$\\mu_x$: {:.4f}\n$\\sigma_x$: {:.4f}'.format(mu, sigma)
    ax.text(0.97, 0.98, metrics_text, ha='right', va='top',
            transform=ax.transAxes, fontsize=10, color='red')
    ax.set_ylim([0, 15])
    ax.set_xlim([0, 1])
    np_factor = round(n * p, 2)
    nq_factor = round(n * (1 - p), 2)
    metrics_text_2 = '$p$: {}\n$np$: {}\n$nq$: {}'.format(p, np_factor, nq_factor)
    ax.text(0.01, 0.98, metrics_text_2, ha='left', va='top',
            transform=ax.transAxes, fontsize=10, color='red')

ani1 = animation.FuncAnimation(fig, animate, frames=range(len(p_list)), interval=1000)

def display_animation(anim):
    plt.close(anim._fig)
    return HTML(anim.to_html5_video())

display_animation(ani1)
Helper file: SDSPSM
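The helper module itself is not shown in the question; the following is a minimal sketch of what the three helpers might look like, with the names and signatures inferred purely from how they are called above (create_bernoulli_population returns a 0/1 array, sample_for_SDSP returns the samples and their means, get_metrics returns mean, variance and standard deviation):

```python
import numpy as np

def create_bernoulli_population(T, p):
    """Return a 0/1 population of size T with P(Y = 1) = p."""
    return (np.random.rand(T) < p).astype(int)

def sample_for_SDSP(pops, N, n):
    """Draw N samples of size n (with replacement); return samples and their means."""
    samples = np.random.choice(pops, size=(N, n), replace=True)
    return samples, samples.mean(axis=1)

def get_metrics(values):
    """Mean, variance and standard deviation of a list of sample means."""
    values = np.asarray(values)
    return values.mean(), values.var(), values.std()
```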
Output: an animation of the histogram of sample means with the normal overlay, as p varies (video not reproduced here).
probability probability-distributions normal-distribution sampling python
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
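One way to quantify what the comments are debating: the sample mean of n Bernoulli(p) draws is Binomial(n, p)/n, and rescaling does not change skewness, so its skewness is the standard binomial value (1 - 2p)/sqrt(np(1-p)). With the settings in the question (n = 50, p between 0.1 and 0.9), the magnitude never exceeds about 0.38, which is genuinely hard to see in a histogram; visible skew needs np much closer to 1:

```python
from math import sqrt

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p); identical for the scaled sample mean."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

# with the question's settings the skew is mild ...
print(binomial_skewness(50, 0.1))   # ~0.377
# ... and it only becomes pronounced as np shrinks toward 1
print(binomial_skewness(50, 0.02))  # ~0.970
```

This would explain why the "extreme" frames of the animation look almost as symmetric as the intermediate ones.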
asked Aug 13 at 5:55 by Paari Vendhan
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
 |Â
show 7 more comments
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
 |Â
show 7 more comments
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2881045%2fwhy-sampling-distribution-not-skewed-when-np-10-and-nq-10%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43