Why is the sampling distribution not skewed when np < 10 or nq < 10?

My aim is to study sampling distributions through simulations, convincing myself of the outcomes.



After struggling with the questions here, here and here, I hope I am finally getting my head around it somewhat.



I simulated a population of a Bernoulli random variable Y with a given probability of Y = 1. I then sampled from it in N experiments, each with sample size n. The sampling distribution of the sample mean turned out to be normal, as expected.
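
Since the SDSPSM helper file is not reproduced here, this is a minimal self-contained sketch of the same experiment. It assumes the helper builds a finite 0/1 population of size T and draws each sample with replacement; the function name `simulate_sample_means` is hypothetical and not part of SDSPSM.

```python
import numpy as np

def simulate_sample_means(p, n, N, T=10_000, seed=0):
    """Hypothetical stand-in for create_bernoulli_population + sample_for_SDSP.

    Builds a finite 0/1 population of size T with (about) a fraction p of ones,
    then draws N samples of size n with replacement and returns their means.
    """
    rng = np.random.default_rng(seed)
    population = (rng.random(T) < p).astype(int)   # Bernoulli(p) population
    idx = rng.integers(0, T, size=(N, n))          # N samples of size n, with replacement
    return population[idx].mean(axis=1)            # one sample mean per experiment

means = simulate_sample_means(p=0.1, n=50, N=2000)
print(means.mean(), means.std())
```

The two printed numbers should be close to p = 0.1 and to sqrt(p(1-p)/n) ≈ 0.042, the mean and standard deviation of the sampling distribution.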



I then tried to show that the normal approximation would not be a good fit when np < 10 or nq < 10, because the distribution would be skewed. But when I simulate this in Python, I do not observe much skewness. The "extreme" curves fit as well as some of the intermediate ones, so I do not get it.
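
One way to see how much skewness to expect is the exact skewness of the sample proportion: $\hat p = X/n$ with $X \sim \mathrm{Bin}(n,p)$ has skewness $(1-2p)/\sqrt{np(1-p)}$, since rescaling by $1/n$ does not change skewness. A small sketch tabulating it for the values used in the code (n = 50, p = 0.1, ..., 0.9); the function name is illustrative.

```python
from math import sqrt

def proportion_skewness(n, p):
    """Exact skewness of the sample proportion X/n, where X ~ Bin(n, p)."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

n = 50
for k in range(1, 10):
    p = k / 10
    print(f"p={p:.1f}  np={n * p:5.1f}  nq={n * (1 - p):5.1f}  "
          f"skew={proportion_skewness(n, p):+.3f}")
```

For n = 50 the skewness at the extreme frames is modest: about 0.38 at p = 0.1 (np = 5) and about 0.21 at p = 0.2 (np = 10), with the same magnitudes and opposite sign at p = 0.9 and p = 0.8.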



I tried increasing the sample size and the number of experiments, but saw no improvement.
I tried using bar instead of hist, in vain. I tried varying the bar width, but to no avail (in fact, changing the width also changes the bar heights, because density=True keeps the total area of the bars at 1).
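
Every sample mean is a multiple of 1/n, so the default 10 equal-width bins of `hist` can merge neighbouring values and blur any asymmetry. A sketch, assuming n = 50 as in the code and using simulated proportions from `rng.binomial` as a stand-in for `Y_mean_list`, of bin edges centred on the possible values k/n (one bin per value). With `density=True`, NumPy normalises the total area, not the total height, to 1.

```python
import numpy as np

n = 50
rng = np.random.default_rng(1)
means = rng.binomial(n, 0.1, size=2000) / n    # stand-in sample proportions on the grid k/n

edges = (np.arange(n + 2) - 0.5) / n           # edges at (k - 0.5)/n for k = 0..n+1
density, edges = np.histogram(means, bins=edges, density=True)

widths = np.diff(edges)                        # every bin has width 1/n
print((density * widths).sum())                # total area, 1 up to rounding
```

The same edges can be passed to matplotlib as `ax.hist(Y_mean_list, bins=edges, density=True)`.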



Please check and let me know if there is anything wrong with the math I am doing.



Code:



import matplotlib.pyplot as plt
import matplotlib.animation as animation
from SDSPSM import create_bernoulli_population, sample_for_SDSP
import numpy as np
from math import sqrt, pi
from IPython.display import HTML

# control inputs
T = 10000                        # population size
p_list = np.arange(0.1, 1, 0.1)  # varying probability of Y = 1

N = 2000  # no. of experiments (number of samples drawn)
n = 50    # sample size


def get_metrics(values):
    """Mean, variance and standard deviation of the sample means.

    get_metrics is called but not defined or imported in the original code;
    this is an assumed equivalent so the script runs.
    """
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    var = values.var()
    return mu, var, sqrt(var)


fig, ax = plt.subplots(1, 1, figsize=(12, 4))
plt.close()


def animate(i):
    ax.clear()

    p = round(p_list[i], 4)

    # create population to be sampled from; it has to be re-created every frame because p changes
    pops = create_bernoulli_population(T, p)

    _, Y_mean_list = sample_for_SDSP(pops, N, n)

    # plot discrete density
    _, bins, _ = ax.hist(Y_mean_list, density=True)

    # normal approximation with the same mean and standard deviation
    mu, var, sigma = get_metrics(Y_mean_list)

    X = np.linspace(min(bins), max(bins), 10 * len(bins))
    if sigma > 0:
        Cp = 1 / (sigma * sqrt(2 * pi))
        Ep = -1 / 2 * ((X - mu) / sigma) ** 2
        G = Cp * np.exp(Ep)
        ax.plot(X, G, color='red')
        metrics_text = '$\\mu_x:$ {:.4f}\n$\\sigma_x:$ {:.4f}'.format(mu, sigma)
        ax.text(0.97, 0.98, metrics_text, ha='right', va='top',
                transform=ax.transAxes, fontsize=10, color='red')

    ax.set_ylim([0, 15])
    ax.set_xlim([0, 1])

    np_factor = round(n * p, 2)
    nq_factor = round(n * (1 - p), 2)
    metrics_text_2 = '$p:$ {}\n$np:$ {}\n$nq:$ {}'.format(p, np_factor, nq_factor)
    ax.text(0.01, 0.98, metrics_text_2, ha='left', va='top',
            transform=ax.transAxes, fontsize=10, color='red')
    # no plt.show() here: FuncAnimation redraws the figure on each frame


ani1 = animation.FuncAnimation(fig, animate, frames=range(0, len(p_list)), interval=1000)


def display_animation(anim):
    plt.close(anim._fig)
    return HTML(anim.to_html5_video())


display_animation(ani1)


Helper file: SDSPSM



Output:
[image not reproduced: animated histogram of the sample means for p = 0.1, ..., 0.9, with the fitted normal curve in red]
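
Rather than judging skewness by eye from the animation, one could compute the sample skewness of the simulated means and compare it with the exact value $(1-2p)/\sqrt{np(1-p)}$. A minimal NumPy-only sketch; it draws the sample proportions directly with `rng.binomial`, which is equivalent to sampling with replacement from an infinite Bernoulli population and is an assumption about how SDSPSM samples. The helper `sample_skewness` is hypothetical.

```python
import numpy as np
from math import sqrt

def sample_skewness(x):
    """Moment-based (biased) sample skewness: m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

n, N = 50, 200_000
rng = np.random.default_rng(2)
for p in (0.1, 0.2, 0.5):
    means = rng.binomial(n, p, size=N) / n          # N simulated sample proportions
    exact = (1 - 2 * p) / sqrt(n * p * (1 - p))     # exact skewness of X/n
    print(f"p={p}: simulated {sample_skewness(means):+.3f}, exact {exact:+.3f}")
```

A large N is used only so that the simulated and exact values agree closely; with N = 2000, as in the question, the estimate is noticeably noisier.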

  • "when np < 10 or nq < 10 because the distribution would be skewed" No, it would not (be skewed). What makes you think it would be?
    – Did
    Aug 13 at 6:05











  • Khan Academy: youtu.be/L7AX2RcbqCg. They call it a rule of thumb, but I wanted to verify it and convince myself.
    – Paari Vendhan
    Aug 13 at 6:27











  • And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<\frac n2$ if $p<\frac12$ and at $k>\frac n2$ if $p>\frac12$. For every $n$ and $0<p<1$, a mode is located at $k=\lfloor (n+1)p\rfloor$, thus roughly at $np$. So, sorry, but you misread this hugely.
    – Did
    Aug 13 at 6:37











  • jbstatistics demonstrates the skewness here: youtu.be/fuGwbG9_W1c
    – Paari Vendhan
    Aug 13 at 6:41










  • I could probably have shown this more easily if my population distribution had been binomial in the first place (so that the sampling distribution is readily normal), but I wanted to take the general case (a Bernoulli population, the smallest binomial case Bin(1, p), with the sample mean as the random variable) to show the problem when the conditions are not met.
    – Paari Vendhan
    Aug 13 at 6:43
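
To check where the binomial pmf actually peaks, a small sketch computing the exact pmf of Bin(n, p) with `math.comb` and comparing its argmax with the standard mode formula $\lfloor (n+1)p\rfloor$ and with $np$, for n = 50. The function names are illustrative.

```python
from math import comb, floor

def binom_pmf(n, p):
    """Exact pmf of Bin(n, p) as a list indexed by k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def binom_mode(n, p):
    """Index k at which the pmf is largest (first one in case of a tie)."""
    pmf = binom_pmf(n, p)
    return max(range(n + 1), key=pmf.__getitem__)

n = 50
for k in range(1, 10):
    p = k / 10
    print(f"p={p:.1f}  mode={binom_mode(n, p)}  "
          f"floor((n+1)p)={floor((n + 1) * p)}  np={n * p:.1f}")
```

For these p values the mode coincides with np exactly, since $(n+1)p = np + p$ with $np$ an integer and $0<p<1$.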

asked Aug 13 at 5:55 by Paari Vendhan