Why is the sampling distribution not skewed when np < 10 and nq < 10?
My aim is to study sampling distributions via simulation, convincing myself of the outcomes.
After struggling with the questions here, here and here, I finally feel I am getting my head around it.
I simulated a population of a Bernoulli random variable Y with a given probability of Y = 1. I then sampled from it, each sample of size 'n', over 'N' experiments. The sampling distribution of the mean turned out to be approximately normal, as expected.
I then tried to show that the normal approximation would be a poor fit when np < 10 or nq < 10, because the distribution should be skewed. But when I simulate this in Python, I do not observe much skewness: the "extreme" curves look as good as some of the intermediate ones, so I do not get it.
I tried increasing the sample size and the number of experiments, but there was no improvement.
I tried using bar instead of hist, but in vain. I tried varying the bar width, but to no effect (in fact, changing the width interferes with another purpose: keeping the total area at 1 due to density=True).
Kindly check and let me know if there is any issue in the math I am doing.
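One plotting detail that may hide skew (a sketch with synthetic data, not the helpers from the question): with n = 50, the sample means can only take the values k/50, and matplotlib's default of 10 bins over [min, max] lumps several attainable values into each bar, which smooths out asymmetry. Aligning the bin edges to the attainable values makes the shape visible, especially when np is pushed well below the threshold:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, N, p = 50, 2000, 0.02              # np = 1, far below the rule-of-thumb threshold
means = rng.binomial(n, p, size=N) / n  # N sample means of Bernoulli(p) samples

# bin edges centred on the attainable values k/n, instead of the default 10 bins
edges = (np.arange(n + 2) - 0.5) / n
plt.hist(means, bins=edges, density=True)
plt.xlim(-0.01, 0.2)
plt.show()
```

With this binning the long right tail at np = 1 is clearly visible; note also that p_list in the code below only goes down to 0.1, so np never drops below 5.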
Code:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
import numpy as np
from math import sqrt, pi
# user-defined helper module; get_metrics is also needed below
from SDSPSM import create_bernoulli_population, sample_for_SDSP, get_metrics

# control inputs
T = 10000  # population size
p_list = np.arange(0.1, 1, 0.1)  # varying probability
N = 2000   # no. of experiments
n = 50     # sample size

fig, ax = plt.subplots(1, 1, figsize=(12, 4))
plt.close()  # suppress the duplicate static figure in the notebook

def animate(i):
    ax.clear()
    p = round(p_list[i], 4)
    # population has to be re-created every frame because p changes
    pops = create_bernoulli_population(T, p)
    _, Y_mean_list = sample_for_SDSP(pops, N, n)
    # plot the discrete density of the sample means
    _, bins, _ = ax.hist(Y_mean_list, density=True)
    # overlay the normal approximation
    mu, var, sigma = get_metrics(Y_mean_list)
    X = np.linspace(min(bins), max(bins), 10 * len(bins))
    if sigma > 0:
        Cp = 1 / (sigma * sqrt(2 * pi))
        Ep = -1 / 2 * ((X - mu) / sigma) ** 2
        G = Cp * np.exp(Ep)
        ax.plot(X, G, color='red')
    metrics_text = '$\\mu_x$: {:.4f}\n$\\sigma_x$: {:.4f}'.format(mu, sigma)
    ax.text(0.97, 0.98, metrics_text, ha='right', va='top',
            transform=ax.transAxes, fontsize=10, color='red')
    ax.set_ylim([0, 15])
    ax.set_xlim([0, 1])
    np_factor = round(n * p, 2)
    nq_factor = round(n * (1 - p), 2)
    metrics_text_2 = '$p$: {}\n$np$: {}\n$nq$: {}'.format(p, np_factor, nq_factor)
    ax.text(0.01, 0.98, metrics_text_2, ha='left', va='top',
            transform=ax.transAxes, fontsize=10, color='red')

ani1 = animation.FuncAnimation(fig, animate, frames=range(len(p_list)), interval=1000)

def display_animation(anim):
    plt.close(anim._fig)
    return HTML(anim.to_html5_video())

display_animation(ani1)
Helper file: SDSPSM
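The helper module itself is not shown in the question; the following is a minimal sketch of what the three helpers might look like, with the names and signatures inferred purely from how they are called above (create_bernoulli_population returns a 0/1 array, sample_for_SDSP returns the samples and their means, get_metrics returns mean, variance and standard deviation):

```python
import numpy as np

def create_bernoulli_population(T, p):
    """Return a 0/1 population of size T with P(Y = 1) = p."""
    return (np.random.rand(T) < p).astype(int)

def sample_for_SDSP(pops, N, n):
    """Draw N samples of size n (with replacement); return samples and their means."""
    samples = np.random.choice(pops, size=(N, n), replace=True)
    return samples, samples.mean(axis=1)

def get_metrics(values):
    """Mean, variance and standard deviation of a list of sample means."""
    values = np.asarray(values)
    return values.mean(), values.var(), values.std()
```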
Output: an animation of the histogram of sample means with the normal overlay, as p varies (video not reproduced here).
probability probability-distributions normal-distribution sampling python
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
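One way to quantify what the comments are debating: the sample mean of n Bernoulli(p) draws is Binomial(n, p)/n, and rescaling does not change skewness, so its skewness is the standard binomial value (1 - 2p)/sqrt(np(1-p)). With the settings in the question (n = 50, p between 0.1 and 0.9), the magnitude never exceeds about 0.38, which is genuinely hard to see in a histogram; visible skew needs np much closer to 1:

```python
from math import sqrt

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p); identical for the scaled sample mean."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

# with the question's settings the skew is mild ...
print(binomial_skewness(50, 0.1))   # ~0.377
# ... and it only becomes pronounced as np shrinks toward 1
print(binomial_skewness(50, 0.02))  # ~0.970
```

This would explain why the "extreme" frames of the animation look almost as symmetric as the intermediate ones.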
asked Aug 13 at 5:55 by Paari Vendhan
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
 |Â
show 7 more comments
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43
 |Â
show 7 more comments
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f2881045%2fwhy-sampling-distribution-not-skewed-when-np-10-and-nq-10%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
"when np < 10 or nq < 10 because the distribution would be skewed" No it would not (be skewed). What makes you think it would be?
â Did
Aug 13 at 6:05
khan academy: youtu.be/L7AX2RcbqCg. They said thumb rule, but I wanted to verify and convince myself
â Paari Vendhan
Aug 13 at 6:27
And their diagrams simply allude to the fact that the mode of the binomial distribution Bin$(n,p)$ is located at $k<frac n2$ if $p<frac12$ and at $k>frac n2$ if $p>frac12$. For every $(n,p)$, the mode is located exactly at $k=lfloor np-1+prfloor$, thus, roughly at $np$. Thus, sorry but you misread this hugely.
â Did
Aug 13 at 6:37
jbstatistics proves the skewness here: youtu.be/fuGwbG9_W1c
â Paari Vendhan
Aug 13 at 6:41
I could have probably proved easier, if my population distribution was binomial in first place (so sampling distribution is easily normal), but I wanted to take general case (bernoulli, which is least binomial for sampling distribution, and random distribution for sample means) to prove the issue if the conditions are not met.
â Paari Vendhan
Aug 13 at 6:43