My aim is to study "Sample distributions" via simulations convincing myself of the outcomes.
After struggling with questions here, here and here, finally I hope am somewhat getting my head around it.
I tried to simulate a population of bernoulli random variable Y, with a given probability for Y=1. I tried to sample from it, each time sample size being 'n', for 'N' experiments. The sampling distribution turned out to be normal as expected.
I tried to then show, the problem would not be a good fit for normal approximation, when np < 10 or nq < 10 because the distribution would be skewed. But when I simulate in python, I do not observe much skewness. The "extreme" curves were as good as some of intermediate curves, so I do not get it.
I tried increasing sample size, no of experiments, but no improvement. I tried using bar instead of hist, but in vain. I tried varying width of them but no use (in fact changing width affects size for another purpose - to maintain total height as 1 due to density=True)
Kindly check and let me know if there is any issue in the math I am doing.
Code:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from SDSPSM import create_bernoulli_population, sample_for_SDSP
import numpy as np
from math import sqrt, pi
# control inputs
T = 10000
p_list = np.arange(0.1,1,0.1) # varying probability
N = 2000 # no of experiments
n = 50 # sample size
fig, ax = plt.subplots(1,1,figsize=(12,4))
plt.close()
def animate(i): ax.clear() p = round(p_list[i],4) # create population to be sampled from - Note pop has to be re created every time due to changing p pops = create_bernoulli_population(T, p) _, Y_mean_list = sample_for_SDSP(pops, N,n) # plot discrete density _, bins,_ = ax.hist(Y_mean_list, density=True) # normal approx mu, var, sigma = get_metrics(Y_mean_list) X = np.linspace(min(bins),max(bins),10*len(bins)) if sigma>0: Cp = 1/(sigma*sqrt(2*pi)) Ep = -1/2*((X-mu)/sigma)**2 G = Cp*np.exp(Ep) ax.plot(X, G, color='red') metrics_text = '$\mu_x:{}$ \n$\sigma_x:{}$'.format(mu, sigma) ax.text(0.97, 0.98,metrics_text,ha='right', va='top',transform = ax.transAxes,fontsize=10,color='red') ax.set_ylim([0,15]) ax.set_xlim([0,1]) np_factor = round(n*p,2) nq_factor = round(n*(1-p),2) metrics_text_2 = '$p:{}$ \n$np:{}$ \n$nq:{}$'.format(p, np_factor, nq_factor) ax.text(0.01, 0.98,metrics_text_2,ha='left', va='top',transform = ax.transAxes,fontsize=10,color='red')
plt.show()
ani1 = animation.FuncAnimation(fig, animate, frames = range(0,len(p_list)), interval=1000)
from IPython.display import HTML
def display_animation(anim): plt.close(anim._fig) return HTML(anim.to_html5_video())
display_animation(ani1)Helper file: SDSPSM
$\endgroup$ 12 Reset to default