# Simulations

Ha Khanh Nguyen (hknguyen)

## 1. Sampling from a Known Distribution

### 1.1 Binomial Distribution

• What does the binom.rvs() function do?
• Let $X$ = the number of heads when flipping 5 loaded coins, each with probability 0.2 of landing heads.
• $X$ ~ Binom(n=5, p=0.2)
• binom.rvs(n=5, p=0.2, size=1) returns one possible value of $X$, as if we performed this experiment exactly ONCE.
• Then binom.rvs(n=5, p=0.2, size=10) returns 10 possible values of $X$, one for each time we perform the experiment.
• You might wonder how Python comes up with these numbers.
• Remember the probability distribution (the table)? Python uses it to randomly select a value from 0 to 5 as the output, with each value chosen according to its probability.
• For example, $P(X = 2)$ = binom.pmf(k=2, n=5, p=0.2) = 0.2048, so the probability of "picking" 2 as the outcome of the experiment is around 20%!
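The calls described in the bullets above can be sketched as follows. This is a minimal illustration that assumes SciPy is installed; the `random_state=0` argument and the variable names (`one_draw`, `ten_draws`, `pmf_table`) are additions for reproducibility and readability, not part of the original calls.

```python
from scipy.stats import binom

# One run of the experiment: a single draw of X ~ Binom(n=5, p=0.2).
# random_state=0 is an assumption added so the output is reproducible.
one_draw = binom.rvs(n=5, p=0.2, size=1, random_state=0)

# Ten runs of the experiment: ten independent draws of X.
ten_draws = binom.rvs(n=5, p=0.2, size=10, random_state=0)

# The probability table of X: P(X = k) for k = 0, ..., 5.
pmf_table = {k: binom.pmf(k=k, n=5, p=0.2) for k in range(6)}

print(one_draw)
print(ten_draws)
for k, prob in pmf_table.items():
    print(k, round(float(prob), 4))
```

Values with a larger probability in the table, such as 0 and 1, should show up more often in the draws than rare values such as 4 or 5.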

Why do we need this?

• rvs() stands for random variates.
• We use this for simulation studies.

### 1.3 Normal Distribution

NOTE: The output of the rvs() functions is random (actually pseudo-random), so each time you run the function call you will get a different result UNLESS you set the random state (also called the seed).
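A short sketch of the effect of the random state, assuming SciPy's `norm.rvs()` with a standard Normal distribution (`loc=0`, `scale=1`). The seed value 42 and the variable names are arbitrary choices made for illustration.

```python
from scipy.stats import norm

# Two calls with the same random_state reproduce exactly the same draws.
# The seed value 42 is an arbitrary choice for this illustration.
first = norm.rvs(loc=0, scale=1, size=5, random_state=42)
second = norm.rvs(loc=0, scale=1, size=5, random_state=42)

# Without random_state, repeated calls generally return different values.
unseeded = norm.rvs(loc=0, scale=1, size=5)

print(first)
print(second)
print(unseeded)
```

Setting the seed is useful when a simulation has to be rerun with identical results, for example when sharing a notebook.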

## 2. Law of Large Numbers

• Let's write a simulation of the Law of Large Numbers.
• The Law of Large Numbers states that as the sample size $n$ increases, the sample mean $\bar{X}$ tends to get closer and closer to the population mean $\mu$.
• Assume the population we're considering is a Normal distribution with mean 0 and standard deviation 2.
• Let's take a sample of size 10.
• Now calculate the sample mean!
• Now increase the sample size!
• The plot below shows how the sample mean changes as the sample size $n$ increases.
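The steps above could be simulated along these lines. This is a sketch, not necessarily the approach used in the lecture: it assumes NumPy and SciPy, draws one large sample with a fixed seed, and tracks the running sample mean as more observations are included. The names `draws` and `running_means` and the maximum size of 100000 are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0, 2  # population: Normal with mean 0 and standard deviation 2

# A sample of size 10 and its sample mean.
sample = norm.rvs(loc=mu, scale=sigma, size=10, random_state=1)
print("n = 10, sample mean =", np.mean(sample))

# One long sequence of draws; running_means[i] is the sample mean
# of the first i + 1 observations.
draws = norm.rvs(loc=mu, scale=sigma, size=100000, random_state=1)
running_means = np.cumsum(draws) / np.arange(1, len(draws) + 1)

# The sample mean should drift toward mu = 0 as n grows.
for n in [10, 100, 1000, 10000, 100000]:
    print("n =", n, "sample mean =", running_means[n - 1])
```

Plotting `running_means` against the sample size (for example with matplotlib) would give a curve that settles around $\mu = 0$.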

## 3. Central Limit Theorem

### 3.1 Binomial Distribution Example

• Assume the population follows a Binomial distribution with $n = 20$ and $p = 0.2$.
• This Binomial distribution is plotted below:
• Let's take a sample of size 10 from this distribution and calculate the sample mean:
• Repeat the process 10000 times!
• Let's try a sample of size 30 instead!
• Now, let's calculate the mean and standard deviation of sample_means (the list of 10000 sample means computed from the samples)!
• To do this, we will use the functions provided in the NumPy library:
• Compute the population mean and standard deviation:
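• Does this agree with the Central Limit Theorem? The theorem predicts that the sample means are approximately Normal with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$, where $\mu$ and $\sigma$ are the population mean and standard deviation and $n$ is the sample size.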
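The whole procedure above might look like the sketch below. It assumes NumPy and SciPy and uses a fixed seed; drawing all samples at once as a 2-D array is a shortcut chosen here, whereas the lecture may well use a loop. The names `n_trials`, `sample_size`, `n_repeats`, and `samples` are illustrative.

```python
import numpy as np
from scipy.stats import binom

n_trials, p = 20, 0.2   # population: Binom(n=20, p=0.2)
sample_size = 30        # size of each sample
n_repeats = 10000       # number of samples to draw

# Each row is one sample of size sample_size from the population.
samples = binom.rvs(n=n_trials, p=p, size=(n_repeats, sample_size),
                    random_state=0)

# One sample mean per row.
sample_means = samples.mean(axis=1)

# Mean and standard deviation of the 10000 sample means.
mean_of_means = np.mean(sample_means)
sd_of_means = np.std(sample_means)

# Population mean and standard deviation of a Binomial distribution.
pop_mean = n_trials * p
pop_sd = np.sqrt(n_trials * p * (1 - p))

print("mean of sample means:", mean_of_means, "vs mu:", pop_mean)
print("sd of sample means:", sd_of_means,
      "vs sigma / sqrt(n):", pop_sd / np.sqrt(sample_size))
```

If the Central Limit Theorem applies, the two numbers on each printed line should be close to each other.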

### 3.2 Unnamed Discrete Distribution Example

• Not every population we work with will follow a named probability distribution (such as Binomial, Geometric, or Normal).
• Sometimes the probability distribution will be given to us in table form (like the examples in the Discrete Random Variables lecture).
• In this example, let's assume the population follows the following distribution:

| $X$ | $P(X)$ |
|-----|--------|
| 0   | 0.1    |
| 1   | 0.25   |
| 2   | 0.05   |
| 3   | 0.2    |
| 4   | 0.35   |
| 5   | 0.05   |

• To sample 10 observations from this population, we will use the choices() function from the random library.
• The random library will come up a lot when working with simulations.
• The documentation for the built-in random library can be found in the official Python documentation.
• Repeat this process 10000 times and store the sample means in a list called sample_means.
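A minimal sketch of this sampling procedure, using `random.choices()` from the standard library with the probabilities from the table passed as `weights`. The seed and the names `values` and `probs` are assumptions added for illustration.

```python
import random

# Possible values of X and their probabilities, taken from the table.
values = [0, 1, 2, 3, 4, 5]
probs = [0.1, 0.25, 0.05, 0.2, 0.35, 0.05]

random.seed(0)  # arbitrary seed, added so the run is reproducible

# One sample of 10 observations, drawn with replacement.
sample = random.choices(values, weights=probs, k=10)
print(sample)

# Repeat 10000 times and store each sample mean.
sample_means = []
for _ in range(10000):
    s = random.choices(values, weights=probs, k=10)
    sample_means.append(sum(s) / len(s))

# The average of the sample means should be close to the population
# mean, sum(x * P(x)) = 2.6 for this table.
print(sum(sample_means) / len(sample_means))
```

`random.choices()` samples with replacement, which matches drawing independent observations from the population distribution.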