MDI Data Workshop

Monte Carlo Simulation and Resampling Methods

Le Bao

Massive Data Institute, Georgetown University

February 16, 2022

MDI Events

  • The goal of MDI workshops

  • MDI Spring Workshops

    • Monte Carlo simulations and resampling methods
    • Bayesian Simulation with Dr. Nathan Wycoff on March 14 & 15
    • Causal Inference with Dr. J.J. Naddeo on April 18 & 19

  • MDI Distinguished Lecture on March 23rd (stay tuned!)

Workshop Materials

 

Slides and Google Colab notebooks are available at:

https://bit.ly/mdi-sim

 

Motivation

  • Why simulation?
    • We want the computer to do the hard work that we don’t want to do.
  • Compared to (deterministic) numerical methods:
    • Simulation methods can mimic the repeated sampling/data-generating process.
      • The results reflect the probabilistic aspects of the problem.
    • They allow for simultaneous estimation of several features.
      • High-dimensional, multimodal data, etc.
  • The advantages of simulation methods:
    • Flexible: These tools let us control random processes by counting and drawing.
    • Simple: Most of these ideas are mathematically simple and easy to implement in code.
    • Intuitive: They make it easy to illustrate problems and to learn statistics.
  • Plan:
    • Monte Carlo simulation: MC integration, rejection sampling, model simulations, etc.
    • Resampling methods: bootstrapping, permutation tests, cross-validation, etc.
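As a preview of Monte Carlo integration, here is a minimal sketch (not from the workshop materials; the seed is an arbitrary choice for reproducibility) that estimates π by drawing uniform points in the unit square:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
n = 100_000

# Draw n points uniformly in the unit square [0, 1] x [0, 1].
x = rng.uniform(0, 1, size=n)
y = rng.uniform(0, 1, size=n)

# The fraction of points falling inside the quarter circle x^2 + y^2 <= 1
# approximates pi/4, so multiplying by 4 gives an estimate of pi.
inside = (x**2 + y**2) <= 1
pi_hat = 4 * inside.mean()
print(pi_hat)
```

The estimate gets closer to π as n grows — the same Law of Large Numbers logic that drives the experiments below.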

Some Background

  • Law of Large Numbers:
    • The average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer to the expected value as more trials are performed.
      • vs. “Big Data Paradox” (Meng, 2018)

Some Background

  • Central Limit Theorem:
    • The sum of many small independent random variables is a random variable with an approximately normal distribution (even if the original variables themselves are not normally distributed).

Experiment: Law of Large Numbers

  • Law of Large Numbers
import numpy as np

np.random.randint(0, 2)  # a single fair coin flip: 0 or 1
0
coins = np.random.randint(low=0, high=2, size=20)
print(coins)
[1 0 0 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 0 0]
np.average(coins)  # average of 20 flips
0.35
coins = np.random.randint(low=0, high=2, size=1000)
np.average(coins)  # average of 1,000 flips: closer to the expected value 0.5
0.476
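The convergence can be made explicit by tracking the running average as the number of flips grows — a sketch, not from the workshop materials, with an arbitrary seed for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
flips = rng.integers(low=0, high=2, size=10_000)  # 10,000 fair coin flips

# Running average after each additional flip.
running_avg = np.cumsum(flips) / np.arange(1, len(flips) + 1)

# The running average drifts toward the expected value 0.5 as n grows.
print(running_avg[9], running_avg[99], running_avg[-1])  # n = 10, 100, 10,000
```

Plotting `running_avg` against n gives the classic Law of Large Numbers picture: large early swings that settle near 0.5.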

Experiment: Central Limit Theorem

  • Central Limit Theorem
import numpy as np

np.random.uniform(0, 1)  # a single draw from Uniform(0, 1)
0.1005848227813454
num = np.random.uniform(0, 1, size=20)
print(num[0:5])
[0.53896269 0.6107311  0.63005872 0.34595959 0.86314885]
mu = np.average(num)  # mean of 20 draws
print(mu)
0.4759697612263528
num = np.random.uniform(0, 1, size=1000)
np.average(num)  # mean of 1,000 draws: close to the expected value 0.5
0.5065218284679716
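To see the normal shape the theorem predicts, one can repeat the experiment many times and look at the distribution of the sample means — a minimal sketch, not from the workshop materials, with an arbitrary seed for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility

# 5,000 replications: each draws 20 Uniform(0, 1) values and records the mean.
means = rng.uniform(0, 1, size=(5_000, 20)).mean(axis=1)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the mean of 20 draws is
# approximately Normal with mean 0.5 and sd sqrt((1/12)/20) ≈ 0.0645.
print(means.mean(), means.std())
```

A histogram of `means` is approximately bell-shaped around 0.5 even though each underlying draw is uniform, which is exactly the Central Limit Theorem at work.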

Monte Carlo Method: A Little History

      Stanislaw Ulam (1909-1984)