We want the computer to do the hard work that we don’t want to do ourselves.
Compared with (deterministic) numerical methods:
These methods can resemble the repeated sampling/data generation process.
The results reflect probabilistic aspects of the problem.
They allow simultaneous estimation of several features.
They handle high-dimensional, multi-modal problems, etc.
The advantages of simulation methods:
Flexible: These tools allow us to control random processes with counting and drawing.
Simple: Most of these ideas are mathematically simple and easy to do via programming.
Intuitive: These methods make it easy to illustrate problems and to learn statistics.
Plan:
Monte Carlo simulation: MC integration, rejection sampling, model simulations, etc.
Resampling methods: bootstrapping, permutation tests, cross-validation, etc.
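As a preview of the first item in the plan, here is a minimal Monte Carlo integration sketch: the integral of f(x) = x² on [0, 1] (true value 1/3) is estimated by averaging f over uniform random draws. The choice of f and the sample size are illustrative assumptions, not from the source.

```python
import random

# Monte Carlo integration: approximate the integral of f(x) = x**2 on [0, 1]
# (true value 1/3) by the sample mean of f(U) for U ~ Uniform(0, 1).
random.seed(0)

n = 100_000
samples = [random.random() ** 2 for _ in range(n)]
estimate = sum(samples) / n

print(estimate)  # should be close to 1/3
```

By the Law of Large Numbers (discussed below), the average converges to the expected value E[f(U)], which equals the integral.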
Some Background
Law of Large Numbers:
The average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer to the expected value as more trials are performed.
vs. the “Big Data Paradox” (Meng, 2018): when sampling is biased, a larger sample can make estimates more confidently wrong, so sheer data size is no substitute for data quality.
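The Law of Large Numbers can be seen directly by simulation. A minimal sketch, using fair die rolls (expected value 3.5) as an illustrative example:

```python
import random

random.seed(1)

# Law of Large Numbers: the average of n fair die rolls should
# approach the expected value 3.5 as n grows.
def running_mean(n):
    rolls = [random.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n

for n in (10, 1_000, 100_000):
    print(n, running_mean(n))
```

With small n the average fluctuates noticeably; by n = 100,000 it sits very close to 3.5.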
Some Background
Central Limit Theorem:
The sum of many small independent random variables will be a random variable with an approximately normal distribution (even if the original variables themselves are not normally distributed).