On Cross Validation

Setting

  • We have models $\mathcal{M}_1, …, \mathcal{M}_k$.
  • There are $2n$ data points.
  • Split the data randomly into two halves: $D = (Y_1, …, Y_n)$ and $T = (Y^*_1, …, Y^*_n)$.
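A minimal sketch of the random split, assuming the $2n$ observations sit in a plain Python list. The helper name `split_in_two` and the `seed` argument are hypothetical, introduced only for illustration.

```python
import random


def split_in_two(data, seed=0):
    """Randomly split 2n observations into two halves D and T of n points each.

    `split_in_two` and `seed` are illustrative names, not part of the note.
    """
    if len(data) % 2 != 0:
        raise ValueError("expected an even number (2n) of data points")
    n = len(data) // 2
    rng = random.Random(seed)   # local generator, so the split is reproducible
    shuffled = list(data)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n], shuffled[n:]


# Toy data with 2n = 10 points.
data = [0.5 * i for i in range(10)]
D, T = split_in_two(data)
```

Here `D` plays the role of the fitting half and `T` the held-out half; every observation lands in exactly one of the two.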

Definition 1

A random variable $X$ with mean $\mu = E[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that, for all $\lambda \in \mathbb{R}$,

$$E[e^{\lambda (X - \mu)}] \le e^{\frac{\sigma^2 \lambda^2}{2}}$$
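As a quick sanity check of the definition, consider a Rademacher variable ($X = \pm 1$ with probability $1/2$ each, so $\mu = 0$). Its moment generating function is $\cosh(\lambda)$, and it is sub-Gaussian with $\sigma = 1$. The sketch below only checks the inequality on a grid of $\lambda$ values; the choice of example and the function names are assumptions made for illustration.

```python
import math


def rademacher_mgf(lam):
    """E[exp(lam * X)] for X = +1 or -1 with probability 1/2 each (mean 0)."""
    return math.cosh(lam)


def sub_gaussian_bound(lam, sigma):
    """Right-hand side of the definition: exp(sigma^2 * lam^2 / 2)."""
    return math.exp(sigma ** 2 * lam ** 2 / 2)


# Grid of lambda values in [-5, 5]; a numerical check, not a proof.
lambdas = [x / 10 for x in range(-50, 51)]
holds_sigma_1 = all(
    rademacher_mgf(lam) <= sub_gaussian_bound(lam, 1.0) for lam in lambdas
)
# With a too-small sigma the bound fails somewhere on the grid.
holds_sigma_half = all(
    rademacher_mgf(lam) <= sub_gaussian_bound(lam, 0.5) for lam in lambdas
)
```

The bound holds on the whole grid for $\sigma = 1$ and fails for $\sigma = 0.5$, which shows that $\sigma$ cannot be chosen arbitrarily small.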

Step 1:

  • Find the MLE $\hat{\theta}_j$ for each model $\mathcal{M}_j$ using $D$.
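The note does not say what the models are. As a sketch only, assume one candidate $\mathcal{M}_j$ is the normal family with $\theta_j = (m, v)$ (mean and variance); then the MLE on $D$ has a closed form. The function name `gaussian_mle` and the toy values in `D` are hypothetical.

```python
def gaussian_mle(sample):
    """Closed-form MLE (mean, variance) for an assumed normal model.

    The normal family is an illustrative choice, not one fixed by the note.
    """
    n = len(sample)
    mean_hat = sum(sample) / n
    # The MLE of the variance divides by n, not n - 1.
    var_hat = sum((y - mean_hat) ** 2 for y in sample) / n
    return mean_hat, var_hat


# Toy fitting half D; only D is used, T stays held out.
D = [1.0, 2.0, 3.0, 4.0]
mean_hat, var_hat = gaussian_mle(D)
```

Only the half $D$ enters the fit, so $T$ remains untouched for whatever comes in later steps.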

(WIP)

Reference

AIC, BIC, Cross-Validation:

Concentration inequalities: