Optimism

Optimism of the Training Error Rate

Err_{in}=\dfrac{1}{N}\sum^N_{i=1}E_{Y^0}[L(Y_i^0,\hat{f}(x_i))|\mathcal{T}]

The Y^0 notation indicates that we observe N new response values at each of the training points x_i, i=1,2,\dots,N.

Err_{extra}=Err_{in}+Err_{out\;of}=(\bar{err}+op)+Err_{out\;of}

Extra-sample error can be decomposed into in-sample error and out-of-sample error. In-sample error is the sum of the training error and the optimism.

op\equiv Err_{in}-\bar{err}

(Because Err_{in} and op are random quantities, we use the \equiv sign: op is defined as the random quantity Err_{in}-\bar{err}, so it shares its distribution.) Since op itself depends on the training set, we predict it through its expectation.

\omega\equiv E_y(op)

Strictly, the right-hand side is conditioned on \mathcal{T}, but since the inputs X are fixed, the only random quantity is y, so we simply write E_y. For several loss functions, including squared error and 0-1 loss, \omega satisfies the following equation.

\omega=\dfrac{2}{N}\sum^N_{i=1}Cov(\hat{y}_i,y_i)

Proof: https://stats.stackexchange.com/questions/88912/optimism-bias-estimates-of-prediction-error

Cov(\hat{Y},Y)=Cov(\hat{Y},\hat{Y}+\epsilon)=Cov(\hat{Y})=Cov(HY)=H\,Cov(Y)H^T \\ Cov(\hat{y}_i,y_i)=[HH^T]_{ii}\sigma^2 \\ \sum^N_{i=1}Cov(\hat{y}_i,y_i)=\sum_i[X(X^TX)^{-1}X^T]_{ii}\sigma^2=trace(H)\sigma^2=d\sigma^2

(Here H=X(X^TX)^{-1}X^T is the hat matrix of a linear fit; it is a symmetric, idempotent projection, so HH^T=H and trace(H)=d.)
E_y(Err_{in})=E_y(\bar{err})+\frac{2}{N}\sum Cov(\hat{y}_i,y_i)=E_y(\bar{err})+2\cdot\frac{d}{N}\sigma^2_\epsilon

This is the expected in-sample error. The trace equals d, the effective number of parameters of the fit.
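As a sanity check, here is a small simulation sketch (my own example, not from the text): it repeatedly draws training responses and fresh responses Y^0 at the same fixed inputs, fits OLS, and compares the average gap Err_in − \bar{err} with 2dσ²/N.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 50, 5, 1.0           # sample size, number of predictors, noise sd
X = rng.normal(size=(N, d))        # fixed design (inputs are held fixed)
beta = rng.normal(size=d)
f = X @ beta                       # true mean at the training inputs

gaps = []
for _ in range(2000):
    y = f + sigma * rng.normal(size=N)    # training responses
    y0 = f + sigma * rng.normal(size=N)   # new responses Y^0 at the same x_i
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta_hat
    err_bar = np.mean((y - y_hat) ** 2)   # training error
    err_in = np.mean((y0 - y_hat) ** 2)   # in-sample error (one draw of Y^0)
    gaps.append(err_in - err_bar)

# averaging over replications approximates omega = E_y(op)
print("simulated omega:", np.mean(gaps))
print("theory 2*d*sigma^2/N:", 2 * d * sigma**2 / N)
```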

An obvious way to estimate prediction error is to estimate the optimism and then add it to the training error.

To estimate the in-sample error, we estimate the expected in-sample error.

Estimates of In-sample Prediction Error

\widehat{Err_{in}}=\bar{err}+\hat{\omega}

Estimating the in-sample error thus amounts to estimating ω (the average optimism).

The way ω is estimated determines the following criteria:

Cp

C_p=\bar{err}+2\cdot\dfrac{d}{N}\hat{\sigma}^2_\epsilon
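For illustration, a minimal Python sketch of this formula (assuming an OLS fit and that the noise variance estimate σ̂²_ε is supplied, e.g. from the residuals of a low-bias full model):

```python
import numpy as np

def cp_statistic(X, y, sigma2_hat):
    """C_p = err_bar + 2*(d/N)*sigma2_hat for a least-squares fit on X."""
    N, d = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    err_bar = np.mean((y - X @ beta_hat) ** 2)    # training (squared) error
    return err_bar + 2.0 * d / N * sigma2_hat
```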

AIC

AIC (Akaike Information Criterion) uses the log-likelihood loss function. The equations below concern the expected in-sample error.

Proof: http://faculty.washington.edu/yenchic/19A_stat535/Lec7_model.pdf

The idea of AIC is to adjust the empirical risk to be an unbiased estimator of the true risk in a parametric model. Under a likelihood framework, the loss function is the negative log-likelihood function.

Logistic regression model (binomial log-likelihood)

AIC=-\frac{2}{N}loglik+2\frac{d}{N}
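As a hedged illustration (my own helper, not from the text), given fitted class probabilities p̂_i and the number of fitted parameters d, the formula can be computed directly:

```python
import numpy as np

def binomial_aic(y, p_hat, d):
    """AIC = -(2/N)*loglik + 2*d/N with the binomial log-likelihood.

    y: 0/1 labels, p_hat: fitted P(y=1 | x), d: number of fitted parameters.
    """
    N = len(y)
    loglik = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    return -2.0 / N * loglik + 2.0 * d / N
```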

Gaussian model

AIC=\bar{err}+2\frac{d}{N}\hat{\sigma}^2_\epsilon

With a tuning parameter α:

AIC(\alpha)=\bar{err}(\alpha)+2\frac{d(\alpha)}{N}\hat{\sigma}^2_\epsilon

The aim is to find the α that minimizes this estimate of the test error.
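A minimal sketch of this selection step (my own assumptions: α indexes the number of leading predictors used, and σ̂²_ε is estimated from the full model's residuals):

```python
import numpy as np

def select_by_aic(X, y):
    """Pick the model size alpha that minimizes AIC(alpha)."""
    N, d_full = X.shape
    beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
    # sigma^2 estimated from the residuals of the full (low-bias) model
    sigma2_hat = np.sum((y - X @ beta_full) ** 2) / (N - d_full)

    aic = {}
    for alpha in range(1, d_full + 1):        # alpha = number of predictors used
        Xa = X[:, :alpha]
        beta_a = np.linalg.lstsq(Xa, y, rcond=None)[0]
        err_bar = np.mean((y - Xa @ beta_a) ** 2)
        aic[alpha] = err_bar + 2.0 * alpha / N * sigma2_hat
    best = min(aic, key=aic.get)
    return best, aic
```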
