Cross validation
$$CV(\hat{f})=\frac{1}{N}\sum_{i=1}^{N}L\bigl(y_i,\hat{f}^{-\kappa(i)}(x_i)\bigr)\qquad CV(\hat{f},\alpha)=\frac{1}{N}\sum_{i=1}^{N}L\bigl(y_i,\hat{f}^{-\kappa(i)}(x_i,\alpha)\bigr)$$
When the fit is linear, it follows that
$$\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i-\hat{f}^{-i}(x_i)\bigr]^2=\frac{1}{N}\sum_{i=1}^{N}\left[\frac{y_i-\hat{f}(x_i)}{1-S_{ii}}\right]^2$$
On the left side, each term removes one observation and refits the model; on the right side, no observation is removed and only the single fit on the full data appears. This means we can compute the leave-one-out error without actually performing the $N$ removals and refits.
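A minimal sketch verifying this identity for ordinary least squares, where the smoother matrix is $S = X(X^TX)^{-1}X^T$. The data here are simulated solely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 30, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])  # intercept + features
y = X @ rng.normal(size=p + 1) + rng.normal(size=N)

# Smoother (hat) matrix S, so that y_hat = S y for least squares
S = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = S @ y

# Right side: the shortcut using only the full-data fit and the diagonal S_ii
shortcut = np.mean(((y - y_hat) / (1 - np.diag(S))) ** 2)

# Left side: brute force -- actually refit N times, leaving one point out each time
errs = []
for i in range(N):
    mask = np.arange(N) != i
    beta = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    errs.append((y[i] - X[i] @ beta) ** 2)
brute_force = np.mean(errs)

assert np.isclose(shortcut, brute_force)
```

The two quantities agree to numerical precision, confirming that the shortcut replaces $N$ refits with a single fit.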
$$\sum_{i=1}^{N}\bigl(y_i-\hat{f}(x_i)\bigr)^2\le\sum_{i=1}^{N}\bigl(y_i-f(x_i)\bigr)^2$$
$$\hat{f}^{(-k)}=\operatorname*{arg\,min}_{f}\sum_{i\notin k}\bigl(y_i-f(x_i)\bigr)^2\qquad\sum_{i\notin k}\bigl(y_i-\hat{f}^{(-k)}(x_i)\bigr)^2\le\sum_{i\notin k}\bigl(y_i-f(x_i)\bigr)^2$$
Bootstrap Methods
Let the training set be $Z=(z_1,\dots,z_N)$. We randomly draw datasets from $Z$ with replacement, each of the same size as the original. We want to estimate some aspect of the distribution of a statistic $S(Z)$.
$$\widehat{Var}[S(Z)]=\frac{1}{B-1}\sum_{b=1}^{B}\bigl(S(Z^{*b})-\bar{S}^{*}\bigr)^2$$
Our estimate of prediction error is below:
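A minimal sketch of the bootstrap variance estimate above, using the sample median as the statistic $S$; the data and the choice of median are assumptions made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(loc=5.0, scale=2.0, size=100)   # observed sample z_1, ..., z_N
S = np.median                                  # statistic S(Z) of interest

B = 1000
# Draw B bootstrap datasets Z*b (sampling with replacement) and compute S on each
boot_stats = np.array(
    [S(rng.choice(Z, size=Z.size, replace=True)) for _ in range(B)]
)

# Var-hat[S(Z)] = 1/(B-1) * sum_b (S(Z*b) - S-bar*)^2  (ddof=1 gives the B-1 divisor)
var_hat = boot_stats.var(ddof=1)
```

For a statistic like the median, which has no simple closed-form variance, this resampling estimate is often the only practical option.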
$$\widehat{Err}_{boot}=\frac{1}{B}\frac{1}{N}\sum_{b=1}^{B}\sum_{i=1}^{N}L\bigl(y_i,\hat{f}^{*b}(x_i)\bigr)$$
Here the bootstrap datasets act as the training samples, while the original training set acts as the test sample, and the two have observations in common. This overlap can make overfit predictions look unrealistically good, so $\widehat{Err}_{boot}$ is overly optimistic.
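A minimal sketch of $\widehat{Err}_{boot}$ with squared-error loss and a simple least-squares line as $\hat{f}$; the simulated data and the helper `fit` are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
x = rng.uniform(-1, 1, size=N)
y = 2 * x + rng.normal(scale=0.5, size=N)

def fit(xs, ys):
    """Least-squares line; returns a prediction function."""
    A = np.column_stack([np.ones_like(xs), xs])
    coef = np.linalg.lstsq(A, ys, rcond=None)[0]
    return lambda x_new: coef[0] + coef[1] * x_new

B = 200
losses = np.empty((B, N))
for b in range(B):
    idx = rng.integers(0, N, size=N)   # bootstrap sample Z*b, drawn with replacement
    f_b = fit(x[idx], y[idx])          # train on the bootstrap sample
    losses[b] = (y - f_b(x)) ** 2      # but evaluate on the ORIGINAL data

err_boot = losses.mean()               # the (1/B)(1/N) double sum
```

Each bootstrap fit is tested partly on points it was trained on (on average about 63.2% of them), which is exactly the source of the optimism described above.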
$$\widehat{Err}^{(1)}=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{|C^{-i}|}\sum_{b\in C^{-i}}L\bigl(y_i,\hat{f}^{*b}(x_i)\bigr)\qquad\widehat{Err}^{.632}=.368\,\overline{err}+.632\,\widehat{Err}^{(1)}$$
Here $C^{-i}$ is the set of indices of bootstrap samples that do not contain observation $i$, so each point is predicted only by fits that never saw it, and $\overline{err}$ is the training error on the full data.
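The leave-one-out bootstrap $\widehat{Err}^{(1)}$ and the .632 estimator can be sketched as follows; the data and the `fit_predict` helper are illustrative assumptions, and with $B$ this large every observation is left out of some bootstrap sample with overwhelming probability, so $|C^{-i}|>0$ throughout:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 60
x = rng.uniform(-1, 1, size=N)
y = 2 * x + rng.normal(scale=0.5, size=N)

def fit_predict(xs, ys, x_new):
    """Fit a least-squares line to (xs, ys) and predict at x_new."""
    A = np.column_stack([np.ones_like(xs), xs])
    c = np.linalg.lstsq(A, ys, rcond=None)[0]
    return c[0] + c[1] * x_new

B = 200
loss = np.zeros((B, N))
in_sample = np.zeros((B, N), dtype=bool)
for b in range(B):
    idx = rng.integers(0, N, size=N)                  # bootstrap sample Z*b
    in_sample[b, idx] = True                          # which z_i appear in Z*b
    loss[b] = (y - fit_predict(x[idx], y[idx], x)) ** 2

# Err^(1): for each i, average loss only over bootstrap fits NOT trained on z_i
out = ~in_sample                                      # out[b, i] <=> b in C^{-i}
err1 = np.mean([loss[out[:, i], i].mean() for i in range(N)])

# err-bar: ordinary training error of the single fit on the full data
err_bar = np.mean((y - fit_predict(x, y, x)) ** 2)

err_632 = 0.368 * err_bar + 0.632 * err1
```

$\widehat{Err}^{(1)}$ removes the overlap bias of $\widehat{Err}_{boot}$ but tends to be pessimistic (each fit sees only about 63.2% of the distinct points), and the .632 weighting blends it with the optimistic training error to compensate.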