Cross validation
$$\mathrm{CV}(\hat{f}) = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i,\, \hat{f}^{-\kappa(i)}(x_i)\big), \qquad \mathrm{CV}(\hat{f}, \alpha) = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i,\, \hat{f}^{-\kappa(i)}(x_i, \alpha)\big)$$

where $\kappa(i)$ is the fold containing observation $i$, and $\hat{f}^{-k}$ denotes the model fit with the $k$-th fold removed. When the fitting method is linear in $y$, so that $\hat{f} = Sy$, and the loss is squared error, it follows that
$$\frac{1}{N}\sum_{i=1}^{N}\big[y_i - \hat{f}^{-i}(x_i)\big]^2 = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{y_i - \hat{f}(x_i)}{1 - S_{ii}}\right]^2$$

On the left-hand side each observation is held out and the model refit; on the right-hand side only the single fit on the full data appears. This means the leave-one-out error can be computed without actually refitting the model $N$ times.
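As a sanity check, here is a minimal sketch (the toy data and variable names are my own assumptions) verifying the identity numerically for ordinary least squares, where $S = X(X^\top X)^{-1}X^\top$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # design with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(scale=0.5, size=N)

# Right side: one fit on all the data, corrected by the hat-matrix diagonal.
S = X @ np.linalg.solve(X.T @ X, X.T)        # smoother ("hat") matrix
resid = y - S @ y                            # ordinary residuals y_i - f_hat(x_i)
loocv_shortcut = np.mean((resid / (1 - np.diag(S))) ** 2)

# Left side: N explicit refits, each with one observation removed.
loocv_direct = 0.0
for i in range(N):
    keep = np.arange(N) != i
    beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    loocv_direct += (y[i] - X[i] @ beta) ** 2
loocv_direct /= N

print(loocv_shortcut, loocv_direct)  # the two values agree to rounding error
```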
$$\sum_{i=1}^{N}\big(y_i - \hat{f}(x_i)\big)^2 \le \sum_{i=1}^{N}\big(y_i - f(x_i)\big)^2$$

$$\hat{f}^{(k)} = \arg\min_{f} \sum_{i \ne k}\big(y_i - f(x_i)\big)^2$$

$$\sum_{i \ne k}\big(y_i - \hat{f}^{(k)}(x_i)\big)^2 \le \sum_{i \ne k}\big(y_i - f(x_i)\big)^2$$

The first inequality holds for every $f$ because $\hat{f}$ is the least-squares fit on all $N$ points; the third holds because $\hat{f}^{(k)}$ is the least-squares fit with point $k$ removed. Setting $f = \hat{f}^{(k)}$ in the first and $f = \hat{f}$ in the third and subtracting gives $\big(y_k - \hat{f}(x_k)\big)^2 \le \big(y_k - \hat{f}^{(k)}(x_k)\big)^2$: the training residual at each point is never larger than its leave-one-out residual.

Bootstrap Methods
Let the training set be $Z = (z_1, \ldots, z_N)$ with $z_i = (x_i, y_i)$. We draw $B$ bootstrap datasets by sampling $N$ points from $Z$ with replacement. We want to estimate some aspect of the distribution of a statistic $S(Z)$ computed from the data.
For example, its variance can be estimated by

$$\widehat{\mathrm{Var}}[S(Z)] = \frac{1}{B-1}\sum_{b=1}^{B}\big(S(Z^{*b}) - \bar{S}^{*}\big)^2, \qquad \bar{S}^{*} = \frac{1}{B}\sum_{b=1}^{B} S(Z^{*b})$$
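A minimal sketch of this recipe, using the sample median as $S(Z)$ (the data, $B$, and variable names here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.exponential(size=100)   # training set z_1, ..., z_N
B = 1000

S_star = np.empty(B)
for b in range(B):
    Z_b = rng.choice(Z, size=Z.size, replace=True)  # bootstrap dataset Z*b
    S_star[b] = np.median(Z_b)                      # S(Z*b)

var_hat = np.sum((S_star - S_star.mean()) ** 2) / (B - 1)
print(var_hat)
```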
Applied to prediction error, the naive bootstrap estimate is

$$\widehat{\mathrm{Err}}_{\mathrm{boot}} = \frac{1}{B}\,\frac{1}{N}\sum_{b=1}^{B}\sum_{i=1}^{N} L\big(y_i,\, \hat{f}^{*b}(x_i)\big)$$

The bootstrap datasets act as training samples while the original training set acts as the test sample, and the two share observations. This overlap makes $\widehat{\mathrm{Err}}_{\mathrm{boot}}$ an overly optimistic estimate of prediction error.
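A sketch of $\widehat{\mathrm{Err}}_{\mathrm{boot}}$ for a least-squares fit under squared-error loss (the toy setup is an assumption); each bootstrap fit is evaluated on the original training points:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 60
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)
B = 200

err_boot = 0.0
for b in range(B):
    idx = rng.integers(0, N, size=N)                       # sample rows with replacement
    beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]  # f_hat^{*b}
    err_boot += np.mean((y - X @ beta) ** 2)               # scored on the original set
err_boot /= B
print(err_boot)  # optimistic: each test point appears in ~63% of the training sets
```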
The leave-one-out bootstrap avoids this by scoring each observation only on the bootstrap fits that did not contain it:

$$\widehat{\mathrm{Err}}^{(1)} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{|C^{-i}|}\sum_{b \in C^{-i}} L\big(y_i,\, \hat{f}^{*b}(x_i)\big)$$

where $C^{-i}$ is the set of indices of bootstrap samples that do not contain observation $i$. The .632 estimator corrects the resulting upward bias by averaging with the training error $\overline{\mathrm{err}}$ (the weight $.632 \approx 1 - e^{-1}$ is the probability that a given observation appears in a bootstrap sample):

$$\widehat{\mathrm{Err}}^{.632} = .368\,\overline{\mathrm{err}} + .632\,\widehat{\mathrm{Err}}^{(1)}$$
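Continuing the same assumed toy setup, a sketch of $\widehat{\mathrm{Err}}^{(1)}$ and $\widehat{\mathrm{Err}}^{.632}$:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 60
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)
B = 200

loss = np.zeros(N)    # accumulated loss for each point i over b in C^{-i}
count = np.zeros(N)   # |C^{-i}|: number of bootstrap sets not containing i
for b in range(B):
    idx = rng.integers(0, N, size=N)
    beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    out = np.setdiff1d(np.arange(N), idx)        # points left out of Z*b
    loss[out] += (y[out] - X[out] @ beta) ** 2
    count[out] += 1

err1 = np.mean(loss[count > 0] / count[count > 0])   # Err_hat^(1)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
err_bar = np.mean((y - X @ beta_full) ** 2)          # training error err_bar

err_632 = 0.368 * err_bar + 0.632 * err1
print(err1, err_632)
```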