The Bias-Variance Decomposition
Let's focus on one specific test point. To study the relationship between bias and variance, we decompose the test error at that point.
Regression
Assume $Y = f(X) + \epsilon$ with $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma_\epsilon^2$, use squared-error loss, and let $\hat{f}(X)$ denote the regression fit.
$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat{f}(x_0))^2 \mid X = x_0] \\
&= \sigma_\epsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2 \\
&= \sigma_\epsilon^2 + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0)) \\
&= \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}
\end{aligned}
$$
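Here is a minimal Monte Carlo sketch of this decomposition; the true function, noise level, training size, and the deliberately biased degree-1 polynomial model are all assumptions made for illustration. We refit the model on many fresh training sets and compare a direct estimate of $\mathrm{Err}(x_0)$ with $\sigma_\epsilon^2 + \mathrm{Bias}^2 + \mathrm{Var}$.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.sin(2 * np.pi * x)   # assumed true regression function
sigma_eps = 0.3                       # noise standard deviation
n_train, n_sims, x0 = 30, 5000, 0.5

preds = np.empty(n_sims)
for s in range(n_sims):
    # Draw a fresh training set and refit the (biased) linear model.
    x = rng.uniform(0, 1, n_train)
    y = f(x) + rng.normal(0, sigma_eps, n_train)
    coef = np.polyfit(x, y, deg=1)
    preds[s] = np.polyval(coef, x0)

bias2 = (preds.mean() - f(x0)) ** 2
var = preds.var()
# Direct estimate of Err(x0): fresh responses Y = f(x0) + eps vs. the fits.
err = np.mean((f(x0) + rng.normal(0, sigma_eps, n_sims) - preds) ** 2)

print("sigma^2 + bias^2 + var:", sigma_eps ** 2 + bias2 + var)
print("direct Err(x0):        ", err)   # agrees up to Monte Carlo error
```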
k-nearest-neighbor regression fit
$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat{f}_k(x_0))^2 \mid X = x_0] \\
&= \sigma_\epsilon^2 + \Big[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f(x_{(l)})\Big]^2 + \frac{\sigma_\epsilon^2}{k}
\end{aligned}
$$

Here $x_{(l)}$ denotes the $l$-th nearest neighbor of $x_0$. Since the fit averages $k$ noisy responses, its variance is

$$
\mathrm{Var}(\hat{f}_k(x_0)) = \frac{1}{k^2} \cdot k\sigma_\epsilon^2 = \frac{\sigma_\epsilon^2}{k}
$$
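A quick sketch of the variance term, under an assumed 1-D setup: holding the training inputs fixed (as the formula's conditioning does) and redrawing only the noise, the variance of the k-NN prediction at $x_0$ should track $\sigma_\epsilon^2 / k$.

```python
import numpy as np

rng = np.random.default_rng(1)

f = lambda x: x ** 2                          # assumed true function
sigma_eps, n_train, x0 = 0.5, 200, 0.3
x = np.sort(rng.uniform(0, 1, n_train))       # fixed design
nn = np.argsort(np.abs(x - x0))               # neighbors of x0, nearest first

for k in (1, 5, 25):
    idx = nn[:k]
    # The k-NN prediction is the mean of k noisy targets; redraw only the noise.
    preds = np.array([np.mean(f(x[idx]) + rng.normal(0, sigma_eps, k))
                      for _ in range(4000)])
    print(k, preds.var(), sigma_eps ** 2 / k)  # empirical vs. theoretical variance
```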
Linear model

$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat{f}_p(x_0))^2 \mid X = x_0] \\
&= \sigma_\epsilon^2 + [f(x_0) - E\hat{f}_p(x_0)]^2 + \lVert h(x_0) \rVert^2 \sigma_\epsilon^2
\end{aligned}
$$

Here $h(x_0) = X(X^T X)^{-1} x_0$ is the vector of linear weights that produces the fit $\hat{f}_p(x_0) = x_0^T (X^T X)^{-1} X^T y$.
$$
\mathrm{Var}(\hat{f}_p(x_0)) = \mathrm{Var}(x_0^T \hat{\beta}) = \mathrm{Var}(x_0^T (X^T X)^{-1} X^T y) = x_0^T (X^T X)^{-1} x_0 \, \sigma_\epsilon^2
$$

This variance changes with $x_0$; averaged over the training inputs it equals $(p/N)\sigma_\epsilon^2$, as shown next. Let's look at the in-sample error: each input is fixed at a training value $x_i$, while the response $y$ remains a random quantity.
$$
\frac{1}{N}\sum_{i=1}^{N} \mathrm{Err}(x_i) = \sigma_\epsilon^2 + \frac{1}{N}\sum_{i=1}^{N} [f(x_i) - E\hat{f}(x_i)]^2 + \frac{p}{N}\sigma_\epsilon^2
$$

since

$$
\sum_{i=1}^{N} x_i^T (X^T X)^{-1} x_i = \mathrm{tr}(X(X^T X)^{-1} X^T) = \mathrm{tr}((X^T X)^{-1} X^T X) = \mathrm{tr}(I_p) = p
$$

In other words, the average in-sample error grows with the dimension $p$ of the input space and shrinks with the training size $N$.
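The trace identity is easy to verify numerically. Below is a quick check on hypothetical Gaussian data: the trace of the hat matrix equals $p$, so the average pointwise variance over the training inputs is $(p/N)\sigma_\epsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

N, p, sigma_eps = 100, 7, 1.0
X = rng.normal(size=(N, p))

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix X (X^T X)^{-1} X^T
avg_var = sigma_eps ** 2 * np.trace(H) / N    # (1/N) sum_i x_i^T (X^T X)^{-1} x_i * sigma^2

print(np.trace(H))                            # = p, up to floating-point error
print(avg_var, p / N * sigma_eps ** 2)        # these agree
```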
Ridge regression
For a linear model family such as ridge regression, the bias can be decomposed further.
$$
\begin{aligned}
E_{x_0}[f(x_0) - E\hat{f}_\alpha(x_0)]^2 &= E_{x_0}[f(x_0) - x_0^T \beta_*]^2 + E_{x_0}[x_0^T \beta_* - E x_0^T \hat{\beta}_\alpha]^2 \\
&= \mathrm{Ave[Model\;Bias]}^2 + \mathrm{Ave[Estimation\;Bias]}^2
\end{aligned}
$$

Here $\beta_*$ denotes the parameters of the best-fitting linear approximation to $f$. In short, restricting the range of the parameters makes the estimation bias larger than that of the least-squares fit, but in exchange the variance is reduced. Because bias enters the error decomposition in squared form, a slight increase in bias can be more than offset by the resulting decrease in variance.
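A minimal simulation sketch of this trade-off, with all settings (design, true coefficients, penalty values) assumed for illustration: as the ridge penalty $\alpha$ grows, the squared bias at a test point rises while the variance falls.

```python
import numpy as np

rng = np.random.default_rng(3)

N, p, sigma_eps, n_sims = 50, 10, 1.0, 2000
beta_true = rng.normal(size=p)                # assumed true linear coefficients
X = rng.normal(size=(N, p))                   # fixed design
x0 = rng.normal(size=p)                       # test point
f_x0 = x0 @ beta_true                         # true mean response at x0

for alpha in (0.0, 1.0, 10.0, 100.0):
    # Ridge estimator: beta_hat = (X^T X + alpha I)^{-1} X^T y
    A = np.linalg.inv(X.T @ X + alpha * np.eye(p)) @ X.T
    preds = np.array([x0 @ (A @ (X @ beta_true + rng.normal(0, sigma_eps, N)))
                      for _ in range(n_sims)])
    bias2, var = (preds.mean() - f_x0) ** 2, preds.var()
    print(f"alpha={alpha:6.1f}  bias^2={bias2:.4f}  var={var:.4f}  "
          f"bias^2+var={bias2 + var:.4f}")
```

At $\alpha = 0$ this reduces to least squares, whose estimation bias is essentially zero; increasing $\alpha$ trades a small squared bias for a larger drop in variance.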