The Bias-Variance Decomposition
Let's focus on the prediction error at one specific point. To understand the relationship between bias and variance, we decompose the expected test error at that point.
Regression
Model $Y = f(X) + \varepsilon$, with error assumptions $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$; squared-error loss; regression fit $\hat f(X)$.
$$
\begin{aligned}
\mathrm{Err}(x_0) &= E\big[(Y - \hat f(x_0))^2 \mid X = x_0\big] \\
&= \sigma_\varepsilon^2 + \big[E\hat f(x_0) - f(x_0)\big]^2 + E\big[\hat f(x_0) - E\hat f(x_0)\big]^2 \\
&= \sigma_\varepsilon^2 + \mathrm{Bias}^2\big(\hat f(x_0)\big) + \mathrm{Var}\big(\hat f(x_0)\big) \\
&= \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}.
\end{aligned}
$$
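The decomposition above can be checked numerically. Below is a minimal Monte Carlo sketch (my own illustration, not from the source): a hypothetical shrunken-mean estimator $\hat f = \lambda \bar y$ of $f(x_0)$ is deliberately biased, and the empirical test error at $x_0$ is compared against $\sigma_\varepsilon^2 + \mathrm{Bias}^2 + \mathrm{Var}$.

```python
import numpy as np

rng = np.random.default_rng(0)
f_x0, sigma, n, lam = 2.0, 1.0, 20, 0.8  # true f(x0), noise sd, sample size, shrinkage
trials = 200_000

# Each row is one training set: n noisy observations of f(x0).
y_train = f_x0 + sigma * rng.standard_normal((trials, n))
f_hat = lam * y_train.mean(axis=1)          # shrunken-mean estimator of f(x0)

# Empirical test error at x0: average of (Y - f_hat)^2 over fresh draws of Y.
y_test = f_x0 + sigma * rng.standard_normal(trials)
err_mc = np.mean((y_test - f_hat) ** 2)

bias2 = (f_hat.mean() - f_x0) ** 2          # squared bias of the estimator
var = f_hat.var()                           # variance of the estimator
print(err_mc, sigma**2 + bias2 + var)       # the two should agree closely
```

With $\lambda = 0.8$, the theoretical values are $\mathrm{Bias}^2 = (0.2 \cdot 2)^2 = 0.16$ and $\mathrm{Var} = \lambda^2 \sigma^2 / n = 0.032$, so the total error is about $1.192$.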
k-nearest-neighbor regression fit
$$
\begin{aligned}
\mathrm{Err}(x_0) &= E\big[(Y - \hat f_k(x_0))^2 \mid X = x_0\big] \\
&= \sigma_\varepsilon^2 + \Big[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f(x_{(l)})\Big]^2 + \frac{\sigma_\varepsilon^2}{k},
\end{aligned}
$$

since $\mathrm{Var}\big(\hat f_k(x_0)\big) = \frac{1}{k^2}\, k\, \sigma_\varepsilon^2 = \frac{\sigma_\varepsilon^2}{k}$.

Linear model
Let's look at the in-sample error: the inputs are held fixed at the training values $x_1, \dots, x_N$, and only new responses $Y$ are drawn at those points. For a linear fit with $p$ parameters, the variance term averaged over the training inputs equals $\frac{p}{N}\sigma_\varepsilon^2$, so the in-sample error is directly affected by the dimension of the input space.
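The $\frac{p}{N}\sigma_\varepsilon^2$ average-variance claim can be verified by simulation. This is a sketch under my own assumptions (a fixed random Gaussian design; the identity $\mathrm{tr}(H) = p$ for the hat matrix $H = X(X^\top X)^{-1}X^\top$):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, sigma = 50, 5, 1.0
X = rng.standard_normal((N, p))             # fixed design (training inputs)
beta = rng.standard_normal(p)
f_true = X @ beta                           # true means at the training points

# Hat matrix H: OLS fitted values are H y, and trace(H) = p.
H = X @ np.linalg.solve(X.T @ X, X.T)

trials = 100_000
y = f_true + sigma * rng.standard_normal((trials, N))
fits = y @ H.T                              # OLS fitted values for each noise draw

# Variance of the fit at each training input, averaged over the N inputs:
avg_var = fits.var(axis=0).mean()
print(avg_var, p * sigma**2 / N)            # average variance ≈ p σ² / N = 0.1
```

Increasing `p` with `N` fixed raises this variance term linearly, which is the sense in which in-sample error grows with input dimension.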
Ridge regression
For linear models, the squared bias can be decomposed further: a model bias (the gap between the true function and the best-fitting linear approximation) and an estimation bias (the gap between that best linear approximation and the expected fitted model). For least squares the estimation bias is zero; for a restricted fit such as ridge it is not.
In short, restricting the range of the parameters increases the bias relative to the least-squares model, but in exchange the variance is reduced. Since bias enters the error decomposition in squared form, a slight increase in bias that buys a larger decrease in variance can lower the total error.