The Bias-Variance Decomposition
Let's focus on one specific input point. To understand the relationship between bias and variance, we decompose the test error at that point.
Regression
Consider the model $Y = f(X) + \epsilon$ with the error assumptions $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma_\epsilon^2$, squared-error loss, and a regression fit $\hat{f}(X)$. The expected prediction error at $X = x_0$ decomposes as
$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat{f}(x_0))^2 \mid X = x_0] \\
&= \sigma_\epsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2 \\
&= \sigma_\epsilon^2 + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0)) \\
&= \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}
\end{aligned}
$$
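This identity is easy to check by simulation. Below is a minimal sketch in NumPy; the sine target, the noise level, and the deliberately misspecified straight-line fit are all assumptions chosen only for illustration. Averaged over many training sets, the directly estimated error at $x_0$ should match $\sigma_\epsilon^2 + \mathrm{Bias}^2 + \mathrm{Variance}$.

```python
import numpy as np

rng = np.random.default_rng(0)

f = np.sin                      # true regression function (an illustrative assumption)
sigma = 0.3                     # noise level, so sigma**2 is the irreducible error
n, x0, n_sims = 30, 1.0, 5000   # training size, test point, Monte Carlo repetitions

preds = np.empty(n_sims)
for s in range(n_sims):
    X = rng.uniform(0, 2, n)               # fresh training set each repetition
    y = f(X) + rng.normal(0, sigma, n)
    coef = np.polyfit(X, y, deg=1)         # deliberately misspecified linear fit
    preds[s] = np.polyval(coef, x0)        # f_hat(x0)

bias2 = (preds.mean() - f(x0)) ** 2        # [E f_hat(x0) - f(x0)]^2
var = preds.var()                          # E[f_hat(x0) - E f_hat(x0)]^2
y0 = f(x0) + rng.normal(0, sigma, n_sims)  # fresh test responses at x0
direct = np.mean((y0 - preds) ** 2)        # direct estimate of Err(x0)

print(f"direct Err(x0)         = {direct:.4f}")
print(f"sigma^2 + bias^2 + var = {sigma**2 + bias2 + var:.4f}")
```

The two printed numbers agree up to Monte Carlo error.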
k-nearest-neighbor regression fit
$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat{f}_k(x_0))^2 \mid X = x_0] \\
&= \sigma_\epsilon^2 + \Bigl[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f(x_{(l)})\Bigr]^2 + \frac{\sigma_\epsilon^2}{k}
\end{aligned}
$$

Here $x_{(1)}, \dots, x_{(k)}$ are the $k$ nearest neighbors of $x_0$, and the variance term comes from averaging $k$ independent responses: $\mathrm{Var}(\hat{f}_k(x_0)) = \frac{1}{k^2}\,k\sigma_\epsilon^2 = \frac{\sigma_\epsilon^2}{k}$.
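With the training inputs held fixed, the k-NN fit at $x_0$ is just an average of $k$ noisy responses, so its variance over noise draws should be $\sigma_\epsilon^2/k$. A minimal sketch (target function and sample size are again illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n, k, x0, n_sims = 0.3, 50, 5, 1.0, 5000

X = rng.uniform(0, 2, n)                  # training inputs, held fixed
nearest = np.argsort(np.abs(X - x0))[:k]  # indices of the k nearest neighbors of x0

preds = np.empty(n_sims)
for s in range(n_sims):
    y = np.sin(X) + rng.normal(0, sigma, n)  # redraw only the noise
    preds[s] = y[nearest].mean()             # k-NN fit: average the k neighbor responses

print(f"simulated Var(f_k(x0)) = {preds.var():.5f}")
print(f"sigma^2 / k            = {sigma**2 / k:.5f}")
```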
Linear model

$$
\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat{f}_p(x_0))^2 \mid X = x_0] \\
&= \sigma_\epsilon^2 + [f(x_0) - E\hat{f}_p(x_0)]^2 + \lVert h(x_0) \rVert^2 \sigma_\epsilon^2
\end{aligned}
$$

Here $h(x_0) = X(X^TX)^{-1}x_0$ is the vector of weights that produce the fit, and

$$
\mathrm{Var}(\hat{f}_p(x_0)) = \mathrm{Var}(x_0^T\hat{\beta}) = \mathrm{Var}\bigl(x_0^T(X^TX)^{-1}X^Ty\bigr) = x_0^T(X^TX)^{-1}x_0\,\sigma_\epsilon^2 = \lVert h(x_0) \rVert^2 \sigma_\epsilon^2.
$$

This variance depends on the particular test point $x_0$; averaged over the training inputs it comes out to $(p/N)\sigma_\epsilon^2$, as the in-sample calculation below shows.
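Before averaging, the pointwise variance formula can be checked the same way. A minimal sketch, assuming an exactly linear truth with made-up coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, N, p, n_sims = 0.3, 40, 3, 5000

X = rng.normal(size=(N, p))            # fixed design matrix
beta = np.array([1.0, -2.0, 0.5])      # arbitrary true coefficients (an assumption)
x0 = rng.normal(size=p)                # test point

preds = np.empty(n_sims)
for s in range(n_sims):
    y = X @ beta + rng.normal(0, sigma, N)        # redraw only the noise
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least-squares fit
    preds[s] = x0 @ beta_hat

theory = x0 @ np.linalg.solve(X.T @ X, x0) * sigma**2
print(f"simulated Var(f_p(x0))     = {preds.var():.5f}")
print(f"x0^T (X^T X)^-1 x0 sigma^2 = {theory:.5f}")
```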
Now look at the in-sample error: the error evaluated at the training inputs $x_i$ themselves, with the $x_i$ held fixed so that only $y$ is random. Averaging over the sample,

$$
\frac{1}{N}\sum_{i=1}^{N}\mathrm{Err}(x_i) = \sigma_\epsilon^2 + \frac{1}{N}\sum_{i=1}^{N}\bigl[f(x_i) - E\hat{f}(x_i)\bigr]^2 + \frac{p}{N}\sigma_\epsilon^2,
$$

because

$$
\sum_{i=1}^{N} x_i^T(X^TX)^{-1}x_i = \mathrm{tr}\bigl(X(X^TX)^{-1}X^T\bigr) = \mathrm{tr}\bigl((X^TX)^{-1}X^TX\bigr) = \mathrm{tr}(I_p) = p.
$$

In other words, the in-sample variance grows linearly with the dimension $p$ of the input space and shrinks with the sample size $N$.
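The trace identity is quick to confirm numerically; any full-rank design matrix will do (the one below is random, an assumption for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 40, 3
X = rng.normal(size=(N, p))             # arbitrary full-rank design (an assumption)

A = np.linalg.inv(X.T @ X)
leverages = np.sum((X @ A) * X, axis=1) # x_i^T (X^T X)^{-1} x_i for each row x_i
hat_trace = np.trace(X @ A @ X.T)       # tr(X (X^T X)^{-1} X^T)

print(leverages.sum(), hat_trace)       # both equal p = 3, up to rounding
```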
Ridge regression
For linear models, the bias itself can be decomposed further.
$$
\begin{aligned}
E_{x_0}\bigl[f(x_0) - E\hat{f}_\alpha(x_0)\bigr]^2 &= E_{x_0}\bigl[f(x_0) - x_0^T\beta_*\bigr]^2 + E_{x_0}\bigl[x_0^T\beta_* - E\,x_0^T\hat{\beta}_\alpha\bigr]^2 \\
&= \mathrm{Ave}[\text{Model Bias}]^2 + \mathrm{Ave}[\text{Estimation Bias}]^2
\end{aligned}
$$

Here $\beta_*$ is the parameter of the best-fitting linear approximation to $f$, and $\hat{\beta}_\alpha$ is the ridge estimate with penalty $\alpha$. In short, restricting the range of the parameters increases the estimation bias relative to least squares, for which it is zero, but in exchange the variance is reduced. Since the bias enters the error decomposition in squared form, the trade is favorable whenever the increase in squared bias is smaller than the decrease in variance.
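A minimal sketch of the trade-off: the data-generating model here is exactly linear (an assumption), so the model bias is zero and everything labeled bias below is estimation bias. As the penalty $\alpha$ grows, the squared bias rises and the variance falls; whether their sum improves depends on $\alpha$ and the problem.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, N, p, n_sims = 1.0, 30, 10, 2000
beta = rng.normal(size=p)               # true coefficients (an assumption)
X = rng.normal(size=(N, p))             # fixed design
x0 = rng.normal(size=p)
f0 = x0 @ beta                          # true value at the test point

for alpha in [0.0, 5.0, 20.0]:          # alpha = 0 recovers least squares
    preds = np.empty(n_sims)
    for s in range(n_sims):
        y = X @ beta + rng.normal(0, sigma, N)
        b = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)  # ridge estimate
        preds[s] = x0 @ b
    bias2, var = (preds.mean() - f0) ** 2, preds.var()
    print(f"alpha={alpha:5.1f}  bias^2={bias2:.4f}  var={var:.4f}  sum={bias2 + var:.4f}")
```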