The residual sum of squares (RSS) is the criterion that least squares minimizes. Strictly speaking, the quantity in the model is the error, not the residual: the error is a random variable, whereas a residual is a fixed number once the model has been fitted.
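$$f(X) = \beta_0 + X_1\beta_1 + X_2\beta_2$$

$$RSS(\beta) = (y - X\beta)^T (y - X\beta)$$

$$\frac{\partial RSS}{\partial \beta} = -2X^T(y - X\beta)$$

$$\frac{\partial^2 RSS}{\partial \beta \, \partial \beta^T} = 2X^TX$$

Setting the first derivative to zero gives the normal equations. When $X$ has full column rank, $X^TX$ is positive definite, so the solution is the unique minimizer:

$$X^T(y - X\beta) = 0$$

$$\hat{\beta} = (X^TX)^{-1}X^Ty$$

$$\hat{y} = X\hat{\beta} = X(X^TX)^{-1}X^Ty = Hy$$

Geometrical view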
$\hat{y}$ is the orthogonal projection of $y$ onto the column space of $X$. This is because $H$ is a projection matrix: it is symmetric ($H^T = H$) and idempotent ($H^2 = H$). $H$ is called the hat matrix because it puts the hat on $y$.
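The properties of the hat matrix can be checked numerically. The following is a minimal sketch using NumPy on simulated data; the sample size, the true coefficients, the noise level, and all variable names are assumptions made up for illustration, not part of the notes above.

```python
import numpy as np

# Simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Least-squares estimate from the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X^T X)^{-1} X^T, built with solve instead of an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

is_symmetric = np.allclose(H, H.T)              # H^T = H
is_idempotent = np.allclose(H @ H, H)           # H^2 = H
matches_fit = np.allclose(y_hat, X @ beta_hat)  # Hy equals X beta_hat

print(is_symmetric, is_idempotent, matches_fit)
```

Forming the full $n \times n$ matrix $H$ is only practical for small $n$; it is done here to make the projection explicit, not as a recommended way to fit the model.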
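$\varepsilon \perp x_i$ for every column $x_i$ of $X$, because $\varepsilon = y - \hat{y}$ is the residual of an orthogonal projection: the normal equations $X^T(y - X\hat{\beta}) = 0$ say exactly this. If $\beta$ is estimated by some method other than least squares, the fitted values still have the form $\hat{y} = \beta_0 + X_1\beta_1 + X_2\beta_2$, so $\hat{y}$ is still a vector in $\mathrm{col}(X)$. In that case, however, $\hat{y}$ is in general not the orthogonal projection of $y$, so the residual is not orthogonal to the predictors.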
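To make the contrast concrete, the sketch below compares the least-squares residual with the residual from ridge regression, which is chosen here only as one example of "another method" and is not mentioned in the notes above. The data, the penalty `lam`, and the variable names are assumptions for illustration; for simplicity the penalty is also applied to the intercept.

```python
import numpy as np

# Simulated data (illustrative assumptions only).
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.5, size=n)

# Least squares: solve X^T X beta = X^T y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid_ols = y - X @ beta_ols

# Ridge regression with a hypothetical penalty lam:
# solve (X^T X + lam * I) beta = X^T y.
lam = 5.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
resid_ridge = y - X @ beta_ridge

# Inner products of each column of X with the residual.
ols_inner = X.T @ resid_ols      # numerically zero
ridge_inner = X.T @ resid_ridge  # equals lam * beta_ridge, not zero

print(ols_inner)
print(ridge_inner)
```

Both fitted vectors lie in $\mathrm{col}(X)$, but only the least-squares residual is orthogonal to the columns of $X$. For ridge, rearranging its estimating equation gives $X^T(y - X\beta) = \lambda\beta$, which is nonzero whenever the estimate is nonzero.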