Local regression
This note covers local linear and local polynomial regression.
Originally, linear regression estimates the function globally. By introducing a kernel into the regression we get a local variant: each observation receives its own kernel weight, so the function can be estimated locally.

Then what is a local estimate? It is the value we read off after fitting a regression line at a target point. When fitting that line, we weight observations by their distance from the target, so closer points get more weight and have more influence on the estimated beta coefficients.
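As a concrete illustration, here is a minimal numpy sketch of this idea (the tricube kernel, the bandwidth `lam`, and the test data are my assumptions, not choices made in the text): a straight line is fitted by weighted least squares, with weights supplied by a kernel centered at the target point $x_0$.

```python
import numpy as np

def tricube(u):
    """Tricube kernel: positive only for |u| < 1, so only nearby points count."""
    return np.where(np.abs(u) < 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

def local_linear_fit(x, y, x0, lam):
    """Fit y = b0 + b1*x by weighted least squares with kernel weights
    centered at x0, and return the local estimate f_hat(x0)."""
    w = tricube((x - x0) / lam)                # closer points get larger weight
    B = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return beta[0] + beta[1] * x0

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=100)
print(local_linear_fit(x, y, x0=0.5, lam=0.2))
```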
However, the plain kernel method (a local weighted average) has a serious flaw at the edges of the data. At the leftmost point, for example, the kernel window contains only points to its right, so half of the neighborhood is simply missing. This asymmetry biases the estimate; we call it the boundary effect.
Trevor Hastie and Clive Loader, "Local Regression: Automatic Kernel Carpentry," Statistical Science 8(2), 120-129, May 1993.
In this paper, local linear regression is shown to correct this boundary bias. Writing the fit as $\hat{f}(x_0) = \sum_{i=1}^{n} l_i(x_0)\,y_i$ and Taylor-expanding $f$ around $x_0$ gives

$$E[\hat{f}(x_0)] = f(x_0)\sum_{i=1}^{n} l_i(x_0) + f'(x_0)\sum_{i=1}^{n}(x_i - x_0)\,l_i(x_0) + \frac{f''(x_0)}{2}\sum_{i=1}^{n}(x_i - x_0)^2\,l_i(x_0) + R.$$
Q. For a proof, see statkwon.github.io.
To sum up, to control the boundary effect we need a model with a slope term, and local linear regression is the natural choice for that.
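A small numerical sketch of the boundary effect and its fix (the increasing data-generating process and the tricube kernel are illustrative assumptions): at the left edge of an increasing trend, the plain kernel average is pulled upward by the right-hand neighbors, while the local linear fit is not.

```python
import numpy as np

def tricube(u):
    return np.where(np.abs(u) < 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = 2 * x + rng.normal(scale=0.1, size=200)   # increasing trend, f(0) = 0

x0, lam = 0.0, 0.3                            # leftmost point of the domain
w = tricube((x - x0) / lam)

# Local constant (Nadaraya-Watson): a plain weighted average. At the left
# boundary every neighbor lies to the right, so the estimate is biased upward.
nw = np.sum(w * y) / np.sum(w)

# Local linear: fit an intercept and a slope with the same kernel weights.
B = np.column_stack([np.ones_like(x), x])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
ll = beta[0] + beta[1] * x0

print(f"truth f(0)=0.0, NW={nw:.3f}, local linear={ll:.3f}")
```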
The expansion above is subject to $\sum_{i=1}^{n} l_i(x_0) = 1$ and $\sum_{i=1}^{n}(x_i - x_0)\,l_i(x_0) = 0$ when the fit is local linear. The bias $E[\hat{f}(x_0)] - f(x_0)$ therefore depends only on the quadratic and higher-order terms, i.e., the curvature term and the remainder $R$. If the true function is first order (linear), the curvature term and $R$ vanish and the estimate is exactly unbiased; in general, a local linear fit cancels the zeroth- and first-order bias that dominates at the boundary. This is how it solves the boundary issue.
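These two constraints can be checked numerically. The sketch below (again with an assumed tricube kernel and made-up data) computes the equivalent-kernel weights $l_i(x_0)$ of a local linear fit at a boundary point and verifies that they sum to one and have zero first moment.

```python
import numpy as np

def tricube(u):
    return np.where(np.abs(u) < 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 50))
x0, lam = 0.0, 0.4                       # a boundary target point

W = np.diag(tricube((x - x0) / lam))
B = np.column_stack([np.ones_like(x), x])

# Equivalent-kernel weights l_i(x0): f_hat(x0) = sum_i l_i(x0) * y_i
b0 = np.array([1.0, x0])
l = b0 @ np.linalg.solve(B.T @ W @ B, B.T @ W)

print(np.sum(l))             # ~1.0 -> no zeroth-order bias
print(np.sum((x - x0) * l))  # ~0.0 -> no first-order bias
```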
Local polynomial regression can be derived in the same way as local linear regression; the only difference is the model fitted at each target point $x_0$:

Linear: $\displaystyle\min_{\beta_0(x_0),\,\beta_1(x_0)} \sum_{i=1}^{n} K_\lambda(x_0, x_i)\left[y_i - \beta_0(x_0) - \beta_1(x_0)\,x_i\right]^2$

Polynomial: $\displaystyle\min_{\beta_j(x_0),\,j=0,\dots,d} \sum_{i=1}^{n} K_\lambda(x_0, x_i)\Big[y_i - \beta_0(x_0) - \sum_{j=1}^{d}\beta_j(x_0)\,x_i^j\Big]^2$
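If it helps to see the polynomial objective in code, here is a hypothetical generalization of the earlier sketch to a degree-$d$ fit (kernel, bandwidth, and data are again assumptions). Centering the design at $x_0$ makes the intercept the local estimate.

```python
import numpy as np

def tricube(u):
    return np.where(np.abs(u) < 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

def local_poly_fit(x, y, x0, lam, degree):
    """Degree 1 recovers local linear; higher degrees add curvature terms."""
    w = tricube((x - x0) / lam)
    # Design centered at x0: columns [1, (x-x0), (x-x0)^2, ...]
    B = np.vander(x - x0, N=degree + 1, increasing=True)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return beta[0]                       # intercept = estimate at x0

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(6 * x) + rng.normal(scale=0.2, size=200)
print(local_poly_fit(x, y, x0=0.5, lam=0.25, degree=2))
```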
Now let's think about high dimensions. The kernel is applied to the Euclidean distance $\lVert x - x_0 \rVert$, and we need to standardize each predictor to unit standard deviation first, because predictors measured on different scales contribute very differently to the distance.

Boundary effects become more serious in high dimensions because of the curse of dimensionality: the proportion of data points close to the boundary increases with the dimension $p$, and it also becomes hard to visualize the data, even though smoothing is usually done for visualization. Moreover, the sampling density is proportional to $N^{1/p}$, so a sparsity problem arises; for example, a sample of $N = 100$ points that is dense in one dimension would require $100^{10}$ points to be equally dense in ten dimensions.
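Following the standardization point above, a short sketch (the column scales and the bandwidth are arbitrary assumptions) of computing Euclidean-distance kernel weights in $p$ dimensions after scaling each predictor to unit standard deviation.

```python
import numpy as np

def tricube(u):
    return np.where(np.abs(u) < 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3)) * np.array([1.0, 10.0, 0.1])  # very different scales

# Standardize each predictor to unit standard deviation; otherwise the
# Euclidean distance is dominated by the largest-scale coordinate.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

x0 = Z[0]                                   # query point (standardized)
dist = np.linalg.norm(Z - x0, axis=1)       # ||x - x0|| in standardized units
w = tricube(dist / 1.5)                     # lam = 1.5 is an arbitrary choice
print(np.count_nonzero(w), "points fall inside the kernel window")
```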
In $\mathbb{R}^p$ we need not apply the same weight to every axis: instead of merely standardizing the coordinates, we can multiply by a weight matrix $A$ and use the structured kernel

$$K_{\lambda, A}(x_0, x) = D\!\left(\frac{(x - x_0)^T A (x - x_0)}{\lambda}\right).$$

- $A$ can be chosen to make the kernel focus more on particular contrasts of the coordinates, such as high-frequency contrasts.
- A low-rank $A$ produces a ridge function for local regression.
- Diagonal condition: if $A$ is diagonal, we can tune the influence of each predictor $X_j$ by increasing or decreasing $A_{jj}$.
- However, as the dimension grows, $A$ becomes hard to handle, so structured regression functions are used instead. These follow the ANOVA decomposition approach, $f(X) = \alpha + \sum_j g_j(X_j) + \sum_{k<l} g_{kl}(X_k, X_l) + \cdots$, and eliminate some of the higher-order interaction terms.
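A sketch of the structured kernel with a weight matrix $A$ (I take a Mahalanobis-style distance with a square root, which is one common convention; the exact placement of the square root varies by reference). A diagonal $A$ with a small $A_{jj}$ effectively removes coordinate $j$ from the distance.

```python
import numpy as np

def tricube(u):
    return np.where(np.abs(u) < 1, (1 - np.abs(u) ** 3) ** 3, 0.0)

def structured_kernel(X, x0, A, lam):
    """K(x0, x) = D( sqrt((x - x0)^T A (x - x0)) / lam ) for each row of X."""
    diff = X - x0
    mahal = np.sqrt(np.einsum("ij,jk,ik->i", diff, A, diff))
    return tricube(mahal / lam)

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
x0 = np.zeros(3)

A = np.diag([1.0, 1.0, 0.01])   # third coordinate barely affects the distance
w = structured_kernel(X, x0, A, lam=2.0)
print(w[:5])
```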