Novelty Detection

PreFace

Novelty Detection is the detection for whether a new data point is an outlier, and outlier detection is the detection for whether a train data is an outlier. In other words, we find the most concentrated area in outlier detection.

Reference

For outlier detection, we first fit density. We define the data point as outlier if it has in low density. density  function≤t density \; function \leq t

We also can find the boundary between inlier and outlier. To do this, we have to find the smallest ball such that it includes all data points. This problem is converted into the problem finding a center c and radius r as below.

minr,c  rs.t.  (xi−c)T(xi−c)≤r,  r≥0min_{r,c} \; r \\ s.t. \; (x_i-c)^T(x_i-c)\leq r, \; r \geq 0

Minimum enclosing ball

L(r,c,α)=r+∑i=1mαi((xi−c)T(xi−c)−r),  αi≥0L(r,c,\alpha)=r+\sum^m_{i=1}\alpha_i((x_i-c)^T(x_i-c)-r), \; \alpha_i \geq 0
∂L∂r=1−Σαi=0∂L∂c=Σαi(−2xi+2c)=0\dfrac{\partial L}{\partial r}=1-\Sigma \alpha_i=0 \\ \dfrac{\partial L}{\partial c}=\Sigma \alpha_i(-2x_i+2c)=0

This leads to Σαi=1,  c=Σαixi\Sigma \alpha_i =1 , \; c=\Sigma \alpha_ix_i

L=r+Σαi((xi−c)T(xi−c)−r)=r+ΣαixiTxi−ΣαixiTc−ΣαicTxi+ΣαicTc−Σαir=ΣαixiTxi−ΣiΣjαiαjxiTxjs.t.Σαi=1,αi≥0\begin{align} L & =r+\Sigma\alpha_i((x_i-c)^T(x_i-c)-r) \\ &= r+\Sigma\alpha_ix_i^Tx_i-\Sigma \alpha_ix_i^Tc-\Sigma \alpha_ic^Tx_i+\Sigma \alpha_ic^Tc-\Sigma \alpha_ir \\ & = \Sigma \alpha_i x_i^Tx_i -\Sigma_i\Sigma_j \alpha_i \alpha_j x_i^Tx_j \quad \quad \quad \quad s.t.\Sigma \alpha_i=1, \alpha_i \geq 0 \end{align}

​Now, the problem becomes a quadratic problem with simple constraints (dual problem)

maxag(a)=bTα−αTAαs.t.Σαi=1,  αi≥0Aij=xiTxj,  bi=xiTxi,  αmax_ag(a)=b^T\alpha-\alpha^TA\alpha \quad\quad\quad\quad s.t. \Sigma \alpha_i=1, \; \alpha_i \geq 0 \\ A_{ij}=x_i^Tx_j, \; b_i=x_i^Tx_i, \; \alpha

When αi\alpha_i becomes 0, the points will be inside the circle. When αi\alpha_i is bigger than 0, the points will be exactly on the boundary. Thus, the solution in α\alpha is very sparse.

Last updated

Was this helpful?