✏️ Calculation

$$\delta_k(x) = -\frac{1}{2}\log|\hat{\Sigma}_k| - \frac{1}{2}(x-\hat{\mu}_k)^T\hat{\Sigma}_k^{-1}(x-\hat{\mu}_k) + \log\pi_k$$

The eigendecomposition $\hat{\Sigma}_k = U_k D_k U_k^T$ can make this calculation faster:

✔️ $(x-\hat{\mu}_k)^T\hat{\Sigma}^{-1}_k(x-\hat{\mu}_k)=[U_k^T(x-\hat{\mu}_k)]^T D_k^{-1}[U_k^T(x-\hat{\mu}_k)]$

✔️ $\log|\hat{\Sigma}_k|=\sum_l \log d_{kl}$
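
A minimal NumPy sketch of this computation (the function and variable names are illustrative, not from the text): the discriminant $\delta_k(x)$ is evaluated through the eigendecomposition of $\hat{\Sigma}_k$ instead of an explicit matrix inverse.

```python
import numpy as np

def qda_discriminant(x, mu_k, sigma_k, pi_k):
    """Evaluate delta_k(x) using the eigendecomposition of Sigma_k."""
    # Sigma_k = U_k D_k U_k^T (symmetric, so eigh is appropriate)
    d_k, U_k = np.linalg.eigh(sigma_k)
    # Rotate the centered point: z = U_k^T (x - mu_k)
    z = U_k.T @ (x - mu_k)
    # Quadratic form via the diagonal: z^T D_k^{-1} z
    quad = np.sum(z**2 / d_k)
    # log|Sigma_k| = sum_l log d_kl
    log_det = np.sum(np.log(d_k))
    return -0.5 * log_det - 0.5 * quad + np.log(pi_k)

# Usage idea: classify a point by the largest discriminant over the K classes, e.g.
# scores = [qda_discriminant(x, mu[k], sigma[k], pi[k]) for k in range(K)]
```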

For a normal distribution, this quadratic form is the Mahalanobis distance (https://darkpgmr.tistory.com/41): it measures how far each data point lies from the mean, scaled by the standard deviation.

$$X^*=D^{-1/2}U^TX, \qquad Cov(X^*)=Cov(D^{-1/2}U^TX)=\mathbf{I}$$

Using this expression, we can interpret the rule as assigning $X^*$ to the closest centroid $\hat{\mu}_k^*$ in the transformed space, after adjusting for the effect of the prior probability $\pi_k$. Under this transformation the Mahalanobis distance becomes the ordinary Euclidean distance. The map from $X$ to $X^*$ is the whitening transformation, which makes the covariance of the data the identity $\mathbf{I}$.

$$Y=WX, \qquad W^TW=\Sigma^{-1}, \qquad W=D^{-1/2}U^T$$
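
A short sketch of the whitening transformation (the data below are a hypothetical correlated Gaussian sample, not from the text): after applying $W=D^{-1/2}U^T$, the sample covariance is approximately the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 500 samples from a correlated 3-d Gaussian
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.5],
                  [0.3, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=500)

# Eigendecomposition of the sample covariance: Sigma_hat = U D U^T
D, U = np.linalg.eigh(np.cov(X, rowvar=False))
W = np.diag(D ** -0.5) @ U.T          # whitening matrix W = D^{-1/2} U^T

X_star = X @ W.T                      # X* = W X, applied row by row
print(np.cov(X_star, rowvar=False))   # approximately the identity matrix
```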

✏️ High Dimensional Data

  • $p$: the dimension of the input matrix

  • $K$: the number of centroids

LDA reduces the $p$-dimensional input to at most $K-1$ dimensions. The $K$ centroids live in $p$-dimensional space, but they span a subspace of dimension at most $K-1$. Let $H_{K-1}$ be the subspace spanned by these centroids. The distance between this subspace and $X^*$ can be neglected: the centroids already lie in the subspace, so this orthogonal distance contributes equally to the distance from every centroid. We therefore project the transformed $X^*$ onto $H_{K-1}$ and compare distances between the projected points. A subspace in which the projected centroids have large variance is optimal, and finding this optimal subspace is the same as finding the principal component space of the centroids.

  • Class centroids $\mathbf{M} \; (K \times p)$; common covariance matrix $\mathbf{W}=\hat{\Sigma}=\sum^K_{k=1}\sum_{g_i=k}(x_i-\hat{\mu}_k)(x_i-\hat{\mu}_k)^T/(N-K)$

  • $\mathbf{M}^*=\mathbf{M}\mathbf{W}^{-1/2}$, using the eigendecomposition of $\mathbf{W}$

  • $\mathbf{B}^*=Cov(\mathbf{M}^*)=\mathbf{V}^*\mathbf{D}_B\mathbf{V}^{*T}$; the columns of $\mathbf{V}^*$ are $v_l^*$

  • $Z_l=v_l^TX$ with $v_l=\mathbf{W}^{-1/2}v_l^*$

  • $\mathbf{W}$: within-class covariance matrix, $\mathbf{B}$: between-class covariance matrix (a numerical sketch of these steps follows this list)
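
A compact NumPy sketch of the steps above, assuming the data matrix has observations in rows (the function name `canonical_variates` and all variable names are illustrative):

```python
import numpy as np

def canonical_variates(X, y):
    """Reduced-rank LDA coordinates following the steps above.
    X: (N, p) data matrix with observations in rows, y: (N,) class labels."""
    classes = np.unique(y)
    N, K = X.shape[0], len(classes)

    # Class centroids M (K x p) and pooled within-class covariance W
    M = np.vstack([X[y == k].mean(axis=0) for k in classes])
    W = sum((X[y == k] - M[i]).T @ (X[y == k] - M[i])
            for i, k in enumerate(classes)) / (N - K)

    # Whiten the centroids: M* = M W^{-1/2}, via the eigendecomposition of W
    dW, UW = np.linalg.eigh(W)
    W_inv_half = UW @ np.diag(dW ** -0.5) @ UW.T
    M_star = M @ W_inv_half

    # B* = Cov(M*); its leading eigenvectors v_l* give the discriminant directions
    dB, V_star = np.linalg.eigh(np.cov(M_star, rowvar=False))
    order = np.argsort(dB)[::-1]        # largest between-class variance first
    V = W_inv_half @ V_star[:, order]   # v_l = W^{-1/2} v_l*

    return X @ V[:, :K - 1]             # canonical variates Z_l = v_l^T X
```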

Find the linear combination $Z=a^TX$ such that the between-class variance is maximized relative to the within-class variance.

This can be restated as the following optimization problem:

$$\max_a\frac{a^T\mathbf{B}a}{a^T\mathbf{W}a} \quad\Longleftrightarrow\quad \max_a\, a^T\mathbf{B}a \;\; \text{subject to} \;\; a^T\mathbf{W}a=1$$

The solution $a$ is the eigenvector of $\mathbf{W}^{-1}\mathbf{B}$ with the largest eigenvalue. We can represent the data in reduced form along the axes $Z_l=v_l^TX=a_l^TX$. $Z_l$ is called a canonical variate, and it serves as a new axis.
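
Equivalently, $a$ can be found directly from the generalized eigenproblem $\mathbf{B}a=\lambda\mathbf{W}a$. A minimal sketch with SciPy, using small illustrative matrices for $\mathbf{B}$ and $\mathbf{W}$ (these values are assumptions, not from the text):

```python
import numpy as np
from scipy.linalg import eigh

# Toy between- and within-class covariance matrices (illustrative values only)
B = np.array([[3.0, 1.0], [1.0, 2.0]])
W = np.array([[1.5, 0.2], [0.2, 1.0]])

# Generalized symmetric eigenproblem B a = lambda W a;
# eigh returns eigenvalues in ascending order, so the last column is the solution
eigvals, eigvecs = eigh(B, W)
a = eigvecs[:, -1]                 # direction maximizing a^T B a / a^T W a
print(a.T @ W @ a)                 # eigh normalizes eigenvectors so a^T W a = 1
```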
