LDA
✏️ Goal
The goal is to know the class posterior $P(G \mid X)$.
✏️ Expression
$$P(G=k \mid X=x) = \frac{P(X=x \mid G=k)\,P(G=k)}{P(X=x)} = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l}$$
✏️ Assumption
The distribution within each class is assumed to be multivariate normal, with the same covariance matrix for every class.
$$(X \mid G=k) \sim N(\mu_k, \Sigma_k), \qquad f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) \right\} \quad \text{s.t. } \Sigma_k = \Sigma \;\; \forall k$$
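As a quick sketch, the class-conditional density above can be evaluated directly with NumPy; the values of `mu_k`, `sigma`, and `x` below are illustrative, not taken from the text:

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Multivariate normal density f_k(x) with mean mu and covariance sigma."""
    p = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (p / 2) * np.linalg.det(sigma) ** 0.5
    exponent = -0.5 * diff @ np.linalg.inv(sigma) @ diff
    return np.exp(exponent) / norm_const

# Illustrative parameters: one class mean and a shared covariance matrix
mu_k = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, -0.2])
print(gaussian_density(x, mu_k, sigma))
```

The density is largest at the class mean and decays with the Mahalanobis distance $(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)$.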
✏️ Unfolding
Under the multivariate normal and equal-covariance assumptions, the log-ratio unfolds easily.
$$\log\frac{P(G=k \mid X)}{P(G=l \mid X)} = \log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l} = \log\frac{\pi_k}{\pi_l} - \frac{1}{2}(\mu_k+\mu_l)^T \Sigma^{-1} (\mu_k-\mu_l) + x^T \Sigma^{-1} (\mu_k-\mu_l)$$
✏️ Classification
The decision boundary is:
$$D = \{x \mid P(G=k \mid X) = P(G=l \mid X)\} = \{x \mid \delta_k(x) = \delta_l(x)\}$$
$$\begin{aligned}
P(G=k \mid X) = P(G=l \mid X)
&\iff \frac{P(G=k \mid X)}{P(G=l \mid X)} = 1 \\
&\iff \log\frac{P(G=k \mid X)}{P(G=l \mid X)} = 0 \\
&\iff \log\frac{\pi_k}{\pi_l} - \frac{1}{2}(\mu_k+\mu_l)^T \Sigma^{-1}(\mu_k-\mu_l) + x^T \Sigma^{-1}(\mu_k-\mu_l) = 0 \\
&\iff x^T \Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log\pi_k = x^T \Sigma^{-1}\mu_l - \frac{1}{2}\mu_l^T \Sigma^{-1}\mu_l + \log\pi_l \\
&\iff \delta_k(x) = \delta_l(x)
\end{aligned}$$
so the linear discriminant function is $\delta_k(x) = x^T \Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log\pi_k$.
✏️ Parameter estimation
With real data we cannot know the true parameters of the normal distributions, so we estimate them:
$$\hat{\pi}_k = N_k / N, \qquad \hat{\mu}_k = \sum_{g_i = k} x_i \,/\, N_k, \qquad \hat{\Sigma} = \sum_{k=1}^{K} \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T \,/\, (N - K)$$
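A minimal NumPy sketch of these estimators, followed by classification with the linear discriminant $\delta_k$; the two-class data here is synthetic and its means are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class data sharing a covariance (illustrative means 0 and 2)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))
X = np.vstack([X0, X1])
g = np.array([0] * 50 + [1] * 50)

N, K = len(X), 2
pi_hat = np.array([np.mean(g == k) for k in range(K)])          # pi_k = N_k / N
mu_hat = np.array([X[g == k].mean(axis=0) for k in range(K)])   # class means
# Pooled covariance: within-class scatter summed over classes, divided by N - K
sigma_hat = sum(
    (X[g == k] - mu_hat[k]).T @ (X[g == k] - mu_hat[k]) for k in range(K)
) / (N - K)

def delta(x, k):
    """delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log pi_k."""
    s_inv_mu = np.linalg.solve(sigma_hat, mu_hat[k])
    return x @ s_inv_mu - 0.5 * mu_hat[k] @ s_inv_mu + np.log(pi_hat[k])

pred = np.array([max(range(K), key=lambda xi=x: 0) if False else
                 max(range(K), key=lambda k, xi=x: delta(xi, k)) for x in X])
print("train accuracy:", np.mean(pred == g))
```

Each point is assigned to the class whose $\delta_k$ is largest, which is exactly the boundary rule $\delta_k(x) = \delta_l(x)$ derived above.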
✏️ Another Perspective
Alternatively, we can use a linear combination $a_1 H + a_2 W$ of the two features $H$ and $W$ to split the data into two clusters.
$$\underset{a_1, a_2}{\arg\max}\; E[a_1 H + a_2 W \mid G=0] - E[a_1 H + a_2 W \mid G=1] \quad \text{s.t. } \operatorname{Var}[a_1 H + a_2 W] \le \text{constant}$$
This can be written more simply as:
$$\underset{a_1, a_2}{\arg\max}\; h(a_1, a_2) \quad \text{s.t. } g(a_1, a_2) = c$$
$$\begin{aligned}
\mu_1 &= E[H \mid G=0] - E[H \mid G=1] \\
\mu_2 &= E[W \mid G=0] - E[W \mid G=1] \\
h(a_1, a_2) &= (a_1 \;\; a_2)\,(\mu_1 \;\; \mu_2)^T \\
g(a_1, a_2) &= a_1^2 \operatorname{Var}(H \mid G=0) + 2 a_1 a_2 \operatorname{Cov}(H, W \mid G=0) + a_2^2 \operatorname{Var}(W \mid G=0)
\end{aligned}$$
This is solved with a Lagrange multiplier:
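Numerically, the direction $a \propto \Sigma^{-1}\mu$ that this Lagrange condition yields takes only a few lines; the moment values below are assumed for illustration:

```python
import numpy as np

# Illustrative within-class moments for (H, W) (assumed values)
mean_diff = np.array([10.0, 8.0])        # (mu_1, mu_2): between-class mean differences
cov_within = np.array([[25.0, 15.0],     # Var(H),    Cov(H, W)
                       [15.0, 36.0]])    # Cov(W, H), Var(W)

# grad h = lambda * grad g  =>  a is proportional to COV^{-1} (mu_1, mu_2)^T
a = np.linalg.solve(cov_within, mean_diff)
a /= np.linalg.norm(a)                   # the scale is arbitrary; normalize
print("direction:", a)
```

Using `np.linalg.solve` avoids explicitly inverting the covariance matrix, which is both faster and numerically safer.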
$$\nabla g = \lambda \nabla h \;\Longrightarrow\; \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \propto \begin{pmatrix} \operatorname{Cov}(H,H) & \operatorname{Cov}(H,W) \\ \operatorname{Cov}(W,H) & \operatorname{Cov}(W,W) \end{pmatrix}^{-1} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$$
QDA
✏️ Assumption
Like LDA, it assumes a multivariate normal distribution for each class, but without the equal-covariance assumption.
✏️ Classification
$$D = \{x \mid \delta_k(x) = \delta_l(x)\}, \qquad \delta_k(x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) + \log\pi_k$$
This expression is similar to that of LDA, but the quadratic term in $x$ remains.
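A minimal NumPy sketch of the quadratic discriminant $\delta_k$; the per-class means, covariances, and priors below are assumed for illustration:

```python
import numpy as np

def delta_qda(x, mu, sigma, pi):
    """Quadratic discriminant delta_k(x) for one class."""
    _, logdet = np.linalg.slogdet(sigma)      # log |Sigma_k|, numerically stable
    diff = x - mu
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(sigma, diff) + np.log(pi)

# Illustrative two-class setup with *different* covariance matrices
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
sigmas = [np.eye(2), np.array([[2.0, 0.5],
                               [0.5, 1.0]])]
pis = [0.5, 0.5]

x = np.array([0.2, 0.1])                      # a point near the first class mean
scores = [delta_qda(x, mus[k], sigmas[k], pis[k]) for k in range(2)]
print("predicted class:", int(np.argmax(scores)))
```

Because each class keeps its own $\Sigma_k$, the $-\frac{1}{2}\log|\Sigma_k|$ and quadratic terms no longer cancel between classes, so the decision boundary is a quadric rather than a hyperplane.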