Boosting

Idea

  • Boosting is a method that combines several classifiers in a sequential way.

  • Correctly classified data needs little further attention, so the method puts more weight on misclassified data (see the sketch below).
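As a concrete illustration of this idea, here is a minimal AdaBoost-style sketch in numpy. The weak-learner list `stumps`, the round count `T`, and the specific weight-update rule are standard AdaBoost choices used here for illustration; they are not fixed by the notes above.

```python
import numpy as np

def adaboost(X, y, stumps, T=10):
    """Minimal AdaBoost-style sketch: sequentially pick the weak
    classifier with the smallest weighted error, then increase the
    weight of the data points it misclassifies.

    X: (n, d) data; y: (n,) labels in {-1, +1};
    stumps: list of callables h(X) -> predictions in {-1, +1}.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)          # start with uniform weight on the data
    alphas, chosen = [], []
    for _ in range(T):
        # weighted error of every candidate classifier
        errs = [np.sum(w * (h(X) != y)) for h in stumps]
        j = int(np.argmin(errs))
        eps = max(errs[j], 1e-12)    # clamp to avoid division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)
        # upweight misclassified points, downweight correct ones
        w *= np.exp(-alpha * y * stumps[j](X))
        w /= w.sum()
        alphas.append(alpha)
        chosen.append(stumps[j])
    # final classifier: sign of the weighted vote
    return lambda Xnew: np.sign(sum(a * h(Xnew) for a, h in zip(alphas, chosen)))
```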

Notation

  • Classifiers: $H=\{h_1,\cdots,h_m\}$, s.t. $h_j: X \rightarrow \{1,-1\}$

  • Data: $X=\{x_1,\cdots,x_n\}$, $Y=\{y_1,\cdots,y_n\}$, $y_i\in\{1,-1\}$

  • Matrix: $A_{ij}=y_ih_j(x_i)$, so $A_{ij}=1$ (correct) and $A_{ij}=-1$ (wrong)

  • Weight for classifiers: $\lambda \in \mathbb{R}^m$

  • Weight for data: $w \in \mathbb{R}^n$
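To make the notation concrete, here is a small numpy sketch that builds $A$ from toy data; the four data points and the two threshold classifiers are invented for illustration:

```python
import numpy as np

# Toy setup: n = 4 one-dimensional data points, m = 2 threshold classifiers.
X = np.array([-2.0, -1.0, 1.0, 2.0])
Y = np.array([-1, -1, 1, 1])
classifiers = [
    lambda x: np.where(x > 0, 1, -1),    # h1: sign of x
    lambda x: np.where(x > 1.5, 1, -1),  # h2: threshold at 1.5
]

# A[i, j] = y_i * h_j(x_i): +1 if h_j classifies x_i correctly, -1 otherwise.
A = np.column_stack([Y * h(X) for h in classifiers])
print(A)
# [[ 1  1]
#  [ 1  1]
#  [ 1 -1]
#  [ 1  1]]
```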

Problem

$$p(\lambda)=\min\{w^TA\lambda : w\in\Delta_n\}, \quad \text{where } \Delta_n=\Big\{w \,\Big|\, \sum_{i=1}^{n} w_i=1,\; w_i \geq 0\Big\}$$
$$A\lambda=\begin{bmatrix}A_1 & \cdots & A_m\end{bmatrix}\begin{bmatrix}\lambda_1 \\ \vdots \\ \lambda_m\end{bmatrix}, \qquad w^TA\lambda=w^TA_1\lambda_1+\cdots+w^TA_m\lambda_m$$

where $A_j$ denotes the $j$-th column of $A$, i.e., the correctness pattern of classifier $h_j$ over the data.
$$\max_{\lambda \in \Delta_m}\Big\{p(\lambda)=\min_{w\in \Delta_n}w^TA\lambda\Big\}$$

$$\min_{w\in \Delta_n}\Big\{f(w)=\max_{\lambda \in \Delta_m}w^TA\lambda\Big\}$$

The first optimization problem is the primal problem, and the second is its dual.

Since $w$ ranges over the simplex, the inner minimum puts all of its weight on the smallest entry of $A\lambda$. If $(A\lambda)_i$ is lower than the other entries, the combined classifier does poorly on the data point $x_i$, so $p(\lambda)$ is the margin of the worst-classified data point. By shifting more weight in $\lambda$ toward the classifiers that fix this worst case, $p(\lambda)$ becomes bigger; maximizing over $\lambda$ therefore maximizes the minimum margin.
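Because a linear function on a simplex attains its extremes at vertices, $p(\lambda)$ reduces to the smallest entry of $A\lambda$ and $f(w)$ to the largest entry of $A^Tw$. A minimal sketch, reusing the toy $A$ from the notation section:

```python
import numpy as np

def p(lam, A):
    # min over w in the simplex of w^T A lam: the minimizer puts all
    # weight on the smallest entry, so p(lam) = min_i (A lam)_i.
    return np.min(A @ lam)

def f(w, A):
    # max over lam in the simplex of w^T A lam: the maximizer puts all
    # weight on the largest entry, so f(w) = max_j (A^T w)_j.
    return np.max(A.T @ w)

A = np.array([[1, 1], [1, 1], [1, -1], [1, 1]])
lam = np.array([0.5, 0.5])   # equal weight on both classifiers
w = np.full(4, 0.25)         # uniform weight on the data
print(p(lam, A))             # 0.0  (the worst data point has zero margin)
print(f(w, A))               # 1.0  (the best classifier is correct everywhere)
```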

To solve this problem, we can use the subgradient method or the mirror descent method.

Subgradient method

The optimization problem is defined as:

$$\max_{\lambda \geq 0} p(\lambda)$$
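Since $p(\lambda)=\min_i (A\lambda)_i$ is a pointwise minimum of linear functions, it is concave, and the row $A_{i^*}$ attaining the minimum is a supergradient at $\lambda$. A projected-subgradient sketch, where the step size $t_0/\sqrt{k}$ and the iteration count are assumed choices:

```python
import numpy as np

def subgradient_ascent(A, steps=200, t0=1.0):
    """Projected subgradient ascent on the concave function
    p(lam) = min_i (A lam)_i over the constraint set {lam >= 0}.

    A supergradient of p at lam is the row A[i*] achieving the minimum,
    and the projection onto {lam >= 0} is a componentwise clamp.
    """
    n, m = A.shape
    lam = np.ones(m)
    best_lam, best_val = lam.copy(), np.min(A @ lam)
    for k in range(1, steps + 1):
        i_star = int(np.argmin(A @ lam))    # worst-margin data point
        g = A[i_star]                       # supergradient of p at lam
        lam = np.maximum(lam + (t0 / np.sqrt(k)) * g, 0.0)  # ascent step + projection
        val = np.min(A @ lam)
        if val > best_val:                  # keep the best iterate seen so far
            best_val, best_lam = val, lam.copy()
    return best_lam, best_val
```

Note that over the unbounded set $\{\lambda \geq 0\}$, the objective can grow by simply scaling a good $\lambda$ upward; in practice $\lambda$ is normalized, e.g. restricted to $\Delta_m$ as in the primal above, which would replace the componentwise clamp with a projection onto the simplex.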
