A View of Regression

A regression modeling view of neighborhood-based (NB) methods. As a starting point, recall the heuristic item-based prediction: the rating of user $u$ for target item $t$ is a similarity-weighted combination of $u$'s ratings on the neighboring items $Q_t(u)$ that $u$ has rated.

$$\hat{r}_{ut}=\dfrac{\sum_{j\in Q_t(u)}\text{AdjustedCosine}(j,t)\cdot r_{uj}}{\sum_{j \in Q_t(u)}|\text{AdjustedCosine}(j,t)|}$$
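As a concrete illustration, here is a minimal Python sketch of this heuristic predictor. It assumes a small dense rating matrix `R` with `np.nan` for missing entries, and it takes $Q_t(u)$ to be all items $u$ has rated (no top-$k$ cut); both simplifications, and the function names, are assumptions for illustration only.

```python
import numpy as np

def adjusted_cosine(R, j, t):
    """Adjusted cosine between items j and t: center each rating by its
    *user's* mean, then take the cosine over users who rated both items."""
    mu = np.nanmean(R, axis=1)                      # per-user mean ratings
    mask = ~np.isnan(R[:, j]) & ~np.isnan(R[:, t])  # co-rating users
    if not mask.any():
        return 0.0
    sj = R[mask, j] - mu[mask]
    st = R[mask, t] - mu[mask]
    denom = np.linalg.norm(sj) * np.linalg.norm(st)
    return float(sj @ st / denom) if denom > 0 else 0.0

def predict_item_based(R, u, t):
    """r_hat_{ut}: similarity-weighted average of u's other ratings,
    normalized by the sum of absolute similarities as in the formula."""
    rated = [j for j in range(R.shape[1]) if j != t and not np.isnan(R[u, j])]
    sims = np.array([adjusted_cosine(R, j, t) for j in rated])
    ratings = np.array([R[u, j] for j in rated])
    denom = np.abs(sims).sum()
    return sims @ ratings / denom if denom > 0 else np.nan
```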

User-Based Nearest Neighbor Regression

The predicted rating is a weighted linear combination of other users' ratings of the same item. If $P_u(j)$ contains all users who have rated item $j$, this combination becomes similar to a linear regression. The difference is that linear regression finds its coefficients by solving an optimization problem, whereas the neighborhood-based recommender chooses them heuristically from the user-user similarities.

$$\hat{r}_{uj}=\mu_u+\dfrac{\sum_{v\in P_u(j)}\text{Sim}(u,v)\cdot s_{vj}}{\sum_{v \in P_u(j)}|\text{Sim}(u,v)|}, \qquad s_{vj}=r_{vj}-\mu_v$$

Replacing the heuristic similarity weights with learnable coefficients $w^{user}_{vu}$ turns the above expression into the regression model below; the coefficients are found by minimizing the squared prediction error, first per user and then over all $m$ users:

$$\hat{r}_{uj}=\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj}-\mu_v)$$

$$\min J_u=\sum_{j\in I_u}(r_{uj}-\hat{r}_{uj})^2=\sum_{j\in I_u}\Bigl(r_{uj}-\Bigl[\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu}\cdot (r_{vj}-\mu_v)\Bigr]\Bigr)^2$$

$$\min \sum^m_{u=1}J_u=\sum^m_{u=1}\sum_{j\in I_u}\Bigl(r_{uj}-\Bigl[\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu}\cdot (r_{vj}-\mu_v)\Bigr]\Bigr)^2$$
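A minimal sketch of solving $\min J_u$ for one target user, under the simplifying assumption that a single peer set $P$ rated every item in $I_u$, so the problem reduces to ordinary least squares; the name `fit_user_weights` and the argument layout are hypothetical.

```python
import numpy as np

def fit_user_weights(R_peers, r_u, mu_peers, mu_u):
    """R_peers: (|I_u| x |P|) peer ratings on u's items, column v = peer v.
    r_u: (|I_u|,) u's own ratings. Returns w with w[v] = w^{user}_{vu}."""
    X = R_peers - mu_peers          # mean-center each peer's column
    y = r_u - mu_u                  # mean-center the target
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```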

To reduce model complexity, a regularization term such as $\lambda \sum_{j \in I_u} \sum_{v \in P_u(j)} (w^{user}_{vu})^2$ can be added to the objective, just as in ridge regression. Note that $P_u(j)$ can differ vastly for the same user $u$ across item indices $j$, because of the extraordinary level of sparsity inherent in rating matrices. Consider a scenario in which, for target user $u$, only one similar user rated *Nero* whereas four similar users rated *Gladiator*. The regression coefficients $w^{user}_{vu}$ are then dominated by the ratings for *Gladiator* simply because it supplies more samples. This leads to overfitting, so the contribution of $P_u(j)$ needs to be rescaled.
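With the penalty added, the per-user problem becomes ridge regression, and its closed form shows how coefficients backed by few samples are shrunk toward zero. A sketch under the same dense-peer-set assumption as above; `lam` is a hypothetical tuning constant, not a value from the text.

```python
import numpy as np

def fit_user_weights_ridge(X, y, lam=1.0):
    """Closed-form ridge solution (X^T X + lam*I)^{-1} X^T y for the
    mean-centered design matrix X and target y built as above."""
    n_peers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_peers), X.T @ y)
```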

$$\hat{r}_{uj}\cdot \dfrac{|P_u(j)|}{k}=\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj}-\mu_v)$$

This expression predicts only the fraction $\frac{|P_u(j)|}{k}$ of target user $u$'s rating for item $j$, reflecting how many of the $k$ coefficients are actually observed. A related adjustment divides the weighted sum by $\sqrt{|P_u(j)|}$ instead:

$$\hat{r}_{uj}=b^{user}_u+\dfrac{\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj} - b^{user}_v)}{\sqrt{|P_u(j)|}}$$

Here the mean $\mu_v$ is replaced by a bias variable $b^{user}_v$ that is learned together with the coefficients. Adding an item bias $b^{item}_j$ as well gives:

$$\hat{r}_{uj}=b^{user}_u+b^{item}_j+\dfrac{\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj} - b^{user}_v - b^{item}_j)}{\sqrt{|P_u(j)|}}$$
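One way to fit this model is stochastic gradient descent over the observed ratings. The sketch below assumes `ratings` as $(u, j, r)$ triples and a precomputed `peers[u][j]` lookup for $P_u(j)$, and takes simplified gradient steps that treat the bias terms inside the centered peer ratings as constants; the hyperparameters are illustrative, not from the text.

```python
import numpy as np

def train_user_model(ratings, peers, n_users, n_items, lr=0.005, n_epochs=20):
    b_user = np.zeros(n_users)
    b_item = np.zeros(n_items)
    w = np.zeros((n_users, n_users))      # w[v, u] = w^{user}_{vu}; dense for clarity
    for _ in range(n_epochs):
        for u, j, r in ratings:
            P = peers[u][j]               # list of (v, r_vj) peer pairs
            if not P:
                continue
            norm = np.sqrt(len(P))
            pred = b_user[u] + b_item[j] + sum(
                w[v, u] * (r_vj - b_user[v] - b_item[j]) for v, r_vj in P
            ) / norm
            err = r - pred
            b_user[u] += lr * err         # simplified bias steps (no regularization)
            b_item[j] += lr * err
            for v, r_vj in P:             # exact gradient for the weights
                w[v, u] += lr * err * (r_vj - b_user[v] - b_item[j]) / norm
    return b_user, b_item, w
```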

Item-Based Nearest Neighbor Regression

The item-based model is symmetric: user $u$'s rating for target item $t$ is regressed on $u$'s ratings of the neighboring items $Q_t(u)$, with coefficients $w^{item}_{jt}$ learned by minimizing the squared error over the users $U_t$ who rated $t$:

$$\hat{r}_{ut}=\sum_{j \in Q_t(u)}w^{item}_{jt} \cdot r_{uj}$$

$$\min J_t=\sum_{u \in U_t} (r_{ut}-\hat{r}_{ut})^2=\sum_{u \in U_t} \Bigl(r_{ut}-\sum_{j \in Q_t(u)} w^{item}_{jt} \cdot r_{uj}\Bigr)^2$$

$$\min \sum_{t=1}^n\sum_{u \in U_t}\Bigl(r_{ut} -\sum_{j \in Q_t(u)} w^{item}_{jt}\cdot r_{uj}\Bigr)^2$$

With bias variables and the same $\sqrt{\cdot}$ normalization as in the user-based case:

$$\hat{r}_{ut}=b^{user}_u+b^{item}_t+\dfrac{\sum_{j \in Q_t(u)}w^{item}_{jt} \cdot (r_{uj}-b^{user}_u-b^{item}_j)}{\sqrt{|Q_t(u)|}}$$
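The unregularized item-based fit has the same least-squares shape as the user-based one. A sketch for a fixed target item $t$, assuming every user in $U_t$ rated all neighbor items in $Q_t$ (so the design matrix is dense); the name `fit_item_weights` is hypothetical.

```python
import numpy as np

def fit_item_weights(R_neighbors, r_t):
    """R_neighbors: (|U_t| x |Q_t|) ratings of U_t's users on the neighbor
    items; r_t: (|U_t|,) their ratings of t. Returns w[j] = w^{item}_{jt}."""
    w, *_ = np.linalg.lstsq(R_neighbors, r_t, rcond=None)
    return w
```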

Combined Method

The user-based and item-based predictions can be fused into a single model that shares the bias terms, where $B_{vt}=b^{user}_v+b^{item}_t$ and $B_{uj}=b^{user}_u+b^{item}_j$ denote the bias portions of the corresponding ratings:

$$\hat{r}_{ut}=b^{user}_{u}+b^{item}_t+\dfrac{\sum_{v \in P_u(t)} w^{user}_{vu} \cdot (r_{vt}-B_{vt})}{\sqrt{|P_u(t)|}}+\dfrac{\sum_{j\in Q_t(u)} w^{item}_{jt} \cdot (r_{uj}-B_{uj})}{\sqrt{|Q_t(u)|}}$$
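A sketch of the fused predictor, with `user_peers` and `item_peers` standing in for $P_u(t)$ and $Q_t(u)$ as lists of (index, rating) pairs; all names are illustrative.

```python
import numpy as np

def predict_combined(u, t, b_user, b_item, w_user, w_item,
                     user_peers, item_peers):
    """Both neighborhood terms subtract the shared bias B and are
    normalized by the square root of their peer-set size."""
    pred = b_user[u] + b_item[t]
    if user_peers:   # P_u(t): (v, r_vt) pairs
        pred += sum(w_user[v, u] * (r_vt - b_user[v] - b_item[t])
                    for v, r_vt in user_peers) / np.sqrt(len(user_peers))
    if item_peers:   # Q_t(u): (j, r_uj) pairs
        pred += sum(w_item[j, t] * (r_uj - b_user[u] - b_item[j])
                    for j, r_uj in item_peers) / np.sqrt(len(item_peers))
    return pred
```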
