A Regression Modeling View of Neighborhood Methods
$$\hat{r}_{ut}=\dfrac{\sum_{j\in Q_t(u)}\mathrm{AdjustedCosine}(j,t)\cdot r_{uj}}{\sum_{j \in Q_t(u)}|\mathrm{AdjustedCosine}(j,t)|}$$

User-Based Nearest Neighbor Regression
The predicted rating is a weighted linear combination of other ratings of the same item. If $P_u(j)$ contains all ratings of item $j$, this combination becomes similar to a linear regression. The difference is that linear regression finds its coefficients by solving an optimization problem, whereas the neighborhood recommender chooses its coefficients heuristically from the user-user similarities.
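As a concrete illustration, the heuristic scheme can be sketched in a few lines of numpy. The function name and all numbers below are illustrative, and the user-user similarities and mean-centered peer ratings are assumed to be precomputed:

```python
import numpy as np

def predict_heuristic(mu_u, sims, centered_ratings):
    """Heuristic NB prediction: the target user's mean plus a
    similarity-weighted average of peers' mean-centered ratings."""
    sims = np.asarray(sims, dtype=float)
    centered = np.asarray(centered_ratings, dtype=float)
    return mu_u + sims @ centered / np.abs(sims).sum()

# Target user's mean is 3.0; two peers rated the item 0.5 above and
# 1.0 below their own means, with similarities 0.8 and 0.2.
print(predict_heuristic(3.0, [0.8, 0.2], [0.5, -1.0]))  # 3.0 + (0.4 - 0.2)/1.0 = 3.2
```

Note that the weights here are fixed by the similarity function; the regression view below instead treats them as free parameters to be learned.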
$$\hat{r}_{uj}=\mu_u+\dfrac{\sum_{v\in P_u(j)}\mathrm{Sim}(u,v)\cdot s_{vj}}{\sum_{v \in P_u(j)}|\mathrm{Sim}(u,v)|}, \qquad s_{vj}=r_{vj}-\mu_v$$

Replacing the heuristic similarity weights with learned coefficients turns this into the expression below.
$$\hat{r}_{uj}=\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj}-\mu_v)$$

The coefficients are determined by minimizing the squared prediction error over user $u$'s observed ratings $I_u$, and then over all $m$ users:

$$\min J_u=\sum_{j\in I_u}(r_{uj}-\hat{r}_{uj})^2=\sum_{j\in I_u}\left(r_{uj}-\left[\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu}\cdot (r_{vj}-\mu_v)\right]\right)^2$$

$$\min \sum^m_{u=1}J_u=\sum^m_{u=1}\sum_{j\in I_u}\left(r_{uj}-\left[\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu}\cdot (r_{vj}-\mu_v)\right]\right)^2$$

To reduce model complexity, a regularization term such as $\lambda \sum_{j \in I_u} \sum_{v \in P_u(j)} (w^{user}_{vu})^2$ can be added, as in ridge regression. $P_u(j)$ can be vastly different for the same user $u$ across item indices $j$ because of the extraordinary level of sparsity inherent in rating matrices. Consider a scenario where only one similar user rated $Nero$ whereas four similar users rated $Gladiator$ for target user $u$. The regression coefficients $w^{user}_{vu}$ are then heavily influenced by the ratings for $Gladiator$ because it contributes more samples. This leads to overfitting, so a scaling method needs to be applied over $P_u(j)$.
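For a single target user, the ridge-regularized least-squares problem above has a closed-form solution. The design matrix, ratings, and penalty weight below are all toy values for illustration only:

```python
import numpy as np

# Toy setup for one target user u: 4 observed ratings, 2 peer users.
# X[j, v] = (r_vj - mu_v) if peer v rated item j (v in P_u(j)), else 0.
X = np.array([[ 0.5, -1.0],
              [ 1.0,  0.0],   # only peer 0 rated this item
              [ 0.0,  0.5],   # only peer 1 rated this item
              [-0.5,  1.0]])
mu_u = 3.0
y = np.array([3.4, 3.8, 3.1, 2.9]) - mu_u   # centered targets r_uj - mu_u

lam = 0.1  # ridge penalty weight (assumed hyperparameter)
# Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
preds = mu_u + X @ w
print(w, preds)
```

Zero-filling the rows where a peer did not rate the item is exactly what makes the learned $w^{user}_{vu}$ sensitive to how often each peer appears in $P_u(j)$, motivating the scaling discussed above.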
$$\hat{r}_{uj}\cdot \dfrac{|P_u(j)|}{k}=\mu_u+\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj}-\mu_v)$$

This expression predicts a fraction $\frac{|P_u(j)|}{k}$ of the rating of target user $u$ for item $j$.
$$\hat{r}_{uj}=b^{user}_u+\dfrac{\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj} - b^{user}_v)}{\sqrt{|P_u(j)|}}$$

Here the user mean $\mu_v$ is replaced by a bias variable $b^{user}_v$ that is learned together with the coefficients, and the neighborhood term is normalized by $\sqrt{|P_u(j)|}$.
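A minimal sketch of this bias-plus-$\sqrt{|P_u(j)|}$ prediction, with the bias values and learned weights assumed already given (all names and numbers are illustrative):

```python
import numpy as np

def predict_scaled(b_u, b_peers, weights, ratings):
    """User-based prediction with learned bias terms and the
    sqrt(|P_u(j)|) normalization that damps small neighborhoods."""
    w = np.asarray(weights, dtype=float)
    # Residual of each peer's rating after removing that peer's bias.
    resid = np.asarray(ratings, dtype=float) - np.asarray(b_peers, dtype=float)
    return b_u + (w @ resid) / np.sqrt(len(w))

# Single peer: bias 3.5, rating 4.5, weight 0.6.
print(predict_scaled(3.0, [3.5], [0.6], [4.5]))  # 3.0 + 0.6*(4.5-3.5)/sqrt(1) = 3.6
```

Dividing by $\sqrt{|P_u(j)|}$ rather than $|P_u(j)|$ is a compromise: it shrinks predictions built from small neighborhoods without scaling them all the way down to a fraction of the rating.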
$$\hat{r}_{uj}=b^{user}_u+b^{item}_j+\dfrac{\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj} - b^{user}_v - b^{item}_j)}{\sqrt{|P_u(j)|}}$$

Item-Based Nearest Neighbor Regression
$$\hat{r}_{ut}=\sum_{j \in Q_t(u)}w^{item}_{jt} \cdot r_{uj}$$

The item-item coefficients are learned from the analogous least-squares objectives, per target item $t$ and over all $n$ items:

$$\min J_t=\sum_{u \in U_t} (r_{ut}-\hat{r}_{ut})^2=\sum_{u \in U_t} \left(r_{ut}-\sum_{j \in Q_t(u)} w^{item}_{jt} \cdot r_{uj}\right)^2$$

$$\min \sum_{t=1}^n\sum_{u \in U_t}\left(r_{ut} -\sum_{j \in Q_t(u)} w^{item}_{jt}\cdot r_{uj}\right)^2$$

With bias variables and the same square-root scaling, the prediction becomes:

$$\hat{r}_{ut}=b^{user}_u+b^{item}_t+\dfrac{\sum_{j \in Q_t(u)}w^{item}_{jt} \cdot (r_{uj}-b^{user}_u-b^{item}_j)}{\sqrt{|Q_t(u)|}}$$

Combined Method
$$\hat{r}_{uj}=b^{user}_{u}+b^{item}_j+\dfrac{\sum_{v \in P_u(j)} w^{user}_{vu} \cdot (r_{vj}-B_{vj})}{\sqrt{|P_u(j)|}}+\dfrac{\sum_{j'\in Q_j(u)} w^{item}_{j'j} \cdot (r_{uj'}-B_{uj'})}{\sqrt{|Q_j(u)|}}$$

Here $B_{vj}=b^{user}_v+b^{item}_j$ denotes the combined bias, and $j'$ ranges over the items similar to the target item $j$ that user $u$ has rated.
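A small sketch of evaluating the combined prediction, with the weights, biases, and bias-removed residuals given as illustrative toy values (in practice they are all learned jointly):

```python
import numpy as np

def predict_combined(b_u, b_j, w_user, user_resid, w_item, item_resid):
    """Combined prediction: bias terms plus a sqrt-normalized
    user-based term and a sqrt-normalized item-based term."""
    user_part = np.dot(w_user, user_resid) / np.sqrt(len(w_user))
    item_part = np.dot(w_item, item_resid) / np.sqrt(len(w_item))
    return b_u + b_j + user_part + item_part

# Two similar users (residuals r_vj - B_vj) and one similar item
# (residual r_uj' - B_uj'), all values illustrative.
r_hat = predict_combined(3.0, 0.5, [0.4, 0.2], [0.5, -0.5], [0.3], [1.0])
print(r_hat)
```

Because both neighborhood terms share the same bias variables $b^{user}$ and $b^{item}$, the combined objective couples the user-based and item-based coefficients instead of fitting the two models independently.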