Subproblem

Bias correction and implicit feedback

Biases of user and item

$$\hat{r}_{ij}=o_i+p_j+\sum^k_{s=1}u_{is}\cdot v_{js}$$
  • $o_i$ is the general bias of user $i$ in rating items. It is positive for a generous rater and negative for a curmudgeon.

  • $p_j$ is the bias in the ratings of item $j$. Highly liked items have a large positive value, while disliked items have a negative value.

  • Part of the $(i,j)$th rating is explained by $o_i+p_j$, and the remainder is explained by the product of the latent variables.

$$e_{ij}=r_{ij}-\hat{r}_{ij}=r_{ij}-o_i-p_j-\sum^k_{s=1}u_{is}\cdot v_{js}$$

The regularization parameter $\lambda$ can differ across the user biases, the item biases, and the factor variables. Instead of keeping separate bias variables $o_i$ and $p_j$, we can simply increase the size of the factor matrices to incorporate these bias variables as follows:

$$u_{i,k+1}=o_i \;\; \forall i\in\{1,\dots,m\}$$
$$u_{i,k+2}=1 \;\; \forall i\in\{1,\dots,m\}$$
$$v_{j,k+1}=1 \;\; \forall j\in\{1,\dots,n\}$$
$$v_{j,k+2}=p_j \;\; \forall j\in\{1,\dots,n\}$$

Now $U$ is an $m\times (k+2)$ matrix and $V$ is an $n\times (k+2)$ matrix. The optimization problem becomes

$$\text{Minimize } J=\frac{1}{2}\sum_{(i,j)\in S}e_{ij}^2+\frac{\lambda}{2}\Big(\sum_{i=1}^m\sum_{s=1}^{k+2}u_{is}^2+\sum_{j=1}^n\sum_{s=1}^{k+2}v_{js}^2\Big)$$

where $S$ is the set of observed ratings, subject to the constraint that the $(k+2)$th column of $U$ and the $(k+1)$th column of $V$ contain only 1s.

Update algorithm

$$u_{iq} \Leftarrow u_{iq}+\alpha(e_{ij}\cdot v_{jq}-\lambda\cdot u_{iq}) \;\; \forall q\in\{1,\dots,k+2\}$$

$$v_{jq} \Leftarrow v_{jq}+\alpha(e_{ij}\cdot u_{iq}-\lambda\cdot v_{jq}) \;\; \forall q\in\{1,\dots,k+2\}$$

After each update, reset the entries in the $(k+2)$th column of $U$ and the $(k+1)$th column of $V$ to 1s.
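The following is a minimal NumPy sketch of this update algorithm, assuming the observed ratings arrive as $(i, j, r_{ij})$ triples; the function name and hyperparameter defaults are illustrative, not from the source.

```python
import numpy as np

def sgd_biased_mf(ratings, m, n, k, alpha=0.01, lam=0.02, epochs=50, seed=0):
    """SGD for biased matrix factorization via the augmented-matrix trick:
    column k of U (0-indexed) holds the user biases o_i and column k+1 is
    fixed to 1; column k of V is fixed to 1 and column k+1 holds the item
    biases p_j."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((m, k + 2))
    V = 0.1 * rng.standard_normal((n, k + 2))
    U[:, k + 1] = 1.0                    # fixed ones column of U
    V[:, k] = 1.0                        # fixed ones column of V
    for _ in range(epochs):
        for i, j, r in ratings:
            ui = U[i].copy()                       # use old values in both updates
            e = r - ui @ V[j]                      # e_ij = r_ij - predicted rating
            U[i] += alpha * (e * V[j] - lam * ui)  # updates all k+2 entries
            V[j] += alpha * (e * ui - lam * V[j])
            U[i, k + 1] = 1.0            # reset the fixed entries after each update
            V[j, k] = 1.0
    return U, V
```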

Adding such bias terms reduces overfitting in many cases.

"Of the numerous new algorithmic contributions, I would like to highlight one - those humble baseline predictors (or biases), which capture main effects in the data. While the literature mostly concentrates on the more sophisticated algorithmic aspects, we have learned that an accurate treatment of main effects is probably at least as significant as comping up with modeling breakthroughs. - Netflix Prize contest"

One can even use the biases alone in modeling: the predicted bias values can be subtracted from the rating matrix before applying collaborative filtering. This is similar to the row-wise mean centering used for bias correction in neighborhood models, but it is more sophisticated, because it adjusts for both user and item biases.
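As a sketch of this preprocessing step (assuming the bias vectors `o` and `p` have already been learned, e.g. with the routine above; the NaN convention for missing entries is also an assumption):

```python
import numpy as np

def subtract_biases(R, o, p):
    """Remove user and item biases from a rating matrix before applying
    collaborative filtering. R uses np.nan for missing entries; o and p
    are the learned user and item bias vectors."""
    residual = R - o[:, None] - p[None, :]          # r_ij - o_i - p_j
    return np.where(np.isnan(R), np.nan, residual)  # missing entries stay missing
```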

Implicit Feedback

Explicit feedback is an unambiguous signal such as a rating or a like, whereas implicit feedback is indirect. For example, whether a user clicks on an item is implicit feedback. However, even in cases in which users explicitly rate items, the identity of the items they choose to rate can be viewed as implicit feedback.

"Intuitively, a simple process could explain the results [showing the predictive value of implicit feedback]: users chose to rate songs they listen to, and listen to music they expect to like, while avoiding genres they dislike. Therefore, most of the songs that would get a bad ratings are not voluntarily rated by the users. Since people rarely listen to random songs, or rarely watch random movies, we should expect to observe in many areas a difference between the distribution of ratings for random items and the corresponding distribution for the items selected by the users." - R. Devooght, N. Kourtellis, and A. Mantrach. Dynamic matrix factorization with priors on unknown values. ACM KDD Conference, 2015.

Asymmetric factor models and SVD++ have been proposed to incorporate implicit feedback. Both use two item factor matrices, $V$ and $Y$, which reflect explicit and implicit feedback, respectively.

Solution: Asymmetric Factor Models

The basic idea of this model is that two users will have similar user factors if they have rated similar items, irrespective of the values of the ratings. The model can also incorporate other independent implicit feedback into the matrix $F$.

$$R\approx [FY]V^T$$
  • The $m\times n$ implicit feedback matrix $F$ is a row-scaled indicator matrix of the rating matrix: its $(i,j)$th entry is $1/\sqrt{|I_i|}$ if user $i$ has rated item $j$ and 0 otherwise, where $I_i$ is the set of items rated by user $i$.

  • The $n\times k$ implicit item-factor matrix $Y$: a large $(j,s)$th entry means that the act of rating item $j$ contains significant information about the affinity of that action for the $s$th latent component, no matter what the actual value of the rating might be.

  • The $n\times k$ explicit item-factor matrix $V$.

  • In the item-based parameterization, $[YV^T]$ can be viewed as an $n\times n$ item-to-item prediction matrix: it tells us how the act of selecting item $i$ affects the predicted rating of item $j$.

  • This model can work well for out-of-sample users, although it does not work for out-of-sample items. A minimal sketch of the construction follows below.
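A minimal NumPy sketch of this construction, assuming the rating matrix uses NaN for missing entries and that the factor matrices $Y$ and $V$ have already been learned (all names are illustrative):

```python
import numpy as np

def build_F(R):
    """Row-scaled implicit feedback matrix: F[i, j] = 1/sqrt(|I_i|) if user i
    has rated item j, and 0 otherwise (R uses np.nan for missing entries)."""
    observed = ~np.isnan(R)
    counts = observed.sum(axis=1, keepdims=True)      # |I_i| for each user
    return observed / np.sqrt(np.maximum(counts, 1))  # guard against empty rows

def predict_asymmetric(R, Y, V):
    """Predicted ratings under R ~ [FY] V^T. Because user representations are
    built purely from F, this also works for a new (out-of-sample) user row."""
    return (build_F(R) @ Y) @ V.T
```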

Solution: SVD++

$$R\approx (U+FY)V^T$$
  • $FY$ is used to adjust the explicit user-factor matrix $U$.

  • The implicit feedback component of the predicted rating is given by $(FY)V^T$.

The $(i,s)$th entry of $[FY]$ is given by $\sum_{h\in I_i}\frac{y_{hs}}{\sqrt{|I_i|}}$. This model can be viewed as a combination of the unconstrained matrix factorization model and the asymmetric factorization model. It differs from the model in the previous section in that it has an implicit feedback term together with its own regularizer.

Updates

$$u_{iq} \Leftarrow u_{iq}+\alpha(e_{ij}\cdot v_{jq}-\lambda\cdot u_{iq}) \;\; \forall q\in\{1,\dots,k+2\}$$

$$v_{jq} \Leftarrow v_{jq}+\alpha\Big(e_{ij}\cdot\Big[u_{iq}+\sum_{h\in I_i}\frac{y_{hq}}{\sqrt{|I_i|}}\Big]-\lambda\cdot v_{jq}\Big) \;\; \forall q\in\{1,\dots,k+2\}$$

$$y_{hq} \Leftarrow y_{hq}+\alpha\Big(\frac{e_{ij}\cdot v_{jq}}{\sqrt{|I_i|}}-\lambda\cdot y_{hq}\Big) \;\; \forall q\in\{1,\dots,k+2\},\;\forall h\in I_i$$
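A minimal NumPy sketch of one SGD step over a single observed rating, under the same conventions as the earlier sketch (the function name, the `I` lookup table of rated items, and the hyperparameter defaults are assumptions):

```python
import numpy as np

def svdpp_step(U, V, Y, I, i, j, r, alpha=0.01, lam=0.02):
    """One SGD step of SVD++ for a single observed rating r = r_ij.
    I[i] is an integer array of the items rated by user i. U, V, Y have
    k+2 columns when the bias trick of the earlier section is used, in
    which case the fixed 1s columns should be reset afterward as before."""
    rated = I[i]
    scale = 1.0 / np.sqrt(len(rated))
    fy = scale * Y[rated].sum(axis=0)          # the i-th row of FY
    e = r - (U[i] + fy) @ V[j]                 # error under R ~ (U + FY) V^T
    ui, vj = U[i].copy(), V[j].copy()          # use old values in all updates
    U[i] += alpha * (e * vj - lam * ui)
    V[j] += alpha * (e * (ui + fy) - lam * vj)
    Y[rated] += alpha * (e * scale * vj - lam * Y[rated])
    return e
```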

Solution: Non-negative Matrix Factorization

NMF provides great interpretability in implicit feedback settings. It is especially useful when there is a mechanism to specify a liking for an item but no mechanism to specify a dislike. In customer transaction data, for example, not buying an item does not necessarily imply a dislike, because the customer may still buy the item in the future.

Updates

$$u_{ij} \Leftarrow \frac{(RV)_{ij}\,u_{ij}}{(UV^TV)_{ij}+\epsilon} \;\; \forall i\in\{1,\dots,m\},\;\forall j\in\{1,\dots,k\}$$

$$v_{ij} \Leftarrow \frac{(R^TU)_{ij}\,v_{ij}}{(VU^TU)_{ij}+\epsilon} \;\; \forall i\in\{1,\dots,n\},\;\forall j\in\{1,\dots,k\}$$

$\epsilon$ is a small value such as $10^{-9}$ used to increase numerical stability. The entries of $U$ and $V$ are initialized to random values in $(0,1)$.
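A minimal NumPy sketch of these multiplicative updates (the function name and iteration count are illustrative; missing entries of `R` are assumed to be pre-filled with 0s, as discussed at the end of this section):

```python
import numpy as np

def nmf(R, k, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for R ~ U V^T with nonnegative factors."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.random((m, k))                    # initialized in (0, 1)
    V = rng.random((n, k))
    for _ in range(iters):
        U *= (R @ V) / (U @ V.T @ V + eps)    # u_ij <- (RV)_ij u_ij / ((UV^TV)_ij + eps)
        V *= (R.T @ U) / (V @ U.T @ U + eps)  # v_ij <- (R^TU)_ij v_ij / ((VU^TU)_ij + eps)
    return U, V
```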

Regularized version

The regularized version of the updates, with penalty parameters $\lambda_1$ and $\lambda_2$, is as follows:

$$u_{ij} \Leftarrow \max\Big\{\Big[\frac{(RV)_{ij}-\lambda_1 u_{ij}}{(UV^TV)_{ij}+\epsilon}\Big]u_{ij},\,0\Big\} \;\; \forall i\in\{1,\dots,m\},\;\forall j\in\{1,\dots,k\}$$

$$v_{ij} \Leftarrow \max\Big\{\Big[\frac{(R^TU)_{ij}-\lambda_2 v_{ij}}{(VU^TU)_{ij}+\epsilon}\Big]v_{ij},\,0\Big\} \;\; \forall i\in\{1,\dots,n\},\;\forall j\in\{1,\dots,k\}$$
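In code, these become drop-in replacements for the two update lines inside the loop of the `nmf` sketch above (`lam1` and `lam2` stand for $\lambda_1$ and $\lambda_2$):

```python
# Regularized multiplicative updates with projection onto the nonnegative orthant.
U = np.maximum(U * (R @ V - lam1 * U) / (U @ V.T @ V + eps), 0)
V = np.maximum(V * (R.T @ U - lam2 * V) / (V @ U.T @ U + eps), 0)
```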

  • In the example, there are two clear classes of items: dairy products and drinks.

  • All customers seem to like juice, but there is a high correlation between the user aspects and the buying aspects.

  • Customers 1 to 4 like dairy products, whereas customers 4 to 6 like drinks.

Each part of this sum can be viewed as a user-item co-cluster:

$$UV^T=\sum^k_{i=1}\bar{U}_i\bar{V}_i^T$$

where $\bar{U}_i$ and $\bar{V}_i$ denote the $i$th columns of $U$ and $V$, respectively.
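A quick NumPy check of this rank-1 decomposition (the factor matrices here are random stand-ins for learned ones):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.random((6, 2))        # stand-ins for learned nonnegative factors
V = rng.random((5, 2))
parts = [np.outer(U[:, s], V[:, s]) for s in range(2)]  # rank-1 co-cluster terms
assert np.allclose(sum(parts), U @ V.T)                 # their sum recovers U V^T
```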

Unlike explicit feedback data sets, the missing entries cannot be ignored in the optimization model, because implicit feedback data lacks negative feedback. NMF simply treats these missing entries as 0s. However, too many zeros can cause computational challenges in large matrices; this can be handled with an ensemble method or by down-weighting the zero entries.
