Subproblem
Bias correction and implicit feedback
Last updated
$o_i$ is the general bias of user $i$ to rate items. It is a positive value for a generous rater, and a negative value for a curmudgeon.
$p_j$ is the bias in the ratings of item $j$. Highly liked items have a larger positive value, while disliked items have a negative value.
Part of the rating is explained by $o_i + p_j$, and the remainder is explained by the product of the latent variables: $\hat{r}_{ij} = o_i + p_j + \sum_{s=1}^{k} u_{is} \cdot v_{js}$.
The regularization parameter can differ among user biases, item biases, and factor variables. Instead of keeping separate bias variables $o_i$ and $p_j$, we can simply increase the size of the factor matrices to incorporate these biases as follows: fix the $(k+1)$th column of $V$ and the $(k+2)$th column of $U$ to contain only 1s, so that $u_{i,k+1}$ plays the role of $o_i$ and $v_{j,k+2}$ plays the role of $p_j$.
Now $U$ is an $m \times (k+2)$ matrix and $V$ is an $n \times (k+2)$ matrix. The optimization problem changes as follows:

Minimize $J = \frac{1}{2} \sum_{(i,j) \in S} \left( r_{ij} - \sum_{s=1}^{k+2} u_{is} \cdot v_{js} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{m} \sum_{s=1}^{k+2} u_{is}^2 + \frac{\lambda}{2} \sum_{j=1}^{n} \sum_{s=1}^{k+2} v_{js}^2$

subject to the $(k+2)$th column of $U$ containing only 1s and the $(k+1)$th column of $V$ containing only 1s.
Adding such bias terms reduces overfitting in many cases.
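The biased model above can be fit with stochastic gradient descent on the observed entries. The sketch below is a minimal illustration (the function name `biased_mf` and all hyperparameter values are illustrative choices, not from the text); it keeps $o_i$ and $p_j$ as explicit variables rather than folding them into the factor matrices.

```python
import numpy as np

def biased_mf(ratings, num_users, num_items, k=2, lr=0.02, reg=0.05, epochs=500):
    """Fit r_ij ~ o_i + p_j + u_i . v_j by SGD on observed entries.

    ratings: list of (user, item, rating) triples (observed entries only).
    Returns user biases o, item biases p, and factor matrices U, V.
    """
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(num_users, k))
    V = rng.normal(scale=0.1, size=(num_items, k))
    o = np.zeros(num_users)   # user biases o_i
    p = np.zeros(num_items)   # item biases p_j
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - (o[i] + p[j] + U[i] @ V[j])
            o[i] += lr * (err - reg * o[i])
            p[j] += lr * (err - reg * p[j])
            # simultaneous update: both right-hand sides use the old U[i], V[j]
            U[i], V[j] = (U[i] + lr * (err * V[j] - reg * U[i]),
                          V[j] + lr * (err * U[i] - reg * V[j]))
    return o, p, U, V
```

A prediction for user $i$ and item $j$ is then `o[i] + p[j] + U[i] @ V[j]`.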
"Of the numerous new algorithmic contributions, I would like to highlight one - those humble baseline predictors (or biases), which capture main effects in the data. While the literature mostly concentrates on the more sophisticated algorithmic aspects, we have learned that an accurate treatment of main effects is probably at least as significant as coming up with modeling breakthroughs." - Netflix Prize contest
One can also use only the biases in modeling; by doing so, one can subtract the fitted values from the rating matrix before applying collaborative filtering. This is similar to row-wise mean centering for bias correction in a neighborhood model, but it is more sophisticated because it adjusts for both user and item biases.
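The bias-only preprocessing step can be sketched as follows. This is a lightweight stand-in for the least-squares bias fit: it uses observed-mean deviations (and assumes a global mean term, which is a common but additional modeling choice); the function name `bias_center` and the NaN-for-missing convention are illustrative.

```python
import numpy as np

def bias_center(R):
    """Subtract global mean, user bias, and item bias from observed entries.

    R: 2-D array with np.nan marking missing ratings.
    Returns the residual matrix plus the fitted (mu, user_bias, item_bias).
    """
    mu = np.nanmean(R)                                           # global mean
    user_bias = np.nanmean(R - mu, axis=1)                       # o_i
    item_bias = np.nanmean(R - mu - user_bias[:, None], axis=0)  # p_j
    residual = R - mu - user_bias[:, None] - item_bias[None, :]
    return residual, mu, user_bias, item_bias
```

Collaborative filtering is then applied to `residual`, and the biases are added back at prediction time.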
Explicit feedback is unambiguous feedback such as a rating or a like, whereas implicit feedback is more indirect. For example, whether one clicks on an item is implicit feedback. However, even in cases in which users explicitly rate items, the identity of the items they choose to rate can be viewed as implicit feedback.
"Intuitively, a simple process could explain the results [showing the predictive value of implicit feedback]: users chose to rate songs they listen to, and listen to music they expect to like, while avoiding genres they dislike. Therefore, most of the songs that would get a bad rating are not voluntarily rated by the users. Since people rarely listen to random songs, or rarely watch random movies, we should expect to observe in many areas a difference between the distribution of ratings for random items and the corresponding distribution for the items selected by the users." - R. Devooght, N. Kourtellis, and A. Mantrach. Dynamic matrix factorization with priors on unknown values. ACM KDD Conference, 2015.
Asymmetric factor models and SVD++ have been proposed to incorporate implicit feedback. They use two item-factor matrices, $V$ and $Y$, which reflect explicit and implicit feedback, respectively.
Solution: Asymmetric Factor Models
The basic idea of this model is that two users will have similar user factors if they have rated similar items, irrespective of the values of the ratings. This model can also incorporate other independent implicit feedback into the matrix $F$.
The implicit feedback matrix $F$ is an $m \times n$ row-scaled indicator of the rating matrix: its entry $(i, h)$ is $1/\sqrt{|I_i|}$ if user $i$ has rated item $h$ (where $I_i$ is the set of items rated by user $i$), and 0 otherwise.
The implicit item-factor matrix $Y$ ($n \times k$): if the element $y_{hs}$ is large, it means that the act of rating item $h$ contains significant information about the affinity of the acting user for the latent component $s$, no matter what the actual value of the rating might be.
The explicit item-factor matrix $V$ ($n \times k$): the user factors are derived as $U = FY$, so the predicted ratings are $\hat{R} = [FY]V^T$.
In the item-based parameterization, $YV^T$ can be viewed as an $n \times n$ item-to-item prediction matrix: its $(h, j)$ entry tells us how the action of selecting item $h$ affects the predicted rating of item $j$.
This model can work well for out-of-sample users, because their factors are derived entirely from which items they have rated; however, it does not work for out-of-sample items.
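The prediction step of the asymmetric factor model can be sketched as follows, assuming the only implicit signal is which entries of the rating matrix are observed (the function name `asymmetric_predict` is illustrative):

```python
import numpy as np

def asymmetric_predict(observed_mask, Y, V):
    """Asymmetric factor model prediction: R_hat = (F Y) V^T.

    observed_mask: m x n boolean array, True where user i rated item h.
    Y: n x k implicit item-factor matrix.
    V: n x k explicit item-factor matrix.
    """
    mask = observed_mask.astype(float)
    counts = mask.sum(axis=1, keepdims=True)       # |I_i| for each user
    F = mask / np.sqrt(np.maximum(counts, 1))      # row-scaled implicit feedback
    U = F @ Y                                      # derived user factors
    return U @ V.T
```

Out-of-sample users need only a new row of `observed_mask`; no retraining of $Y$ or $V$ is required for prediction.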
Solution: SVD++
In SVD++, the implicit feedback term $FY$ is used to adjust the explicit user-factor matrix $U$, so the user factors become $U + FY$.
The implicit feedback component of the predicted rating is given by $[FY]V^T$.
The $(i, j)$ entry of $\hat{R}$ is given by $\hat{r}_{ij} = o_i + p_j + \sum_{s=1}^{k} \left( u_{is} + \sum_{h \in I_i} \frac{y_{hs}}{\sqrt{|I_i|}} \right) v_{js}$. This model can be viewed as a combination of the unconstrained matrix factorization model and the asymmetric factorization model. Its implicit feedback term, together with the associated regularizer, is what distinguishes it from the model in the previous section.
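The SVD++ prediction above can be sketched as follows, again assuming the implicit signal is the observation pattern of the rating matrix (the function name `svdpp_predict` is illustrative):

```python
import numpy as np

def svdpp_predict(observed_mask, o, p, U, V, Y):
    """SVD++ prediction:
    r_hat_ij = o_i + p_j + (u_i + sum_{h in I_i} y_h / sqrt(|I_i|)) . v_j
    """
    mask = observed_mask.astype(float)
    counts = mask.sum(axis=1, keepdims=True)       # |I_i| for each user
    F = mask / np.sqrt(np.maximum(counts, 1))      # row-scaled implicit feedback
    adjusted_U = U + F @ Y                         # explicit factors + implicit term
    return o[:, None] + p[None, :] + adjusted_U @ V.T
```

Setting $Y = 0$ recovers the biased matrix factorization model, which makes the "combination" interpretation concrete.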
Solution: Non-negative Matrix Factorization
NMF provides great interpretability in implicit feedback settings. It is especially useful when there is a mechanism to specify a liking for an item but no mechanism to specify a dislike. In customer transaction data, not buying an item does not necessarily imply a dislike, because the customer may still buy it in the future.
There are two clear classes of items: dairy products and drinks.
All customers seem to like juice, and there is a high correlation between the user aspects and the buying behavior.
Customers 1 to 4 like dairy products, whereas customers 4 to 6 like drinks.
Unlike explicit feedback data sets, the missing entries cannot be ignored in the optimization model, because such data lack negative feedback. NMF simply treats these missing entries as 0. However, so many zeros can cause computational challenges in large matrices; this can be handled with an ensemble method or by giving the zero entries lower weight.
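The down-weighting idea can be sketched with weighted NMF using multiplicative updates. This is a minimal illustration, not a production solver: the function name `weighted_nmf`, the weight values, and the iteration count are illustrative choices.

```python
import numpy as np

def weighted_nmf(R, W, k=2, iters=500, eps=1e-9):
    """Weighted NMF: approximately minimize || W * (R - U V^T) ||_F^2, U, V >= 0.

    R: nonnegative implicit feedback matrix (1 = bought/clicked, 0 = missing).
    W: weight matrix; observed positives get weight 1, zeros a smaller weight.
    Uses weighted multiplicative updates, which preserve nonnegativity.
    """
    rng = np.random.default_rng(0)
    m, n = R.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    for _ in range(iters):
        P = U @ V.T
        U *= ((W * R) @ V) / (((W * P) @ V) + eps)
        P = U @ V.T
        V *= ((W * R).T @ U) / (((W * P).T @ U) + eps)
    return U, V
```

With weight 1 on the observed ones and a small weight (e.g. 0.2) on the zeros, the factorization fits the observed purchases closely while only loosely pushing unobserved entries toward 0.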