Neighborhood-Based Collaborative Filtering
This algorithm assumes that similar users show similar rating patterns. It can be categorized into two methods.
- User-based collaborative filtering (CF): the predicted ratings of user A are computed from the ratings of A's peer group.
- Item-based collaborative filtering (CF): the predicted ratings of user A are computed from A's own ratings on items similar to the target item.
There are two problems to solve:
- Predicting the rating value of a user-item combination
- Determining the top-k items or top-k users
These two problems are closely connected: to determine the top-k items for a user, one can predict the ratings of every item for that user. To solve the prediction problem, the neighborhood-based method makes use of similar-user/similar-item information.
| Rating type | Example |
| --- | --- |
| Continuous ratings | Real values, e.g., from -10 to 10 |
| Interval-based ratings | Integer values, e.g., from 1 to 5 |
| Ordinal ratings | Ordered categorical values |
| Binary ratings | Positive or negative |
| Unary ratings | Like button |
The elements of the rating matrix vary with the rating type. In the table above, the unary rating is implicit feedback; it should be treated as a PU (Positive and Unlabeled) learning problem.
The distribution of ratings often has a long-tail property: a few items are rated frequently while most others are not, which leads to a highly skewed distribution of the underlying ratings.
In many cases, high-frequency items yield less profit than lower-frequency items.
It is hard to provide robust rating predictions in the long-tail region because ratings there are rare, and many recommender systems tend to suggest popular items rather than infrequent ones.
Neighborhoods are often defined on high-frequency items and therefore do not reflect low-frequency items, which leads to misleading evaluations of the recommender system.
With this weighting correction, the Pearson expression also changes as follows, where the idf-style weights $w_k$ are defined later in this posting (re-read this part after you read the other parts!):

$$\mathrm{Pearson}(u,v)=\frac{\sum_{k\in I_u\cap I_v} w_k\cdot(r_{uk}-\mu_u)\cdot(r_{vk}-\mu_v)}{\sqrt{\sum_{k\in I_u\cap I_v} w_k\cdot(r_{uk}-\mu_u)^2}\cdot\sqrt{\sum_{k\in I_u\cap I_v} w_k\cdot(r_{vk}-\mu_v)^2}}$$
The system predicts the ratings of unspecified items for the target user.
This result implies that item 1 should be prioritized over item 6 as a recommendation to user 3, and that user 3 is likely to be interested in both movies 1 and 6 to a greater degree than in any of the movies this user has already rated.
However, after the mean-centered computation, we can see that the predicted preference for item 6 is the lowest among the items, the opposite of the result above. This suggests that user ratings contain biases, which should be corrected by mean-centering.
Similarity Function Variants
The Pearson correlation coefficient is preferable to the raw cosine because it adjusts for the bias effect through mean-centering.
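A minimal NumPy sketch (toy data and function names are my own) contrasting the two similarities for a pair of users with opposite preference patterns but similar rating levels:

```python
import numpy as np

def raw_cosine(u, v):
    """Cosine over co-rated items; u, v are rating vectors with NaN for missing."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    a, b = u[mask], v[mask]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def pearson(u, v):
    """Pearson over co-rated items, mean-centering each user by all of
    their own specified ratings (the convention adopted in this posting)."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    a = u[mask] - np.nanmean(u)
    b = v[mask] - np.nanmean(v)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom > 0 else 0.0

# Toy users: opposite tastes, but all ratings are positive numbers.
u = np.array([7, 6, 7, 4, 5, 4], dtype=float)
w = np.array([4, 5, 4, 7, 6, 7], dtype=float)
print(raw_cosine(u, w))  # ~0.90: raw cosine is fooled by the positive rating levels
print(pearson(u, w))     # -1.0: mean-centering exposes the opposite preferences
```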
Prediction Function
- The ratings are mean-centered.
- The user's own ratings on similar items are leveraged.
Example
This result suggests item 1 is more likely to be preferred by user 3 than item 6.
The actual process is to compute all possible rating predictions for the relevant user-item pairs and then rank them. This takes a lot of computational time, so recommender systems have an offline phase for these intermediate computations.
- Offline phase: the user-user (or item-item) similarity values and peer groups are computed.
  - Calculating similarity
- Online phase: the similarity values and peer groups are leveraged to make predictions.
  - Ranking items for a target user
  - Ranking users for a target item
The point is that the computationally heavy work has to be allocated to the offline phase.
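A hedged sketch of this allocation (class and method names are illustrative): the quadratic all-pairs similarity computation runs offline, and the online call is a cheap lookup plus a small weighted average:

```python
import numpy as np

class UserBasedCF:
    """Illustrative split of the heavy work (offline) from serving (online)."""

    def fit_offline(self, R, k=3):
        # Offline phase: all-pairs user similarities and top-k peer groups.
        # R is an m x n rating matrix with np.nan for unspecified entries.
        self.R = R
        self.mu = np.nanmean(R, axis=1)                        # per-user mean rating
        C = np.where(np.isnan(R), 0.0, R - self.mu[:, None])   # mean-centered, zero-filled
        norms = np.linalg.norm(C, axis=1) + 1e-12
        self.sim = (C @ C.T) / np.outer(norms, norms)          # Pearson-style similarities
        np.fill_diagonal(self.sim, -np.inf)                    # exclude self-similarity
        self.peers = np.argsort(-self.sim, axis=1)[:, :k]      # top-k peers per user

    def predict_online(self, u, j):
        # Online phase: cheap lookup + weighted average over precomputed peers.
        num = den = 0.0
        for v in self.peers[u]:
            if not np.isnan(self.R[v, j]):
                num += self.sim[u, v] * (self.R[v, j] - self.mu[v])
                den += abs(self.sim[u, v])
        return self.mu[u] + num / den if den > 0 else self.mu[u]
```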
Item-based methods have several advantages:
- They provide more relevant recommendations, because a user's own ratings are used to perform the recommendation.
- They provide a concrete reason for the recommendation. (Because you watched "Secrets of the Wings," [the recommendations are] <List> - Netflix)
- They are stable with respect to changes in the ratings, because the number of users is much larger than the number of items, and new users are likely to be added more frequently than new items in commercial systems.
One can also use a unified method with a combination function over users and items.
Dimensionality reduction is also useful here:
- Compressing the item dimension or the user dimension
- Alleviating the sparsity problem
- Determining the latent representations of both the row space and the column space
- Being computed using either SVD or PCA
Example (SVD method)
This table clearly shows that the ratings of Gladiator and Nero are the same, which means the correlation between these two movies is high. However, when filling in the unspecified values (as in the SVD method above), the covariance structure changes as follows. In this case we fill the missing values with 4 (the mean of [1, 7, 1, 7]).
This estimated covariance matrix seems wrong: the covariance between Godfather and Gladiator (4.36) is larger than that between Gladiator and Nero (2.18), which contradicts the rating table.
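A toy NumPy check of this distortion (hypothetical two-item data, not the exact table from this posting): two items rated identically by their common raters should have high covariance, but mean-filling dilutes it:

```python
import numpy as np

# Two items rated identically by the users who rated both (perfect correlation);
# NaN marks unspecified ratings.
gladiator = np.array([1, 7, 1, 7, np.nan, np.nan], dtype=float)
nero      = np.array([1, 7, np.nan, np.nan, 1, 7], dtype=float)

# Fill each missing value with the item mean (4), as in the example above.
g_f = np.where(np.isnan(gladiator), np.nanmean(gladiator), gladiator)
n_f = np.where(np.isnan(nero), np.nanmean(nero), nero)

print(np.cov(gladiator[:2], nero[:2])[0, 1])  # 18.0 over the co-rated entries
print(np.cov(g_f, n_f)[0, 1])                 # 3.6 after filling: covariance is diluted
```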
As remedies for this bias, the following methods have been suggested:
- Estimating the covariance matrix based on a generative model. (Generative model is a term used in semi-supervised learning; such a model focuses on the distribution of each class.)
- A more direct approach to bias correction is to use a matrix factorization method. When the matrix is sparse, the estimate of the covariance matrix becomes unreliable because it loses robustness.
This algorithm uses an $m \times n$ user-item rating matrix $R$ with $m \cdot n$ elements. Most of the elements are missing in recommendation problems, so we need to handle a sparse matrix $R$.
The point is that the most important part of recommendation is estimating the missing entries of the rating matrix $R$.
To resolve this long-tail problem, we can borrow a key concept from the Inverse Document Frequency (idf). Let $m_j$ be the number of ratings of item $j$, and let $m$ be the total number of users. This yields a weight for each item:

$$w_j = \log\left(\frac{m}{m_j}\right), \quad \forall j \in \{1, \dots, n\}$$
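A small sketch of this weighting (assuming the standard idf form $w_j = \log(m/m_j)$; the function name is mine):

```python
import numpy as np

def item_weights(R):
    """w_j = log(m / m_j), where m is the number of users and
    m_j the number of users who rated item j (NaN = unspecified)."""
    m = R.shape[0]
    m_j = np.sum(~np.isnan(R), axis=0)  # number of ratings per item
    return np.log(m / m_j)              # long-tail items get larger weights
```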
$I_u$ is the set of item indices for which ratings have been specified by user $u$, and the mean rating of user $u$ is $\mu_u = \sum_{k \in I_u} r_{uk} / |I_u|$. The Pearson similarity between users $u$ and $v$ is then:

$$\mathrm{Sim}(u,v)=\mathrm{Pearson}(u,v)=\frac{\sum_{k\in I_u\cap I_v}(r_{uk}-\mu_u)\cdot(r_{vk}-\mu_v)}{\sqrt{\sum_{k\in I_u\cap I_v}(r_{uk}-\mu_u)^2}\cdot\sqrt{\sum_{k\in I_u\cap I_v}(r_{vk}-\mu_v)^2}}$$
In many implementations, $\mu_u$ and $\mu_v$ are calculated in a pairwise fashion during the Pearson computation, i.e., over the items rated by both user $u$ and user $v$. Alternatively, one can calculate $\mu_u$ over all of user $u$'s ratings, ignoring user $v$, as in the basic setting; this variant can be evaluated even over a single common item. The calculations in this posting adopt the latter approach.
To determine the similar group for user $u$, we can select the $k$ users with the highest Pearson coefficients with user $u$. However, the closest users can vary by item, so it is better to re-select the top users separately for each predicted item.
After defining the similar group, we predict the rating for user $u$ using a weighted average of the ratings from that group. For this calculation, we must account for the fact that some users tend to rate highly while others usually rate low. To correct this bias, the raw ratings are mean-centered in a row-wise fashion:

$$\hat{r}_{uj}=\mu_u+\frac{\sum_{v\in P_u(j)}\mathrm{Sim}(u,v)\cdot(r_{vj}-\mu_v)}{\sum_{v\in P_u(j)}\left|\mathrm{Sim}(u,v)\right|}$$

Here $P_u(j)$ is the set of the $k$ closest users to target user $u$ who have specified ratings for item $j$. (When some users in this set have low or negative correlations with the target user, we can eliminate them from the set as a heuristic enhancement.)
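A minimal sketch of this prediction, re-selecting the peer group $P_u(j)$ per target item as suggested above (function names are my own; a precomputed user-user similarity matrix is assumed):

```python
import numpy as np

def predict_user_based(R, sim, u, j, k=2):
    """r_hat_{uj} = mu_u + weighted average of mean-centered peer ratings.
    R: m x n ratings with NaN for missing; sim: m x m user similarities."""
    mu = np.nanmean(R, axis=1)
    # P_u(j): among users who rated item j, keep the k most similar to u
    # (heuristic: peers with non-positive correlation are dropped).
    raters = [v for v in range(R.shape[0])
              if v != u and not np.isnan(R[v, j]) and sim[u, v] > 0]
    peers = sorted(raters, key=lambda v: -sim[u, v])[:k]
    if not peers:
        return mu[u]
    num = sum(sim[u, v] * (R[v, j] - mu[v]) for v in peers)
    den = sum(abs(sim[u, v]) for v in peers)
    return mu[u] + num / den
```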
Two users may have only a small number of ratings in common. In this case, the similarity value should be reduced to de-emphasize the importance of such pairs. For this purpose, we can multiply the similarity by the discount factor $\min\{|I_u \cap I_v|,\ \beta\}/\beta$, called significance weighting.
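As a sketch, significance weighting is just a multiplicative discount on the similarity (variable names are mine):

```python
def discounted_sim(sim_uv, n_common, beta=10):
    """DiscountedSim(u, v) = Sim(u, v) * min(|I_u ∩ I_v|, beta) / beta.
    Pairs with fewer than beta common ratings are de-emphasized."""
    return sim_uv * min(n_common, beta) / beta
```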
The predicted score $\hat{r}_{uj}$ can be used to rank the items in order of desirability for user $u$, even though in some cases the predicted values fall outside the range of permissible ratings.
Furthermore, the connectivity can be amplified by raising the similarity to a power, i.e., using $\mathrm{Sim}(u,v)^{\alpha}$ with $\alpha > 1$. Also, by treating ratings as categorical values, we can approach recommendation as a classification problem.
$Q_t(u)$ is the set of the top-$k$ items matching item $t$ that user $u$ has rated. The concept is the same as in the user-based methods.
1. Determine the items similar to target item $t$.
2. Predict $\hat{r}_{ut}$ from user $u$'s own ratings on those items (see the sketch below):

$$\hat{r}_{ut}=\frac{\sum_{j\in Q_t(u)}\mathrm{Sim}(j,t)\cdot r_{uj}}{\sum_{j\in Q_t(u)}\left|\mathrm{Sim}(j,t)\right|}$$
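A matching sketch for the item-based route (names are my own; the similarity here is the adjusted cosine, i.e., cosine on row-mean-centered ratings, which is the usual choice for item-based CF):

```python
import numpy as np

def predict_item_based(R, u, t, k=2):
    """Predict r_{ut} from user u's own ratings on the items most similar to t."""
    mu = np.nanmean(R, axis=1)
    C = np.where(np.isnan(R), 0.0, R - mu[:, None])  # row-mean-centered, zero-filled
    norms = np.linalg.norm(C, axis=0) + 1e-12
    sim = (C.T @ C) / np.outer(norms, norms)         # adjusted cosine, n x n
    # Q_t(u): among the items u has rated, keep the k most similar to t
    rated = [j for j in range(R.shape[1]) if j != t and not np.isnan(R[u, j])]
    Q = sorted(rated, key=lambda j: -sim[j, t])[:k]
    num = sum(sim[j, t] * R[u, j] for j in Q)
    den = sum(abs(sim[j, t]) for j in Q)
    return num / den if den > 0 else mu[u]
```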
(Here $n'$ is the maximum number of ratings specified by a user, so $O(n')$ is the maximum running time for computing the similarity between a pair of users; the offline phase for user-based methods therefore takes $O(m^2 \cdot n')$ time.)
1. Fill the incomplete matrix $R$ with the mean of the corresponding row or column, obtaining the filled matrix $R_f$.
2. Compute the $n \times n$ similarity matrix $S = R_f^{T} R_f$, where $R_f$ is the filled matrix from the previous step.
3. Decompose $S$ into $S = P \Delta P^{T}$, where $P$ is an $n \times n$ matrix whose columns are the orthonormal eigenvectors of $S$ and $\Delta$ is the diagonal matrix of eigenvalues. Let $P_d$ be the $n \times d$ matrix containing only the columns of $P$ corresponding to the $d$ largest eigenvalues.
4. $R_f P_d$ is an $m \times d$ matrix, which represents the $m$ users in a $d$-dimensional space. We can use this matrix when we derive the peer group of each user (see the sketch below).
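A NumPy sketch of these four steps (function and variable names are my own; `np.linalg.eigh` is used since $S$ is symmetric):

```python
import numpy as np

def low_rank_user_representation(R, d=2):
    """Fill R, form S = Rf^T Rf, keep the d leading eigenvectors,
    and project the users into the d-dimensional latent space."""
    row_mean = np.nanmean(R, axis=1)
    Rf = np.where(np.isnan(R), row_mean[:, None], R)  # step 1: filled matrix R_f
    S = Rf.T @ Rf                                     # step 2: n x n similarity matrix
    evals, P = np.linalg.eigh(S)                      # step 3: eigenvectors (ascending order)
    Pd = P[:, np.argsort(evals)[::-1][:d]]            # columns for the d largest eigenvalues
    return Rf @ Pd                                    # step 4: m users in d dimensions
```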
In the PCA method, the covariance matrix of $R_f$ is used instead of the similarity matrix $R_f^{T} R_f$. The difference is that computing the covariance matrix requires mean-centering (of each column), which has the benefit of reducing bias.
The incomplete matrix $R$ can also be directly projected onto the reduced matrix $P_d$, rather than projecting the filled matrix $R_f$ onto $P_d$: the $j$-th coordinate of user $u$ is the averaged contribution of user $u$'s specified ratings to the $j$-th latent vector.
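A minimal sketch of this direct projection, assuming the coordinate is the average $a_{uj}=\sum_{i\in I_u} r_{ui}\,p_{ij}\,/\,|I_u|$ over user $u$'s specified ratings (this exact normalization is my assumption, not from the source):

```python
import numpy as np

def project_incomplete(R, Pd):
    """Project the incomplete R onto the reduced basis Pd without filling:
    each coordinate is averaged over the user's *specified* ratings only.
    (The averaging convention here is an assumption.)"""
    A = np.zeros((R.shape[0], Pd.shape[1]))
    for u in range(R.shape[0]):
        I_u = ~np.isnan(R[u])                      # items rated by user u
        A[u] = R[u, I_u] @ Pd[I_u] / I_u.sum()     # averaged contribution per latent vector
    return A
```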