Neighborhood-Based method

Neighborhood-Based Collaborative Filtering

Introduction

Motivation

This algorithm assumes that similar users show similar rating patterns. It can be categorized into two methods.

  • User-based collaborative filtering (CF): the predicted ratings of user A are computed from the ratings of A's peer group.

  • Item-based collaborative filtering (CF): the predicted ratings of user A for a target item are computed from items similar to that item.

This algorithm uses the user-item rating matrix R, whose elements are r_{uj}. Most of these elements are missing in a recommendation problem, so we need to handle a sparse matrix R.

Problem

There are two problems to solve:

  • Predicting the rating value of a user-item combination

  • Determining the top-k items or top-k users

These two problems are closely connected: to determine the top-k items for one user, one can predict that user's rating of each item. To solve the prediction problem, the neighborhood-based method makes use of similar-user or similar-item information.

Rating Matrices

The most important part of a recommendation problem is how the rating matrix R is defined.

Rating methods

  • Continuous ratings: real values, e.g. from -10 to 10

  • Interval-based ratings: integer values, e.g. from 1 to 5

  • Ordinal ratings: ordered categorical values

  • Binary ratings: positive or negative

  • Unary ratings: implicit feedback, e.g. a Like button

The meaning of the elements in the rating matrix varies with the type of rating. Of these, unary ratings are a form of implicit feedback and should be treated as a PU learning problem (Positive and Unlabeled learning).

Rating distribution

The rating distribution often has a long-tail property: a few items are rated frequently while most are not, which leads to a highly skewed distribution of the underlying ratings.

  • In many cases, high-frequency items yield relatively little profit compared to lower-frequency items.

  • It is hard to provide robust rating predictions in the long-tail region because ratings there are rare. Many recommender systems therefore tend to suggest popular items rather than infrequent ones.

  • Neighborhoods are often defined by high-frequency items, which do not reflect the low-frequency items. This leads to misleading evaluations of the recommender system.

To mitigate this long-tail problem, we can borrow the key idea of Inverse Document Frequency (idf). Let m_j be the number of ratings of item j and m be the total number of users. This defines a weight for each item:

w_j=\log\left(\dfrac{m}{m_j}\right), \;\;\; \forall j\in\{1,...,n\}

With this weighting, the Pearson expression changes as follows (re-read this part after you have read the other parts!):

Pearson(u,v)=\dfrac{\Sigma_{k\in I_u \cap I_v}w_k\cdot(r_{uk}-\mu_u)\cdot(r_{vk}-\mu_v)}{\sqrt{\Sigma_{k\in I_u \cap I_v}w_k\cdot(r_{uk}-\mu_u)^2} \cdot \sqrt{\Sigma_{k\in I_u \cap I_v}w_k\cdot(r_{vk}-\mu_v)^2}}
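As a rough illustration, the sketch below computes the idf-style weights w_j and the weighted Pearson correlation on a small NumPy rating matrix in which np.nan marks missing entries. The function names and the matrix layout are illustrative assumptions, not something from the original text.

```python
import numpy as np

def idf_weights(R):
    """w_j = log(m / m_j): m is the number of users, m_j the number of observed ratings of item j."""
    m = R.shape[0]
    m_j = np.sum(~np.isnan(R), axis=0)           # number of ratings per item
    return np.log(m / np.maximum(m_j, 1))        # guard against items with no ratings

def weighted_pearson(R, u, v, w):
    """Item-weighted Pearson correlation between users u and v."""
    common = ~np.isnan(R[u]) & ~np.isnan(R[v])   # items rated by both users
    if not common.any():
        return 0.0
    du = R[u, common] - np.nanmean(R[u])         # mean over ALL items rated by u
    dv = R[v, common] - np.nanmean(R[v])
    wk = w[common]
    denom = np.sqrt(np.sum(wk * du**2)) * np.sqrt(np.sum(wk * dv**2))
    return float(np.sum(wk * du * dv) / denom) if denom > 0 else 0.0
```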

Prediction

User-based model

The system predicts the rating of an item that the target user has not yet rated.

Basic setting

I_u is the set of item indices for which ratings have been specified by user u. For example:

I_1=\{1,2,3,4,5,6\}, \; I_3=\{2,3,4,5\}, \; I_1\cap I_3=\{2,3,4,5\}

\mu_u=\dfrac{\Sigma_{k\in I_u}r_{uk}}{|I_u|} \;\;\; \forall u\in \{1,...,m\}

Correlation

Sim(u,v)=Pearson(u,v)=\dfrac{\Sigma_{k\in I_u \cap I_v}(r_{uk}-\mu_u)\cdot(r_{vk}-\mu_v)}{\sqrt{\Sigma_{k\in I_u \cap I_v}(r_{uk}-\mu_u)^2} \cdot \sqrt{\Sigma_{k\in I_u \cap I_v}(r_{vk}-\mu_v)^2}}

In many implementations, \mu_u and \mu_v are calculated in a pairwise fashion during the Pearson computation, i.e., only over the items rated by both user u and user v. Alternatively, \mu_u can be computed over all items rated by u, ignoring user v, as in the basic setting above; this remains well defined even when the two users share only a single common item. The calculations in this post adopt the latter approach.

To determine the peer group of user u, we can select the k users with the highest Pearson coefficient with u. However, the closest users can vary by item, so it is better to re-select the k users separately for each predicted item.

Prediction function

\hat{r}_{uj}=\mu_u+\dfrac{\Sigma_{v\in P_u(j)}Sim(u,v)\cdot s_{vj}}{\Sigma_{v \in P_u(j)}|Sim(u,v)|}, \;\; s_{vj}=r_{vj}-\mu_v

After defining the peer group, we predict the rating for user u using a weighted average of the ratings from that group. For this calculation, we must account for the fact that some users tend to rate items highly while others tend to rate them low. To correct this bias, the raw ratings are mean-centered in a row-wise fashion. P_u(j) is the set of the k closest users to the target user u who have specified ratings for item j. (When some users in this set have low or negative correlations with the target user, they can be eliminated from the set as a heuristic enhancement.)
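A minimal sketch of the full user-based pipeline (Pearson similarity, per-item peer selection, mean-centered weighted average), assuming a NumPy matrix with np.nan for unspecified ratings; the helper names and the tiny matrix at the bottom are made up for illustration and are not the example table from this post.

```python
import numpy as np

def pearson(R, u, v):
    """Pearson correlation over the items rated by both u and v,
    with each user's mean taken over all of that user's ratings."""
    common = ~np.isnan(R[u]) & ~np.isnan(R[v])
    if not common.any():
        return 0.0
    du = R[u, common] - np.nanmean(R[u])
    dv = R[v, common] - np.nanmean(R[v])
    denom = np.sqrt(np.sum(du**2)) * np.sqrt(np.sum(dv**2))
    return float(np.sum(du * dv) / denom) if denom > 0 else 0.0

def predict_user_based(R, u, j, k=2):
    """Mean-centered weighted average over the k most similar users
    who have rated item j (the peer group is re-selected per item)."""
    mu_u = np.nanmean(R[u])
    raters = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, j])]
    peers = sorted(raters, key=lambda v: pearson(R, u, v), reverse=True)[:k]
    num = sum(pearson(R, u, v) * (R[v, j] - np.nanmean(R[v])) for v in peers)
    den = sum(abs(pearson(R, u, v)) for v in peers)
    return mu_u if den == 0 else mu_u + num / den

# tiny illustrative matrix (np.nan = unspecified rating)
R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [np.nan, 1, 5, 4]], dtype=float)
print(predict_user_based(R, u=0, j=2))   # predicted rating of user index 0 for item index 2
```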

Example

Cosine(1,3)=\dfrac{6*3+7*3+4*1+5*1}{\sqrt{6^2+7^2+4^2+5^2}\cdot \sqrt{3^2+3^2+1^2+1^2}}
Pearson(1,3)=\dfrac{(6-5.5)*(3-2)+(7-5.5)*(3-2)+(4-5.5)*(1-2)+(5-5.5)*(1-2)}{\sqrt{0.5^2+1.5^2+(-1.5)^2+(-0.5)^2}\cdot \sqrt{1^2+1^2+(-1)^2+(-1)^2}}

Cosine(1,3)=0.956, \;\; Pearson(1,3)=0.894

\hat{r}_{31}=\dfrac{7*0.894+6*0.939}{0.894+0.939}\approx 6.49, \;\; \hat{r}_{36}=\dfrac{4*0.894+4*0.939}{0.894+0.939}=4

This result implies that item 1 should be prioritized over item 6 as a recommendation to user 3, and user 3 is likely to be interested in both movies 1 and 6 to a greater degree than any of the movies this user has already rated.

\hat{r}_{31}=2+\dfrac{1.5*0.894+1.2*0.939}{0.894+0.939} \approx 3.35, \;\; \hat{r}_{36}=2+\dfrac{-1.5*0.894-0.8*0.939}{0.894+0.939}\approx 0.86

However, after the mean-centered computation, the predicted preference for item 6 is the lowest among all items, which is the opposite of the result above. This suggests that user ratings contain biases, which should be corrected through mean-centering.

Variation

Similarity Function Variants

RawCosine(u,v)=\dfrac{\Sigma_{k \in I_u \cap I_v} r_{uk} \cdot r_{vk}}{\sqrt{\Sigma_{k\in I_u \cap I_v} r^2_{uk}}\cdot \sqrt{\Sigma_{k\in I_u \cap I_v} r^2_{vk}}}
RawCosine(u,v)=\dfrac{\Sigma_{k\in I_u \cap I_v} r_{uk} \cdot r_{vk}}{\sqrt{\Sigma_{k\in I_u} r^2_{uk}} \cdot \sqrt{\Sigma_{k \in I_v} r^2_{vk}}}

The Pearson correlation coefficient is preferable to the raw cosine because it adjusts for rating bias through mean-centering.

DiscountedSim(u,v)=Sim(u,v)\cdot \dfrac{\min\{|I_u \cap I_v|,\beta\}}{\beta}

Two users can have only a small number of ratings in common. In this case, the similarity value should be reduced to de-emphasize the importance of such a user pair. For this purpose, we can multiply by the discount factor \min\{|I_u \cap I_v|, \beta\}/\beta, called significance weighting, as in the sketch below.
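A one-function sketch of significance weighting: it takes an already-computed similarity value (e.g. from the Pearson helper above) and shrinks it by the discount factor. The function name and the default beta are illustrative.

```python
import numpy as np

def discounted_sim(sim_uv, R, u, v, beta=5):
    """Scale a precomputed similarity sim_uv by min(|I_u ∩ I_v|, beta) / beta,
    so similarities based on few common ratings count less."""
    n_common = int(np.sum(~np.isnan(R[u]) & ~np.isnan(R[v])))
    return sim_uv * min(n_common, beta) / beta
```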

Prediction Function

\sigma_u=\sqrt{\dfrac{\Sigma_{j\in I_u}(r_{uj}-\mu_u)^2}{|I_u|-1}}, \;\; \forall u\in \{1,...,m\}
\hat{r}_{uj}=\mu_u+\sigma_u \dfrac{\Sigma_{v\in P_u(j)} Sim(u,v) \cdot z_{vj}}{\Sigma_{v\in P_u(j)}|Sim(u,v)|}, \;\; z_{uj}=\dfrac{r_{uj}-\mu_u}{\sigma_u}=\dfrac{s_{uj}}{\sigma_u}

The Z-score can be used to rank items in order of desirability for user u, even though the predicted values may sometimes fall outside the range of permissible ratings.

Furthermore, the similarity can be amplified with Sim(u,v)=Pearson(u,v)^\alpha. Also, by treating ratings as categorical values, we can approach the recommendation task as a classification problem.
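A sketch of the Z-score prediction with optional Sim^\alpha amplification, assuming a user-user similarity function such as the Pearson helper above is passed in; names and default values are illustrative.

```python
import numpy as np

def predict_zscore(R, u, j, sim, k=2, alpha=1.0):
    """Z-score prediction: standardize each peer's rating, average with (amplified)
    similarity weights, then rescale by the target user's sigma_u."""
    mu_u = np.nanmean(R[u])
    sigma_u = np.nanstd(R[u], ddof=1)
    raters = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, j])]
    peers = sorted(raters, key=lambda v: sim(R, u, v), reverse=True)[:k]
    num = den = 0.0
    for v in peers:
        s_uv = sim(R, u, v)
        s_uv = np.sign(s_uv) * abs(s_uv) ** alpha            # Sim^alpha amplification
        z_vj = (R[v, j] - np.nanmean(R[v])) / np.nanstd(R[v], ddof=1)
        num += s_uv * z_vj
        den += abs(s_uv)
    return mu_u if den == 0 else mu_u + sigma_u * num / den
```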

Item-Based Models

AdjustedCosine(i,j)=\dfrac{\Sigma_{u\in U_i \cap U_j}s_{ui}\cdot s_{uj}}{\sqrt{\Sigma_{u \in U_i \cap U_j} s^2_{ui}}\cdot \sqrt{\Sigma_{u\in U_i \cap U_j}s^2_{uj}}}

Here the ratings s_{ui} = r_{ui} - \mu_u are mean-centered per user.

\hat{r}_{ut}=\dfrac{\Sigma_{j\in Q_t(u)}AdjustedCosine(j,t)\cdot r_{uj}}{\Sigma_{j \in Q_t(u)}|AdjustedCosine(j,t)|}

Q_t(u) is the set of the top-k items most similar to item t that user u has rated. The procedure mirrors the user-based method (see the sketch after this list):

  1. Determine similar items for item tt

  2. Leverage the user's own ratings on similar items

  3. Predict \hat{r}_{ut}
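The item-based steps above might look like the following sketch: adjusted cosine on row-mean-centered ratings, then a weighted average of the user's own raw ratings on the most similar items. The function names are illustrative, and np.nan again marks unspecified ratings.

```python
import numpy as np

def adjusted_cosine(R, i, j):
    """Cosine between item columns after row-wise (per-user) mean-centering."""
    S = R - np.nanmean(R, axis=1, keepdims=True)       # s_ui = r_ui - mu_u
    common = ~np.isnan(S[:, i]) & ~np.isnan(S[:, j])   # users who rated both items
    if not common.any():
        return 0.0
    si, sj = S[common, i], S[common, j]
    denom = np.sqrt(np.sum(si**2)) * np.sqrt(np.sum(sj**2))
    return float(np.sum(si * sj) / denom) if denom > 0 else 0.0

def predict_item_based(R, u, t, k=2):
    """Weighted average of the user's OWN raw ratings on the k items most similar to t."""
    rated = [j for j in range(R.shape[1]) if j != t and not np.isnan(R[u, j])]
    Q = sorted(rated, key=lambda j: adjusted_cosine(R, j, t), reverse=True)[:k]
    num = sum(adjusted_cosine(R, j, t) * R[u, j] for j in Q)
    den = sum(abs(adjusted_cosine(R, j, t)) for j in Q)
    return float(np.nanmean(R[u])) if den == 0 else num / den
```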

Example

AdjustedCosine(1,3)=1.5∗1.5+(−1.5)∗(−0.5)+(−1)∗(−1)1.52+(−1.5)2+(−1)2⋅1.52+(−0.5)2+(−1)2=0.912AdjustedCosine(1,3)=\dfrac{1.5*1.5+(-1.5)*(-0.5)+(-1)*(-1)}{\sqrt{1.5^2+(-1.5)^2+(-1)^2}\cdot\sqrt{1.5^2+(-0.5)^2+(-1)^2}}=0.912
r^31=3∗0.735+3∗0.9120.735+0.912=3,    r^36=1∗0.829+1∗0.7300.829+0.730=1\hat{r}_{31}=\dfrac{3*0.735+3*0.912}{0.735+0.912}=3 ,\;\; \hat{r}_{36}=\dfrac{1*0.829+1*0.730}{0.829+0.730}=1

This result suggests item 1 is more likely to be preferred by user 3 than item 6.

Actual Implementation

The actual process is to compute all possible rating predictions for the relevant user-item pairs and then rank them. This requires a lot of computational time, so recommender systems use an offline phase to precompute the intermediate results; a sketch of this split follows the list below.

  • Offline phase: the user-user(or item-item) similarity values and peer groups are computed.

  • Online phase: Similarity values and peer groups are leveraged to make predictions
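A minimal sketch of this offline/online split. To keep the offline step a single matrix product, it approximates the pairwise Pearson similarity with a mean-centered cosine over zero-filled deviation vectors; the class and method names are illustrative, not an established API.

```python
import numpy as np

class UserBasedCF:
    """Cache similarities and peer lists offline; the online phase is an O(k) lookup."""

    def fit(self, R, k=2):                               # offline phase
        self.R, self.k = R, k
        self.mu = np.nanmean(R, axis=1)
        S = R - self.mu[:, None]                          # mean-centered ratings
        S = np.where(np.isnan(S), 0.0, S)                 # treat missing entries as zero deviation
        norms = np.linalg.norm(S, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        self.sim = (S / norms) @ (S / norms).T            # user-user similarity matrix
        np.fill_diagonal(self.sim, -np.inf)               # exclude self-similarity
        self.peers = np.argsort(-self.sim, axis=1)[:, :k] # top-k peers per user
        return self

    def predict(self, u, j):                              # online phase
        num = den = 0.0
        for v in self.peers[u]:
            if np.isnan(self.R[v, j]):
                continue
            w = self.sim[u, v]
            num += w * (self.R[v, j] - self.mu[v])
            den += abs(w)
        return self.mu[u] if den == 0 else self.mu[u] + num / den
```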

Time complexity

  • Calculating similarity (offline): O(m^2 \cdot n') for user-based, O(n^2 \cdot m') for item-based

  • Online computation of a single prediction: O(k) for both

  • Ranking items for a target user: O(k \cdot n) for both

  • Ranking users for a target item: O(k \cdot m) for both

( n' is the maximum number of ratings specified by any user, so computing the similarity between a pair of users takes O(n') time; m' is the analogous maximum number of ratings received by any item. )

The point is that the computationally heavy work has to be allocated to the offline phase.

Advantages of item-based methods

  • Provides more relevant recommendations, because a user's own ratings are used to perform the recommendation.

  • Provides a concrete reason for the recommendation.(Because you watched "Secrets of the Wings," [the recommendations are] <List> - Netflix)

  • More stable with respect to changes in the ratings, because the number of users is much larger than the number of items, and in commercial systems new users are added more frequently than new items.

One can also use a unified method that combines user-based and item-based information in a single combination function.

Dimensionality Reduction

  • Compressing the item dimension or the user dimension

  • Alleviating the sparsity problem

  • Determining the latent representations of both the row space and the column space

  • Being computed using either SVD or PCA

Example (SVD method)

  1. Fill the incomplete m \times n matrix R with the mean of the corresponding row (or column), producing the filled matrix R_f.

  2. Compute the n \times n matrix S = R_f^T R_f.

  3. Decompose S into P \Delta P^T, where P is an n \times n matrix whose columns are orthonormal eigenvectors of S. Let P_d be the n \times d matrix containing only the columns of P corresponding to the d largest eigenvalues.

  4. R_f P_d is an m \times d matrix that represents the m users in a d-dimensional space. We can use this matrix when we derive the peer group of each user (see the sketch below).
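A compact sketch of the four steps, assuming a NumPy matrix with np.nan for missing ratings; the eigenvectors are obtained with numpy.linalg.eigh since S is symmetric, and the function name is illustrative.

```python
import numpy as np

def svd_peer_space(R, d=2):
    """Steps 1-4: fill missing entries with row means, form S = Rf^T Rf,
    keep the top-d eigenvectors, and project users into the d-dimensional space."""
    row_mean = np.nanmean(R, axis=1, keepdims=True)
    Rf = np.where(np.isnan(R), row_mean, R)            # step 1: filled matrix
    S = Rf.T @ Rf                                      # step 2: n x n similarity matrix
    eigvals, P = np.linalg.eigh(S)                     # step 3: S = P Delta P^T (symmetric)
    Pd = P[:, np.argsort(eigvals)[::-1][:d]]           # columns for the d largest eigenvalues
    return Rf @ Pd                                     # step 4: m x d user representation

# peer groups can then be formed with ordinary distances in this reduced space
```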

In the PCA variant, the covariance matrix of R_f is used instead of the similarity matrix R_f^T R_f. The difference is that computing the covariance matrix requires mean-centering, which helps reduce bias.

Bias

This rating table clearly shows that the ratings of Gladiator and Nero are the same, which means the correlation between these two movies is high. However, when unspecified values are filled in (as in the SVD method above), the covariance structure changes as follows. In this case we fill the missing values with 4 (the mean of [1, 7, 1, 7]).

Estimated Covariance Matrix

This estimated covariance matrix seems wrong, because the covariance between Godfather and Gladiator (4.36) is larger than the one between Gladiator and Nero (2.18), which does not match the rating table.

As remedies for this bias, the following methods are suggested.

Estimating the covariance matrix based on a generative model. (Generative model is a term also used in semi-supervised learning; it focuses on the distribution of each class.)

Estimated Covariance Matrix (from maximum likelihood estimation)

R can be projected directly onto the reduced matrix P_d, rather than projecting the filled matrix R_f onto P_d. Here a_{ui} is the averaged contribution of user u to the i-th latent vector:

a_{ui}=\dfrac{\Sigma_{j\in I_u} r_{uj} e_{ji}}{|I_u|}
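A sketch of this direct projection, assuming P_d holds the chosen eigenvectors (for example from the covariance estimate above) and R uses np.nan for missing entries; the function name is illustrative.

```python
import numpy as np

def project_sparse(R, Pd):
    """a_ui = (sum over the items u actually rated of r_uj * e_ji) / |I_u|,
    i.e. project the raw sparse R onto the latent vectors instead of filling it first."""
    A = np.zeros((R.shape[0], Pd.shape[1]))
    for u in range(R.shape[0]):
        I_u = ~np.isnan(R[u])                          # items rated by user u
        if I_u.any():
            A[u] = R[u, I_u] @ Pd[I_u] / I_u.sum()     # averaged contribution per latent vector
    return A
```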

A more direct approach to bias correction is to use a matrix factorization method. When the matrix is sparse, estimating the covariance matrix becomes unreliable and loses robustness.

R = Q\Sigma P^T
