A Brief Introduction to Collaborative Filtering

2019-11-21

Recommender Systems (RS) suggest items that are likely to interest a target user. These suggestions can apply to any area that involves decision-making.

YouTube video suggestions, Spotify recommendations, and the related products a shopping site offers you after a purchase are all the result of a recommender system.

There are several approaches to recommender systems, each with its own pros and cons: Content-Based, Collaborative Filtering, Knowledge-Based, Demographic, and Hybrid systems. In this article we will focus only on collaborative filtering, using movie recommendations as the example.

 

It is like getting a movie recommendation from a friend whose taste in cinema you trust.

As we know, people tend to take movie recommendations from friends or other people. The basic idea of collaborative filtering is to analyze people's shared interests in domain-specific items (in our case, the items are movies). Calculating how similar two people's taste in cinema is, or how similar two movies are to each other, allows us to make plausible recommendations. These two approaches are known as user-based collaborative filtering (UB-CF) and item-based collaborative filtering (IB-CF), respectively.

For the rest of this article, we will use a simple hypothetical scenario with two people, Person A and Person B, and two movies, Movie X and Movie Y.

 

 

User-Based Collaborative Filtering

In real life, Person A asks Person B for a good movie to watch. If Person A likes the suggested movie, we assume that Person A will be more likely to ask Person B for new suggestions in the future.

In our case, we first define a threshold: the minimum number of movies that two people must both have watched. It was found that choosing 25 as the threshold could significantly improve the accuracy of the predicted ratings, and that a value of 50 gave the best results [2,3].

We will assume that the number of movies watched by both Person A and Person B exceeds our threshold. We then calculate the similarity of their taste in cinema with the Pearson correlation, and we assume that the two have a positive correlation. In that case, we consider the two people neighbours.
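The neighbour check described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the cited papers: the ratings, the movie names M1 to M4, and the function name `pearson_similarity` are all hypothetical, and the threshold is set to 3 only so that the toy data passes it (the text above discusses values of 25 and 50).

```python
from math import sqrt

MIN_COMMON = 3  # illustrative only; the article discusses thresholds of 25 and 50


def pearson_similarity(ratings_a, ratings_b, min_common=MIN_COMMON):
    """Pearson correlation over commonly rated movies, or None if it cannot be computed."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < min_common:
        return None  # not enough commonly watched movies
    a = [ratings_a[m] for m in common]
    b = [ratings_b[m] for m in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when one person gave identical ratings throughout
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings on a 1-5 scale
person_a = {"M1": 5, "M2": 3, "M3": 4, "M4": 1}
person_b = {"M1": 4, "M2": 2, "M3": 5, "M4": 1, "Movie X": 5}

sim = pearson_similarity(person_a, person_b)
are_neighbours = sim is not None and sim > 0
print(sim, are_neighbours)
```

With this toy data the two people share four movies and their ratings move in the same direction, so they end up as neighbours.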

After that, if one of Person A's neighbours (here, Person B) likes Movie X, recommending Movie X to Person A is likely to be a good suggestion.

Furthermore, with enough information and a good algorithm, we can also make a plausible prediction of the rating that Person A will give to Movie X. This approach is called user-based collaborative filtering (UB-CF). At the lowest level, UB-CF takes two people, compares their ratings on the movies they have both watched, and decides whether they are neighbours. After all the neighbours have been considered, the algorithm decides whether a specific movie can be recommended to Person A.
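One common way to turn neighbours into a predicted rating is a similarity-weighted average of how far each neighbour's rating of the movie sits above or below that neighbour's own mean. The sketch below assumes this mean-centred formula; the article does not say which formula is used, and the similarity values, ratings, `predict_rating`, and `LIKE_THRESHOLD` are hypothetical.

```python
LIKE_THRESHOLD = 4.0  # hypothetical cut-off for "worth recommending"


def predict_rating(target_mean, neighbours, movie):
    """Predict a rating from (similarity, ratings) pairs of neighbours.

    Uses a mean-centred weighted average, which is an assumption of this sketch.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sim, ratings in neighbours:
        if sim <= 0 or movie not in ratings:
            continue  # only positively correlated neighbours who rated the movie count
        neighbour_mean = sum(ratings.values()) / len(ratings)
        weighted_sum += sim * (ratings[movie] - neighbour_mean)
        weight_total += sim
    if weight_total == 0:
        return None  # no usable neighbour
    return target_mean + weighted_sum / weight_total


person_a_mean = 3.25  # Person A's average rating (hypothetical)
neighbours = [
    (0.86, {"M1": 4, "M2": 2, "M3": 5, "M4": 1, "Movie X": 5}),  # e.g. Person B
    (0.50, {"M1": 3, "M2": 3, "Movie X": 4}),                    # another neighbour
]

predicted = predict_rating(person_a_mean, neighbours, "Movie X")
recommend = predicted is not None and predicted >= LIKE_THRESHOLD
print(predicted, recommend)
```

Centring on each person's mean compensates for the fact that some people rate generously and others harshly. A real system would also clamp the result to the rating scale.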


 

Item-Based Collaborative Filtering

Item-Based Collaborative Filtering (IB-CF), on the other hand, analyzes the similarity of two movies by comparing the ratings that the same users gave to both.

In our case, let's say Movie X and Movie Y have both been rated by 50 different people. First, the similarity between the two movies is calculated. If the two movies are positively correlated, we say that Movie X and Movie Y are neighbour items.
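The same correlation idea can be applied to movies instead of people. The sketch below computes a Pearson correlation between two movies over the users who rated both; the user names, the ratings, and the function name `item_similarity` are hypothetical. Other similarity measures, such as adjusted cosine, are also used in practice.

```python
from math import sqrt


def item_similarity(ratings, movie_x, movie_y):
    """Pearson correlation between two movies over users who rated both."""
    pairs = [
        (r[movie_x], r[movie_y])
        for r in ratings.values()
        if movie_x in r and movie_y in r
    ]
    if len(pairs) < 2:
        return None  # need at least two co-rating users
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined when a movie got identical ratings from everyone
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
ratings = {
    "u1": {"Movie X": 5, "Movie Y": 4},
    "u2": {"Movie X": 3, "Movie Y": 3},
    "u3": {"Movie X": 1, "Movie Y": 2},
    "u4": {"Movie X": 4},  # did not rate Movie Y, so ignored
}

movie_sim = item_similarity(ratings, "Movie X", "Movie Y")
neighbour_items = movie_sim is not None and movie_sim > 0
print(movie_sim, neighbour_items)
```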

In item-based methods, the rating predicted for a movie is based on the ratings given to similar movies. Consequently, recommender systems using this approach tend to recommend items related to those the user usually appreciates. For instance, in our case, movies that share a genre, actors, or a director with those the user rated highly are likely to be recommended. While this may lead to safe recommendations, it reduces the chance of discovering a movie from a genre that the user has never watched before.

Although user-based methods are relatively riskier than item-based methods, they are more likely to make serendipitous recommendations.

At Pixly, we calculate the similarities of movies with various machine learning methods. We also attach great importance to our users' serendipitous experiences. You will see two different ‘similar’ sections on movie pages. One is ‘movie recommendations’, which is based on user-based collaborative filtering; the other is ‘similar movies’, which is based on content-based recommendations.
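As a rough sketch of that prediction step, the user's own ratings of neighbour items can be averaged, weighted by how similar each item is to the target movie. The weighted-average formula, the similarity values, the movie names Z and W, and the function name `predict_from_similar_items` are assumptions made for illustration.

```python
def predict_from_similar_items(user_ratings, item_sims):
    """Weighted average of the user's ratings on items similar to the target movie.

    item_sims maps a movie to its similarity with the target movie.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for movie, sim in item_sims.items():
        if sim > 0 and movie in user_ratings:
            weighted_sum += sim * user_ratings[movie]
            weight_total += sim
    if weight_total == 0:
        return None  # the user has rated no neighbour item
    return weighted_sum / weight_total


# Hypothetical data: what the user rated, and how similar each movie is to Movie X
user_ratings = {"Movie Y": 5, "Movie Z": 2}
sims_to_movie_x = {"Movie Y": 0.9, "Movie Z": 0.1, "Movie W": 0.8}  # Movie W is unrated

item_based_prediction = predict_from_similar_items(user_ratings, sims_to_movie_x)
print(item_based_prediction)
```

Because Movie Y is much closer to Movie X than Movie Z is, the user's high rating of Movie Y dominates the prediction. This is also why item-based results tend to stay close to what the user already likes.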

 

 

References

  1. Ricci, F., Rokach, L., Shapira, B.: Recommender Systems Handbook, 2nd edn.

  2. Herlocker, J., Konstan, J.A., Riedl, J.: An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf. Retr. 5(4), 287–310 (2002)

  3. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In: SIGIR ’99: Proc. of the 22nd Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 230–237. ACM, New York, NY, USA (1999)