Here’s the app accompanying this blog post.
As the Internet gets more and more popular, information overload poses an important challenge for many online services. With all of the information pouring out of the web, users can be overwhelmed and confused as to what, exactly, they should be paying attention to. A recommendation system provides a solution when a lot of useful content becomes too much of a good thing: it helps users discover information of interest by analyzing their historical behaviors. More and more online companies, including Netflix, Google, and Facebook, are integrating recommendation systems into their services to help users discover and select information that may be of particular interest to them.
With literally tens of thousands of hours of premium video content, users are also prone to content overload. Given the wide variety of content available on the service at any one time, it may be difficult for users to discover new videos that best match their historical interests. So the first goal of a recommendation system is to help users find content that will be of interest to them.
In addition to serving users, a recommendation system should also help content owners promote their videos. Part of the mission is to deliver a service that users, advertisers, and content owners all unabashedly love. The service has many different content partners, and these partners want users to watch their videos, especially when new videos are released. By using personalized recommendation instead of more traditional promotion, we can promote video content more effectively to the users who are most likely to enjoy it.
Data Characteristics
Before explaining the design of a recommendation system, it’s crucial to explain some characteristics of the data.
Since a lot of the content consists of episodes or clips within a show, we decided to recommend shows to users instead of individual videos. Shows are a good unit of organization, and videos in the same show are usually very closely related.
Content can be mainly divided into two parts: on-air shows and library shows. On-air shows are highly important since more than half of streaming comes from them.
Although on-air shows account for a large part of the content, they are subject to a seasonal effect. During the summer months, most on-air shows do not air, causing on-air streaming to decrease. There are also fewer shows aired during weekends, so streaming of library shows increases. Keeping this in mind, we can design the recommendation system to recommend more library shows during weekends or the summer months, for example.
The key data that drives most recommendation systems is user behavior data. There are two main types of user behavior data: implicit user feedback data and explicit user feedback data.
Explicit user feedback data primarily includes user voting data. Explicit feedback shows a user’s preference for a show directly.
Implicit feedback data includes information on users watching, browsing, searching, etc. Implicit feedback data does not show a user’s preference for a show explicitly. For example, if a user gives a 5-star rating to a show, we know that this user likes the show very much. But if a user only watches a video from a show page or searches for a show, we don’t know whether this user actually likes the show.
As the quantity of implicit feedback data far outweighs the amount of explicit feedback, the system should be designed primarily to work with implicit feedback data.
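As a minimal sketch of how implicit signals might be turned into the per-show preference weights used later in this post, consider the toy function below. The event names and weight values are illustrative assumptions, not the production configuration.

```python
from collections import defaultdict

# Assumed relative strength of each implicit signal (illustrative only).
EVENT_WEIGHTS = {"watch": 1.0, "favorite": 0.8, "search": 0.3, "browse": 0.1}

def preference_weights(events):
    """events: iterable of (user_id, show_id, event_type) tuples."""
    prefs = defaultdict(float)  # (user, show) -> accumulated preference weight
    for user, show, event in events:
        prefs[(user, show)] += EVENT_WEIGHTS.get(event, 0.0)
    return prefs

events = [("u1", "family_guy", "watch"),
          ("u1", "family_guy", "favorite"),
          ("u1", "american_dad", "search")]
print(preference_weights(events))  # family_guy outweighs american_dad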
Architecture
There are many different types of recommendation algorithms, and perhaps the most famous algorithm is collaborative filtering (CF). CF relies on user behavior data, and its main idea is to predict user preferences by analyzing their behaviors. There are two types of CF methods: user-based CF (UserCF) and item-based CF (ItemCF).
UserCF assumes that a user will prefer items which are liked by other users who have similar preferences to that user.
ItemCF assumes that a user will prefer items similar to the items he or she preferred previously. ItemCF is widely used by many others (for example, Amazon and Netflix), as it has two main advantages. First, it is suitable for sites where there are many more users than items. Second, ItemCF can easily explain its recommendations using users’ historical behaviors. For example, if you have watched “Family Guy,” it will recommend “American Dad” and tell you that it recommends this show because you watched “Family Guy.” Hence, we chose ItemCF as the basic recommendation algorithm for our Movies Recommender System application.
Online Architecture
Fig 1 shows the online architecture of the recommendation system. This system contains 5 main modules (a minimal sketch of how they fit together follows the list):
- User profile builder: When a user first comes into the recommendation system, it builds a profile of them. The profile includes the user’s historical behaviors and topic preferences, which are generated from their past behaviors. Users can have many different types of behaviors: they can watch videos, add shows to favorites, search for videos, and vote on videos and shows. All these behaviors are considered by the system and, after extracting them, it uses a topic model trained offline to generate the user’s preferences on topics.
- Recommendation Core: After generating the list of the user’s historical preferences on shows and topics, the core collects shows similar to them into a set of raw recommendations.
- Filtering: For some pretty obvious reasons, raw recommendation results cannot be presented to users directly. The filtering module removes shows the user has already seen or engaged with, which makes the recommended shows a little more precise.
- Ranking: The ranking module re-ranks the raw recommendations to better fit the user’s preferences. First, it makes the recommendations more diverse. Then it increases the novelty of the recommendations so that users will find shows they like but have never seen before.
- Explanation: Explanation is one of the most crucial components of every recommendation system. The explanation module generates some reasoning for every recommendation result using the user’s historical behaviors. For example, it will recommend “American Dad” to a user who had previously watched “Family Guy.” The explanation will say, “We recommend ‘American Dad’ to you because you have watched ‘Family Guy’.”
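Here is a minimal, runnable sketch of how these five modules might chain together. All function names and data structures are illustrative assumptions, not the production code.

```python
def build_profile(user_id, history):
    """1. Profile builder: collect the user's past shows."""
    return {"user": user_id, "seen": set(history.get(user_id, []))}

def recommendation_core(profile, related_table):
    """2. Core: pull shows related to anything the user has seen."""
    raw = {}
    for show in profile["seen"]:
        for related, sim in related_table.get(show, []):
            raw[related] = raw.get(related, 0.0) + sim
    return raw

def filter_seen(raw, profile):
    """3. Filtering: drop shows the user already engaged with."""
    return {s: w for s, w in raw.items() if s not in profile["seen"]}

def rank(candidates):
    """4. Ranking: order by score (diversity/novelty would adjust here)."""
    return sorted(candidates, key=candidates.get, reverse=True)

def explain(show, related_table, profile):
    """5. Explanation: name a watched show that triggered this result."""
    for seen in profile["seen"]:
        if any(r == show for r, _ in related_table.get(seen, [])):
            return f"Because you watched {seen}"
    return "Popular on the service"

history = {"u1": ["family_guy"]}
related = {"family_guy": [("american_dad", 0.9), ("south_park", 0.7)]}
profile = build_profile("u1", history)
ranked = rank(filter_seen(recommendation_core(profile, related), profile))
print([(s, explain(s, related, profile)) for s in ranked])
```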
Offline Architecture
In the above online architecture, some components rely on offline resources, such as the topic model, related table, and feedback model. The offline system is also an important part of the recommendation system. It has these main components:
- Data Center: The data center contains all user behavior data. Some of them are stored in Hadoop clusters and some of them are stored in a relational database.
- Related Table Generator: The related table is an important resource for online recommendation. We use two main types of related tables: one based on collaborative filtering (which we’ll call CF) and another based on content.
In CF, show A and show B will have high similarity if users who like show A also like show B.
With content filtering, we use content information including title, description, channel, company, actor/actress, and tags.
- Topic Model: A topic is represented by a group of shows that have similar content. Topics are thus larger in scope than shows, but they’re still smaller than channels. The topics are learned by LDA (Latent Dirichlet Allocation), a popular topic model in machine learning from the dimension reduction family of algorithms; a toy sketch follows this list.
- Feedback Analyzer: Feedback specifically means users’ reactions to recommendation results, and using it can improve recommendation quality. For example, say a show is recommended to many users, but most of them do not click on it. In that case, the analyzer will decrease the rank of this show. Users also have different types of behavior, and we use all of them in developing the recommendations. Some users may prefer recommendations to come from their prior watch history, and others may prefer recommendations to come from their voting behavior. All these effects can be modeled offline by analyzing users’ feedback on their recommendations.
- Report Generator: Evaluation is the most important part of the recommendation system. The report generator will generate a report including multiple metrics each day to show the quality of recommendations.
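As a toy illustration of the topic model, the sketch below learns topics from show metadata using scikit-learn’s LDA implementation. The documents and topic count are small, made-up assumptions; the production model is trained on far richer metadata.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# One "document" per show: title, description, and tags concatenated.
docs = [
    "animated comedy family suburbs satire",
    "animated comedy spy family satire",
    "crime drama detective murder investigation",
    "crime drama police investigation thriller",
]
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
show_topics = lda.fit_transform(X)   # rows: shows, cols: topic weights
print(show_topics.round(2))          # each show's mixture over 2 topics
```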
Algorithms
So far, we’ve given a brief overview of the recommendation architecture. As the discussion shows, the system is primarily based on ItemCF. We have added many improvements on top of the ItemCF algorithm to make it generate better recommendations. To test these improvements, we’ve performed many A/B tests on different algorithms. In the following sections, we’ll introduce some of these algorithms and the experiment results.
Item-based Collaborative Filtering
Item-based Collaborative Filtering (ItemCF) is the basis of all our algorithms. In ItemCF, let N(u) be the set of items user u has preferred previously. User u’s preference for item i (where i is not in N(u)) can then be measured by:
$$p(u,i)=\sum_{j\in N(u)} r(u,j)\,s(i,j)$$
Here, r(u,j) is the preference weight of user u on show j, and s(i,j) is the similarity between show i and show j. In CF, the similarity between two shows is calculated from user behavior data on those two shows. Let N(i) be the set of users who watched show i and N(j) be the set of users who watched show j. Then the similarity s(i,j) between show i and show j is calculated by the following formula:
$$s(i,j)=\frac{\left| N(i)\cap N(j) \right|}{\sqrt{\left| N(i) \right|\left| N(j) \right|}}$$
In this definition, show i will be highly relevant to show j if most users who watch show i also watch show j. However, this definition suffers from the “Harry Potter problem”: every show tends to have high relevance to very popular shows.
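The sketch below is a direct implementation of the two formulas above, using toy watch data. Note how the popular show scores highly even from a single watched show, which is exactly the Harry Potter problem.

```python
from math import sqrt

watched_by = {                     # N(i): the set of users who watched each show
    "family_guy":   {"u1", "u2", "u3"},
    "american_dad": {"u1", "u2"},
    "harry_potter": {"u1", "u2", "u3", "u4"},
}

def s(i, j):
    """s(i,j) = |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|)"""
    ni, nj = watched_by[i], watched_by[j]
    return len(ni & nj) / sqrt(len(ni) * len(nj))

def p(user_prefs, i):
    """p(u,i) = sum over watched shows j of r(u,j) * s(i,j)."""
    return sum(r * s(i, j) for j, r in user_prefs.items() if j != i)

u1 = {"family_guy": 1.0}           # r(u,j): u1's preference weights
print(p(u1, "american_dad"))       # ~0.82: strong co-watch overlap
print(p(u1, "harry_potter"))       # ~0.87: the popular show scores even higher
```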
Recent Behavior
The first lesson we learned from A/B testing is that recommendations should fit users’ recent preferences, and that users’ recent behavior is more important than their older, historical behaviors. So, in our engine, we put more weight on users’ recent behaviors. In our system, the CTR of recommendations that originate from users’ recent watch behavior is 1.8 times higher than the CTR of recommendations originating from users’ old watch behavior.
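One simple way to implement this, sketched below, is an exponential time decay on the preference weight r(u,j). The half-life value here is an assumption; in practice it would be tuned via A/B testing.

```python
import math
import time

HALF_LIFE_DAYS = 7                       # assumed; tune via A/B testing

def decayed_weight(base_weight, event_ts, now=None):
    """Halve a behavior's weight every HALF_LIFE_DAYS days."""
    now = now or time.time()
    age_days = (now - event_ts) / 86400
    return base_weight * math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)

now = time.time()
print(decayed_weight(1.0, now - 1 * 86400, now))   # watched yesterday: ~0.91
print(decayed_weight(1.0, now - 30 * 86400, now))  # watched a month ago: ~0.05
```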
Novelty
Just because a recommendation system can accurately predict user behavior does not mean its output is a show you want to recommend to an active user. For example, “Family Guy” is a very popular show, and thus most users have watched at least some episodes of it. These users do not need us to recommend this show to them; the show is popular enough that users will decide whether or not to watch it by themselves.
Thus, novelty is also an important metric for evaluating recommendations. The first way we can increase novelty is by revising the ItemCF algorithm, as the sketch after this list illustrates:
- First, we will decrease weight of popular shows that users have watched before.
- Then, we’ll put more weight on shows that are not only similar to shows the active user watched before, but also less popular than shows the active user watched before.
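The function below is one way these two adjustments might look. The popularity counts, the alpha exponent, and the boost factor are all toy assumptions, not the production values.

```python
from math import log

# Assumed watch counts per show (illustrative only).
popularity = {"family_guy": 100_000, "american_dad": 20_000,
              "obscure_gem": 1_500}

def novelty_score(candidate, watched_show, sim, alpha=0.5):
    # 1) Popular shows the user watched contribute less evidence.
    watched_damp = 1.0 / (1.0 + alpha * log(1 + popularity[watched_show]))
    # 2) Reward candidates less popular than the watched show.
    boost = 1.2 if popularity[candidate] < popularity[watched_show] else 1.0
    return sim * watched_damp * boost

print(novelty_score("obscure_gem", "american_dad", sim=0.6))  # boosted
print(novelty_score("family_guy", "american_dad", sim=0.6))   # not boosted
```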
Explanation-Based Diversity
Most users have diverse preferences, so the recommendations should also meet their diverse interests. In our system, we use explanations to diversify our recommendations: we consider a set of recommendations diverse when most of the recommended shows have different explanations.
We performed an A/B test to show the usefulness of diversification (shown in the figure above). The results show that, for active users who had previously watched 10 or more shows, diversification increases recommendation CTR significantly.
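A simple way to realize this, sketched below, is a greedy re-rank that caps how many recommendations may share the same explanation. The cap of two per explanation and the input data are assumptions for illustration.

```python
from collections import Counter

def diversify(scored, max_per_explanation=2):
    """scored: list of (show, score, explanation) tuples."""
    counts, result = Counter(), []
    for show, score, why in sorted(scored, key=lambda x: -x[1]):
        if counts[why] < max_per_explanation:   # skip over-represented reasons
            result.append((show, why))
            counts[why] += 1
    return result

scored = [("american_dad", 0.9, "watched Family Guy"),
          ("south_park", 0.8, "watched Family Guy"),
          ("bobs_burgers", 0.7, "watched Family Guy"),
          ("the_wire", 0.6, "watched The Sopranos")]
print(diversify(scored))   # at most 2 shows per explanation; the_wire survives
```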
Temporal Diversity
A good recommendation system should not generate static recommendations. Users want to see new suggestions every time they visit. If a user has new behaviors, he or she will find the recommendations have changed, because we put more weight on recent behaviors. But even if a user has no new behaviors, we still need to change our recommendations. We use three methods to keep our system temporally diverse (see the sketch after this list):
- Firstly, we’ll recommend recently-added shows to users. Many new shows are added each day, and we’ll suggest these shows to the users who will like them. Thus, users will see fresh ideas for shows to watch as new ones are added.
- Secondly, we’ll randomize our recommendations. Randomization is the simplest way to keep recommendations fresh.
- Finally, we’ll decrease the rank of recommendations that users have seen many times. This relies on implicit feedback, and our data show that CTR increased by 10% after we adopted this method.
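The sketch below combines the three methods in one pass: demote often-seen impressions, add a small random jitter, and inject fresh shows. The decay factor, jitter size, and default fresh-show score are toy assumptions.

```python
import random

def temporally_diverse(ranked, fresh_shows, impressions, seen_decay=0.8):
    """ranked: list of (show, score); impressions: show -> times shown."""
    scored = []
    for show, score in ranked:
        score *= seen_decay ** impressions.get(show, 0)  # demote repeats
        score += random.uniform(0, 0.05)                 # light shuffle
        scored.append((show, score))
    scored.extend((s, 0.5) for s in fresh_shows)         # inject new shows
    return [s for s, _ in sorted(scored, key=lambda x: -x[1])]

ranked = [("american_dad", 0.9), ("south_park", 0.7)]
print(temporally_diverse(ranked, fresh_shows=["new_pilot"],
                         impressions={"american_dad": 5}))
```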
Performance Of Recommendation Hub
The recommendation hub is a personal recommendation page for each user. On this page, users see 6 carousels. The top carousel is “top recommendations,” which includes shows that we think users will like very much. After top recommendations, there are three carousels for three genres, selected by analyzing users’ historical preferences. The next carousel is bookmarks, which includes shows that users have indicated they’d like to watch later. The last carousel is filled with shows that the user has already rated; it is designed to collect more explicit feedback from users.
We performed an A/B test to compare our recommendation algorithms with two simple recommendation algorithms: Most Popular (which recommends the most popular shows to every user) and Highest Rated (which recommends highly-rated shows to each user). As shown in the figure above, the CTR of our algorithm is much higher than that of both simple methods.
Lessons
Each user behavior can reflect user preferences. In our system, we use a slew of user behaviors to come up with our recommendations, and we’ve computed the CTR of recommendations originating from each type of behavior. As shown in Fig 3, every type of behavior can generate recommendations that users will click.
Conclusion
Explicit feedback data is more important than implicit feedback data.
As shown in Fig 3, the CTR of recommendations that originate from shows users historically loved (voted 5 stars) and liked (voted 4 stars) is higher than the CTR of recommendations that come from users’ historical subscribe/watch/search behavior. So although our explicit feedback data is much smaller in volume than our implicit feedback data, it is much more important.
Recent behaviors are much more important than old behaviors.
Novelty, diversity, and offline accuracy are all important factors.
Most researchers focus on improving offline accuracy, such as RMSE and precision/recall. However, a recommendation system that can accurately predict user behavior may not, on its own, be good enough for practical use. A good recommendation system should consider multiple factors together. In our system, after considering novelty and diversity, CTR improved by more than 10%.