Greece Outside In: After Building A Powerful Recommendation System For Netflix, This Guy Wants To Help You Find Your Next Favorite Book

Monday, March 3, 2014

Nicholas Ampazis builds software that makes recommendations for you. It once predicted movie ratings in the Netflix Prize competition; now he's turning it to the book world.

It started in 2006, when Ampazis and a small team entered the Netflix Prize, a competition the company sponsored to see which engineers could build the best movie recommendation engine. When the competition ended in 2009, Ampazis and his team had finished in second place. He's now applying his skills in computer-enhanced recommendations at Entitle, a company that aims to be the Amazon of e-books alone.

And with the number of books Entitle customers sift through in search of their next great read, why not use a computer to filter out the ones they won't like?

We caught up with Ampazis via email to learn more.

BUSINESS INSIDER: What's your background?

NICHOLAS AMPAZIS: I studied electrical engineering at Imperial College London, and after graduating I continued my studies with a master's degree and a PhD in neural networks from King's College London and the National Technical University of Athens, respectively. I'm currently an assistant professor at the University of the Aegean, Greece, and I lead the Data Science group at Pattern Explorations Ltd, London.

BI: What is your relationship to the Netflix recommendation engine? How long were you involved?

NA: My team got involved with the Netflix Prize at its launch in late 2006. We called our team "Feeds2," after the Feeds 2.0 personalized news aggregator service that we'd launched earlier that year. What was special about that competition is that it put the spotlight on the use of data mining and machine learning methods for predicting user preferences. The Netflix Prize gave us an excellent opportunity, as well as a challenge, to test the efficiency and scaling of the algorithms developed for the Feeds 2.0 service. Initially we tried to apply some of our Feeds 2.0 algorithms to the Netflix dataset, but it soon became apparent that the problem was quite different.

Feeds 2.0 relied heavily on text mining methods, which turned out to be inapplicable to the Netflix dataset because the data was of a different nature. In the Netflix Prize there was no textual information, and the only applicable algorithms we had from Feeds 2.0 were pretty much along the same lines as the Cinematch approach that Netflix already had. It took a lot of effort to develop collaborative filtering and machine learning methods and code from scratch in order to climb quickly up the Netflix Prize leaderboard. In July 2009, when the competition ended, we finished in 2nd place as members of "The Ensemble" team (tied in score with the winning team). Feeds2 was also the 3rd autonomous team on the leaderboard.

BI: What kinds of things did the Netflix software look at to make recommendations? How long you watched something, for example?

NA: The dataset released for the Netflix Prize consisted of star ratings (on a scale of 1 to 5) that users gave to movies they'd already watched. At the time the prize was announced, Netflix was a DVD company, and the goal was to help people fill their queue with titles to receive in the mail over the coming days and weeks (so there was no feedback during viewing). However, Netflix as a whole has changed dramatically in the last few years.

Netflix launched an instant streaming service in 2007, one year after the Netflix Prize began. Streaming has changed not only the way members interact with the service, but also the type of data available to the algorithms. With streaming, members are looking for something great to watch right now; they can sample a few videos before settling on one, and they can consume several in one session, so Netflix can observe viewing statistics such as whether a video was watched fully or only partially. So nowadays, Netflix monitors a plethora of signals that it blends into its recommendation engine.

BI: What are the differences between recommending books and movies?

NA: Entitle is a paradise for recommendations because we have two very well-defined sources of information: the actual text contained in the books, plus the star ratings that users give to the books they read. Thus we have the best of both worlds. We can apply the text mining methods that we'd developed for Feeds 2.0 (enhanced with the latest findings in text mining research) and the collaborative filtering and machine learning methods from the Netflix Prize.

A very important factor here is the psychological process by which users rate a book. It is a very well-thought-out process, because reading a book takes significantly more time than watching a movie. People therefore carefully select the star rating to give a book they've read, depending on the emotions it made them feel while they were reading it. This makes the rating signals at our disposal considerably more accurate than those we had for the Netflix Prize.

BI: How does one build a book recommendation engine?

NA: A good recommendation engine is judged by the quality of the recommendations it produces and by its utility to users. A variety of factors and metrics can measure the performance of a recommendation engine. Examples are the deviation of the engine's predictions from known ratings, the quality of the similarities it finds between books or between users with similar tastes, and the order in which it presents the recommended books. Optimizing all these factors is critical to providing an effective personalized experience.
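Two of the measures mentioned here can be illustrated with a minimal Python sketch. It is not Entitle's code: the function names are hypothetical, and root-mean-square error (the metric used to score the Netflix Prize) and cosine similarity are assumed as one common way to express "deviation from known ratings" and "similarity between users."

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and known star ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors of equal length.

    Returns 0.0 when either vector is all zeros (no ratings to compare).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A lower RMSE means the predictions sit closer to the ratings users actually gave; a cosine similarity near 1 means two users (or two books) have closely aligned rating patterns.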

BI: What types of interactions does Entitle's system look for to make recommendations?

NA: Entitle's recommendation engine uses a suite of machine learning algorithms that aim to discover topical information and annotate Entitle's book collection with it. It analyzes the actual texts to discover the topics that run through them and how those topics are connected to each other. It then uses the results of that analysis to assign the books to well-defined multi-thematic areas.

Thus, in a sense, Entitle's recommendation engine is like having a librarian at your disposal who has actually read all the books in the library and knows which are the most representative books for each subject.
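The interview does not say which algorithms Entitle uses to find topics. As a rough illustration only, the Python sketch below swaps in a much simpler technique, TF-IDF term weighting, to pull out the words that most distinguish one book's text from the rest of a collection. The function names and the dictionary layout are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(books):
    """Map each title to a {term: tf-idf weight} dictionary.

    books: dict of title -> full text (hypothetical input format).
    """
    counts = {title: Counter(tokenize(text)) for title, text in books.items()}
    n_books = len(counts)

    # Document frequency: in how many books does each term appear?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    vectors = {}
    for title, term_counts in counts.items():
        total = sum(term_counts.values())
        vectors[title] = {
            term: (n / total) * math.log(n_books / doc_freq[term])
            for term, n in term_counts.items()
        }
    return vectors


def top_terms(vector, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

Words that appear in every book get a weight of zero, so only the vocabulary that sets a book apart surfaces as its "topic" terms. A production system would more plausibly use a full topic model, but the idea of letting the text itself describe the book is the same.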

In addition, like a trusted librarian in your local library, the engine can monitor how your taste for certain subjects changes over time and shift its focus to providing trusted recommendations on your new topics of interest, without losing track of past associations.
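One simple way to model tastes that drift over time is to let older ratings count for less without ever dropping them entirely. The Python sketch below assumes exponential decay with a hypothetical `half_life_days` parameter; the interview does not describe how Entitle actually handles this, so the function and its input format are illustrative only.

```python
from collections import defaultdict


def topic_profile(events, now, half_life_days=90.0):
    """Build a reader's topic profile from time-stamped ratings.

    events: iterable of (topic, rating, day) tuples, where day and now
            are day numbers on the same scale (hypothetical format).
    Returns a dict of topic -> share of interest, summing to 1.
    """
    weights = defaultdict(float)
    for topic, rating, day in events:
        age = max(0.0, now - day)
        # A rating loses half its influence every half_life_days.
        weights[topic] += rating * 0.5 ** (age / half_life_days)

    total = sum(weights.values())
    if total == 0:
        return {}
    return {topic: weight / total for topic, weight in weights.items()}
```

With the default half-life, a five-star rating from 90 days ago carries the same weight as a fresh rating of two and a half stars, so recent interests lead the profile while older ones still contribute.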

BI: In general, how good are people at finding things they like, such as movies, books, or anything else? Do they do better with help from software such as this?

NA: The American psychologist Barry Schwartz, in his book "The Paradox of Choice: Why More Is Less," argued that people become less satisfied with their decisions when they are given more options to choose from. Thus eliminating some irrelevant choices can greatly reduce consumer anxiety.

In addition, as more and more rapidly changing information becomes available, people are overwhelmed by it and can no longer maintain trust in their decisions. The publishing, movie, and music industries are notable examples that offer practically endless choices. Recommendation systems, as an integral part of any such online service, can indeed help users by giving them better access to the products that fit their needs and by helping them discover items they would otherwise likely miss in a sea of information.

Over 35% of sales at Amazon and Netflix come from recommendations, which, if anything, proves the indisputable value of a good recommendation system.
