Using GraphX/Pregel on Browsing History to Discover Purchase Intent


At Rubicon Project Buyer Cloud, we use intent data to find new customers for advertisers. Intent data comes in many forms, one of which is online browsing history. This type of data is high-volume, high-dimensional, sparse, and noisy, yet informative. We present a propagation-based model that uses GraphX's Pregel API to identify possible customers for each advertiser. This method shows significant improvement over an approach that combines dimension reduction with classification algorithms. Finally, we describe the technical challenges in bringing the model to production.
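The abstract does not spell out the propagation rule, so the following is only a rough illustration of what a Pregel-style propagation over browsing data might look like, not the model presented in the talk. It is written in plain Python rather than GraphX so that it runs on its own. The function name `propagate`, the averaging update, the clamping of seed scores, and the example users and sites are all assumptions made for this sketch.

```python
from collections import defaultdict


def propagate(edges, seeds, supersteps=2):
    """Pregel-style score propagation over an undirected graph.

    edges: iterable of (user, site) pairs (hypothetical browsing events)
    seeds: set of vertices assumed to be known customers (score fixed at 1.0)
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Initial vertex state: seeds start at 1.0, everything else at 0.0.
    scores = {v: (1.0 if v in seeds else 0.0) for v in neighbors}

    for _ in range(supersteps):
        # Send phase: every vertex sends its current score to each neighbor.
        inbox = defaultdict(list)
        for v, nbrs in neighbors.items():
            for n in nbrs:
                inbox[n].append(scores[v])
        # Update phase: average the incoming messages; seeds stay at 1.0.
        scores = {
            v: 1.0 if v in seeds else sum(inbox[v]) / len(inbox[v])
            for v in neighbors
        }
    return scores


# Hypothetical data: alice is a known customer; bob visited the same site.
edges = [
    ("user:alice", "site:shoes.example"),
    ("user:bob", "site:shoes.example"),
    ("user:carol", "site:news.example"),
]
scores = propagate(edges, seeds={"user:alice"}, supersteps=2)
```

In this toy run, the score flows from the seed user to the shared site in the first superstep and on to the other visitor in the second, while a user with no path to a seed stays at zero. A GraphX version would express the same send and update phases through the Pregel operator's message-sending and vertex-program functions.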

Lisa Zhang, Data Scientist at Rubicon Project

About Lisa

Lisa is a Data Scientist at Rubicon Project Buyer Cloud. Her work involves applying machine learning to identify possible customers for advertisers, making extensive use of Spark, GraphX, and MLlib. Lisa has a soft spot for well-designed data visualizations and data tools. She founded a data visualization startup called Polychart.