Recommendation engines are proven drivers of sales. Competitions such as the Netflix Prize and those hosted on Kaggle have driven a great deal of research on recommendations. However, most real-world datasets don’t give you as much to work with. Many e-commerce datasets lack explicit ratings and consist solely of binary purchase information. This means that for most of our data, we don’t know whether a missing value is truly missing or an implicit negative. Such data requires special consideration and treatment in both model selection and validation of results. In this talk I will describe the implementation of a recommendation system for binary purchase data in Spark’s MLlib, compare fitting and prediction benchmarks for various models, and illustrate how performance differs across different scales of big data. Finally, I will share lessons learned about how to efficiently select and implement the best recommendation model for your dataset.
Leah McGuire is a Senior Member of Technical Staff at Salesforce, implementing data-driven features and recommendations in Salesforce products. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn, working on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She completed a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of California, San Francisco, and at the University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.