Recommendation systems have been very successful at helping companies keep existing users engaged and attract new ones. At MyFitnessPal, now part of Under Armour – Connected Fitness, our goal is to recommend the most relevant and healthy foods and recipes to our users. Two implementation considerations are important: (1) users' preferences change over time, which means that RecSys pipelines should be updated frequently; and (2) our 80M+ registered users have generated 15B+ food entries, which makes it hard to develop a production-ready, well-tuned, and fully scalable RecSys in a timely manner. In this talk, I will present how we used Spark to address the scalability, flexibility, and ease of development of our RecSys pipeline, taking these considerations into account. Overall, Spark has been a very effective tool, letting us focus on the more fundamental aspects of modeling while minimizing the extra burden of dealing with very complex parallel processing of data.
Joohyun Kim is a senior data scientist at MyFitnessPal, Under Armour – Connected Fitness, currently working on various data science projects including food categorization and RecSys. Prior to MyFitnessPal, he worked on machine-learning-based fraud detection at eBay. He received his Ph.D. in Computer Science from the University of Texas at Austin. During his Ph.D., he specialized in semantic parsing/understanding and other related NLP problems. His most recent interests include building large-scale distributed systems and algorithms for machine learning and NLP.