Large-Scale Ads CTR Prediction with Spark and Deep Learning: Lessons Learned


Click-through rate (CTR) prediction is used extensively in ads bidding and sponsored search. While logistic regression models have proven effective for this kind of problem, rapid growth in the amount of data has created several challenges: how to train a logistic regression model with billions of parameters on a commodity hardware cluster, and how to improve the model’s accuracy with better feature engineering. Other challenges include how to use popular deep learning technologies to reduce the dependence on human labor and expert knowledge, and how to improve job performance for such a complicated workload.

At Spark Summit East 2017, Hortonworks introduced vector-free L-BFGS to address the scalability limits of MLlib and provide a highly scalable logistic regression implementation. In this talk, hear about their experience integrating this implementation with different feature learning technologies to solve ad CTR prediction problems, and the lessons they learned.

Session hashtag: #SFds1

Yanbo Liang, Software Engineer at Hortonworks

About Yanbo

Yanbo is an Apache Spark committer working on MLlib and SparkR at Hortonworks. His main interests center on implementing effective machine learning algorithms and building machine learning applications on scalable distributed systems. He is an active Apache Spark contributor and has delivered the implementation of several major MLlib algorithms. Prior to Hortonworks, he was a software engineer at Yahoo! and France Telecom, working on machine learning and distributed systems.