Building accurate machine learning models has long been an art practiced by data scientists: algorithm selection, hyperparameter tuning, feature selection, and so on. Recently, efforts to break through this “black art” have begun. We have developed a Spark-based automatic predictive modeling system. The system automatically searches for the best algorithm, the best parameters, and the best features without any manual work. In this talk, we will share how the automation system is designed to exploit the advantages of Spark. Our evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discover a highly accurate predictive model in minutes on an Ultra High Density Server, which packs 272 CPU cores, 2TB of memory, and 17TB of SSD into a 3U chassis. We will also share open challenges in training such a massive number of models on Spark, particularly from the reliability and stability standpoints.
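To make the idea of searching over algorithms, parameters, and features concrete, here is a minimal plain-Python sketch. It is not the system described in this talk and does not use Spark: the toy dataset, the two classifiers, and every name in it (`make_data`, `fit_knn`, `fit_centroid`, `ALGORITHMS`, `search`) are assumptions made purely for illustration. It enumerates every combination of feature subset, algorithm, and parameter value, scores each candidate on a held-out split, and ranks the results.

```python
import itertools
import random


def make_data(n=200, seed=0):
    """Toy binary dataset (hypothetical): features 0 and 2 drive the label, feature 1 is noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random(), rng.random(), rng.random()]
        y = 1 if x[0] + 0.5 * x[2] > 0.75 else 0
        rows.append((x, y))
    return rows


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_knn(train, k):
    """k-nearest-neighbour classifier; k is the tunable parameter."""
    def predict(x):
        nearest = sorted(train, key=lambda r: sq_dist(r[0], x))[:k]
        votes = sum(y for _, y in nearest)
        return 1 if 2 * votes > len(nearest) else 0
    return predict


def fit_centroid(train, _param=None):
    """Nearest-centroid classifier; it has no tunable parameter."""
    centroids = {}
    for label in {y for _, y in train}:
        pts = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(x):
        return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))
    return predict


# Search space (illustrative): algorithm name -> (fit function, parameter values to try).
ALGORITHMS = {
    "knn": (fit_knn, [1, 3, 5]),
    "centroid": (fit_centroid, [None]),
}


def feature_subsets(n_features):
    """Yield every non-empty subset of feature indices."""
    for size in range(1, n_features + 1):
        yield from itertools.combinations(range(n_features), size)


def search(train, valid, n_features=3):
    """Score every (features, algorithm, parameter) candidate on the validation split."""
    results = []
    for features in feature_subsets(n_features):
        tr = [([x[i] for i in features], y) for x, y in train]
        va = [([x[i] for i in features], y) for x, y in valid]
        for name, (fit, params) in ALGORITHMS.items():
            for param in params:
                model = fit(tr, param)
                accuracy = sum(model(x) == y for x, y in va) / len(va)
                results.append({"algorithm": name, "param": param,
                                "features": features, "accuracy": accuracy})
    results.sort(key=lambda r: r["accuracy"], reverse=True)
    return results


if __name__ == "__main__":
    data = make_data()
    ranked = search(data[:150], data[150:])
    print(len(ranked), "candidates; best:", ranked[0])
```

Each candidate is trained and scored independently of the others, which is why a search like this lends itself to being distributed across many cores or machines.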
Session hashtag: #SFds5
Masato Asahara (Ph.D.) currently leads the development of Spark-based machine learning and data analytics systems that fully automate predictive modeling. Masato received his Ph.D. from Keio University and has worked at NEC for 7 years as a researcher in distributed computing systems and computing resource management technologies.
Ryohei Fujimaki (Ph.D.) is a research fellow at the Data Science Research Laboratories of NEC Corporation, a leading provider of advanced analytics technologies based on artificial intelligence.
In addition to technology R&D, Ryohei is also heavily involved with co-developing cutting-edge advanced analytics solutions with NEC’s global business clients and partners.
Ryohei received his Ph.D. degree from the University of Tokyo, and became the youngest research fellow ever in NEC Corporation’s 117-year history.