Spark’s success stems from providing a comprehensive environment for both data analytics and predictive analytics (machine learning), scaled out across multi-node clusters that hold big data in memory. Enabling deep learning, with its superior performance on unstructured data, will be a key requirement for Spark’s continued ascendancy. However, deep learning relies on training with very large datasets to reach that performance, which creates a need for GPU acceleration to train models within timeframes acceptable to businesses. We review different methods for achieving GPU-enabled deep learning within Spark and the broader Spark ecosystem, and discuss potential strategies for future progress.
Andy Steinbach has over 20 years of experience developing new technologies and products in the semiconductor and imaging technology domains. Andy holds a PhD in device physics from the University of Colorado, Boulder, and was a National Science Foundation postdoctoral fellow at CEA Saclay in France. He has developed novel optoelectronic and consumer electronic devices at JDS Uniphase and Intel. More recently, Andy led a team developing data science and machine learning techniques for the microscopy domain at Carl Zeiss. Andy is now a Senior Director at NVIDIA, developing the deep learning market for industrial applications.