Data Science Transformation Via Apache Spark on Hybrid Cloud

Most enterprises run their business on legacy, on-premises environments. Simply picking everything up and moving it to the cloud isn’t an option for the vast majority. Cloud migration requires a critical mass of data, with applications around it, and a critical part of building data assets with apps around them is leveraging Apache Spark and other open source tools. If you wait until a large-scale enterprise is fully ready for the cloud, you’ll miss the opportunity that hybrid cloud offers: to do data science at scale and to have systems with higher resilience and more elasticity. Hybrid cloud also gives you the opportunity to apply Spark and other open source technologies while you’re building your longer-term cloud strategy. In this keynote I’ll share experiences using Spark for data science transformation, along with some thoughts on a larger vision for data science transformation at scale.

Seth Dobrin, VP and Chief Data Officer, IBM Analytics at IBM

About Seth

Seth led the data science transformation at Monsanto, including oversight of the company’s first use cases of Apache Spark for geospatial and genomic analysis. He was heavily engaged in Monsanto’s cloud migration, which ultimately relied on what looks today like a hybrid cloud strategy. Seth is a member of the IBM Spark Technology Center Advisory Council and recently joined IBM as VP and CDO for the Analytics Business Unit.