Since its inception in 2009, Spark has progressed from an academic endeavor to the most active open source Big Data project, with over 400 contributors. Along the way, it has emerged as a popular option for powering enterprise data pipelines, with hundreds of production deployments. However, as with any relatively new technology seeing rapid adoption, interested enterprises most often ask who is using Spark, what they are using it for, and what lessons they have learned along the way. This talk synthesizes our experience from direct involvement in over 50 production Spark deployments across a broad spectrum of industries to provide insights into the following:
* What were the primary drivers of Spark adoption?
* What are the most common Spark workflows and use cases, and do they vary by vertical?
* What were the main stumbling blocks and lessons learned?
Arsalan Tavakoli-Shiraji is the VP of Customer Engagement and Business Development at Databricks. Prior to joining Databricks, he was an Associate Principal at McKinsey and Co, where he advised enterprises, vendors, and the public sector on a broad spectrum of strategic topics. Arsalan received a PhD in computer science from UC Berkeley, specializing in networking.