San Francisco
June 30 - July 2, 2014

Spark Summit 2014 brought the Apache Spark community together on June 30- July 2, 2014 at the The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.


Spark Summit 2014
Spark’s Role in the Big Data Ecosystem
Matei Zaharia (CTO, Databricks)

Apache Spark continues to grow quickly in both community size and technical capabilities. Since the last Spark Summit, in December 2013, Spark’s contributor base has grown from 100 contributors to more than 200, and Spark has become the most active open source project in big data. We’ve also seen significant new components added, such as the Spark SQL runtime, a larger machine learning library, and rich integration with other data processing systems. Given all this activity, where is Spark heading? I’ll share our goal of Spark as a unifying platform between the diverse applications (e.g. stream processing, machine learning and SQL) and diverse storage and runtime systems in big data.


Matei Zaharia started the Spark project at UC Berkeley and is currently CTO of Databricks. He finished his PhD at UC Berkeley in 2013 and is starting an assistant professor position at MIT.

Slides PDF |Video