San Francisco
June 30 - July 2, 2014

Spark Summit 2014 brought the Apache Spark community together on June 30 - July 2, 2014 at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.


Spark Summit 2014
BI-style analytics on Spark (without Shark) using SparkSQL & SchemaRDD
Justin Langseth, Farzad Aref (Zoomdata)

“Spark allows for extremely fast analytics and joins across huge amounts of data, and the SparkSQL and SchemaRDD extensions in Spark 1.0 provide for new, easier interoperability with existing Hadoop-based data resources and schematized data.

We will share our work at Zoomdata implementing real-time and historical BI-style slice and dice analytics and dashboarding directly on top of Spark (without Shark, due to performance issues that we will discuss). We will highlight our early lessons learned related to data scalability, loading, context sharing, real-time RDD appending/coalescing, and concurrent query handling.

We will also discuss the new SparkSQL and SchemaRDD features in Spark 1.0 that allow direct access to Parquet and other schematized data, as well as partitioning strategies that enable in-application partition elimination to speed up large analytical queries.”
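The abstract does not describe Zoomdata's actual partitioning scheme. As a minimal, hypothetical sketch of the general idea behind partition elimination, assuming rows are partitioned by day and a query filters on a date range, the Python below shows how partitions whose key falls outside the filter can be skipped without reading their rows. `Partition`, `build_partitions`, and `query_sum` are illustrative names only, not Spark, SparkSQL, or Zoomdata APIs.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Partition:
    # Hypothetical partition: all rows that share one partition key (a day here).
    key: date
    rows: list = field(default_factory=list)


def build_partitions(rows, key_fn):
    """Group rows into partitions by the value of key_fn(row)."""
    parts = {}
    for row in rows:
        k = key_fn(row)
        parts.setdefault(k, Partition(k)).rows.append(row)
    return list(parts.values())


def query_sum(partitions, start, end, value_fn):
    """Sum value_fn(row) over partitions whose key lies in [start, end].

    Partitions outside the range are eliminated by checking only their key,
    so their rows are never scanned. Returns (total, partitions_scanned).
    """
    total = 0
    scanned = 0
    for p in partitions:
        if not (start <= p.key <= end):
            continue  # partition eliminated: no rows read
        scanned += 1
        total += sum(value_fn(r) for r in p.rows)
    return total, scanned


# Usage with made-up sample rows of (day, value).
rows = [
    (date(2014, 6, 30), 10),
    (date(2014, 7, 1), 5),
    (date(2014, 7, 1), 7),
    (date(2014, 7, 2), 3),
]
parts = build_partitions(rows, key_fn=lambda r: r[0])
total, scanned = query_sum(parts, date(2014, 7, 1), date(2014, 7, 2), lambda r: r[1])
print(total, scanned)  # 15 2
```

Here the June 30 partition is skipped on its key alone; in a real system the same check would typically use partition metadata (for example, a directory name or per-partition min/max statistics) so that eliminated data is never loaded at all.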

“Justin Langseth is the President & CEO of Zoomdata. Zoomdata is a venture-backed next generation data analysis and visualization company based in Reston, Va. Prior to founding Zoomdata, Justin was the co-founder of Clarabridge and the inventor of Clarabridge’s text analytics software.
Prior to Clarabridge, Justin co-founded and was CTO of Claraview, a BI strategy and technology consultancy, which was sold to Teradata in 2008. Before founding Claraview, Justin served as founder and CTO of, a real-time data analysis and alerting subsidiary of MicroStrategy.
Justin currently holds 14 technology patents, and graduated from the Massachusetts Institute of Technology where he received an SB in Management of Information Technology from the MIT Sloan School of Management.”

As VP of Product Management, Farzad is responsible for Zoomdata’s roadmap, UX, quality, and training. He has over 12 years of experience building high-performing teams and managing complex analytics implementations for Fortune 500 companies through his tenures at Clarabridge, IBM, and Deloitte. Farzad holds a Systems Engineering degree from the University of Virginia, where he graduated with distinction.
