Better Visibility into Spark Execution for Faster Application Development

It can be a frustrating experience for an application developer when her application:
(a) fails before completion,
(b) does not run quickly or efficiently, or
(c) does not produce correct results.
There are many reasons why such events happen. For example, Spark’s lazy evaluation, while excellent for performance, can make root-cause diagnosis hard: an error surfaces when an action finally triggers execution, not where the faulty transformation was defined. We are working closely with application developers to make diagnosis, tuning, and debugging of Spark applications easier. Our solution is based on holistic analysis and visualization of profiling information gathered from many points in the Spark stack: the program, the execution graph, counters, data samples from RDDs, and time series of metrics exported by various end-points in Spark, YARN, the OS, and other components. Through a demo-driven walk-through of failed, slow, and incorrect applications taken from everyday use of Spark, we will show how such a solution can substantially improve the productivity of Spark application developers.
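To make the lazy-evaluation point concrete, here is a minimal sketch in plain Python. It is an analogy that uses generators, not Spark's API, and the names `parse`, `pipeline`, and `run` are hypothetical. It shows how a bad record goes unnoticed while the pipeline is being defined and fails only when a final "action" consumes the data.

```python
def parse(lines):
    # Lazy transformation: nothing runs until a consumer pulls records.
    for line in lines:
        yield int(line)


def pipeline(raw):
    # Building the pipeline does no work, so bad input is not detected here.
    parsed = parse(raw)
    doubled = (x * 2 for x in parsed)
    return doubled


def run(raw):
    result = pipeline(raw)  # returns immediately, even for bad input
    try:
        # The "action": evaluation happens here, far from the faulty step.
        return sum(result)
    except ValueError as err:
        return f"failed at action time: {err}"


if __name__ == "__main__":
    print(run(["1", "2", "3"]))
    print(run(["1", "oops", "3"]))
```

In this sketch the failure is reported from inside `sum`, not from the line where `parse` was attached to the pipeline. That gap between where a step is defined and where it fails is the kind of indirection that can make root-cause diagnosis harder.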

About Shivnath

Shivnath Babu is an Associate Professor of Computer Science at Duke University and the Chief Scientist at Unravel Data Systems. His research focuses on ease of use and manageability of data-intensive systems, automated problem diagnosis, and cluster sizing for applications running on cloud platforms. Shivnath co-founded Unravel to solve the application management challenges that companies face when they adopt systems like Hadoop and Spark. Unravel originated from the Starfish platform built at Duke, which has been downloaded by over 100 companies. Shivnath has received a U.S. National Science Foundation CAREER Award, three IBM Faculty Awards, and an HP Labs Innovation Research Award.

About Lance

Lance Co Ting Keh is a Sr. Software Engineer and founding member of machine learning at Box, where they are building a platform that makes it ridiculously easy for enterprises to share, manage, and create content. Lance is an academic at heart who is currently interested in distributed systems that can support large-scale machine learning. His past papers and patents span the fields of nonlinear dynamics, fault-tolerant computing, and energy-absorbing materials. He has a BS and MS from Duke University.