Your application is slow but you don’t know why. It could be doing unnecessary shuffles, evicting heavily used cached data from memory, suffering from data skew, or… With the visualizations recently added to the Spark UI, users can now quickly pinpoint bottlenecks in their applications and gain insight into how they use Spark. In this talk, we will walk through how to use these visuals to illuminate the design decisions behind several example Spark applications. The showcased applications will include ones that use Spark SQL, Spark Streaming, MLlib, and dynamic allocation.
Andrew is a Spark PMC member. He has contributed several large features to the project, including event logging, external spilling, the history server, dynamic allocation, and DAG visualization in the Spark UI. He is an active maintainer of the Spark on YARN integration.