How well an analytics engine responds to changing workload demands and resource availability largely determines its usefulness and adoption rate. In this talk, we present a study of how effective Spark's elasticity is when deployed on popular resource managers such as Mesos and YARN. In particular, we investigate how Spark workloads running on Mesos and YARN clusters behave as nodes are added to and removed from the clusters. Key measurements include workload runtime, resource utilization delay, average task waiting time, disk I/O, and network bandwidth consumption. We then analyze the impact of changing key scheduling parameters (e.g., locality wait time, locality preference, granularity of locality wait time, speculation, and resource re-offer interval) on these measurements. Lessons from this work will enable the building of effective auto-scaling infrastructure for Spark in a cloud environment.
Michael Le is a research staff member at the IBM T. J. Watson Research Center. His current research focuses on cloud infrastructure and cloud platform management.
Min Li is a research staff member at the IBM T. J. Watson Research Center. Her research interests include advanced data analytics platforms, cloud computing, distributed systems, and operating systems. She received her Ph.D. in computer science from Virginia Tech in spring 2014. She now focuses on performance optimization and resource management techniques for advanced data analytics platforms in the cloud.