While Spark and Mesos emerged together from the AMPLab at Berkeley, Mesos is now one of several clustering options for Spark, along with Hadoop YARN, which is growing in popularity, and Spark’s “standalone” mode. This talk describes in detail the integration between Spark and Mesos to support clustering of Spark jobs, including the sequence of events that occurs during the life cycle of a typical Spark job. We’ll discuss recommendations for optimizing performance and resource utilization and for avoiding known limitations. We’ll also discuss possible future work for Spark on Mesos. Along the way, we’ll examine the general abstractions that Spark exposes for clustering. We’ll also compare Spark on Mesos with Spark Standalone mode and Spark on YARN, and offer suggestions for when to choose one option over the others.
Dean Wampler, Ph.D. is the Big Data Architect for Typesafe, where he leads the projects building products and services centered around Spark, Mesos, and Akka. He is the author of “Programming Scala, Second Edition”, the co-author of “Programming Hive”, and the author of “Functional Programming for Java Developers”, all from O’Reilly. Dean is a contributor to several open source projects and the co-organizer of several technology conferences and Chicago-based user groups.
Tim Chen is a Distributed Systems Engineer at Mesosphere and focuses on containerization and big data frameworks. He is also a PMC member and committer on Apache Drill and Apache Mesos, and contributes to other open source projects such as Spark, Kafka, and Docker. Before joining Mesosphere, Tim’s past experience includes working on data services on Halo, CloudFoundry (PaaS), and search engines.