SESSION

Homologous Apache Spark Clusters Using Nomad

Slides PDF Video

Nomad is a modern cluster manager by HashiCorp, designed for both long-lived services and short-lived batch processing workloads. The Nomad team has been working to bring a native integration between Nomad and Apache Spark.

By running Spark jobs on Nomad, both Spark developers and the engineering organization benefit. Nomad’s architecture allows it to have an incredibly high scheduling throughput. To demonstrate this, HashiCorp scheduled 1 million containers in less than five minutes. That speed means that large Spark workloads can be immediately placed, minimizing job runtime and job start latencies.

For an organization, Nomad offers many benefits. Since Nomad was designed for both batch and services, a single cluster can service both an organization’s Spark workload and all service-oriented jobs. That, coupled with the fact that Nomad uses bin-packing to place multiple jobs on each machine, means that organizations can achieve higher density. Which saves money and makes capacity planning easier.

In the future, Nomad will also have the ability to enforce quotas and apply chargebacks, allowing multi-tenant clusters to be easily managed. To further increase the performance of Spark on Nomad, HashiCorp would like to ingest HDFS locality information to place the compute by the data.

Session hashtag: #SFeco14

Alex Dadgar, Project Lead at Hashicorp

About Alex

Alex is the project lead for Nomad, a distributed, highly-available cluster scheduler by HashiCorp. Prior to joining HashiCorp, Alex worked at Google where he architected a streaming-processing system to handle terabytes of YouTube data a day. Having seen the dream of infrastructure at Google, he joined HashiCorp to build it for the rest of the world!