Spark Summit 2013 brought the Apache Spark community together on December 2-3, 2013 at the Hotel Nikko in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
The current Spark scheduler relies on a single, centralized machine to make all scheduling decisions. However, as Spark is used on larger clusters and for shorter queries, the centralized scheduler will become a bottleneck. This talk will begin by discussing the throughput limitations of the current Spark scheduler. Next, I’ll present Sparrow, a new scheduler that uses a decentralized, random sampling approach to provide dramatically higher throughput than the current scheduler, while also providing scheduling delays of less than 10ms and fast scheduler failover. Sparrow’s superior performance makes it the best choice for users who are pushing Spark to larger deployments and lower latencies.
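
To give a feel for what random-sampling task placement can look like, here is a minimal Python sketch. It is not Sparrow's implementation: the function names (`pick_worker`, `schedule`), the `probes` parameter, and the use of queue length as the load signal are all assumptions made for illustration. The idea shown is that a scheduler probes only a few randomly chosen workers and places the task on the least loaded of them, so it never needs a global view of the cluster.

```python
import random


def pick_worker(queue_lengths, probes=2, rng=random):
    """Probe `probes` randomly chosen workers; return the least loaded one.

    `queue_lengths[i]` is the (hypothetical) number of tasks queued on worker i.
    """
    k = min(probes, len(queue_lengths))
    candidates = rng.sample(range(len(queue_lengths)), k)
    return min(candidates, key=lambda w: queue_lengths[w])


def schedule(num_tasks, num_workers, probes=2, seed=0):
    """Place tasks one at a time using random sampling; return final queue lengths."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    queues = [0] * num_workers
    for _ in range(num_tasks):
        worker = pick_worker(queues, probes, rng)
        queues[worker] += 1
    return queues


if __name__ == "__main__":
    # Compare one probe per task (purely random placement) with two probes.
    print("1 probe: ", schedule(1000, 10, probes=1))
    print("2 probes:", schedule(1000, 10, probes=2))
```

Because each placement decision touches only a handful of workers and needs no shared state, many schedulers could in principle run this logic side by side, which is the property a decentralized design relies on.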