San Francisco
June 30 - July 2, 2014

Spark Summit 2014 brought the Apache Spark community together on June 30 - July 2, 2014 at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.


Spark Summit 2014
Spark on large Hadoop cluster and evaluation from the view point of enterprise Hadoop user and developer
Masaru Dobashi (NTT DATA Corporation)

We launched an on-premises Hadoop cluster of 1000 nodes with NTT DOCOMO, the leading mobile carrier in Japan, and have operated it for 5 years without any data loss. Our particular emphasis was on fault tolerance and on the scalability needed to process the vast amounts of data a mobile carrier generates.

Though Hadoop made it possible to deal with petabytes of data, we need more speed and flexibility these days. Demand for parallel distributed processing frameworks based on computational models other than MapReduce has been steadily increasing. In response, we launched a feasibility study of Spark, because we considered it a promising candidate that works alongside Hadoop, provides fast multi-stage computation, and simplifies application development. NTT DOCOMO gave us the opportunity to evaluate the scalability and operability of Spark on the 1000-node cluster.

In this talk, we will present the results of the evaluation, as well as challenges and observations from the viewpoint of an enterprise Hadoop user and developer.

Masaru Dobashi is a system infrastructure engineer who leads the OSS professional service team at NTT DATA Corporation. In 2009 he developed an enterprise Hadoop cluster of over 1000 nodes, which was one of the largest Hadoop clusters in Japan. Since then, he has designed and provisioned several kinds of clusters using non-Hadoop OSS, such as Spark and Storm. He is now responsible for introducing Hadoop, Spark, Storm and other OSS middleware into enterprise systems.

Slides PDF | Video