Spark Summit 2014 brought the Apache Spark community together from June 30 to July 2, 2014, at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
As “the most active cluster data processing engine after Hadoop MapReduce”, Spark has already gathered a large community of users and is gradually entering the datacenter for next-generation big data applications. Over the past year, we have put considerable effort into building real-world applications with Spark for several large websites (e.g., Alibaba, iQiyi and Youku). These experiences revealed real needs and concrete uses for Spark in graph analysis, interactive and batch OLAP/BI, and real-time analytics. We also learned a number of lessons about using Spark, for example in memory management and analytic query execution.
In this talk, we will present our experience and several lessons learned while building real-world Spark applications in production environments.
Grace Huang is currently an engineering manager in Intel SSG (Software and Services Group), responsible for the enhancement and optimization of advanced Big Data technologies, including Hadoop and Spark. Prior to that, she worked in the big data area at Intel for over 5 years, with extensive experience in Hadoop and HBase performance tuning and optimization.