With the rapid growth of Apache Spark, many big data applications are moving to Spark in pursuit of a better user experience. However, their initial performance doesn't always meet expectations. In this talk, we will share our experience working with several top Chinese internet companies to build their next-generation big data engines on Spark, covering graph analysis, interactive queries, batch OLAP/BI, and real-time analytics. With careful tuning, Spark delivered 5x to 100x speedups over their original MapReduce implementations. Along the way, we also gained practical experience in improving the user experience of real-world Spark applications running in production. We expect this talk to be useful both for people who want to deploy their own Spark applications and for Spark developers who want to learn about real-world challenges.
Grace Huang is currently an engineering manager in Intel SSG (Software and Services Group), responsible for enhancing and optimizing advanced big data technologies, including Hadoop and Spark. Before that, she worked in the big data area at Intel for over 6 years, gaining extensive experience in Hadoop and HBase performance tuning and optimization.
Jiangang Duan manages the Cloud and Big Data engineering team at Intel Asia-Pacific Research & Development Ltd. He has worked on enterprise solution tuning and optimization for more than fourteen years, including performance evaluations of several generations of Intel processors and joint work with local OEMs to publish record-breaking industry benchmark results. His technical interests now focus on cloud computing and big data technology, and he is responsible for developing, deploying, and optimizing efficient cloud and big data solutions built on open source software (Xen/KVM/OpenStack/Apache Hadoop and Spark). Before joining Intel, Jiangang received his bachelor's degree in 1999 and his master's degree in 2001 from the EE department