How Spark Fits into Baidu's Scale


Over the last decade Baidu has built a very large-scale distributed computing infrastructure that powers all of its core businesses, from search ads to mobile offerings, serving 500+ million users worldwide. Early last year, Baidu began introducing Spark as a core component of this infrastructure because of its performance potential. James Peng will share the successes and challenges of integrating Spark into Baidu's fleet of distributed computing services.


About James

Dr. James Peng is a Principal Architect at Baidu, where he steers the engineering direction for several divisions, including monetization platforms, the infrastructure department, and the data science and big data platform. The projects he initiated and led have made significant contributions to a wide range of core products, and the ads budget-control project he led won Baidu's prestigious Highest Award in 2013. Before joining Baidu, James was on Google's Mountain View engineering team, where he worked on various projects in the AdWords system. Prior to Google, he was a Research Associate at Stanford University, where his research focused on distributed computing, data modeling, and large-scale databases. James holds a B.S. degree from Tsinghua University, an M.S. degree from the State University of New York at Buffalo, and a Ph.D. degree from Stanford University.