SESSION

Apache Kylin: Speed Up Cubing with Apache Spark


Apache Kylin is a distributed OLAP engine on Hadoop that provides sub-second query latency over datasets scaling to petabytes. Kylin’s query performance relies on precalculated multidimensional cubes, which are often time-consuming to build. By default, Kylin uses the MapReduce Cube Engine, built atop the Hadoop MapReduce framework, to aggregate huge amounts of source data. The MR Engine has been tuned over the years and has proven stable in hundreds of production deployments. Recently, the Kylin team has been working to further speed up cube building by replacing MR with Spark. Kyligence has initiated the new Spark Cube Engine, benchmarked Spark against MR over different datasets, and received some promising results. Hear about their results and experiences in moving cube building, which is a huge computing task, to Spark.

Session hashtag: #SFeco7

Luke Han, Co-founder and CEO at Kyligence, Inc.

About Luke

Luke Han is Co-Founder and CEO at Kyligence, and co-creator and PMC chair of the Apache Kylin project. In the past few years he has been working on growing Apache Kylin’s community, building its ecosystem, and extending its adoption. Prior to Kyligence, he was the Big Data Product Lead at eBay. Prior to eBay, Luke was chief consultant at Actuate China.

Shaofeng Shi, Software Architect at Kyligence, Inc.

About Shaofeng

Shaofeng Shi is a software architect at Kyligence, Inc. He is a committer and PMC member of the Apache Kylin project. He developed several core features in Kylin and has extensive experience in Hadoop and Kylin enablement. Before joining Kyligence, he was a senior software engineer at eBay, CCOE and IBM China Lab.