Effective Spark with Alluxio

Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that leverages memory to store data and accelerate access to data in different storage systems. Alluxio has a quickly growing open source community of developers and users and is deployed at organizations such as Alibaba, Baidu, Barclays, Intel, Huawei, and Qunar. Many of these deployments use Alluxio with Spark, and some of them scale to petabytes of data. While Spark is already gaining great adoption, Alluxio can enable Spark to be even more effective. Alluxio bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments of Alluxio and Spark working together. Along the way, we will provide live demos for some of the use cases.

Gene Pang, Software Engineer at Alluxio

About Gene

Gene Pang is a software engineer at Alluxio and one of the top contributors to the Alluxio project. He recently received his Ph.D. from the AMPLab at UC Berkeley, where he worked on distributed database systems. Before starting at Berkeley, he worked at Google. He has an M.S. from Stanford University and a B.S. from Cornell University.

Haoyuan Li, CEO at Alluxio

About Haoyuan

Haoyuan Li is the founder and CEO of Alluxio Inc. (formerly Tachyon Nexus). Before founding the company, he was working on his Ph.D. at the UC Berkeley AMPLab, where he co-created Alluxio, a memory-speed virtual distributed storage system. Haoyuan is also a founding committer of Apache Spark. Before the AMPLab, he worked at Conviva and Google. Haoyuan has an MS from Cornell University and a BS from Peking University.