Best Practices for Using Alluxio with Apache Spark


Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that uses memory to store data and accelerate access to data held in different storage systems. Many organizations use Alluxio with Apache Spark, and some of these deployments scale out to petabytes of data. Alluxio can make Spark more effective in both on-premise and public cloud deployments: it bridges Spark applications with various storage systems and further accelerates data-intensive applications. This session will briefly introduce Alluxio and present different ways that Alluxio can help Spark jobs. Get best practices for using Alluxio with Spark, covering RDDs and DataFrames as well as on-premise and public cloud deployments.

Session hashtag: #SFexp2

Cheng Chang, Software Engineer at Alluxio

About Cheng

Cheng Chang is a software engineer at Alluxio and the fourth-highest contributor to the Alluxio open source project. He graduated from Tsinghua University with a degree in computer science. Cheng is the main developer of Alluxio Manager.

Haoyuan Li, Founder and CEO at Alluxio

About Haoyuan

Haoyuan Li is founder and CEO of Alluxio Inc. (formerly Tachyon Nexus). Before founding the company, he was working on his Ph.D. at the UC Berkeley AMPLab, where he co-created Alluxio, a memory-speed virtual distributed storage system. Haoyuan is also a founding committer of Apache Spark. Before joining the AMPLab, he worked at Conviva and Google. Haoyuan has an MS from Cornell University and a BS from Peking University.