Speaker

Cheng Lian, Software Engineer at Databricks

Cheng started working with Spark in late 2013 and joined Databricks in early 2014 as one of the main developers behind Spark SQL. He is now a committer on Apache Spark and Apache Parquet. His current areas of interest include databases and programming languages.

Sessions

Why You Should Care about Data Layout in the Filesystem

Efficient data access is one of the key factors in building a high-performance data processing pipeline. Determining the layout of data values in the filesystem often has fundamental impacts on the performance of data…