Wenchen Fan, Software Engineer at Databricks

Cost-Based Optimizer in Apache Spark 2.2 (continues)

Apache Spark 2.2 ships with a state-of-the-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, avg/max length, etc.) to improve the…

A Developer’s View into Spark's Memory Model

As part of Project Tungsten, we started an ongoing effort to substantially improve the memory and CPU efficiency of Apache Spark’s backend execution and push performance closer to the limits of modern hardware. In this…