Photo of

Michael Armbrust

Software Engineer, Databricks

Michael Armbrust is the lead developer of the Spark SQL project at Databricks. He received his PhD from UC Berkeley in 2013, and was advised by Michael Franklin, David Patterson and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage and query optimization. He was the 2011 recipient of the Sevin Rosen Award for Innovation.


Spark DataFrames: Simple and Fast Analysis of Structured Data

This post will provide a technical overview of Spark’s DataFrame API. First, we’ll review the DataFrame API and show how to create DataFrames from a variety of data sources such as Hive, RDBMS databases, or…