Spark Summit 2014 brought the Apache Spark community together from June 30 to July 2, 2014, at the Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming, and related projects.
In this talk, I will introduce the new JSON support in Spark. With this support, users do not need to define a schema for a JSON dataset; instead, Spark SQL automatically infers the schema from the data. Users can then write SQL queries against the JSON dataset as they would against a regular table, or seamlessly convert it to other formats (e.g., Parquet files). I will also talk about our ongoing efforts to let users easily work with data from different sources and in different formats.
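To make the idea of schema inference concrete, here is a minimal sketch in plain Python of how a schema might be derived from a set of JSON records. It is not Spark SQL's implementation: the function names (`infer_type`, `merge_types`, `infer_schema`), the type labels, and the conflict rules (widen `long` and `double` to `double`, fall back to `string` otherwise) are all assumptions made for illustration.

```python
import json


def infer_type(value):
    """Map one parsed JSON value to a simple type descriptor (illustrative labels)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elem = "null"
        for item in value:
            elem = merge_types(elem, infer_type(item))
        return {"array": elem}
    # Remaining case: a JSON object, treated as a nested struct.
    return {"struct": {k: infer_type(v) for k, v in value.items()}}


def merge_types(a, b):
    """Reconcile two type descriptors seen for the same field."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        if "array" in a and "array" in b:
            return {"array": merge_types(a["array"], b["array"])}
        if "struct" in a and "struct" in b:
            fields = dict(a["struct"])
            for name, t in b["struct"].items():
                fields[name] = merge_types(fields.get(name, "null"), t)
            return {"struct": fields}
    if (a, b) in (("long", "double"), ("double", "long")):
        return "double"  # assumed rule: widen mixed numerics
    return "string"      # assumed rule: fall back to string on any other conflict


def infer_schema(json_lines):
    """Infer a schema from JSON text records; assumes each record is a JSON object."""
    schema = {"struct": {}}
    for line in json_lines:
        schema = merge_types(schema, infer_type(json.loads(line)))
    return schema["struct"]


records = [
    '{"name": "Alice", "age": 30, "address": {"city": "Columbus"}}',
    '{"name": "Bob", "age": 31.5, "tags": ["a", "b"]}',
    '{"name": "Carol", "age": null, "address": {"city": "SF", "zip": "94102"}}',
]
schema = infer_schema(records)
print(json.dumps(schema, indent=2, sort_keys=True))
```

The sketch takes the union of fields across records, so a field that is missing or null in some records still appears in the result. Whether a real system resolves type conflicts the same way is a design choice this example does not settle.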
Yin Huai is a PhD student at The Ohio State University, advised by Xiaodong Zhang. His research focuses on improving the performance of data analytics systems, and his interests include distributed systems, database systems, and storage systems. He is currently interning at Databricks.