Software Engineer, Databricks

Sameer Agarwal is a Software Engineer at Databricks working on Spark core and Spark SQL. He previously received his PhD in databases from the UC Berkeley AMPLab, where he worked on BlinkDB, an approximate query engine for Spark.


Exceptions are the Norm: Dealing with Bad Actors in ETL

Stable and robust data pipelines are a critical component of enterprise data infrastructure. Most commonly, data pipelines ingest messy data sources with incorrect, incomplete, or inconsistent records and produce curated and/or summarized data

SparkSQL: A Compiler from Queries to RDDs

SparkSQL, a module for processing structured data in Spark, is one of the fastest SQL-on-Hadoop systems in the world. This talk will dive into the technical details of SparkSQL spanning the entire lifecycle