Incremental Processing on Large Analytical Datasets


Prasanna Rajaperumal and Vinoth Chandar will explore the specific problem of ingesting petabytes of data at Uber and explain why they ended up building an analytical datastore from scratch using Spark. Prasanna will discuss the design choices and implementation approaches behind Hoodie, which provides near-real-time data ingestion and querying using Spark and HDFS.

Session hashtag: #SFexp4

Prasanna Rajaperumal, Senior Engineer at Uber

About Prasanna

Prasanna Rajaperumal is a senior engineer at Uber, working on the next-generation Uber data infrastructure. At Uber, he has been building data systems that scale along with Uber’s hypergrowth. Over the last 6 months, he has been focusing on building a library that ingests change logs into large HDFS datasets, optimized for analytical workloads.

Over the last 12 years, he has held various roles building data systems at companies both small and large. Prior to Uber, he was a software engineer at Cloudera, building out data infrastructure for indexing and visualizing customer log files.

Vinoth Chandar, Staff Software Engineer at Uber

About Vinoth

Vinoth is the founding engineer/architect of the data team at Uber, as well as the author of many data processing and querying systems at Uber, including “Hoodie”. He has a keen interest in unified architectures for data analytics and processing. Previously, Vinoth was the lead on LinkedIn’s Voldemort key-value store and has also worked on the Oracle Database replication engine, HPC, and stream processing.