Puneet Kumar

Data Architect, PubMatic Inc.

Puneet Kumar is a Data Architect at PubMatic Inc., where he is responsible for schema design and data pipelines. Previously, he was a Lead Developer and ETL Architect at Amdocs.


Migrating Complex Data Aggregation from Hadoop to Spark

This talk discusses our experience moving from Hadoop MapReduce (MR) to Spark. Our initial implementation used a multi-stage aggregation framework within Hadoop MR to join, de-duplicate, and group 12 TB of incoming data every 3…