Yelp’s ad platform handles millions of ad requests every day. To generate ad metrics and analytics in real time, they built their ad event tracking and analysis pipeline on top of Spark Streaming. It allows Yelp to manage a large number of active ad campaigns and greatly reduce over-delivery. It also enables them to share ad metrics with advertisers in a more timely fashion.
This session will start with an overview of the entire pipeline and then focus on two specific challenges that Yelp had to solve in the event consolidation part of the pipeline. The first challenge is joining multiple data sources to generate a single stream of ad events that feeds into various downstream systems. That involves solving several problems that are unique to real-time applications, such as windowed processing and the handling of event delays. The second challenge concerns state management across code deployments and application restarts. Throughout the session, the speakers will share best practices for the design and development of large-scale Spark Streaming pipelines for production environments.
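To make the first challenge concrete, here is a minimal sketch of a windowed join that tolerates delayed events. It is plain Python rather than Spark Streaming, and it is not Yelp's implementation: the class `WindowedJoiner`, the impression/click schema, the shared `event_id` key, and the 600-second window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WindowedJoiner:
    """Joins clicks to buffered impressions by a shared ID (hypothetical schema)."""

    window_secs: int = 600  # assumed limit on how late a click may arrive
    impressions: Dict[str, int] = field(default_factory=dict)  # event_id -> timestamp

    def on_impression(self, event_id: str, ts: int) -> None:
        # Buffer the impression so a later click can be matched against it.
        self.impressions[event_id] = ts

    def on_click(self, event_id: str, ts: int) -> Optional[Tuple[str, int, int]]:
        # Emit a joined record only if the impression is buffered and the
        # click falls inside the join window; otherwise drop the click.
        imp_ts = self.impressions.get(event_id)
        if imp_ts is None or not (0 <= ts - imp_ts <= self.window_secs):
            return None
        return (event_id, imp_ts, ts)

    def expire(self, watermark: int) -> List[str]:
        # Evict impressions older than the window so state stays bounded.
        expired = [e for e, t in self.impressions.items()
                   if watermark - t > self.window_secs]
        for e in expired:
            del self.impressions[e]
        return expired


joiner = WindowedJoiner(window_secs=600)
joiner.on_impression("a", 1000)
joiner.on_impression("b", 1000)
on_time = joiner.on_click("a", 1300)   # 300 s after the impression: joined
too_late = joiner.on_click("b", 1700)  # 700 s after the impression: dropped
evicted = joiner.expire(watermark=1700)
```

The window length trades completeness against memory: a longer window joins more delayed clicks but keeps more impressions buffered, and that buffered state is what has to survive deployments and restarts.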
Session hashtag: #SFexp7
Amit is currently a software engineer at Yelp in the local ads space. He works on architecting and building pipelines for real-time processing and analytics. Prior to Yelp, he was at Amazon, working within the big data ecosystem. Over the years he has worked in various areas, including medical imaging, robotics, machine learning and embedded systems. In his free time he likes to delve into abstract math, armchair physics and hand-wavy philosophy. He lives with his wife and two kids in California.
Yifan is an early member of Yelp’s ads team, responsible for the core advertising system that represents the bulk of Yelp’s revenue. Over the years, he has developed systems enabling auction-based CPC advertising, mobile app advertising, auto-bidding, real-time ad event tracking and various other advertising features at Yelp.