Visualizing AutoTrader Traffic in Near Real-Time with Spark Streaming

The Hadoop team at AutoTrader was tasked with moving the website’s core metric logic, which ran as an hourly process, off Netezza. Two solutions were proposed: one on Hive and one on Spark. The Spark solution processed the results in 1.5 minutes, compared to 18 minutes for Hive, and it is now in production. A surprising benefit was how much faster development was with Spark: the team finished the Spark solution with a month to spare. With the hourly Spark results already validated, the team copied the code into a Spark Streaming job, then used d3.js to visualize the results in near real-time. In the three months since installing Spark on their cluster, AutoTrader went from an hourly Netezza process to a Spark Streaming visualization with a 30-second lag that delivers near real-time insights into site activity.
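The idea of reusing validated batch logic in a streaming job can be illustrated with a minimal sketch in plain Python. This is not the Spark API and not AutoTrader's code: the event shape `(epoch_seconds, page_type)`, the metric `page_view_counts`, and the helper `micro_batches` are all hypothetical names chosen for illustration. The point is only that one metric function can serve both an hourly pass and 30-second micro-batches.

```python
from collections import Counter
from itertools import groupby

# Hypothetical event shape: (epoch_seconds, page_type). Assumed for illustration only.
def page_view_counts(events):
    """Core metric logic: count views per page type for any collection of events."""
    return Counter(page for _, page in events)

def micro_batches(events, interval_seconds=30):
    """Group events into fixed time windows, yielding (window_start, events)."""
    ordered = sorted(events, key=lambda e: e[0])
    for window, group in groupby(ordered, key=lambda e: e[0] // interval_seconds):
        yield window * interval_seconds, list(group)

events = [(0, "search"), (12, "listing"), (29, "search"), (31, "listing"), (65, "search")]

# Batch style: one pass over the whole period.
hourly = page_view_counts(events)

# Streaming style: the same function applied to each 30-second micro-batch.
streamed = {start: page_view_counts(batch) for start, batch in micro_batches(events)}
```

Because both paths call the same function, summing the per-window counts should reproduce the batch totals, which is one way results validated in batch can carry over to the streaming version.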

About Jon

Jon Gregg is an Analytics Engineer working with the Hadoop team at AutoTrader. His background is in statistics, machine learning, and big data engineering.