Comcast provides personalized recommendations to its customers on the X1 Platform. Our initial implementation was built on the Hadoop map-reduce framework using a batch computation model. When we wanted to explore how we could offer real-time recommendations, we looked to Spark because of its greater computational efficiency and the ease of developing both streaming and batch processing solutions from the same code base. In this talk, we will describe how we re-implemented our recommendation data pipeline on the Spark framework to support use cases where incoming streams of data must be integrated in real time, with a latency of seconds. Specifically, at Comcast we deal with billions of machine-generated events amounting to hundreds of GB per day, and to compute recommendations for users with low latency we needed a faster system than the batch-oriented map-reduce framework. Spark allowed us to consume the events quickly by taking advantage of the intermediate state of results that its in-memory caching keeps available. As a result, we no longer had to rerun the complete pipeline every few hours, which had become infeasible as the number of events keeps growing over time. In summary, our experience shows that Spark's in-memory caching lets us compute recommendation results much faster, while the framework also accelerates the development process significantly.
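The difference between rerunning the complete pipeline and folding new events into cached state can be sketched in a few lines. This is a plain-Python illustration, not the Comcast pipeline and not Spark code: the event shape `(user, program)`, the function names `batch_counts` and `update_counts`, and the sample data are all assumptions made up for the example.

```python
from collections import Counter, defaultdict


def batch_counts(events):
    """Batch model: rebuild every user's per-program view counts from the full event log."""
    state = defaultdict(Counter)
    for user, program in events:
        state[user][program] += 1
    return state


def update_counts(state, micro_batch):
    """Streaming model: fold only the newly arrived events into the state kept in memory."""
    for user, program in micro_batch:
        state[user][program] += 1
    return state


# Hypothetical sample data: (user, program) viewing events.
history = [("u1", "news"), ("u1", "sports"), ("u2", "news")]
new_events = [("u1", "news"), ("u2", "drama")]

# Incremental path: the cached state is updated with the two new events only.
incremental = update_counts(batch_counts(history), new_events)

# Batch path: all five events are reprocessed to reach the same state.
recomputed = batch_counts(history + new_events)

same_result = incremental == recomputed
top_for_u1 = incremental["u1"].most_common(1)[0][0]
```

Both paths end in the same state, but the incremental one does work proportional to the new events rather than to the whole history, which is the property the abstract attributes to keeping intermediate results in memory.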
Jan Neumann manages the video content analysis group within Comcast Labs DC. He has also expanded his focus to novel algorithms and product prototypes in the areas of personalized media recommendations, large-scale machine learning, and big data analysis, with a particular focus on applications that combine video content analysis with big data within Comcast.
Sridhar Alla currently works as the Big Data Architect at Comcast and has designed and delivered the backend for the personalization platform used by Comcast customers. He started his career at Network Appliance, working on NAS and caching technologies. He also served as the CTO of the security company eIQNetworks, where he merged the concepts of Big Data and security products, and he holds several patents on very-large-scale processing algorithms.