This talk will cover the tools we used, the hurdles we faced, and the workarounds we developed with help from Databricks support in our effort to build a custom machine learning model and use it to predict TV ratings for different networks and demographics.
The Apache Spark machine learning and DataFrame APIs make it remarkably easy to build a machine learning pipeline for an archetypal supervised learning problem. In our applications at Cadent, we face a challenge with high-dimensional labels and relatively low-dimensional features; at first pass such a problem is all but intractable, but thanks to a large number of historical records and the tools available in Apache Spark, we were able to construct a multi-stage model capable of forecasting with sufficient accuracy to drive the business application.
Over the course of our work we came across many tools that made our lives easier, and others that forced workarounds. In this talk we will present our custom multi-stage methodology, review the challenges we faced, and walk through the key steps that made the project successful.
Michael has a PhD in Optimization and Decision Science from the University of Pennsylvania with a focus on constrained resource allocation problems. Michael leads the Data Science and Engineering initiatives at Cadent, a leading provider of media, advertising technology, and data solutions for the pay-TV industry. He has also taught Convex Optimization at UPenn. He has been a practicing data-driven business architect since 2005, working on various subcontracts during his undergraduate and graduate work.
PhD in Computer Science (Multiprocessor OS). Data Engineer at Cadent Network; Application Developer at QVC; SQL Developer at CCP; Senior Software Analyst at EXE Technologies; IT Consultant at UNISYS; Assistant Professor at the Bulgarian Academy of Sciences.