A digital attribution model determines how credit for an online conversion is assigned to the media touch points in the conversion path. It helps marketers understand the effectiveness of media touches and often serves as a foundation for online media optimization. We built a Spark-based attribution system with three modules: model training, attribution, and insight generation. The attribution model takes its basic form from logistic regression, but adds two types of parameters that address time-decay effects and attribution weights, respectively. The logistic regression module in MLlib therefore cannot fit this model directly, so we developed a new modeling algorithm in Spark. The system also employs statistical modeling and text-processing techniques such as survival analysis, causal modeling, and tokenization. Takeaways: (1) Spark is the right choice for building large-scale attribution systems; (2) it is possible to customize MLlib algorithms for special business needs.
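To make the model form concrete, the following is a minimal sketch of what a logistic-regression-style attribution model with per-channel weights and a time-decay parameter might look like. The abstract does not give the exact functional form, so the specific formula (each touch contributes its channel weight scaled by an exponential decay in time-to-conversion, and credit is split in proportion to these decayed contributions), along with all function names, channel names, and parameter values, are illustrative assumptions, not Adobe's actual implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conversion_probability(touches, weights, decay, intercept=0.0):
    """Hypothetical model form: P(convert) = sigmoid(b0 + sum_j w_c(j) * exp(-decay * t_j)),
    where touches is a list of (channel, days_before_conversion) pairs,
    weights maps channel -> attribution weight, and decay is a shared
    time-decay rate. This is an assumed form, not the actual Adobe model."""
    z = intercept + sum(weights[c] * math.exp(-decay * t) for c, t in touches)
    return sigmoid(z)

def attribute_credit(touches, weights, decay):
    """Split conversion credit across touches in proportion to each touch's
    decayed contribution weights[c] * exp(-decay * t). Returns a dict mapping
    touch index -> fractional credit (fractions sum to 1)."""
    contrib = [weights[c] * math.exp(-decay * t) for c, t in touches]
    total = sum(contrib)
    return {i: v / total for i, v in enumerate(contrib)}

# Example conversion path: a display ad 7 days out, a search click 2 days out,
# an email half a day before conversion (all values illustrative).
path = [("display", 7.0), ("search", 2.0), ("email", 0.5)]
w = {"display": 0.4, "search": 0.9, "email": 0.6}

p = conversion_probability(path, w, decay=0.3)
credit = attribute_credit(path, w, decay=0.3)
```

Because both the decay rate and the channel weights enter the linear predictor nonlinearly, standard MLlib logistic regression cannot fit them jointly, which is consistent with the abstract's point that a custom training algorithm was needed.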
Yunzhu (Anny) Chen is a data scientist on the data science team at Adobe. She is interested in applying statistical and machine learning models to real business problems and is currently working on digital attribution modeling for customer conversion data. Prior to joining Adobe, she was a data scientist at Verizon, where she worked on big data solutions for anomaly detection. She received her MS in statistics from Stanford University in 2013 and her BS in probability and statistics from Peking University in 2011. She is passionate about applying statistics to real datasets and about big data technology.
Zhenyu (William) Yan is a senior manager of data science at Adobe, where he leads a team developing statistical modeling and machine learning algorithms that address challenging business problems in digital marketing. Prior to joining Adobe, William was a lead scientist at FICO, where he developed innovative algorithms for a variety of business applications, ranging from direct marketing to credit scoring. William was initially trained as a software engineer and became a statistician and operations researcher after completing his PhD in systems engineering. Now he calls himself a data scientist.