Appraiser: How Airbnb Generates Complex Models in Spark for Demand Prediction


Many open source machine learning frameworks exist, such as Spark's MLlib and the Hadoop-based Mahout project. These frameworks are great for getting started with ML in products, but because they are so generic they may lack certain production-driven features. In this talk we will present the ML framework used to generate Appraiser and discuss some production-driven concepts that inform the development of the framework, such as:

- Configurable feature engineering
  - Feature code is written once and configured through text files that drive a feature transformation pipeline
  - Interactions between features are chosen so that they make sense, which lets us scale boosting to many millions of bushy trees
- Debuggability
  - Boosted random forests are hard to debug
  - Product quantization enables engineers to rapidly debug models and check for data quality
- Production constraints
  - Creating smooth models
  - Enforcing monotonicity (e.g. demand should always decrease with increasing price)
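To make the monotonicity constraint concrete, here is a minimal sketch in Python. It is not the method presented in the talk: it swaps in the standard pool-adjacent-violators (isotonic regression) procedure, applied to a hypothetical list of predicted demand values ordered by increasing price. The function name `enforce_non_increasing` and the sample numbers are illustrative assumptions.

```python
def enforce_non_increasing(values, weights=None):
    """Return the weighted least-squares non-increasing fit to `values`.

    Uses pool-adjacent-violators. `values` are assumed to be ordered by
    increasing price; `weights` are assumed to be strictly positive.
    """
    if weights is None:
        weights = [1.0] * len(values)

    # Each block is [weighted mean, total weight, number of points pooled].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # A violation is a block whose mean rises above its predecessor's.
        # Pool the two blocks and re-check against the block before them.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            mean_b, weight_b, count_b = blocks.pop()
            mean_a, weight_a, count_a = blocks.pop()
            total = weight_a + weight_b
            pooled = (mean_a * weight_a + mean_b * weight_b) / total
            blocks.append([pooled, total, count_a + count_b])

    # Expand the pooled blocks back to one value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical raw demand scores at six increasing price points.
raw_demand = [0.9, 0.7, 0.8, 0.4, 0.5, 0.1]
monotone_demand = enforce_non_increasing(raw_demand)
```

In this example the two places where demand rises with price (0.7 to 0.8, and 0.4 to 0.5) are each replaced by the average of the offending pair, so the resulting curve never increases as price goes up.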

About Hector Yee

Hector is a researcher and engineer in the fields of personalized recommendations, machine learning and price prediction. He wrote a large part of the Emmy-award-winning personalized video recommendation engine at YouTube and the "Recommended for you (similar users)" part of the Google Play store. These recommendations are generated for many hundreds of millions of users across millions of items, and have increased watch time on YouTube and revenue on the Google Play store by a large percentage. At Airbnb, Hector works on the demand prediction model, image content analysis and machine learning ranking models in search.