SESSION

Scaling Up: How Switching to Apache Spark Improved Performance, Reliability, and Reduced Cost on a Very Large-Scale ML Application


Krux, a Salesforce company, is a Data Management Platform (DMP) that helps its clients collect, manage, analyze and activate their people data. With a wide range of premium clients such as Kellogg, L’Oréal, Warner Brothers, New York Times, Washington Post, Uber, Spotify and many other household names, Krux sees over 3.5 billion unique users globally each month across sites, media, mobile apps, and transactional and offline traffic sources. That is more than Facebook, Wikipedia and Twitter combined.

Processing data at this volume and velocity has presented many challenges over the seven years Krux has existed, and the team has had to develop various proprietary strategies and technologies to overcome them. In this session, Salesforce will share how Apache Spark in particular helped transform the DMP’s data processing infrastructure, using the evolution of Krux’s “Look-alike” algorithm as an example.

Look-alike, a similarity-based classifier, is one of the algorithms most commonly used by marketers and publishers looking to extend their audience reach. Get a high-level introduction to the use case and algorithm, and learn about Salesforce’s experience in moving the implementation from Hadoop to Spark and how the move increased the performance, reliability and serviceability of the product. You will also hear about some of the technical challenges the team faced, including large-scale joins with skewed data, and how they solved them in Spark.
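The abstract does not say which technique was used for the skewed joins. One common, general approach is key salting: rows of a "hot" key on the large side get a random salt, and the matching rows on the other side are replicated once per salt, so the hot key's work is spread over several composite keys (and, in Spark, over several partitions and tasks) instead of landing on one. The plain-Python sketch below simulates that idea; the function names, the `hot_keys` set and `num_salts` parameter, and the sample data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def salt_left(left, hot_keys, num_salts, seed=0):
    """Tag each (key, value) row with a composite key (key, salt).

    Hot keys get a random salt in [0, num_salts); other keys always get salt 0.
    """
    rng = random.Random(seed)
    return [((k, rng.randrange(num_salts) if k in hot_keys else 0), v) for k, v in left]


def explode_right(right, hot_keys, num_salts):
    """Replicate hot-key rows of the smaller side once per salt value."""
    table = defaultdict(list)
    for k, v in right:
        for s in (range(num_salts) if k in hot_keys else (0,)):
            table[(k, s)].append(v)
    return table


def salted_join(left, right, hot_keys, num_salts, seed=0):
    """Inner join on the composite (key, salt) key; same result as a plain join."""
    table = explode_right(right, hot_keys, num_salts)
    return [(k, lv, rv)
            for (k, s), lv in salt_left(left, hot_keys, num_salts, seed)
            for rv in table.get((k, s), [])]


def plain_join(left, right):
    """Reference inner join on the raw key, used to check the salted version."""
    table = defaultdict(list)
    for k, v in right:
        table[k].append(v)
    return [(k, lv, rv) for k, lv in left for rv in table.get(k, [])]


if __name__ == "__main__":
    # Hypothetical data: segment "s1" is hot (1000 rows), "s2" is rare.
    left = [("s1", i) for i in range(1000)] + [("s2", i) for i in range(3)]
    right = [("s1", "sports"), ("s2", "travel")]
    joined = salted_join(left, right, hot_keys={"s1"}, num_salts=8)
    assert sorted(joined) == sorted(plain_join(left, right))
    # The hot key's 1000 rows are now spread over 8 composite keys instead of 1.
    print(Counter(key for key, _ in salt_left(left, {"s1"}, 8) if key[0] == "s1"))
```

The trade-off is that the smaller side grows by a factor of `num_salts` for each hot key, so salting is usually applied only to keys known to be skewed. Spark 3.0 and later can also split skewed partitions automatically through adaptive query execution (`spark.sql.adaptive.skewJoin.enabled`).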

Learn how Spark’s wide range of high-level and low-level APIs proves more useful than Hadoop’s when implementing customized machine learning algorithms, and how its overall abstraction makes it easy to write modular, maintainable code that is also performant.
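The abstract does not describe the internals of Krux’s look-alike model. As a hedged illustration of a similarity-based classifier built from small, composable functions, the sketch below scores candidate users by cosine similarity between their binary trait vectors and the centroid of a seed audience. The choice of cosine-to-centroid scoring, the function names and the sample traits are assumptions for illustration only.

```python
import math
from collections import Counter


def seed_profile(seed_users):
    """Centroid of the seed audience: fraction of seed users carrying each trait.

    (In Spark this step would typically be an aggregation over the seed segment.)
    """
    counts = Counter(t for traits in seed_users for t in set(traits))
    n = len(seed_users)
    return {t: c / n for t, c in counts.items()}


def cosine(traits, profile):
    """Cosine similarity between a user's binary trait set and the centroid."""
    dot = sum(profile.get(t, 0.0) for t in traits)
    norm_u = math.sqrt(len(traits))
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    return dot / (norm_u * norm_p) if norm_u and norm_p else 0.0


def look_alikes(seed_users, candidates, k):
    """Return the k candidates most similar to the seed, best first.

    (In Spark the scoring would typically be a map over the candidate users.)
    """
    profile = seed_profile(seed_users)
    scored = [(uid, cosine(set(traits), profile)) for uid, traits in candidates.items()]
    return sorted(scored, key=lambda p: (-p[1], p[0]))[:k]


if __name__ == "__main__":
    # Hypothetical seed audience and candidate users, described by trait sets.
    seed = [{"sports", "news"}, {"sports", "travel"}, {"sports", "news"}]
    candidates = {"u1": {"sports", "news"}, "u2": {"cooking"}, "u3": {"travel"}}
    for uid, score in look_alikes(seed, candidates, k=2):
        print(uid, round(score, 3))
```

Keeping the profile, the similarity measure and the ranking in separate functions means any one of them (for example, swapping cosine for a lift-based weighting) can be changed without touching the others.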

Session hashtag: #SFeco3

Kexin Xie, at Salesforce

About Kexin

At Salesforce, Kexin is responsible for research and design of the core distributed data processing and machine learning architecture for the Krux data management platform. Kexin also leads the Data Science Engineers in implementing these designs and drives continuous improvement of the related operational aspects, including performance, fault tolerance, scaling, automation and cost.

Before Salesforce, Kexin worked for Krux, BigCommerce, NICTA, Brandscreen, Freelancer and Microsoft Research, building software systems for large-scale machine learning, data mining, real-time bidding, intelligent marketing, anti-fraud and anti-money laundering. Kexin also holds a Ph.D. in computer science.

Yacov Salomon, VP Software Engineering at Salesforce

About Yacov

Dr. Yacov Salomon has over 10 years of experience working with large, complex real-world data sets. He has headed multiple data science teams at companies including Salesforce, Krux, BigCommerce and Brandscreen. He was responsible for the research, design and development of systems and applications for large-scale machine learning, data mining, real-time bidding, intelligent marketing, attribution and recommendations. Yacov holds a Ph.D. in applied mathematics, and his academic research focused on probability and non-parametric statistics.