Needle in the Haystack—User Behavior Anomaly Detection for Information Security


Salesforce recently developed and deployed a real-time, scalable, personalized anomaly detection system that operates on terabytes of data with a low false positive rate. Detecting anomalies in users' in-app behavior at terabyte scale is extremely challenging because traditional techniques, such as clustering methods, suffer serious performance problems in production.

Salesforce's method addresses these challenges in three phases: 1) Principal Component Analysis (PCA) is used to extract a high-variance and a low-variance feature subset. The low-variance subset is valuable in cybersecurity because it captures a user's stable behavior, and the goal is to determine whether a user deviates from that stable behavior; the high-variance subset is used for dimensionality reduction. 2) On each feature subset, a profile is built for each user that characterizes the user's baseline behavior and legitimate abnormal behavior. 3) During detection, each incoming event is compared with the user's profile to produce an anomaly score. The computational cost of the detection module is constant per incoming event.
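The talk does not publish its implementation, so the following is only a minimal sketch of the three phases under our own assumptions: events are already numeric feature vectors, the PCA split keeps the top `n_high` and bottom `n_low` principal directions, and a user's profile is simply the mean and standard deviation of that user's projected events. The function names (`split_pca_subspaces`, `build_profiles`, `score_event`) and the max-|z| score are hypothetical illustrations, not Salesforce's method, and the sketch does not model "legitimate abnormal behavior". It assumes NumPy is available.

```python
import numpy as np


def split_pca_subspaces(X, n_high, n_low):
    """Hypothetical phase 1: PCA split into high- and low-variance directions.

    X: (n_events, n_features) matrix of numeric behavior features.
    Returns (feature mean, high-variance basis, low-variance basis).
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    low_basis = eigvecs[:, :n_low]             # smallest-variance (stable) directions
    high_basis = eigvecs[:, ::-1][:, :n_high]  # largest-variance directions, for reduction
    return mean, high_basis, low_basis


def build_profiles(X, users, mean, basis, eps=1e-6):
    """Hypothetical phase 2: per-user baseline (mean, std) in one subspace."""
    Z = (X - mean) @ basis
    profiles = {}
    for u in np.unique(users):
        Zu = Z[users == u]
        # eps keeps a perfectly constant direction from dividing by zero
        profiles[u] = (Zu.mean(axis=0), Zu.std(axis=0) + eps)
    return profiles


def score_event(x, user, mean, basis, profiles):
    """Hypothetical phase 3: largest absolute z-score against the user's profile.

    The cost depends only on the feature and subspace dimensions, not on how
    many past events the user has, so it is constant per incoming event.
    """
    mu, sigma = profiles[user]
    z = ((x - mean) @ basis - mu) / sigma
    return float(np.max(np.abs(z)))
```

Because scoring only reads the stored per-user mean and standard deviation, the per-event cost does not grow with a user's history. For example, on synthetic data where one feature is nearly constant, profiling in the low-variance subspace and then pushing that feature far outside its usual range yields a much larger score than an ordinary event from the same user. The same `build_profiles` and `score_event` calls can be applied to the high-variance basis.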
st cloud computing platforms; the novelty of our anomaly detection technique based on user behavior profiling; and the challenges of implementing and deploying it with Apache Spark in production. We will also demonstrate how our system outperforms other traditional machine learning algorithms.
Session Hashtag: #SFml5

Ping Yan, Research Scientist at Salesforce

About Ping

Ping has spent a decade finding new ways to make sense of data in various domains, from consumer behavior modeling to algorithmic security threat detection. Her work has been published as journal articles, monographs, and books. Ping holds a Ph.D. in Management Information Systems from the University of Arizona with a focus on machine learning and AI. She is currently a Research Scientist on the Salesforce Security Analytics team. Ping has spoken at various data science and InfoSec conferences, such as ICIS, WITS, CanSecWest 2013, OWASP AppSec 2015, and Spark Summit 2016.

Wei Deng, Data Scientist at Salesforce

About Wei

Wei is a data scientist at Salesforce building detection engines for cybersecurity. Previously, he worked as an applied researcher at Microsoft and eBay, shipping machine learning products for a variety of business problems, including speech recognition, recommendation systems, and marketing campaigns. Wei holds a Ph.D. in Applied Mathematics and an M.S. in Computer Science from Michigan State University, with a focus on machine learning and AI.