Under reasonable circumstances, how much can you expect to lose? The financial statistic Value at Risk (VaR) seeks to answer this question. Since its development on Wall Street soon after the stock market crash of 1987, VaR has been widely adopted across the financial services industry. Some organizations report the statistic to satisfy regulations, some use it to better understand the risk characteristics of large portfolios, and others compute it before executing trades to help make informed and immediate decisions.

Estimating VaR can be computationally intensive. As a flexible processing framework with the ability to both scale up to large amounts of data and leverage vast compute resources, Apache Spark is a compelling platform for undertaking financial risk calculations. At Cloudera, we’ve assisted several organizations in using Spark to compute VaR and other financial statistics.

In this talk, we’ll walk through a basic VaR calculation with Spark. The calculation employs the widely used Monte Carlo method, which is useful for modeling portfolios with non-normal distributions of returns. It simulates thousands or millions of random market scenarios and uses a model to predict the response of the portfolio to each scenario. The talk, which will cover Spark design patterns in time series analysis, visualizing data, and Monte Carlo simulation, aims to give a feel for what it is like to approach financial modeling with Spark.
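To make the Monte Carlo idea concrete, here is a minimal single-machine sketch in plain Python. It is not the code from the talk and does not use Spark. The function names, the two-instrument portfolio, and the choice of independent normal draws for each instrument's return are all assumptions made for brevity; a realistic model would use correlated market factors and possibly non-normal distributions. The sketch simulates many scenarios, sorts the resulting portfolio returns, and reads VaR off the lower tail.

```python
import random


def simulate_portfolio_return(rng, weights, means, stdevs):
    """Return the portfolio return for one simulated market scenario.

    Assumption: each instrument's return is an independent normal draw.
    """
    return sum(w * rng.gauss(m, s) for w, m, s in zip(weights, means, stdevs))


def monte_carlo_var(weights, means, stdevs, trials=10000, confidence=0.95, seed=42):
    """Estimate VaR as the loss at the (1 - confidence) quantile of simulated returns."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    returns = sorted(
        simulate_portfolio_return(rng, weights, means, stdevs) for _ in range(trials)
    )
    # Index of the worst-case tail cutoff, e.g. the 5th percentile for 95% confidence.
    cutoff = int((1 - confidence) * trials)
    # VaR is conventionally reported as a positive loss, so negate the return.
    return -returns[cutoff]


if __name__ == "__main__":
    # Hypothetical portfolio: two equally weighted instruments with zero mean return.
    var_95 = monte_carlo_var([0.5, 0.5], [0.0, 0.0], [0.02, 0.03])
    print(f"Estimated 95% VaR as a fraction of portfolio value: {var_95:.4f}")
```

Because each trial is independent of the others, the simulation loop is the part that would be spread across a cluster in a distributed version of this calculation.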
Sandy is a data scientist at Cloudera focusing on Apache Spark and its ecosystem, and an author of the upcoming O’Reilly publication Advanced Analytics with Spark. He is a frequent Spark contributor and a member of the Apache Hadoop project management committee. He graduated Phi Beta Kappa from Brown University.