ABSTRACT: Data analytics platforms can be broken down into three distinct parts: Acquisition, Computation and Serving, each ideally having its own low-friction, elastic scalability. Over the years, Goldman Sachs has invested its core competency in data analytics in building powerful data computation frameworks into its internal platform. The returns on these investments have compounded, producing an integrated, “it just works” suite of tools that lets financial professionals focus on solving real business problems without getting bogged down by data infrastructure.
Apache Spark is emerging as a compelling “lingua franca” API and platform for scalable Big Data computation. In this talk, we’ll discuss why we are looking to embrace Spark’s Scala API in our platform, how we’re thinking about integrating Spark into data curation pipelines, and why we’re looking to become active Apache Spark open-source contributors. Finally, we’ll cover the significant deployment challenges and opportunities that lie ahead, particularly as managed cloud data offerings become viable.
BIO: Matt Glickman is a Managing Director at Goldman Sachs, currently focused on the Data Platform for its Asset Management business. Matt has spent most of his 20-plus-year career building, managing and evangelizing analytics platforms and products that touch every aspect of the Goldman franchise.