SESSION

Herding Cats: Migrating Dozens of Oddball Analytics Systems to Apache Spark


HP ships millions of PCs, printers and other devices every year to customers in all market segments. Many of these systems have had various generations of data collection and reporting, going back as many as 16 years. That has led to a significant sprawl of custom data formats, specialized code and numerous brittle legacy systems collecting, analyzing and reporting data.

This session will focus on samples of HP’s journey to find, catalog and ultimately eliminate these systems by migrating to Apache Spark with Databricks in the cloud. Hear about HP’s challenges dealing with legacy systems (some even located under engineers’ desks) and how the power of AWS, Spark, and visualization tools has significantly simplified their migrations. You’ll also learn how the success of this endeavor is measured not just by the number of systems deprecated, but also by how the process is evolving into building companywide shared Spark libraries, notebooks and web services that are accelerating future migrations and analysis using Spark.

Session hashtag: #SFent3

John Cavanaugh, Master Architect/Strategist at HP

About John

John is a Master Architect/Strategist in the HP Platforms & Future Technology Group. He is based in San Diego, California and leads a Data Engineering group focused on creating platform solutions for Print analytics. He has been focused of late on migrating numerous legacy systems to HP’s new Spark with Databricks environment. John’s background has been a mix of management and technical leadership, but he has found his sweet spot in driving business value through data. He started his career at HP after completing his MSEE & MBA at Purdue and has worked in several different business groups.