The Little Warehouse That Couldn’t, Or: How We Learned to Stop Worrying and Move to Spark

Slides PDF Video

Shopify’s commerce platform now powers over 150K merchants and continues to grow. The huge volume and variety of our data pushed our homegrown reporting and warehousing system to the edge. As the maintenance and performance costs became too much we moved to HDFS and Spark, using both python and scala to transform our vast amounts of operational data into dimensional models to provide better insight. Find out the lessons we’ve learned moving our entire organization onto fully conformed facts and dimensions, the clusters we’ve cratered, the walls we’ve hit, and what we did to overcome them to build our Starscream framework.

Photo of Yandu Oppacher

About Yandu

Yandu spent seven years working on the relational engine of IBM’s Cognos BI and reporting tool, working with dozens of vendors and helping to rebuild their 15 year old engine. For the past year and a half he’s been working on Shopify’s data engineering team to help build a new framework for dimensionally modeling Shopify’s data.