Spark Summit 2014 brought the Apache Spark community together from June 30 to July 2, 2014, at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming, and related projects.
If you’re currently using Hive and thinking about what Spark might offer, this talk is for you. At Bizo, we wrote all of our cluster computing jobs in Hive for several years. We think Hive is great, but it comes with some less-than-pleasant tradeoffs. A year ago we tried Spark; the ability to write normal Scala code has made unit testing and custom functions a breeze, but Spark is not perfect either. This talk will cover how we transitioned from Hive to Spark, including our best practices and some painful lessons learned along the way, so you don’t have to repeat the same mistakes.
Josh Carver is a software engineer at Bizo. He’s been using Spark for the past year and a half to build many of the processing jobs that power Bizo’s web analytics platform.