Spark Summit 2013 brought the Apache Spark community together on December 2-3, 2013, at the Hotel Nikko in San Francisco. It featured production users of Spark, Shark, Spark Streaming, and related projects.
Beyond Word Count – Productionalizing Spark Streaming
Ryan Weald, Sharethrough
At Sharethrough, we have deployed Spark to our production environment to support several user-facing product features. While building these features, we uncovered a consistent set of challenges across multiple streaming jobs. By addressing these challenges, you can speed up development of future streaming jobs. In this talk, we will discuss the three major challenges we encountered while developing production streaming jobs and how we overcame them.
First, we will look at how to write jobs to ensure fault tolerance, since streaming jobs need to run 24/7 even under failure conditions. Second, we will look at the programming abstractions we created using functional programming and existing libraries. Finally, we will look at the way we test all the pieces of a job, from manipulating data through writing to external databases, to give us confidence in our code before we deploy to production.
Apache Spark, Spark, the Spark logo, and Apache are trademarks of the Apache Software Foundation and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event. Apache Spark is an effort undergoing incubation at The Apache Software Foundation (ASF). For more details about incubation, see the "Apache Incubator Notice" on the Spark Homepage.