Spark Summit 2014 brought the Apache Spark community together on June 30–July 2, 2014 at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
Spark is well documented and easy to get up and running for both experimentation and data analysis – especially for people coming from the Hadoop world. For a group just starting out in the “big data” technology world, however, there are some important items that need to be addressed when you first go live with your new toolkit.
In this talk, we highlight some of the challenges a group starting out with a Spark-centered stack may experience, along with how they can be dealt with. These include:
– Choosing, maintaining and possibly compiling the right combination of packages to work with Spark (Hadoop/Cassandra, Mesos/YARN)
– Data serialization/deserialization – especially when working with some binary protocols
– Performance issues with small data
– Deployment/configuration automation
– Preparing for non-developer usage (plugging in the right libraries/third party packages)
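As an illustration of the serialization point above, a common first tuning step in a Spark deployment of this era was switching from the default Java serializer to Kryo. A minimal, hedged sketch of the relevant `spark-defaults.conf` entries (the registrator class name is a hypothetical placeholder, not something from the talk):

```
# Hypothetical spark-defaults.conf fragment (illustrative only)
# Use Kryo instead of the default Java serialization
spark.serializer        org.apache.spark.serializer.KryoSerializer
# Register application classes with Kryo; class name is a placeholder
spark.kryo.registrator  com.example.MyKryoRegistrator
```

Registering frequently serialized classes up front avoids shipping full class names with every record, which matters most for the binary-protocol workloads the talk mentions.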
Gary is currently leading development for MediaCrossing’s digital media trading and analytics platform. His passion lies in building software that is reliable, user-friendly and scales up and out over time.