Spark Summit 2013 brought the Apache Spark community together on December 2-3, 2013 at the Hotel Nikko in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
Building a working data processing stack from open source or commercial components is a challenging and highly complex task.
Multiple, often conflicting dependencies and many development teams with different release trains can demand a substantial coordination effort. A constant flow of new features, bug fixes, and other changes is a disaster in the making when it comes to regression testing and quality control at any stage between development and production. Businesses with internal development teams face the added challenge of integrating their deliverables into a bigger, company-wide data platform.
The problem is exacerbated by the exponential growth of standard libraries and transitive dependencies. Oftentimes, it is next to impossible to create a well-controlled and reproducible system environment at all stages of the platform life-cycle: from development, to validation, to production deployment and configuration management.
This session will demonstrate a practical answer to these and other problems using a 100% open-source stack. The author will talk about the real-life experience and challenges of introducing Spark and Shark components into a commercially supported data analytics system on top of Hadoop.