This talk will start by addressing what we need from a self-healing big data pipeline system in terms of performance, scalability, distribution, and concurrency. From there I will explain why this particular set of technologies addresses these requirements and which features of each support them. I will show how they actually work together, from the application layer to deployment across multiple data centers, within the framework of the Lambda Architecture. Finally, I will show how to easily leverage and integrate everything from an application standpoint (in Scala) for fast, streaming computations in asynchronous, event-driven environments.
Helena is a committer on several open source projects, including Akka and the Spark Cassandra Connector, and previously Spring Integration and Spring AMQP. She has been working with Scala since 2010 as a Senior Cloud Engineer and is currently a Senior Software Engineer in Analytics at DataStax.