San Francisco
June 30 - July 2, 2014


Spark Summit 2015e
Spark Streaming – The State of the Union and the Road Beyond
Tathagata Das (Databricks)

Spark Streaming extends the core Apache Spark API to perform large-scale stream processing, and it is revolutionizing the way Big "Streaming" Data applications are written. It is being rapidly adopted by companies across various business verticals: ad and social network monitoring, real-time analysis of machine data, fraud and anomaly detection, etc. These companies are adopting Spark Streaming mainly because:

- Its simple, declarative, batch-like API makes large-scale stream processing accessible to non-scientists.
- Its unified API and single processing engine (i.e., the Spark core engine) allow a single cluster and a single set of operational processes to cover the full spectrum of use cases: batch, interactive, and stream processing.
- Its stronger exactly-once semantics make it easier to express and debug complex business logic.

In this talk, I will elaborate on such adoption stories, highlighting interesting use cases of Spark Streaming in the wild. I will also talk about (and perhaps demonstrate) the exciting new developments in Spark Streaming and the wish list of features that we may target in the future.

Tathagata Das is an Apache Spark committer and a member of the PMC. He is the lead developer behind Spark Streaming and is currently employed at Databricks. Earlier, he was at the AMPLab at UC Berkeley, researching datacenter frameworks and networks with professors Scott Shenker and Ion Stoica.