The Top Five Mistakes Made When Writing Streaming Applications


So you know you want to write a streaming app, but the developer of any non-trivial streaming app has to think about these questions:

– How do I manage offsets?
– How do I manage state?
– How do I make my Spark Streaming job resilient to failures? Can I avoid some failures?
– How do I gracefully shut down my streaming job?
– How do I monitor and manage my streaming job (e.g. retry logic)?
– How can I better manage the DAG in my streaming job?
– When do I use checkpointing, and for what? When should I not use checkpointing?
– Do I need a WAL when using a streaming data source? Why? When don’t I need one?

This session will share practices that no one talks about when you start writing your streaming app, but that you’ll inevitably need to learn along the way.

Session hashtag: #SFdev5

Mark Grover, Software Engineer at Cloudera

About Mark

Mark is a software engineer working on Apache Spark at Cloudera. He is a co-author of the book Hadoop Application Architectures and also wrote a section of the book Programming Hive.
Mark is also a committer on Apache Bigtop and a committer and PMC member on Apache Sentry. He has contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop and Apache Flume.
Mark is a sought-after speaker on Big Data topics at national and international conferences. He occasionally writes about technology on his blog.

Ted Malaska, Technical Group Architect at Blizzard, Inc.

About Ted

Ted works at Blizzard, helping support great titles like World of Warcraft, Overwatch, HearthStone, and many more. Previously, he was a Principal Solutions Architect at Cloudera, helping clients be successful with Hadoop and the Hadoop ecosystem. Before that, he was a Lead Architect at the Financial Industry Regulatory Authority (FINRA). He has also contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is also a co-author of O’Reilly’s “Hadoop Application Architectures”, a frequent speaker at conferences, and a frequent blogger on data architectures.