Time series data is everywhere: IoT, sensor data, financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that is now commonplace. However, data is pointless unless you can process it in near real time or run batch analytics on it. That’s where Spark combined with Cassandra comes in: what was once just your storage system can be transformed into your analytics system, and you’ll be surprised how easy it is! So join me for a whirlwind tour of how to use these two awesome open source projects for time series data. We’ll cover:
+ An overview of Cassandra: why is it so good for time series?
+ A very brief introduction to Spark Streaming (I’ll assume the audience is Spark/Spark Streaming literate)
+ An overview of the Spark-Cassandra connector

Specific use case: processing weather data from thousands of weather stations
Christopher Batey (@chbatey) is a freelance Software Engineer/Architect/Trainer. His speciality is large scale operational systems, and he has worked on trading systems and online television services, as well as building off-the-shelf software at IBM. Likes: Scala, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can check out his blog at: http://christopher-batey.blogspot.co.uk/.