Everybody agrees that IoT is changing the world… and creating new challenges for software developers, architects and DevOps. How can we build efficient and highly scalable distributed applications using open-source technologies? What are the characteristics of data generated by IoT devices, and how does it differ from traditional enterprise or Big Data problems? Which architectural patterns are beneficial for IoT use cases, and why do some trusted methods eventually turn out to be “anti-patterns”? This talk will show how to combine best-of-breed open-source technologies, such as Apache Spark, Riak and Mesos, to build scalable IoT pipelines that ingest, store and analyze huge amounts of data while keeping operational complexity and costs under control. We will discuss the pros and cons of using relational, NoSQL and object storage products for storing and archiving IoT data. Then we will cover best practices for using Spark with the Riak NoSQL database. We will describe how Apache Spark's advanced modules (Spark SQL, Spark Streaming and MLlib) can solve problems common to IoT apps while using Riak for fast and scalable persistence. At the end, we will explain why Spark Structured Streaming is a godsend for IoT data and make the case that time series databases deserve a separate category in the NoSQL classification.
Pavel is Director of Product Management at Basho, the company behind Riak, a popular open-source NoSQL database. He is responsible for a new Basho product, Riak TS (Time Series), and for Riak integrations with Apache Spark, Mesos, Kafka and Redis. Pavel is particularly excited about IoT, Big Data, cloud and open source, not necessarily in that order. Before joining Basho, Pavel was with Boundary, which developed a real-time SaaS monitoring solution and was acquired by BMC Corp. Prior to that, Pavel held a number of Product Management and Engineering roles focusing on Big Data, Cloud, Networking and Analytics, and authored several patents.