How Spark enables the Internet of Things: Efficient integration of multiple Spark components for Smart City use cases


COSMOS is a platform for developing IoT applications that focuses on smart city use cases, ranging from intelligent transportation systems to smart energy management. A central challenge is to analyze large historical datasets from heterogeneous IoT devices while also providing near real-time solutions. COSMOS meets this challenge with a generic integration of multiple Spark libraries and other open source components. Spark MLlib algorithms such as clustering and regression are used to gain insight from the data and provide intelligent, proactive solutions. The ML analysis accesses historical data via Spark SQL. This data is continuously collected, annotated with metadata, and stored in Parquet format in an OpenStack Swift archive. Data access is optimized using the Spark SQL DataFrame APIs, which support projection and selection pushdown. In addition, we implement metadata search for Swift and use it for selection pushdown, which allows more selective pushdown than partitioning-based approaches. The generic nature and wide applicability of our component integration patterns are demonstrated by two IoT use-case scenarios. The first involves the Madrid transportation system, where traffic data from over 3,000 fixed monitoring locations is available. We analyze the historical data using k-means clustering and provide parameters for a complex event processing (CEP) engine, which uses them to infer complex events such as congestion or bad traffic in near real time. We also provide a proactive approach to intelligent traffic management by predicting traffic parameters using regression. The same integration approach is demonstrated on a second use-case scenario, smart energy management, which infers office occupancy state from electricity consumption.
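To make the clustering-to-CEP idea concrete, here is a minimal, stdlib-only Python sketch under stated assumptions. It is not the COSMOS implementation, which uses Spark MLlib on the Madrid data. The sketch assumes one-dimensional traffic intensity readings (vehicles/hour) from a single sensor and k = 3 clusters. Thresholds are taken as midpoints between sorted centroids. A toy rule ("N consecutive readings above the upper threshold") stands in for a real CEP engine. The function names, the midpoint rule and all sample values are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on 1-D data with deterministic, quantile-based initialisation."""
    data = sorted(values)
    # Start centroids at evenly spaced positions in the sorted data (hypothetical choice).
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            # Assign each reading to its nearest centroid.
            idx = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[idx].append(v)
        # Recompute centroids; keep the previous one if a cluster becomes empty.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break  # converged
        centroids = new
    return sorted(centroids)


def thresholds_from_centroids(centroids):
    """Hypothetical rule: cluster boundaries are midpoints between adjacent sorted centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def congestion_events(stream, threshold, window=3):
    """Toy stand-in for a CEP rule: emit the timestamp at which `window`
    consecutive readings have exceeded `threshold` (once per run)."""
    run = 0
    events = []
    for t, v in stream:
        run = run + 1 if v > threshold else 0
        if run == window:
            events.append(t)
    return events


# Made-up historical intensity readings (vehicles/hour) for one monitoring location.
history = [90, 100, 110, 120, 130, 480, 490, 500, 510, 520, 950, 980, 1000, 1020, 1050]
centroids = kmeans_1d(history, k=3)
thresholds = thresholds_from_centroids(centroids)

# Made-up live readings as (timestamp, intensity); the rule uses the upper threshold.
live = [(0, 400), (1, 800), (2, 820), (3, 790), (4, 900), (5, 300), (6, 760), (7, 770), (8, 780)]
events = congestion_events(live, thresholds[-1])
print(centroids, thresholds, events)
```

In this sketch, the expensive batch step (clustering the history) runs offline. It produces only a few numbers, so the streaming side can stay a cheap comparison per reading. That split between offline analysis and a lightweight near real-time rule is the general pattern the abstract describes. A real deployment would cluster several traffic features per location and feed the resulting parameters into an actual CEP engine.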


About Paula

Dr. Paula Ta-Shma is a Research Staff Member in the IBM Cloud Security & Analytics group. She holds M.Sc. and PhD degrees in computer science from the Hebrew University of Jerusalem. She is currently working on cloud storage infrastructure for the Internet of Things and leads IBM's efforts in the EU-funded COSMOS project. Previously, she led IBM research projects such as Continuous Data Protection. Dr. Ta-Shma also has expertise in database management systems, and before joining IBM she worked in this area at several companies, including Informix Software Inc.