SESSION

Applying SparkSQL to Big Spatio-Temporal Data Using GeoMesa

GeoMesa is an open-source toolkit for processing and analyzing spatio-temporal data, such as IoT and sensor-produced observations, at scale. It provides a consistent API for querying and analyzing data on top of distributed databases (e.g. HBase, Accumulo, Bigtable, Cassandra) and messaging systems (e.g. Kafka), supporting both batch analysis of historical archives and low-latency processing of streaming data.

GeoMesa integrates deeply with Spark SQL. It adds spatial types (e.g. Point, LineString, Polygon), spatial predicates (e.g. st_contains, st_intersects), and geometry processing functions (e.g. st_buffer, st_convexHull) to Spark SQL. It also optimizes queries that use these extensions by integrating with the Catalyst SQL optimizer: it intercepts SQL statements that contain spatial predicates and provisions RDDs based on the underlying spatial index.
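To make the optimization idea concrete, here is a minimal plain-Python sketch of index-driven pruning for a spatial predicate. It is not GeoMesa code and does not use Spark. The fixed-size grid, the sample points, and every function name (`cell_of`, `build_index`, `query`, and so on) are hypothetical stand-ins chosen for illustration, and the grid is a deliberately simplified substitute for whatever spatial index the backing store maintains. The point it tries to show is the two-step pattern: use the index to select only the partitions that can match a bounding-box predicate, then apply the exact predicate to the surviving rows.

```python
from collections import defaultdict

# Hypothetical sample observations as (id, lon, lat); not data from the talk.
POINTS = [
    ("a", -77.5, 38.0),
    ("b", -77.1, 38.9),
    ("c", -122.4, 37.8),
    ("d", -78.5, 38.03),
    ("e", -0.1, 51.5),
    ("f", -75.2, 39.9),
]

CELL_DEG = 10.0  # grid cell size in degrees; an arbitrary choice for this sketch


def cell_of(lon, lat):
    """Map a coordinate to the integer grid cell that contains it."""
    return (int(lon // CELL_DEG), int(lat // CELL_DEG))


def build_index(points):
    """Group points by grid cell, standing in for index-ordered storage."""
    index = defaultdict(list)
    for pid, lon, lat in points:
        index[cell_of(lon, lat)].append((pid, lon, lat))
    return index


def bbox_contains(bbox, lon, lat):
    """Exact predicate: is the point inside (xmin, ymin, xmax, ymax)?"""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= lon <= xmax and ymin <= lat <= ymax


def cells_for_bbox(bbox):
    """List every grid cell that the bounding box overlaps."""
    xmin, ymin, xmax, ymax = bbox
    cx0, cy0 = cell_of(xmin, ymin)
    cx1, cy1 = cell_of(xmax, ymax)
    return [(cx, cy)
            for cx in range(cx0, cx1 + 1)
            for cy in range(cy0, cy1 + 1)]


def query(index, bbox):
    """Scan only the overlapping cells, then refine with the exact predicate.

    Returns the sorted matching ids and the number of rows actually scanned.
    """
    scanned = 0
    hits = []
    for cell in cells_for_bbox(bbox):
        for pid, lon, lat in index.get(cell, []):
            scanned += 1
            if bbox_contains(bbox, lon, lat):
                hits.append(pid)
    return sorted(hits), scanned


if __name__ == "__main__":
    index = build_index(POINTS)
    hits, scanned = query(index, (-80.0, 37.0, -77.0, 39.0))
    print(hits, "matched after scanning", scanned, "of", len(POINTS), "rows")
```

In this sketch the query touches a single grid cell, so the rows in other cells are never read, and the exact predicate still rejects the one row that shares the cell but lies outside the box. A production optimizer would push the same kind of filter down to the data store rather than loop in application code.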

This session will describe the implementation of the GeoMesa Spark SQL integration, illustrate its application in production systems, and demonstrate spatial aggregations and analytics using map-based visualizations.

Session hashtag: #SFeco17

Anthony Fox, Director of Data Science at CCRi

About Anthony

Anthony Fox is Director of Data Science and System Architecture at CCRi, a Virginia firm focused on advanced analytics. He is a founder of GeoMesa, an open-source toolkit for analysis and processing of big spatio-temporal data. He has over 15 years of experience developing distributed systems and 10 years of experience developing scalable analytics.