SESSION

Jaws - Data Warehouse with Spark SQL


More and more companies have large amounts of structured data that needs to be verified, transformed, and analyzed. This requires a data warehouse built for large-scale advanced analytics. This talk is about one contribution we have made to the Spark ecosystem: Jaws, an open source data warehouse built on top of Spark SQL that enables users to analyze data efficiently and so supports business decisions. Focusing on performance, scalability and analytics, Jaws aims to deliver business value through the analysis of data. Jaws lets users submit queries concurrently and asynchronously on top of a managed Spark SQL context. One of the strengths of this data warehouse is its support for in-memory processing using Tachyon. During this presentation, we will go through the main features of Jaws, discuss the architectural decisions we made in building this highly scalable and resilient data warehouse, and cover how to fine-tune the Spark SQL context to obtain the best performance during data analysis.

Photo of Ema Orhian

About Ema

Passionate engineer at Atigeo, interested in scaling algorithms and implementing statistical models, I work on bringing big data analytics to healthcare apps. Main committer on GitHub for a highly scalable and resilient RESTful interface on top of a managed Spark SQL session (https://github.com/Atigeo/jaws-spark-sql-rest). I am a co-founder of a big data research group that focuses on the technical problems in the big data ecosystem and provides open source solutions to them: http://bigdataresearch.io. Actively involved in organizing Big Data Meetups, and a speaker at local and international big data conferences and meetups.