Spark makes just-in-time analytics possible by moving the data warehouse into the same environment that supports ETL and non-SQL analytics. This brings the benefits of elastic compute, schema-on-read, and Spark’s unified API for graph, streaming, and machine learning workloads. Even with this capability, however, challenges in interactivity, efficiency, and scalability remain. As just-in-time analytics becomes the norm, data scientists and engineers have had to take on the DBA’s capacity planning, configuration, and performance tuning roles as well.
The Algebraix Query Accelerator (AQA) for Spark shims the existing Spark DataFrames and SQL APIs so that it can unobtrusively build a model of how users’ queries relate to the data and to each other. The AQA uses this model to predict the characteristics of future queries and to deploy optimizations that speed them up. SQL queries and DataFrame programs are translated into SQL-DA, a data algebra representation, and stored together in a graph-like data structure called the algebraic cache, which serves as the core data structure of the model. The talk discusses an example use case: materializing views for common expression patterns.
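The idea of finding expression patterns shared across queries can be illustrated with a small sketch. This is not the AQA's implementation and does not use SQL-DA; the nested-tuple plan representation, the function names `subexpressions` and `materialization_candidates`, and the `min_queries` threshold are all hypothetical, chosen only to show how repeated subexpressions might be counted and flagged as candidates for materialized views.

```python
from collections import Counter

# Hypothetical toy model: a query plan is a nested tuple of the form
# (operator, argument_or_child, ...). Strings are arguments, tuples are
# child plans. This is an illustrative stand-in, not SQL-DA.


def subexpressions(plan):
    """Yield every operator subtree of a plan, including the plan itself."""
    if isinstance(plan, tuple):
        yield plan
        for child in plan[1:]:
            yield from subexpressions(child)


def materialization_candidates(plans, min_queries=2):
    """Return subtrees that occur in at least `min_queries` distinct plans."""
    counts = Counter()
    for plan in plans:
        # A set ensures a subtree repeated inside one plan is counted once.
        counts.update(set(subexpressions(plan)))
    return {expr: n for expr, n in counts.items() if n >= min_queries}


# Three example queries over an assumed "sales" table.
scan = ("scan", "sales")
recent = ("filter", "year = 2016", scan)
q1 = ("aggregate", "sum(amount) by region", recent)
q2 = ("aggregate", "count(*) by product", recent)
q3 = ("project", "region", scan)

candidates = materialization_candidates([q1, q2, q3])
for expr, n in sorted(candidates.items(), key=lambda item: -item[1]):
    print(n, expr)
```

In this toy example the filtered scan is shared by two queries, so it is a natural candidate to materialize once and reuse. A real inter-query optimizer would also have to weigh storage cost, data freshness, and predicted future use, none of which this sketch models.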
AQA for Spark is an artificially intelligent agent that helps data scientists and engineers focus on the analysis task by automating the performance tuning and resource management tasks of the DBA. The AQA functions as an inter-query optimizer that complements a traditional query optimizer (Catalyst) by creating additional speed-up, which is demonstrated with a benchmark analysis.
Session hashtag: #SFeco16
Kristian Alexander is the VP of Product Management at Algebraix Data, a software company specializing in database performance technologies using Data Algebra. Previously, Kristian worked in product innovation and software development roles and co-founded multiple development businesses. He graduated from the University of California with bachelor’s degrees in Chemistry and Political Science.
Wes Holler is Vice President of Engineering at Algebraix Data, where he works on the Algebraix Query Accelerator, an application for Spark that improves performance and reduces infrastructure costs. Wes previously co-architected SPARQL Server, a fast RDF store built on a data algebra foundation. Before Algebraix, he worked at Zebra Imaging, developing their autostereoscopic 3D display. Wes has a BS in Computer Science from St. Edward’s University.