Leveraging GPU-Accelerated Analytics on top of Apache Spark


There is growing interest in harnessing the parallelism of Graphics Processing Units (GPUs) to accelerate analytics workloads. GPUs have become the standard platform for many machine learning algorithms, particularly deep neural networks (DNNs), and are making increasing inroads into more traditional domains such as analytics databases and visual analytics. However, there is a strong need to couple these new platforms with Apache Spark, which has emerged as the de facto analytics platform for data scientists. In this talk, we discuss how we built a connector from Spark to the open source, GPU-powered MapD Analytics Platform, and the use cases it enables: pulling high-value data from Spark and caching it on the GPU for subsequent interactive visual analysis and machine learning. We will conclude with a brief demo of an end-to-end Spark-to-MapD pipeline.

About Todd

Todd is the CEO and founder of MapD Technologies. He built the original prototype of MapD while conducting his Harvard graduate research on the role of Twitter in the Arab Spring, after growing frustrated that conventional tools could not support interactive exploration of big datasets. He then joined MIT as a research fellow focusing on GPU databases before turning the MapD project into a startup.