From R Script to Production Using rsparkling


The rsparkling R package is an extension package for sparklyr (an R interface for Apache Spark) that creates an R front-end for the Sparkling Water Spark package from H2O. It gives R users access to H2O’s high-performance, distributed machine learning algorithms on Spark. The main purpose of the package is to act as a connector between sparklyr and H2O’s machine learning algorithms.

In this session, Gill will introduce the basic architectures of rsparkling, H2O Sparkling Water and sparklyr, and go over how these frameworks work together to form a cohesive machine learning framework. In addition, you’ll learn about various approaches to using rsparkling in production. The session will conclude with a live demo of rsparkling that walks through an end-to-end use case of data ingestion, munging and machine learning.

Session hashtag: #SFdev15

Navdeep Gill, Hacker Scientist

About Navdeep

Navdeep is a Hacker Scientist. He graduated from California State University, East Bay with an M.S. in Computational Statistics, a B.S. in Statistics, and a B.A. in Psychology (minor in Mathematics). During his education he developed interests in machine learning, time series analysis, statistical computing, data mining, and data visualization.

Previously, he worked at a couple of startups and at Cisco Systems, Inc., focusing on data science, software development, and marketing research. Before that, he was a consultant at FICO, working with small to mid-level banks in the U.S. and South America on risk management.