This session will start with a recap of what sparklyr is and how it can be used to analyze data, create visualizations, and perform machine learning in Spark from R. We will walk through installation, configuration, data wrangling with SQL or dplyr, modeling in MLlib or H2O, and extending sparklyr by calling Scala functions from R or by writing Scala modules that are accessible from R. You'll then get a detailed update on new sparklyr features. After sparklyr 0.4 was released to CRAN last year, RStudio released 0.5, which introduces new connections, features, and architectural changes worth reviewing. We will wrap up with a discussion of use cases relevant to the R ecosystem. These use cases will demonstrate how to model data with popular R frameworks while sparklyr provides seamless interaction between Spark and R.
Session hashtag: #SFdd8
Javier holds a double degree in Mathematics and Software Engineering and has decades of industry experience focused on data analysis. He currently works at RStudio and previously worked at Microsoft Research and SAP.