SparkR is a new and evolving interface to Apache Spark. It offers a wide range of APIs and capabilities to Data Scientists and Statisticians. Being a distributed system with a JVM core some R users find SparkR errors unfamiliar. In this talk we will show what goes on under the hood when you interact with SparkR. We will look at SparkR architecture, performance bottlenecks and API semantics. Equipped with those, we will show how some common errors can be eliminated. I will use debugging examples based on our experience with real SparkR use cases. #SFds18
Hossein Falaki is a software engineer and data scientist at Databricks, working on the next big thing. Prior to that he was a data scientist at Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).