Spark Summit 2014 brought the Apache Spark community together from June 30 to July 2, 2014, at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
R is a widely used statistical programming language, but its interactive use is typically limited to a single machine. To enable large-scale data analysis from R, we will present SparkR, an open-source R package developed at UC Berkeley that allows data scientists to analyze large data sets and interactively run jobs on them from the R shell. This talk will introduce SparkR, discuss some of its features and highlight the power of combining R's interactive console and extension packages with Spark's distributed runtime.
Shivaram Venkataraman is a third-year PhD student at the University of California, Berkeley, and works with Mike Franklin and Ion Stoica at the AMPLab. He is a committer on the Apache Spark project, and his research interests are in designing frameworks for large-scale machine learning algorithms. Before coming to Berkeley, he completed his M.S. at the University of Illinois at Urbana-Champaign and worked as a Software Engineer at Google.
Zongheng is an undergraduate student at UC Berkeley studying computer science and math. He is also a research assistant at AMPLab; previously he worked on SparkR.