Closing the Loop: Interactive Analysis and Visualization with Spark


One of Spark’s most compelling features is its support for interactive analytics. For complex data sets in particular, combining analytics with interactive visualization makes exploration richer, faster, and more tactile. This is particularly relevant in scientific exploration, where any given data set requires many views and many approaches. This talk describes a framework that uses Spark alongside the open-source visualization server Lightning to both process and visualize data interactively. Workflows can incorporate a variety of Spark libraries, such as Spark Streaming for visualizing streaming machine learning algorithms and GraphX for displaying graph analyses. The results of user interaction within a visualization can feed back into Spark analytics immediately, even while data is streaming. All of it can be driven by clients in either Python or Scala. Neuroscientists are using Spark and Lightning side by side to analyze large-scale recordings from mouse and zebrafish brains, and the same combination promises utility in a wide

About Matthew

Matthew Conlen is a software engineer and information designer in New York. He is a partner at the New York Data Company, and works as senior developer for Rhizome and as a computational journalist at FiveThirtyEight. Matthew collaborates with researchers from HHMI Janelia on the open-source Lightning data visualization server. He graduated from the University of Michigan with degrees in computer science and applied mathematics.