Spark Summit 2013 brought the Apache Spark community together on December 2-3, 2013 at the Hotel Nikko in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
The behavior of an animal reflects computations across its entire nervous system, involving the coordinated activity of thousands of neurons within and across multiple brain areas. New technologies for imaging the nervous system allow us to monitor neural function at unprecedented scales. But the data sets are quickly outpacing the capabilities of ordinary analytical approaches. They are large (one terabyte or more per hour), complex, and high-dimensional, and we want to understand their structure as it evolves over both space and time.
The Spark cluster computing system is ideally suited to this problem because it enables both interactive exploratory analysis of large data and the use of algorithms that require iterative computations. We have used Spark to build a library for neural data analysis and interactive visualization. We apply these analyses to data from zebrafish and mice, which perform complex sensory-motor tasks while we monitor their neural responses using microscopy and genetically encoded calcium indicators.
Our analyses, including regression and dimensionality reduction, yield computational maps of the brain that describe how neural activity relates both to external factors (the stimulus or the behavioral state) and to intrinsic low-dimensional structure.
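As a rough illustration of how a regression analysis can produce such a map, the sketch below fits a simple linear model to each voxel's time series independently and records the slope and the fraction of variance explained. It is a minimal single-machine sketch in plain Python, not the library described in this talk; the function name `regression_map`, the coordinate keys, and the toy data are all assumptions made for the example.

```python
from statistics import mean

def regression_map(voxels, stimulus):
    """Fit response ~ slope * stimulus + intercept separately for each voxel.

    voxels:   dict mapping a voxel coordinate to its time series (list of floats)
    stimulus: list of floats with the same length as each time series
    Returns a dict mapping each coordinate to (slope, r_squared).
    """
    s_mean = mean(stimulus)
    s_dev = [s - s_mean for s in stimulus]
    s_var = sum(d * d for d in s_dev)  # assumes the stimulus is not constant

    result = {}
    for coord, series in voxels.items():
        y_mean = mean(series)
        y_dev = [y - y_mean for y in series]
        cov = sum(sd * yd for sd, yd in zip(s_dev, y_dev))
        y_var = sum(d * d for d in y_dev)
        slope = cov / s_var
        # A flat time series has no variance to explain, so report 0.
        r_squared = (cov * cov) / (s_var * y_var) if y_var > 0 else 0.0
        result[coord] = (slope, r_squared)
    return result

# Toy data (hypothetical): one voxel follows the stimulus, one does not.
stimulus = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
voxels = {
    (0, 0): [1.0, 3.0, 1.0, 3.0, 1.0, 3.0],
    (0, 1): [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
}
brain_map = regression_map(voxels, stimulus)
```

Because each voxel is fit independently, the loop body is the kind of per-record work that could be mapped over a distributed collection of (coordinate, time series) records on a cluster; the resulting per-voxel values can then be displayed as an image of the brain.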
We are also using Spark Streaming, alongside visualization in D3.js, for real-time analysis, in which patterns of neural activity are extracted online during an experiment and used to guide targeted perturbations of neurons and assess the behavioral consequences. These approaches, enabled by Spark, show how large-scale data analysis can uncover behaviorally relevant neural computation at the whole-brain level.
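To give a sense of what extracting a pattern online involves, the sketch below keeps a running mean of each neuron's activity and updates it as small batches of measurements arrive, without storing past data. It is plain Python and only mimics the batch-at-a-time style of a streaming system; the class name `RunningMean`, the neuron identifiers, and the values are hypothetical, and this is not the analysis used in the experiments described here.

```python
class RunningMean:
    """Per-neuron running mean, updated incrementally one batch at a time."""

    def __init__(self):
        self.count = {}  # neuron id -> number of samples seen so far
        self.mean = {}   # neuron id -> current mean activity

    def update(self, batch):
        """batch: iterable of (neuron_id, value) pairs from one time window."""
        for neuron_id, value in batch:
            n = self.count.get(neuron_id, 0) + 1
            m = self.mean.get(neuron_id, 0.0)
            self.count[neuron_id] = n
            # Incremental mean update: no earlier samples need to be kept.
            self.mean[neuron_id] = m + (value - m) / n
        return self.mean

tracker = RunningMean()
tracker.update([("n1", 1.0), ("n2", 4.0)])  # first time window
tracker.update([("n1", 3.0), ("n2", 4.0)])  # second time window
```

The state carried between batches is small (one count and one mean per neuron), which is what makes this style of update practical while an experiment is still running.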