Spark Summit 2014 brought the Apache Spark community together from June 30 to July 2, 2014, at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
Spark’s support for efficient execution and rapid interactive prototyping enables novel approaches to understanding data-rich domains that have historically been underserved by analytical techniques. One such field is endurance sports, where athletes are faced with GPS and elevation traces as well as samples from heart rate, cadence, temperature, and wattage sensors. These data streams can be somewhat comprehensible at any given moment, when looking at a small window of samples on one’s watch or cycle computer, but are overwhelming in the aggregate.
In this talk, I’ll present my recent efforts using Spark and MLlib to mine my personal cycling training data for deeper insights and to help me design workouts that meet particular fitness goals. This work incorporates analysis of geographic and time-series data, computational geometry, visualization, and domain knowledge of exercise physiology. I’ll show how Spark made this work possible, demonstrate some novel techniques for analyzing fitness data, and discuss how these approaches could be applied to make sense of data from an entire community of cyclists.
William Benton is a Senior Software Engineer at Red Hat, where he works in the Office of the CTO on distributed computing technologies; his recent efforts include integrating Apache Spark into the Fedora ecosystem. His professional expertise includes research and development in the areas of static program analysis, managed language runtimes, logic databases, cluster management, and music technology. Benton holds a PhD in computer sciences from the University of Wisconsin.