The Spark community has extensive experience using Spark for offline batch analysis across a broad range of use cases. But building an interactive web application that aims for sub-second response times with Spark as the computation backend is still somewhat unexplored territory. We at Lynx Analytics wandered into this territory when we built Kite, our big graph analysis tool. The tool lets users interactively explore graphs with hundreds of millions of vertices and billions of edges. Exploration includes global and local views of the graph, with visualization of attributes, connections, and distributions. This talk covers the technical challenges, both general and domain-specific, that we faced while building this software, and our solutions to them. We will discuss problems such as scheduler delay, GC pauses, and interoperability with other Akka-based libraries, and solutions such as sorted RDDs, prefix sampling, and column-based attribute representation.
Daniel Darabos is a Software Engineer at Lynx Analytics R&D, where he has spent the past year building the graph analytics system described in this talk. Before that, Daniel worked on ads serving and machine learning as a Site Reliability Engineer at Google.