While GraphX provides nice abstractions and dataflow optimizations for parallel graph processing on top of Spark, there are still many challenges in applying it in an Internet-scale production setting (e.g., graph algorithms and underlying frameworks must be optimized for billions of graph edges and thousands of iterations). In this talk, we will present our efforts in building real-world, large-scale graph analysis applications using GraphX for some of the largest organizations and websites in the world, including both algorithm-level and framework-level optimizations (e.g., minimizing graph state replication and optimizing long RDD lineages).
Jason (Jinquan) Dai is currently the Chief Architect of Big Data Technologies at Intel. Prior to that, he was a Principal Architect at Microsoft, responsible for building a large-scale Cloud and Big Data platform that powers some of the largest Internet services in the company. Before joining Microsoft, he was an Engineering Director and Principal Engineer at Intel, responsible for advanced research and development of Big Data platforms.