Graph-parallel algorithms such as PageRank operate on an entire graph at once. Efficient distributed implementations of these algorithms are important at scale. This session will introduce the two main abstractions for these types of algorithms: Pregel and PowerGraph.
Explore how GraphX combines the best of both abstractions and walk through multiple example algorithms. Note: Familiarity with Apache Spark and basic graph concepts is expected.
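To give a feel for the vertex-centric ("think like a vertex") style that Pregel popularized, here is a minimal single-machine sketch of PageRank written as a series of supersteps. It is not GraphX code and not taken from the session; the function name `pregel_pagerank`, its parameters, and the toy graph are all hypothetical and chosen only for illustration. It also assumes every vertex has at least one outgoing edge, since rank held by dangling vertices is simply dropped here.

```python
def pregel_pagerank(out_edges, num_supersteps=20, damping=0.85):
    """Simulate Pregel-style PageRank on one machine.

    out_edges maps each vertex to a list of the vertices it links to.
    (Hypothetical helper for illustration; not a GraphX or Spark API.)
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)
    n = len(vertices)

    # Every vertex starts with an equal share of rank.
    ranks = {v: 1.0 / n for v in vertices}

    for _ in range(num_supersteps):
        # Message phase: each vertex splits its rank evenly across out-edges.
        inbox = {v: 0.0 for v in vertices}
        for src, targets in out_edges.items():
            if targets:
                share = ranks[src] / len(targets)
                for dst in targets:
                    inbox[dst] += share

        # Update phase: each vertex combines its incoming messages.
        # Assumption: no dangling vertices, so total rank stays at 1.
        ranks = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}

    return ranks


# Toy graph: a links to b and c, b links to c, c links back to a.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_ranks = pregel_pagerank(toy_graph)
```

In a distributed system the message and update phases run in parallel across partitions of the graph, with a synchronization barrier between supersteps; the loop above only mimics that structure sequentially.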
Session hashtag: #SFds16
Dr. Andrew Ray is a Principal Data Engineer at Silicon Valley Data Science. He enjoys working at the intersection of engineering and data science. Andrew is an active contributor to the Apache Spark project. In his past life Andrew was a Data Scientist at Walmart, where he built an analytics platform on Hadoop that integrated data from multiple retail channels using fuzzy matching and graph algorithms. Andrew also led the adoption of Spark at Walmart from proof-of-concept to production. Andrew earned his Ph.D. in Mathematics from the University of Nebraska, where he worked on extremal graph theory.