Identifying graph isomorphisms is one of the most powerful graph techniques and has a wide variety of applications. In this presentation, you’ll see how to find simple graph isomorphisms in GraphX, and how the exciting new GraphFrames library from AMPLab (intended for inclusion in Spark 2.x) lets you use SQL and a subset of Cypher, the query language from Neo4j, to find more complex graph isomorphisms. Applications covered include fraud detection and finding missing data in Wikipedia using the YAGO3 data set, which is a form of graph mining. Because GraphFrames is so new, the presentation also includes a brief overview of the library, its performance gains over GraphX due to Catalyst and Tungsten, and how to use it to query graphs using SQL and the Cypher subset.
Michael Malak is the lead author of Spark GraphX in Action and has been developing Spark solutions at two Fortune 200 companies since early 2013. He has been programming computers since before they could be bought pre-assembled in stores.