GraphFrames: Graph Queries In Spark SQL

Graph analysis is important in domains including commerce, social networks, and medicine. Graph analysis comes in two forms: pattern matching to find subgraphs of interest, and graph algorithms such as PageRank and triangle counting. GraphX and similar systems have made it possible to run graph algorithms within relational systems like Spark, but until recently, pattern queries required moving data manually to a specialized graph database. GraphFrames is a new effort to integrate pattern matching and graph algorithms with Spark SQL, simplifying the graph analytics pipeline and enabling optimizations across graph and relational queries. A key component of GraphFrames is our graph-aware query planner, which can speed up queries by an order of magnitude. We will describe the GraphFrame API, its query planning algorithm, and the latest performance results.
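
To make the two forms of graph analysis concrete, here is a minimal plain-Python sketch. It is not GraphFrames code and does not use its API. It assumes the graph is stored as a table of (src, dst) edge rows, expresses a simple pattern query as a self-join on that table, and runs one graph algorithm (triangle counting) over the same rows. The edge data and the function names `find_mutual_pairs` and `count_triangles` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical edge table: (src, dst) rows, as a relational system might store them.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "d")]


def find_mutual_pairs(edges):
    """Pattern query (x)->(y); (y)->(x), expressed as a self-join on the edge table."""
    by_src = defaultdict(set)
    for src, dst in edges:
        by_src[src].add(dst)
    # Join edge (x, y) with edge (y, x); keep x < y so each pair is reported once.
    return sorted((x, y) for x, y in edges if x < y and x in by_src[y])


def count_triangles(edges):
    """Graph algorithm: count triangles, ignoring edge direction."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    total = 0
    for u in nbrs:
        for v in nbrs[u]:
            if u < v:
                # Only common neighbors w > v are counted, so each triangle
                # u < v < w contributes exactly once.
                total += sum(1 for w in nbrs[u] & nbrs[v] if w > v)
    return total


mutual_pairs = find_mutual_pairs(edges)
triangles = count_triangles(edges)
```

Both functions read the same edge table, which is the point of the sketch: when patterns and algorithms share one relational representation, no data has to be moved to a separate graph database between the two steps.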

Ankur Dave, Graduate Student at AMPLab, UC Berkeley

About Ankur

Ankur is a third-year PhD student advised by Ion Stoica in the UC Berkeley AMPLab. He’s a Spark committer and a maintainer for GraphX.