Integrating Spark and Solr


As more organizations turn to Spark for big data analytics and machine learning, the need for seamless integration between Spark and Solr has grown. In this presentation, Timothy Potter covers how to populate Solr from a Spark streaming job as well as how to expose the results of any Solr query as an RDD. Attendees will come away with a solid understanding of common use cases, access to open source code, and performance metrics to help them develop their own large-scale search and discovery solution with Spark and Solr.


About Timothy

Timothy Potter is a senior member of the engineering team at Lucidworks and a committer on the Apache Solr project. Tim focuses on scalability and hardening the distributed features in Solr. Previously, Tim was an architect on the Big Data team at Dachis Group, where he worked on large-scale machine learning, text mining, and social network analysis problems using Hadoop, Cassandra, and Storm. Tim is the co-author of Solr in Action, a comprehensive guide to using Solr 4. He lives with his two Shiba Inus in the mountains outside Denver, CO.