This case study concerns moving large amounts of patent data from Cassandra to Solr: how we approached the problem, how we introduced Spark as a solution, and how we optimized the Spark job. I will cover:
* Understanding the parts of a Spark job: which components run where, and common issues.
* Adding metrics to show where pain points are in your code.
* Comparing various methods in the API to achieve more performant code.
* How we saved time and made a repeatable process with Spark.
High performance drives Christopher Bradford. He has worked across various industries including the federal government, higher education, social news syndication, low-latency HD video delivery and usability research. Mr. Bradford combines application engineering principles and systems administration experience to design and implement performant systems. He has architected applications and systems to create highly available, fault-tolerant services in myriad environments.