Cassandra and Spark: Optimizing for Data Locality


Only three things matter when doing analytics on a distributed database: locality, locality, and locality. Learn how the Cassandra-Spark connector builds RDDs and optimizes them for interacting with local Cassandra machines. We'll go in depth into how Cassandra stores data in a cluster and the steps the open source connector takes when reading data from and writing data to Cassandra. Discover the Cassandra-specific RDD functions that let you take advantage of underlying Cassandra mechanisms and perform lightning-fast analytics on the world's most scalable OLTP database. You will learn to apply these strategies in your own applications and make sure you are getting the most out of your cluster resources.


About Russell

After earning his Ph.D. in bioinformatics from UCSF, Russell Spitzer took his love of big data to DataStax. There, as a test engineer, he has worked on all aspects of integrating Cassandra with other Apache technologies such as Spark, Hadoop, and Solr. He currently works as a developer, helping to build better analytics tools for Apache Cassandra.