Spark Summit 2013 brought the Apache Spark community together on December 2-3, 2013 at the Hotel Nikko in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
A variety of problems in digital marketing can be modeled as Markov decision processes and solved by dynamic programming with the goal of calculating the policy that maximizes the expected discounted reward. Algorithms, such as policy iteration, require a state transition and a reward model, which can be estimated based on a given data set. In this paper, we compare the execution times for estimating the transition function in a map-reduce fashion if the data set becomes large in terms of the number of records and features. Therefore, we create different-sized Spark and Hadoop clusters in the Amazon cloud computing environment.
The in-memory clustering system Spark is outperforming Hadoop and runs up to 71% faster. Furthermore, we study the execution times of policy iteration running on Spark clusters and show the execution time reduction gained by increasing the number of instances in the cluster.