Organizations from small startups to large enterprises are rapidly adopting Apache Spark on Amazon EMR in Amazon Web Services (AWS) to run streaming analytics, data science, machine learning, and batch processing workloads. These customers can create big data architectures within minutes, decouple compute from storage by using Amazon S3 as a highly scalable, durable, and secure data lake, lower costs with Amazon EC2 Spot Instances and Auto Scaling, and use a wide range of encryption and access control features. In this session, we discuss how customers are using Spark on AWS and present common architectures for running performant Spark clusters at scale and at low cost with Amazon EMR.
Jonathan Fritz leads product management for Amazon EMR, a managed service in AWS that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data using the Apache Hadoop and Apache Spark ecosystem. He holds an MBA from the Stanford Graduate School of Business and a bachelor’s degree in chemistry with a minor in biology from Washington University in St. Louis. He received a certificate for accomplishment in entrepreneurship from the Skandalaris Center for Entrepreneurial Studies.