Apache Beam is an open source programming model and set of tools that help you create batch and streaming data-parallel processing pipelines. These pipelines can be written with the Java or Python SDKs and run on one of the many Apache Beam pipeline runners, including the Apache Spark runner. This talk will provide an overview and demo of creating pipelines in Apache Beam and executing those pipelines on Apache Spark.
I love data because it surrounds us: everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While each is awesome on its own, I am passionate about combining the power of open source software with the potentially unlimited uses of data. That’s why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I’ve previously spent time hanging out at Disney and Amazon. Beyond Google, I love data, amateur radio, Disneyland, photography, running, and Legos.