Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren


While systems like Apache Spark have moved beyond a simple map-reduce model, many data scientists and scientific users still struggle with complex cluster management and configuration tools when they try to process data in the cloud. Recently, cloud providers have offered infrastructure such as AWS Lambda that runs event-driven, stateless functions as microservices. In this model, a function is deployed once and invoked whenever new inputs arrive, and the number of invocations scales elastically with input size. In this session, the speakers argue that microservices on serverless infrastructure are a viable platform for eliminating cluster management overhead and fulfilling the promise of elasticity in cloud computing for all users. Their key insight is that code can be injected dynamically into these stateless functions; combined with remote storage, this lets them build a data processing system that inherits the elasticity of the serverless model while providing the simplicity that end users require.
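The idea of injecting code into a generic stateless function and passing data through remote storage can be illustrated with a small local sketch. This is not PyWren's implementation or API: the `STORAGE` dict is a hypothetical stand-in for a remote object store, threads stand in for serverless invocations, and the names `stateless_worker` and `serverless_map` are invented for illustration. The sketch also assumes the user function has no closures or external dependencies.

```python
import builtins
import marshal
import pickle
import types
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for remote object storage (a plain in-memory dict).
STORAGE = {}


def stateless_worker(func_key, input_key, output_key):
    """Generic handler: deployed once, it knows nothing about user code.

    It fetches the serialized code and input from storage, runs the code,
    and writes the result back to storage. It keeps no state between calls.
    """
    code = marshal.loads(STORAGE[func_key])
    # Rebuild the function from its code object (assumes no closures).
    func = types.FunctionType(code, {"__builtins__": builtins})
    arg = pickle.loads(STORAGE[input_key])
    STORAGE[output_key] = pickle.dumps(func(arg))


def serverless_map(func, inputs, max_workers=8):
    """Upload code and inputs, invoke one worker per input, collect outputs."""
    inputs = list(inputs)
    STORAGE["func"] = marshal.dumps(func.__code__)
    for i, x in enumerate(inputs):
        STORAGE[f"in/{i}"] = pickle.dumps(x)
    # Threads stand in for independent serverless invocations.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        calls = [
            pool.submit(stateless_worker, "func", f"in/{i}", f"out/{i}")
            for i in range(len(inputs))
        ]
        for call in calls:
            call.result()  # re-raise any worker error
    return [pickle.loads(STORAGE[f"out/{i}"]) for i in range(len(inputs))]


def square_plus_one(x):
    return x * x + 1


results = serverless_map(square_plus_one, [1, 2, 3, 4])
print(results)
```

The worker never imports or references `square_plus_one` directly; it only sees bytes in storage, which is what allows a single deployed function to run arbitrary user code.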

Using PyWren, their implementation on AWS Lambda, they show that this model is general enough to efficiently implement a number of distributed computing models, such as bulk synchronous parallel (BSP). Learn about the scientific and machine learning applications that they have built with PyWren, and how this model could be used to develop a serverless Spark in the future.
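As a rough illustration of how a BSP computation can be layered on a stateless map, the sketch below computes a variance in two supersteps separated by a barrier. It is an assumption-laden toy, not the speakers' code: a thread pool stands in for serverless invocations, and `superstep`, `partial_sum`, and `partial_sq_dev` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def superstep(func, partitions, broadcast):
    """Run one BSP superstep: independent stateless tasks, then a barrier."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list(...) blocks until every task has finished; this is the barrier.
        return list(pool.map(lambda part: func(part, broadcast), partitions))


def partial_sum(part, _unused):
    """Local computation for superstep 1: per-partition sum and count."""
    return (sum(part), len(part))


def partial_sq_dev(part, mean):
    """Local computation for superstep 2: squared deviations from the mean."""
    return sum((x - mean) ** 2 for x in part)


partitions = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Superstep 1: each task reduces its own partition; the driver combines them.
partials = superstep(partial_sum, partitions, None)
count = sum(n for _, n in partials)
mean = sum(s for s, _ in partials) / count

# Superstep 2: the global mean is broadcast to every task.
sq_devs = superstep(partial_sq_dev, partitions, mean)
variance = sum(sq_devs) / count

print(mean, variance)
```

Each task depends only on its partition and the broadcast value, so tasks within a superstep can run as independent stateless invocations; communication happens only at the barrier.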

Session hashtag: #SFr3

Eric Jonas, Postdoc in Computer Science at UC Berkeley

About Eric

Eric Jonas is currently a postdoc in computer science at UC Berkeley working with Ben Recht on machine learning for scientific data acquisition. He earned his PhD in Computational Neuroscience, M. Eng in Electrical Engineering, BS in Electrical Engineering and Computer Science, and BS in Neurobiology, all from MIT. Prior to his return to academia, he was founder and CEO of Prior Knowledge, a predictive database company which was acquired in 2012 by, where he was Chief Predictive Scientist until 2014. In 2015 he was named one of the top rising stars in bioengineering by DARPA.

Shivaram Venkataraman, PhD Candidate at UC Berkeley

About Shivaram

Shivaram Venkataraman is a PhD candidate at the University of California, Berkeley, where he works with Mike Franklin and Ion Stoica. He is a committer on the Apache Spark project, and his research interests are in designing systems for large-scale machine learning. Before coming to Berkeley, he completed his M.S. at the University of Illinois at Urbana-Champaign and worked as a software engineer at Google.