Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applications

Uber operates in 279 cities across 53 countries. In the US, we cover 64% of the population. Every month, we create 50,000 driver jobs. Every day, we facilitate more than 1 million trips globally. Uber has more than ten engineering, business, and data science teams that run many kinds of data applications with widely varying performance characteristics and SLAs. In this talk, we will present how we design and build a multi-tenant architecture with Spark as a platform while rapidly scaling our cluster to match our business growth. We will discuss how we use Spark, Spark SQL, Spark Streaming, MLlib, and IPython Notebook with Spark to build data applications efficiently in this shared environment. We will show why Spark makes this a lot easier, and we will share lessons from this ongoing journey.

Photo of Kelvin Chu

About Kelvin

Kelvin is a founding member of the Data Platform team at Uber, where he creates services and tools on top of Spark to support multi-tenancy and large-scale data applications. At Ooyala, he was a co-creator of Spark Job Server, an open source RESTful server for submitting, running, and managing Spark jobs, jars, and contexts. He implemented real-time video analytics engines on top of it using datacube materializations via RDDs. Before Ooyala, Kelvin was a startup engineer at Jobvite, building an enterprise SaaS business. Kelvin holds a Master’s degree in Computer Science from the University of Maryland at College Park.