SESSION

Building Data Product Based on Apache Spark at Airbnb


Building data products requires a Lambda Architecture to bridge batch and streaming processing. AirStream is a framework built on top of Apache Spark that lets users easily build data products at Airbnb. It has shown that Spark is impactful and useful in production for mission-critical data products.

On the streaming side, hear how AirStream integrates Spark Streaming with multiple ecosystems, such as HBase, Elasticsearch, MySQL, DynamoDB, Memcache, and Redis. On the batch side, learn how to apply the same computation logic in Spark over large data sets from Hive and S3. The speakers will also walk through a few production use cases and share best practices for managing Spark jobs in production.
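The abstract does not describe AirStream's API, so the following is only a hypothetical, plain-Python illustration of the Lambda Architecture idea it mentions: one computation function shared by a batch path (full history, e.g. from Hive or S3) and a streaming path (micro-batches), with the two results merged for serving. Spark is deliberately left out, and every name here (`compute`, `batch_view`, `streaming_view`, `serving_view`) as well as the event schema are assumptions made for the sketch.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical event schema: (listing_id, event_type). Not an AirStream schema.
Event = Tuple[str, str]


def compute(events: Iterable[Event]) -> Counter:
    """Shared computation logic: count 'booking' events per listing."""
    return Counter(listing for listing, kind in events if kind == "booking")


def batch_view(history: List[Event]) -> Counter:
    # Batch path: recompute over the full historical data set.
    return compute(history)


def streaming_view(micro_batches: Iterable[List[Event]]) -> Counter:
    # Streaming path: apply the same logic to each micro-batch, merge incrementally.
    state: Counter = Counter()
    for batch in micro_batches:
        state.update(compute(batch))
    return state


def serving_view(batch: Counter, realtime: Counter) -> Dict[str, int]:
    # Serving step: combine batch results with recent streaming results.
    return dict(batch + realtime)


history = [("a", "booking"), ("b", "view"), ("a", "booking")]
recent = [[("b", "booking")], [("a", "booking"), ("c", "view")]]
print(serving_view(batch_view(history), streaming_view(recent)))
```

Because both paths call the same `compute` function, feeding the full history through the streaming path as a single micro-batch yields the same counts as the batch path, which is the consistency property a shared-logic design is meant to preserve.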

Session hashtag: #SFeco5

Jingwei Lu, Member of the Data Infrastructure Team at Airbnb

About Jingwei

Jingwei Lu is currently a member of the Data Infrastructure team at Airbnb. He was previously a tech lead on Facebook's data infrastructure team, in charge of query processing and language for the Bumblebee project (a Hive/Hadoop replacement). Before Facebook, he redesigned the runtime of SCOPE (Microsoft's equivalent of Hive) at Microsoft. He also spent 10 years on the Microsoft SQL Server engine team building a commercial relational database engine.

Liyin Tang, Software Engineer at Airbnb

About Liyin

Liyin Tang is a software engineer on the Data Infrastructure team at Airbnb. Before Airbnb, he worked at Facebook and Dropbox. He focuses on building highly available and reliable storage services and helping those services scale in the face of exponential data growth.

Mr. Tang joined the HBase PMC in 2013 and has also contributed to other Apache projects, including HDFS and Hive. He is currently building a streaming infrastructure to power real-time data products at Airbnb.

He holds a master’s degree in computer science from University of Southern California.