SPARK SUMMIT 2015: DATA SCIENCE AND ENGINEERING AT SCALE

The Spark Summit is an event that brings the Apache Spark community together. Spark Summit 2015 ran from Monday, June 15 through Wednesday, June 17, 2015 at The Hilton Union Square in San Francisco. Attendees heard from leading production users of Spark, SparkSQL, Spark Streaming and related projects, found out where project development is headed, and learned how to use the Spark stack in a variety of applications.

If you have questions, or would like information on sponsoring a Spark Summit, please contact organizers@spark-summit.org.

Day 1 • Monday, June 15 • Conference

7:00 AM

Registration

9:00 AM

Spark Community Update

Apache Spark continues to grow quickly, with new features including data frames, R support, and machine learning pipelines added in the past few releases. We’re also seeing fast diversification of the user community, with exciting… (35 minutes)
9:35 AM

Powering Data Science with Spark

Data professionals are finding out that there are many challenges in achieving scale and managing complexity in their data journey. At Databricks, we have a singular focus of making big data simple. Spark made a… (30 minutes)
10:05 AM

Break – Sponsored by MapR

10:35 AM

Spark & Hadoop at Production Scale

How are leading companies deploying Spark with Hadoop in production? What lessons have they learned, and what key factors should you consider to put your innovative Spark-based app to work faster? Hear real-life customer examples… (10 minutes)
10:45 AM

Accelerating Innovation with Spark

Companies today want to infuse more intelligence into applications for faster and better decisions. With IBM’s commitment to Spark, we will push the boundaries of technology innovation in order to accelerate analytics in business. (10 minutes)
10:55 AM

Hadoop and Spark – Perfect Together

Unlocking a fully integrated Spark experience within your enterprise Hadoop environment that is manageable, secure and deployable anywhere. (10 minutes)
11:05 AM

A Tale of a Data-Driven Culture

It was the best of times, it was the worst of times. Every company aspires to build a data-driven culture into its DNA, but where to start? In this talk, we’ll discuss various components to fostering… (15 minutes)
11:20 AM

Spark at NASA/JPL

11:40 AM

Software Above the Level of a Single Device: The Implications

It’s easy to talk about “the Internet of Things” and to miss the bigger pattern: we are no longer just building software for individual devices, but creating networks of intelligence and action that make it… (20 minutes)
12:00 PM

Lunch

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
DATA SCIENCE TRACK
DEVELOPER TRACK
APPLICATIONS TRACK
USE CASES TRACK
1:00 PM

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models

(30 minutes)
2:30 PM

Break – Sponsored by Cask

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
DATA SCIENCE TRACK
DEVELOPER TRACK
USE CASES TRACK
APPLICATIONS TRACK
3:30 PM

Integrating Spark and Solr

(30 minutes)
4:00 PM

Healthcare Predictive Analytics within the OR

(15 minutes)
4:15 PM

Break – Sponsored by Platfora

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
DATA SCIENCE TRACK
DEVELOPER TRACK
APPLICATIONS TRACK
USE CASES TRACK
5:00 PM

Flyby: Improved Dense Matrix Multiplication

(30 minutes)
5:30 PM

Spark on Mesos – A Deep Dive

(30 minutes)
6:00 PM

Spark Committer Meetup

Join us on-site for a Bay Area Spark Users Meetup. (2.5 hours)
7:00 PM

Attendee Networking Event @ 620 Jones Street

Join us on Monday night to connect with attendees, mingle with presenters, and share what you learned during the day all while enjoying some cocktails and appetizers on us. Head over to 620 Jones Street… (4 hours)

 

Day 2 • Tuesday, June 16 • Conference

7:00 AM

Registration

9:20 AM

Perspectives on Big Data & Analytics

The intelligence community is home to some of the most challenging and critical analytic issues of our time. In seeking to tackle these, one of our biggest areas of focus has been on tapping into… (20 minutes)
9:40 AM

Spark in the Hadoop Ecosystem

Apache Spark is one of the most exciting projects in the big data ecosystem today. The community, contributor base and collection of users are driving innovation internally, and exciting new use cases externally, around the… (10 minutes)
9:50 AM

Fireside Chat

Ben Lorica will lead a fireside chat with Ben Horowitz, co-founder of Andreessen Horowitz (15 minutes)
10:05 AM

Break – Sponsored by ActionML

10:35 AM

Data Driven - Toyota Customer 360 Insights on Apache Spark and MLlib

Producing highly accurate predictive models in social data mining can be a challenge. Feature engineering using traditional methodologies can only take you so far. Finding that needle in a haystack requires creative thinking, large time… (15 minutes)
10:50 AM

Field Notes from Expeditions in the Cloud

Matt Wood will discuss some observations, themes and stories gathered from providing Spark to customers in the cloud. We’ll discuss real-world expeditions companies have made with Spark on the AWS cloud, along with some… (15 minutes)
11:05 AM

How Spark Fits into Baidu's Scale

Over the last decade, Baidu has built a very large-scale distributed computing infrastructure that empowers all of its core businesses, ranging from search ads to mobile offerings, to serve 500+ million users worldwide. In early last… (15 minutes)
11:20 AM

Accelerating Apache Spark-based Analytics on Intel Architecture

To find new trends and strong patterns in large, complex data sets, a strong analytics foundation is needed. Intel is working closely with Databricks, AMPLab, the Spark community and its ecosystem to advance these analytics capabilities… (10 minutes)
11:30 AM

Fireside Chat

Arsalan Tavakoli will lead a fireside chat with Sanjay Krishnamurthi from Informatica (10 minutes)
11:40 AM

Fireside Chat

Arsalan Tavakoli will lead a fireside chat with George Mathew of Alteryx (10 minutes)
11:50 AM

Fireside Chat

Arsalan Tavakoli will lead a fireside chat with Justin Langseth of Zoomdata (10 minutes)
12:00 PM

Lunch

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
DATA SCIENCE TRACK
DEVELOPER TRACK
BUSINESS TRACK
APPLICATIONS TRACK
USE CASES TRACK
1:00 PM

Better Visibility into Spark Execution for Faster Application Development

(30 minutes)
2:30 PM

Break – Sponsored by ActionML

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
DATA SCIENCE TRACK
DEVELOPER TRACK
USE CASES TRACK
3:30 PM

iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark

  • Joohyun Kim (MyFitnessPal, Under Armour–Connected Fitness)
(30 minutes)
4:15 PM

Break

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
DATA SCIENCE TRACK
DEVELOPER TRACK
USE CASES TRACK
4:30 PM

SparkR: The Past, the Present and the Future

(30 minutes)
5:30 PM

Reception Preparation

(30 minutes)
6:00 PM

Attendee Reception

8:00 PM

End of Day 2

 

Day 3 • Wednesday, June 17 • Conference

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
9:00 AM

TRAINING: Intro to Apache Spark

(3 hours)
12:00 PM

Lunch

Grand Ballroom A
Grand Ballroom B
Imperial Ballroom (Level 2)
1:00 PM

TRAINING CONTINUES: Intro to Apache Spark

(5 hours)
6:00 PM

End of Day 3