Day 1 • Monday, June 5 • Training Day

7:00 AM

Registration

9:00 AM
12:00 PM

Lunch

1:00 PM
6:00 PM

Meetup

Join us for an evening Bay Area Apache Spark Meetup at the 10th Spark Summit featuring tech talks about using Apache Spark at scale from Pepperdata’s CTO Sean Suchter, RISELab’s Dan Crankshaw, and Databricks’ Spark committers…

 

Day 2 • Tuesday, June 6 • Developer Day

7:00 AM

Registration

9:05 AM

Expanding Apache Spark Use Cases in 2.2 and Beyond

2017 continues to be an exciting year for big data and Apache Spark. I will talk about two major initiatives that Databricks has been building: Structured Streaming, the new high-level API for stream processing, and…
9:45 AM

Snorkel: Dark Data and Machine Learning

Building applications that can read and analyze a wide variety of data may change the way we do science and make business decisions. However, building such applications is challenging: real-world data is expressed in…
10:00 AM

Unleashing Data Intelligence with Intel and Apache Spark

Organizations are developing deep learning applications to derive new insights, identify new opportunities and uncover new efficiencies. However, deep learning application development often means tapping into multiple frameworks, libraries, and clusters: a complex, time-consuming, and costly…
10:10 AM
10:30 AM

Break

11:00 AM
Research

Scaling Genetic Data Analysis with Apache Spark

Developer

A Deep Dive into Spark SQL's Catalyst Optimizer

Enterprise

Spark Compute as a Service at PayPal

Streaming

SSR: Structured Streaming on R for Machine Learning

Machine Learning

Challenging Web-Scale Graph Analytics with Apache Spark

11:40 AM
Research

Lazy Join Optimizations Without Upfront Statistics

Spark Ecosystem

Apache Kylin: Speed Up Cubing with Apache Spark

Spark Experience and Use Cases

Incremental Processing on Large Analytical Datasets

Streaming

Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling

Machine Learning

Needle in the Haystack—User Behavior Anomaly Detection for Information Security

12:20 PM
Spark Ecosystem

Building a Unified Data Pipeline with Apache Spark and XGBoost

Developer

Hive Bucketing in Apache Spark

Spark Experience and Use Cases

How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x

Technical Deep Dives

Ray: A Cluster Computing Engine for Reinforcement Learning Applications

Enterprise

How Apache Spark and AI Powers UberEats

Streaming

The Top Five Mistakes Made When Writing Streaming Applications

Machine Learning

Random Walks on Large Scale Graphs with Apache Spark

12:50 PM

Lunch

BoF Discussion-Deep Learning on Apache Spark

There is increasing interest in the community in running deep learning on the Apache Spark platform, along with a growing number of applications (e.g., BigDL, TensorFrames, Caffe/TensorFlow-on-Spark, etc.). In this BoF discussion, we would like to cover related topics such as…

BoF Discussion-Apache Spark on Kubernetes

Come learn about the community development project to add a native Kubernetes scheduling back-end to Apache Spark! Meet contributors and network with community members interested in running Spark on Kubernetes. Learn how to run Spark…

BoF Discussion-Scaling Spark to long-running and large workloads

Apache Spark usage is growing in both industry and academia as a performant and reliable framework for running long-running jobs and processing large amounts of data. Our discussion will center around the challenges of pushing the…

BoF Discussion-A roadmap for extending Apache Spark

Best-suited for current Spark contributors and Spark package creators, the conversation will focus on how the open-source community can help Spark grow outside of the Apache project, which has strict criteria about what is in…
2:00 PM
Spark Ecosystem

Building Data Product Based on Apache Spark at Airbnb

Spark Experience and Use Cases

Building a Versatile Analytics Pipeline on Top of Apache Spark

Technical Deep Dives

Cost-Based Optimizer in Apache Spark 2.2

2:40 PM
Research

Apache Spark on Supercomputers: A Tale of the Storage Hierarchy

Spark Ecosystem

Extending the R API for Spark with sparklyr and Microsoft R Server

Spark Experience and Use Cases

Best Practices for Using Alluxio with Apache Spark

Technical Deep Dives

Cost-Based Optimizer in Apache Spark 2.2 (continues)

Sponsored Sessions

Introducing Exactly Once Semantics in Apache Kafka

Sponsored Sessions

Make Spark Support 1 Trillion Dimensions Logistic Regression

Machine Learning

Fuzzy Matching on Apache Spark

3:20 PM
Spark Ecosystem

Apache Spark on Kubernetes

Developer

Tricks of the Trade to be an Apache Spark Rock Star

Spark Experience and Use Cases

Experiences Migrating Hive Workload to SparkSQL

Spark Ecosystem

Building Operational Data Lake using Spark and SequoiaDB

Sponsored Sessions

Structured Streaming for Columnar Data Warehouses

Sponsored Sessions

Analytics at Scale with Apache Spark on AWS

Enterprise

Transforming B2B Sales with Spark-Powered Sales Intelligence

Machine Learning

Assigning Responsibility for Deteriorations in Video Quality

3:50 PM

Break

4:20 PM
Spark Ecosystem

More Algorithms and Tools for Genomic Analysis on Apache Spark

Developer

Improving Python and Spark Performance and Interoperability with Apache Arrow

Spark Experience and Use Cases

Lessons Learned from Managing Thousands of Production Apache Spark Clusters Daily

Sponsored Sessions

Sponsored Sessions

Leveraging GPU-Accelerated Analytics on top of Apache Spark

Machine Learning

Multi-Label Graph Analysis and Computations Using GraphX

5:00 PM
Research

Speeding Up Spark with Data Compression on Xeon+FPGA

Developer

Building Robust ETL Pipelines with Apache Spark

Spark Experience and Use Cases

From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets

Sponsored Sessions

Enterprise

Applying Machine Learning to Construction

Machine Learning

Visualization of Enhanced Spark Induced Naive Bayes Classifier

5:40 PM
Spark Experience and Use Cases

Apache Spark and Citizen Science: Using eBird Data to Predict Bird Abundance at Scale

Technical Deep Dives

Sponsored Sessions

Enterprise

Rental Cars and Industrialized Learning to Rank

6:10 PM

Attendee Reception

Have fun mingling with other attendees over hors d’oeuvres and cocktails as you tour the Spark Summit Expo Hall.

 

Day 3 • Wednesday, June 7 • Enterprise Day

8:00 AM

Registration

9:00 AM
9:25 AM

Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time

In the last year, Hotels.com has begun its journey to becoming an algorithmic business. Matt will talk about their experiences of exponential growth in Data Science Algorithms whilst at the same time the team have…
9:40 AM

Machine Learning Innovation Fireside Chat

10:00 AM
10:10 AM

Cutting Edge Predictive Analytics

Apache Spark empowers predictive analytics and machine learning by increasing the reach and potential. But, before jumping to new deployments, it’s critical we 1) get the analytics right and 2) not overlook less conspicuous business…
10:30 AM

Break

11:00 AM
Spark Ecosystem

HDFS on Kubernetes—Lessons Learned

Spark Experience and Use Cases

OAP: Optimized Analytics Package for Spark Platform

Technical Deep Dives

Deep Dive Into Apache Spark Multi-User Performance

Sponsored Sessions

Transactional I/O on Cloud Storage in Databricks

Data Science

Yelp Ad Targeting at Scale with Apache Spark

Machine Learning

Embracing a Taxonomy of Types to Simplify Machine Learning

11:40 AM
Spark Ecosystem

Homologous Apache Spark Clusters Using Nomad

Developer

Productive Use of the Apache Spark Prompt

Sponsored Sessions

How to Run Spark Data Engineering Workloads in the Cloud

Data Science

Data Wrangling with PySpark for Data Scientists Who Know Pandas

12:20 PM
Research

Neuro-Symbolic AI for Sentiment Analysis

Spark Ecosystem

Interoperating a Zoo of Data Processing Platforms Using Rheem

Sponsored Sessions

Women in Big Data Lunch

Enterprise

Big Data at Audi: Root Cause Analysis in an Automotive Paint Shop Using MLlib

Data Science

Smart Scalable Feature Reduction With Random Forests

12:50 PM

Lunch

BoF-How to bring the pipeline built in Notebook / Apache Spark to production, and machine learning deployment cycles

Notebooks are a widely used tool for data scientists to analyze data, find insights, and build learning models. However, there is a gap in bringing a notebook into a production pipeline. How do we streamline…

BoF-Real-time / Low-latency Apache Spark

Let’s get together to discuss usage of Spark for real time / low latency scenarios. Share your experiences and let’s help each other learn!

BoF-Multi tenancy in Apache Spark

This session is an informal meeting about deploying Spark in a multi-user / multi-tenant environment. We will discuss the different deployment alternatives (standalone, Mesos, YARN, Cook, Databricks) with their pros and cons. If time allows…
2:00 PM
Data Science

Natural Language Processing with CNTK and Apache Spark

Developer

Improving Apache Spark with S3

Spark Experience and Use Cases

Tuning Apache Spark for Large-Scale Workloads

Technical Deep Dives

Sparklyr: Recap, Updates, and Use Cases

Enterprise

From Data to Actions and Insights at Conviva

Machine Learning

Building Competing Models Using Apache Spark DataFrames

2:40 PM
Data Science

ADMM-Based Scalable Machine Learning on Apache Spark

Developer

Demystifying DataFrame and Dataset

Spark Experience and Use Cases

Performance Optimization of Recommendation Training Pipeline at Netflix

Technical Deep Dives

Sparklyr: Recap, Updates, and Use Cases (continues)

Sponsored Sessions

GPU-Powered Deep Learning in the Spark Ecosystem

Sponsored Sessions

Remote Monitoring using Apache Spark

Enterprise

Changing the Way Viacom Looks at Video Performance

Machine Learning

Real-Time Image Recognition with Apache Spark

3:20 PM
Spark Ecosystem

Just-in-Time Analytics and the Need for Autonomous Database Administration

Technical Deep Dives

Sponsored Sessions

Virtualizing Apache Spark

Sponsored Sessions

Operationalizing Machine Learning at Scale

Data Science

Write Graph Algorithms Like a Boss

3:50 PM

Break

4:20 PM
Spark Ecosystem

Getting Ready to Use Redis with Apache Spark

Developer

A Developer’s View into Spark's Memory Model

Spark Experience and Use Cases

Why You Should Care about Data Layout in the Filesystem

Enterprise

Leveraging Spark to Democratize Data for Omni-Commerce

5:00 PM
Data Science

Creating Personalized Container Solutions with Azure Container Services

Spark Ecosystem

From R Script to Production Using rsparkling

Developer

Continuous Application with FAIR Scheduler

Spark Experience and Use Cases

RubiOne: Apache Spark as the Backbone of a Retail Analytics Development Environment

Sponsored Sessions

Enterprise

Stream All Things—Patterns of Modern Data Integration

5:40 PM
Spark Experience and Use Cases

The Smart Data Warehouse: Goal-Based Data Production

Technical Deep Dives

Sponsored Sessions

Machine Learning

Deep Learning with Apache Spark and GPUs

8:00 PM

JOIN Party

Come close out the 10th edition of Spark Summit at the JOIN attendee party. This rockin’ celebration includes drinks, games, DJs, dancing and a few fun surprises. In the coming weeks, we will announce even…

 

Live Stream Registration

Register to watch the Spark Summit keynotes for FREE via live web streaming.

The Spark Summit live stream will be active from 9:00 to 10:30 AM Pacific Time on both Tuesday, June 6 and Wednesday, June 7, 2017.
