Day 1 • Monday, June 5 • Training Day

7:00 AM

Registration

9:00 AM
12:00 PM

Lunch

1:00 PM
6:00 PM

Meetup

Join us for an evening Bay Area Apache Spark Meetup at the 10th Spark Summit featuring tech-talks about using Apache Spark at scale from Pepperdata’s CTO Sean Suchter, RISELab’s Dan Crankshaw, and Databricks’ Spark committers…

Day 2 • Tuesday, June 6 • Developer Day

7:00 AM

Registration

9:05 AM

Expanding Apache Spark Use Cases in 2.2 and Beyond

2017 continues to be an exciting year for big data and Apache Spark. I will talk about two major initiatives that Databricks has been building: Structured Streaming, the new high-level API for stream processing, and…
9:45 AM

Snorkel: Dark Data and Machine Learning

Building applications that can read and analyze a wide variety of data may change the way we do science and make business decisions. However, building such applications is challenging: real-world data is expressed in…
10:00 AM

Unleashing Data Intelligence with Intel and Apache Spark

Organizations are developing deep learning applications to derive new insights, identify new opportunities and uncover new efficiencies. However, deep learning application development often means tapping into multiple frameworks, libraries, and clusters—a complex, time-consuming, and costly…
10:10 AM
10:30 AM

Break

11:00 AM
Research

Scaling Genetic Data Analysis with Apache Spark

Developer

A Deep Dive into Spark SQL's Catalyst Optimizer

Technical Deep Dives

Data Science Deep Dive: Spark ML with High Dimensional Labels

Sponsored Sessions

BigDL: Bringing Ease of Use of Deep Learning for Apache Spark

Enterprise

Spark Compute as a Service at PayPal

Streaming

SSR: Structured Streaming on R for Machine Learning

Machine Learning

Challenging Web-Scale Graph Analytics with Apache Spark

11:40 AM
Research

Lazy Join Optimizations Without Upfront Statistics

Spark Ecosystem

Apache Kylin: Speed Up Cubing with Apache Spark

Spark Experience and Use Cases

Incremental Processing on Large Analytical Datasets

Sponsored Sessions

Connect Code to Resource Consumption to Scale Your Production Spark Applications

Enterprise

Using SparkML to Power a DSaaS (Data Science as a Service)

Streaming

Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling

Machine Learning

Needle in the Haystack—User Behavior Anomaly Detection for Information Security

12:20 PM
Spark Ecosystem

Building a Unified Data Pipeline with Apache Spark and XGBoost

Developer

Hive Bucketing in Apache Spark

Spark Experience and Use Cases

How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x

Technical Deep Dives

Ray: A Cluster Computing Engine for Reinforcement Learning Applications

Enterprise

How Apache Spark and AI Powers UberEats

Streaming

The Top Five Mistakes Made When Writing Streaming Applications

Machine Learning

Random Walks on Large Scale Graphs with Apache Spark

12:50 PM

Lunch

2:00 PM
2:40 PM
Research

Apache Spark on Supercomputers: A Tale of the Storage Hierarchy

Spark Ecosystem

Extending the R API for Spark with sparklyr and Microsoft R Server

Spark Experience and Use Cases

Best Practices for Using Alluxio with Apache Spark

Technical Deep Dives

Cost-Based Optimizer in Apache Spark 2.2 (continues)

Sponsored Sessions

Introducing Exactly Once Semantics in Apache Kafka

Sponsored Sessions

Make Spark Support 1 Trillion Dimensions Logistic Regression

Enterprise

Scaling Data Science Capabilities with Apache Spark at Stitch Fix

Machine Learning

Fuzzy Matching on Apache Spark

3:20 PM
Spark Ecosystem

Apache Spark on Kubernetes

Developer

Tricks of the Trade to be an Apache Spark Rock Star

Spark Experience and Use Cases

Experiences Migrating Hive Workload to SparkSQL

Spark Ecosystem

Building Operational Data Lake using Spark and SequoiaDB

Sponsored Sessions

Structured Streaming for Columnar Data Warehouses

Sponsored Sessions

Analytics at Scale with Apache Spark on AWS

Enterprise

Transforming B2B Sales with Spark-Powered Sales Intelligence

Machine Learning

Assigning Responsibility for Deteriorations in Video Quality

3:50 PM

Break

4:20 PM
Spark Ecosystem

More Algorithms and Tools for Genomic Analysis on Apache Spark

Developer

Improving Python and Spark Performance and Interoperability with Apache Arrow

Spark Experience and Use Cases

Lessons Learned from Managing Thousands of Production Apache Spark Clusters Daily

Sponsored Sessions

Sponsored Sessions

Leveraging GPU-Accelerated Analytics on top of Apache Spark

Enterprise

GoDaddy Customer Success Dashboard Using Apache Spark

Streaming

Dynamic DDL: Adding Structure to Streaming Data on the Fly

Machine Learning

Multi-Label Graph Analysis and Computations Using GraphX

5:00 PM
Research

Speeding Up Spark with Data Compression on Xeon+FPGA

Spark Ecosystem

Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spark SQL

Developer

Building Robust ETL Pipelines with Apache Spark

Spark Experience and Use Cases

From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets

Sponsored Sessions

Enterprise

Applying Machine Learning to Construction

Machine Learning

Visualization of Enhanced Spark Induced Naive Bayes Classifier

5:40 PM
Spark Ecosystem

Building a Large Scale Recommendation Engine with Spark and Redis-ML

Spark Experience and Use Cases

Apache Spark and Citizen Science: Using eBird Data to Predict Bird Abundance at Scale

Technical Deep Dives

Sponsored Sessions

Enterprise

Rental Cars and Industrialized Learning to Rank

Streaming

Scalable Monitoring Using Apache Spark and Friends

Machine Learning

The Key to Machine Learning is Prepping the Right Data

6:10 PM

Attendee Reception

Have fun mingling with other attendees over hors d’oeuvres and cocktails as you tour the Spark Summit Expo Hall.

Day 3 • Wednesday, June 7 • Enterprise Day

8:00 AM

Registration

9:00 AM
9:25 AM

Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth in Data Science Whilst Migrating to Spark+Cloud all at the Same Time

In the last year Hotels.com has begun its journey to becoming an algorithmic business. Matt will talk about their experiences of exponential growth in Data Science Algorithms whilst at the same time the team have…
9:40 AM

Machine Learning Innovation Fireside Chat

10:00 AM
10:10 AM

Cutting Edge Predictive Analytics

Apache Spark empowers predictive analytics and machine learning by increasing their reach and potential. But before jumping to new deployments, it’s critical we 1) get the analytics right and 2) not overlook less conspicuous business…
10:30 AM

Break

11:00 AM
Spark Ecosystem

HDFS on Kubernetes—Lessons Learned

Developer

Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop

Spark Experience and Use Cases

Spinach: Providing Ad-Hoc Query Support on Top of Spark SQL

Technical Deep Dives

Deep Dive Into Apache Spark Multi-User Performance

Sponsored Sessions

Transactional I/O on Cloud Storage in Databricks

Enterprise

Archiving, E-Discovery, and Supervision with Spark and Hadoop

Data Science

Yelp Ad Targeting at Scale with Apache Spark

Machine Learning

Embracing a Taxonomy of Types to Simplify Machine Learning

11:40 AM
Spark Ecosystem

Homologous Apache Spark Clusters Using Nomad

Developer

Productive Use of the Apache Spark Prompt

Spark Experience and Use Cases

Social Media, Spark, Machine Learning, and Data Visualization to Find Patterns and Insight

Technical Deep Dives

Deep Dive Into Apache Spark Multi-User Performance (continues)

Sponsored Sessions

How to Run Spark Data Engineering Workloads in the Cloud

Data Science

Data Wrangling with PySpark for Data Scientists Who Know Pandas

12:20 PM
Research

Neuro-Symbolic AI for Sentiment Analysis

Spark Ecosystem

Interoperating a Zoo of Data Processing Platforms Using Rheem

Spark Experience and Use Cases

Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for Forensics, Fraud, and Finance

Technical Deep Dives

From Pipelines to Refineries: Building Complex Data Applications with Apache Spark

Sponsored Sessions

Women in Big Data Lunch

Enterprise

Big Data at Audi: Root Cause Analysis in an Automotive Paint Shop Using MLlib

Data Science

Smart Scalable Feature Reduction With Random Forests

Machine Learning

Large-Scale Ads CTR Prediction with Spark and Deep Learning: Lessons Learned

12:50 PM

Lunch

2:00 PM
Data Science

Natural Language Processing with CNTK and Apache Spark

Developer

Improving Apache Spark with S3

Spark Experience and Use Cases

Tuning Apache Spark for Large-Scale Workloads

Technical Deep Dives

Sparklyr: Recap, Updates, and Use Cases

Sponsored Sessions

Data Science and Deep Learning on Spark with 1/10th of the Code

Enterprise

From Data to Actions and Insights at Conviva

Data Science

Fully-Reproducible ML Deployment with Spark, Pachyderm, and MLeap

Machine Learning

Building Competing Models Using Apache Spark DataFrames

2:40 PM
Data Science

ADMM-Based Scalable Machine Learning on Apache Spark

Spark Ecosystem

Applying SparkSQL to Big Spatio-Temporal Data Using GeoMesa

Developer

Demystifying DataFrame and Dataset

Spark Experience and Use Cases

Performance Optimization of Recommendation Training Pipeline at Netflix

Technical Deep Dives

Sparklyr: Recap, Updates, and Use Cases (continues)

Sponsored Sessions

GPU-Powered Deep Learning in the Spark Ecosystem

Sponsored Sessions

Remote Monitoring using Apache Spark

Enterprise

Changing the Way Viacom Looks at Video Performance

Machine Learning

Real-Time Image Recognition with Apache Spark

3:20 PM
Spark Ecosystem

Just-in-Time Analytics and the Need for Autonomous Database Administration

Developer

Apache Spark and Apache Ignite: Where Fast Data Meets the IoT

Spark Experience and Use Cases

Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Codeless Modeling

Technical Deep Dives

Sponsored Sessions

Virtualizing Apache Spark

Sponsored Sessions

Operationalizing Machine Learning at Scale

Enterprise

Leveraging Apache Spark to Disrupt Airline Pricing Distribution

Data Science

Write Graph Algorithms Like a Boss

3:50 PM

Break

4:20 PM
Data Science

Apache SparkR Under the Hood: How to Debug your SparkR Applications

Spark Ecosystem

Getting Ready to Use Redis with Apache Spark

Developer

A Developer’s View into Spark's Memory Model

Spark Experience and Use Cases

Why You Should Care about Data Layout in the Filesystem

Technical Deep Dives

Real-Time Machine Learning with Redis, Apache Spark, TensorFlow, and more

Sponsored Sessions

Powering Predictive Mapping at Scale with Spark, Kafka, and Elasticsearch

Enterprise

Leveraging Spark to Democratize Data for Omni-Commerce

Data Science

Using AI for Providing Insights and Recommendations on Activity Data

Machine Learning

Deep Learning in Security—Are We Ready?

5:00 PM
Data Science

Creating Personalized Container Solutions with Azure Container Services

Spark Ecosystem

From R Script to Production Using rsparkling

Developer

Continuous Application with FAIR Scheduler

Spark Experience and Use Cases

RubiOne: Apache Spark as the Backbone of a Retail Analytics Development Environment

Sponsored Sessions

Enterprise

Stream All Things—Patterns of Modern Data Integration

Data Science

NLP with MLlib: Global Empire-Building for Fun and Profit

Machine Learning

Deep Learning to Big Data Analytics on Apache Spark Using BigDL

5:40 PM
Spark Experience and Use Cases

The Smart Data Warehouse: Goal-Based Data Production

Technical Deep Dives

Sponsored Sessions

Enterprise

Case Study: Analytic Insights in Retail Using Apache Spark

Machine Learning

Deep Learning with Apache Spark and GPUs

8:00 PM

JOIN Party

Come close out the 10th edition of Spark Summit at the JOIN attendee party. This rockin’ celebration includes drinks, games, DJs, dancing and a few fun surprises. In the coming weeks, we will announce even…