
 

Day 1 • Tuesday, February 7 • Spark Training

7:00

Registration

9:00
12:00

Lunch

1:00 PM
6:00 PM

 

Day 2 • Wednesday, February 8 • Developer Day

7:00

Registration

9:00

What to Expect for Big Data and Apache Spark in 2017

Big data remains a rapidly evolving field with new applications and infrastructure appearing every year. In this talk, I’ll cover new trends in 2016 / 2017 and how Apache Spark is moving to meet them.…
9:20

Using Apache Spark for Intelligent Services

Salesforce is developing Einstein, an artificial intelligence (AI) capability built into the core of the Salesforce Platform. Einstein helps power the world’s smartest CRM to deliver advanced AI capabilities to sales, services, and…
9:40

Production-Ready Structured Streaming

In Spark 2.0, we introduced Structured Streaming, which allows users to continually and incrementally update their view of the world as new data arrives, while still using the same familiar Spark SQL abstractions. I talk…
9:55

Scaling Genetic Data Analysis with Apache Spark

In 2001, it cost ~$100M to sequence a single human genome. In 2014, due to dramatic improvements in sequencing technology far outpacing Moore’s law, we entered the era of the $1,000 genome. At the same…
10:15

RISELab: Enabling Intelligent Real-Time Decisions

A long-standing grand challenge in computing is to enable machines to act autonomously and intelligently: to rapidly and repeatedly take appropriate actions based on information in the world around them. To address this challenge, at…
10:30

Break

11:00
Spark Ecosystem

New Directions in pySpark for Time Series Analysis

Developer

Processing Terabyte-Scale Genomics Datasets with ADAM

Spark Experience and Use Cases

Going Real-Time: Creating Frequently-Updating Datasets for Personalization

Data Science

Netflix's Recommendation ML Pipeline Using Apache Spark

Research

Drizzle—Low Latency Execution for Apache Spark

11:40
Spark Ecosystem

Time Series Analytics with Spark

Spark Experience and Use Cases

Spark for Behavioral Analytics Research

Research

Time-evolving Graph Processing on Commodity Clusters

12:20
Spark Ecosystem

Lessons Learned from Dockerizing Spark Workloads

Developer

Cost-Based Optimizer Framework for Spark SQL

Spark Experience and Use Cases

Spark as the Gateway Drug to Typed Functional Programming

Data Science

Feature Hashing for Scalable Machine Learning

Sponsored Sessions

Women In Big Data Lunch

12:50 PM

Lunch

2:00 PM
Developer

Optimizing Apache Spark SQL Joins

Spark Experience and Use Cases

Exploring Spark for Scalable Metagenomics Analysis

Data Science

Tuning and Monitoring Deep Learning on Apache Spark

Sponsored Sessions

Women In Big Data Lunch

2:40 PM
Developer

The Joy of Nested Types with Spark

Spark Experience and Use Cases

Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn

Sponsored Sessions

Spark SQL: Another 16x Faster After Tungsten

Sponsored Sessions

Cornami Accelerates Performance on SPARK

3:20 PM
Spark Ecosystem

Building a Dataset Search Engine with Spark and Elasticsearch

Spark Experience and Use Cases

Experiences with Spark's RDD APIs for Complex, Custom Applications

Data Science

Global Empire-Building for Fun and Profit

Sponsored Sessions

A New “Sparkitecture” for Modernizing your Data Warehouse

Sponsored Sessions

Analytics at the Real-Time Speed of Business

3:50 PM

Break

4:20 PM
Developer

What No One Tells You About Writing a Streaming App

Spark Experience and Use Cases

Problem Solving Recipes Learned from Supporting Spark

Sponsored Sessions

Building Deep Learning Powered Big Data

Sponsored Sessions

Delivering Insights from 5PB of Product Logs at Pure Storage

5:00 PM
Spark Ecosystem

Apache Toree: A Jupyter Kernel for Spark

Spark Experience and Use Cases

Migrating from Redshift to Spark at Stitch Fix

Sponsored Sessions

Building the Ideal Stack for Real-Time Analytics

Sponsored Sessions

Compress Software Development Cycles with supercomputer-based Spark

Research

Analysis Andromeda Galaxy Data Using Spark

5:40 PM
Spark Ecosystem

Secured (Kerberos-based) Spark Notebook for Data Science

Developer

Spark and Object Stores —What You Need to Know

Spark Experience and Use Cases

Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics

Data Science

BigDL: A Distributed Deep Learning Library on Spark

Research

Bringing HPC Algorithms to Big Data Platforms

6:10 PM

Attendee Reception

8:00 PM

End of Day

 

Day 3 • Thursday, February 9 • Enterprise Day

8:00

Registration

9:00

Virtualizing Analytics with Apache Spark

In the race to invent multi-million dollar business opportunities with exclusive insights, data scientists and engineers are hampered by a multitude of challenges just to make one use case a reality – the need to…
9:20

Big Data Meets Learning Science

How do we learn and how can we learn better? Educational technology is undergoing a revolution fueled by learning science and data science. The promise is to make a high-quality personalized education accessible and affordable…
9:30

Accelerating Machine Learning and Deep Learning At Scale...With Apache Spark

Deep learning is a fast-growing subset of machine learning. There is an emerging trend to conduct deep learning in the same cluster along with existing data processing pipelines to support feature engineering and traditional…
9:40

Artificial Intelligence: How Enterprises Can Crush It With Apache Spark

Artificial intelligence (AI) is not new. It emerged as a computer science discipline in the 1950s and has been a persistent theme in science fiction. What is new is that enterprises now have the prerequisites…
10:00

Data Science Transformation Via Apache Spark on Hybrid Cloud

Most enterprises run their business on legacy on-premises environments. Just picking up and moving everything to the cloud isn’t an option for the vast majority. Cloud migration requires a critical mass of data,…
10:10

Apache Spark in Cloud and Hybrid: Why Security and Governance Become More Important

An increasing number of Apache Spark deployments are in cloud and hybrid environments. This often means that Spark workloads are ephemeral but the data exists in durable storage, either in the cloud or on-prem. The…
10:20

Break

11:00
Spark Ecosystem

Auto Scaling Systems With Elastic Spark Streaming

Developer

Exceptions are the Norm: Dealing with Bad Actors in ETL

Spark Experience and Use Cases

Learnings Using Spark Streaming and DataFrames for Walmart Search

Data Science

Scalable Data Science with SparkR

Research

Sparkler—Crawler on Apache Spark

11:40
Developer

Improving Python and Spark Performance and Interoperability

Spark Experience and Use Cases

Spark-Streaming-as-a-Service with Kafka and YARN

Research

12:20
Spark Ecosystem

Kerberizing Spark

Spark Experience and Use Cases

Sparking Up Data Engineering

Enterprise

Unlocking Value in Device Data Using Spark

Research

Large-Scale Text Processing Pipeline with Spark ML and GraphFrames

12:50 PM

Lunch

2:00 PM
Spark Ecosystem

The Fast Path to Building Operational Applications with Spark

Developer

Robust and Scalable ETL Over Cloud Storage with Spark

Spark Experience and Use Cases

Lambda Processing for Near Real Time Search Indexing at WalmartLabs

Data Science

Apache Spark for Machine Learning with High Dimensional Labels

Enterprise

Modeling Catastrophic Events in Spark

2:40 PM
Spark Ecosystem

Building Real-Time BI Systems with Kafka, Spark, and Kudu

Developer

Spark and Online Analytics

Spark Experience and Use Cases

Fault Tolerance in Spark: Lessons Learned from Production

Data Science

Scaling Apache Spark MLlib to Billions of Parameters

Enterprise

R&D to Product Pipeline Using Apache Spark in AdTech

Research

Algorithms and Tools for Genomic Analysis on Spark

3:20 PM
Developer

Spark + Parquet In Depth

Spark Experience and Use Cases

Spark Autotuning

Enterprise

FIS: Accelerating Digital Intelligence in FinTech

3:50 PM

Break

4:20 PM
Spark Ecosystem

Effective Spark with Alluxio

Data Science

GoDaddy Small Business Success Index Using Apache Spark

Enterprise

Distributed Real-Time Stream Processing: Why and How

Research

Neural Network That Learns From a Huge Graph

5:00 PM
5:40 PM
Developer

SparkSQL: A Compiler from Queries to RDDs

Spark Experience and Use Cases

Keeping Spark on Track: Productionizing Spark for ETL

Data Science

Parallelizing Existing R Packages with SparkR

Enterprise

 

Can’t make it to Spark Summit?

Register to watch Spark Summit East 2017 for FREE via live web streaming.

The Spark Summit live stream will be active from 9:00 AM to 6:00 PM Eastern Time on Wednesday, February 8 and Thursday, February 9, 2017.
