Spark Summit EU 2017 features a number of hands-on training workshops designed to help improve your Apache Spark skills. See below for the list of available courses.
Please note that training is offered as a standalone ticket. If you wish to attend any Spark Summit EU 2017 conference sessions or networking activities on October 25th or 26th, including the welcome reception in the Expo Hall, you must register for a Conference Only pass in addition to your Training pass.
All training courses will be held at the Convention Centre Dublin. Please visit the Venue page for more details, including hotel and travel discounts.
Training will take place from 9AM to 5PM on Tuesday, October 24th. The Training pass covers training and lunch on that day only.
Bring your own WiFi-enabled laptop with Google Chrome or Firefox installed.
This 1-day course is for data engineers, analysts, architects, DevOps engineers, and team leads interested in troubleshooting and optimizing Apache Spark applications. It covers troubleshooting techniques, performance tuning, best practices, and common anti-patterns to avoid when writing Spark applications and queries.
Each topic includes lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Inspired by tools like IPython/Jupyter, notebooks allow attendees to write jobs, data analysis queries, and visualizations against their own Spark cluster, accessed through a web browser. Students may keep the notebooks and continue to use them with the free Databricks Community Edition; all examples are guaranteed to run in that environment. Alternatively, each notebook can be exported as source code and run in any Spark environment.
After taking this class, students will:
The Data Science with Apache Spark workshop will show how to use Apache Spark to perform exploratory data analysis (EDA), develop machine learning pipelines, and use the APIs and algorithms available in the DataFrame-based Spark MLlib API. It is designed for software developers, data analysts, data engineers, and data scientists.
It will also cover parallelizing machine learning algorithms at a conceptual level. The workshop takes a pragmatic approach, focusing on using Apache Spark for data analysis and building models with MLlib while limiting the time spent on machine learning theory and the internal workings of Spark. We will, however, look at Spark's source code a couple of times.
We'll work through examples using public datasets that show how Apache Spark can help you iterate faster and develop models on massive datasets. This workshop will give you the tools to be productive with Spark on practical data analysis tasks and machine learning problems. You'll learn how to use familiar Python libraries with Spark's distributed, scalable engine. After completing this workshop, you should be comfortable using DataFrames, the DataFrame-based MLlib API, and the related documentation. These building blocks will enable you to use Apache Spark to solve a variety of data analysis and machine learning tasks.
Some experience coding in Python or Scala, a basic understanding of data science topics and terminology, and some experience using Spark are required. Familiarity with the concept of a DataFrame is helpful.
Each data science technique will be briefly reviewed at a conceptual level before it is used. Labs and demos will be available in both Python and Scala.
Instructor: Adam Breindel
This Deep Learning workshop introduces both the conceptual background and the implementation of key neural network architectures. We will see how and why deep learning has become such an important and popular technology, and how it compares with other machine learning models as well as with earlier attempts at neural networks.
In addition to covering famous use cases such as image recognition, language processing, and autonomous agents, we'll see how deep learning models can enhance your traditional business analytics. Most of our models will be built with the Keras library, but we'll also take a look "under the hood" at TensorFlow. We won't just hack demos, though: our goal is to develop an intuition for the key concepts and issues at play in deep learning.
The class will also feature a discussion of using Apache Spark for training and inference, along with other deployment and operational concerns. Along the way, we aim to explain enough ideas and terminology that you'll be comfortable going further with deep learning on your own!
Familiarity with the basics of Python and with common ideas and techniques in machine learning / predictive analytics is expected. You should be familiar with classification vs. regression problems, supervised vs. unsupervised learning, the bias-variance tradeoff, and common evaluation metrics like RMSE, precision, and recall.
No prior deep learning knowledge, vector calculus, or Spark experience is required.