The Key to Machine Learning is Prepping the Right Data

Machine learning has its challenges, and understanding the algorithms is not always easy. In this session, you’ll discover methods to make these challenges less daunting.

This session is intended for software engineers who need to understand the requirements and constraints of data scientists, and for data scientists who need to implement, or help implement, production systems. It will begin with a quick introduction to data quality and a level-set on common vocabulary. You’ll then explore the data formats that Spark ML requires to run its algorithms, and see how to automate building data in those formats through user-defined functions and other techniques. This automation makes results easier to reproduce, minimizes errors, and increases the efficiency of data scientists.
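To give a feel for the kind of preparation involved, here is a minimal sketch in plain Java, with no Spark dependency. It illustrates the shape most Spark ML estimators work with by default: a numeric label plus a numeric feature vector. The class and method names (`FeaturePrepSketch`, `LabeledRow`, `parseOrDefault`, `toLabeledRow`) and the fallback value of 0.0 are assumptions made for illustration and are not taken from the session. In a real Spark job, similar logic would typically be registered as a user-defined function or combined with a transformer such as `VectorAssembler`.

```java
import java.util.Arrays;

/** Plain-Java illustration only; every name here is hypothetical. */
public class FeaturePrepSketch {

    /** One training row: a numeric label plus a numeric feature vector. */
    public static final class LabeledRow {
        public final double label;
        public final double[] features;

        public LabeledRow(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * UDF-style cleaning rule: parse a raw text field as a double and
     * return the fallback when the field is missing, malformed,
     * NaN, or infinite.
     */
    public static double parseOrDefault(String raw, double fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            double value = Double.parseDouble(raw.trim());
            return (Double.isNaN(value) || Double.isInfinite(value)) ? fallback : value;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /**
     * Turns raw text fields into a label and a feature vector.
     * Unusable values become 0.0, an assumption of this sketch;
     * a real pipeline might drop or flag such rows instead.
     */
    public static LabeledRow toLabeledRow(String rawLabel, String... rawFeatures) {
        double[] features = new double[rawFeatures.length];
        for (int i = 0; i < rawFeatures.length; i++) {
            features[i] = parseOrDefault(rawFeatures[i], 0.0);
        }
        return new LabeledRow(parseOrDefault(rawLabel, 0.0), features);
    }

    public static void main(String[] args) {
        LabeledRow row = toLabeledRow("1", " 3 ", "n/a", "120.5");
        // prints: 1.0 -> [3.0, 0.0, 120.5]
        System.out.println(row.label + " -> " + Arrays.toString(row.features));
    }
}
```

Keeping the cleaning rule in one small function is what makes the preparation repeatable: the same rule is applied to every row, rather than being redone by hand for each dataset.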

Key takeaways will include:
– How to build the required tool set in Java
– Which formats Spark ML requires (a new vocabulary)
– The fundamentals of data quality, and how to make sure the data is usable

Session hashtag: #SFml10

Jean Georges Perrin, Software Architect at Zaloni

About Jean

Jean Georges “JGP” Perrin is a Software Architect for Zaloni. He is proud to have been the first in France to be named an IBM Champion, and to have been awarded the honor for the ninth consecutive year. Active within the Raleigh-Durham Spark community, JGP shares his more than 20 years of experience in IT as a presenter and participant at conferences and by publishing articles in print and online media. He also maintains a blog. When he is not immersed in IT, which he loves, he enjoys exploring his adopted region of North Carolina with his kids.