
Taking Jupyter Notebooks and Apache Spark to the Next Level: PixieDust


PixieDust is a new open-source library that helps data scientists and developers working in Jupyter Notebooks and Apache Spark be more efficient. PixieDust speeds up data manipulation and display with features such as auto-visualization of Spark DataFrames, real-time monitoring of Spark job progress, automated local installation of Python and Scala kernels running with Spark, and much more.

Come along and learn how to use this tool in your own projects to visualize and explore data effortlessly, without writing any code. And if you prefer working in a Scala Notebook, this session is for you too: PixieDust can also run on a Scala kernel. Imagine being able to use your favorite Python charting engines from a Scala Notebook!

We’ll finish the session with a demo that combines Twitter, Watson Tone Analyzer, Spark Streaming, and some fun real-time visualizations, all running within a Notebook.

Session hashtag: #SFdev26

David Taieb, STSM at IBM

About David

David Taieb is the STSM (Senior Technical Staff Member) for the Watson Data Platform Developer Advocacy team at IBM, where he leads a team of avid technologists whose mission is to educate developers on the art of the possible with cloud technologies. He’s passionate about building open-source tools, such as the PixieDust Python library for Jupyter Notebooks and Apache Spark, that improve developer productivity and the overall developer experience. David enjoys sharing his experience by speaking at conferences and meeting as many people as possible.