Apache Spark-and-Tensorflow-as-a-Service

In Sweden, at the RISE ICE Data Center, we provide researchers with both Spark-as-a-Service and, more recently, TensorFlow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which TensorFlow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with TensorFlow, from TensorFrames to TensorFlowOnSpark to Databricks’ Deep Learning Pipelines. We introduce the programming models they support and highlight the importance of cluster support for managing different versions of Python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including Mesos and YARN (in Hops Hadoop). Finally, we will perform a live demonstration of training and inference for a TensorFlowOnSpark application written in Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network on TensorFlow. We will show how to debug the application using both the Spark UI and TensorBoard, and how to examine logs and monitor training.

Session hashtag: #EUai8

Jim Dowling, Associate Professor at KTH Royal Institute of Technology

About Jim

Jim Dowling is a native of Dublin (Ireland) and an Associate Professor at the School of Information and Communications Technology in the Department of Software and Computer Systems at KTH Royal Institute of Technology, a Senior Researcher at SICS RISE, and CEO of Logical Clocks AB. He received his Ph.D. in Distributed Systems from Trinity College Dublin (2005) and worked at MySQL AB (2005-2007). He is a distributed systems researcher whose research interests are in high-performance, large-scale distributed computer systems. He is lead architect of Hops Hadoop, the world’s most scalable Hadoop distribution.