SESSION

Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling

Since mid-2016, Spark-as-a-Service has been available to researchers in Sweden from the Rise SICS ICE Data Center at www.hops.site. In this session, Dowling will discuss the challenges in building multi-tenant Spark structured streaming applications on YARN that are metered and easy to debug. The platform, called Hopsworks, is an entirely UI-driven environment built with only open-source software. Learn how they use the ELK stack (Elasticsearch, Logstash and Kibana) for logging and debugging running Spark streaming applications; how they use Grafana and InfluxDB for monitoring Spark streaming applications; and, finally, how Apache Zeppelin can provide interactive visualizations and charts to end users.

This session will also show how Spark applications are run within a ‘project’ on a YARN cluster with the novel property that Spark applications are metered and charged to projects. Projects are securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users that are not members of the project. In addition, hear about the experiences of their users (over 150 users as of early 2017): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and the novel solutions for helping researchers debug and optimize Spark applications.

Session hashtag: #SFexp5

Jim Dowling, Associate Professor at KTH Royal Institute of Technology

About Jim

Jim Dowling is an Associate Professor at the School of Information and Communications Technology in the Department of Software and Computer Systems at KTH Royal Institute of Technology, as well as a Senior Researcher at SICS – Swedish ICT. He received his Ph.D. in Distributed Systems from Trinity College Dublin (2005) and worked at MySQL AB (2005-2007). His research interests are in the area of large-scale distributed computer systems. He is the lead architect of Hadoop Open Platform-as-a-Service (www.hops.io), a next-generation distribution of Hadoop for Humans.