RubiOne: Apache Spark as the Backbone of a Retail Analytics Development Environment

The retail industry has a long history of fierce competition driving innovations in marketing and operational efficiency; however, this rapid advancement has not always kept pace with the latest technology. This is evident in the abundance of business analysts at large enterprise retailers who are often constrained more by their own IT departments than by any lack of expertise or shortage of problems to solve.

RubiOne was designed as a vertically integrated big data analytics development environment for retail business analysts and data scientists, with Apache Spark as the cornerstone of the product. It allows retailers to make data-driven decisions that go beyond what traditional analytics tools such as SQL and Excel can support. With Apache Spark as one of its primary tools for querying data and performing analytics, RubiOne seamlessly handles concerns such as package installation, computational resources, and scalability.

In this session, you will learn how Apache Spark can serve as a shared backbone for an entire suite of enterprise services, such as credential management, continuous integration, ad-hoc interactive data exploration, and task automation, while still meeting strict enterprise requirements around security, availability, and cost. Learn from our war stories and best practices around transparently scaling Apache Spark clusters with Kubernetes, managing service and user isolation, and monitoring that is accurate enough for both debugging and billing. Beyond the technical aspects, we'll also share our experiences of working with a global enterprise retailer to drive adoption of a modern big data technology stack centered around Apache Spark.
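The abstract mentions transparently scaling Spark clusters with Kubernetes and managing user isolation. As a hedged sketch (not necessarily RubiOne's actual setup), a `spark-submit` against a Kubernetes master might look like the following, using a per-team Kubernetes namespace for isolation and Spark's dynamic allocation for elastic scaling. The API server URL, container image, namespace, and job path are all placeholders.

```shell
# Illustrative sketch only: submitting a Spark job to a Kubernetes-managed
# cluster. The API server URL, image, namespace, and job path are placeholders.
spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --name retail-analytics-job \
  --conf spark.kubernetes.namespace=analytics-team-a \
  --conf spark.kubernetes.container.image=example/spark:3.5.0 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  local:///opt/spark/jobs/retail_etl.py
```

Isolating each team in its own Kubernetes namespace also makes per-team resource quotas and cost attribution straightforward, which aligns with the monitoring-for-billing theme above.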

Session hashtag: #SFeco20

Adrian Petrescu, Chief Engineer at Rubikloud

About Adrian

Adrian is Chief Engineer at Rubikloud, where he has been solving data problems for retail since the early days of the company. His interests are mainly in cloud architecture, designing scalable solutions for large elastic workloads.