San Francisco
June 30 - July 2, 2014

Spark Summit 2014 brought the Apache Spark community together on June 30 - July 2, 2014 at the Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.


Spark Summit 2014
xPatterns on Spark, Shark, Tachyon and Mesos
Claudiu Barbura (Atigeo)

xPatterns is a big data analytics platform as a service that enables rapid development of enterprise-grade analytical applications. It provides tools, API sets and a management console for building an ELT pipeline with data monitoring and quality gates; a data warehouse for ad-hoc and scheduled querying, analysis, model building and experimentation; tools for exporting data to NoSQL and SolrCloud, feeding real-time access through low-latency/high-throughput APIs; and dashboard and visualization APIs/tools that leverage the available data and models. We will showcase the entire lifecycle of one of the xPatterns applications built for our largest production customer (20 billion medical, pharmacy and lab data records, amounting to 200 TB of compressed HDFS data) while evolving our infrastructure from Hadoop and Hive to Spark, Shark, Tachyon and Mesos. We will provide detailed ELT pipeline stats with lessons learned (Hadoop vs Spark, Hive vs Shark vs Shark w/ Tachyon), tips & tricks for fine-tuning performance on various EC2 hardware configurations, and live demos of the following: Jaws, our RESTful SharkServer and GUI for exploring the warehouse through Shark queries; Mesos, providing resource management for multiple workloads (Hadoop/Hive, Spark, multiple instances of load-balanced Jaws); Tachyon, an in-memory distributed file system backed by HDFS that allows for a better performing and more resilient Spark/Shark stack; the Export to NoSql API console (generates geo-replicated APIs for real-time access to Cassandra data exported from the warehouse through Spark jobs); the Referral Provider Network, a user-facing dashboard application (D3.js); and finally, monitoring and instrumentation consoles (Nagios, Ganglia and Graphite).

In my current role at Atigeo (Bellevue, WA) as Senior Director of Engineering, Platform Services, I oversee multiple agile engineering teams distributed across the US and Romania. I also serve as Lead Architect in building and operating xPatterns, an enterprise-class big data analytics platform consisting of large distributed systems that serve high-throughput/low-latency APIs as well as offline and interactive processing of tens of terabytes of unstructured and semi-structured data. The platform enables many machine learning and NLP algorithms that build models for solutions across multiple verticals such as healthcare, energy and education.

I have 17 years of industry experience in various roles, from individual contributor to Software/Systems Architect, Dev Lead and Dev Manager, with a strong passion for software architecture that leverages industry best patterns and practices and contributes a significant level of innovation. My experience spans the Open Source, Big Data and Microsoft’s Windows/.NET technology stacks.

Fluent in English, Romanian (native), Serbian and Hungarian, and at an intermediate level in German, I spend my spare time playing soccer, doing winter/water sports and hiking. I love traveling, meeting new people and experiencing new cultures.

Slides PDF | Video