Technical Trainer / Contract (Databricks)



Job Overview

Databricks is building a world-class technical training organization to deliver the most advanced curriculum in the Big Data industry. As a Technical Trainer specializing in the Apache Spark ecosystem, you will fly to customer sites to explain how Spark processes big data at scale. This role requires both public speaking experience and engineering skills to explain how the various components of Spark work.

Advanced knowledge of Spark is useful but not required. If selected, you will go through two months of in-office and self-paced preparation training before flying to customer sites.

About Spark & Databricks

Apache Spark is an open-source cluster computing framework originally developed in 2009 at the UC Berkeley AMPLab. Spark makes it easy to get value from big data. It can read from any data source (relational, NoSQL, file systems, etc.) and offers one unified API for batch analytics, SQL queries, real-time analysis, machine learning, and graph processing. Developers no longer have to learn separate processing engines for different tasks.

Spark is 10x-100x faster than older systems such as Hadoop MapReduce and has proven scalability (the largest Spark cluster has over 8,000 nodes). Spark had over 450 contributors in 2014, making it the most active open-source project in the Big Data space.

The core Spark team is now at Databricks. With over 50 employees and $47 million in funding, Databricks is the main steward of the Spark project. As a Trainer for Databricks and Spark, you will work closely with this team to understand Spark’s internals and then teach them to engineers around the world.

Job Duties

  • Deliver instructor-led classroom training to end-user customers (developers, administrators, architects) and partners on a regular schedule in the US and abroad.
  • Assist in developing and maintaining technical training content, lab exercises, presentations, and accompanying materials.



Benefits

  • Learn the Apache Spark ecosystem (Core, SQL, Streaming, MLlib, BlinkDB, Tachyon, etc.) from the original team that created and drives the Spark project. Get early access to production use cases and the technical roadmap, and work in person with the Spark committers.
  • Be one of the highest-paid trainers in the big data industry.
  • Work your own schedule. You tell us how many days per month you want to train and in which location on the planet; we’ll work to accommodate your schedule.



Requirements

(Note: we aim to find candidates who meet all of the following requirements, but we understand that some qualified candidates may not meet 100% of them.)

  • This is a 2-year engagement. The first 3 months will be mostly studying/learning. Please be open to committing to the full 2 years.
  • Delivered 20+ multi-day technical training classes in the past year on either a big data technology or a sufficiently complex software stack. Note: we are also considering exceptional individuals with strong consulting backgrounds.
  • Ability to quickly learn and grasp advanced concepts
  • Ability to travel about 50% of the time
  • Strong speaking skills for both technical and non-technical audiences, both on-site and via WebEx
  • Solid grasp of programming principles (object-oriented and functional). At least medium-level proficiency in one of: Java, Scala, Python, R
  • Medium-level (hands-on) proficiency in at least a handful of the following big data technologies: Spark, Hadoop, HDFS, MapReduce, YARN, Cassandra, Hive, Pig, HBase, Kafka, Flume, Storm, ZooKeeper, Couchbase, MongoDB, ElasticSearch, Lucene, Solr, Parquet, Avro, Neo4j, Machine Learning



Job posted 6/6/2015