San Francisco | June 30 - July 2, 2014

Job Board

Data Scientist (Databricks)
San Francisco, CA

Date posted: February 23, 2015

As a Data Scientist/Engineer at Databricks, you will prototype, design, build, and launch production data products, and you will perform exploratory data modeling and analysis. As an early team member, you will help shape the direction of the team. You will use a variety of tools and techniques, including Databricks Cloud itself, to analyze Databricks Cloud usage data. You will also turn exploratory results into low-latency production data products that produce interactive dashboards (see the video of our product keynote at Spark Summit 2014 for reference), data integrated into other systems (e.g., our CRM system), and reports consumed by the executive team. Your work will directly influence our core strategy, future product decisions, and interactions with customers.

Responsibilities:

  • Exploratory analysis and prototyping using the Databricks Cloud.
  • Design, build, test, and launch production data products inside Databricks, including our own low-latency data pipeline.
  • Explore, identify, and justify the key metrics for Databricks' success, and deliver products based on them that measure and drive business, customer, and product actions.
  • Where appropriate, push bug fixes and features upstream to Apache Spark and Databricks Cloud (including design, review, testing, etc.).

General Requirements

  • Strong engineering background
  • Machine learning, probability, and linear algebra expertise
  • Modeling and analysis of large scale data
  • Verbal and written communication skills
  • Experience building end-to-end applications

Languages and Systems Requirements

  • Languages for data mining/analysis such as R, SAS, MATLAB, SQL
  • Systems-building languages such as Scala, Java, C++
  • Functional languages such as Scala, Python, Lisp
  • Distributed systems such as Spark, Hadoop
  • Nice to have but not required: data warehouse environments such as Hive, Greenplum, Teradata