The business knows that there’s gold in all that data, and your team’s job is to find it. But being a detective with a bunch of clunky tools and difficult-to-set-up infrastructure is hard. You want to be the hero who figures out what’s going on with the business, but you’re spending all your time wrestling with the tools.
We built Databricks to make big data simple. Apache Spark™ made a big step towards achieving this mission by providing a unified framework for building data pipelines. Databricks takes this further by providing a zero-management cloud platform built around Spark that delivers 1) fully managed Spark clusters, 2) an interactive workspace for exploration and visualization, 3) a production pipeline scheduler, and 4) a platform for powering your favorite Spark-based applications. So instead of tackling data headaches, you can finally focus on finding answers that make an immediate impact on your business.
Anyone who wants to extract value from their big data quickly and efficiently, ranging from data scientists and engineers to developers and data analysts. By providing an interactive workspace that exposes Spark’s native R, Scala, Python, and SQL interfaces; a REST API for remote programmatic access; the ability to execute arbitrary Spark jobs developed offline; and seamless support for 3rd party applications such as BI and domain-specific tools, Databricks enables users to consume data and insights through the interface they’re most comfortable with.
Databricks is being used by enterprises from a wide variety of verticals, including financial services, healthcare, retail, media & entertainment, and utilities. To date, we have seen customers utilize our platform for a broad spectrum of use cases including core ETL, data discovery and exploration, data warehousing, data product deployment, and insight publishing using dashboards for internal and external audiences.
Absolutely. Enterprises are accumulating massive quantities of data, but the big data analysis process itself brings many barriers, ranging from infrastructure management needs to provisioning bottlenecks to high costs of acquisition and management. Databricks is designed to remove all these hurdles. We want big data to become as easy for enterprises to use, and as common, as everyday business applications like Excel.
Databricks was founded by the team who started the Spark research project at UC Berkeley, which later became Apache Spark™. Databricks works with the open source community to continue to expand the project. We have contributed more code to Spark than any other company. We also provide Databricks certification programs for Spark developers, system integrators, applications, distributors, and trainers. Additionally, we’ve developed Databricks, a Unified Analytics Platform that accelerates innovation by unifying data science, engineering and business.
Yes, Databricks is generally available. Many enterprise customers are leveraging Databricks today to run production jobs at significant scale across a broad spectrum of industries and use cases. Get started here.
Yes. Databricks makes it easy to develop, test, and deploy Apache Spark applications. We provide ODBC/JDBC connectivity, the standard Spark API, as well as a native REST API for 3rd party applications.
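As a rough illustration of what programmatic access over the REST API might look like from a 3rd party application, here is a minimal sketch using only the Python standard library. The workspace URL, the personal access token, and the choice of the `/api/2.0/clusters/list` endpoint are assumptions made for this example and are not taken from this page; check the REST API documentation for your deployment before relying on them.

```python
import json
import urllib.request


def build_list_clusters_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request that lists clusters.

    Both arguments are placeholders supplied by the caller; the endpoint path
    is an assumption for this sketch.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_cluster_names(workspace_url: str, token: str) -> list:
    """Send the request and return the cluster names found in the JSON response."""
    request = build_list_clusters_request(workspace_url, token)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumes the response holds a "clusters" list whose items carry "cluster_name".
    return [cluster.get("cluster_name") for cluster in payload.get("clusters", [])]
```

Building the request separately from sending it keeps the authentication and URL handling easy to inspect without touching the network.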
Databricks currently supports browser-based file uploads and pulling data from Azure Blob Storage, AWS S3, Azure SQL Data Warehouse, and Azure Data Lake Store; NoSQL data stores such as Cosmos DB, Cassandra, and Elasticsearch; JDBC data sources; HDFS; Sqoop; and a variety of other data sources supported natively by Apache Spark.
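For the JDBC case, a read through the standard Spark DataFrame reader could be sketched as follows. The helper names, connection URL, table, and credentials are hypothetical and exist only for illustration; `spark` is assumed to be an existing `SparkSession`, such as the one a notebook normally provides.

```python
def jdbc_options(url, table, user, password, driver=None):
    """Collect the options that Spark's JDBC data source reads.

    "url", "dbtable", "user", "password", and "driver" are standard Spark
    JDBC option names; the values are supplied by the caller.
    """
    options = {"url": url, "dbtable": table, "user": user, "password": password}
    if driver:
        # Only needed when Spark cannot infer the driver class from the URL.
        options["driver"] = driver
    return options


def read_jdbc_table(spark, url, table, user, password, driver=None):
    """Load one table as a DataFrame using an existing SparkSession."""
    options = jdbc_options(url, table, user, password, driver)
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical usage inside a notebook where `spark` already exists:
# orders = read_jdbc_table(
#     spark,
#     url="jdbc:postgresql://db.example.com:5432/sales",
#     table="public.orders",
#     user="report_user",
#     password="<your-password>",
# )
```

The credentials stay with the caller, so the data is read in place from its current store rather than copied into the platform first.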
Security and fault tolerance are top priorities for Databricks, and our product has been built from the ground up with proper authentication and isolation mechanisms in place. For more information, see our Security Page.
Databricks runs 100% Apache Spark, so all the code and applications developed on it can run on any Apache Spark-compatible distribution (e.g., all Databricks Certified Distributions).
Databricks is currently available on Microsoft Azure and Amazon AWS.
Yes. Databricks is deployed entirely within its own VPC in your account to provide an additional layer of security and isolation.
No, not at the moment. However, we are continuously investigating other deployment scenarios, some of them involving on-premises clusters.
Users of Databricks read from and persist data to their own datastores, using their own credentials.
No, you do not need to transfer data into Databricks. In most cases your data can be accessed from its current data sources.
You control access to your organization’s data and notebooks by adding users to your Databricks account. Anyone added to your Databricks account will have access to the platform.
Databricks provides you with the option to deploy infrastructure exclusively for you. In the single-tenant mode, all Databricks services will be run in a separate VPC dedicated to you and completely isolated from others. You can peer your VPC with the Databricks VPC to connect and launch clusters in your own AWS account.
Databricks has already implemented its own security architecture based on industry best practices. We also continuously work to achieve higher standards such as SANS Top 20 Controls for Internet Security, Consensus Audit Guidelines, NIST guidelines, and Internet standards.
Databricks also retains a security firm to identify, on a regular basis, application- or network-level security issues that could adversely affect the integrity of Databricks.
AWS offers a business continuity program (media.amazonwebservices.com/AWS_Disaster_Recovery.pdf), and Databricks is designed to run out of multiple regions and multiple availability zones, or data centers.