Databricks

Stage: PROD

Feature List: Metadata, Query Usage, Lineage, Column-level Lineage, Data Profiler, Data Quality, dbt, Sample Data, Reverse Metadata (Collate Only), Auto-Classification, Stored Procedures, Tags, Owners

In this section, we provide guides and references to use the Databricks connector.

This page explains how to configure and schedule Databricks metadata and profiler workflows from the OpenMetadata UI.

To run the ingestion via the UI, you'll need to use the OpenMetadata Ingestion container, which ships with custom Airflow plugins to handle workflow deployment. If you want to install it manually on an existing Airflow host, you can follow this guide.

If you don't want to use the OpenMetadata Ingestion container to configure workflows via the UI, you can check the following docs to run the Ingestion Framework externally in any orchestrator.

We support Python versions 3.9 to 3.11.

To run the Databricks ingestion, you will need to install the Databricks connector plugin, as shown below.
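
If you plan to run the ingestion outside the UI, the plugin is installed with pip, using the standard plugin-extra naming of the openmetadata-ingestion package:

```bash
pip install "openmetadata-ingestion[databricks]"
```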

The Databricks connector supports three authentication methods; a configuration sketch follows the list:

  1. Personal Access Token (PAT): a personal access token generated for your Databricks workspace.
  2. Databricks OAuth (Service Principal): OAuth2 machine-to-machine authentication using a Service Principal.
  3. Azure AD Setup: for Azure Databricks workspaces that use Azure Active Directory for identity management; authenticates with an Azure Service Principal through Azure AD.
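
As a rough reference for how authentication appears in the ingestion YAML, here is a minimal sketch of the PAT variant. All values are placeholders, and the exact field names for the OAuth and Azure AD variants are defined by the Databricks connection schema of your OpenMetadata version:

```yaml
serviceConnection:
  config:
    type: Databricks
    hostPort: adb-xxxx.azuredatabricks.net:443    # placeholder workspace host
    httpPath: /sql/1.0/warehouses/<warehouse-id>  # placeholder SQL warehouse path
    token: <personal-access-token>                # method 1: PAT
    # Methods 2 and 3 replace `token` with the Service Principal
    # credentials defined by the connection schema.
```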

The required permissions vary based on the authentication method used:

When using a PAT, the token inherits the permissions of the user who created it. Ensure that user has read access to the catalogs, schemas, and tables you plan to ingest.

For Service Principal authentication, grant the same read permissions directly to the Service Principal.

If you are using Unity Catalog in Databricks, check out the Unity Catalog connector instead.

The first step is to ingest the metadata from your sources. To do that, you first need to create a Service connection.

This Service will be the bridge between OpenMetadata and your source system.

Once a Service is created, it can be used to configure your ingestion workflows.

To visit the Services page, click Settings in the side navigation bar and then Services.

[Image: Select your Service Type and Add a New Service]

Click on Add New Service to start the Service creation.

[Image: Add a new Service from the Services page]

Select Databricks as the Service type and click Next.

[Image: Select your Service from the list]

Provide a name and description for your Service.

OpenMetadata uniquely identifies Services by their Service Name. Provide a name that distinguishes your deployment from other Services, including the other Databricks Services that you might be ingesting metadata from.

Note that once the name is set, it cannot be changed.

[Image: Provide a Name and description for your Service]

In this step, we will configure the connection settings required for Databricks.

Please follow the instructions below to properly configure the Service to read from your sources. You will also find helper documentation on the right-hand side panel in the UI.

[Image: Configure the Service connection by filling the form]
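
If you later prefer to run this connector from an external orchestrator, the same form fields translate into a workflow YAML for the ingestion CLI. A minimal sketch, assuming PAT authentication; the service name, hosts, and tokens below are all placeholders:

```yaml
source:
  type: databricks
  serviceName: example_databricks                # hypothetical Service name
  serviceConnection:
    config:
      type: Databricks
      hostPort: adb-xxxx.azuredatabricks.net:443
      httpPath: /sql/1.0/warehouses/<warehouse-id>
      token: <personal-access-token>
      catalog: hive_metastore
      databaseSchema: default
  sourceConfig:
    config:
      type: DatabaseMetadata                     # plain metadata ingestion
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api          # placeholder OpenMetadata server
    authProvider: openmetadata
    securityConfig:
      jwtToken: <jwt-token>                      # placeholder bot token
```

With the plugin installed, a file like this can be run with `metadata ingest -c databricks.yaml`.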