Databricks Lakehouse

Availability: Core Standard Plus Pro Enterprise Flex Self-Managed Enterprise PyAirbyte
Support Level: Airbyte
Connector Version: 4.0.0 (last updated a day ago)
Sync Success Rate: High
Usage Rate: High
Definition ID: 072d5540-f236-4294-ba7c-ade8fd918496

Direct Load

Starting with version 4.0.0, the Databricks Lakehouse destination uses Direct Load architecture. This means data is written directly to final tables without using intermediate raw tables, providing improved performance and reduced storage costs.

For migration details and backward compatibility options, see the Databricks Migration Guide.

Prerequisites

A Databricks workspace with Unity Catalog enabled.
A SQL warehouse or compute cluster to run queries against.
Authentication credentials: an OAuth2 client ID and secret (recommended), or a personal access token.
Acceptance of the Databricks JDBC ODBC driver license. By using this connector, you agree that it may only be used to connect third-party applications to Apache Spark SQL within a Databricks offering using the ODBC and/or JDBC protocols.

Network access

If you're using Airbyte Cloud and this destination uses IP-based access controls, add Airbyte's IP addresses to your allowlist.

Step 1: Set up Databricks

You will need the following information from your Databricks workspace:

Server Hostname / HTTP Path / Port

Open the workspace console.
Open your SQL warehouse:
Open the Connection Details tab:
Note the Server Hostname, HTTP Path, and Port values.
You will also need the Databricks Unity Catalog Name — the name of the Unity Catalog that contains the database you want to write to. This is not found on the Connection Details tab; look for it in the Databricks workspace sidebar under Catalog.

Authentication

OAuth2 (Recommended)

Create a service principal in your Databricks workspace and generate a client ID and secret.

Personal Access Token

Open your workspace console.
Click on your icon in the top-right corner, and head to settings, then developer, then manage under access tokens
Enter a description for the token and how long it will be valid for (or leave blank for a permanent token):

Step 2: Set up the Databricks destination in Airbyte

Log in to your Airbyte account.
In the left navigation bar, click Destinations. In the top-right corner, click + New destination.
Find and select Databricks Lakehouse from the list of available destinations.
Enter the Server Hostname, HTTP Path, Port, and Databricks Unity Catalog Name from Step 1.
Select your Authentication method and enter the required credentials.
Configure the remaining options:
- Default Schema - The schema that will contain your data. You can later override this on a per-connection basis.
- CDC deletion mode - Whether CDC deletions are propagated as hard deletes (the row is removed) or soft deletes (the row is kept with a tombstone). Defaults to hard delete.
- Purge Staging Files and Tables - Whether to delete staging files after loading them into tables. Disable for debugging.
Click Set up destination.

Supported sync modes

Sync mode	Supported?
Full Refresh - Overwrite	Yes
Full Refresh - Append	Yes
Full Refresh - Overwrite + Deduped	Yes
Incremental Sync - Append	Yes
Incremental Sync - Append + Deduped	Yes

Output schema

Each stream is written directly to a final table in your configured schema. The table includes your data columns plus the following Airbyte metadata columns:

Column	Type	Notes
`_airbyte_raw_id`	`STRING`	A UUID assigned by Airbyte to each processed event.
`_airbyte_extracted_at`	`TIMESTAMP`	Timestamp when the event was pulled from the data source.
`_airbyte_meta`	`STRING`	JSON metadata about the record, including sync information.
`_airbyte_generation_id`	`LONG`	See the refreshes documentation.

Data type map

Airbyte Type	Databricks Type	Notes
`string`	`STRING`
`number`	`DECIMAL(38, 10)`	Max 28 integer digits, 10 fractional
`integer`	`LONG`	64-bit integer
`boolean`	`BOOLEAN`
`object`	`STRING`	Serialized as JSON
`array`	`STRING`	Serialized as JSON
`timestamp_with_timezone`	`TIMESTAMP`	Microsecond precision
`timestamp_without_timezone`	`TIMESTAMP_NTZ`	Microsecond precision, no timezone
`time_with_timezone`	`STRING`	No native Databricks equivalent
`time_without_timezone`	`STRING`	No native Databricks equivalent
`date`	`DATE`

Naming conventions

Schema and table names are lowercased automatically. Databricks treats them as case-insensitive identifiers.
Column names preserve the casing from your source data.
Special characters in identifiers are escaped automatically by the connector.

Namespace support

This destination supports namespaces. The namespace maps to a Databricks schema.

Reference

Config fields reference

Field

Type

Property name

boolean

accept_terms

object

authentication

string

database

string

hostname

string

http_path

string

cdc_deletion_mode

string

port

boolean

purge_staging_data

string

schema

Changelog

Expand to review

Version	Date	Pull Request	Subject
4.0.0	2026-06-29	80951	Major rewrite: upgraded to Direct-Load architecture using the Bulk CDK
3.3.8	2026-03-11	74732	Add JDBC ConnectTimeout and SocketTimeout to prevent indefinite hangs when Databricks SQL warehouse is paused or unresponsive
3.3.7	2025-07-15	63311	Support arbitrary number of streams in findExisitngTable query
3.3.6	2025-03-24	56355	Upgrade to airbyte/java-connector-base:2.0.1 to be M4 compatible.
3.3.5	2025-03-07	55232	fix table name collision multiple connections same schema
3.3.3	2025-01-10	51506	Use a non root base image
3.3.2	2024-12-18	49898	Use a base image: airbyte/java-connector-base:1.0.0
3.3.1	2024-12-02	#48779	bump resource reqs for `check`
3.3.0	2024-09-18	#45438	upgrade all dependencies.
3.2.5	2024-09-12	#45439	Move to integrations section.
3.2.4	2024-09-09	#45208	Fix CHECK to create missing namespace if not exists.
3.2.3	2024-09-03	#45115	Clarify Unity Catalog Name option.
3.2.2	2024-08-22	#44941	Clarify Unity Catalog Path option.
3.2.1	2024-08-22	#44506	Handle uppercase/mixed-case stream name/namespaces
3.2.0	2024-08-12	#40712	Rely solely on PAT, instead of also needing a user/pass
3.1.0	2024-07-22	#40692	Support for refreshes and resumable full refresh. WARNING: You must upgrade to platform 0.63.7 before upgrading to this connector version.
3.0.0	2024-07-12	#40689	(Private release, not to be used for production) Add `_airbyte_generation_id` column, and `sync_id` entry in `_airbyte_meta`
2.0.0	2024-05-17	#37613	(Private release, not to be used for production) Alpha release of the connector to use Unity Catalog
1.1.2	2024-04-04	#36846	(incompatible with CDK, do not use) Remove duplicate S3 Region
1.1.1	2024-01-03	#33924	(incompatible with CDK, do not use) Add new ap-southeast-3 AWS region
1.1.0	2023-06-02	#26942	Support schema evolution
1.0.2	2023-04-20	#25366	Fix default catalog to be `hive_metastore`
1.0.1	2023-03-30	#24657	Fix support for external tables on S3
1.0.0	2023-03-21	#23965	Added: Managed table storage type, Databricks Catalog field
0.3.1	2022-10-15	#18032	Add `SSL=1` to the JDBC URL to ensure SSL connection.
0.3.0	2022-10-14	#15329	Add support for Azure storage.
	2022-09-01	#16243	Fix Json to Avro conversion when there is field name clash from combined restrictions (`anyOf`, `oneOf`, `allOf` fields)
0.2.6	2022-08-05	#14801	Fix multiply log bindings
0.2.5	2022-07-15	#14494	Make S3 output filename configurable.
0.2.4	2022-07-14	#14618	Removed additionalProperties: false from JDBC destination connectors
0.2.3	2022-06-16	#13852	Updated stacktrace format for any trace message errors
0.2.2	2022-06-13	#13722	Rename to "Databricks Lakehouse".
0.2.1	2022-06-08	#13630	Rename to "Databricks Delta Lake" and add field orders in the spec.
0.2.0	2022-05-15	#12861	Use new public Databricks JDBC driver, and open source the connector.
0.1.5	2022-05-04	#12578	In JSON to Avro conversion, log JSON field values that do not follow Avro schema for debugging.
0.1.4	2022-02-14	#10256	Add `-XX:+ExitOnOutOfMemoryError` JVM option
0.1.3	2022-01-06	#7622 #9153	Upgrade Spark JDBC driver to `2.6.21` to patch Log4j vulnerability; update connector fields title/description.
0.1.2	2021-11-03	#7288	Support Json `additionalProperties`.
0.1.1	2021-10-05	#6792	Require users to accept Databricks JDBC Driver Terms & Conditions.
0.1.0	2021-09-14	#5998	Initial private release.

Prerequisites​

Network access​

Step 1: Set up Databricks​

Server Hostname / HTTP Path / Port​

Authentication​

OAuth2 (Recommended)​

Personal Access Token​

Step 2: Set up the Databricks destination in Airbyte​

Supported sync modes​

Output schema​

Data type map​

Naming conventions​

Namespace support​

Reference​

Config fields reference

Changelog​

Prerequisites

Network access

Step 1: Set up Databricks

Server Hostname / HTTP Path / Port

Authentication

OAuth2 (Recommended)

Personal Access Token

Step 2: Set up the Databricks destination in Airbyte

Supported sync modes

Output schema

Data type map

Naming conventions

Namespace support

Reference

Changelog