Microsoft Fabric Lakehouse setup

profiles.yml file is for dbt Core users only

If you're using dbt Cloud, you don't need to create a profiles.yml file. This file is only for dbt Core users. To connect your data platform to dbt Cloud, refer to About data platforms.

Below is a guide for use with Fabric Data Engineering, a new product within Microsoft Fabric. This adapter currently supports connecting to a lakehouse endpoint.

To learn how to set up dbt using Fabric Warehouse, refer to Microsoft Fabric Data Warehouse.

  • Maintained by: Microsoft
  • Authors: Microsoft
  • GitHub repo: microsoft/dbt-fabricspark
  • PyPI package: dbt-fabricspark
  • Slack channel: db-fabric-synapse
  • Supported dbt Core version: v1.7 and newer
  • dbt Cloud support: Not supported
  • Minimum data platform version: n/a

Installing dbt-fabricspark

Use pip to install the adapter. Before 1.8, installing the adapter would automatically install dbt-core and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install dbt-core. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation:
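python -m pip install dbt-fabricspark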

Configuring dbt-fabricspark

For Microsoft Fabric-specific configuration, please refer to Microsoft Fabric configs.

For further info, refer to the GitHub repository: microsoft/dbt-fabricspark

Connection methods

dbt-fabricspark connects to the Fabric Spark runtime using the Fabric Livy API. The Fabric Livy API allows submitting jobs in two different modes:

  • Session jobs: a Livy session job establishes a Spark session that remains active for the lifetime of the session. A single Spark session can run multiple jobs (each job is an action), sharing state and cached data between them.
  • Batch jobs: a Livy batch job submits a Spark application for a single job execution. In contrast to a session job, a batch job doesn't sustain an ongoing Spark session; each batch job starts a new Spark session that ends when the job finishes.

Supported mode

To share session state among jobs and reduce the overhead of session management, the dbt-fabricspark adapter supports only the session-jobs mode.

session-jobs

session-jobs is the preferred method when connecting to Fabric Lakehouse.

~/.dbt/profiles.yml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: fabricspark
      method: livy
      authentication: CLI
      endpoint: https://api.fabric.microsoft.com/v1
      workspaceid: [Fabric Workspace GUID]
      lakehouseid: [Lakehouse GUID]
      lakehouse: [Lakehouse Name]
      schema: [Lakehouse Name]
      spark_config:
        name: [Application Name]
        # optional
        archives:
          - "example-archive.zip"
        conf:
          spark.executor.memory: "2g"
          spark.executor.cores: "2"
        tags:
          project: [Project Name]
          user: [User Email]
        driverMemory: "2g"
        driverCores: 2
        executorMemory: "4g"
        executorCores: 4
        numExecutors: 3
      # optional
      connect_retries: 0
      connect_timeout: 10
      retry_all: true
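
Note: with authentication: CLI, the adapter relies on your existing Azure CLI sign-in, so authenticate first with an identity that has access to the Fabric workspace (this assumes the typical Azure CLI setup):

az login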

Optional configurations

Retries

Intermittent errors can crop up unexpectedly while running queries against Fabric Spark. If retry_all is enabled, dbt-fabricspark will naively retry any query that fails, based on the configuration supplied by connect_timeout and connect_retries. It does not attempt to determine whether the query failure was transient or likely to succeed on retry. This configuration is recommended in production environments, where queries ought to be succeeding. The default connect_retries configuration is 2.

For instance, this will instruct dbt to retry all failed queries up to 3 times, with a 5-second delay between each retry:

~/.dbt/profiles.yml
retry_all: true
connect_timeout: 5
connect_retries: 3

Spark configuration

Spark can be customized using application properties. These properties can tune execution, for example by allocating more memory to the driver process, and can also configure the Spark SQL runtime, for example by setting a Spark catalog.
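
As a rough sketch, such properties are passed through the conf key of spark_config in the profile shown above (the values here are illustrative, not recommended defaults):

spark_config:
  name: dbt-fabricspark-session
  driverMemory: "4g"
  conf:
    spark.sql.shuffle.partitions: "200"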

Supported functionality

Most dbt Core functionality is supported. Please refer to Delta Lake interoperability.

Delta-only features:

  1. Incremental model updates by unique_key instead of partition_by (see merge strategy and the sketch after this list)
  2. Snapshots
  3. Persisting column-level descriptions as database comments
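
For item 1, a minimal sketch of how merge-on-unique-key behavior can be configured in dbt_project.yml (the project and model names are hypothetical, and the model is assumed to be stored as a Delta table):

models:
  my_project:
    fact_orders:
      +materialized: incremental
      +incremental_strategy: merge
      +unique_key: order_id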

Limitations

  1. Lakehouse schemas are not supported. Refer to limitations.
  2. Service Principal Authentication is not yet supported by the Livy API.
  3. Only Delta, CSV, and Parquet table data formats are supported by Fabric Lakehouse.