Databricks Unity - Placeholder values - 1.7
Please consult the article Replace the placeholders in the Artifacts to learn how to replace the placeholders with the values explained below.
Placeholder values
Depending on the Generator Configuration you are using for a Databricks Unity Target Technology, you can replace the placeholder values with the following:
database_name
Each database_name placeholder should contain the database name in your Unity Catalog.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the Unity Catalog database name of the source table previously created with the Linked Project.
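As a quick sanity check, you can verify the value from a Databricks notebook. The following is a minimal sketch, assuming an active Spark session, that database_name refers to the Unity Catalog containing your schemas, and a hypothetical value docu_catalog:
database_name = "docu_catalog"  # hypothetical value; replace with your Unity Catalog database name
spark.sql(f"SHOW SCHEMAS IN {database_name}").show()  # lists the schemas the placeholders below refer to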
database_directory
Each database_directory placeholder should contain the path to create the Target Parquet files.
If you are using our Databricks environment example, the path is the one inside the Storage Account chosen to store the Target Data Lake. For the Raw Vault layer placeholder, for example:
abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/rawvault/
- docu-datalake is the name of the folder we created inside the Storage Account
- bgdatabricksdatalake1 is the name of the Storage Account itself
All the paths should have a slash appended at the end: abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/rawvault/
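To confirm that your workspace can write to that location before running the Artifacts, you can run a quick test from a notebook. A minimal sketch, assuming an active Spark session and a hypothetical throwaway folder name:
database_directory = "abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/rawvault/"
# The trailing slash lets a folder name be appended directly to the path.
spark.range(1).write.mode("overwrite").parquet(database_directory + "_write_check")  # hypothetical test folder
dbutils.fs.rm(database_directory + "_write_check", True)  # True = recursive delete of the test folder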
schema_name
Each schema_name placeholder should contain the schema name in your Unity Catalog.
In our example, it is docu_rawvault, docu_businessvault and docu_mart.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the Unity Catalog schema name of the source table previously created with the Linked Project.
In our example, it is docu_stage.
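The same kind of check works for the schemas. A minimal sketch, assuming database_name is the containing Unity Catalog (hypothetical value docu_catalog) and using the example schema names above:
database_name = "docu_catalog"  # hypothetical value (see database_name above)
for schema_name in ["docu_rawvault", "docu_businessvault", "docu_mart", "docu_stage"]:
    spark.sql(f"SHOW TABLES IN {database_name}.{schema_name}").show()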
execution_dependencies_file_path
If you plan to use a native load control with multi-threaded loading, please update the execution_dependencies_file_path placeholder with the Databricks path that contains the generated dependencies file XXXX_dependencies.json
In our example, it is file:/Workspace/Users/[UserName]/StageFile/bg_databricksunitystagefiles_dependencies.json
It corresponds to the full path of the file in Databricks prefixed by "file:"
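To double-check the value, you can try to open the file from a notebook. A minimal sketch, assuming the example path above with [UserName] already replaced and a Databricks Runtime that can read workspace files from Python:
import json
execution_dependencies_file_path = "file:/Workspace/Users/[UserName]/StageFile/bg_databricksunitystagefiles_dependencies.json"
# Strip the "file:" prefix to get a plain workspace path that Python can open.
with open(execution_dependencies_file_path.removeprefix("file:")) as f:
    dependencies = json.load(f)
print(type(dependencies))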
file_path
Each file_path placeholder should contain the path where the Source Parquet files are available.
If you are using our Databricks environment example, the path is the one inside the Storage Account chosen to store the Source Parquet files:
abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks
- source is the name of the folder we created inside the Storage Account
- bgdatabrickslandingzone1 is the name of the Storage Account itself
- adventureworks is the name of the subfolder we created inside the source folder
Use the same path for all the file_path placeholders. Each Parquet source file is stored in a folder with the same name in uppercase. This folder name is generated automatically, so please don't include it in the placeholder replacement.
All the paths should have a slash appended at the end: abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/
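To illustrate why the uppercase folder must not be included: it is typically appended per source object by the generated code. A minimal sketch, assuming an active Spark session and a hypothetical source object named Customer:
file_path = "abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/"
source_object = "Customer"  # hypothetical source object name
# The uppercase folder (.../adventureworks/CUSTOMER) is added here, not in the placeholder value.
df = spark.read.parquet(file_path + source_object.upper())
df.printSchema()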
jdbc_url
For a SQL Server JDBC source, for example: jdbc:sqlserver://{ServerName}.database.windows.net:1433;encrypt=true;databaseName={DatabaseName};trustServerCertificate=true;
Replace:
- {ServerName} with your Azure SQL Server name
- {DatabaseName} with your SQL Server database name.
jdbc_driver
For a SQL Server JDBC source, for example:
com.microsoft.sqlserver.jdbc.SQLServerDriver
jdbc_user
For a SQL Server JDBC source, for example:
'{DatabaseUser}'
Replace {DatabaseUser} with your SQL Server user.
Putting the user inside quotes 'user_name' is essential.
Or you can use KeyVault to store a secret.
In this case, the syntax should be:
dbutils.secrets.get(scope='jdbc', key='<name of the secret>')
jdbc_password
For a SQL Server JDBC source, for example:
'{DatabasePassword}'
Replace {DatabasePassword} with your SQL Server password.
Putting the password inside quotes 'password' is essential.
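Putting the four jdbc_* placeholders together, this is roughly how they are used in a Spark JDBC read. A minimal sketch with hypothetical server, database, secret and table names; the generated Artifacts may wire these values differently:
jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"                    # hypothetical server name
            "encrypt=true;databaseName=AdventureWorks;trustServerCertificate=true;")  # hypothetical database name
jdbc_driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_user = dbutils.secrets.get(scope='jdbc', key='sql-user')          # hypothetical secret names
jdbc_password = dbutils.secrets.get(scope='jdbc', key='sql-password')
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("driver", jdbc_driver)
      .option("user", jdbc_user)
      .option("password", jdbc_password)
      .option("dbtable", "SalesLT.Customer")                           # hypothetical source table
      .load())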
user_name
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#user_name must be your Databricks user name.
notebook_task_path
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#notebook_task_path must be the location of the Jupyter Notebooks in Databricks.
In our example: /Workspace/Users/<username>/Artifacts_DVDM/
existing_cluster_id
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#existing_cluster_id must be the ID of the cluster to use (see the sketch after the steps below).
To find it:
- Open Databricks
- Click on the Compute menu on the left-hand side
- Open the Cluster created in your Databricks environment
- Click on the More ... button
- Select the View JSON option
- The Cluster ID is available at the top of the JSON
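For reference, a minimal sketch of where notebook_task_path and existing_cluster_id end up when Airflow triggers a Databricks run, assuming the Airflow Databricks provider is installed and using hypothetical task, cluster and notebook names; the generated DAGs may look different:
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

notebook_task_path = "/Workspace/Users/<username>/Artifacts_DVDM/"  # <username> is your Databricks user_name
existing_cluster_id = "0123-456789-abcdefgh"                        # hypothetical ID taken from View JSON
run_notebook = DatabricksSubmitRunOperator(
    task_id="run_generated_notebook",                               # hypothetical task id
    existing_cluster_id=existing_cluster_id,
    notebook_task={"notebook_path": notebook_task_path + "some_generated_notebook"},  # hypothetical notebook
)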
You can download a replacement_config.json example here for: