Databricks Unity - Placeholder values - 1.9
Please consult the article Replace the placeholders in the Artifacts to learn how to replace the placeholders with the values explained below.
Placeholder values
Depending on the Generator Configuration you are using for a Databricks Unity Target Technology, you can replace the placeholder values with the following:
database_name
Each database_name placeholder should contain the database name in your Unity Catalog.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the database name of the source table previously created by the Linked Project in your Unity Catalog.
schema_name
Each schema_name placeholder should contain the schema name in your Unity Catalog.
In our example, the schema names are docu_rawvault, docu_businessvault, and docu_mart.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the schema name of the source table previously created by the Linked Project in your Unity Catalog.
In our example, it is docu_stage.
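For illustration only, assuming the database_name placeholder maps to the catalog level of Unity Catalog's three-level namespace, a table is referenced as database_name.schema_name.table_name. With a hypothetical catalog docu_catalog and a hypothetical table customer, a Raw Vault table would be addressed as:
docu_catalog.docu_rawvault.customer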
file_path
Each file_path placeholder should contain the path where the Source Parquet files are available.
If you are using our Databricks environment example, the path is the one inside the Storage Account chosen to store the Source Parquet files:
abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks
- source is the name of the folder we created inside the Storage Account
- bgdatabrickslandingzone1 is the name of the Storage Account itself
- adventureworks is the name of the subfolder we created inside the source folder
It is best to use the same path for all the file_path placeholders. Each source Parquet file is stored in a folder with the same name in uppercase; this folder name is generated automatically, so do not include it in the placeholder replacement.
All paths should end with a trailing slash, for example: abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/stage/
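For illustration only, assuming a hypothetical source file named product, its Parquet data would be picked up from the automatically generated uppercase subfolder:
abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/PRODUCT/
while the file_path placeholder itself would only contain abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/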
jdbc_url
For a SQL Server JDBC source, for example: jdbc:sqlserver://{ServerName}.database.windows.net:1433;encrypt=true;databaseName={DatabaseName};trustServerCertificate=true;
Replace:
- {ServerName} with your Azure SQL Server name
- {DatabaseName} with your SQL Server database name
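For illustration, with the hypothetical values docu-sqlserver for the server name and AdventureWorks for the database name, the resulting value would be:
jdbc:sqlserver://docu-sqlserver.database.windows.net:1433;encrypt=true;databaseName=AdventureWorks;trustServerCertificate=true;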
jdbc_driver
For a SQL Server JDBC source, for example:
com.microsoft.sqlserver.jdbc.SQLServerDriver
jdbc_user
For a SQL Server JDBC source, for example:
'{DatabaseUser}'
Replace {DatabaseUser} with your SQL Server user.
It is essential to put the user inside single quotes: 'user_name'.
Alternatively, you can use Azure Key Vault to store the user as a secret.
In this case, the syntax should be:
dbutils.secrets.get(scope='jdbc', key='<name of the secret>')
jdbc_password
For a SQL Server JDBC source, for example:
'{DatabasePassword}'
Replace {DatabasePassword} with your SQL Server password.
It is essential to put the password inside single quotes: 'password'.
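As a minimal sketch of how the jdbc_url, jdbc_driver, jdbc_user, and jdbc_password values fit together in a Spark JDBC read (for illustration only, not the generated loader code; the server, database, secret names, and source table below are hypothetical):
jdbc_url = "jdbc:sqlserver://docu-sqlserver.database.windows.net:1433;encrypt=true;databaseName=AdventureWorks;trustServerCertificate=true;"
df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("user", dbutils.secrets.get(scope='jdbc', key='jdbc-user'))          # or '{DatabaseUser}' in quotes
    .option("password", dbutils.secrets.get(scope='jdbc', key='jdbc-password'))  # or '{DatabasePassword}' in quotes
    .option("dbtable", "SalesLT.Product")                                        # hypothetical source table
    .load())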
loader_folder_path
For a load, you can specify where the loader Jupyter Notebooks are located.
It is ../Jupyter/ by default.
user_name
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#user_name must be your Databricks user name.
email_notification_on_failure
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#email_notification_on_failure can contain the email address to which a notification is sent if a load fails.
notebook_task_path
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#notebook_task_path must be the location of the Jupyter Notebooks in Databricks.
In our example: /Workspace/Users/<username>/Artifacts_DVDM/
existing_cluster_id
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#existing_cluster_id must be the ID of the cluster to use.
To find it:
- Open Databricks
- Click on the Compute menu on the left-hand side
- Open the Cluster created in your Databricks environment
- Click on the More ... button
- Select the View JSON option
- The Cluster ID is available at the top of the JSON
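As a hedged sketch of how these Airflow placeholders typically end up in a DAG (the generated DAGs may be structured differently), using the Databricks provider's DatabricksSubmitRunOperator; the DAG ID, connection ID, email address, cluster ID, and loader notebook name below are hypothetical:
from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="docu_dvdm_load",                         # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,                                   # assumes Airflow 2.4 or later
    default_args={
        "email": ["data-team@example.com"],          # email_notification_on_failure
        "email_on_failure": True,
    },
) as dag:
    run_loader = DatabricksSubmitRunOperator(
        task_id="run_stage_loader",
        databricks_conn_id="databricks_default",     # hypothetical connection ID
        existing_cluster_id="0101-123456-abcdefgh",  # existing_cluster_id
        notebook_task={
            # notebook_task_path (containing the user_name) plus a hypothetical loader notebook
            "notebook_path": "/Workspace/Users/<username>/Artifacts_DVDM/load_stage",
        },
    )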
log_table_name
For logging, you can choose the table name that will contain the logs.
It is execution_log by default.
application_name
For logging, you can fill in the application name to easily identify the logs concerning your Project in Databricks.
application_environment_name
For logging, you can fill in the application environment name to easily identify the logs concerning your Project in Databricks, depending on the target environment (development, integration, production, ...).
It is DEV by default.
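For illustration only, assuming the default log table name and a hypothetical docu_catalog catalog and docu_rawvault schema as the log table's location, the logs could be inspected directly in a Databricks notebook:
spark.sql("SELECT * FROM docu_catalog.docu_rawvault.execution_log").show()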
You can download a replacement_config.json example here for:
- Databricks Unity Stage File
- Databricks Unity Stage JDBC
- Databricks Unity Data Store
- Databricks Unity Dimensional and Mart
- This example is similar for Databricks Unity Stage, Unity Dimensional, and Unity Datamart
- Databricks Unity DataVault and Mart
- This example is similar for Databricks Unity Stage, Unity DataVault, and Unity Datamart