Databricks Unity - Placeholder values - 1.9
Please consult the article Replace the placeholders in the Artifacts to learn how to replace the placeholders with the values explained below.
Placeholder values
Depending on the Generator Configuration you are using for a Databricks Unity Target Technology, you can replace the placeholder values with the following:
database_name
Each database_name placeholder should contain the database name in your Unity Catalog.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the database name of the source table previously created by the Linked Project in your Unity Catalog.
schema_name
Each schema_name placeholder should contain the schema name in your Unity Catalog.
In our example, the schema names are docu_rawvault, docu_businessvault, and docu_mart.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the schema name of the source table previously created by the Linked Project in your Unity Catalog.
In our example, it is docu_stage.
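For illustration only, assuming the database_name placeholder maps to the catalog level of Unity Catalog's three-level namespace, a table is referenced as database_name.schema_name.table_name. With a hypothetical catalog docu_catalog and a hypothetical table customer, a Raw Vault table would be addressed as:
docu_catalog.docu_rawvault.customer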
file_path
Each file_path placeholder should contain the path where the Source Parquet files are available.
If you are using our Databricks environment example, the path is the one inside the Storage Account chosen to store the Source Parquet files:
abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks
- source is the name of the folder we created inside the Storage Account
- bgdatabrickslandingzone1 is the name of the Storage Account itself
- adventureworks is the name of the subfolder we created inside the source folder
It is best to use the same path for all the file_path placeholders. Each source Parquet file is stored in a folder with the same name in uppercase; this folder name is generated automatically, so do not include it in the placeholder replacement.
All paths should end with a trailing slash, for example: abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/stage/
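For illustration only, assuming a hypothetical source file named product, its Parquet data would be picked up from the automatically generated uppercase subfolder:
abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/PRODUCT/
while the file_path placeholder itself would only contain abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/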
jdbc_url
For a SQL Server JDBC source, for example: jdbc:sqlserver://{ServerName}.database.windows.net:1433;encrypt=true;databaseName={DatabaseName};trustServerCertificate=true;
Replace:
- {ServerName} with your Azure SQL Server name
- {DatabaseName} with your SQL Server database name
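For illustration, with the hypothetical values docu-sqlserver for the server name and AdventureWorks for the database name, the resulting value would be:
jdbc:sqlserver://docu-sqlserver.database.windows.net:1433;encrypt=true;databaseName=AdventureWorks;trustServerCertificate=true;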
jdbc_driver
For a SQL Server JDBC source, for example:
com.microsoft.sqlserver.jdbc.SQLServerDriver
jdbc_user
For a SQL Server JDBC source, for example:
'{DatabaseUser}'
Replace {DatabaseUser} with your SQL Server user.
It is essential to put the user inside single quotes: 'user_name'.
Alternatively, you can use Azure Key Vault to store the user as a secret.
In this case, the syntax should be:
dbutils.secrets.get(scope='jdbc', key='<name of the secret>')
jdbc_password
For a SQL Server JDBC source, for example:
'{DatabasePassword}'
Replace {DatabasePassword} with your SQL Server password.
It is essential to put the password inside single quotes: 'password'.
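As a minimal sketch of how the jdbc_url, jdbc_driver, jdbc_user, and jdbc_password values fit together in a Spark JDBC read (for illustration only, not the generated loader code; the server, database, secret names, and source table below are hypothetical):
jdbc_url = "jdbc:sqlserver://docu-sqlserver.database.windows.net:1433;encrypt=true;databaseName=AdventureWorks;trustServerCertificate=true;"
df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("user", dbutils.secrets.get(scope='jdbc', key='jdbc-user'))          # or '{DatabaseUser}' in quotes
    .option("password", dbutils.secrets.get(scope='jdbc', key='jdbc-password'))  # or '{DatabasePassword}' in quotes
    .option("dbtable", "SalesLT.Product")                                        # hypothetical source table
    .load())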
loader_folder_path
For a load, you can specify where the loader Jupyter Notebooks are located.
It is ../Jupyter/ by default.
user_name
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#user_name must be your Databricks user name.
email_notification_on_failure
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#email_notification_on_failure can contain the email address to which a notification is sent if a load fails.
notebook_task_path
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#notebook_task_path must be the location of the Jupyter Notebooks in Databricks.
In our example: /Workspace/Users/<username>/Artifacts_DVDM/
existing_cluster_id
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#existing_cluster_id must be the ID of the cluster to use.
To find it:
- Open Databricks
- Click on the Compute menu on the left-hand side
- Open the Cluster created in your Databricks environment
- Click on the More ... button
- Select the View JSON option
- The Cluster ID is available at the top of the JSON
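As a hedged sketch of how these Airflow placeholders typically end up in a DAG (the generated DAGs may be structured differently), using the Databricks provider's DatabricksSubmitRunOperator; the DAG ID, connection ID, email address, cluster ID, and loader notebook name below are hypothetical:
from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="docu_dvdm_load",                         # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,                                   # assumes Airflow 2.4 or later
    default_args={
        "email": ["data-team@example.com"],          # email_notification_on_failure
        "email_on_failure": True,
    },
) as dag:
    run_loader = DatabricksSubmitRunOperator(
        task_id="run_stage_loader",
        databricks_conn_id="databricks_default",     # hypothetical connection ID
        existing_cluster_id="0101-123456-abcdefgh",  # existing_cluster_id
        notebook_task={
            # notebook_task_path (containing the user_name) plus a hypothetical loader notebook
            "notebook_path": "/Workspace/Users/<username>/Artifacts_DVDM/load_stage",
        },
    )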
log_table_name
For logging, you can choose the table name that will contain the logs.
It is execution_log by default.
application_name
For logging, you can fill in the application name to easily identify the logs concerning your Project in Databricks.
application_environment_name
For logging, you can fill in the application environment name to easily identify the logs concerning your Project in Databricks, depending on the target environment (development, integration, production, ...).
It is DEV by default.
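For illustration only, assuming the default log table name and a hypothetical docu_catalog catalog and docu_rawvault schema as the log table's location, the logs could be inspected directly in a Databricks notebook:
spark.sql("SELECT * FROM docu_catalog.docu_rawvault.execution_log").show()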
You can download a replacement_config.json example here for:
- Databricks Unity Stage File
- Databricks Unity Stage JDBC
- Databricks Unity Data Store
- Databricks Unity Dimensional and Mart
- This example is similar for Databricks Unity Stage, Unity Dimensional, and Unity Datamart
- Databricks Unity DataVault and Mart
- This example is similar for Databricks Unity Stage, Unity DataVault, and Unity Datamart