Databricks Unity - Placeholder values - 1.7
Please consult the article Replace the placeholders in the Artifacts to learn how to replace the placeholders with the values explained below.
Placeholder values
Depending on the Generator Configuration you are using for a Databricks Unity Target Technology, you can replace the placeholder values with the following:
database_name
Each database_name placeholder should contain the database name in your Unity Catalog.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the Unity Catalog database name of the source table previously created with the Linked Project.
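As a quick sanity check, you can verify the value from a Databricks notebook. The following is a minimal sketch, assuming an active Spark session, that database_name refers to the Unity Catalog containing your schemas, and a hypothetical value docu_catalog:
database_name = "docu_catalog"  # hypothetical value; replace with your Unity Catalog database name
spark.sql(f"SHOW SCHEMAS IN {database_name}").show()  # lists the schemas the placeholders below refer to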
database_directory
Each database_directory placeholder should contain the path to create the Target Parquet files.
If you are using our Databricks environment example, the path is the one inside the Storage Account chosen to store the Target Data Lake. For the Raw Vault layer placeholder, for example:
abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/rawvault/
- docu-datalake is the name of the folder we created inside the Storage Account
- bgdatabricksdatalake1 is the name of the Storage Account itself
All the paths should have a slash appended at the end: abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/rawvault/
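To confirm that your workspace can write to that location before running the Artifacts, you can run a quick test from a notebook. A minimal sketch, assuming an active Spark session and a hypothetical throwaway folder name:
database_directory = "abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/rawvault/"
# The trailing slash lets a folder name be appended directly to the path.
spark.range(1).write.mode("overwrite").parquet(database_directory + "_write_check")  # hypothetical test folder
dbutils.fs.rm(database_directory + "_write_check", True)  # True = recursive delete of the test folder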
schema_name
Each schema_name placeholder should contain the schema name in your Unity Catalog.
In our example, it is docu_rawvault, docu_businessvault and docu_mart.
If you are using a Linked Project (from a Stage JDBC or a Stage File Project), it should contain the Unity Catalog schema name of the source table previously created with the Linked Project.
In our example, it is docu_stage.
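The same kind of check works for the schemas. A minimal sketch, assuming database_name is the containing Unity Catalog (hypothetical value docu_catalog) and using the example schema names above:
database_name = "docu_catalog"  # hypothetical value (see database_name above)
for schema_name in ["docu_rawvault", "docu_businessvault", "docu_mart", "docu_stage"]:
    spark.sql(f"SHOW TABLES IN {database_name}.{schema_name}").show()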
execution_dependencies_file_path
If you plan to use a native load control with multi-threaded loading, please update the execution_dependencies_file_path placeholder with the Databricks path that contains the generated dependencies file XXXX_dependencies.json
In our example, it is file:/Workspace/Users/[UserName]/StageFile/bg_databricksunitystagefiles_dependencies.json
It corresponds to the full path of the file in Databricks prefixed by "file:"
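To double-check the value, you can try to open the file from a notebook. A minimal sketch, assuming the example path above with [UserName] already replaced and a Databricks Runtime that can read workspace files from Python:
import json
execution_dependencies_file_path = "file:/Workspace/Users/[UserName]/StageFile/bg_databricksunitystagefiles_dependencies.json"
# Strip the "file:" prefix to get a plain workspace path that Python can open.
with open(execution_dependencies_file_path.removeprefix("file:")) as f:
    dependencies = json.load(f)
print(type(dependencies))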
file_path
Each file_path placeholder should contain the path where the Source Parquet files are available.
If you are using our Databricks environment example, the path is the one inside the Storage Account chosen to store the Source Parquet files:
abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks
- source is the name of the folder we created inside the Storage Account
- bgdatabrickslandingzone1 is the name of the Storage Account itself
- adventureworks is the name of the subfolder we created inside the source folder
Use the same path for all the file_path placeholders. Each Parquet source file is stored in a folder with the same name in uppercase. This folder name is generated automatically, so please don't include it in the placeholder replacement.
All the paths should have a slash appended at the end: abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/
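To illustrate why the uppercase folder must not be included: it is typically appended per source object by the generated code. A minimal sketch, assuming an active Spark session and a hypothetical source object named Customer:
file_path = "abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks/"
source_object = "Customer"  # hypothetical source object name
# The uppercase folder (.../adventureworks/CUSTOMER) is added here, not in the placeholder value.
df = spark.read.parquet(file_path + source_object.upper())
df.printSchema()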
jdbc_url
For a SQL Server JDBC source, for example: jdbc:sqlserver://{ServerName}.database.windows.net:1433;encrypt=true;databaseName={DatabaseName};trustServerCertificate=true;
Replace:
- {ServerName} with your Azure SQL Server name
- {DatabaseName} with your SQL Server database name.
jdbc_driver
For a SQL Server JDBC source, for example:
com.microsoft.sqlserver.jdbc.SQLServerDriver
jdbc_user
For a SQL Server JDBC source, for example:
'{DatabaseUser}'
Replace {DatabaseUser} with your SQL Server user.
Putting the user inside quotes 'user_name' is essential.
Or you can use KeyVault to store a secret.
In this case, the syntax should be:
dbutils.secrets.get(scope='jdbc', key='<name of the secret>')
jdbc_password
For a SQL Server JDBC source, for example:
'{DatabasePassword}'
Replace {DatabasePassword} with your SQL Server password.
Putting the password inside quotes 'password' is essential.
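Putting the four jdbc_* placeholders together, this is roughly how they are used in a Spark JDBC read. A minimal sketch with hypothetical server, database, secret and table names; the generated Artifacts may wire these values differently:
jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"                    # hypothetical server name
            "encrypt=true;databaseName=AdventureWorks;trustServerCertificate=true;")  # hypothetical database name
jdbc_driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_user = dbutils.secrets.get(scope='jdbc', key='sql-user')          # hypothetical secret names
jdbc_password = dbutils.secrets.get(scope='jdbc', key='sql-password')
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("driver", jdbc_driver)
      .option("user", jdbc_user)
      .option("password", jdbc_password)
      .option("dbtable", "SalesLT.Customer")                           # hypothetical source table
      .load())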
user_name
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#user_name must be your Databricks user name.
notebook_task_path
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#notebook_task_path must be the location of the Jupyter Notebooks in Databricks.
In our example: /Workspace/Users/<username>/Artifacts_DVDM/
existing_cluster_id
If you are using Airflow as a Load Control environment, databricks_job#databricks_job#existing_cluster_id must be the ID of the cluster to use (see the sketch after the steps below).
To find it:
- Open Databricks
- Click on the Compute menu on the left-hand side
- Open the Cluster created in your Databricks environment
- Click on the More ... button
- Select the View JSON option
- The Cluster ID is available at the top of the JSON
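For reference, a minimal sketch of where notebook_task_path and existing_cluster_id end up when Airflow triggers a Databricks run, assuming the Airflow Databricks provider is installed and using hypothetical task, cluster and notebook names; the generated DAGs may look different:
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

notebook_task_path = "/Workspace/Users/<username>/Artifacts_DVDM/"  # <username> is your Databricks user_name
existing_cluster_id = "0123-456789-abcdefgh"                        # hypothetical ID taken from View JSON
run_notebook = DatabricksSubmitRunOperator(
    task_id="run_generated_notebook",                               # hypothetical task id
    existing_cluster_id=existing_cluster_id,
    notebook_task={"notebook_path": notebook_task_path + "some_generated_notebook"},  # hypothetical notebook
)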
You can download a replacement_config.json example here for: