The Generated Artifacts zip file contains files with placeholders for the target environment parameters.
You must replace all of these placeholders before deploying your Databricks Unity Stage File Target System.
Replace Placeholders
A toolkit is provided in the generated artifacts to replace placeholders.
It is composed of a PowerShell script to execute: replace_placeholders.ps1.
For executing commands with PowerShell later in this article, please use at least version 7 provided by Microsoft. If you would like to install the latest version, please check Microsoft's PowerShell installation documentation.
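You can verify your installed version directly from a PowerShell prompt (this is standard PowerShell, independent of the toolkit):

# Print the running PowerShell version; it should report 7.x or later
$PSVersionTable.PSVersion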
The process to follow is:
- Update the replacement_config.json file with your values:
  - Insert the value to replace into the value node:
    - Each database_name placeholder should contain the database name in your Unity Catalog. In our example, it is datalakehouse.
    - Each schema_name placeholder should contain the schema name in your Unity Catalog. In our example, it is docu_stage.
    - Each file_path placeholder should contain the path where the Source Parquet files are available. If you are using our Databricks environment example, the path is the one inside the Storage Account chosen to store the Source Parquet files: abfss://source@bgdatabrickslandingzone1.dfs.core.windows.net/adventureworks
      - source is the name of the folder we created inside the Storage Account
      - bgdatabrickslandingzone1 is the name of the Storage Account itself
      - adventureworks is the name of the subfolder we created inside the source folder
Use the same path for all the file_path placeholders. Each Source Parquet file is stored in a folder with the same name in uppercase; this folder name is generated automatically, so please don't include it in the placeholder value.
All the paths should end with a slash: abfss://docu-datalake@bgdatabricksdatalake1.dfs.core.windows.net/stage/
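If you want to review the file before editing it, you can load and pretty-print it from PowerShell. This is only a sketch: it assumes replacement_config.json sits next to the script, and the exact schema of the file is defined by the generated artifacts.

# Load replacement_config.json and print its full structure, including the value nodes
$config = Get-Content .\replacement_config.json -Raw | ConvertFrom-Json
$config | ConvertTo-Json -Depth 10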
If you are using Airflow as a Load Control environment, please update the following placeholders:
- databricks_job#databricks_job#user_name: it must be your Databricks user name
- databricks_job#databricks_job#notebook_task_path: it must be the location of the Jupyter Notebooks in Databricks
  - In our example: /Workspace/Users/<username>/Artifacts_DVDM/
- databricks_job#databricks_job#existing_cluster_id: it must be the ID of the Personal Compute cluster to use. To find it:
- Open Databricks
- Click on the Compute menu on the left-hand side
- Open the Personal Compute created in your Databricks environment
- Click on the More (...) button
- Select the View JSON option
- The Compute ID is available at the top of the JSON
Replace the placeholders in the files:
- Open Windows PowerShell (or equivalent) in the replace_placeholders.ps1 location
- Execute the following command:
.\replace_placeholders.ps1
- You should see a similar result: the placeholders in all generated artifacts are replaced with the configured values.
- You can now use these files and deploy your Target system.
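As a quick sanity check, you can search the generated artifacts for one of your configured values to confirm the replacement happened. This sketch uses the example database name datalakehouse; adapt the pattern to one of your own values.

# Find files containing the configured value 'datalakehouse' and print their paths
Get-ChildItem -Recurse -File | Select-String -Pattern 'datalakehouse' -List | Select-Object -ExpandProperty Path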
Some parameters can be added to the replace_placeholders.ps1 command. All of them are described and available by executing:
.\replace_placeholders.ps1 -help
The -ReplacementConfigPath parameter lets you use a replacement_config.json file from another path. This is particularly useful during the development phase of your project.
Example of usage:
.\replace_placeholders.ps1 -ReplacementConfigPath "C:\TEMP\Replacement config files\replacement_config_