Databricks - Deployment - 1.4

Before deploying the Generated Artifacts, make sure you have completed the previous steps: the Notebook files should now be inside Databricks, ready for deployment.

Deploy the Generated Artifacts

This section explains how to deploy the generated artifacts to Databricks, one of the possible target environments.

To deploy the generated artifacts:

  • Open Databricks with the URL provided in the Azure Databricks Service resource:
  • Databricks opens:
  • Click the Workspace menu on the left-hand side:
  • Expand the folders Workspace > Users > Your_user > artifacts:
  • Open the 500_Deploy_and_Load_DataVault_Databricks.ipynb file, which contains three steps:
    1. %run ./Deployment.ipynb: Deploy the code
    2. %run ./DocumentationSparkDataVault_SimpleLoadExecution.ipynb: Load the data
    3. %run ./DocumentationSparkDataVault_SelectResults.ipynb: Display the results
  • Select the Compute Cluster (Demo Cluster in our example).
  • Execute step 1 to deploy the code.

  • The deployment is done. Verify it in Azure Storage Explorer (or from a notebook, as sketched after this list):
    • Open Azure Storage Explorer.
    • You should see new folders, one per layer, created in your Target folder inside the Target Storage Account:
    • Each layer folder contains one folder per Target Model Object; for example, for the stage layer:
    • Each Target Model Object folder contains a _delta_log folder with the files needed to load the data later:
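You can also run this check directly from a Databricks notebook cell instead of Azure Storage Explorer. The following is a minimal Python sketch; the abfss path and the layer folder name stage are placeholders and assumptions, so replace them with your own Target Storage Account container and Target folder:

%python
# Placeholder path -- replace with your Target Storage Account and Target folder.
target = "abfss://<container>@<storage_account>.dfs.core.windows.net/<target_folder>"

# One folder per layer should exist under the Target folder.
for layer in dbutils.fs.ls(target):
    print(layer.path)

# Each Target Model Object folder should contain a _delta_log folder
# (assuming the stage layer folder is named "stage").
for entry in dbutils.fs.ls(target + "/stage"):
    print(entry.path)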

You can now load the data.
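After running steps 2 and 3, you can also confirm the load yourself with a quick query. A minimal Python sketch; the table name is a hypothetical placeholder, so substitute one of your own Target Model Objects:

%python
# List the tables that the deployment created in the stage layer.
spark.sql("SHOW TABLES IN stage").show()

# Count the rows of one loaded table (hypothetical name -- replace <your_table>).
spark.sql("SELECT COUNT(*) AS row_count FROM stage.`<your_table>`").show()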

If you want to start a new deployment, please:

  • Clear the Target Data Lake folder (claire-data-lake in our example); a notebook-based way to do this is sketched after this list.
  • If you want to change the Target Data Lake folder, please:
    • Replace the placeholders with the new folder.
    • Delete the existing databases, which are still linked to the previous folder, by executing the following code in a Notebook:
%sql
DROP DATABASE `stage` CASCADE;
DROP DATABASE `rawvault` CASCADE;
DROP DATABASE `businessvault` CASCADE;
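If you prefer to clear the Target Data Lake folder from a notebook rather than manually, the following is a minimal Python sketch. The abfss path is the same placeholder as above, and the call deletes the folder recursively, so double-check it before running:

%python
# Placeholder path -- replace with your Target Data Lake folder
# (claire-data-lake in our example).
target = "abfss://<container>@<storage_account>.dfs.core.windows.net/<target_folder>"

# Recursively delete the Target folder so the next deployment starts clean.
dbutils.fs.rm(target, recurse=True)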