Databricks Unity - Load data - Native Load Control - 2.0
Before loading the data from your Source System(s) to your Databricks Unity Target System, make sure the previous preparation steps are complete.
You are now ready to load the data.
To load the data natively with Databricks Unity Catalog, you use a multi-thread approach that loads the Target Model Objects in parallel.
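To illustrate the idea behind the multi-thread approach, here is a minimal sketch using Python's `concurrent.futures`. The object names and the `load_object()` body are placeholders, not the generated code: in Databricks, the generated notebook runs the actual load statements against the Unity Catalog tables.

```python
# Minimal sketch of a multi-threaded load, in the spirit of the generated
# XXX_MultithreadingLoadExecution notebook. Names and the load_object()
# body are illustrative placeholders.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Target Model Objects to load in parallel.
TARGET_OBJECTS = ["stg_creditcard", "rdv_hub_creditcard_hub", "rdv_sat_creditcard"]

def load_object(name: str) -> tuple:
    # In Databricks this would execute the generated load (e.g. a Spark SQL
    # INSERT into the Unity Catalog table); here we only simulate it.
    rows_loaded = len(name)  # placeholder row count
    return name, rows_loaded

# Each object is loaded on its own worker thread.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(load_object, TARGET_OBJECTS))

for name, rows in results.items():
    print(f"{name}: {rows} rows loaded")
```

The real notebook also handles load ordering and error reporting; this sketch only shows the parallel fan-out.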
It is also possible to load data partially from a Data Modeling View.
You can also load the data with an external tool such as:
Load the data
We will explain how to load the data with the Databricks Unity target environment.
To load the data:
- Open Databricks with the URL provided in the Azure Databricks Service resource:

- Databricks is open:

- Click on the Workspace menu on the left-hand side:

- Expand the folders Workspace > Users > Your_user > artifacts:

- Open the LoadControl/XXX_MultithreadingLoadExecution.ipynb file
- Select the Compute Cluster (Demo Cluster in our example)

- Execute all the steps
- The data are loaded:
- You should have the target Parquet files created for each Target Model Object, for example, for the Stage CreditCard:

- You can check the load:
- Open the Helpers/XXX_SelectResults.ipynb file
- Run it
- A summary of the number of rows loaded for each Target Model Object is displayed, for example:

You can now check that your data was correctly loaded with the following script:
%sql
select * from `rawvault`.`rdv_hub_creditcard_hub`;
And see the content of your Target Parquet file:

Load the data partially
It is possible to generate, deploy, and then load data partially based on the Dataflow Modeling Views content.
All the steps are the same as for the complete Project, except that you:
- Generate for a single View rather than the overall Project
- Deploy the partial generation
- Load the data partially
The generated code then covers only that View.
Load all the data from partial deployments
If you organized your Project Model Objects in different Dataflow Modeling Views, you can:
- Load the data partially for each View (previous chapter)
- Load the data for all the Views in a Databricks Job (see below)
To load the data for all the Views in a single Databricks Job, the following constraints apply:
- A Model Object must belong to only one View
- If a Model Object must be loaded after another, it must be in the same View or in a View loaded after the first one
Example: three Views, CreditCard, Customer, and Order. Order contains Model Objects that depend on Model Objects in the Customer and CreditCard Views.
The View flow is:

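The ordering constraint above means the Views themselves must be loaded in dependency order. As a sketch (with a hypothetical dependency map mirroring the example), a valid load order can be derived with a topological sort:

```python
# Sketch: derive a valid View load order from View-level dependencies.
# The dependency map mirrors the example above (Order depends on
# Customer and CreditCard); the names are illustrative.
from graphlib import TopologicalSorter

view_deps = {
    "CreditCard": set(),
    "Customer": set(),
    "Order": {"Customer", "CreditCard"},  # Order loads after both
}

# static_order() yields each View after all of its predecessors.
load_order = list(TopologicalSorter(view_deps).static_order())
print(load_order)
```

CreditCard and Customer can run in any order (or in parallel), but Order always comes last; in the Databricks Job this is expressed with the Depends on setting of each task.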
For each Data Modeling View:
- Generate for a View rather than the overall Project
- Deploy the partial generation
- Deploy the load control only once (with the first deployed View, for example)
Then, create a Databricks Job:
- Open Databricks with the URL provided in the Azure Databricks Service resource:

- Databricks is open:

- Click on the Jobs & Pipelines menu on the left-hand side:

- Click on Create new > Job:

- Add the first task by clicking on Notebook:

- Fill in:
- Task name: Load_View_XXX (XXX is the name of the first View to load)
- Type: keep Notebook
- Path: select the notebook XXX_MultithreadingLoadExecution.ipynb for the corresponding view
- Compute: select your compute
- Click on Create task
- Create the other tasks:
- Click on Add task

- Choose Notebook:

- Fill in the same information as for the first task
- Configure the Depends on:

You can leave Depends on empty if there is no dependency on another View load.
- Click on Add task
- In our example, the complete Job looks like the following:

- Click on the Run now button:

- The data load starts:

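The Job assembled in the UI can equivalently be described as a Databricks Jobs API 2.1 JSON definition. The sketch below mirrors the example above; the notebook paths and cluster id are hypothetical placeholders to adapt to your workspace:

```json
{
  "name": "Load_All_Views",
  "tasks": [
    {
      "task_key": "Load_View_CreditCard",
      "notebook_task": { "notebook_path": "/Users/your_user/artifacts/LoadControl/CreditCard_MultithreadingLoadExecution" },
      "existing_cluster_id": "<your-cluster-id>"
    },
    {
      "task_key": "Load_View_Customer",
      "notebook_task": { "notebook_path": "/Users/your_user/artifacts/LoadControl/Customer_MultithreadingLoadExecution" },
      "existing_cluster_id": "<your-cluster-id>"
    },
    {
      "task_key": "Load_View_Order",
      "depends_on": [
        { "task_key": "Load_View_CreditCard" },
        { "task_key": "Load_View_Customer" }
      ],
      "notebook_task": { "notebook_path": "/Users/your_user/artifacts/LoadControl/Order_MultithreadingLoadExecution" },
      "existing_cluster_id": "<your-cluster-id>"
    }
  ]
}
```

Such a definition can be submitted with the Jobs API `jobs/create` endpoint instead of building the Job by hand in the UI.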