
Databricks - Load data - 1.5

Before loading the data from your Source System(s) into your Databricks Target System, please make sure the prerequisites are met.

You are now ready to load the data.

There are two ways to load the data natively with Databricks:

  • Sequentially on a single thread
  • In parallel with a multi-thread approach
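The difference between the two approaches can be sketched in plain Python. The `load_object` function and the object names below are hypothetical placeholders for the generated load procedures, not the actual generated code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one generated load procedure.
def load_object(name):
    # ... run the generated load logic for this Target Model Object ...
    return f"{name}: loaded"

objects = ["stage_creditcard", "rdv_hub_creditcard_hub"]

# Sequential, single-thread: one object after the other.
results_sequential = [load_object(o) for o in objects]

# Parallel, multi-thread: independent objects loaded concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results_parallel = list(pool.map(load_object, objects))
```

The multi-thread approach only pays off for objects that have no load-order dependency on each other; dependent objects must still be loaded in sequence.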

You can also load the data with an external tool such as:

Load the data

This section explains how to load the data into the target environment: Databricks.

To load the data:

  • Open Databricks using the URL provided in the Azure Databricks Service resource.
  • Databricks opens.
  • Click on the Workspace menu on the left-hand side.
  • Expand the folders Workspace > Users > Your_user > artifacts.
  • Open the 500_Deploy_and_Load_DataVault_Databricks.ipynb file, which contains four steps:
    1. %run ./XXX_Deployment.ipynb: Deploy the code
    2. %run ./XXX_SimpleLoadExecution.ipynb: Load the data sequentially in a single thread
    3. %run ./XXX_MultithreadingLoadExecution.ipynb: Load the data in parallel with a multi-thread approach
    4. %run ./XXX_SelectResults.ipynb: Display the results
  • Select the Compute Cluster (Demo Cluster in our example)
  • Execute step 2 or step 3, then step 4
  • The data is now loaded:
    • A target Parquet file is created for each Target Model Object, for example for the Stage CreditCard:
    • Step 3 displays a summary of the number of rows loaded for each Target Model Object, for example:

    You can now check that your data was loaded correctly with the following script:

    %sql
    select * from `rawvault`.`rdv_hub_creditcard_hub`;

    And see the content of your Target Parquet file:
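If you prefer verifying from a Python cell, a minimal sketch of a sanity check on the returned hub rows might look like this. The column names `hk_creditcard_hub` and `creditcard_bk` are assumptions for illustration; substitute the hash-key and business-key columns of your own hub:

```python
def check_hub_rows(rows, hash_key, business_key):
    """Basic Data Vault hub sanity check: no null keys, hash keys unique."""
    hashes = [r[hash_key] for r in rows]
    assert all(h is not None for h in hashes), "null hash key found"
    assert all(r[business_key] is not None for r in rows), "null business key found"
    assert len(hashes) == len(set(hashes)), "duplicate hash key found"
    return len(rows)

# Example with rows as plain dicts, e.g. built from spark.sql(...).collect():
rows = [
    {"hk_creditcard_hub": "a1", "creditcard_bk": "CC-1001"},
    {"hk_creditcard_hub": "b2", "creditcard_bk": "CC-1002"},
]
check_hub_rows(rows, "hk_creditcard_hub", "creditcard_bk")  # returns 2
```

A duplicate or null key raises an AssertionError, so the check fails loudly instead of silently passing a broken load.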