
Fabric Lakehouse - Load data - Native Load Control - 2.0

Before loading the data from your Source System(s) to your Fabric Lakehouse Target System, please:

You are now ready to load the data.

To load the data natively with Fabric Lakehouse, you will use a multithreaded approach that loads the Target Model Objects in parallel.

It is also possible to load data partially from a Data Modeling View.

Load the data

We will explain how to load the data in the Fabric Lakehouse target environment.

To load the data:

  • Open Microsoft Fabric at https://app.fabric.microsoft.com/
  • Open the Workspace where you deployed the artifacts by clicking on the Workspace option on the left menu:
  • Then, on the desired Workspace (bgfabricdvdm in our example):
  • Open the LoadControl/XXX_MultithreadingLoadExecution.ipynb file
  • Choose the Lakehouse by:
    • Clicking on the Lakehouses menu on the left:
    • Clicking on the Add button:
    • Selecting Existing lakehouse:
    • Selecting the Lakehouse previously created (docu_bglakehouse in our example):
  • Execute all the steps
  • The data is now loaded:
    • You should have the target Parquet files created for each Target Model Object, for example, for the Stage CreditCard:
    • Step 3 displays a summary of the number of rows loaded for each Target Model Object, for example:
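Conceptually, the multithreaded load in the notebook can be sketched as follows. This is an illustrative outline only, not the generated code: the `load_target_object` function, the object names, the worker count, and the returned row counts are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Target Model Objects; the real list comes from the load control metadata.
target_objects = ["stage_creditcard", "rdv_hub_customer", "rdv_sat_customer"]

def load_target_object(name):
    # Placeholder for the generated load logic
    # (in the notebook, each object is loaded via Spark).
    rows_loaded = 42  # pretend row count for the sketch
    return (name, rows_loaded)

# Load all Target Model Objects in parallel, then summarize
# the row counts per object, as Step 3 of the notebook does.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(load_target_object, target_objects))

for name, rows in results.items():
    print(f"{name}: {rows} rows loaded")
```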

You can now check that your data was correctly loaded with the following script:

# Create a new step with the following code:
mydf = spark.sql("select * from `rdv_hub_customer_hub_result`")
mydf.show(truncate=False)

And see the content of your Target Parquet file:

Load the data partially

It is possible to generate, deploy, and then load data partially based on the Dataflow Modeling Views content.

All the previous steps are similar to those for the complete Project:

You simply use the code generated for the View instead of the code generated for the overall Project.

Load all the data from partial deployments

If you organized your Project Model Objects in different Dataflow Modeling Views, you can:

To load the data for all the Views in a Data Factory Pipeline, the following constraints apply:

  • A Model Object must be in only one View
  • If a Model Object needs to be loaded after another, it must be in the same View or in a View loaded after the first one
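The first constraint can be checked programmatically before building the Pipeline. A minimal sketch, assuming you can list the Model Objects per View; the View and object names below are illustrative, not taken from a real Project:

```python
# Hypothetical View -> Model Objects mapping; in practice this
# reflects how you organized your Dataflow Modeling Views.
views = {
    "CreditCard": ["stage_creditcard", "rdv_hub_creditcard"],
    "Customer": ["stage_customer", "rdv_hub_customer"],
    "Order": ["stage_order", "rdv_link_customer_order"],
}

# Constraint: each Model Object must appear in exactly one View.
seen = {}
duplicates = []
for view, objects in views.items():
    for obj in objects:
        if obj in seen:
            duplicates.append((obj, seen[obj], view))
        seen[obj] = view

assert not duplicates, f"Model Objects in more than one View: {duplicates}"
```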

Example: three Views, CreditCard, Customer, and Order. Order contains Model Objects that depend on Model Objects in the Customer and CreditCard Views.
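The second constraint means the Views must be loaded in dependency order. A small sketch of deriving that order with the standard library; the dependency map is our assumption based on the example above:

```python
from graphlib import TopologicalSorter

# View -> Views that must be loaded before it
# (Order needs Customer and CreditCard loaded first).
dependencies = {
    "CreditCard": set(),
    "Customer": set(),
    "Order": {"Customer", "CreditCard"},
}

# A valid load order: CreditCard and Customer (in either order) before Order.
load_order = list(TopologicalSorter(dependencies).static_order())
print(load_order)
```

In the Pipeline below, this ordering is expressed with On Success links between the Notebook Activities rather than in code.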

The View flow is:

For each Data Modeling View:

  • Generate for a View rather than the overall Project
  • Deploy the partial generation
    • Deploy the load control only once (with the first deployed View, for example)

Then, create a Data Factory Pipeline:

  • Open Microsoft Fabric at https://app.fabric.microsoft.com/
  • Open the Workspace where you deployed the artifacts by clicking on the Workspace option on the left menu:
  • Click on the New item button:
  • Search for Pipeline:
  • Fill in a name:
  • Start with a blank canvas:
  • Search for Notebook:
     
    • General tab: 
      • Name: Load_View_XXX (XXX is the name of the first View to load)
    • Settings tab:
      • Notebook: Select the notebook XXX_MultithreadingLoadExecution.ipynb for the corresponding view
  • Create the other Activities:
    • Activities tab, click on Notebook:
    • Fill in similar information as for the first task
  • Link activities:
    • Hover over the source Activity and drag the green icon (On Success) to the target Activity:
    • Do it for each dependency
  • In our example, the complete Pipeline looks like the following:
  • For each Notebook used in the pipeline:
    • Open it
    • Choose the Lakehouse by:
      • Clicking on the Lakehouses menu on the left:
      • Clicking on the Add button:
      • Selecting Existing lakehouse:
      • Selecting the Lakehouse previously created (docu_bglakehouse in our example):
  • Click on the Run now button:
  • The data load starts: