Azure Synapse - Load data - Native Load Control - 1.9
Before loading the data from your source system(s) to Azure Synapse, complete the prerequisite steps.
You are now ready to load the data.
Azure Synapse supports one way to load the data natively:
- In parallel with a multi-thread approach
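As an illustration of what a multi-thread load can look like, here is a minimal Python sketch. It is not the content of the generated XXX_MultithreadingLoadExecution notebook. The names load_in_parallel, fake_load, max_workers and the object name stage_creditcard are assumptions made for this example only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_in_parallel(object_names, load_one, max_workers=4):
    """Call load_one(name) for each name on a thread pool.

    Returns a dict mapping each name to the value returned by load_one,
    or to the exception raised while loading that object.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per target object and remember which future is which.
        futures = {pool.submit(load_one, name): name for name in object_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Keep going so one failed object does not hide the others.
                results[name] = exc
    return results


def fake_load(name):
    # Hypothetical stand-in for a real load: returns a made-up row count.
    return len(name)


summary = load_in_parallel(["stage_creditcard", "rdv_hub_creditcard_hub"], fake_load)
for name in sorted(summary):
    print(name, summary[name])
```

In a real notebook, fake_load would be replaced by whatever function loads a single Target Model Object. Collecting exceptions per object rather than stopping at the first one is a design choice of this sketch.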
Load the data
The following steps explain how to load the data in the target environment, Azure Synapse.
To load the data:
- Open Azure Synapse Analytics with the Workspace web URL provided in the Azure Synapse Workspace resource:
- Azure Synapse Analytics opens:
- Click on the Develop menu on the left-hand side:
- Open the 500_Deploy_and_Load_Synapse.ipynb file, which contains four steps:
  - %run XXX_Deployment: Deploy the code
  - %run XXX_Deployment_LoadControl.ipynb: Deploy the load control
  - %run XXX_MultithreadingLoadExecution: Load the data in parallel with a multi-thread approach
  - %run XXX_SelectResults: Display the results
- Select the Apache Spark Pool (bgaasspark33v2 in our example).
- Execute step 3, then step 4.
- The data were loaded:
- You should have the target Parquet files created for each Target Model Object, for example, for the Stage CreditCard:
- Step 3 displays a summary of the number of rows loaded for each Target Model Object, for example:
You can now check that your data was correctly loaded with the following script:
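# Create a new step with the following code:
# Read every row of the hub table from the rawvault database
mydf = spark.sql("select * from `rawvault`.`rdv_hub_creditcard_hub`")
# Print the rows without truncating long column values
mydf.show(truncate=False)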
The output shows the content of your target Parquet file.
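To check several tables at once, the same idea can be wrapped in a small helper that returns a row count per table. This is a sketch only. The function name count_rows is an assumption, and it relies on the spark session that a Synapse notebook normally provides.

```python
def count_rows(spark, table_names):
    """Return {table_name: row_count} using one COUNT(*) query per table."""
    counts = {}
    for name in table_names:
        # collect() returns a list of rows; the single row holds the count in column "n".
        row = spark.sql(f"select count(*) as n from {name}").collect()[0]
        counts[name] = row["n"]
    return counts


# Inside a notebook the `spark` session is predefined; elsewhere this part is skipped.
if "spark" in globals():
    for table, n in count_rows(spark, ["`rawvault`.`rdv_hub_creditcard_hub`"]).items():
        print(table, n)
```

A count of zero for a table you expected to be populated is a hint to review the output of the load step for that Target Model Object.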