Microsoft SQL Server - Load data - Native Load Control

Before loading the data from your Source(s) System(s) to your Microsoft SQL Server Target System, please:

Deploy the generated Artifacts

You are now ready to load the data.

There are two possibilities to load the data natively with Microsoft SQL Server:

Sequentially on a single thread
In parallel with a multi-thread approach

With both approaches, you can load several linked Project data in an overall load.

You can also load the data with an external tool as:

Apache Airflow

Load the data sequentially

To load the data sequentially:

Open Powershell
Navigate to your generated artifacts folder:

--Example
cd C:\XXXXXXXXX\20240219144515

Execute the Powershell script helper for load:

.\execute.ps1

If your project name contains spaces, a known bug exists in the execute.sql file, which is called by the execute.ps1 script.
Please open it and add quotes for the path, for example:

:r ".\LoadControl\LoadControl_Execution_SQL Server Data Vault and Mart.sql"
GO

The prompt asks you for the server or instance name:
Enter a server or instance name

If you want to deploy on your laptop's default SQL Server instance, you can leave the value empty; it will use localhost by default.

Enter a username

You can leave the value empty if you want to use Windows Authentication.

If you enter a username, enter the password
You have a similar result:

-------------------------------------------------------------------------------------------------------------------------------
If you want to check the logs of the execution:
- Open SQL Server Management Studio
- Connect to the target database
- Execute the following statement:

--Adapt the execution plan name
EXEC [LC].[Show_Execution] @ExecutionPlan = N'DEV_XXXX';

The logs should look like the following. An essential piece of information is the number of rows inserted:
You can now check that your data were correctly loaded with the following script:

SELECT
SCHEMA_NAME(schema_id) AS [SchemaName],
[Tables].name AS [TableName],
SUM([Partitions].[rows]) AS [TotalRowCount],
Tables.type_desc
FROM sys.tables AS [Tables]
JOIN sys.partitions AS [Partitions] ON [Tables].[object_id] =
[Partitions].[object_id] AND [Partitions].index_id IN (0, 1)
WHERE SCHEMA_NAME(schema_id) NOT IN ('LC') --We exclude the Load Control tables
GROUP BY SCHEMA_NAME(schema_id),[Tables].name, Tables.type_desc
ORDER BY SCHEMA_NAME(schema_id), TotalRowCount DESC;

Load the data with a multi-thread approach

To load the data with a multi-thread approach:

Don't use the Powershell script helper for load execute.ps1
Open SQL Server Management Studio (or any SQL databases management tool)
Open the LoadControl_Execution_XXXX.sql (from the LoadControl folder)
Copy the code in other query tabs as many as you want different threads
Execute the code from all the tabs at the same time

If you want to check the logs of the execution:
- Execute the following statement:

--Adapt the execution plan name
EXEC [LC].[Show_Execution] @ExecutionPlan = N'DEV_XXXX';

The logs should look like the following. An essential piece of information is the number of rows inserted:
You can observe in this example that 3 threads were used for the load:

You can now check that your data were correctly loaded with the following script:

SELECT
SCHEMA_NAME(schema_id) AS [SchemaName],
[Tables].name AS [TableName],
SUM([Partitions].[rows]) AS [TotalRowCount],
Tables.type_desc
FROM sys.tables AS [Tables]
JOIN sys.partitions AS [Partitions] ON [Tables].[object_id] =
[Partitions].[object_id] AND [Partitions].index_id IN (0, 1)
WHERE SCHEMA_NAME(schema_id) NOT IN ('LC') --We exclude the Load Control tables
GROUP BY SCHEMA_NAME(schema_id),[Tables].name, Tables.type_desc
ORDER BY SCHEMA_NAME(schema_id), TotalRowCount DESC;

Load several linked Projects data

As you read in the previous chapter, Load the data with a multi-thread approach, biGENIUS-X generates a LoadControl_Execution_XXXX.sql, which executes the default execution plan.

To load several linked Project data in an overall load, you need to define a single execution plan to load all necessary projects in the correct order.

Dependencies between projects

Currently, biGENIUS-X doesn’t generate dependencies between Projects.

You will need to create these dependencies manually.

The most straightforward approach is to create a dependency between all Model Objects from each Project with all Model Objects from other Projects.

To accomplish this, execute the following statement after a new deployment where Project B (ex.: Dimensional Project) uses Project A (ex.: Stage File Project) as a Linked Project.

This will add all necessary dependencies to the Dependency table used by the Load control.

--Replace [UserName] by a user name you want to use to identify the new dependency rows

--Replace [List of "entrance" schemas for Project B] by the list of schemas in Project B which contains the first loads. 
--Ex.: schema used for the cleansing layer of a Dimensional Project

--Replace [List of "exit" schemas for Project A] by the list of schemas in Project A which contains the last loads.
--Ex.: schema used for the data store layer of a Data Store Project

INSERT INTO [LC].[LoadConfig_LoadObject_Dependency]

           ([FK_LoadObject_ID]

           ,[FK_DependentOnLoadObject_ID]

           ,[AllowDisable]

           ,[InsertedAt]

           ,[InsertedBy]

           ,[UpdatedAt]

           ,[UpdatedBy])

SELECT obj.ID, objdep.ID, 1, GETDATE(), '[UserName]', GETDATE(), '[UserName]'

FROM [LC].[LoadConfig_LoadObject] obj

CROSS JOIN [LC].[LoadConfig_LoadObject] objdep

WHERE obj.SchemaName IN ([List of "entrance" schemas for Project B])

AND objdep.SchemaName IN ([List of "exit" schemas for Project A])

Overall load execution plan

Create an overall load execution plan containing the Model Objects of all your Projects.

This execution plan will care about the dependencies we created in the previous chapter.

Example:

--Replace [Project A load config name] by the load config name for the Project A
--Check the LC.LoadConfig table
--Replace [Project B load config name] by the load config name for the Project B
--Check the LC.LoadConfig table

--Creation of the new execution plan with all the objects from Project A and Project B
EXEC [LC].[Build_ExecutionPlan]                    @ExecutionPlan = N'LoadAllProjects';
EXEC [LC].[Build_ExecutionPlan_AddLoadObjects]     @ExecutionPlan = N'LoadAllProjects', @LoadConfig = N'[Project A load config name]';
EXEC [LC].[Build_ExecutionPlan_AddLoadObjects]     @ExecutionPlan = N'LoadAllProjects', @LoadConfig = N'[Project B load config name]';

--Launch the load with the new execution plan
EXEC [LC].[Start_Execution]                        @ExecutionPlan = N'LoadAllProjects';

--Display the logs of the load
EXEC [LC].[Show_Execution]                         @ExecutionPlan = N'LoadAllProjects';

You can now use this execution plan to load your data sequentially or with a multi-thread approach.

undefined

Microsoft SQL Server - Load data - Native Load Control - 1.8

Load the data sequentially

Load the data with a multi-thread approach

Load several linked Projects data

Dependencies between projects

Overall load execution plan