
Snowflake - Load data - 1.5

Before loading the data from your Source System(s) to your Snowflake Target System, please complete the prerequisite steps.

You are now ready to load the data.

There are two ways to load the data natively with Snowflake:

  • Sequentially, based on Snowflake Tasks
  • In near real-time, with Snowflake Dynamic Tables and Streams

The near real-time approach uses Snowflake Dynamic Tables with Streams, a feature in Snowflake public preview.

Don't use this approach for a data solution in production.

You can also load the data with an external tool, such as:

Load the data sequentially

To load the data:

  • Open Snowflake
  • Currently, the generated artifacts don't include helpers to load data. Please download the Helper_Snowflake.sql file and upload it as a Worksheet in Snowflake.

  • Open the Worksheet Helper_Snowflake. It contains 3 parts:
    1. Load the data
    2. Monitor the execution
    3. Suspend the root Task
  • Select the 3 lines of the Load the data part and execute them: this launches the root task that loads the data
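The Load the data part of the helper typically resumes the task tree and starts the root task. A minimal sketch of that pattern, assuming the root task name from the generated Load Control script (your generated helper may differ):

SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK'); -- resume all child tasks (tasks are created suspended)
ALTER TASK LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK RESUME;                       -- activate the root task's schedule
EXECUTE TASK LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK;                            -- optionally trigger an immediate run instead of waiting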

Be careful: The root task will be executed every 5 minutes by default.

You can change this in the Load Control script creation:

CREATE OR REPLACE TASK LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK
    WAREHOUSE = TRAINING_WH_461345104558
    SCHEDULE = '5 minute'
    ALLOW_OVERLAPPING_EXECUTION = FALSE
    SUSPEND_TASK_AFTER_NUM_FAILURES = 3
AS
BEGIN
    LET start_time VARCHAR := TO_VARCHAR(CONVERT_TIMEZONE('UTC', CURRENT_TIMESTAMP()))::VARCHAR;
    CALL system$set_return_value(:start_time);
END;
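If the task has already been created, you can also change the interval directly in Snowflake without regenerating the script. A task must be suspended before its schedule can be altered; the task name below is taken from the script above:

ALTER TASK LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK SUSPEND;
ALTER TASK LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK SET SCHEDULE = '60 minute';
ALTER TASK LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK RESUME;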
  • You can monitor the load execution with the Monitor the execution part. For example, just after launching the load, only the root task is scheduled.
  • After the load completes, the root task and all the child tasks show as completed.
  • After the load completes, in testing mode, please suspend the root task to avoid additional Snowflake execution costs. To do this, execute the Suspend the root Task part.
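The Monitor the execution and Suspend the root Task parts can be approximated with the following statements, a sketch based on standard Snowflake task functions (the generated helper may differ):

-- Show recent task runs (scheduled, executing, succeeded, failed)
SELECT name, state, scheduled_time, completed_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    SCHEDULED_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())))
ORDER BY scheduled_time DESC;

-- Suspend the root task to stop the recurring schedule and avoid costs
ALTER TASK LOAD_SNOWFLAKE1_4_0DATAVAULTANDMART_ROOT_TASK SUSPEND;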

Load the data in near real-time with Stream

All your Model Objects should have the Property Use Stream checked to use a Stream approach.

REMINDER: The near real-time approach uses Snowflake Dynamic Tables with Streams, a feature in Snowflake public preview.

Don't use this approach for a data solution in production.

The Stream approach cannot be used to load PIT tables.

To load the data:

  • Open Snowflake
  • Currently, the generated artifacts don't include helpers to load data. Please download the Helper_Snowflake_Stream.sql file and upload it as a Worksheet in Snowflake.

  • Open the Worksheet Helper_Snowflake_Stream. It contains 5 parts:
    1. Generate the SQL code to resume the stream tasks
    2. Load the data
    3. Monitor the execution
    4. Generate the SQL code to suspend the stream tasks
    5. Suspend the root Task
  • Select the 3 lines of the Generate the SQL code to resume the stream tasks part and execute them: the list of SQL statements to resume all the tasks is generated
  • Copy the generated SQL statements into the Load the data part
  • Select the copied lines of the Load the data part and execute them: the tasks that load the data are launched
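The Generate the SQL code to resume the stream tasks part typically lists the loader tasks and builds one ALTER TASK ... RESUME statement per task. A sketch of that pattern (the actual generated helper may filter the task list differently):

-- List all tasks in the target database
SHOW TASKS IN DATABASE <Name of your target database>;

-- Build one RESUME statement per task from the SHOW output
SELECT 'ALTER TASK ' || "database_name" || '.' || "schema_name" || '.' || "name" || ' RESUME;' AS resume_statement
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));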

Be careful: The tasks will be executed every 5 minutes by default.

You can change this for each task in the Load Control script creation:

CREATE OR REPLACE TASK STG_ST_CREDITCARD_LOADER_TASK
    WAREHOUSE = TRAINING_WH_461345104558
    SCHEDULE = '5 minute'
WHEN
...........

CREATE OR REPLACE TASK RDV_HUB_CREDITCARD_HUB_LOADER_TASK
    WAREHOUSE = TRAINING_WH_461345104558
    SCHEDULE = '5 minute'
WHEN
...........
  • You can monitor the load execution with the Monitor the execution part. For example, just after launching the load, all the tasks are scheduled.
  • After the load completes, all the tasks show as completed.

Because all the Stage objects must be filled before the load continues, the first execution loads data only for the Stage objects.

  • After the load completes, in testing mode, please suspend all the tasks to avoid additional Snowflake execution costs. To do this:
    • Select the 3 lines of the Generate the SQL code to suspend the stream tasks part and execute them: the list of SQL statements to suspend all the tasks is generated
    • Copy the generated SQL statements into the Suspend the root Task part
    • Execute the Suspend the root Task part

Check the data

You can now check that your data was loaded correctly with the following script:

USE DATABASE IDENTIFIER('<Name of your target database>');

SELECT table_schema
     , table_name
     , table_type
     , row_count
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema IN ('SA', 'RDV', 'BV', 'DM')
ORDER BY row_count;