Configure an incremental load
If you run a full load at each load, the time required to process is monotonically increasing because today’s load will always have more data than yesterday’s.
A best practice is to use incremental loads, which only load the delta.
The benefits are:
- Run faster since they touch less data
- Usually steady over time
- Can preserve historical data. Many source systems purge old data periodically
We can configure the Incremental Load behavior for the following Model Objects:
- Stage
- PIT
- Business PIT
- Mart Fact
To configure it, you must map the BG_IncrementalFilter Default Term to:
- A Source Term that contains the increment to consider, for example:
- A function composed of several Source Terms, for example:
The relevant Model Object should not set the Implementation Type Property to Virtual.
See Implementation Type (Model Object) for more details.
The BG_IncrementalFilter is a by default with an unsupported Data Type.
To set an incremental load behavior, please change it to the correct Data Type according to the Source(s) Term(s) mapped.
All Source terms can be used with data Types such as datetime, int...
Example:
Depending on the Model Object Type, the incremental load implementation is different:
- Stage: As a Stage loads data from the Source system, we prioritize to access this data only 1 time not to stress the Source System too much
- Other Model Object Types: As a Mart Fact, for example, can contain a lot of data, so we prioritize minimizing the number of columns
Stage Model Object Type
If an incremental load behavior was set for a Stage, it contains the increment value used.
So the max value of the Source(s) Term(s) mapped to the Default Term.
Example of values:
This increment is stored at each load in a new table XXX_INC. For example, here is where two loads happened:
During deployment, this table is empty. It is filled during the load.
Reset or Override of the Increment
Once the incremental load is configured, all increments are derived from the logs of previous loads.
However, there are specific scenarios where you want to override this behavior manually:
- to re-populate an incrementally loaded table (e.g., after a truncate or a re-deployment)
- to start incremental load at a specific increment (e.g., when you enabled incremental load on an already populated table)
- to ignore everything before a particular value during the initial load (e.g., to apply a specific value of cut-off)
- to correct problems with a previous load where you manually modified or removed data in an incrementally loaded table
To configure this manual override for a specific loader, you can update the column BG_IncrementalFilter in the table XXX_Increment.
So, truncate the table to re-populate an incrementally loaded table or add a new line with a different increment to start the incremental load at a specific increment.
Other Model Object Types
If an incremental load behavior was set for a PIT, a Business PIT, or a Mart Fact, it does not contain the increment value used.
This increment is stored at each load in a new table XXX_INC. For example, here is where two loads happened:
During the deployment, this table already contains one row per Dataflow and Dataflow Set for the Model Object:
It is filled during the load.
The view XXX_IncrementSource allows calculating which rows to load according to the increment.
Reset or Override of the Increment
Once the incremental load is configured, all increments are derived from the logs of previous loads.
However, there are specific scenarios where you want to override this behavior manually:
- to re-populate an incrementally loaded table (e.g., after a truncate or a re-deployment)
- to start incremental load at a specific increment (e.g., when you enabled incremental load on an already populated table)
- to ignore everything before a particular value during the initial load (e.g., to apply a specific value of cut-off)
- to correct problems with a previous load where you manually modified or removed data in an incrementally loaded table
To configure this manual override for a specific loader, you can update the column BG_IncrementalFilter in the table XXX_Increment.
So, delete the concerned rows to re-populate an incrementally loaded table or add a new line with a different increment to start the incremental load at a specific increment.
Please never delete the rows created during the deployment with the NULL value for the increment.