Optimize load performance

To optimize load performance, you can follow best practices provided in biGENIUS-X.

The following best practices can improve your load performance for any target technology:

Use incremental load:
- When the source system provides an indicator that can be used to build a highwater mark on (date, timestamp, decimal, integer,...)
- On biGENIUS-X default columns (auto-increment, timestamps, ...)

Use the implementation type Virtual when you you do not require a physical table
Filtered loading:
- Limit the data loaded via partition predicates (e.g., WHERE date >= '2024-01-01').

The following best practices can improve your load performance for a Microsoft SQL Server target technology:

Add an INDEX on tables where load performance is bad
Disable non-cluster indexes during loading, then rebuild them
Table partitioning: Segment tables by date ranges to speed up incremental loading

The following best practices can improve your load performance for a Snowflake target technology:

Set warehouse auto-suspend intervals thoughtfully to keep caches warm without incurring unnecessary costs
Monitor for query queuing, which happens when all warehouse resources are busy. If queuing is frequent, consider increasing warehouse size or using multi-cluster warehouses to handle more concurrent queries
Adjust warehouse size based on workload: larger warehouses process queries faster but cost more, so match size to your needs
For very large tables with complex query patterns, you can define clustering keys to guide Snowflake in organizing micro-partitions around specific columns. This is only necessary if you notice query performance degradation and should be used judiciously, as clustering maintenance can be resource-intensive (use the alter table script for that as biGENIUS-X doesn't support it yet)

The following best practices can improve your load performance for a Databricks target technology:

Use partition filtered load:
- On source data in a data lake
Cache hashing results on tables with many columns
Partition (Partition by)
Liquid clustering (Cluster by)
Run Deltas OPTIMIZE command on some schedules
- Run OPTIMIZE ... ZORDER BY to group data by frequently queried columns

The following best practices can improve your load performance for a Microsoft Fabric target technology: