Optimize load performance
To optimize load performance, you can follow best practices provided in biGENIUS-X.
Common best practices
The following best practices can improve your load performance for any target technology:
- Use incremental load:
- When the source system provides an indicator that can be used to build a highwater mark on (date, timestamp, decimal, integer,...)
- On biGENIUS-X default columns (auto-increment, timestamps, ...)
- Use the implementation type Virtual when you you do not require a physical table
- Filtered loading:
- Limit the data loaded via partition predicates (e.g., WHERE date >= '2024-01-01').
Best practices for Microsoft SQL Server
The following best practices can improve your load performance for a Microsoft SQL Server target technology:
- Add an INDEX on tables where load performance is bad
- Disable non-cluster indexes during loading, then rebuild them
- Table partitioning: Segment tables by date ranges to speed up incremental loading
- Use parallel load execution (multi-threaded)
- Depending on the amount and type of columns it can make sense to use different Delta Detection methods (Hash Value Comparison or Column Comparison)
Best practices for Snowflake
The following best practices can improve your load performance for a Snowflake target technology:
- Set warehouse auto-suspend intervals thoughtfully to keep caches warm without incurring unnecessary costs
- Monitor for query queuing, which happens when all warehouse resources are busy. If queuing is frequent, consider increasing warehouse size or using multi-cluster warehouses to handle more concurrent queries
- Adjust warehouse size based on workload: larger warehouses process queries faster but cost more, so match size to your needs
- For very large tables with complex query patterns, you can define clustering keys to guide Snowflake in organizing micro-partitions around specific columns. This is only necessary if you notice query performance degradation and should be used judiciously, as clustering maintenance can be resource-intensive (use the alter table script for that as biGENIUS-X doesn't support it yet)
Best practices for Databricks
The following best practices can improve your load performance for a Databricks target technology:
- Use partition filtered load:
- On source data in a data lake
- Cache hashing results on tables with many columns
- Partition (Partition by)
- Liquid clustering (Cluster by)
- Run Deltas OPTIMIZE command on some schedules
- Run OPTIMIZE ... ZORDER BY to group data by frequently queried columns
Best practices for Microsoft Fabric
The following best practices can improve your load performance for a Microsoft Fabric target technology:
- Use partition filtered load:
- On source data in a data lake
- Cache hashing results on tables with many columns
- Partition (Partition by)
- Liquid clustering (Cluster by)
- Run Deltas OPTIMIZE command on some schedules