Optimize load performance
To optimize load performance, you can follow best practices provided in biGENIUS-X.
Common best practices
The following best practices can improve your load performance for any target technology:
- Use incremental load:
- When the source system provides an indicator that can be used to build a highwater mark on (date, timestamp, decimal, integer,...)
- On biGENIUS-X default columns (auto-increment, timestamps, ...)
- Use the implementation type Virtual when you you do not require a physical table
- Filtered loading:
- Limit the data loaded via partition predicates (e.g., WHERE date >= '2024-01-01').
Best practices for Microsoft SQL Server
The following best practices can improve your load performance for a Microsoft SQL Server target technology:
- Add an INDEX on tables where load performance is bad
- Disable non-cluster indexes during loading, then rebuild them
- Table partitioning: Segment tables by date ranges to speed up incremental loading
Best practices for Snowflake
The following best practices can improve your load performance for a Snowflake target technology:
- Set warehouse auto-suspend intervals thoughtfully to keep caches warm without incurring unnecessary costs
- Monitor for query queuing, which happens when all warehouse resources are busy. If queuing is frequent, consider increasing warehouse size or using multi-cluster warehouses to handle more concurrent queries
- Adjust warehouse size based on workload: larger warehouses process queries faster but cost more, so match size to your needs
- For very large tables with complex query patterns, you can define clustering keys to guide Snowflake in organizing micro-partitions around specific columns. This is only necessary if you notice query performance degradation and should be used judiciously, as clustering maintenance can be resource-intensive (use the alter table script for that as biGENIUS-X doesn't support it yet)
Best practices for Databricks
The following best practices can improve your load performance for a Databricks target technology:
- Use partition filtered load:
- On source data in a data lake
- Cache hashing results on tables with many columns
- Partition (Partition by)
- Liquid clustering (Cluster by)
- Run Deltas OPTIMIZE command on some schedules
- Run OPTIMIZE ... ZORDER BY to group data by frequently queried columns
Best practices for Microsoft Fabric
The following best practices can improve your load performance for a Microsoft Fabric target technology:
- Use partition filtered load:
- On source data in a data lake
- Cache hashing results on tables with many columns
- Partition (Partition by)
- Liquid clustering (Cluster by)
- Run Deltas OPTIMIZE command on some schedules