Model a BI Solution

There are three different options to define the architecture of a BI solution:

  • The Data Vault modeling (Dan Linstedt)
  • The Dimensional modeling (Star Schema / Ralph Kimball)
  • The Relational approach (Header and Version)

The choice is up to you: Each type of modeling has benefits and drawbacks that depend on your project's specific needs (the number of sources, the update frequency of your sources, etc.).

You can also use a specific architecture called Stage JDBC to load data from a JDBC source to a target file such as the Parquet format.

Data Vault Modeling

Invented by Dan Linstedt in the 1990s, the Data Vault modeling is a scalable Data Solution architecture.

Data Vault should store "all data, any time." Data should not be cleansed.

The Data Solution is based on the source data (the source system or an ODS based on the source system) and transforms the data in a Data Vault modeling (Hubs, Links, Satellites in the Silver Layer Group).

  • The business transformations are done in the Silver Layer Group only
  • Complete historization of all attributes
  • Can be multi-sourced
  • Ability to scale the model

You can base your Data Vault project on an ODS, such as the Dimensional or Relational approach, and construct a Data Mart on your Data Vault Modeling.

Please look at Model a Data Vault Solution for more details.

Dimensional Modeling

Over the past 20 years, Ralph Kimball's approach has proven effective for reporting. It optimizes data for user queries. It is also referred to as Star Schema.

This architecture can contain the following Project types:

Data Store or Operational Data - ODS

The needed data is stored for each source; no data transformation is done. The Data Solution is then based on the operational data stores.

Advantages of having a Data Store:

  • The ODS is loaded once per load frequency. The Data Solution will access the ODS and not the source systems, so there are no performance problems.
  • The ODS can be historized (technical tracking of the changes in the source) if the source does not provide data historization. It enables a more straightforward Data Solution modeling, avoiding historization in the Data Solution.
  • The ODS can monitor source deletions.
  • Recreating the Data Solution is possible anytime as the ODS has all the data.

An ODS may be unnecessary if a Data Hub or an equivalent exists.

Please look at Model a Data Store Solution for more details.

Dimensional Data Solution

The Data Solution is based on the source data (the source system or an ODS based on the source system) and transforms the data into a snowflake schema (core: facts and entities) and a star schema (mart: fact tables and dimensions).

  • The business transformations are done in the Silver Layer Group only.
  • If a source system is replaced over time, the old and the new source system / ODS will be maintained as the source.
  • It can be multi-sourced.
  • In biGENIUS-X, the Data Solution can be split into multiple projects for different purposes, for example
    • to minimize the effort by doing a biGENIUS project for standard dimensions (Date, Customer, Geography, etc.)
    • to keep a domain utterly separate from the rest (i.e., HR reporting)

Please look at Model a Dimensional Solution for more details.

Data Mart

A data mart is a structure/access pattern specific to data solution environments that retrieves client-facing data. The data mart is a data solution subset usually oriented to a particular line of business or team.

In biGENIUS, you can model one Data Mart project by business topic using the system configuration Data Mart. This is done in the Gold layer.

Please look at Model a Data Mart Solution for more details.

Relational Modeling

Soon available in biGENIUS-X!

The relational Data Solution approach differs from the star schema / Kimball approach in the following aspects:

  • It is a snowflake schema
  • The modeling is done in the third normal form
  • The design uses primarily entities and only rarely facts
  • Master data entities are split up into two tables:
    • Header table: Contains the business key and all SCD0 and SCD1 attributes
    • Version table (optional): Contains all SCD2 attributes. Historization is done in the CORE layer in the version tables of an entity.
  • Transactional data can be modeled as entities or facts depending on the loading requirements (Merge or Insert only or different fact load strategies)

The Data Solution is based on the source data (source system or ODS based on the source system) and transforms the data into a relational schema.

Stage JDBC

The Stage JDBC Modeling can be used to load data from a JDBC source to target files.

Please look at Model a Stage JDBC Solution for more details.

Stage File

The Stage File Modeling can be used to load data from a source files to target files or databases.

Please look at Model a Stage File Solution for more details.