Deduplication Method (Dataflow)

Description

The Property named Deduplication Method is available when you edit a Dataflow.

It defines how the rows of a Model Object are deduplicated for a given Dataflow.

Format

The Deduplication Method is a List of values.

The possible values are:

  • None:
    • No deduplication applied to the rows of the Model Object
  • Distinct:
    • A deduplication method based on DISTINCT is applied
    • The rows are deduplicated on the values of all Terms
  • Partition:
    • A deduplication method based on ROW_NUMBER() is applied
    • The PARTITION BY depends on the Model Object Type:
      • Hub: the Business Key(s)
      • Satellite: the Foreign Keys of Relationships (which correspond to the Business Key(s) of the Hub)
    • The ORDER BY is, by default, the same as the PARTITION BY
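
The three methods can be illustrated with plain SQL. The sketch below runs through Python's sqlite3 module and is not the SQL the product generates. The table stg_customer and its columns (customer_id as the Business Key, name) are hypothetical. It also assumes that the Partition method keeps the row numbered 1 in each partition, and that the SQLite build supports window functions (version 3.25 or later).

```python
import sqlite3

# Hypothetical staging table for a Hub whose Business Key is customer_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (customer_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?)",
    [
        ("C1", "Alice"),
        ("C1", "Alice"),    # exact duplicate of the first row
        ("C1", "Alicia"),   # same Business Key, different value
        ("C2", "Bob"),
    ],
)

# None: rows are passed through unchanged.
rows_none = conn.execute(
    "SELECT customer_id, name FROM stg_customer"
).fetchall()

# Distinct: rows that are identical on every column collapse into one.
rows_distinct = conn.execute(
    "SELECT DISTINCT customer_id, name FROM stg_customer"
).fetchall()

# Partition: number the rows within each Business Key and keep row 1
# (assumption). With ORDER BY equal to PARTITION BY, every row in a
# partition ties, so which row is kept is not deterministic.
rows_partition = conn.execute(
    """
    SELECT customer_id, name
    FROM (
        SELECT customer_id, name,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY customer_id
               ) AS rn
        FROM stg_customer
    ) AS numbered
    WHERE rn = 1
    """
).fetchall()

print(len(rows_none), len(rows_distinct), len(rows_partition))  # 4 3 2
```

In this sketch, Distinct still leaves two rows for C1 because their name values differ, while Partition leaves exactly one row per Business Key.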

It is possible to define an ORDER BY other than the default one. To do so, set the Deduplication Partition Property.
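
The effect of a non-default ORDER BY can be sketched as follows. This is an illustration only: the table stg_customer, the column load_ts, and the choice to keep the row numbered 1 are assumptions, and the syntax expected by the Deduplication Partition Property is not shown here. The query is run through Python's sqlite3 module (SQLite 3.25 or later for window functions).

```python
import sqlite3

# Hypothetical staging table; load_ts is an assumed load timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_customer (customer_id TEXT, name TEXT, load_ts TEXT)"
)
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?, ?)",
    [
        ("C1", "Alice", "2024-01-01"),
        ("C1", "Alicia", "2024-03-01"),  # most recent row for C1
        ("C2", "Bob", "2024-02-01"),
    ],
)

# Ordering each partition by load_ts DESC makes row 1 the most recent
# row per Business Key, so the surviving row is deterministic.
latest_rows = conn.execute(
    """
    SELECT customer_id, name
    FROM (
        SELECT customer_id, name,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY load_ts DESC
               ) AS rn
        FROM stg_customer
    ) AS numbered
    WHERE rn = 1
    ORDER BY customer_id
    """
).fetchall()

print(latest_rows)  # [('C1', 'Alicia'), ('C2', 'Bob')]
```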


Example:

Default Value


The default value for all Model Objects is None.

Impact

The value of the Deduplication Method is used during the load of a Dataflow for a Model Object to deduplicate the source rows before the business rules and the modeled transformations are applied.