A Model Object of type Same as Link can be created in the Silver Layer.
What is a Same as Link
In a Data Vault model, a Same-As Link Model Object is a one-level deep hierarchy that maps similar terms (synonyms) of business keys to a single master term or business key.
Use case example
We take customer data from two source systems. Some customers exist in both source systems but are loaded into the Customer Hub with different hash keys. We want to identify these different versions as one and the same customer.
In this example, we will create a mapping table between some customers in the sample database AdventureWorks. We assume that Persons with the same FirstName and LastName are the same person.
The following script creates the mapping of the customers' Account Numbers for the example:
/*
Master table with examples of duplicated customers when only FirstName and LastName are compared:
AccountNumber FirstName LastName
AW00016377 Jordan Adams
AW00020387 Jordan Adams
AW00026574 Jordan Allen
AW00019835 Jordan Allen
AW00015903 Abigail Barnes
AW00029215 Abigail Barnes
*/
--DROP TABLE [Sales].[DuplicatedCustomerConsolidation];
CREATE TABLE [Sales].[DuplicatedCustomerConsolidation](
[AccountNumberMaster] varchar(10),
[AccountNumberDuplicate] varchar(10)
);
INSERT INTO [Sales].[DuplicatedCustomerConsolidation] VALUES ('AW00016377','AW00020387');
INSERT INTO [Sales].[DuplicatedCustomerConsolidation] VALUES ('AW00026574','AW00019835');
INSERT INTO [Sales].[DuplicatedCustomerConsolidation] VALUES ('AW00015903','AW00029215');
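The duplicate candidates listed in the comment above could be found with a query along the following lines. This is only a sketch: it assumes the standard AdventureWorks OLTP schema, in which Sales.Customer.PersonID references Person.Person.BusinessEntityID, and it applies the example's simplifying rule that an identical FirstName and LastName means the same person. Which Account Number becomes the master is still a manual decision; the query only lists the candidates.

```sql
-- Sketch: list customers whose FirstName/LastName pair occurs more than once.
-- Assumes the standard AdventureWorks schema (Sales.Customer, Person.Person).
WITH NamedCustomers AS (
    SELECT
        c.AccountNumber,
        p.FirstName,
        p.LastName,
        -- Number of customers sharing the same first and last name
        COUNT(*) OVER (PARTITION BY p.FirstName, p.LastName) AS SameNameCount
    FROM Sales.Customer AS c
    INNER JOIN Person.Person AS p
        ON p.BusinessEntityID = c.PersonID
)
SELECT AccountNumber, FirstName, LastName
FROM NamedCustomers
WHERE SameNameCount > 1
ORDER BY LastName, FirstName, AccountNumber;
```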
Creation steps
There are two ways to create a Same as Link Model Object:
- Create a Model Object with the Wizard:
  - Recommended
  - Use the Wizard Create a Same as Link from a Stage or a Stage Composite, which contains the following steps:
    - Create Model Object
    - Create Dataflow
    - Create Relations: number = {null}
    - Map Foreign Key Terms
    - Add to View
- Create a Model Object manually:
  - Not recommended
  - Create a Model Object of type Same as Link in the Silver Layer
  - Create a Dataflow Set
  - Add a Model Object to the Dataflow Set: add the Stage or Stage Composite Model Object
  - Add a Relationship to the first corresponding Hub
  - Edit the Foreign Key Terms: set them as Business Key
  - Add a second Relationship to the same Hub
  - Edit the Role names (see Have several Relationships to the same Model Object) with:
    - First Relationship with the Hub: Customer_Master
    - Second Relationship with the same Hub: Customer_Duplicate
  - Map the Foreign Key Terms: select the target mode and manually map the Foreign Key Terms to the corresponding Source Terms:
    - FK for the Role Customer_Master with the Source Term containing the master value of the BK
    - FK for the Role Customer_Duplicate with the Source Term containing the duplicate value of the BK
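To illustrate what the two roles express, the following sketch resolves every customer to its master Account Number. It is not the code produced by the generator: it reads the example source table Sales.DuplicatedCustomerConsolidation directly as a stand-in for the loaded Same as Link, and the column alias ResolvedAccountNumber is a hypothetical name chosen for the illustration.

```sql
-- Sketch: resolve each customer to its master Account Number.
-- AccountNumberDuplicate plays the Customer_Duplicate role,
-- AccountNumberMaster plays the Customer_Master role.
SELECT
    c.AccountNumber,
    -- Customers without a mapping row are their own master
    COALESCE(m.AccountNumberMaster, c.AccountNumber) AS ResolvedAccountNumber
FROM Sales.Customer AS c
LEFT JOIN Sales.DuplicatedCustomerConsolidation AS m
    ON m.AccountNumberDuplicate = c.AccountNumber;
```

Because the hierarchy is only one level deep, a single join is enough: a master value is never itself mapped to another master.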
Properties
A Same as Link Model Object can be configured through the following Properties:
- Deduplication Method or Optimization Method (Only for Spark Generator)
- Deduplication Partition or Optimization Method Columns (Only for Spark Generator)
- File format (Only for Spark Generator)
Default Terms
A Same as Link Model Object will include the following Default Terms:
Business Rules
The following Business Rules are checked during the Model Object creation:
- Dataflows amount: min = {1}, max = {null}
- Dataflow Sets amount: min = {1}, max = {null}
- Dataflow Set Model Objects amount: min = {1}, max = {null}
- Relationships amount: min = {2}, max = {null}
- Terms amount: exclude Default Terms = {true}, exclude Business Keys = {true}, exclude Identities = {false}, exclude Foreign Keys = {false}, exclude unmapped Terms = {false}, min = {null}, max = {0}
- Supported Term Data Types: exclude Default Terms = {true}, exclude Business Keys = {false}, exclude Identities = {false}, exclude Foreign Keys = {false}, exclude unmapped Terms = {false}
- Business Key Terms amount: min = {1}, max = {null}
- Identity Terms amount: min = {1}, max = {1}
- Identity Terms not nullable
- Supported Implementation Types: supported Implementation Types = {Permanent}
- Supported Deduplication Methods: supported deduplication Methods = {None, Distinct, Partition}
- Supported Load Cachings: supported Load Cachings = {Hashing}