Data Vault Anti-pattern: Using Effectivity Satellites as SCD2

 

In Data Vault 2.0 Effectivity Satellites are artifacts that are exclusively used to Track the temporal relevance of a relationship based on a Driving Key. As such, they hang from a Link Table.

 

Effectivity Satellites are not same as the SCD2.

So what is an Effectivity Satellite?

For an Effectivity Satellite, a Driving Key needs to be defined. For e.g. let's say we have a Link for Opportunities and relations to Accounts. The Account on Opportunity can change overtime. A common scenario is that the Opportunity is Assigned to a Global Parent account (e.g. Seagate Technology), and may get re-assigned to the Account Subsidiary (e.g. Lyve Labs) and get re-assigned to the Global Parent (Seagate Technology) at a later date

This will be tracked in the LINK as following:

LINK_HKOPPORTUNITY_HKOPPORTUNITYACCOUNT_HKACCOUNTLOAD_DATE
8913b4fac89470111e184de02Lyve Labs2023-02-21 14:43:05
8cee5f97c8947011192b3503Seagate Technology2023-02-21 14:53:57

Now the 3rd step, i.e. 111 switching back to Seagate Technology will not be captured in this LINK as a new row in this LINK

An Effectivity SAT can be used to Track effectivity (temporal relevance) of the aforementioned Opportunity to Account relationships.

LINK_HKOPPORTUNITY_HKOPPORTUNITYACCOUNT_HKACCOUNTSTART_DATEEND_DATELOAD_DATE
8913b4fac89470111e184de02Lyve Labs2023-02-21 14:43:059999-12-31 0:00:002023-02-21 14:43:05
8cee5f97c8947011192b3503Seagate Technology2023-02-21 14:53:579999-12-31 0:00:002023-02-21 14:53:57
8913b4fac89470111e184de02Lyve Labs2023-02-21 14:43:052023-02-21 14:53:572023-02-21 14:53:57
8cee5f97c8947011192b3503Seagate Technology2023-02-21 14:53:572023-02-21 15:00:312023-02-21 15:00:31
8913b4fac89470111e184de02Lyve Labs2023-02-21 15:38:309999-12-31 0:00:002023-02-21 15:38:30
8cee5f97c8947011192b3503Seagate Technology2023-02-21 15:44:429999-12-31 0:00:002023-02-21 15:44:42
8913b4fac89470111e184de02Lyve Labs2023-02-21 15:38:302023-02-21 15:44:422023-02-21 15:44:42
  1. When the relationship is first recorded only 1 record is inserted (highlighted in green)
  2. When there is a change in a relationship it must be based on one of the participants of the relationship– the driver, we track its change vs the other keys in the relationship. To do that we end_date one record and insert a new "active record"; i.e. 2 records inserted. (highlighted in red)

 


Data Vault Anti-pattern: Load Dates that are anything other than time of loading the Staging

 

By using the Load Date we should be able to identify all the Data that was loaded into DV in that particular batch. If the Load Date is some else, for e.g. the Load Date from the ETL tool, the entire batch in DV can not identified using the Load Date.

Data Vault Anti-pattern: Using varchar to store the HashKeys and HashDiffs

 

Why would you do that? HashKeys and HashDiffs are binary generated using a hashing algorithm like MD5 or SHA-1. Just store them as binary and effectively halve your storage and double your I/O! No need to convert them to the Char to store them as VARCHAR.

Data Vault Anti-pattern: Creating Hubs for Dependent Children

 

Dependent Children should not have their own Hubs. They are not Business Concepts and as such should not be a Business Key by themselves. They only make sense when associated with a Business concept. 

One example is Line Items in a Order. The Line Item in Order by itself does not make sense. It needs to be associated with an Order. This makes a Line Item a Dependent Child i.e. it is not a Business Concept on its own.

One way to model the Dependent Child is to add them in the Link as following:

 

 

 

Data Vault Anti-pattern: Implementing Business Rules at the Infomart Level

 

While it is tempting to implement Business Rules at the Infomart Level, that is not where the Business Rules should reside. They should reside in Business Vault. This enables historisation of the Business Rules and introduces auditability. When the Business Rule changes, with historisation it is possible to go back in time and analyze the impact of the Business Rule change. Now if this Business Rule was implemented at the Infomart Level, there would be no history thus no auditability. As a best-practice, the Business Vault tables should be materialized physical TABLES instead of VIEWs.

Data Vault Anti-pattern: Using Historized Links to store Transactional data that does not change

 

Transactional Data that does not change e.g. sensor data, stock trades, call center call data log, medical test results, event logs etc. should reside in a Non-historized Link (NHL) aka. Transaction Link. There is no point in using a Historized Link to store data that can not change.  All of the attributes of the Transaction can be stored within the NHL. Here is an example:

 

Note that the PATIENT_HK, LOCATION_HK, COVID_TEST_ID, and LOAD_DATE form the Unique Key for the Non-historized Link. The descriptive attributes are stored in the Non-historized Link instead of a SAT hanging from the Link.

"Non-historised links are used when the data in the source should not be modified at any time"

 

Data Vault Anti-pattern: Excluding the Dependent Child in the generating the LINK Hashkey

If your LINKs have Dependent Childs, for e.g. Order Line Item as show below, it is crucial that the Order Line Item is included the LINK Hashkey.

 

If the Dependent Child is not included the generation of the LINK Hashkey, the JOINs to the LINK table will not be easy.