Data Vault Anti-pattern: Load Dates that are anything other than time of loading the Staging

 

By using the Load Date we should be able to identify all the Data that was loaded into DV in that particular batch. If the Load Date is some else, for e.g. the Load Date from the ETL tool, the entire batch in DV can not identified using the Load Date.

Data Vault Anti-pattern: Using varchar to store the HashKeys and HashDiffs

 

Why would you do that? HashKeys and HashDiffs are binary generated using a hashing algorithm like MD5 or SHA-1. Just store them as binary and effectively halve your storage and double your I/O! No need to convert them to the Char to store them as VARCHAR.

Data Vault Anti-pattern: Creating Hubs for Dependent Children

 

Dependent Children should not have their own Hubs. They are not Business Concepts and as such should not be a Business Key by themselves. They only make sense when associated with a Business concept. 

One example is Line Items in a Order. The Line Item in Order by itself does not make sense. It needs to be associated with an Order. This makes a Line Item a Dependent Child i.e. it is not a Business Concept on its own.

One way to model the Dependent Child is to add them in the Link as following:

 

 

 

Data Vault Anti-pattern: Implementing Business Rules at the Infomart Level

 

While it is tempting to implement Business Rules at the Infomart Level, that is not where the Business Rules should reside. They should reside in Business Vault. This enables historisation of the Business Rules and introduces auditability. When the Business Rule changes, with historisation it is possible to go back in time and analyze the impact of the Business Rule change. Now if this Business Rule was implemented at the Infomart Level, there would be no history thus no auditability. As a best-practice, the Business Vault tables should be materialized physical TABLES instead of VIEWs.

Data Vault Anti-pattern: Using Historized Links to store Transactional data that does not change

 

Transactional Data that does not change e.g. sensor data, stock trades, call center call data log, medical test results, event logs etc. should reside in a Non-historized Link (NHL) aka. Transaction Link. There is no point in using a Historized Link to store data that can not change.  All of the attributes of the Transaction can be stored within the NHL. Here is an example:

 

Note that the PATIENT_HK, LOCATION_HK, COVID_TEST_ID, and LOAD_DATE form the Unique Key for the Non-historized Link. The descriptive attributes are stored in the Non-historized Link instead of a SAT hanging from the Link.

"Non-historised links are used when the data in the source should not be modified at any time"

 

Data Vault Anti-pattern: Excluding the Dependent Child in the generating the LINK Hashkey

If your LINKs have Dependent Childs, for e.g. Order Line Item as show below, it is crucial that the Order Line Item is included the LINK Hashkey.

 

If the Dependent Child is not included the generation of the LINK Hashkey, the JOINs to the LINK table will not be easy.


 

Data Vault Anti-pattern: Having two or more records in a SAT for a Single Business Key with the same LOAD_DATE

A Satellite, by definition, should have only one record per Business Key per Load Date. The Business Key (or the hash of the Business Key) + LOAD_DATE is the unique key for the record. BK + LOAD_DATE is the Primary of the Satellite

The exception is a Multi-Active Satellite where a Sequence Number is added to indicate the each of the active records. Sequence Number becomes part of the PK of the Multi-active Satellite.

See section 5.4 of the Data Vault Data Modeling Specification v 2.0.2

Update 05/07/2022:
An exception to this rule is when a stable descriptive attribute from the source can be used as part of the Private Key for the SAT. For e.g. the Modify Timestamp from the source can be combined with the HASHKEY and the LOAD_DATE to generate the Private Key, which will be unique for all records in the SAT

HASHKEY + LOAD_DATE + MODIFY_TS