An example Data Vault data warehouse modelling Microsoft's Northwind sample database.
### Purpose
The objective of this project is to develop an easily accessible example data warehouse illustrating how to model the well-known Northwind sample database using the Data Vault methodology.
I intend the repository to mostly be of use to new Data Vault practitioners, providing a convenient option for hands-on experimentation. Hopefully it helps join a few dots on the somewhat steep learning curve that comes with Data Vault modelling.
### Setup
1. Set up a SQL Server instance to hold the five component databases.
2. Run the following SQL scripts in this order to create the five databases.
3. Confirm setup is correct with an initial load. Run the following SQL script to execute the stored procedures for the Stage_Area and Data_Vault databases.
   * SQL\ETL\Load_Data_Vault.sql
4. Additionally, check the contents of Meta_Metrics_Error_Mart.error.Error_Log for any unexpected errors (see the sketch below).
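A quick way to eyeball the log from SQL Server Management Studio is a plain select. Only the three-part table name below comes from this repository; after a clean load the result should contain no unexpected rows:

```sql
-- Post-load sanity check. The table name is from this README; its column
-- layout isn't documented here, so no filtering is assumed.
SELECT *
FROM Meta_Metrics_Error_Mart.error.Error_Log;
```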
### Documentation
Basic documentation covering the data model and table mapping can be found in the Documentation directory.
### Caveats and Notes
* My implementation here most closely adheres to Dan Linstedt's Data Vault 2.0 standard, but also takes some inspiration from the work of others, primarily Hans Hultgren and Patrick Cuba.
* This implementation should not be considered 'textbook' or representative of a real production Data Vault data warehouse; it is intended as a simplified example for educational purposes. A full-scale production data warehouse would almost certainly integrate data from multiple source systems, use incremental loads, and employ parallelised ETL scheduling, to name just a few differences. My goal here was to have something somebody can set up, load, and start getting information out of the information mart with just SQL Server in a matter of minutes.
* This project was coded by hand, so no doubt there are a few inconsistencies in coding conventions. In a professional Data Vault implementation, I would consider a code automation tool to be a non-negotiable expense (WhereScape 3D/RED, Vaultspeed, dbtvault, etc.).
* My approach to Satellite types is simplistic, illustrating a few basic functions they commonly serve: link effectivity (including deletions and driving key relationships), business key deletions, and historisation of contextual attributes.
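To make the link effectivity and driving key ideas concrete, the usual query pattern takes the latest effectivity satellite row per driving key and keeps only rows that have not been end-dated. The sketch below uses entirely hypothetical object and column names; a customer-employee link is plausible for Northwind, but these are not objects from this repository:

```sql
-- Hypothetical driving-key effectivity sketch; none of these names are
-- taken from this repository. The customer is assumed to be the driving
-- key, so each customer has at most one open link at a time.
WITH Latest_Per_Driver AS (
    SELECT
        Hub_Customer_Hash_Key,                  -- driving key
        Link_Customer_Employee_Hash_Key,
        End_Datetime,
        ROW_NUMBER() OVER (
            PARTITION BY Hub_Customer_Hash_Key  -- newest row per driving key
            ORDER BY Load_Datetime DESC
        ) AS Row_Num
    FROM Sat_Customer_Employee_Effectivity
)
SELECT Hub_Customer_Hash_Key, Link_Customer_Employee_Hash_Key
FROM Latest_Per_Driver
WHERE Row_Num = 1
  AND End_Datetime IS NULL;                     -- still-effective links only
```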
* Business keys in Northwind are seldom uniquely indexed, so the surrogate IDs have been used where necessary.
* Information Mart objects are fully virtual, i.e. views. In addition to star schema facts, dimensions, and bridges, I have included 'replica' views, which, as the name suggests, exactly replicate all Northwind database source tables from their respective Data Vault objects.
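For illustration, assuming the replica views sit in a schema named something like 'replica' (the actual schema and view names are defined in this project's scripts), EXCEPT gives a quick way to confirm a replica matches its source:

```sql
-- Both queries should return zero rows if the replica view exactly
-- mirrors its source. The 'replica' schema name is an assumption;
-- Northwind.dbo.Customers is a genuine Northwind table.
SELECT * FROM Northwind.dbo.Customers
EXCEPT
SELECT * FROM Information_Mart.replica.Customers;

SELECT * FROM Information_Mart.replica.Customers
EXCEPT
SELECT * FROM Northwind.dbo.Customers;
```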
* Information Mart views present 'current' data as it existed in the data warehouse as at the (load effective) datetime specified in the 'Value' field of the 'Information_Mart_Load_Effective_Datetime' (ID = 1) record in Meta_Metrics_Error_Mart.meta.Parameter. If this field is left blank, the current datetime is used. This functionality is enabled by PIT tables built for each Hub.
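For example, to view the warehouse as it stood at some earlier point, an update along these lines should work; the table, schema, ID, and column names come from this README, while the datetime literal format is an assumption:

```sql
-- Pin the Information Mart views to a historical point in time
-- (the datetime value here is an arbitrary example).
UPDATE Meta_Metrics_Error_Mart.meta.Parameter
SET [Value] = '2024-01-31 23:59:59'
WHERE ID = 1;  -- 'Information_Mart_Load_Effective_Datetime'

-- Blank the field again to revert to the current datetime.
UPDATE Meta_Metrics_Error_Mart.meta.Parameter
SET [Value] = NULL  -- or an empty string, depending on how 'blank' is read
WHERE ID = 1;
```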
* Remaining to-do items include fleshing out the Meta_Metrics_Error_Mart and tweaking indexing for Information_Mart query performance.