
Commit 367441e

committed
SQL and documentation added.
1 parent bc3098b commit 367441e

9 files changed

+15663
-1
lines changed
139 KB
Binary file not shown.

README.md

Lines changed: 48 additions & 1 deletion
@@ -1 +1,48 @@
1-
# Data-Vault-Example-Northwind
1+
# Data-Vault-Example-Northwind
2+
3+
An example Data Vault data warehouse modelling Microsoft's Northwind sample database, built for SQL Server.
4+
5+
6+
### Purpose
7+
8+
The objective of this project is to develop an accessible example data warehouse illustrating how to model the well-known Northwind sample database using the Data Vault methodology.
9+
10+
I intend the repository to be of use mostly to new Data Vault practitioners, and to go some way towards joining a few of the dots on the somewhat steep learning curve that comes with Data Vault modelling. Personally, I found the 'getting the data in' part of Data Vault intuitive enough, but the 'getting the data out' part a bit more challenging, and I hope the content of this repository is helpful in that regard.
11+
12+
13+
### Setup and Documentation
14+
15+
1. Set up a SQL Server instance to hold the five component databases.
16+
2. Open SQL Server Management Studio and connect to your SQL Server instance.
17+
3. Run the following SQL scripts in this order to create the five databases.
18+
* SQL\DDL\Northwind\instnwnd.sql
19+
* SQL\DDL\Create_Database_Stage_Area.sql
20+
* SQL\DDL\Create_Database_Meta_Metrics_Error_Mart.sql
21+
* SQL\DDL\Create_Database_Data_Vault.sql
22+
* SQL\DDL\Create_Database_Information_Mart.sql
23+
4. Confirm setup is correct with an initial load. Run the following SQL script to execute the stored procedures for the Stage_Area and Data_Vault databases.
24+
* SQL\ETL\Load_Data_Vault.sql
25+
5. Additionally, check the contents of Meta_Metrics_Error_Mart.error.Error_Log for any unexpected errors.
26+
6. Basic documentation covering the data model and table mapping can be found in the Documentation directory.
27+
7. From here, follow your nose until it all makes sense. I recommend starting by altering some Northwind source data, rerunning the load, and following the changes through the layers into the Information_Mart views.
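
As a rough sketch of the check in step 5, something like the following could be run in SQL Server Management Studio once the initial load has finished. The log table's columns are not described in this README, so the queries deliberately avoid naming any of them.

```sql
-- Inspect the error log after running SQL\ETL\Load_Data_Vault.sql.
-- The column list is not documented here, so SELECT * is used on purpose.
SELECT TOP (100) *
FROM Meta_Metrics_Error_Mart.[error].[Error_Log];

-- A quick row count gives a baseline to compare against after later loads.
SELECT COUNT(*) AS Error_Row_Count
FROM Meta_Metrics_Error_Mart.[error].[Error_Log];
```

Noting the row count before altering any source data makes it easier to spot new entries after the load is rerun in step 7.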
28+
29+
30+
### Caveats and Notes
31+
32+
* First and foremost, if you're interested in working with Data Vault professionally, there's no replacement for proper training and expert advice. I recommend seeking out a Data Vault Alliance course as your first port of call.
33+
34+
* My implementation here most closely adheres to Dan Linstedt's Data Vault 2.0 standard, but also takes some inspiration from the work of others, primarily Hans Hultgren and Patrick Cuba.
35+
36+
* This implementation should not be considered 'textbook' or representative of a real production Data Vault data warehouse; it is intended as a simplified example for educational purposes. A full-scale production data warehouse would almost certainly integrate data from multiple source systems, use incremental loads, and employ parallelised ETL scheduling, to name just a few differences. My goal here was to have something that somebody can set up, load, and start getting information out of the mart within minutes, using just SQL Server.
37+
38+
* This project was coded by hand, so do not be surprised if there are occasional syntax inconsistencies. In a production implementation, I would consider a code automation tool to be a non-negotiable expense (WhereScape 3D/RED, Vaultspeed, dbtvault, etc.).
39+
40+
* My approach to Satellite types is simplistic, illustrating a few basic functions they commonly serve - link effectivity (including deletions and driving key relationships), business key deletions, and historisation of contextual attributes.
41+
42+
* The genuine business keys in Northwind are seldom uniquely indexed, so the surrogate IDs have been used where necessary.
43+
44+
* Information Mart objects are fully virtual, i.e. views. In addition to star schema fact, dimension, and bridge views, I have included 'replica' views, which, as the name suggests, exactly replicate all Northwind database source tables from their respective Data Vault objects.
45+
46+
* Information Mart views present 'current' data as it stood in the data warehouse at the load effective time given in the 'Value' field of the 'Information_Mart_Load_Effective_Datetime' (ID = 1) record in Meta_Metrics_Error_Mart.meta.Parameter. If this field is left blank, the current datetime is used. This functionality is enabled by the PIT tables built for each Hub.
47+
48+
* I haven't invested a great deal of time just yet in tweaking indexes for Information_Mart query performance.
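
To illustrate the load effective datetime parameter described in the notes above, here is a minimal sketch of pinning the Information Mart views to a point in time. It assumes the key column is literally named `ID`, that `Value` accepts a datetime written as an ISO 8601 string, and the date shown is purely an example; check the table definition in Create_Database_Meta_Metrics_Error_Mart.sql before running it.

```sql
-- Look at the current setting first (column names are assumptions).
SELECT *
FROM Meta_Metrics_Error_Mart.meta.Parameter
WHERE ID = 1;

-- Pin the Information Mart views to an example load effective time.
-- The datetime value and its string format are illustrative only.
UPDATE Meta_Metrics_Error_Mart.meta.Parameter
SET [Value] = '2020-01-01T00:00:00'
WHERE ID = 1;
```

Querying a replica view before and after the update should show the data as at the two different times. Whether 'blank' means NULL or an empty string is not stated here, so confirm that before resetting the value to return to the current datetime.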
