Commit e5cb05a

2 parents d591b35 + 83665a0 commit e5cb05a

5 files changed

Lines changed: 105 additions & 99 deletions
@@ -1,112 +1,118 @@
11
# Solution Pattern - Data Modelling - Presentation Layer
22

33
## Purpose
4+
<<<<<<< HEAD
45
This Implementation Pattern describes the data modelling conventions and architecture for the Presentation Layer.
56

67
## Structure
78
In principle, there are two mechanisms towards preparing information for consumption in the Presentation Layer (created in the Presentation Layer database):
89
* Direct view on top of the Integration Layer (virtual information mart). In the vast majority of scenarios this option (direct view / virtual) is preferred, as the subsequent layers in the (BI) architecture are typically MOLAP or in-memory.
10+
=======
11+
This Solution Pattern describes the data modelling conventions and objects used to expose and/or load dimensions in the Presentation Layer.
12+
13+
## Motivation
14+
15+
Dimensions in the Presentation Layer provide the context and descriptions against which metrics are viewed. They are the objects that are exposed to consumers of information in Business Intelligence environments. There are many defined ways (seven, to be exact) in which history can be represented in a Dimensional Model, but this pattern focuses only on Type 1 (current state) and Type 2 (changes over time).
16+
17+
Dimensions provide contexts for transactions (facts) which are captured in Fact tables. Dimension and Fact tables constitute the Dimensional Model.
18+
19+
## Applicability
20+
21+
This pattern is heavily geared towards Dimensional Modelling (Dimensional Model or Snowflake Model), but in principle it is applicable to Data Vault Point-in-Time (PIT) and other types of time-variant tables as well.
22+
23+
## Structure
24+
25+
In principle, there are two technical approaches / mechanisms to prepare information for consumption in the Presentation Layer (created in the Presentation Layer database):
26+
27+
* A direct view on top of the Integration Layer (virtual information mart). In the vast majority of scenarios this option is preferred, as the subsequent layers in the (BI) architecture are typically MOLAP or in-memory and therefore address performance concerns from an end-user's perspective.
28+
>>>>>>> 83665a0b59440a7c43985df8305b15a4bf404985
929
* Table / persistence / physical storage using a view to join and prepare the data in the format that matches the table (logic) and can be used to incrementally load the table.
1030

1131
In both cases these Presentation Layer objects will require one or more views to decouple the Business Intelligence (BI) and Data Warehouse (DWH) environments. These decoupling views are also intended to apply the history perspective at attribute level; e.g. how every attribute is displayed in time (e.g. Type1, Type2, Type 6).
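The role of the decoupling views can be sketched with a small, self-contained example. This is an illustration only, not the pattern's actual implementation: it uses SQLite through Python, the `ben_` / `pres_` name prefixes stand in for the 'ben' and 'pres' schemas, and the customer columns, sample rows and the renamed `EFFECTIVE_DATETIME` / `EXPIRY_DATETIME` columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical Type 2 dimension table (prefix 'ben_' stands in for the ben schema).
CREATE TABLE ben_DIM_CUSTOMER (
    DIM_CUSTOMER_SK              TEXT PRIMARY KEY,
    CUSTOMER_CODE                TEXT,
    CUSTOMER_NAME                TEXT,
    OMD_EFFECTIVE_DATETIME       TEXT,
    OMD_EXPIRY_DATETIME          TEXT,
    OMD_CURRENT_RECORD_INDICATOR TEXT
);

INSERT INTO ben_DIM_CUSTOMER VALUES
    ('a1', 'C001', 'Acme Ltd',   '2020-01-01', '2021-06-01', 'N'),
    ('a2', 'C001', 'Acme Group', '2021-06-01', '9999-12-31', 'Y'),
    ('b1', 'C002', 'Globex',     '2020-03-01', '9999-12-31', 'Y');

-- Decoupling view, current state (Type 1 perspective).
CREATE VIEW pres_DIM_CUSTOMER AS
SELECT DIM_CUSTOMER_SK, CUSTOMER_CODE, CUSTOMER_NAME
FROM ben_DIM_CUSTOMER
WHERE OMD_CURRENT_RECORD_INDICATOR = 'Y';

-- Decoupling view, full history (Type 2 perspective).
CREATE VIEW pres_DIM_CUSTOMER_HISTORY AS
SELECT DIM_CUSTOMER_SK, CUSTOMER_CODE, CUSTOMER_NAME,
       OMD_EFFECTIVE_DATETIME AS EFFECTIVE_DATETIME,
       OMD_EXPIRY_DATETIME    AS EXPIRY_DATETIME
FROM ben_DIM_CUSTOMER;
""")

current_rows = conn.execute(
    "SELECT CUSTOMER_CODE, CUSTOMER_NAME FROM pres_DIM_CUSTOMER "
    "ORDER BY CUSTOMER_CODE"
).fetchall()
history_rows = conn.execute(
    "SELECT CUSTOMER_CODE, CUSTOMER_NAME FROM pres_DIM_CUSTOMER_HISTORY "
    "ORDER BY CUSTOMER_CODE, EFFECTIVE_DATETIME"
).fetchall()

print(current_rows)
print(history_rows)
```

Both views read from the same underlying structure, so the BI side can switch between a current-state and a historical perspective without any change to the table or its load process.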
12-
The following are general guidelines:
13-
The logic views and tables contain all history (Type2) by default. Any interpretation of history, such as ‘current state view’, can be queried using the decoupling views.
14-
There may be multiple decoupling views, but two views is the guideline, to represent history using different perspectives (e.g. current state / Type1 or mixed / Type2).
15-
The decoupling views are faced towards the BI / business side and aims to present information in the way it is easiest to consume.
16-
The decoupling views should be generated from metadata, and use Extended Properties defined at the view logic view to generate an accurate representation of history.
17-
The logic views used to populate tables are geared towards ETL and follows the rigor of naming conventions to support automation. BIML scripts are available to generate SSIS packages from the logic view to the Presentation Layer tables.
18-
The pres schema is the enterprise information / mart schema and contains the decoupling views, so this contains what is effectively the complete dimensional model exposed to the BI environment. This also allows the decoupling views to have the same name as the accompanying Dimension or Fact table.
19-
The ben schema contains the logic views and tables since objects (views and tables in this case) cannot be named identically (as they would be in the pres schema). The views are named with the ‘_VW’ suffix.
20-
Normal casing is used, with underscores (no spaces) for all tables and attributes.
21-
Definitions are maintained in the Confluence Business Glossary.
22-
Logic views are primarily manually developed (with some history merge scripts to assist) as these views handle the change from data handling to business use.
23-
Tables and decoupling views are generated from metadata.
24-
Decoupling views are used to expose history using additional metadata (‘extended properties’).
25-
The extract schema is used for data provision to support external systems (e.g. non-BI) and is therefore considered not to be part of the standard Presentation Layer.
26-
There is also a va schema which is specifically there to expose information to SAS Visual Analytics.
27-
There also is a temp schema which is strictly only used to store ETL required information / to support the performance and workings of the ETL.
32+
33+
The following general guidelines have been defined for Presentation Layer development:
34+
35+
* The **decoupling** **views** are facing the BI / business side and aim to present information in the way it is easiest to consume.
36+
* The **logic** **views** used to populate tables are geared towards ETL support, and follow the rigor of naming conventions to support automation. BIML scripts are available to generate SSIS packages from the logic view to the Presentation Layer tables.
37+
* The 'pres' schema is the enterprise information / mart schema and contains the decoupling views. In other words this schema contains what is effectively the complete dimensional model exposed to the BI environment. This also allows the decoupling views to have the same name as the accompanying Dimension or Fact table.
38+
* The 'ben' schema contains the logic views and tables required for performance management. This dedicated schema to support the 'pres' schema is required because objects (views and tables in this case) cannot be named identically within a single schema. The views are named with the '_VW' suffix.
39+
* Normal casing is used, with underscores (no spaces) for all tables and attributes.
40+
* Definitions are maintained in the Business Glossary. Where possible an identifier (key / link) is used to refer to the (implementation of) business logic.
41+
* Logic views are primarily manually developed (with some history merge scripts to assist) as these views handle the change from data handling to business use.
42+
* Tables and decoupling views can be generated from metadata.
43+
* Decoupling views are used to expose history using additional metadata (history type at attribute level).
44+
* The 'extract' schema is used for data provision to support external systems (e.g. non-BI) and is therefore considered *not* to be part of the standard Presentation Layer.
45+
* There is also a 'va' schema which is specifically there to expose information to SAS Visual Analytics.
46+
* There also is a 'temp' schema which is strictly only used to store ETL required information / to support the performance and workings of the ETL.
47+
2848
This is displayed in the following diagram:
2949

30-
The modelling conventions for the Presentation Layer tables are outlined in the table below.
31-
Table Type
32-
Table name convention
33-
Mandatory attribute
34-
Comments
35-
Dimension
36-
ben.DIM_<name>
37-
table name>_SK
38-
OMD_INSERT_MODULE_INSTANCE_ID
39-
OMD_DELETED_RECORD_INDICATOR
40-
OMD_UPDATE_MODULE_INSTANCE_ID
41-
OMD_CHECKSUM_TYPE_1
42-
OMD_CHECKSUM_TYPE_2
43-
OMD_EFFECTIVE_DATETIME
44-
OMD_EXPIRY_DATETIME
45-
OMD_CURRENT_RECORD_INDICATOR
46-
<attributes>
47-
The first attribute (SK) is the primary key, and is a hash value (32 byte character)
48-
Optionally, a unique key / index is placed on the combination of level natural keys and the OMD_EFFECTIVE_DATETIME. This represents a unique point in time record. See the consequences section for more details
49-
Every attribute is specified as Type 0, Type 1, Type 2 (can be combined to type 3 or 6 - check the relevant pattern). This is specified in the model / database as an extended property
50-
Fact Table
51-
ben.FACT_<name>
52-
<table name>_SK
53-
<Dimension Keys>
54-
OMD_INSERT_MODULE_INSTANCE_ID
55-
OMD_INSERT_DATETIME
56-
OMD_INSERT_MODULE_INSTANCE_ID
57-
OMD_DELETED_RECORD_INDICATOR
58-
OMD_UPDATE_MODULE_INSTANCE_ID
59-
OMD_CHECKSUM_TYPE_1
60-
OMD_CHECKSUM_TYPE_2
61-
OMD_EFFECTIVE_DATETIME
62-
OMD_EXPIRY_DATETIME
63-
OMD_CURRENT_RECORD_INDICATOR
64-
<attributes>
65-
The first attribute (SK) is the primary key
66-
A unique key / index is placed on the combination of Dimension keys.
67-
Other
68-
ben.<name>
69-
<table name>_SK
70-
OMD_INSERT_MODULE_INSTANCE_ID
71-
OMD_INSERT_DATETIME
72-
<any OMD attributes required>
73-
<attributes>
74-
Not every delivery of information is necessarily in the form of a Star Schema / Dimensional Model. If a dataset is better delivered in a different format (wide table, normalised) this is preferred.
75-
76-
The modelling conventions for the Presentation Layer views are outlined in the table below.
77-
View Type
78-
Table name convention
79-
Mandatory attribute
80-
Comments
81-
Logic View
82-
ben.<name>_VW
83-
<view name>_SK
84-
OMD_INSERT_MODULE_INSTANCE_ID
85-
OMD_INSERT_DATETIME
86-
OMD_INSERT_MODULE_INSTANCE_ID
87-
OMD_DELETED_RECORD_INDICATOR
88-
OMD_UPDATE_MODULE_INSTANCE_ID
89-
OMD_CHECKSUM_TYPE_1
90-
OMD_CHECKSUM_TYPE_2
91-
OMD_EFFECTIVE_DATETIME
92-
OMD_EXPIRY_DATETIME
93-
OMD_CURRENT_RECORD_INDICATOR
94-
<attributes>
95-
Used to load a standard Dimension or Fact table supported by BIML.
96-
The name of the view needs to match the name of the target table (except for the _VW suffix)
97-
The _VW suffix is required as there may be a table with the original name in the ben schema
98-
The checksums for Type 1 and Type 2 calculations will be handled by the BIML, and do not need to be present in the views. This allows for a more automated update if required
99-
All other OMD attributes required in the target table are handled by the BIML scripts
100-
Decoupling View
101-
pres.<name> (for regular views)
102-
pres.<name>_history (for history or mixed-history views)
103-
Underlying ‘ben’ table or logic view, but without OMD attributes.
104-
Surrogate keys optional.
105-
Business-facing, e.g. DIM_CUSTOMER, or DIM_CUSTOMER_HISTORY.
106-
107-
Related Design Patterns
108-
Design Pattern 002 - Generic - Types of History
109-
Consequences
110-
Related to having Data Vault Surrogate (Hub) Keys (SK) in the Dimensional Model: it is OK to add Hub keys (Surrogate Keys) in the Presentation Layer for tracing and auditability purposes. However they cannot be adequately used as level keys as a level in a Dimension may not 100% map a business concept. For instance a 'business unit type' may not be modelled as a Hub in the Data Vault, but could be a level in a Dimension. By using Hub Keys for Dimension lookups a dependancy between the Integration and Presentation Layers is created that should be avoided. An example is where you have Business Unit Type, State, Counter and Ownership in the same Satellite (e.g. SAT_BUSINESS_UNIT). If these attributes are modelled in separate Dimensions in the Presentation Layer the Hub Key (from HUB_BUSINESS_UNIT) cannot be used, rather a separate Dimension Key must be created and a dedicated natural key must be selected appropriate for the dimension. In other words, lookups and constraints should be using natural keys.
111-
Discussion items (not yet to be implemented or used until final)
112-
None.
50+
![](../../Data_Integration_Framework\9000_Images\Solution_Pattern_Presentation_Layer_01.png)
51+
52+
53+
54+
## Implementation Guidelines
55+
56+
Type 1 and Type 2 are developed as separate ETL workflows. This means the (resource-expensive) Type 2 logic only needs to be developed when it is genuinely required and, where needed, the Type 1 and Type 2 streams can be scheduled at different times. This helps to address the performance impact of Type 2 calculations.
57+
58+
For instance, Type 1 can run every day, while for the Type 2 equivalent it may be sufficient to run on weekends only.
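The difference in effort between the two streams can be illustrated with a minimal sketch. It is written in plain Python rather than as the BIML-generated SSIS logic, and the function names, the in-memory structures and the high-date value are assumptions made for the example.

```python
from datetime import datetime

# Assumed 'end of time' value for open-ended records.
HIGH_DATE = datetime(9999, 12, 31)


def apply_type1(rows, key, attrs):
    """Type 1: overwrite the record in place; no history is kept."""
    rows[key] = dict(attrs)


def apply_type2(history, key, attrs, load_dt):
    """Type 2: expire the current version and append a new one on change."""
    versions = history.setdefault(key, [])
    current = versions[-1] if versions else None
    if current is not None and current["attrs"] == attrs:
        return  # nothing changed, so no new version is needed
    if current is not None:
        current["expiry"] = load_dt
        current["is_current"] = False
    versions.append({
        "attrs": dict(attrs),
        "effective": load_dt,
        "expiry": HIGH_DATE,
        "is_current": True,
    })


type1 = {}
type2 = {}
loads = [
    (datetime(2024, 1, 1), "Acme Ltd"),
    (datetime(2024, 2, 1), "Acme Group"),
    (datetime(2024, 3, 1), "Acme Group"),  # unchanged delivery
]
for load_dt, name in loads:
    apply_type1(type1, "C001", {"CUSTOMER_NAME": name})
    apply_type2(type2, "C001", {"CUSTOMER_NAME": name}, load_dt)

print(type1["C001"])       # only the latest state
print(len(type2["C001"]))  # one version per detected change
```

The Type 1 stream is a simple overwrite, whereas the Type 2 stream has to compare, expire and insert versions, which is why it can make sense to run it less frequently.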
59+
60+
The difference between the ETL processes, views and table structures is organised by adopting naming conventions. Type 1 naming is 'normal', without any prefixes or suffixes. Consider the example below:
61+
62+
![](../../Data_Integration_Framework\9000_Images\Solution_Pattern_Presentation_Layer_02.png)
63+
64+
65+
66+
For Type 2 dimensions a naming convention is used to differentiate the objects ('_Pit').
67+
68+
![](../../Data_Integration_Framework\9000_Images\Solution_Pattern_Presentation_Layer_03.png)
69+
70+
71+
72+
The modelling conventions for the Presentation Layer **views** are outlined in the table below.
73+
74+
| **View type** | **Naming convention** | **Mandatory attribute(s)** | **Comment** |
75+
| ------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ |
76+
| **Logic View** | ben.<name>_VW | <view name>_SK<br />OMD_INSERT_MODULE_INSTANCE_ID<br />OMD_INSERT_DATETIME<br />OMD_INSERT_MODULE_INSTANCE_ID<br />OMD_DELETED_RECORD_INDICATOR<br />OMD_UPDATE_MODULE_INSTANCE_ID<br />OMD_CHECKSUM_TYPE_1<br />OMD_CHECKSUM_TYPE_2<br />OMD_EFFECTIVE_DATETIME<br />OMD_EXPIRY_DATETIME<br />OMD_CURRENT_RECORD_INDICATOR<br /><attributes> | Used to load a standard Dimension or Fact table supported by BIML.<br />The name of the view needs to match the name of the target table (except for the _VW suffix).<br /><br />The _VW suffix is required as there may be a table with the original name in the ben schema.<br /><br />The checksums for Type 1 and Type 2 calculations will be handled by the BIML, and do not need to be present in the views. <br />This allows for a more automated update if required. All other OMD attributes required in the target table are handled by the BIML scripts. |
77+
| **Decoupling View** | pres.<name> (for regular views) pres.<name>_history (for history or mixed-history views) | Underlying ‘ben’ table or logic view, but without OMD attributes. Surrogate keys optional. | Business-facing, e.g. DIM_CUSTOMER, or DIM_CUSTOMER_HISTORY. |
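Because the view names follow fixed rules, they can be derived mechanically. The helpers below are a hypothetical sketch of that derivation (the function names are not part of the pattern); they only encode the conventions from the table above: the `_VW` suffix in the 'ben' schema, and the `pres.<name>` / `pres.<name>_history` decoupling views.

```python
def target_table_for(logic_view):
    """Return the ben table loaded by a logic view, e.g. ben.DIM_X_VW -> ben.DIM_X."""
    schema, _, name = logic_view.partition(".")
    if schema != "ben" or not name.endswith("_VW") or name == "_VW":
        raise ValueError(f"not a valid logic view name: {logic_view!r}")
    return f"{schema}.{name[:-len('_VW')]}"


def decoupling_views_for(table):
    """Return the (regular, history) decoupling view names for a ben table."""
    schema, _, name = table.partition(".")
    if schema != "ben" or not name:
        raise ValueError(f"not a ben table name: {table!r}")
    return f"pres.{name}", f"pres.{name}_history"


table = target_table_for("ben.DIM_CUSTOMER_VW")
views = decoupling_views_for(table)
print(table)
print(views)
```

A check like this could be run as part of a deployment to catch logic views whose names would not match a target table.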
78+
79+
80+
81+
The modelling conventions for the Presentation Layer **tables** are outlined in the table below.
82+
83+
| Table type | Naming convention | Mandatory attribute(s) | Comments |
84+
| -------------- | ----------------- | ------------------------------------------------------------ | :----------------------------------------------------------- |
85+
| **Dimension** | ben.DIM_<name> | <table_name>_SK OMD_INSERT_MODULE_INSTANCE_ID OMD_DELETED_RECORD_INDICATOR OMD_UPDATE_MODULE_INSTANCE_ID OMD_CHECKSUM_TYPE_1 OMD_CHECKSUM_TYPE_2 OMD_EFFECTIVE_DATETIME OMD_EXPIRY_DATETIME OMD_CURRENT_RECORD_INDICATOR | The first attribute (SK) is the primary key, and is a hash value (32 byte character).<br /><br />Optionally, a unique key / index is placed on the combination of level natural keys and the OMD_EFFECTIVE_DATETIME.<br />This represents a unique point in time record. See the consequences section for more details. Every attribute is specified as Type 0, Type 1, Type 2 (can be combined to type 3 or 6 - check the [relevant pattern](file://aubriprfil06/display/BI/Design+Pattern+002+-+Generic+-+Types+of+History)). <br /><br />This is specified in the model / database as an extended property. |
86+
| **Fact Table** | ben.FACT_<name> | <table_name>_SK<br /><Dimension Keys> OMD_INSERT_MODULE_INSTANCE_ID OMD_INSERT_DATETIME OMD_INSERT_MODULE_INSTANCE_ID OMD_DELETED_RECORD_INDICATOR OMD_UPDATE_MODULE_INSTANCE_ID OMD_CHECKSUM_TYPE_1 OMD_CHECKSUM_TYPE_2 OMD_EFFECTIVE_DATETIME OMD_EXPIRY_DATETIME OMD_CURRENT_RECORD_INDICATOR | The first attribute (SK) is the primary key. A unique key / index is placed on the combination of Dimension keys. |
87+
| **Other** | ben.<name> | <table name>_SK OMD_INSERT_MODULE_INSTANCE_ID OMD_INSERT_DATETIME <br /><any OMD attributes required> <attributes> | Not every delivery of information is necessarily in the form of a Star Schema / Dimensional Model. <br /><br />If a dataset is better delivered in a different format (wide table, normalised) this is preferred. |
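The table above describes the surrogate key as a 32-character hash value and lists separate Type 1 and Type 2 checksums. The sketch below shows one way such values could be produced. It assumes MD5 (whose hexadecimal digest happens to be 32 characters), a `|` delimiter and simple trimming of values; the pattern itself does not prescribe any of these, and the column names are made up for the example.

```python
import hashlib


def hash_key(*parts, delimiter="|"):
    """Hash the delimited, trimmed parts into a 32-character hex string (MD5 assumed)."""
    normalised = delimiter.join("" if p is None else str(p).strip() for p in parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def checksum(row, columns):
    """Checksum over a chosen subset of attributes (e.g. all Type 1 or all Type 2)."""
    return hash_key(*(row[c] for c in columns))


# Hypothetical classification of attributes by history type.
TYPE_1_COLUMNS = ["CUSTOMER_NAME"]
TYPE_2_COLUMNS = ["SEGMENT"]

before = {"CUSTOMER_CODE": "C001", "CUSTOMER_NAME": "Acme", "SEGMENT": "Retail"}
after = {"CUSTOMER_CODE": "C001", "CUSTOMER_NAME": "Acme", "SEGMENT": "Wholesale"}

surrogate_key = hash_key(before["CUSTOMER_CODE"])
type1_changed = checksum(before, TYPE_1_COLUMNS) != checksum(after, TYPE_1_COLUMNS)
type2_changed = checksum(before, TYPE_2_COLUMNS) != checksum(after, TYPE_2_COLUMNS)

print(len(surrogate_key), type1_changed, type2_changed)
```

Keeping the two checksums separate means a change can be classified as Type 1 or Type 2 by comparing a single value per history type instead of comparing every attribute.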
88+
89+
90+
91+
92+
93+
## Considerations and Consequences
94+
95+
**Separation of Type 1 and Type 2 ETL processes**
96+
97+
This pattern separates the Type 1 and Type 2 streams completely for performance reasons. The idea is that in most cases there is no requirement for Type 2, so resources are wasted by having to calculate a full history if Type 1 is the only requirement.
98+
99+
The initial approach was to implement everything as Type 2 first. It seemed a good idea at the time, but it did not work out because of the performance trade-offs.
100+
101+
**Usage of surrogate keys**
102+
103+
Related to having Data Vault Surrogate (Hub) Keys (SK) in the Dimensional Model: it is possible to add Hub keys (Surrogate Keys) in the Presentation Layer for tracing and auditability purposes. However, they cannot adequately be used as level keys, as a level in a Dimension may not map 100% to a business concept.
104+
105+
For instance, a 'business unit type' may not be modelled as a Hub in the Data Vault, but could be required as a level in a Dimension.
106+
107+
By using Hub Keys for Dimension lookups a dependency between the Integration and Presentation Layers is created that should be avoided.
108+
109+
An example is where you have Business Unit Type, State, Counter and Ownership in the same Satellite (e.g. SAT_BUSINESS_UNIT). If these attributes are modelled in separate Dimensions in the Presentation Layer the Hub Key (from HUB_BUSINESS_UNIT) cannot be used, rather, a separate Dimension Key must be created and a dedicated natural key must be selected appropriate for the Dimension.
110+
111+
In other words, lookups and constraints should be using natural keys.
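The Business Unit example can be made concrete with a small sketch. The data, the column names and the use of an MD5 hash for the Dimension Key are assumptions for illustration; the point is only that the Dimension Key is derived from the natural key of the level, not from the Hub key.

```python
import hashlib


def dimension_key(natural_key):
    """Derive a Dimension Key from the level's natural key (MD5 assumed)."""
    return hashlib.md5(str(natural_key).strip().encode("utf-8")).hexdigest()


# Hypothetical content of SAT_BUSINESS_UNIT: one row per Hub key.
sat_business_unit = [
    {"BUSINESS_UNIT_SK": "h1", "BUSINESS_UNIT_TYPE": "Retail", "STATE": "QLD"},
    {"BUSINESS_UNIT_SK": "h2", "BUSINESS_UNIT_TYPE": "Retail", "STATE": "NSW"},
    {"BUSINESS_UNIT_SK": "h3", "BUSINESS_UNIT_TYPE": "Wholesale", "STATE": "QLD"},
]

# The Dimension has one member per distinct natural key, not per Hub key.
natural_keys = sorted({row["BUSINESS_UNIT_TYPE"] for row in sat_business_unit})
dim_business_unit_type = {nk: dimension_key(nk) for nk in natural_keys}

# A fact lookup resolves the Dimension Key through the natural key.
fact_rows = [
    {
        "BUSINESS_UNIT_SK": row["BUSINESS_UNIT_SK"],  # kept for tracing only
        "DIM_BUSINESS_UNIT_TYPE_SK": dim_business_unit_type[row["BUSINESS_UNIT_TYPE"]],
    }
    for row in sat_business_unit
]

hub_key_count = len({row["BUSINESS_UNIT_SK"] for row in sat_business_unit})
print(hub_key_count, len(dim_business_unit_type))
```

Three Hub keys map onto two Dimension members here, which is why the Hub key cannot serve as the level key for this Dimension.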
112+
113+
**Mixed history**
114+
115+
If the logic view and/or base table are Type 2, it is relatively easy to translate this into a mixed-history view. If, for example, half of the attributes are Type 2 and the other half are Type 1, this can be implemented in the decoupling view while leaving the underlying structures as full Type 2.
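A mixed-history decoupling view of this kind can be sketched as follows. The example uses SQLite through Python, with `ben_` / `pres_` name prefixes standing in for the schemas; the table content and the choice of `CUSTOMER_NAME` as Type 1 and `SEGMENT` as Type 2 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base table, stored as full Type 2.
CREATE TABLE ben_DIM_CUSTOMER (
    DIM_CUSTOMER_SK              TEXT PRIMARY KEY,
    CUSTOMER_CODE                TEXT,
    CUSTOMER_NAME                TEXT,  -- to be presented as Type 1
    SEGMENT                      TEXT,  -- to be presented as Type 2
    OMD_EFFECTIVE_DATETIME       TEXT,
    OMD_EXPIRY_DATETIME          TEXT,
    OMD_CURRENT_RECORD_INDICATOR TEXT
);

INSERT INTO ben_DIM_CUSTOMER VALUES
    ('a1', 'C001', 'Acme Ltd',   'Retail',    '2020-01-01', '2021-06-01', 'N'),
    ('a2', 'C001', 'Acme Group', 'Wholesale', '2021-06-01', '9999-12-31', 'Y');

-- Mixed history: Type 2 attributes come from each historical row (h),
-- Type 1 attributes come from the current row (c) of the same natural key.
CREATE VIEW pres_DIM_CUSTOMER_HISTORY AS
SELECT h.DIM_CUSTOMER_SK,
       h.CUSTOMER_CODE,
       c.CUSTOMER_NAME,
       h.SEGMENT,
       h.OMD_EFFECTIVE_DATETIME AS EFFECTIVE_DATETIME,
       h.OMD_EXPIRY_DATETIME    AS EXPIRY_DATETIME
FROM ben_DIM_CUSTOMER h
JOIN ben_DIM_CUSTOMER c
  ON c.CUSTOMER_CODE = h.CUSTOMER_CODE
 AND c.OMD_CURRENT_RECORD_INDICATOR = 'Y';
""")

mixed_rows = conn.execute(
    "SELECT CUSTOMER_NAME, SEGMENT FROM pres_DIM_CUSTOMER_HISTORY "
    "ORDER BY EFFECTIVE_DATETIME"
).fetchall()
print(mixed_rows)
```

Every historical row shows the current customer name while the segment keeps its history. This sketch does not collapse consecutive rows that differ only in Type 1 attributes; a full implementation would need to condense those time periods.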
116+
117+
## Related Patterns
118+
* Design Pattern 002 - Generic - Types of History

9000_Images/Images.pptx

5.23 KB
Binary file not shown.
92.1 KB
68.1 KB
72 KB
