Commit c2d8a89

Merge pull request #1044 from Savvythelegend/feature/add-dimensional-modelling-docs
feat: Add comprehensive guide on dimensional modelling concepts
2 parents 6eefd1b + c8ebf6b commit c2d8a89

2 files changed

+180
-0
lines changed

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
1+
---
2+
id: dimensional-modelling
3+
title: Dimensional Modelling in Data Warehousing
4+
sidebar_label: 🔄 Dimensional Modelling
5+
description: A comprehensive guide to dimensional modelling concepts, including fact tables, dimension tables, and schema types
6+
keywords: [sql, data warehouse, dimensional modelling, fact tables, dimension tables, star schema, snowflake schema]
7+
---
8+
9+
# Dimensional Modelling: Structuring Data for Analytics
10+
11+
## Introduction to Dimensional Modelling
12+
13+
Dimensional modelling is a design approach that structures data for analytical queries and reporting. Unlike transactional (OLTP) database design, which focuses on eliminating data redundancy through normalization, dimensional modelling prioritizes query performance and ease of understanding for business users.
14+
15+
Think of dimensional modelling as organizing a library. A normalized database would catalog books by ISBN and store author details separately to avoid duplication (like a library's internal system), whereas dimensional modelling would organize books by genre, author, and publication date (like the actual shelves in the library), making it easier for readers to find what they want.
16+
17+
The Kimball methodology, developed by Ralph Kimball, is one of the most widely adopted approaches to dimensional modelling. It emphasizes a bottom-up approach that starts with specific business processes and builds a cohesive data warehouse through conformed dimensions, that is, dimensions defined once and shared across fact tables.
18+
19+
## Core Components: Fact and Dimension Tables
20+
21+
### Dimension Tables (DIM Tables)
22+
23+
Dimension tables provide the context for your business measurements. They answer the who, what, where, when, and how of your data.
24+
25+
#### Structure
26+
```sql
-- SQL Server syntax (IDENTITY, BIT); adjust the types for other engines
CREATE TABLE Dim_Product (
    ProductKey INT IDENTITY(1,1) PRIMARY KEY, -- Surrogate key
    ProductID VARCHAR(50),                    -- Natural/business key
    ProductName VARCHAR(100),
    Category VARCHAR(50),
    Brand VARCHAR(50),
    Color VARCHAR(30),
    Size VARCHAR(20),
    UnitPrice DECIMAL(10,2),
    EffectiveDate DATE,                       -- When this version became active
    CurrentFlag BIT                           -- Is this the current version? (1 = yes, 0 = no)
);
```
40+
41+
Each dimension table uses a surrogate key (an artificially generated key) as its primary key. This approach:
42+
- Isolates the data warehouse from source system changes
43+
- Enables historical tracking through slowly changing dimensions
44+
- Provides consistent joining mechanisms across fact tables
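
To make the history-tracking point concrete, here is a minimal sketch of a Type 2 slowly changing dimension update against `Dim_Product`. It assumes SQL Server syntax and that `CurrentFlag` is stored as a 0/1 flag; the product ID and price are made-up values, and a real load process would usually handle many rows at once rather than one.

```sql
-- Sketch only: SQL Server syntax; 'P-1001' and 24.99 are made-up values.
DECLARE @ProductID VARCHAR(50) = 'P-1001';
DECLARE @NewPrice DECIMAL(10,2) = 24.99;
DECLARE @OldKey INT;

BEGIN TRANSACTION;

-- Find the version that is current today
SELECT @OldKey = ProductKey
FROM Dim_Product
WHERE ProductID = @ProductID AND CurrentFlag = 1;

-- Add a new version that copies the old attributes but carries the new price
INSERT INTO Dim_Product
    (ProductID, ProductName, Category, Brand, Color, Size,
     UnitPrice, EffectiveDate, CurrentFlag)
SELECT ProductID, ProductName, Category, Brand, Color, Size,
       @NewPrice, CAST(GETDATE() AS DATE), 1
FROM Dim_Product
WHERE ProductKey = @OldKey;

-- Retire the old version; fact rows that reference it keep their history
UPDATE Dim_Product
SET CurrentFlag = 0
WHERE ProductKey = @OldKey;

COMMIT TRANSACTION;
```

Because the new version gets a fresh surrogate key, existing fact rows still point at the attributes that were valid when the sale happened, which a natural key alone could not express.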
45+
46+
### Fact Tables (FACT Tables)
47+
48+
Fact tables contain the measurements or metrics of your business processes. They're the "verbs" to your dimension tables' "nouns."
49+
50+
#### Structure
51+
```sql
-- Dim_Date, Dim_Product, Dim_Store, and Dim_Customer must exist before this runs
CREATE TABLE Fact_Sales (
    SaleKey INT IDENTITY(1,1) PRIMARY KEY,
    TransactionID VARCHAR(50),  -- Degenerate dimension: source transaction number (type assumed)
    DateKey INT,                -- Foreign key to Dim_Date
    ProductKey INT,             -- Foreign key to Dim_Product
    StoreKey INT,               -- Foreign key to Dim_Store
    CustomerKey INT,            -- Foreign key to Dim_Customer
    QuantitySold INT,           -- Measure
    Revenue DECIMAL(10,2),      -- Measure
    Cost DECIMAL(10,2),         -- Measure
    FOREIGN KEY (DateKey) REFERENCES Dim_Date(DateKey),
    FOREIGN KEY (ProductKey) REFERENCES Dim_Product(ProductKey),
    FOREIGN KEY (StoreKey) REFERENCES Dim_Store(StoreKey),
    FOREIGN KEY (CustomerKey) REFERENCES Dim_Customer(CustomerKey)
);
```
67+
68+
Facts are typically:
69+
- Numerical
70+
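- Additive (can be summed across all dimensions), although some facts are only semi-additive (for example, account balances, which cannot be summed over time) or non-additive (for example, ratios)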
71+
- Generated when business events occur
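
As an illustration of additivity, the sketch below sums the additive measures from `Fact_Sales` by product category and derives a margin ratio from those sums. It uses only columns shown in the tables above; the alias names are illustrative.

```sql
-- Additive measures can be summed at any level; ratios must be derived from the sums.
SELECT
    p.Category,
    SUM(f.QuantitySold)      AS TotalQuantity,
    SUM(f.Revenue)           AS TotalRevenue,
    SUM(f.Revenue - f.Cost)  AS GrossProfit,
    -- NULLIF avoids a divide-by-zero error when a category has no revenue
    SUM(f.Revenue - f.Cost) / NULLIF(SUM(f.Revenue), 0) AS MarginRatio
FROM Fact_Sales f
JOIN Dim_Product p ON f.ProductKey = p.ProductKey
GROUP BY p.Category;
```

Storing a precomputed margin percentage per row and then summing or averaging it would give a misleading result, which is why the ratio is computed after aggregation.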
72+
73+
## Understanding Grain
74+
75+
Grain is the fundamental atomic level of detail represented in a fact table. It's the answer to the question: "What does a single row in my fact table represent?"
76+
77+
### Importance of Grain
78+
Defining the grain is the most critical design decision in dimensional modelling because it:
79+
- Determines which dimensions can be used
80+
- Affects the size and performance of your data warehouse
81+
- Influences the types of analysis possible
82+
83+
### Example Grains
84+
1. **Transaction Grain**
85+
```sql
-- One row per product per transaction
-- (assumes Fact_Sales carries the source TransactionID as a degenerate dimension)
SELECT TransactionID, ProductKey, QuantitySold, Revenue
FROM Fact_Sales;
```
90+
91+
2. **Daily Summary Grain**
92+
```sql
-- One row per product per day per store
SELECT DateKey, StoreKey, ProductKey,
       SUM(QuantitySold) AS DailyQuantity,
       SUM(Revenue) AS DailyRevenue
FROM Fact_Sales
GROUP BY DateKey, StoreKey, ProductKey;
```
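
Once a grain is declared, it can be checked. The following sketch looks for rows that violate a transaction grain of one row per product per transaction; it assumes `Fact_Sales` has the `TransactionID` column used in the transaction-grain example.

```sql
-- Any row returned here means the declared grain is not being honored.
SELECT
    TransactionID,
    ProductKey,
    COUNT(*) AS RowsAtGrain
FROM Fact_Sales
GROUP BY TransactionID, ProductKey
HAVING COUNT(*) > 1;
```

An empty result suggests the table matches its declared grain; a check like this is cheap enough to run after each load.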
100+
101+
## Schema Types
102+
103+
### Star Schema
104+
The star schema is the fundamental building block of dimensional modelling. It features:
105+
106+
- A central fact table
107+
- Surrounding dimension tables
108+
- Direct relationships: each dimension joins straight to the fact table, and dimensions are kept denormalized
109+
110+
```sql
-- Example star schema query: every dimension joins directly to Fact_Sales
-- (StoreName and Year are assumed column names in Dim_Store and Dim_Date)
SELECT
    s.StoreName,
    p.ProductName,
    d.Year,
    SUM(f.Revenue) AS TotalRevenue
FROM Fact_Sales f
JOIN Dim_Product p ON f.ProductKey = p.ProductKey
JOIN Dim_Store s ON f.StoreKey = s.StoreKey
JOIN Dim_Date d ON f.DateKey = d.DateKey
GROUP BY s.StoreName, p.ProductName, d.Year;
```
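
The fact table's `DateKey` column references a `Dim_Date` table that this guide never defines. A minimal sketch of what it might look like follows, assuming SQL Server syntax and an integer key in `yyyymmdd` form; every column other than `DateKey` is an assumption chosen for illustration.

```sql
-- Hypothetical date dimension; column names other than DateKey are assumptions.
CREATE TABLE Dim_Date (
    DateKey       INT PRIMARY KEY,       -- e.g. 20240315 for 2024-03-15
    FullDate      DATE NOT NULL,
    [Year]        SMALLINT NOT NULL,
    [Quarter]     TINYINT NOT NULL,
    [Month]       TINYINT NOT NULL,
    [MonthName]   VARCHAR(10) NOT NULL,
    DayOfWeekName VARCHAR(10) NOT NULL
);

-- Populate a single day; a real load would loop over or generate a full date range.
DECLARE @d DATE = '2024-03-15';

INSERT INTO Dim_Date
    (DateKey, FullDate, [Year], [Quarter], [Month], [MonthName], DayOfWeekName)
VALUES (
    CAST(CONVERT(CHAR(8), @d, 112) AS INT),  -- style 112 formats as yyyymmdd
    @d,
    YEAR(@d),
    DATEPART(QUARTER, @d),
    MONTH(@d),
    DATENAME(MONTH, @d),
    DATENAME(WEEKDAY, @d)
);
```

Precomputing attributes such as quarter and weekday name lets analysts group and filter by them with plain joins instead of repeating date functions in every query.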
123+
124+
### Snowflake Schema
125+
The snowflake schema normalizes dimension tables into multiple related tables. This reduces redundancy and saves some storage space, but queries need additional joins, which typically costs performance and makes them harder to read.
126+
127+
```sql
-- Example snowflake schema query
-- (here Dim_Product holds a SubCategoryKey instead of a denormalized Category column)
SELECT
    c.CategoryName,
    sc.SubCategoryName,
    p.ProductName,
    SUM(f.Revenue) AS TotalRevenue
FROM Fact_Sales f
JOIN Dim_Product p ON f.ProductKey = p.ProductKey
JOIN Dim_SubCategory sc ON p.SubCategoryKey = sc.SubCategoryKey
JOIN Dim_Category c ON sc.CategoryKey = c.CategoryKey
GROUP BY c.CategoryName, sc.SubCategoryName, p.ProductName;
```
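
If the underlying tables have to stay snowflaked, one common way to give analysts a star-like experience is to flatten the hierarchy behind a view. The sketch below assumes the snowflaked tables from the query above; the view name `vw_Dim_Product_Flat` is hypothetical.

```sql
-- Hypothetical view that presents the snowflaked product hierarchy as one flat dimension.
CREATE VIEW vw_Dim_Product_Flat AS
SELECT
    p.ProductKey,
    p.ProductName,
    sc.SubCategoryName,
    c.CategoryName
FROM Dim_Product p
JOIN Dim_SubCategory sc ON p.SubCategoryKey = sc.SubCategoryKey
JOIN Dim_Category c ON sc.CategoryKey = c.CategoryKey;
```

Queries can then join `Fact_Sales` to the view on `ProductKey` alone. The joins still run underneath, so this improves readability rather than performance.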
140+
141+
## Best Practices
142+
143+
1. **Choose the Right Grain**
144+
- Start with the finest grain that makes business sense
145+
- Document grain decisions clearly
146+
- Maintain consistent grain within each fact table
147+
148+
2. **Design for Performance**
149+
- Denormalize dimensions in star schemas
150+
- Create appropriate indexes
151+
- Partition large fact tables
152+
153+
3. **Maintain Data Quality**
154+
- Implement surrogate keys
155+
- Handle slowly changing dimensions appropriately
156+
- Establish clear update procedures
157+
158+
4. **Think About the Future**
159+
- Design for extensibility
160+
- Plan for changing business requirements
161+
- Document assumptions and decisions
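
As one possible reading of "create appropriate indexes", the sketch below indexes the foreign keys of `Fact_Sales` that the earlier queries join and group on. It assumes SQL Server syntax; the index names are hypothetical, and the right choice depends on the engine and the actual query workload.

```sql
-- Supports date-range filters and daily rollups; INCLUDE covers the summed measures.
CREATE NONCLUSTERED INDEX IX_Fact_Sales_DateKey
    ON Fact_Sales (DateKey)
    INCLUDE (QuantitySold, Revenue);

-- Supports joins from the fact table to Dim_Product.
CREATE NONCLUSTERED INDEX IX_Fact_Sales_ProductKey
    ON Fact_Sales (ProductKey);
```

Each extra index slows down loads, so it is worth adding them in response to measured query patterns rather than indexing every key up front.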
162+
163+
## Common Pitfalls to Avoid
164+
165+
1. **Mixing Different Grains** in a single fact table
166+
2. **Over-normalizing** dimension tables
167+
3. **Including redundant dimension columns** in fact tables
168+
4. **Neglecting** to handle slowly changing dimensions
169+
5. **Creating too many** bridge tables (the intermediate tables used to resolve many-to-many relationships)
170+
171+
## Summary
172+
173+
Dimensional modelling is a powerful approach for organizing data to support business intelligence and analytics. By following these principles and best practices, you can create a data warehouse that is:
174+
- Easy to understand
175+
- Fast to query
176+
- Flexible for future changes
177+
- Reliable for decision-making
178+
179+
Remember that the goal is to create a structure that makes sense to business users while maintaining good performance for analytical queries. Always start with the business requirements and work backwards to the technical implementation.

sidebars.ts

Lines changed: 1 addition & 0 deletions
@@ -167,6 +167,7 @@ const sidebars: SidebarsConfig = {
167167
"sql/SQL-Advance/sql-indexes",
168168
"sql/SQL-Advance/sql-advanced-analytics",
169169
"sql/SQL-Advance/sql-procedures-functions-triggers",
170+
"sql/SQL-Advance/dimensional-modelling",
170171
],
171172
},
172173
],
