Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
179 changes: 179 additions & 0 deletions docs/sql/SQL-Advance/dimensional-modelling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
---
id: dimensional-modelling
title: Dimensional Modelling in Data Warehousing
sidebar_label: 🔄 Dimensional Modelling
description: A comprehensive guide to dimensional modelling concepts, including fact tables, dimension tables, and schema types
keywords: [sql, data warehouse, dimensional modelling, fact tables, dimension tables, star schema, snowflake schema]
---

# Dimensional Modelling: Structuring Data for Analytics

## Introduction to Dimensional Modelling

Dimensional modelling is an architectural approach to structuring data that optimizes it for analytical queries and reporting. Unlike traditional database design that focuses on eliminating data redundancy through normalization, dimensional modelling prioritizes query performance and business user understanding.

Think of dimensional modelling as organizing a library: While a normalized database would catalog books by ISBN and store author details separately to avoid duplication (like a library's internal system), dimensional modelling would organize books by genre, author, and publication date (like the actual shelves in the library) - making it easier for readers to find what they want.

The Kimball methodology, developed by Ralph Kimball, has become the industry standard for dimensional modelling. It emphasizes a bottom-up approach that starts with specific business processes and builds a cohesive data warehouse through standardized dimensions.

## Core Components: Fact and Dimension Tables

### Dimension Tables (DIM Tables)

Dimension tables provide the context for your business measurements. They answer the who, what, where, when, and how of your data.

#### Structure
```sql
CREATE TABLE Dim_Product (
ProductKey INT IDENTITY(1,1), -- Surrogate Key
ProductID VARCHAR(50), -- Natural/Business Key
ProductName VARCHAR(100),
Category VARCHAR(50),
Brand VARCHAR(50),
Color VARCHAR(30),
Size VARCHAR(20),
UnitPrice DECIMAL(10,2),
EffectiveDate DATE, -- When this version became active
CurrentFlag BOOLEAN -- Is this the current version?
);
```

Each dimension table uses a surrogate key (an artificially generated key) as its primary key. This approach:
- Isolates the data warehouse from source system changes
- Enables historical tracking through slowly changing dimensions
- Provides consistent joining mechanisms across fact tables

### Fact Tables (FACT Tables)

Fact tables contain the measurements or metrics of your business processes. They're the "verbs" to your dimension tables' "nouns."

#### Structure
```sql
CREATE TABLE Fact_Sales (
SaleKey INT IDENTITY(1,1),
DateKey INT, -- Foreign key to Dim_Date
ProductKey INT, -- Foreign key to Dim_Product
StoreKey INT, -- Foreign key to Dim_Store
CustomerKey INT, -- Foreign key to Dim_Customer
QuantitySold INT, -- Measure
Revenue DECIMAL(10,2), -- Measure
Cost DECIMAL(10,2), -- Measure
FOREIGN KEY (DateKey) REFERENCES Dim_Date(DateKey),
FOREIGN KEY (ProductKey) REFERENCES Dim_Product(ProductKey),
FOREIGN KEY (StoreKey) REFERENCES Dim_Store(StoreKey),
FOREIGN KEY (CustomerKey) REFERENCES Dim_Customer(CustomerKey)
);
```

Facts are typically:
- Numerical
- Additive (can be summed across dimensions)
- Generated when business events occur

## Understanding Grain

Grain is the fundamental atomic level of detail represented in a fact table. It's the answer to the question: "What does a single row in my fact table represent?"

### Importance of Grain
Defining the grain is the most critical design decision in dimensional modelling because it:
- Determines which dimensions can be used
- Affects the size and performance of your data warehouse
- Influences the types of analysis possible

### Example Grains
1. **Transaction Grain**
```sql
-- One row per product per transaction
SELECT TransactionID, ProductKey, QuantitySold, Revenue
FROM Fact_Sales;
```

2. **Daily Summary Grain**
```sql
-- One row per product per day per store
SELECT DateKey, StoreKey, ProductKey,
SUM(QuantitySold) as DailyQuantity,
SUM(Revenue) as DailyRevenue
FROM Fact_Sales
GROUP BY DateKey, StoreKey, ProductKey;
```

## Schema Types

### Star Schema
The star schema is the fundamental building block of dimensional modelling. It features:

- A central fact table
- Surrounding dimension tables
- Direct relationships (no normalization of dimensions)

```sql
-- Example Star Schema Query
SELECT
d.DepartmentName,
p.ProductName,
t.Year,
SUM(f.Revenue) as TotalRevenue
FROM Fact_Sales f
JOIN Dim_Product p ON f.ProductKey = p.ProductKey
JOIN Dim_Department d ON f.DepartmentKey = d.DepartmentKey
JOIN Dim_Time t ON f.TimeKey = t.TimeKey
GROUP BY d.DepartmentName, p.ProductName, t.Year;
```

### Snowflake Schema
The snowflake schema normalizes dimension tables into multiple related tables. While it saves storage space, it typically sacrifices query performance due to additional joins.

```sql
-- Example Snowflake Schema Query
SELECT
c.CategoryName,
sc.SubCategoryName,
p.ProductName,
SUM(f.Revenue) as TotalRevenue
FROM Fact_Sales f
JOIN Dim_Product p ON f.ProductKey = p.ProductKey
JOIN Dim_SubCategory sc ON p.SubCategoryKey = sc.SubCategoryKey
JOIN Dim_Category c ON sc.CategoryKey = c.CategoryKey
GROUP BY c.CategoryName, sc.SubCategoryName, p.ProductName;
```

## Best Practices

1. **Choose the Right Grain**
- Start with the finest grain that makes business sense
- Document grain decisions clearly
- Maintain consistent grain within each fact table

2. **Design for Performance**
- Denormalize dimensions in star schemas
- Create appropriate indexes
- Partition large fact tables

3. **Maintain Data Quality**
- Implement surrogate keys
- Handle slowly changing dimensions appropriately
- Establish clear update procedures

4. **Think About the Future**
- Design for extensibility
- Plan for changing business requirements
- Document assumptions and decisions

## Common Pitfalls to Avoid

1. **Mixing Different Grains** in a single fact table
2. **Over-normalizing** dimension tables
3. **Including redundant dimension columns** in fact tables
4. **Neglecting** to handle slowly changing dimensions
5. **Creating too many** bridge tables

## Summary

Dimensional modelling is a powerful approach for organizing data to support business intelligence and analytics. By following these principles and best practices, you can create a data warehouse that is:
- Easy to understand
- Fast to query
- Flexible for future changes
- Reliable for decision-making

Remember that the goal is to create a structure that makes sense to business users while maintaining good performance for analytical queries. Always start with the business requirements and work backwards to the technical implementation.
1 change: 1 addition & 0 deletions sidebars.ts
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,7 @@ const sidebars: SidebarsConfig = {
"sql/SQL-Advance/sql-indexes",
"sql/SQL-Advance/sql-advanced-analytics",
"sql/SQL-Advance/sql-procedures-functions-triggers",
"sql/SQL-Advance/dimensional-modelling",
],
},
],
Expand Down
Loading