Skip to content

Db normalization guides #553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
9 changes: 8 additions & 1 deletion src/content/docs/guides/_map.json
Original file line number Diff line number Diff line change
Expand Up @@ -22,5 +22,12 @@
["mysql-local-setup", "Local setup of MySQL"],
["seeding-with-partially-exposed-tables", "Seeding Partially Exposed Tables with Foreign Key"],
["seeding-using-with-option", "Seeding using 'with' option"],
["full-text-search-with-generated-columns", "Full-text search with Generated Columns"]
["full-text-search-with-generated-columns", "Full-text search with Generated Columns"],
["first-normal-form", "First Normal Form (1NF)"],
["second-normal-form", "Second Normal Form (2NF)"],
["third-normal-form", "Third Normal Form (3NF)"],
["boyce-codd-normal-form", "Boyce-Codd Normal Form (BCNF)"],
["elementary-key-normal-form", "Elementary Key Normal Form (EKNF)"],
["fourth-normal-form", "Fourth Normal Form (4NF)"],
["fifth-normal-form", "Fifth Normal Form (5NF)"]
]
201 changes: 201 additions & 0 deletions src/content/docs/guides/boyce-codd-normal-form.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
---
title: Boyce-Codd Normal Form (BCNF)
slug: boyce-codd-normal-form
---

import Section from "@mdx/Section.astro";
import CodeTabs from '@mdx/CodeTabs.astro';
import CodeTab from '@mdx/CodeTab.astro';
import Prerequisites from "@mdx/Prerequisites.astro";

<Prerequisites>
- You should be familiar with [3NF](/docs/guides/third-normal-form)
- You should be familiar with [EKNF](/docs/guides/elementary-key-normal-form)
</Prerequisites>

**Boyce-Codd Normal Form (BCNF)** is a stricter form of normalization than **Third Normal Form**. While **3NF** focuses on the relationship between non-prime attributes and candidate keys, **BCNF** goes a step further by addressing potential redundancies even when **3NF** is achieved. The goal is to eliminate all redundancy that can be detected through functional dependencies.

In practice, you barely see tables in **3NF** that violate **BCNF**. However, it's important to understand the concept of **BCNF**.

## The BCNF Rule

To achieve **BCNF**, table should already be in **3NF** and must satisfy the following condition:

For every non-trivial functional dependency X -> Y, the determinant (X) must be a super key. In simpler terms, the left-hand side of any functional dependency must uniquely determine the entire row in the table.

## Example

We have a table `room_reservations` with the following schema:

<CodeTabs items={["schema.ts", "schema.sql", "schema.txt"]}>
<CodeTab>
```ts
import { integer, pgTable, primaryKey, unique, varchar } from "drizzle-orm/pg-core";

export const roomReservations = pgTable("room_reservations", {
room: integer("room").notNull(),
startTime: varchar("start_time", { length: 5 }).notNull(),
endTime: varchar("end_time", { length: 5 }).notNull(),
rateType: varchar("rate_type", { length: 20 }).notNull(),
}, (t) => [
primaryKey({ columns: [t.room, t.startTime] }),
unique().on(t.room, t.endTime),
unique().on(t.rateType, t.startTime),
unique().on(t.rateType, t.endTime),
]);
```
</CodeTab>
<CodeTab>
```sql
CREATE TABLE "room_reservations" (
"room" integer NOT NULL,
"start_time" varchar(5) NOT NULL,
"end_time" varchar(5) NOT NULL,
"rate_type" varchar(20) NOT NULL,
CONSTRAINT "room_reservations_room_start_time_pk" PRIMARY KEY("room","start_time"),
CONSTRAINT "room_reservations_room_end_time_unique" UNIQUE("room","end_time"),
CONSTRAINT "room_reservations_rate_type_start_time_unique" UNIQUE("rate_type","start_time"),
CONSTRAINT "room_reservations_rate_type_end_time_unique" UNIQUE("rate_type","end_time")
);
```
</CodeTab>
<CodeTab>
```plaintext
+------------------------------------------------+
| room_reservations |
+------------+------------+----------+-----------+
| room_number| start_time | end_time | rate_type |
+------------+------------+----------+-----------+
| 101 | 08:00 | 09:00 | SAVER |
| 101 | 09:30 | 10:30 | SAVER |
| 101 | 12:00 | 13:30 | STANDARD |
| 101 | 14:00 | 15:30 | STANDARD |
| 201 | 10:00 | 11:30 | DELUXE-B |
| 201 | 11:30 | 13:30 | DELUXE-B |
| 201 | 15:00 | 16:30 | DELUXE-A |
+------------+------------+----------+-----------+
```
</CodeTab>
</CodeTabs>

### Business rules

1. Each row in this table represents a hotel room reservation. For simplicity, assume the hotel has two rooms: `Room 101` (a standard room) and `Room 201` (a deluxe suite).
2. A booking is defined by the `room` and the time period (`start_time` to `end_time`) for which that room is reserved. No room can have overlapping bookings (only one reservation per room at a given time).
3. Additionally, each booking has a `rate_type`. There are four distinct rate types in this example, each implying a specific guest status (membership level):
- `SAVER` - for `Room 101` bookings made by loyalty members (member discount rate)
- `STANDARD` - for `Room 101` bookings made by non-members (regular rate)
- `DELUXE-A` - for `Room 201` bookings made by members (premium room, member rate)
- `DELUXE-B` - for `Room 201` bookings made by non-members (premium room, standard/non-member rate)

### Functional Dependencies

1. `room, start_time -> end_time, rate_type`
2. `room, end_time -> start_time, rate_type`
3. `rate_type, start_time -> room, end_time`
4. `rate_type, end_time -> room, start_time`
5. `rate_type -> room`. Each rate type is associated with exactly one room. From the business rules, we see that a given rate code implies a specific room (e.g. any `SAVER` rate booking is always for `Room 101`, and any `DELUXE-B` booking is always for `Room 201`). Formally, this means rate type functionally determines room.

### Candidate keys

The candidate keys for this table are:
1. `room, start_time`
2. `room, end_time`
3. `rate_type, start_time`
4. `rate_type, end_time`

Even though in the sample data each `start_time` and `end_time` happens to be unique across all bookings, we cannot treat `start_time` or `end_time` alone as a key because on another day two different rooms could have bookings that begin or end at the same time. That is why a combination of attributes is needed to uniquely identify a booking.

### 3NF Analysis

Since every attribute in this table is part of at least one candidate key, there are no non-prime attributes. This means no transitive dependencies exist, so the table is in **3NF**.

### BCNF Analysis

The relation is not in **BCNF** because of this functional dependency:

`rate_type -> room`

The left-hand side (`rate_type`) is not a super key. Both `rate_type` and `room` are prime attributes because they are part of candidate keys. This dependency violates the **BCNF** rule.

This leads to:
1. Information duplication.
2. Update anomalies. If the hotel changes which room a given rate code applies to, multiple rows would need updating.
3. Deletion anomalies. If a `DELUXE-A` booking is deleted, the information about which room that rate code applies to would be lost.

### BCNF Decomposition

We need to split the table to eliminate the problematic dependency. We create two tables:

<CodeTabs items={["schema.ts", "schema.sql", "schema.txt"]}>
<CodeTab>
```ts
import { boolean, foreignKey, integer, pgTable, primaryKey, unique, varchar } from "drizzle-orm/pg-core";

export const rateTypes = pgTable("rate_types", {
rateType: varchar("rate_type", { length: 20 }).notNull().primaryKey(),
room: integer("room").notNull(),
memberFlag: boolean("member_flag").notNull(),
}, (t) => [
unique().on(t.room, t.memberFlag),
]);

export const reservations = pgTable("reservations", {
room: integer("room").notNull(),
startTime: varchar("start_time", { length: 5 }).notNull(),
endTime: varchar("end_time", { length: 5 }).notNull(),
memberFlag: boolean("member_flag").notNull(),
}, (t) => [
primaryKey({ columns: [t.room, t.startTime] }),
foreignKey({
columns: [t.room, t.memberFlag],
foreignColumns: [rateTypes.room, rateTypes.memberFlag],
}).onDelete("cascade").onUpdate("cascade"),
unique().on(t.room, t.endTime),
]);
```
</CodeTab>
<CodeTab>
```sql
CREATE TABLE "rate_types" (
"rate_type" varchar(20) PRIMARY KEY NOT NULL,
"room" integer NOT NULL,
"member_flag" boolean NOT NULL,
CONSTRAINT "rate_types_room_member_flag_unique" UNIQUE("room","member_flag")
);
--> statement-breakpoint
CREATE TABLE "reservations" (
"room" integer NOT NULL,
"start_time" varchar(5) NOT NULL,
"end_time" varchar(5) NOT NULL,
"member_flag" boolean NOT NULL,
CONSTRAINT "reservations_room_start_time_pk" PRIMARY KEY("room","start_time"),
CONSTRAINT "reservations_room_end_time_unique" UNIQUE("room","end_time")
);
--> statement-breakpoint
ALTER TABLE "reservations" ADD CONSTRAINT "reservations_room_member_flag_rate_types_room_member_flag_fk" FOREIGN KEY ("room","member_flag") REFERENCES "public"."rate_types"("room","member_flag") ON DELETE cascade ON UPDATE cascade;
```
</CodeTab>
<CodeTab>
```plaintext
+-----------------------------------+ +---------------------------------------------+
| rate_types | | reservations |
+-------------+------+--------------+ +------+------------+----------+--------------+
| rate_type | room | member_flag | | room | start_time | end_time | member_flag |
+-------------+------+--------------+ +------+------------+----------+--------------+
| SAVER | 101 | true | | 101 | 08:00 | 09:00 | true |
| STANDARD | 101 | false | | 101 | 09:30 | 10:30 | true |
| DELUXE-A | 201 | true | | 101 | 12:00 | 13:30 | false |
| DELUXE-B | 201 | false | | 101 | 14:00 | 15:30 | false |
+-------------+------+--------------+ | 201 | 10:00 | 11:30 | false |
| 201 | 11:30 | 13:30 | false |
| 201 | 15:00 | 16:30 | true |
+------+------------+----------+--------------+
```
</CodeTab>
</CodeTabs>

The dependency `rate_type -> room` is fully enforced in this table (and is no longer a problem, because `rate_type` is a candidate key here). The table also has another candidate key `room, member_flag`, since each combination of room and membership status determines a unique rate type.
`rate_type` is no longer stored in `reservations` table, so the redundancy is gone. Instead, the combination of `room` and `member_flag` for a reservation can be used to lookup the `rate_type` from the `rate_types` table when needed.

Both tables are now in **BCNF**. For every functional dependency in each table, the left-hand side is a super key.
Loading