| 1 | +--- |
| 2 | +title: 'SQL Review Rule Explained - Require Primary Key' |
| 3 | +author: Adela |
| 4 | +updated_at: 2025/11/21 22:00 |
| 5 | +feature_image: /content/blog/sql-review-rule-explained-require-primary-key/banner.webp |
| 6 | +tags: Explanation |
| 7 | +description: Learn why requiring a primary key is important and how the "Require Primary Key" review rule protects your production database. |
| 8 | +--- |
| 9 | + |
| 10 | +A table without a primary key seems harmless until your system grows and problems appear. Missing primary keys commonly lead to **duplicate data, broken CDC pipelines, replication stalls, and inconsistent analytics**. |
| 11 | + |
| 12 | +Bytebase SQL Review includes the [Require Primary Key rule](https://docs.bytebase.com/sql-review/review-rules#require-primary-key): |
| 13 | + |
| 14 | +> Bytebase considers this rule to be violated if the SQL tries to create a no primary key table or drop the primary key. If the SQL drops all columns in the primary key, Bytebase also considers that this SQL drops the primary key. |
| 15 | + |
| 16 | +## Real Incidents Caused by Missing Primary Keys |
| 17 | + |
| 18 | +**PostgreSQL logical replication breaks** |
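| 19 | +Without a PK or another replica identity, Postgres rejects `UPDATE` and `DELETE` on a table that is published for logical replication, and CDC tools such as Debezium that rely on logical decoding are affected as well. |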
| 20 | +Reference: [TIL: Creating tables without primary keys CAN cause updates and deletes to fail in Postgres](https://abhinavomprakash.com/posts/replica-identities/) |
| 21 | + |
| 22 | +**Matomo production replication stalled** |
| 23 | +A single table without a primary key caused MySQL master–slave replication to stop. |
| 24 | +Reference: [Master–Slave Replication Stalls Because of Missing Primary Key](https://forum.matomo.org/t/master-slave-replication-stalls-because-of-missing-primary-key/36251) |
| 25 | + |
| 26 | +**GitLab reported schema inconsistencies** |
| 27 | +GitLab engineers found tables without primary keys leading to environment drift and maintenance issues. |
| 28 | +Reference: [Database schema missing many primary keys - breaks replication](https://gitlab.com/gitlab-org/gitlab-ce/-/issues/51964) |
| 29 | + |
| 30 | +## Why Missing Primary Keys Are Dangerous |
| 31 | + |
| 32 | +### **1. Duplicate rows slip in** |
| 33 | + |
| 34 | +Without a PK or another unique constraint, the database cannot enforce uniqueness. |
| 35 | +Accidental duplicates slip in, for example from retried inserts, and skew analytics and reports. |
| 36 | + |
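
One way to see whether duplicates have already crept in is to group by the columns that are supposed to identify a row. This is a minimal sketch: the `events` table and its `user_id`, `event_type`, and `created_at` columns are hypothetical stand-ins for your own business key.

```sql
-- Hypothetical table and columns: events(user_id, event_type, created_at).
-- Any group with more than one row is a duplicate on the intended business key.
SELECT user_id,
       event_type,
       created_at,
       COUNT(*) AS copies
FROM events
GROUP BY user_id, event_type, created_at
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

An empty result suggests the chosen columns are currently unique, though nothing stops duplicates from arriving later until a constraint is in place.
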
| 37 | +### **2. CDC systems can’t track row changes** |
| 38 | + |
| 39 | +CDC tools such as Debezium, which is typically deployed on Kafka Connect, need a stable row identity. |
| 40 | +Without a PK, they cannot key update and delete events to a specific row. |
| 41 | + |
| 42 | +### **3. Replication may stop or diverge** |
| 43 | + |
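| 44 | +MySQL row-based replication and PostgreSQL logical replication both use the primary key to locate the rows they change. |
| 45 | +Without one, a MySQL replica may scan the whole table for every row change and fall behind, and Postgres rejects updates and deletes on published tables. |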
| 46 | + |
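
On the PostgreSQL side, the replica identity setting in the system catalog shows which tables logical replication can track. The query below is a sketch that assumes PostgreSQL and the `public` schema. Adjust the schema name for your setup.

```sql
-- PostgreSQL only: list ordinary tables in schema "public" with their replica identity.
-- relreplident: 'd' = default (the primary key, if one exists), 'n' = nothing,
--               'f' = full row, 'i' = a specific unique index.
SELECT c.relname      AS table_name,
       c.relreplident AS replica_identity
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public'
ORDER BY c.relname;
```

A table that reports `d` but has no primary key has no usable replica identity, so it is the case to look for.
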
| 47 | +### **4. Upserts don’t work correctly** |
| 48 | + |
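| 49 | +`INSERT … ON CONFLICT` needs a primary key or unique constraint to detect the conflict, and `MERGE` and other upsert patterns need a column set that identifies exactly one row. |
| 50 | +Without one, the database can’t resolve conflicts reliably. |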
| 51 | + |
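
A short PostgreSQL illustration of why the key matters for upserts. The `page_views` table and its columns are hypothetical and exist only for this example.

```sql
-- Hypothetical table; PostgreSQL syntax.
CREATE TABLE page_views (
    page_id BIGINT PRIMARY KEY,
    views   BIGINT NOT NULL DEFAULT 0
);

-- The conflict target (page_id) must be backed by a primary key or unique index;
-- without one, PostgreSQL rejects the statement instead of upserting.
INSERT INTO page_views (page_id, views)
VALUES (42, 1)
ON CONFLICT (page_id)
DO UPDATE SET views = page_views.views + EXCLUDED.views;
```

Running the `INSERT` twice leaves a single row for `page_id = 42` with the counter incremented, rather than two rows.
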
| 52 | +### **5. Debugging becomes guesswork** |
| 53 | + |
| 54 | +Without a unique row identifier, you can’t target a specific record. |
| 55 | +Deleting, fixing, or investigating a single row becomes unsafe. |
| 56 | + |
| 57 | +## How to Fix Tables Without Primary Keys |
| 58 | + |
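
Before applying any fix, it helps to list the tables that are affected. The query below is a sketch using the standard `information_schema` views. The excluded schema names assume PostgreSQL, and the views only show tables the current user is allowed to see.

```sql
-- List base tables that have no PRIMARY KEY constraint.
-- Schema exclusions assume PostgreSQL; on MySQL, exclude its system schemas instead.
SELECT t.table_schema,
       t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema    = t.table_schema
      AND c.table_name      = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
  AND c.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name;
```
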
| 59 | +### **1. Add a surrogate primary key** |
| 60 | + |
| 61 | +```sql |
| 62 | +ALTER TABLE events |
| 63 | +ADD COLUMN id BIGSERIAL PRIMARY KEY; -- PostgreSQL syntax; existing rows receive generated ids |
| 64 | +``` |
| 65 | + |
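
If you are on PostgreSQL 10 or later, an identity column is a standards-based alternative to `BIGSERIAL`. This is a sketch of the same change on the `events` table, not a required form.

```sql
-- PostgreSQL 10+ alternative to BIGSERIAL: a standard identity column.
-- Existing rows are assigned ids as part of the ALTER TABLE.
ALTER TABLE events
ADD COLUMN id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY;

-- MySQL equivalent, shown as a comment because the syntax differs:
-- ALTER TABLE events ADD COLUMN id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY;
```

Either variant rewrites the table, so on large tables it is worth scheduling the change for a quiet period.
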
| 66 | +### **2. Use a natural composite key if appropriate** |
| 67 | + |
| 68 | +```sql |
| 69 | +ALTER TABLE order_items |
| 70 | +ADD PRIMARY KEY (order_id, line_number); -- fails if existing rows have duplicates or NULLs in these columns |
| 71 | +``` |
| 72 | + |
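
Adding a composite key fails while duplicate rows exist, so they have to be removed first. The statement below is a PostgreSQL-only sketch that uses the system column `ctid` to tell otherwise identical rows apart. Which copy to keep is an assumption here (the one with the lowest `ctid`), so run it in a transaction and check the result before committing.

```sql
-- PostgreSQL only: keep one row per (order_id, line_number) and delete the rest.
-- ctid is the row's physical location; it is used only to distinguish duplicates.
DELETE FROM order_items
WHERE ctid IN (
    SELECT row_loc
    FROM (
        SELECT ctid AS row_loc,
               ROW_NUMBER() OVER (
                   PARTITION BY order_id, line_number
                   ORDER BY ctid
               ) AS rn
        FROM order_items
    ) AS ranked
    WHERE rn > 1
);
```
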
| 73 | +### **3. Combine surrogate PK with a unique business key** |
| 74 | + |
| 75 | +```sql |
| 76 | +ALTER TABLE shipments |
| 77 | +ADD COLUMN id BIGSERIAL PRIMARY KEY, |
| 78 | +ADD CONSTRAINT shipments_unique UNIQUE (tracking_number); |
| 79 | +``` |