You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Phase 4 of RFC #1480. Separate epic from the verdict fix (Phases 1-3).
Problem
wiki_pages.id is currently a mutable text slug (e.g., "anthropic"). 11 tables reference it as a text FK. This causes:
Slower JOINs/indexing vs integer PKs
Fragile FK chains if a slug ever changes
Conflation of identity with display name
Target State
wiki_pages.id = integer — the numeric part of the EID (store 42, render E42)
wiki_pages.slug = text UNIQUE NOT NULL — replaces the old text PK
All 11 FK tables change page_id from text → integer
The entity_ids table already maintains numeric_id: integer PK ↔ slug: text UNIQUE mappings with entity_id_seq for allocation. This is the source of truth for the migration.
Current State Inventory
wiki_pages table (schema.ts:87-136)
id: text("id").primaryKey() — the slug
numericId: text("numeric_id") — sparsely populated, stores "E42" as text
claims.entity_id → entities.id (text) compared to wp.id
PR Strategy: 3 PRs, Deployed Sequentially
Why not one PR?
One PR would be ~2000+ lines touching 25+ files with a mix of "safe additive changes" and "irreversible schema swap." Reviewers can't meaningfully evaluate it. If any part is wrong, you can't revert just the broken part.
Why not more than 3?
You could split PR 1 into "SQL migration only" + "dual-write code," but that creates a window where new rows lack integer values (race condition — RT-3 below). Keeping the migration and dual-write together eliminates that gap.
SQL migration adds slug, integer_id, and all page_id_int columns with backfill. Code changes update all 12+ route files to write both old text and new integer columns on every INSERT/UPDATE.
Tests still use text IDs (they still work — old columns still exist)
Rollback: revert the PR, drop the new columns. System works exactly as before.
Size: Medium. SQL is mechanical. Dual-write in routes is repetitive but straightforward.
After merge: Run full sync, verify all integer columns populated (Phase C). This gap between PR 1 and PR 2 can be hours or days — no pressure.
PR 2: PK Swap + Code Finalize (Phases D+E) — The risky one
SQL migration does the atomic swap in one transaction (drop text columns, rename integer columns, rebuild all FKs + indexes). Code changes update all routes/raw SQL/API types to integer-only, fix cross-table JOINs, remove dual-write code.
Schema migration and code changes MUST deploy together — can't have old code against new schema.
Size: Large but reviewable — the dual-write scaffolding from PR 1 makes the diff clear (removing the old path, not adding a new one).
PR 3 (optional): Cleanup
VACUUM ANALYZE
Cosmetic renames (e.g., SyncPageSchema.id → slug)
Documentation updates
The Coordination Window
PR 1 merges → full sync → verify integer columns → PR 2 merges (schema + code together)
↑ safe gap, can be days ↑ ↑ point of no return ↑
Migration Strategy: Expand → Migrate → Contract
Phase A: Expand (Safe, Rollback = drop columns)
Migration 005X_add_slug_and_integer_id.sql:
Add slug and integer_id columns to wiki_pages
Backfill slug = id (current text PK) and integer_id from entity_ids
Add NOT NULL + UNIQUE constraints
Add page_id_int integer columns to all 11 FK tables + page_links
Backfill all page_id_int from entity_ids lookup
Verify zero NULLs (except nullable routed_to_page_id_int)
Scale: ~700 wiki_pages rows, FK tables have hundreds-to-low-thousands. All UPDATEs complete in milliseconds. No lock concerns.
Phase B: Dual-Write Code Deployment
All INSERT/UPDATE paths write BOTH old text and new integer columns.
Key change — POST /pages/sync must:
Accept slug-based input (unchanged API contract for callers)
Look up or auto-allocate integer ID via entity_ids
Write both id (slug) + integer_id and both text + integer FK columns
explore.ts:291,389: LEFT JOIN entities e ON wp.slug = e.id
links.ts:193,282: works as-is (page_links.source_id now integer = wp.id)
integrity.ts:430: WHERE c.entity_id = wp.slug
explore.ts specific:
Remove wp.numeric_id from SELECT (column gone)
recentlyCreated sort: CAST(SUBSTRING(wp.numeric_id FROM 2) AS INTEGER) → just wp.id
BASE_CONDITIONS filter numeric_id IS NOT NULL → remove (all pages have integer IDs)
Response mapping: format numericId from wp.id as E${r.id}
Phase F: Clean Up
Remove any remaining dual-write code
Update all test files (10+ files with hardcoded string page IDs)
VACUUM ANALYZE
Pre-Migration Checks (MUST ALL PASS)
-- 1. Pages without numeric IDs (must return 0)SELECTCOUNT(*) FROM wiki_pages wp
LEFT JOIN entity_ids ei ONei.slug=wp.idWHEREei.numeric_id IS NULL;
-- 2. entity_ids vs wiki_pages.numericId disagreementsSELECTwp.id, wp.numeric_id, ei.numeric_idFROM wiki_pages wp
JOIN entity_ids ei ONei.slug=wp.idWHEREwp.numeric_idIS NOT NULLANDwp.numeric_id!= CONCAT('E', ei.numeric_id);
-- 3. Orphaned FK rows (each FK table)SELECTCOUNT(*) FROM citation_quotes WHERE page_id NOT IN (SELECT id FROM wiki_pages);
-- ... repeat for all 11 FK tables-- 4. Orphaned page_linksSELECTCOUNT(*) FROM page_links WHERE source_id NOT IN (SELECT id FROM wiki_pages);
SELECTCOUNT(*) FROM page_links WHERE target_id NOT IN (SELECT id FROM wiki_pages);
-- 5. Sequence ahead of maxSELECT last_value FROM entity_id_seq;
SELECTMAX(numeric_id) FROM entity_ids;
Remedy for Check 1 failures — batch-allocate:
INSERT INTO entity_ids (numeric_id, slug)
SELECT nextval('entity_id_seq'), wp.idFROM wiki_pages wp
WHEREwp.id NOT IN (SELECT slug FROM entity_ids);
Red-Team Findings & Mitigations
CRITICAL
RT-5: explore.ts ↔ entities JOIN — LEFT JOIN entities e ON wp.id = e.id breaks because entities.id stays text while wp.id becomes integer. Fix: change to ON wp.slug = e.id.
RT-6: integrity.ts claims comparison — WHERE c.entity_id = wp.id compares text claims.entity_id (→ entities.id) against integer wp.id. Fix: change to WHERE c.entity_id = wp.slug.
HIGH
RT-2: Build system chicken-and-egg — Post-migration, sync receives slugs but must write integer IDs. New pages won't have entity_ids rows yet. Fix: sync endpoint auto-allocates via INSERT INTO entity_ids (numeric_id, slug) VALUES (nextval('entity_id_seq'), $slug).
RT-3: Race conditions during migration — If a sync runs between backfill (Phase A) and PK swap (Phase D), new rows have NULL integer columns. Fix: Phase B (dual-write) deploys BEFORE Phase D. All writes populate both columns.
RT-12: Rollback from Phase D — Not trivially reversible. Fix: deploy Phase D + E together; keep pre-written rollback SQL; test against a Phase D test database before production.
MEDIUM
RT-1: entity_ids table redundancy — After migration, entity_ids partially duplicates wiki_pages. Recommendation: keep entity_ids as allocation service (it serves entities without wiki pages; /api/ids/allocate pre-allocates IDs before pages exist).
RT-4: API contract break — Responses change from { id: "anthropic", numericId: "E42" } to { id: 42, slug: "anthropic" }. Mitigated by Hono RPC inferred types (compile-time errors), frontend uses build-time data not live API.
RT-7: page_links orphans — source_id/target_id have no FK, may reference deleted pages. Fix: pre-migration check identifies orphans; delete them (page_links is fully repopulated on every sync).
RT-8: Composite PKs — session_pages and resource_citations have composite PKs including page_id. Must drop and recreate with integer type in Phase D transaction.
RT-9: Expression indexes — citation_quotes UNIQUE(page_id, footnote) and claim_page_references UNIQUE(claim_id, page_id, COALESCE(footnote, -1)) must be explicitly rebuilt after column rename.
RT-11: Test files — 10+ test files hardcode id: "anthropic". All need updating in Phase F.
LOW
RT-10: SyncPageSchema.id stays z.string() (it's the slug from build system). Consider renaming to slug for clarity.
RT-13: citation_content has NO page_id column — URL-keyed, not affected.
RT-14: Frontend unchanged — already uses numericIdToSlug() from build-time database.json.
RT-15: Discord bot — no direct wiki_pages ID references found. Verify with team.
Open Questions
entity_ids fate: Keep as allocation service, or absorb into wiki_pages? (Recommendation: keep)
API response id type: Return id: 42 (integer) or id: "E42" (formatted string)? (Recommendation: id: 42 + slug: "anthropic" — the "E" prefix is display formatting)
page_links FK: Should we add FK constraints to page_links.source_id/target_id now?
SyncPageSchema rename: Rename id → slug in the sync input schema for clarity?
Phases 1-3 dependency: Does this plan need revision after Phases 1-3 land?
Verification Checklist
After PR 1 + full sync (Phases A-C):
SELECT COUNT(*) FROM wiki_pages WHERE slug IS NULL → 0
SELECT COUNT(*) FROM wiki_pages WHERE integer_id IS NULL → 0
Each FK table: page_id_int IS NULL AND page_id IS NOT NULL → 0
Full sync completes without errors
All tests pass
/api/integrity clean
After PR 2 (Phases D-F):
\d wiki_pages shows id integer PK, slug text UNIQUE NOT NULL
All FK constraints exist (check pg_constraint)
Row counts match pre-migration for all tables
pnpm build succeeds
pnpm test passes
Full sync works
API endpoints: search, explore, page detail, citations, edit-logs
Frontend renders /wiki/E42 correctly
New page creation + deletion works
/api/integrity clean
Estimated Effort: 3-5 days
Phase
Effort
Risk
Pre-checks + Phase A (SQL)
0.5 day
Low
Phase B (dual-write code)
1-2 days
Medium — touches 12+ route files
Phase C (verify)
0.5 day
Low
Phase D (PK swap SQL)
0.5 day
High — point of no return
Phase E (finalize code)
1-2 days
Medium — raw SQL fixes, test updates
Phase F (cleanup)
0.5 day
Low
Critical Files
apps/wiki-server/src/schema.ts — all table definitions
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Problem
wiki_pages.idis currently a mutable text slug (e.g.,"anthropic"). 11 tables reference it as a text FK. This causes:Target State
wiki_pages.id = integer— the numeric part of the EID (store42, renderE42)wiki_pages.slug = text UNIQUE NOT NULL— replaces the old text PKpage_idfromtext → integerThe
entity_idstable already maintainsnumeric_id: integer PK↔slug: text UNIQUEmappings withentity_id_seqfor allocation. This is the source of truth for the migration.Current State Inventory
wiki_pages table (
schema.ts:87-136)id: text("id").primaryKey()— the slugnumericId: text("numeric_id")— sparsely populated, stores"E42"as textentity_ids table (
schema.ts:23-30)numericId: integer("numeric_id").primaryKey()— authoritative:42slug: text("slug").notNull().unique()—"anthropic"11 FK tables (all
text page_id → wiki_pages.id)2 additional tables (NO FK constraint)
Cross-table JOINs that break (most dangerous)
explore.ts:291,389LEFT JOIN entities e ON wp.id = e.identities.idstays text; wp.id becomes integerlinks.ts:193,282LEFT JOIN wiki_pages wp ON wp.id = pl.source_idpage_links.source_idis textintegrity.ts:430WHERE c.entity_id = wp.idclaims.entity_id→entities.id(text) compared to wp.idPR Strategy: 3 PRs, Deployed Sequentially
Why not one PR?
One PR would be ~2000+ lines touching 25+ files with a mix of "safe additive changes" and "irreversible schema swap." Reviewers can't meaningfully evaluate it. If any part is wrong, you can't revert just the broken part.
Why not more than 3?
You could split PR 1 into "SQL migration only" + "dual-write code," but that creates a window where new rows lack integer values (race condition — RT-3 below). Keeping the migration and dual-write together eliminates that gap.
PR 1: Expand + Dual-Write (Phases A+B) — Safe, fully revertible
SQL migration adds
slug,integer_id, and allpage_id_intcolumns with backfill. Code changes update all 12+ route files to write both old text and new integer columns on every INSERT/UPDATE.After merge: Run full sync, verify all integer columns populated (Phase C). This gap between PR 1 and PR 2 can be hours or days — no pressure.
PR 2: PK Swap + Code Finalize (Phases D+E) — The risky one
SQL migration does the atomic swap in one transaction (drop text columns, rename integer columns, rebuild all FKs + indexes). Code changes update all routes/raw SQL/API types to integer-only, fix cross-table JOINs, remove dual-write code.
PR 3 (optional): Cleanup
SyncPageSchema.id → slug)The Coordination Window
Migration Strategy: Expand → Migrate → Contract
Phase A: Expand (Safe, Rollback = drop columns)
Migration
005X_add_slug_and_integer_id.sql:slugandinteger_idcolumns to wiki_pagesslug = id(current text PK) andinteger_idfrom entity_idspage_id_intinteger columns to all 11 FK tables + page_linkspage_id_intfrom entity_ids lookuprouted_to_page_id_int)Scale: ~700 wiki_pages rows, FK tables have hundreds-to-low-thousands. All UPDATEs complete in milliseconds. No lock concerns.
Phase B: Dual-Write Code Deployment
All INSERT/UPDATE paths write BOTH old text and new integer columns.
Key change — POST
/pages/syncmust:id(slug) +integer_idand both text + integer FK columnsFiles requiring dual-write logic (12 route files):
pages.ts,citations.ts,edit-logs.ts,claims.ts,hallucination-risk.ts,sessions.ts,auto-update-runs.ts,auto-update-news.ts,artifacts.ts,references.ts,resources.ts,links.tsPhase C: Verify Dual-Write
pnpm crux wiki-server sync/api/integritycleanPhase D: PK Swap (Single Transaction — Point of No Return)
Migration
005Y_swap_wiki_pages_pk.sql— one atomic transaction:page_idcolumns from all 11 FK tablessource_id/target_idfrom page_linksid(text) andnumeric_idfrom wiki_pagesinteger_id → id,page_id_int → page_id, etc.Phase E: Finalize Code (Deploy with Phase D)
schema.ts:
id: integer("id").primaryKey(),slug: text("slug").notNull().unique(), removenumericIdRoute parameter resolution (pages.ts GET
/:id):Cross-table JOIN fixes:
explore.ts:291,389:LEFT JOIN entities e ON wp.slug = e.idlinks.ts:193,282: works as-is (page_links.source_id now integer = wp.id)integrity.ts:430:WHERE c.entity_id = wp.slugexplore.ts specific:
wp.numeric_idfrom SELECT (column gone)recentlyCreatedsort:CAST(SUBSTRING(wp.numeric_id FROM 2) AS INTEGER)→ justwp.idBASE_CONDITIONSfilternumeric_id IS NOT NULL→ remove (all pages have integer IDs)numericIdfromwp.idasE${r.id}Phase F: Clean Up
Pre-Migration Checks (MUST ALL PASS)
Remedy for Check 1 failures — batch-allocate:
Red-Team Findings & Mitigations
CRITICAL
RT-5: explore.ts ↔ entities JOIN —
LEFT JOIN entities e ON wp.id = e.idbreaks becauseentities.idstays text whilewp.idbecomes integer. Fix: change toON wp.slug = e.id.RT-6: integrity.ts claims comparison —
WHERE c.entity_id = wp.idcompares textclaims.entity_id(→entities.id) against integerwp.id. Fix: change toWHERE c.entity_id = wp.slug.HIGH
RT-2: Build system chicken-and-egg — Post-migration, sync receives slugs but must write integer IDs. New pages won't have entity_ids rows yet. Fix: sync endpoint auto-allocates via
INSERT INTO entity_ids (numeric_id, slug) VALUES (nextval('entity_id_seq'), $slug).RT-3: Race conditions during migration — If a sync runs between backfill (Phase A) and PK swap (Phase D), new rows have NULL integer columns. Fix: Phase B (dual-write) deploys BEFORE Phase D. All writes populate both columns.
RT-12: Rollback from Phase D — Not trivially reversible. Fix: deploy Phase D + E together; keep pre-written rollback SQL; test against a Phase D test database before production.
MEDIUM
RT-1: entity_ids table redundancy — After migration, entity_ids partially duplicates wiki_pages. Recommendation: keep entity_ids as allocation service (it serves entities without wiki pages;
/api/ids/allocatepre-allocates IDs before pages exist).RT-4: API contract break — Responses change from
{ id: "anthropic", numericId: "E42" }to{ id: 42, slug: "anthropic" }. Mitigated by Hono RPC inferred types (compile-time errors), frontend uses build-time data not live API.RT-7: page_links orphans — source_id/target_id have no FK, may reference deleted pages. Fix: pre-migration check identifies orphans; delete them (page_links is fully repopulated on every sync).
RT-8: Composite PKs — session_pages and resource_citations have composite PKs including page_id. Must drop and recreate with integer type in Phase D transaction.
RT-9: Expression indexes —
citation_quotes UNIQUE(page_id, footnote)andclaim_page_references UNIQUE(claim_id, page_id, COALESCE(footnote, -1))must be explicitly rebuilt after column rename.RT-11: Test files — 10+ test files hardcode
id: "anthropic". All need updating in Phase F.LOW
RT-10: SyncPageSchema.id stays
z.string()(it's the slug from build system). Consider renaming toslugfor clarity.RT-13:
citation_contenthas NOpage_idcolumn — URL-keyed, not affected.RT-14: Frontend unchanged — already uses
numericIdToSlug()from build-timedatabase.json.RT-15: Discord bot — no direct wiki_pages ID references found. Verify with team.
Open Questions
idtype: Returnid: 42(integer) orid: "E42"(formatted string)? (Recommendation:id: 42+slug: "anthropic"— the "E" prefix is display formatting)id → slugin the sync input schema for clarity?Verification Checklist
After PR 1 + full sync (Phases A-C):
SELECT COUNT(*) FROM wiki_pages WHERE slug IS NULL→ 0SELECT COUNT(*) FROM wiki_pages WHERE integer_id IS NULL→ 0page_id_int IS NULL AND page_id IS NOT NULL→ 0/api/integritycleanAfter PR 2 (Phases D-F):
\d wiki_pagesshowsid integer PK,slug text UNIQUE NOT NULLpnpm buildsucceedspnpm testpasses/api/integritycleanEstimated Effort: 3-5 days
Critical Files
apps/wiki-server/src/schema.ts— all table definitionsapps/wiki-server/src/routes/pages.ts— sync, CRUD, searchapps/wiki-server/src/routes/explore.ts— raw SQL with entities JOINapps/wiki-server/src/routes/links.ts— raw SQL with page_links JOINapps/wiki-server/src/routes/integrity.ts— claims.entity_id = wp.id comparisonapps/wiki-server/src/routes/citations.ts— heaviest pageId usage (80+ references)apps/wiki-server/src/api-types.ts— PageIdSchema definitionapps/web/scripts/build-data.mjs— build pipeline, page ID constructioncrux/wiki-server/sync-pages.ts— sync pipelineapps/wiki-server/drizzle/— migration SQL filesBeta Was this translation helpful? Give feedback.
All reactions