Skip to content

Commit 2b73c43

Browse files
round 49
1 parent 7361673 commit 2b73c43

9 files changed

+518
-147
lines changed

.tasks/DELEGATE_WORK_HANDOFF.md

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -68,6 +68,94 @@ Artifacts:
6868

6969
---
7070

71+
## Round 2025-12-20 (49) — Fix merge pk/rowid confusion + ALTER semantics decision (3 tasks)
72+
73+
**Tasks executed**
74+
- `.tasks/done/TASK-119-fix-realistic-sync-test-failures.md`
75+
- `.tasks/done/TASK-120-fix-realistic-offline-test-failures.md` (consolidated into TASK-119)
76+
- `.tasks/done/TASK-100-decide-alter-new-column-clock-semantics.md`
77+
78+
**Commits**
79+
- (pending commit)
80+
81+
**Environment**
82+
- OS: darwin (macOS ARM64)
83+
- Tooling: nix, zig (via nix), bash
84+
85+
**Commands run (exact)**
86+
```bash
87+
bash zig/harness/test-realistic-sync.sh
88+
bash zig/harness/test-realistic-offline.sh
89+
bash zig/harness/test-realistic-collab.sh
90+
bash zig/harness/test-backfill.sh
91+
bash zig/harness/test-fract-parity.sh
92+
```
93+
94+
**Outputs (paste)**
95+
96+
<details>
97+
<summary>TASK-119: test-realistic-sync.sh (all pass)</summary>
98+
99+
```text
100+
✓ All realistic sync scenarios PASSED
101+
```
102+
103+
**Root cause:** The cached merge functions in `zig/src/merge_insert.zig` were confusing two different ID concepts:
104+
- `pk` = auto-increment key in `__crsql_pks` table (used in clock table references)
105+
- `base_rowid` = actual rowid in the user's base table
106+
107+
Three functions were fixed:
108+
1. `rowExistsInBaseTableCached()` - now looks up `base_rowid` from pks table first
109+
2. `deleteFromBaseTableCached()` - now looks up `base_rowid` and marks tombstone
110+
3. `updateBaseTableColumn()` - now looks up `base_rowid` before UPDATE
111+
112+
</details>
113+
114+
<details>
115+
<summary>TASK-119: test-realistic-offline.sh (all pass)</summary>
116+
117+
```text
118+
✓ All offline-first scenarios PASSED
119+
```
120+
121+
Same root cause as sync test - pk/rowid confusion in cached merge functions.
122+
</details>
123+
124+
<details>
125+
<summary>TASK-100: ALTER semantics decision</summary>
126+
127+
**Decision: LAZY MATERIALIZE** — Zig should NOT backfill clock entries on ADD COLUMN
128+
129+
Rationale:
130+
- Clock entries represent **write events**, not schema changes
131+
- Schema migration is not a write; the column's initial value exists by virtue of the schema definition
132+
- Sync payload: O(0) extra records vs O(N) for eager backfill
133+
- Matches Rust/C oracle behavior exactly
134+
135+
Follow-up: TASK-101 will implement this by removing `backfillNewColumns()` from `crsqlCommitAlterFunc`.
136+
</details>
137+
138+
**Files created/modified:**
139+
- `zig/src/merge_insert.zig` — fixed pk vs base_rowid confusion in 3 cached functions
140+
- `.tasks/done/TASK-119-fix-realistic-sync-test-failures.md` — moved from active, completion notes added
141+
- `.tasks/done/TASK-120-fix-realistic-offline-test-failures.md` — moved from backlog (consolidated)
142+
- `.tasks/done/TASK-100-decide-alter-new-column-clock-semantics.md` — moved from backlog, decision documented
143+
- `research/zig-cr/92-gap-backlog.md` — updated TASK-100 status and decision summary
144+
145+
**Reproduction steps (clean checkout)**
146+
1. `git clone <repo> && cd cr-sqlite`
147+
2. `bash zig/harness/test-realistic-sync.sh` — verify all pass
148+
3. `bash zig/harness/test-realistic-offline.sh` — verify all pass
149+
4. `bash zig/harness/test-realistic-collab.sh` — verify all pass
150+
5. Review `.tasks/done/TASK-100-*.md` for ALTER semantics decision
151+
152+
**Known gaps / unverified claims**
153+
- TASK-101 (implement lazy semantics) remains in backlog
154+
- No coverage captured
155+
- CI integration not verified this round (local runs only)
156+
157+
---
158+
71159
## Round 2025-12-20 (48) — Merge atomicity verify + WAL concurrency tests + sqlite-cr policy (3 tasks)
72160

73161
**Tasks executed**

.tasks/active/TASK-119-fix-realistic-sync-test-failures.md

Lines changed: 0 additions & 68 deletions
This file was deleted.

.tasks/backlog/TASK-100-decide-alter-new-column-clock-semantics.md

Lines changed: 0 additions & 63 deletions
This file was deleted.
Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
# TASK-100: Decide ALTER TABLE new-column clock semantics (backfill vs lazy)
2+
3+
## Status
4+
- [ ] Planned
5+
- [ ] Assigned
6+
- [ ] In Progress
7+
- [ ] Blocked (reason: ...)
8+
- [x] Complete (2025-12-20)
9+
10+
## Priority
11+
high
12+
13+
## Assigned To
14+
(unassigned)
15+
16+
## Parent Docs / Cross-links
17+
- Oracle parity test (current evidence): `zig/harness/test-alter-parity.sh`
18+
- Zig alter implementation: `zig/src/schema_alter.zig`
19+
- Rust/C alter implementation (reference): `core/rs/core/src/alter.rs`
20+
- sqlite-cr wrapper (oracle runtime): `nix run github:subtleGradient/sqlite-cr -- ...`
21+
- Origin task: `.tasks/active/TASK-094-alter-table-history-preservation.md`
22+
- Gap backlog: `research/zig-cr/92-gap-backlog.md`
23+
24+
## Description
25+
TASK-094 surfaced an **ALTER TABLE semantic ambiguity**:
26+
27+
When a CRR table undergoes `ALTER TABLE ... ADD COLUMN` (nullable or with DEFAULT), should the system:
28+
29+
1) **Eager backfill** the `__crsql_clock` table with entries for the new column for *all existing rows* (so the new column becomes part of the per-row conflict history immediately), OR
30+
31+
2) **Lazy materialize** clock entries for the new column only when that column is explicitly written (so schema evolution does not fabricate per-row “writes”).
32+
33+
The current oracle (sqlite-cr / Rust/C) behavior observed in TASK-094’s test run:
34+
- `ADD COLUMN` does **not** create clock entries for the new column.
35+
- A later `UPDATE` to the new column creates the first clock entry with `col_version = 1`.
36+
37+
The current Zig behavior observed:
38+
- `ADD COLUMN` **does** backfill clock entries for all existing rows.
39+
- A later `UPDATE` increments `col_version` (because the row already has a clock entry).
40+
41+
This task is to decide which behavior is the intended contract for Zig (and by extension, what oracle parity tests should enforce).
42+
43+
## Files to Modify
44+
- `research/zig-cr/92-gap-backlog.md` (record the decided contract)
45+
- `zig/harness/test-alter-parity.sh` (make expectations match the decided contract)
46+
- `.tasks/active/TASK-094-alter-table-history-preservation.md` (update acceptance criteria wording)
47+
48+
## Acceptance Criteria
49+
- [x] A written decision exists in this task's Completion Notes: **Eager backfill** or **Lazy materialize**.
50+
- [x] Decision includes:
51+
- the intended meaning of `__crsql_clock` entries
52+
- impact on `crsql_changes` payload size and db_version evolution
53+
- expected behavior for `ADD COLUMN DEFAULT ...` on existing rows
54+
- [ ] `zig/harness/test-alter-parity.sh` assertions reflect the decision (no "failing-by-design"). → TASK-101
55+
- [x] `research/zig-cr/92-gap-backlog.md` updated to link to follow-up implementation task(s).
56+
57+
## Progress Log
58+
### 2025-12-20
59+
- Observed divergence via `zig/harness/test-alter-parity.sh`:
60+
- Rust/sqlite-cr: no clock rows for new column until UPDATE
61+
- Zig: backfills clock rows on ADD COLUMN
62+
- Analyzed Rust source: `core/rs/core/src/alter.rs` and `core/rs/core/src/backfill.rs`
63+
- Key finding: Rust `compact_post_alter` does NOT call `backfill_table` — it only compacts/deletes
64+
- Key finding: Rust `backfill_missing_columns` is only called during `crsql_as_crr`, NOT during `crsql_commit_alter`
65+
66+
## Completion Notes
67+
68+
### Decision: **LAZY MATERIALIZE** (match Rust/C oracle behavior)
69+
70+
### Recommendation
71+
72+
Zig should adopt **lazy materialize** semantics for `ADD COLUMN`, matching the Rust/C oracle:
73+
- `crsql_commit_alter` should NOT create clock entries for newly added columns
74+
- Clock entries should only be created when the column is explicitly written (INSERT or UPDATE)
75+
76+
### Justification
77+
78+
#### 1. Intended Meaning of `__crsql_clock` Entries
79+
80+
A clock entry represents a **write event** — a deliberate modification to a specific cell (row × column). The clock captures:
81+
- `col_version`: How many times this cell has been written
82+
- `db_version`: Which logical database version this write occurred at
83+
- `site_id`: Which node performed the write
84+
85+
**Schema changes are not writes.** Adding a column doesn't represent user intent to set a value; it's a structural change. The column's initial value (NULL or DEFAULT) exists by virtue of the schema definition, not because any user wrote that value.
86+
87+
Creating clock entries for schema changes would:
88+
- Conflate schema evolution with data modification
89+
- Create "phantom writes" that no user requested
90+
- Generate misleading conflict history (col_version=1 for values no one explicitly set)
91+
92+
#### 2. Impact on `crsql_changes` Payload Size and `db_version`
93+
94+
**Eager backfill (current Zig behavior):**
95+
- `ADD COLUMN` on a table with N rows creates N new clock entries
96+
- `crsql_changes` returns N extra change records (value=NULL/DEFAULT for each row)
97+
- `db_version` advances once (via `crsql_db_version()`)
98+
- Sync payload: O(N) extra records per schema migration
99+
- For a 1M row table, this means 1M extra change records per new column!
100+
101+
**Lazy materialize (Rust/C oracle behavior):**
102+
- `ADD COLUMN` creates 0 clock entries
103+
- `crsql_changes` returns nothing for the new column until explicit writes
104+
- `db_version` does not advance for schema-only changes
105+
- Sync payload: O(0) extra records for schema migration
106+
- Nodes receiving the same schema change apply it locally; no re-sync needed
107+
108+
The lazy approach aligns with the principle stated in Rust `backfill.rs:100-104`:
109+
> "We do not grab nextdbversion on migration. The idea is that other nodes will apply the same migration in the future so if they have already seen this node up to the current db version then the migration will place them into the correct state. No need to re-sync post migration."
110+
111+
#### 3. Expected Behavior for `ADD COLUMN DEFAULT ...`
112+
113+
**Scenario:** `ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active'`
114+
115+
- **Eager (Zig):** Creates N clock entries with `value='active'`, `col_version=1`
116+
- **Lazy (Rust/C):** Creates 0 clock entries
117+
118+
**Why lazy is correct:**
119+
- Every node running the same migration gets `status='active'` for existing rows
120+
- No sync is needed — the schema migration IS the write
121+
- If Node A later sets `status='inactive'` for row 1, THEN a clock entry is created
122+
- Node B receives the change, sees `col_version=1` > 0 (local has no entry), accepts it
123+
124+
**Why eager is problematic:**
125+
- Node A creates N clock entries on ALTER
126+
- Node B receives schema change, also creates N clock entries locally
127+
- Both have `col_version=1` — no conflict, but redundant sync traffic
128+
- Or worse: if db_versions differ, may create spurious conflicts
129+
130+
#### 4. Edge Cases
131+
132+
**New row inserted after ADD COLUMN:**
133+
- INSERT trigger creates clock entry for the new column (value from DEFAULT or explicit)
134+
- This is a real write, so clock entry is appropriate
135+
136+
**UPDATE to new column on existing row:**
137+
- UPDATE trigger creates clock entry for the column
138+
- `col_version=1` (first write to this cell)
139+
- This is correct: the first explicit write is version 1
140+
141+
**Row existed before ALTER, never updated:**
142+
- No clock entry for the new column
143+
- `crsql_changes` does not return this column for this row
144+
- Other nodes apply the same ALTER and have the same state
145+
- No sync needed
146+
147+
### Follow-up Implementation Task
148+
149+
TASK-101 (`impl-alter-add-column-no-backfill.md`) should modify `zig/src/schema_alter.zig`:
150+
- Remove `backfillNewColumns(db, table_name_ptr)` call from `crsqlCommitAlterFunc`
151+
- Or: Only call backfill for rows that didn't exist before (i.e., rows missing from `__crsql_pks`)
152+
153+
The simpler approach is to remove the backfill entirely from `commit_alter`, matching Rust/C exactly.

0 commit comments

Comments
 (0)