feat(cluster): add snapshot-then-merge architecture #29
Open
mvanhorn wants to merge 1 commit into pwrdrvr:main
Store per-run cluster snapshots in a new cluster_snapshots table and track active/previous run pointers in repo_cluster_state. The read path prefers the state pointer over raw "latest completed run" queries. Prune now keeps both active and previous runs instead of deleting all but the current one.

Follow-up to PR pwrdrvr#19 discussion with @huntharo on the cluster lineage tracking design.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
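To make the pointer behaviour in the commit message concrete, here is a minimal in-memory sketch. It is not the code from this PR: the type and function names (`RepoClusterState`, `flipState`, `resolveLatestRunId`, `runIdsToPrune`) are hypothetical, the arrays stand in for database tables, and it assumes run IDs increase over time so that the largest completed ID is the latest run.

```typescript
// Hypothetical stand-in for one row of repo_cluster_state (column names assumed).
interface RepoClusterState {
  activeRunId: number | null;
  previousRunId: number | null;
}

// Hypothetical stand-in for a cluster run belonging to a single repo.
interface ClusterRun {
  id: number;
  status: "completed" | "failed" | "running";
}

// Promote a newly completed run: the old active run becomes the previous one.
function flipState(state: RepoClusterState | undefined, newRunId: number): RepoClusterState {
  return { activeRunId: newRunId, previousRunId: state?.activeRunId ?? null };
}

// Prefer the state pointer; fall back to the latest completed run
// when no pointer has been written yet (assumes IDs grow over time).
function resolveLatestRunId(state: RepoClusterState | undefined, runs: ClusterRun[]): number | null {
  if (state?.activeRunId != null) return state.activeRunId;
  const completed = runs.filter((r) => r.status === "completed").map((r) => r.id);
  return completed.length > 0 ? Math.max(...completed) : null;
}

// Every run other than the active and previous ones is eligible for pruning.
function runIdsToPrune(state: RepoClusterState, runs: ClusterRun[]): number[] {
  const keep = new Set<number | null>([state.activeRunId, state.previousRunId]);
  return runs.filter((r) => !keep.has(r.id)).map((r) => r.id);
}
```

Keeping exactly two runs is what makes a run-to-run comparison possible after each rebuild while still bounding storage.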
Summary
Implements the snapshot-then-merge cluster storage model from the design doc. Each cluster rebuild now stores a frozen snapshot of cluster membership, and a `repo_cluster_state` pointer table tracks the active and previous runs per repo.

Why this matters
Follow-up to the PR #19 discussion where @huntharo confirmed this direction. The current prune logic (`service.ts:pruneOldClusterRuns`) deletes all but the current run on every rebuild, which is destructive and prevents run-to-run comparison once the lineage tracking from PR #19 lands. The design doc (`docs/designs/cluster-storage-cleanup.md`) already proposed `repo_cluster_state` and an append-only model; this PR implements it.

Changes
New tables (`db/migrate.ts`):
- `repo_cluster_state` - per-repo pointer to active and previous cluster runs
- `cluster_snapshots` - frozen cluster membership per run (JSON array of thread IDs)

New module (`cluster/snapshot.ts`):
- `mergeClusterSnapshots()` - Jaccard-based comparison of current vs previous snapshots
- Outcome labels: `updated`, `new`, `dissolved`

Service updates (`service.ts`):
- `clusterRepository()` now calls `persistClusterSnapshots()` and `flipClusterState()` after persisting the run
- `getLatestClusterRun()` reads from `repo_cluster_state` first, falls back to raw query for backward compatibility
- `listClusters()` uses the updated read path
- `pruneOldClusterRuns()` keeps both active and previous runs instead of deleting everything

Tests (`cluster/snapshot.test.ts`):

Testing
- `pnpm typecheck` passes
- Cluster tests: `pnpm --filter @ghcrawl/api-core exec tsx --tsconfig tsconfig.test.json --test 'src/cluster/*.test.ts'`

This contribution was developed with AI assistance (Claude Code).
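For reviewers unfamiliar with Jaccard-based matching, the comparison listed under Changes might look roughly like the sketch below. This is an illustration only, not the `mergeClusterSnapshots()` implementation from this PR: the `Snapshot` shape, the `compareSnapshots` name, the greedy best-match strategy, and the 0.5 threshold are all assumptions chosen for brevity.

```typescript
// Hypothetical snapshot shape: one cluster and its member thread IDs.
type Snapshot = { clusterId: number; threadIds: number[] };

interface CompareResult {
  updated: { previousId: number; currentId: number; similarity: number }[];
  new: number[];       // current clusters with no sufficiently similar predecessor
  dissolved: number[]; // previous clusters that no current cluster matched
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets are treated as identical.
function jaccard(a: Set<number>, b: Set<number>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const x of a) if (b.has(x)) intersection++;
  return intersection / (a.size + b.size - intersection);
}

// Greedily pair each current cluster with its most similar unmatched
// previous cluster, provided the similarity reaches the (assumed) threshold.
function compareSnapshots(previous: Snapshot[], current: Snapshot[], threshold = 0.5): CompareResult {
  const result: CompareResult = { updated: [], new: [], dissolved: [] };
  const matchedPrevious = new Set<number>();

  for (const cur of current) {
    const curSet = new Set(cur.threadIds);
    let best: { id: number; score: number } | null = null;
    for (const prev of previous) {
      if (matchedPrevious.has(prev.clusterId)) continue;
      const score = jaccard(curSet, new Set(prev.threadIds));
      if (score >= threshold && (best === null || score > best.score)) {
        best = { id: prev.clusterId, score };
      }
    }
    if (best !== null) {
      matchedPrevious.add(best.id);
      result.updated.push({ previousId: best.id, currentId: cur.clusterId, similarity: best.score });
    } else {
      result.new.push(cur.clusterId);
    }
  }

  for (const prev of previous) {
    if (!matchedPrevious.has(prev.clusterId)) result.dissolved.push(prev.clusterId);
  }
  return result;
}
```

Greedy matching depends on iteration order when several clusters overlap heavily; a production version may need a stricter assignment strategy.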