fix(worker): re-index repos whose zoekt shards are missing from disk#1304
Conversation
When the index directory is lost (e.g., it lives on ephemeral storage in a k8s deployment) while the database still marks repos as indexed, search silently returns empty results and nothing triggers a rebuild until the reindex interval elapses. Add a reconciliation step that runs on scheduler startup and on every scheduler poll: repos marked as indexed in the DB that have no index shards on disk get their indexedAt cleared, so the existing scheduler re-indexes them with its usual dedup and backoff guards.
WalkthroughRepoIndexManager detects repos marked as indexed in the database but missing corresponding shard files on disk, clears their ChangesMissing Shard Reconciliation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/backend/src/repoIndexManager.ts`:
- Around line 718-720: Replace the two-step existsSync + readdir pattern so the
directory scan tolerates a race: call readdir(INDEX_CACHE_DIR) directly inside a
try/catch around the call used to populate entries (the block that constructs
repoIdsWithShards), and if the catch error.code === 'ENOENT' treat it as an
empty directory (i.e. leave repoIdsWithShards empty) instead of aborting the
poll; for other errors rethrow or log as before. Ensure you update the logic
that uses entries to handle the empty-case correctly.
- Around line 702-744: staleRepos were selected from a snapshot but the
subsequent updateMany only filters by id, which can clear indexedAt incorrectly
if a repo lost its connections or its indexedAt changed; instead, re-apply the
preconditions at write time by updating only rows that still match both the
original indexedAt value and still have connections: for example iterate
staleRepos and for each call this.db.repo.updateMany (or a single updateMany
with a where: { OR: [...] }) where each clause is { id: repo.id, indexedAt:
repo.indexedAt, connections: { some: {} } } so you only set indexedAt: null when
the repo still has the same indexedAt and at least one connection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8047b0d6-9b12-4aaf-a6bd-3c3de86c7d38
📒 Files selected for processing (3)
CHANGELOG.mdpackages/backend/src/repoIndexManager.test.tspackages/backend/src/repoIndexManager.ts
| const indexedRepos = await this.db.repo.findMany({ | ||
| where: { | ||
| indexedAt: { not: null }, | ||
| indexedCommitHash: { not: null }, | ||
| connections: { some: {} }, | ||
| }, | ||
| select: { | ||
| id: true, | ||
| name: true, | ||
| }, | ||
| }); | ||
|
|
||
| if (indexedRepos.length === 0) { | ||
| return; | ||
| } | ||
|
|
||
| const repoIdsWithShards = new Set<number>(); | ||
| if (existsSync(INDEX_CACHE_DIR)) { | ||
| const entries = await readdir(INDEX_CACHE_DIR); | ||
| for (const entry of entries) { | ||
| // Ignore temporary files (e.g., `.tmp` files from in-flight or | ||
| // failed indexing operations) - only completed shards count. | ||
| if (!entry.endsWith('.zoekt')) { | ||
| continue; | ||
| } | ||
| const repoId = getRepoIdFromShardFileName(entry); | ||
| if (repoId !== undefined) { | ||
| repoIdsWithShards.add(repoId); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| const staleRepos = indexedRepos.filter(repo => !repoIdsWithShards.has(repo.id)); | ||
| if (staleRepos.length === 0) { | ||
| return; | ||
| } | ||
|
|
||
| logger.warn(`Found ${staleRepos.length} repo(s) marked as indexed but with no index shards on disk. Marking as stale for re-indexing: ${staleRepos.map(repo => repo.name).join(', ')}`); | ||
|
|
||
| await this.db.repo.updateMany({ | ||
| where: { id: { in: staleRepos.map(repo => repo.id) } }, | ||
| data: { indexedAt: null }, | ||
| }); |
There was a problem hiding this comment.
Re-apply the stale-marking preconditions at write time.
staleRepos is derived from a snapshot, but Line 741 later clears indexedAt by id only. If a repo loses its last connection or finishes a reindex after the findMany() but before updateMany(), this write can either bypass the GC grace period or clobber a freshly-written indexedAt, which then triggers an unnecessary extra reindex. The update should re-check the repo is still connected and still has the same indexedAt value that was observed during selection.
Suggested fix
const indexedRepos = await this.db.repo.findMany({
where: {
indexedAt: { not: null },
indexedCommitHash: { not: null },
connections: { some: {} },
},
select: {
id: true,
name: true,
+ indexedAt: true,
},
});
@@
- await this.db.repo.updateMany({
- where: { id: { in: staleRepos.map(repo => repo.id) } },
- data: { indexedAt: null },
- });
+ await this.db.$transaction(
+ staleRepos.map((repo) =>
+ this.db.repo.updateMany({
+ where: {
+ id: repo.id,
+ indexedAt: repo.indexedAt,
+ indexedCommitHash: { not: null },
+ connections: { some: {} },
+ },
+ data: { indexedAt: null },
+ })
+ )
+ );📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const indexedRepos = await this.db.repo.findMany({ | |
| where: { | |
| indexedAt: { not: null }, | |
| indexedCommitHash: { not: null }, | |
| connections: { some: {} }, | |
| }, | |
| select: { | |
| id: true, | |
| name: true, | |
| }, | |
| }); | |
| if (indexedRepos.length === 0) { | |
| return; | |
| } | |
| const repoIdsWithShards = new Set<number>(); | |
| if (existsSync(INDEX_CACHE_DIR)) { | |
| const entries = await readdir(INDEX_CACHE_DIR); | |
| for (const entry of entries) { | |
| // Ignore temporary files (e.g., `.tmp` files from in-flight or | |
| // failed indexing operations) - only completed shards count. | |
| if (!entry.endsWith('.zoekt')) { | |
| continue; | |
| } | |
| const repoId = getRepoIdFromShardFileName(entry); | |
| if (repoId !== undefined) { | |
| repoIdsWithShards.add(repoId); | |
| } | |
| } | |
| } | |
| const staleRepos = indexedRepos.filter(repo => !repoIdsWithShards.has(repo.id)); | |
| if (staleRepos.length === 0) { | |
| return; | |
| } | |
| logger.warn(`Found ${staleRepos.length} repo(s) marked as indexed but with no index shards on disk. Marking as stale for re-indexing: ${staleRepos.map(repo => repo.name).join(', ')}`); | |
| await this.db.repo.updateMany({ | |
| where: { id: { in: staleRepos.map(repo => repo.id) } }, | |
| data: { indexedAt: null }, | |
| }); | |
| const indexedRepos = await this.db.repo.findMany({ | |
| where: { | |
| indexedAt: { not: null }, | |
| indexedCommitHash: { not: null }, | |
| connections: { some: {} }, | |
| }, | |
| select: { | |
| id: true, | |
| name: true, | |
| indexedAt: true, | |
| }, | |
| }); | |
| if (indexedRepos.length === 0) { | |
| return; | |
| } | |
| const repoIdsWithShards = new Set<number>(); | |
| if (existsSync(INDEX_CACHE_DIR)) { | |
| const entries = await readdir(INDEX_CACHE_DIR); | |
| for (const entry of entries) { | |
| // Ignore temporary files (e.g., `.tmp` files from in-flight or | |
| // failed indexing operations) - only completed shards count. | |
| if (!entry.endsWith('.zoekt')) { | |
| continue; | |
| } | |
| const repoId = getRepoIdFromShardFileName(entry); | |
| if (repoId !== undefined) { | |
| repoIdsWithShards.add(repoId); | |
| } | |
| } | |
| } | |
| const staleRepos = indexedRepos.filter(repo => !repoIdsWithShards.has(repo.id)); | |
| if (staleRepos.length === 0) { | |
| return; | |
| } | |
| logger.warn(`Found ${staleRepos.length} repo(s) marked as indexed but with no index shards on disk. Marking as stale for re-indexing: ${staleRepos.map(repo => repo.name).join(', ')}`); | |
| await this.db.$transaction( | |
| staleRepos.map((repo) => | |
| this.db.repo.updateMany({ | |
| where: { | |
| id: repo.id, | |
| indexedAt: repo.indexedAt, | |
| indexedCommitHash: { not: null }, | |
| connections: { some: {} }, | |
| }, | |
| data: { indexedAt: null }, | |
| }) | |
| ) | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/backend/src/repoIndexManager.ts` around lines 702 - 744, staleRepos
were selected from a snapshot but the subsequent updateMany only filters by id,
which can clear indexedAt incorrectly if a repo lost its connections or its
indexedAt changed; instead, re-apply the preconditions at write time by updating
only rows that still match both the original indexedAt value and still have
connections: for example iterate staleRepos and for each call
this.db.repo.updateMany (or a single updateMany with a where: { OR: [...] })
where each clause is { id: repo.id, indexedAt: repo.indexedAt, connections: {
some: {} } } so you only set indexedAt: null when the repo still has the same
indexedAt and at least one connection.
| const repoIdsWithShards = new Set<number>(); | ||
| if (existsSync(INDEX_CACHE_DIR)) { | ||
| const entries = await readdir(INDEX_CACHE_DIR); |
There was a problem hiding this comment.
Handle ENOENT from the directory scan directly.
If INDEX_CACHE_DIR disappears between Line 719 and Line 720, readdir() throws and this poll aborts instead of recovering by marking repos stale. That is the exact failure mode this reconciliation is meant to survive. Prefer calling readdir() directly and treating ENOENT as “no shards present”.
Suggested fix
- const repoIdsWithShards = new Set<number>();
- if (existsSync(INDEX_CACHE_DIR)) {
- const entries = await readdir(INDEX_CACHE_DIR);
- for (const entry of entries) {
- // Ignore temporary files (e.g., `.tmp` files from in-flight or
- // failed indexing operations) - only completed shards count.
- if (!entry.endsWith('.zoekt')) {
- continue;
- }
- const repoId = getRepoIdFromShardFileName(entry);
- if (repoId !== undefined) {
- repoIdsWithShards.add(repoId);
- }
- }
- }
+ const repoIdsWithShards = new Set<number>();
+ let entries: string[] = [];
+ try {
+ entries = await readdir(INDEX_CACHE_DIR);
+ } catch (error) {
+ if ((error as NodeJS.ErrnoException).code !== 'ENOENT') {
+ throw error;
+ }
+ }
+
+ for (const entry of entries) {
+ // Ignore temporary files (e.g., `.tmp` files from in-flight or
+ // failed indexing operations) - only completed shards count.
+ if (!entry.endsWith('.zoekt')) {
+ continue;
+ }
+ const repoId = getRepoIdFromShardFileName(entry);
+ if (repoId !== undefined) {
+ repoIdsWithShards.add(repoId);
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const repoIdsWithShards = new Set<number>(); | |
| if (existsSync(INDEX_CACHE_DIR)) { | |
| const entries = await readdir(INDEX_CACHE_DIR); | |
| const repoIdsWithShards = new Set<number>(); | |
| let entries: string[] = []; | |
| try { | |
| entries = await readdir(INDEX_CACHE_DIR); | |
| } catch (error) { | |
| if ((error as NodeJS.ErrnoException).code !== 'ENOENT') { | |
| throw error; | |
| } | |
| } | |
| for (const entry of entries) { | |
| // Ignore temporary files (e.g., `.tmp` files from in-flight or | |
| // failed indexing operations) - only completed shards count. | |
| if (!entry.endsWith('.zoekt')) { | |
| continue; | |
| } | |
| const repoId = getRepoIdFromShardFileName(entry); | |
| if (repoId !== undefined) { | |
| repoIdsWithShards.add(repoId); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/backend/src/repoIndexManager.ts` around lines 718 - 720, Replace the
two-step existsSync + readdir pattern so the directory scan tolerates a race:
call readdir(INDEX_CACHE_DIR) directly inside a try/catch around the call used
to populate entries (the block that constructs repoIdsWithShards), and if the
catch error.code === 'ENOENT' treat it as an empty directory (i.e. leave
repoIdsWithShards empty) instead of aborting the poll; for other errors rethrow
or log as before. Ensure you update the logic that uses entries to handle the
empty-case correctly.
Fixes #1210
Problem
If the zoekt index directory is deleted while the database and repo clones remain intact (e.g.,
.sourcebot/indexplaced on ephemeral/node-local storage in a Kubernetes deployment, lost on pod replacement), Sourcebot starts normally but repos stay marked as indexed in the DB. No shard files exist, so search silently returns empty/incomplete results, and nothing schedules a rebuild untilreindexIntervalMselapses.Fix
Adds a reconciliation step,
markReposWithMissingShardsAsStale, toRepoIndexManager. It runs on scheduler startup and on every scheduler poll (so an index directory lost mid-run is also caught), and:indexedAt != nullfrom the DB.INDEX_CACHE_DIRfor completed.zoektshards and collects which repo ids have at least one.indexedAton repos with no shards, so the existing scheduler re-indexes them through its normal path (keeping the dedup/backoff guards from fix(backend): prevent duplicate index jobs from indexedAt race #1298 intact).Design notes:
indexedCommitHash != null): they complete indexing without producing a shard and would otherwise be re-indexed forever.indexedAton them would bypass the GC grace period inscheduleCleanupJobs..tmpfiles from in-flight/failed indexing don't count as shards.Test plan
repoIndexManager.test.ts(written first, TDD): missing shards on startup, all-present no-op, index dir missing entirely, tmp-file handling, query guards, and poll-time detection.tscclean..sourcebot/index/while the worker was running with an indexed repo. One poll later the worker loggedFound 1 repo(s) marked as indexed but with no index shards on disk. Marking as stale for re-indexing: ...and re-indexed the repo; the shard was restored. The warning fired exactly once (no re-index loop), and a normal restart with intact shards triggers nothing.Summary by CodeRabbit