-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix: implement incremental indexing and auto-retry for Qdrant connection failures #8137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ion failures - Add connection retry mechanism when Qdrant is unavailable - Preserve cache on connection failures to enable incremental indexing - Implement automatic retry with exponential backoff (max 10 attempts) - Add comprehensive tests for the new functionality - Update translations for new error messages Fixes #8129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
| "unknownError": "Unknown error", | ||
| "indexingRequiresWorkspace": "Indexing requires an open workspace folder" | ||
| "indexingRequiresWorkspace": "Indexing requires an open workspace folder", | ||
| "qdrantNotAvailable": "{{errorMessage}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This translation key is only added to the English locale file. Could we ensure all other locale files also get this key to prevent CI failures? The translation check will likely fail without it.
| */ | ||
| private _setupQdrantConnectionRetry(): void { | ||
| // Clear any existing timer | ||
| if (this._qdrantRetryTimer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a risk of creating multiple retry timers if this method is called multiple times? While there's a clearTimeout at the start, consider adding a guard to ensure we don't accidentally set up multiple concurrent retry loops.
| private _isProcessing: boolean = false | ||
| private _qdrantRetryTimer: NodeJS.Timeout | undefined | ||
| private _qdrantRetryCount: number = 0 | ||
| private readonly MAX_RETRY_COUNT = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make these retry configuration values configurable through settings? Different deployment scenarios might benefit from different retry strategies:
| private readonly MAX_RETRY_COUNT = 10 | |
| private readonly MAX_RETRY_COUNT = this.configManager.getMaxRetryCount?.() ?? 10 | |
| private readonly RETRY_INTERVAL_MS = this.configManager.getRetryInterval?.() ?? 30000 // 30 seconds |
| // Check if this is a connection error (Qdrant not available) | ||
| const errorMessage = error?.message || String(error) | ||
| if ( | ||
| errorMessage.includes("qdrantConnectionFailed") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This connection error detection logic is duplicated in multiple places (also at lines 262-266). Could we extract it into a helper method?
| errorMessage.includes("qdrantConnectionFailed") || | |
| if (this._isQdrantConnectionError(error)) { |
Then add a private method:
private _isQdrantConnectionError(error: any): boolean {
const errorMessage = error?.message || String(error)
return errorMessage.includes("qdrantConnectionFailed") ||
errorMessage.includes("ECONNREFUSED") ||
errorMessage.includes("Failed to connect") ||
errorMessage.includes("connect ECONNREFUSED")
}| return | ||
| } | ||
|
|
||
| this._qdrantRetryTimer = setTimeout(async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implementing exponential backoff instead of fixed intervals? This would reduce load on the system and be more resilient:
| this._qdrantRetryTimer = setTimeout(async () => { | |
| const backoffMs = Math.min(this.RETRY_INTERVAL_MS * Math.pow(2, this._qdrantRetryCount), 300000) // Cap at 5 minutes | |
| this._qdrantRetryTimer = setTimeout(async () => { |
| // Verify incremental indexing was performed | ||
| expect(mockScanner.scanDirectory).toHaveBeenCalled() | ||
| expect(mockCacheManager.clearCacheFile).not.toHaveBeenCalled() | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a test case for what happens if _performIncrementalIndexing() fails during the retry process? This would help ensure error handling is robust throughout the retry flow.
| const collectionCreated = await this.vectorStore.initialize() | ||
|
|
||
| // Success! Reset retry count and start indexing | ||
| console.log("[CodeIndexOrchestrator] Successfully reconnected to Qdrant!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding telemetry for retry attempts? This would help monitor the feature in production:
| console.log("[CodeIndexOrchestrator] Successfully reconnected to Qdrant!") | |
| // Success! Reset retry count and start indexing | |
| console.log("[CodeIndexOrchestrator] Successfully reconnected to Qdrant!") | |
| TelemetryService.instance.captureEvent(TelemetryEventName.CODE_INDEX_RETRY_SUCCESS, { | |
| retryCount: this._qdrantRetryCount, | |
| totalTime: this._qdrantRetryCount * this.RETRY_INTERVAL_MS | |
| }) |
Summary
This PR addresses Issue #8129 by implementing incremental indexing and automatic retry mechanism when Qdrant service is unavailable.
Problem
When users forget to start the Qdrant container before VSCode starts, the codebase indexing shows errors. When retrying after starting Qdrant, it reindexes everything from scratch, which takes too much time.
Solution
Changes
orchestrator.tsto add retry logic and preserve cache on connection failuresTesting
How to Test
Fixes #8129
Important
Adds retry mechanism and incremental indexing for Qdrant connection failures in
CodeIndexOrchestrator, with tests and new error messages.CodeIndexOrchestratorfor Qdrant connection failures, retrying every 30 seconds up to 10 times.orchestrator.spec.tsto cover retry logic, cache preservation, and incremental indexing.embeddings.jsonwith new error messages for Qdrant connection issues.This description was created by
for 8a9b198. You can customize this summary. It will automatically update as commits are pushed.