-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix: prevent unnecessary reindexing after system restart #7409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add retry logic for Qdrant connection on initialization - Distinguish between connection failures and missing collections - Only clear cache when a new collection is actually created - Add comprehensive logging to help diagnose connection issues This fixes the issue where the vector database index was being recreated after every system restart, even when the collection already existed in the Docker container. Fixes #7408
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
| let created = false | ||
| try { | ||
| // Add initial connection test with retry | ||
| await this.testConnection() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional? The testConnection() call here might succeed, but getCollectionInfo() could still fail if Qdrant becomes unavailable between these calls. Could we combine these operations or add a comment explaining why this edge case is acceptable?
|
|
||
| // If we get here, all retries failed | ||
| throw new Error( | ||
| t("embeddings:vectorStore.qdrantConnectionFailed", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message uses the same i18n key embeddings:vectorStore.qdrantConnectionFailed for both connection test failures and initialization failures. Would it be clearer to use a more specific key like embeddings:vectorStore.qdrantConnectionTestFailed here?
|
|
||
| private async getCollectionInfo(): Promise<Schemas["CollectionInfo"] | null> { | ||
| private async getCollectionInfo(retryCount: number = 0): Promise<Schemas["CollectionInfo"] | null> { | ||
| const maxRetries = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry configuration (3 retries, exponential backoff) is hardcoded. Have we considered making these values configurable through settings for different deployment scenarios? Some users might need more aggressive retries for flaky networks.
| * Checks if an error message indicates a connection failure | ||
| */ | ||
| private isConnectionError(errorMessage: string): boolean { | ||
| const connectionErrorPatterns = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This connection error pattern list is comprehensive, but might miss cloud-specific errors like 'Service Unavailable' or rate limiting errors. Should we expand this list to cover more edge cases?
| const connectionErrorPatterns = [ | |
| const connectionErrorPatterns = [ | |
| "ECONNREFUSED", | |
| "ETIMEDOUT", | |
| "ENOTFOUND", | |
| "ENETUNREACH", | |
| "EHOSTUNREACH", | |
| "ECONNRESET", | |
| "fetch failed", | |
| "network", | |
| "connect", | |
| "service unavailable", | |
| "rate limit", | |
| "503", | |
| "429" | |
| ] |
| ) | ||
| await this.cacheManager.clearCacheFile() | ||
| } else { | ||
| console.log( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great addition! The logging here clearly differentiates between new collection creation and reconnection scenarios, which will help with debugging. The cache preservation logic is exactly what was needed to fix the issue.
This PR fixes issue #7408 where the vector database index was being recreated after every system restart.
Problem
After a system restart, the extension incorrectly determined that the Qdrant collection didn't exist due to initial connection failures, causing a full reindex (3+ hours for large projects).
Solution
Testing
Impact
Eliminates unnecessary reindexing after system restarts, saving hours for users with large codebases.
Fixes #7408
Important
Fixes unnecessary reindexing by adding retry logic and better error handling for Qdrant connection in
qdrant-client.ts.getCollectionInfo()andtestConnection()inqdrant-client.ts.getCollectionInfo().orchestrator.ts.qdrant-client.tsandorchestrator.ts.qdrant-client.spec.ts.getCollectionsfor connection tests inqdrant-client.spec.ts.This description was created by
for d6a72da. You can customize this summary. It will automatically update as commits are pushed.