@roomote roomote bot commented Aug 26, 2025

This PR fixes issue #7408 where the vector database index was being recreated after every system restart.

Problem

After a system restart, the extension incorrectly determined that the Qdrant collection didn't exist due to initial connection failures, causing a full reindex (3+ hours for large projects).

Solution

  • Added connection retry logic with exponential backoff (3 retries)
  • Distinguished connection failures from missing collections
  • Only clear cache when a NEW collection is created, not when reconnecting
  • Enhanced logging for better debugging
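
As an illustration of the first bullet, retrying with exponential backoff could look roughly like the sketch below. This is not the code from this PR: the helper name `retryWithBackoff`, the 1000 ms base delay, and the injectable `sleep` parameter are all assumptions made for the example. Only the retry count of 3 comes from the description.

```typescript
// Hypothetical helper, not the PR's actual code: retries an async operation
// with exponential backoff. All names and the base delay are assumptions.
type Sleep = (ms: number) => Promise<void>

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function retryWithBackoff<T>(
	operation: () => Promise<T>,
	maxRetries = 3, // matches the "3 retries" in the description
	baseDelayMs = 1000, // assumed; the description does not state a delay
	sleep: Sleep = realSleep, // injectable so tests do not have to wait
): Promise<T> {
	let lastError: unknown
	// One initial attempt plus up to maxRetries retries.
	for (let attempt = 0; attempt <= maxRetries; attempt++) {
		try {
			return await operation()
		} catch (error) {
			lastError = error
			if (attempt < maxRetries) {
				// The delay doubles each time: base, 2x base, 4x base, ...
				await sleep(baseDelayMs * 2 ** attempt)
			}
		}
	}
	// Every attempt failed; surface the last error to the caller.
	throw lastError
}
```

A caller would wrap the connection check, for example `await retryWithBackoff(() => client.getCollections())`, where `client` stands for whatever Qdrant client instance is in use.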

Testing

  • All tests pass
  • Added proper mocking for new retry logic

Impact

Eliminates unnecessary reindexing after system restarts, saving hours for users with large codebases.

Fixes #7408


Important

Fixes unnecessary reindexing by adding retry logic and better error handling for Qdrant connection in qdrant-client.ts.

  • Behavior:
    • Added connection retry logic with exponential backoff (3 retries) in getCollectionInfo() and testConnection() in qdrant-client.ts.
    • Differentiates between connection failures and missing collections in getCollectionInfo().
    • Clears the cache only when a new collection is created, not on reconnection, in orchestrator.ts.
    • Enhanced logging for better debugging in qdrant-client.ts and orchestrator.ts.
  • Testing:
    • Added tests for retry logic and error handling in qdrant-client.spec.ts.
    • Mocked getCollections for connection tests in qdrant-client.spec.ts.
  • Impact:
    • Prevents unnecessary reindexing after system restarts, saving time for large projects.
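
The distinction between a connection failure and a missing collection is the core of the fix, so a small sketch may help. It assumes a simplified error shape with an optional HTTP `status` field. That shape, the type names, and both functions are hypothetical and are not taken from the Qdrant client or from this PR.

```typescript
// Hypothetical sketch: the error shape and all names below are assumptions,
// not the Qdrant client's actual API.
interface LookupError {
	status?: number // HTTP status, present only when the server answered
	message: string
}

type LookupOutcome = "collection-missing" | "connection-failed" | "other-error"

function classifyLookupError(error: LookupError): LookupOutcome {
	// A 404 means the server was reachable and reported no such collection.
	if (error.status === 404) return "collection-missing"
	// No status at all suggests the request never reached the server.
	if (error.status === undefined) return "connection-failed"
	return "other-error"
}

// Only a confirmed "collection-missing" should lead to creating a collection
// (and therefore to a full reindex). A connection failure should be retried
// or reported instead.
function shouldCreateCollection(outcome: LookupOutcome): boolean {
	return outcome === "collection-missing"
}
```

The point of the tagged outcome is that "could not ask" and "asked, and the answer was no" never collapse into the same `null` result.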

This description was created by Ellipsis for d6a72da.

- Add retry logic for Qdrant connection on initialization
- Distinguish between connection failures and missing collections
- Only clear cache when a new collection is actually created
- Add comprehensive logging to help diagnose connection issues

This fixes the issue where the vector database index was being
recreated after every system restart, even when the collection
already existed in the Docker container.

Fixes #7408
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 26, 2025 10:28
@dosubot dosubot bot added the size:L and bug labels Aug 26, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage label Aug 26, 2025
@roomote roomote bot left a comment

Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.

let created = false
try {
	// Add initial connection test with retry
	await this.testConnection()

Is this intentional? The testConnection() call here might succeed, but getCollectionInfo() could still fail if Qdrant becomes unavailable between these calls. Could we combine these operations or add a comment explaining why this edge case is acceptable?


// If we get here, all retries failed
throw new Error(
t("embeddings:vectorStore.qdrantConnectionFailed", {

The error message uses the same i18n key embeddings:vectorStore.qdrantConnectionFailed for both connection test failures and initialization failures. Would it be clearer to use a more specific key like embeddings:vectorStore.qdrantConnectionTestFailed here?


- private async getCollectionInfo(): Promise<Schemas["CollectionInfo"] | null> {
+ private async getCollectionInfo(retryCount: number = 0): Promise<Schemas["CollectionInfo"] | null> {
+ 	const maxRetries = 3

The retry configuration (3 retries, exponential backoff) is hardcoded. Have we considered making these values configurable through settings for different deployment scenarios? Some users might need more aggressive retries for flaky networks.
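
One possible shape for such a setting is sketched below. Nothing here exists in the PR: the `RetrySettings` interface, its field names, the 1000 ms base delay, and the 30000 ms cap are all assumptions. Only the default of 3 retries is taken from the diff.

```typescript
// Hypothetical settings shape; none of these names exist in the PR.
interface RetrySettings {
	maxRetries: number
	baseDelayMs: number
	maxDelayMs: number
}

const DEFAULT_RETRY_SETTINGS: RetrySettings = {
	maxRetries: 3, // the value hardcoded in the diff
	baseDelayMs: 1000, // assumed
	maxDelayMs: 30000, // assumed cap so delays cannot grow without bound
}

// Merges user overrides with the defaults, ignoring undefined fields.
function resolveRetrySettings(overrides: Partial<RetrySettings> = {}): RetrySettings {
	return {
		maxRetries: overrides.maxRetries ?? DEFAULT_RETRY_SETTINGS.maxRetries,
		baseDelayMs: overrides.baseDelayMs ?? DEFAULT_RETRY_SETTINGS.baseDelayMs,
		maxDelayMs: overrides.maxDelayMs ?? DEFAULT_RETRY_SETTINGS.maxDelayMs,
	}
}

// Returns the delay before each retry: doubling from the base, capped at maxDelayMs.
function backoffSchedule(settings: RetrySettings): number[] {
	return Array.from({ length: settings.maxRetries }, (_, i) =>
		Math.min(settings.baseDelayMs * 2 ** i, settings.maxDelayMs),
	)
}
```

With the defaults this yields delays of 1000, 2000 and 4000 ms. A user on a flaky network could pass something like `{ maxRetries: 5 }` without touching the other values.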

* Checks if an error message indicates a connection failure
*/
private isConnectionError(errorMessage: string): boolean {
const connectionErrorPatterns = [

This connection error pattern list is comprehensive, but might miss cloud-specific errors like 'Service Unavailable' or rate limiting errors. Should we expand this list to cover more edge cases?

Suggested change
- const connectionErrorPatterns = [
+ const connectionErrorPatterns = [
+ 	"ECONNREFUSED",
+ 	"ETIMEDOUT",
+ 	"ENOTFOUND",
+ 	"ENETUNREACH",
+ 	"EHOSTUNREACH",
+ 	"ECONNRESET",
+ 	"fetch failed",
+ 	"network",
+ 	"connect",
+ 	"service unavailable",
+ 	"rate limit",
+ 	"503",
+ 	"429"
+ ]
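
For reference, a standalone version of this check might look like the sketch below. It reuses the pattern list from the suggestion, lowercased, and assumes the match should be case-insensitive. The diff excerpt does not show how the real method compares strings, so that part is an assumption, as is the constant name.

```typescript
// Sketch only: the pattern list comes from the suggestion above, lowercased
// so that matching is case-insensitive. How the PR's method actually
// compares strings is not visible in the excerpt; this is an assumption.
const CONNECTION_ERROR_PATTERNS: readonly string[] = [
	"econnrefused",
	"etimedout",
	"enotfound",
	"enetunreach",
	"ehostunreach",
	"econnreset",
	"fetch failed",
	"network",
	"connect",
	"service unavailable",
	"rate limit",
	"503",
	"429",
]

// Returns true if the message contains any known connection-failure pattern.
function isConnectionError(errorMessage: string): boolean {
	const normalized = errorMessage.toLowerCase()
	return CONNECTION_ERROR_PATTERNS.some((pattern) => normalized.includes(pattern))
}
```

Substring matching is coarse: a bare pattern such as "429" or "connect" also matches any message that merely contains those characters, so a check on a structured status code or error code, where one is available, would be less prone to false positives.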

)
await this.cacheManager.clearCacheFile()
} else {
console.log(

Great addition! The logging here clearly differentiates between new collection creation and reconnection scenarios, which will help with debugging. The cache preservation logic is exactly what was needed to fix the issue.
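
The cache-preservation rule can be summarized in a few lines. The sketch below is not the orchestrator code from this PR: `VectorStoreLike`, `CacheManagerLike`, `prepareIndex`, and the assumption that `initialize()` resolves to `true` only when a new collection was created are all hypothetical. Only the `clearCacheFile()` call and the `created` flag appear in the diff excerpts.

```typescript
// Hypothetical interfaces standing in for the vector store and cache manager.
interface VectorStoreLike {
	// Assumed contract: resolves true only if a NEW collection was created,
	// false if an existing one was found, and rejects on connection failure.
	initialize(): Promise<boolean>
}

interface CacheManagerLike {
	clearCacheFile(): Promise<void>
}

type IndexMode = "full-index" | "incremental"

async function prepareIndex(store: VectorStoreLike, cache: CacheManagerLike): Promise<IndexMode> {
	// If initialize() rejects, the error propagates and the cache is untouched.
	const created = await store.initialize()
	if (created) {
		// A brand-new collection is empty, so the old cache no longer applies.
		await cache.clearCacheFile()
		return "full-index"
	}
	// Reconnected to an existing collection: keep the cache and index only changes.
	return "incremental"
}
```

Because a failed connection surfaces as an error rather than as "not created", a restart with Qdrant briefly unavailable can no longer be mistaken for a missing collection.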

@daniel-lxs

#7408 (comment)

@daniel-lxs daniel-lxs closed this Aug 27, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 27, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Aug 27, 2025

Labels

bug Something isn't working Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. size:L This PR changes 100-499 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Roo code reindex my large codebase everytime I open my repo

4 participants