Robust Project Embeddings #6400

@VooDisss

Refined Technical Plan: Robust Project Embeddings

1. Problem

Project embeddings become disconnected when a project's root folder is moved. The embeddings are stored in Qdrant and keyed by a hardcoded absolute file path, so moving the folder breaks the link between the code and its embeddings. The project then has to be re-indexed, which makes for a poor user experience.

2. Solution

A hybrid approach will be implemented to create a robust and resilient embedding management system. This approach combines an automated solution for the common case with a manual override for edge cases.

  • Primary Mechanism (Automatic): A VS Code workspace hash will be used as the primary identifier for embedding collections. The hash is generated from the workspace folder's URI and is intended to stay the same when the project folder is moved on the local filesystem, which should handle the vast majority of cases automatically.
  • Secondary Mechanism (Manual Fallback): A UI-driven workflow will be created to allow users to manually re-associate "orphaned" embedding collections with the correct project. This is for edge cases where the workspace hash might change, such as in remote development scenarios or complex multi-root workspaces.

3. Implementation Sprints

The implementation will be broken down into three sprints, each focusing on a specific part of the solution.

Sprint 1: Core Workspace Hash Logic

Goal: Implement the core functionality of using the workspace hash to identify and manage embedding collections.

Tasks:

  1. Implement getWorkspaceHash Function:

    • File: src/utils/storage.ts
    • Function: getWorkspaceHash()
    • Logic:
      • Get the URI of the first workspace folder (vscode.workspace.workspaceFolders?.[0]?.uri.toString()); the second ?. avoids a TypeError if the folder list is empty.
      • If a URI exists, create a SHA1 hash of the URI string.
      • Return the hex digest of the hash.
      • Return null if no workspace folder is open.
  2. Integrate Workspace Hash into FileContextTracker:

    • File: src/core/context-tracking/FileContextTracker.ts
    • Objective: Modify the FileContextTracker to use the workspace hash instead of the task ID for identifying the project's context.
    • Modifications:
      • Update getTaskDirectoryPath in src/utils/storage.ts to accept the workspace hash.
      • Update getTaskMetadata and saveTaskMetadata to use the modified getTaskDirectoryPath.
      • Update addFileToFileContextTracker to use the workspace hash to locate the correct task-metadata.json file.
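
The getWorkspaceHash logic in task 1 could be sketched roughly as follows. This is a minimal sketch, not the final implementation: hashWorkspaceUri is a hypothetical helper name, and the VS Code call is kept in a comment so the snippet also runs outside the extension host.

```typescript
import { createHash } from "node:crypto";

// Hypothetical pure helper: returns the SHA-1 hex digest of a workspace
// folder URI string, or null when no URI is available (no folder open).
export function hashWorkspaceUri(uri: string | undefined): string | null {
  if (!uri) {
    return null;
  }
  return createHash("sha1").update(uri, "utf8").digest("hex");
}

// Inside the extension, getWorkspaceHash() could delegate to the helper.
// Kept as a comment because the "vscode" module only exists in the
// extension host:
//
//   import * as vscode from "vscode";
//   export function getWorkspaceHash(): string | null {
//     const uri = vscode.workspace.workspaceFolders?.[0]?.uri.toString();
//     return hashWorkspaceUri(uri);
//   }
```

Keeping the hashing separate from the VS Code lookup also makes it reusable from a migration script. Note that in this sketch the digest depends only on the URI string, so two different URIs produce two different digests.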

Sprint 2: Migration of Existing Data

Goal: Create a one-time migration script to update existing projects to the new workspace hash system.

Tasks:

  1. Create Migration Script:
    • Location: scripts/migrations/migrate-to-workspace-hash.ts
    • Logic:
      1. Iterate through all existing task directories in the user's data directory.
      2. For each directory, read the task-metadata.json file.
      3. Extract the file paths from the files_in_context array.
      4. From the first file path, determine the workspace root.
      5. Calculate the new workspace hash using the getWorkspaceHash function (this will require adapting it to work outside the VS Code extension context, likely by passing in the workspace URI).
      6. Create a new directory named with the workspace hash.
      7. Copy the task-metadata.json file to the new directory.
      8. In the copied file, update the path properties in the files_in_context array to be relative to the workspace root.
      9. Delete the old task directory.
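
The migration steps above might look roughly like the sketch below. It rests on several assumptions: each files_in_context entry has a path property (other fields are passed through untouched), the workspace root and workspace URI are supplied by the caller because the plan does not spell out how step 4 derives them, and the function names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { createHash } from "node:crypto";

// Assumed shape of task-metadata.json; only `path` is taken from the plan.
interface ContextFile { path: string; [key: string]: unknown }
interface TaskMetadata { files_in_context: ContextFile[]; [key: string]: unknown }

// Steps 5 and 8: compute the workspace hash from a URI passed in by the
// caller and rewrite absolute paths relative to the workspace root.
export function migrateMetadata(
  metadata: TaskMetadata,
  workspaceRoot: string,
  workspaceUri: string,
): { workspaceHash: string; metadata: TaskMetadata } {
  const workspaceHash = createHash("sha1").update(workspaceUri, "utf8").digest("hex");
  const files = metadata.files_in_context.map((entry) => ({
    ...entry,
    path: path.isAbsolute(entry.path) ? path.relative(workspaceRoot, entry.path) : entry.path,
  }));
  return { workspaceHash, metadata: { ...metadata, files_in_context: files } };
}

// Steps 2, 6, 7 and 9 for a single task directory. Returns the new directory.
export function migrateTaskDirectory(
  oldDir: string,
  dataDir: string,
  workspaceRoot: string,
  workspaceUri: string,
): string {
  const oldFile = path.join(oldDir, "task-metadata.json");
  const original = JSON.parse(fs.readFileSync(oldFile, "utf8")) as TaskMetadata;
  const { workspaceHash, metadata } = migrateMetadata(original, workspaceRoot, workspaceUri);
  const newDir = path.join(dataDir, workspaceHash);
  fs.mkdirSync(newDir, { recursive: true });
  fs.writeFileSync(path.join(newDir, "task-metadata.json"), JSON.stringify(metadata, null, 2));
  // The old directory is removed only after the new file has been written.
  fs.rmSync(oldDir, { recursive: true, force: true });
  return newDir;
}
```

In this sketch, several task directories that belong to the same workspace would map to the same hash directory and overwrite each other, so a real script would likely need a merge step.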

Sprint 3: UI for Manual Re-linking

Goal: Implement a user interface that allows users to manually re-link orphaned embedding collections.

Tasks:

  1. Enhance Embeddings Management UI:

    • Objective: Update the existing UI to support the new re-linking functionality.
    • Changes:
      • Update the filter to support both content search and path-based search (e.g., using a path: prefix).
      • Enable multi-selection of items in the UI to allow re-linking multiple collections at once.
  2. Implement IPC Messaging:

    • Objective: Create the necessary IPC messages for communication between the webview UI and the extension backend.
    • Messages:
      • relinkEmbeddings: Sent from the UI to the backend.
        • collectionIds: string[]
        • newWorkspaceHash: string
      • relinkEmbeddingsResult: Sent from the backend to the UI.
        • success: boolean
        • error?: string
  3. Implement Backend Logic:

    • Objective: Handle the relinkEmbeddings message in the backend.
    • Logic:
      • For each collectionId in the request:
        • Update the associated workspace hash in the Qdrant metadata for the collection.
        • Duplicate the collection for the new workspace to ensure the original is preserved until the user confirms deletion.
  4. Implement User Feedback:

    • Objective: Inform the user about the outcome of the re-linking process.
    • Implementation:
      • Display a "Toast" notification to indicate whether the re-linking action was successful or failed.
      • If it failed, provide an error message.
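
The Sprint 3 tasks could be sketched along these lines. The message fields come from the plan; everything else is an assumption made for illustration: the type discriminator, the CollectionStore interface (standing in for whatever Qdrant access layer the backend uses), the parseFilter helper, and the toast wording.

```typescript
// Message shapes from the plan; the `type` discriminator is assumed.
interface RelinkEmbeddingsMessage {
  type: "relinkEmbeddings";
  collectionIds: string[];
  newWorkspaceHash: string;
}
interface RelinkEmbeddingsResultMessage {
  type: "relinkEmbeddingsResult";
  success: boolean;
  error?: string;
}

// Hypothetical port hiding how a collection is duplicated and re-tagged.
interface CollectionStore {
  relinkCollection(collectionId: string, newWorkspaceHash: string): Promise<void>;
}

// Backend handler: re-links each collection in order and stops at the first
// failure. Collections processed before the failure stay re-linked.
export async function handleRelinkEmbeddings(
  message: RelinkEmbeddingsMessage,
  store: CollectionStore,
): Promise<RelinkEmbeddingsResultMessage> {
  try {
    for (const id of message.collectionIds) {
      await store.relinkCollection(id, message.newWorkspaceHash);
    }
    return { type: "relinkEmbeddingsResult", success: true };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { type: "relinkEmbeddingsResult", success: false, error };
  }
}

// Filter input: a "path:" prefix selects path search, anything else is
// treated as content search.
type EmbeddingFilter = { kind: "path" | "content"; query: string };
export function parseFilter(input: string): EmbeddingFilter {
  const trimmed = input.trim();
  return trimmed.startsWith("path:")
    ? { kind: "path", query: trimmed.slice("path:".length).trim() }
    : { kind: "content", query: trimmed };
}

// Text for the toast notification shown in the webview (wording assumed).
export function toastText(result: RelinkEmbeddingsResultMessage): string {
  return result.success
    ? "Embeddings re-linked."
    : `Re-linking failed: ${result.error ?? "unknown error"}`;
}
```

Passing the store in as a parameter keeps the handler testable without a running Qdrant instance.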

Metadata

Assignees

No one assigned

    Labels

    Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
    UI/UX: UI/UX related or focused
    enhancement: New feature or request

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
