Refined Technical Plan: Robust Project Embeddings
1. Problem
Project embeddings become disconnected when a project's root folder is moved, because the embeddings stored in Qdrant are associated with a hardcoded, absolute file path. After a move, the code no longer matches its embeddings, which forces re-indexing and causes a poor user experience.
2. Solution
A hybrid approach will be implemented to create a robust and resilient embedding management system. This approach combines an automated solution for the common case with a manual override for edge cases.
- Primary Mechanism (Automatic): The stable VS Code workspace hash will be used as the primary identifier for embedding collections. This hash is generated from the workspace folder's URI and remains consistent even if the project folder is moved on the local filesystem. This will automatically handle the vast majority of cases.
- Secondary Mechanism (Manual Fallback): A UI-driven workflow will be created to allow users to manually re-associate "orphaned" embedding collections with the correct project. This is for edge cases where the workspace hash might change, such as in remote development scenarios or complex multi-root workspaces.
3. Implementation Sprints
The implementation will be broken down into three sprints, each focusing on a specific part of the solution.
Sprint 1: Core Workspace Hash Logic
Goal: Implement the core functionality of using the workspace hash to identify and manage embedding collections.
Tasks:
- Implement `getWorkspaceHash` Function:
  - File: `src/utils/storage.ts`
  - Function: `getWorkspaceHash()`
  - Logic:
    - Get the URI of the first workspace folder (`vscode.workspace.workspaceFolders?.[0].uri.toString()`).
    - If a URI exists, create a SHA1 hash of the URI string.
    - Return the hex digest of the hash.
    - Return `null` if no workspace folder is open.
- Integrate Workspace Hash into `FileContextTracker`:
  - File: `src/core/context-tracking/FileContextTracker.ts`
  - Objective: Modify the `FileContextTracker` to use the workspace hash instead of the task ID for identifying the project's context.
  - Modifications:
    - Update `getTaskDirectoryPath` in `src/utils/storage.ts` to accept the workspace hash.
    - Update `getTaskMetadata` and `saveTaskMetadata` to use the modified `getTaskDirectoryPath`.
    - Update `addFileToFileContextTracker` to use the workspace hash to locate the correct `task-metadata.json` file.
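The hashing logic from the first task could be sketched roughly as follows. This is a minimal sketch under assumptions, not the final implementation: the function takes the URI string as a parameter (rather than reading `vscode.workspace` directly) so that it can also run outside the extension host, and `workspaceDirectoryPath` is a hypothetical helper name used only to illustrate how a hash-keyed directory might be derived.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

/**
 * Returns the SHA1 hex digest of a workspace URI string, or null when
 * no workspace folder is open (i.e. the URI is undefined or empty).
 *
 * Assumption: inside the extension the caller would pass
 * vscode.workspace.workspaceFolders?.[0].uri.toString().
 */
function getWorkspaceHash(workspaceUri?: string): string | null {
  if (!workspaceUri) {
    return null;
  }
  return createHash("sha1").update(workspaceUri).digest("hex");
}

/**
 * Hypothetical helper: builds the directory that would hold
 * task-metadata.json for a given workspace hash.
 */
function workspaceDirectoryPath(storageRoot: string, workspaceHash: string): string {
  return path.join(storageRoot, workspaceHash);
}
```

Passing the URI in as an argument keeps the function pure, which also fits the migration script in Sprint 2, where no VS Code API is available.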
Sprint 2: Migration of Existing Data
Goal: Create a one-time migration script to update existing projects to the new workspace hash system.
Tasks:
- Create Migration Script:
  - Location: `scripts/migrations/migrate-to-workspace-hash.ts`
  - Logic:
    - Iterate through all existing task directories in the user's data directory.
    - For each directory, read the `task-metadata.json` file.
    - Extract the file paths from the `files_in_context` array.
    - From the first file path, determine the workspace root.
    - Calculate the new workspace hash using the `getWorkspaceHash` function (this will require adapting it to work outside the VS Code extension context, likely by passing in the workspace URI).
    - Create a new directory named with the workspace hash.
    - Copy the `task-metadata.json` file to the new directory.
    - In the copied file, update the `path` properties in the `files_in_context` array to be relative to the workspace root.
    - Delete the old task directory.
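The path-rewriting step of the migration might look something like the sketch below. It assumes each entry in `files_in_context` is an object with a `path` string and that the workspace root has already been determined; the plan does not specify how the root is derived from the first file path, so it is passed in here. The `TaskMetadata` and `FileEntry` type names and the `relativizeMetadata` function are hypothetical, and POSIX-style paths are assumed for simplicity.

```typescript
import * as path from "node:path";

// Assumed shape of one entry in files_in_context; other fields pass through untouched.
interface FileEntry {
  path: string;
  [key: string]: unknown;
}

// Assumed shape of task-metadata.json; only files_in_context is named in the plan.
interface TaskMetadata {
  files_in_context: FileEntry[];
  [key: string]: unknown;
}

/**
 * Returns a copy of the metadata in which every absolute path in
 * files_in_context is rewritten relative to workspaceRoot.
 * Paths that are already relative are left unchanged.
 */
function relativizeMetadata(metadata: TaskMetadata, workspaceRoot: string): TaskMetadata {
  return {
    ...metadata,
    files_in_context: metadata.files_in_context.map((entry) => ({
      ...entry,
      path: path.posix.isAbsolute(entry.path)
        ? path.posix.relative(workspaceRoot, entry.path)
        : entry.path,
    })),
  };
}
```

Because the function returns a new object instead of mutating its input, the original `task-metadata.json` contents stay intact until the old task directory is deleted.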
Sprint 3: UI for Manual Re-linking
Goal: Implement a user interface that allows users to manually re-link orphaned embedding collections.
Tasks:
- Enhance Embeddings Management UI:
  - Objective: Update the existing UI to support the new re-linking functionality.
  - Changes:
    - Update the filter to support both content search and path-based search (e.g., using a `path:` prefix).
    - Enable multi-selection of items in the UI to allow re-linking multiple collections at once.
- Implement IPC Messaging:
  - Objective: Create the necessary IPC messages for communication between the webview UI and the extension backend.
  - Messages:
    - `relinkEmbeddings`: Sent from the UI to the backend.
      - `collectionIds`: `string[]`
      - `newWorkspaceHash`: `string`
    - `relinkEmbeddingsResult`: Sent from the backend to the UI.
      - `success`: `boolean`
      - `error?`: `string`
- Implement Backend Logic:
  - Objective: Handle the `relinkEmbeddings` message in the backend.
  - Logic:
    - For each `collectionId` in the request:
      - Update the associated workspace hash in the Qdrant metadata for the collection.
      - Duplicate the collection for the new workspace to ensure the original is preserved until the user confirms deletion.
- Implement User Feedback:
  - Objective: Inform the user about the outcome of the re-linking process.
  - Implementation:
    - Display a "Toast" notification to indicate whether the re-linking action was successful or failed.
    - If it failed, provide an error message.
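The message shapes and backend handler described in this sprint could be sketched as follows. The field names come from the plan; the `type` discriminator, the `EmbeddingStore` interface, and its `duplicateForWorkspace` method are assumptions introduced for illustration and do not correspond to an actual Qdrant client API.

```typescript
// Message sent from the webview UI to the backend.
interface RelinkEmbeddingsMessage {
  type: "relinkEmbeddings"; // assumed discriminator
  collectionIds: string[];
  newWorkspaceHash: string;
}

// Message sent from the backend back to the UI.
interface RelinkEmbeddingsResultMessage {
  type: "relinkEmbeddingsResult"; // assumed discriminator
  success: boolean;
  error?: string;
}

// Hypothetical abstraction over the vector store, so the handler
// can be exercised without a running Qdrant instance.
interface EmbeddingStore {
  // Copies a collection and tags the copy with the new workspace hash,
  // leaving the original in place until the user confirms deletion.
  duplicateForWorkspace(collectionId: string, workspaceHash: string): Promise<void>;
}

/**
 * Re-links each requested collection and reports the outcome.
 * Stops at the first failure and returns its error message.
 */
async function handleRelinkEmbeddings(
  message: RelinkEmbeddingsMessage,
  store: EmbeddingStore,
): Promise<RelinkEmbeddingsResultMessage> {
  try {
    for (const collectionId of message.collectionIds) {
      await store.duplicateForWorkspace(collectionId, message.newWorkspaceHash);
    }
    return { type: "relinkEmbeddingsResult", success: true };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { type: "relinkEmbeddingsResult", success: false, error };
  }
}
```

The UI could use the `success` flag of the result to choose between a success and a failure toast, showing `error` in the failure case.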