3 of 3: scalability fix: refactor task history persistence to use file-based storage #3785
This is a first pass at fixing the growing bloat from task history stored in the global extension state. It assumes that #3772 has been merged for JSON safety. The biggest point for review is that it converts existing global-state task history into on-disk JSON structures. It converts my global state of 2800+ historical tasks quite nicely; I have verified the file output, and all existing tasks and search functionality appear to work. Creating new tasks or modifying existing ones adds to the new JSON files, and everything seems to proceed as expected. There are still some optimizations to do:
This is an intentional break from backwards compatibility, specifically for task history storage. However, the old global state is not deleted, so if you downgrade to an older version (or if you are a developer testing things prior to this PR), you will still have everything that you used to have.
3692f2d to
b916aa4
* Improve documentation for new coders - Reorganized steps for better flow and clarity - Added missing hyperlinks for better navigation - Underlined hyperlinks for improved accessibility and visual consistency * Update docs/getting-started/for-new-coders.mdx Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> --------- Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
|
This single change gives a 10x performance increase on every webview update, because I have 2800 tasks in my history. The old form scanned the whole array; the new form fetches one item:
  currentTaskItem: this.getCurrentCline()?.taskId
-     ? (taskHistory || []).find((item: HistoryItem) => item.id === this.getCurrentCline()?.taskId)
+     ? await getHistoryItem(this.getCurrentCline()!.taskId)
      : undefined,
|
Here is another 2800x performance increase, which takes effect every time a task is updated. Notice that it used to load the entire global state for every update, but the operation now writes only the single item:
- async updateTaskHistory(item: HistoryItem): Promise<HistoryItem[]> {
-     const history = (this.getGlobalState("taskHistory") as HistoryItem[] | undefined) || []
-     const existingItemIndex = history.findIndex((h) => h.id === item.id)
-
-     if (existingItemIndex !== -1) {
-         history[existingItemIndex] = item
-     } else {
-         history.push(item)
-     }
-
-     await this.updateGlobalState("taskHistory", history)
-     return history
+ async updateTaskHistory(item: HistoryItem): Promise<void> {
+     await setHistoryItems([item])
  }
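To make the difference between the two diffs above concrete, here is a minimal in-memory sketch. It is not the PR's implementation: the trimmed `HistoryItem` shape, `upsertInArray`, and `KeyedHistoryStore` are hypothetical names, and a `Map` stands in for the per-task `history_item.json` files.

```typescript
// Trimmed stand-in for the real HistoryItem type (assumption: only these fields matter here).
interface HistoryItem {
  id: string
  ts: number
  task: string
}

// Old shape: one array holds every item, so an update scans the array
// and the caller then has to persist all n items again.
function upsertInArray(history: HistoryItem[], item: HistoryItem): HistoryItem[] {
  const index = history.findIndex((h) => h.id === item.id)
  if (index !== -1) {
    history[index] = item
  } else {
    history.push(item)
  }
  return history
}

// New shape: one record per task id. The Map is a hypothetical stand-in
// for one JSON file per task on disk.
class KeyedHistoryStore {
  private readonly records = new Map<string, string>()

  set(item: HistoryItem): void {
    // Only this item's serialized record is written.
    this.records.set(item.id, JSON.stringify(item))
  }

  get(id: string): HistoryItem | undefined {
    const raw = this.records.get(id)
    return raw === undefined ? undefined : (JSON.parse(raw) as HistoryItem)
  }
}
```

With the keyed store, both lookup and update touch a single record, regardless of how many tasks exist.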
39cc58b to
fd6a2bb
@daniel-lxs - this is ready for review
Review notes:
Benefits
See the screenshots in the top post
Ready to test; here is the build: https://www.linuxglobal.com/out/roo/roo-cline-3.21.2-refactor-use-files-for-history.vsix
Notice to testers:
This commit implements a new architecture for task history persistence:
- Create file-based storage system for HistoryItem objects in tasks/<id>/history_item.json
- Add migration logic to transition from old globalState array to file-based storage
- Implement indexing by month for efficient searching
- Cache history items in memory for performance
- Provide backup of old data during migration
Signed-off-by: Eric Wheeler <[email protected]>
refactor: migrate task history to separate directory structure
- Separated task items into 'tasks' directory
- Moved monthly indexes into 'taskHistory' directory
- Added helper functions for path generation
- Improved backup file handling with explicit directory creation
- Removed unnecessary cleanup logic
- Enhanced error handling and logging
Signed-off-by: Eric Wheeler <[email protected]>
perf: optimize task history with concurrent operations and atomic file access
- Add BATCH_SIZE constant to limit concurrent operations
- Replace batch array with Set-based tracking for in-flight operations
- Update getHistoryItem to use safeReadJson for consistency
- Implement atomic read-modify-write for month indexes
- Process month index updates in parallel
- Add performance timing and metrics for migration
- Check for directory existence before migration
- Remove unnecessary directory creation
Signed-off-by: Eric Wheeler <[email protected]>
perf: optimize getHistoryItemsForSearch
- Serialize calls to getHistoryItemsForSearch to allow cache to heat up
- Skip taskHistorySearch when search query is empty
- Extract implementation to private _getHistoryItemsForSearch function
Signed-off-by: Eric Wheeler <[email protected]>
refactor: use globalFileNames for history_item.json
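The BATCH_SIZE limit and Set-based in-flight tracking mentioned in the perf commit above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `runWithLimit` and its parameters are hypothetical names, and a rejected task simply aborts the run.

```typescript
// Runs the given task factories with at most `batchSize` running at once.
// A Set tracks in-flight promises; each one removes itself when it settles successfully.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, batchSize: number): Promise<T[]> {
  const results = new Array<T>(tasks.length)
  const inFlight = new Set<Promise<void>>()

  for (let i = 0; i < tasks.length; i++) {
    const p: Promise<void> = tasks[i]().then((value) => {
      results[i] = value // keep results in input order
      inFlight.delete(p)
    })
    inFlight.add(p)

    // Once the limit is reached, wait for any one operation to finish
    // before starting the next. A rejection propagates to the caller.
    if (inFlight.size >= batchSize) {
      await Promise.race(inFlight)
    }
  }

  await Promise.all(inFlight)
  return results
}
```

Compared with processing fixed-size batch arrays, this keeps the pipeline full: a new operation starts as soon as any running one completes, rather than waiting for the slowest member of a batch.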
…ions
- Added HistorySearchOptions interface to packages/types
- Updated WebviewMessage to use historySearchOptions field instead of individual fields
- Added historyItems message type to ExtensionMessage
- Implemented getHistoryItems handler in webviewMessageHandler
- Refactored getHistoryItemsForSearch to accept HistorySearchOptions parameter
- Completely replaced client-side filtering with server-side filtering
- Removed dependency on Fzf for client-side search
- Added loading state to history components with loading spinner
- Updated HistoryPreview to use limit parameter and respect loading state
- Updated tests to account for the new loading state
- Set explicit limits for history items in ChatView (10) and HistoryPreview (3)
This refactoring improves performance by moving filtering to the server side, enhances type safety with the dedicated HistorySearchOptions type, reduces duplication in the interface definitions, and improves the user experience with loading indicators.
Signed-off-by: Eric Wheeler <[email protected]>
refactor: improve history search sorting and filtering
- Created dedicated HistorySortOption type in shared types package
- Modified API to take year and month as direct parameters
- Added helper functions to reduce code duplication:
  - _getTasksByWorkspace to extract tasks from month data
  - _fastSortFilterTasks for efficient pre-filtering and sorting
- Ensured consistent sorting across all functions
- Optimized filtering to happen before file reads
- Added support for custom sort order in getAvailableHistoryMonths
Signed-off-by: Eric Wheeler <[email protected]>
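As a rough illustration of the server-side filtering described above, the sketch below filters, sorts, and limits items in one place so the webview only receives what it will display. The field names on `HistorySearchOptions`, the members of `HistorySortOption`, and the `searchHistory` function are assumptions for illustration; the real types live in packages/types and may differ. A plain substring match stands in for fuzzy search.

```typescript
interface HistoryItem {
  id: string
  ts: number
  task: string
}

// Assumed option names; only "newest", "oldest", and "mostRelevant" are shown here.
type HistorySortOption = "newest" | "oldest" | "mostRelevant"

interface HistorySearchOptions {
  searchQuery?: string
  sortOption?: HistorySortOption
  limit?: number
}

function searchHistory(items: HistoryItem[], options: HistorySearchOptions): HistoryItem[] {
  const query = (options.searchQuery ?? "").trim().toLowerCase()

  // Skip matching entirely when the query is empty.
  const results =
    query === "" ? items.slice() : items.filter((item) => item.task.toLowerCase().indexOf(query) !== -1)

  const sort = options.sortOption ?? "newest"
  if (sort === "oldest") {
    results.sort((a, b) => a.ts - b.ts)
  } else if (sort === "mostRelevant" && query !== "") {
    // Stand-in relevance: an earlier match position ranks higher.
    results.sort((a, b) => a.task.toLowerCase().indexOf(query) - b.task.toLowerCase().indexOf(query))
  } else {
    results.sort((a, b) => b.ts - a.ts)
  }

  // Apply the limit last so the caller receives only what it will render.
  return options.limit === undefined ? results : results.slice(0, options.limit)
}
```

A preview component would then call something like `searchHistory(items, { limit: 3 })` and never see the rest of the history.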
- Move fuzzy search from frontend to backend using fzf library
- Create dedicated taskHistorySearch module with configurable parameters
- Add match position tracking for proper highlighting in UI
- Implement debounced search in frontend to prevent flickering
Signed-off-by: Eric Wheeler <[email protected]>
fix: maintain sort order during search
When a search string is present, the sort order specified by the user wasn't being respected. This change ensures that:
- Non-relevance sorts (newest, oldest, etc.) maintain their order when searching
- The 'mostRelevant' sort option continues to use fuzzy search order
Signed-off-by: Eric Wheeler <[email protected]>
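The idea behind match position tracking can be shown with a small sketch: the backend returns the indexes of the matched characters, and the UI wraps exactly those characters. This uses a simple case-insensitive subsequence match as a stand-in for the fzf library's scoring; `matchPositions` and `highlight` are hypothetical names, and the bracket markers stand in for whatever highlight markup the UI uses.

```typescript
// Returns the index of each query character within text (greedy, left to right),
// or null when the query is not a subsequence of the text.
function matchPositions(query: string, text: string): number[] | null {
  const needle = query.toLowerCase()
  const haystack = text.toLowerCase()
  const positions: number[] = []
  let from = 0
  for (let i = 0; i < needle.length; i++) {
    const found = haystack.indexOf(needle[i], from)
    if (found === -1) return null
    positions.push(found)
    from = found + 1
  }
  return positions
}

// Wraps each matched character in brackets.
function highlight(text: string, positions: number[]): string {
  const marked = new Set(positions)
  let out = ""
  for (let i = 0; i < text.length; i++) {
    out += marked.has(i) ? "[" + text[i] + "]" : text[i]
  }
  return out
}
```

Because positions are computed where the search runs, the frontend needs no search library of its own to render highlights.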
Implemented automatic refresh of the task history list when tasks are deleted:
- Added taskDeletedConfirmation message type to WebviewMessage and ExtensionMessage
- Modified webviewMessageHandler to send confirmation after task deletion
- Updated useTaskSearch hook to listen for deletion confirmation and refresh the list
- Implemented non-flickering refresh that maintains current search parameters
Signed-off-by: Eric Wheeler <[email protected]>
- Created SpinnerOverlay component to darken the view during deletion
- Added state to track deletion in progress in HistoryView
- Updated DeleteTaskDialog and BatchDeleteTaskDialog to trigger the overlay
- Added event listener to hide the overlay when deletion completes
Signed-off-by: Eric Wheeler <[email protected]>
Add request ID tracking to useTaskSearch hook to ensure each component only processes responses to its own search requests. This prevents the issue where multiple components using the hook would all receive updates when a search response comes back, regardless of which component initiated the search.
- Add global serial counter to generate unique request IDs
- Add component-isolated ref to track current request ID
- Modify message handler to only process matching responses
- Pass request ID back in webviewMessageHandler response
Signed-off-by: Eric Wheeler <[email protected]>
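The request ID scheme described above can be sketched without React. In this illustration, a module-level counter plays the role of the global serial counter and a class field plays the role of the component-isolated ref; `SearchClient`, `SearchResponse`, and the `req-N` ID format are hypothetical.

```typescript
// Global serial counter shared by every consumer.
let nextRequestId = 0

interface SearchResponse {
  requestId: string
  items: string[]
}

type SendFn = (requestId: string, query: string) => void

class SearchClient {
  items: string[] = []
  private currentRequestId: string | undefined
  private readonly send: SendFn

  constructor(send: SendFn) {
    this.send = send
  }

  // Issues a search and remembers which request this consumer is waiting for.
  search(query: string): string {
    const requestId = "req-" + ++nextRequestId
    this.currentRequestId = requestId
    this.send(requestId, query)
    return requestId
  }

  // Every consumer sees every response; only the matching one is applied.
  handleResponse(response: SearchResponse): boolean {
    if (response.requestId !== this.currentRequestId) return false
    this.items = response.items
    return true
  }
}
```

The same check also discards stale responses: if a consumer issues a second search before the first returns, the first response no longer matches and is ignored.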
Removed taskHistory field and all its references from the codebase as part of migrating to file-based storage.
- Removed taskHistory from GlobalSettings schema
- Removed import of historyItemSchema
- Removed taskHistory from ExtensionState interface
- Cleared PASS_THROUGH_STATE_KEYS array in ContextProxy
- Updated ClineProvider to use file-based API instead of global state
- Updated UI components to work without taskHistory prop
Signed-off-by: Eric Wheeler <[email protected]>
Removed redundant useTaskSearch call from ChatView since HistoryPreview already makes its own call to fetch the tasks it needs to display. This eliminates an unnecessary API call on application startup and simplifies the component by removing conditional rendering based on task count. Signed-off-by: Eric Wheeler <[email protected]>
Remove unnecessary loading and returning of entire task history array. The return value was never used by any caller, so we can make this an O(1) operation instead of O(n) by simply saving the single item. This change significantly improves performance when updating task history, which happens frequently during task execution. Signed-off-by: Eric Wheeler <[email protected]>
Allows you to filter tasks not just by all and current, but also by any
historic workspace directory that exists in existing HistoryItem metadata
- Added persistent workspace index with metadata (path, name, missing status, timestamp)
- Created a rich workspace selector UI with filtering and grouping capabilities
- Added visual indicators for missing workspaces (strikethrough)
- Improved loading states and feedback during workspace changes and searches
- Added special workspace paths handling ("all", "current", "unknown")
- Standardized empty/undefined workspace paths to "unknown" for legacy items that do not have workspace stored in their metadata
- Optimized batch processing for better performance
This enhancement provides users with a more intuitive and powerful way to navigate their task history across multiple workspaces.
Signed-off-by: Eric Wheeler <[email protected]>
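The special workspace paths and the "unknown" fallback described above could be handled along these lines. This is a sketch under assumptions: `normalizeWorkspace` and `matchesWorkspaceFilter` are hypothetical helpers, not functions from the PR.

```typescript
// Legacy items may have no workspace in their metadata; treat those as "unknown".
function normalizeWorkspace(path: string | undefined): string {
  return path !== undefined && path.trim() !== "" ? path : "unknown"
}

// filter is "all", "current", "unknown", or a concrete workspace directory.
function matchesWorkspaceFilter(
  itemWorkspace: string | undefined,
  filter: string,
  currentWorkspace: string,
): boolean {
  if (filter === "all") return true
  const target = filter === "current" ? currentWorkspace : filter
  return normalizeWorkspace(itemWorkspace) === target
}
```

Normalizing at one point means the index, the selector UI, and the filter all agree on how items without workspace metadata are labeled.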
Added a limit filter dropdown to the history view that allows users to control how many results are displayed. The filter:
- Defaults to 50 items
- Offers options for 50, 100, 200, 500, 1000 items or all results
- Shows loading spinner when changing limits
- Integrates with existing workspace and sort filters
- Maintains consistent search options across operations
Signed-off-by: Eric Wheeler <[email protected]>
This change modifies the copy button in task history to retrieve the task content from the backend storage using getHistoryItem before copying it to the clipboard. This ensures the most up-to-date content is copied. Fixes: #3648 Signed-off-by: Eric Wheeler <[email protected]>
Implement a structured upgrade system that manages the task history migration process:
- Create a dedicated upgrade UI that blocks normal app usage until migration completes
- Separate migration check from migration execution for better control flow
- Add progress logging during migration to provide user feedback
- Remove automatic migration during extension activation
- Add new message types for upgrade status and completion
This change improves the user experience during task history migration by providing visual feedback and ensuring the app is in a consistent state before allowing normal usage. The upgrade system is designed to be extensible for future structural upgrades beyond task history migration.
Signed-off-by: Eric Wheeler <[email protected]>
- Add tests for cross-workspace functionality
- Verify items can be found in all workspaces where they existed
- Ensure workspace property reflects the latest workspace
- Add tests for helper functions and edge cases
Signed-off-by: Eric Wheeler <[email protected]>
- Removed pass-through state tests from ContextProxy that no longer apply
- Updated ClineProvider tests to use file-based history instead of global state
- Modified ChatTextArea tests to use useTaskSearch hook instead of taskHistory prop
- Completely rewrote useTaskSearch tests to use message-based architecture
- Updated other tests to remove taskHistory references from mock states
Signed-off-by: Eric Wheeler <[email protected]>
test: Fix ClineProvider test by mocking extension context and taskHistory
This commit fixes the failing test 'correctly identifies subtask scenario for issue #4602' by:
1. Adding necessary Vitest imports
2. Mocking getExtensionContext to return a mock context with globalStorageUri
3. Mocking taskHistory module to prevent file system operations during tests
Signed-off-by: Eric Wheeler <[email protected]>
This change adds missing translations for the frontend UI. The missing translations were identified by the find-missing-translations.js script. The new translations are for the 'upgrade' and 'history' sections of the UI. Signed-off-by: Eric Wheeler <[email protected]>
7a97986 to
4a8c77f
rebased on v3.23.14
Thanks for the contribution. At this stage, the impact of these changes isn’t clear and we’re focusing on higher-priority items. Closing for now, but we can reconsider in the future if priorities shift.
Note to Reviewer
This is a PR series, so the line counts shown by GitHub are exaggerated. The commit series clearly marks where each PR begins using lines that say
NOTICE: PR ____ STARTS HERE.
See below for the annotated diffstat.
The commits tell a clean story: it will be easier to understand what is happening here by looking at each commit individually under "Commits" than by looking at all of the files that were changed.
The best place to start your review is here:
- getHistoryItem(taskId)
- setHistoryItems([ historyItem1, ... ])
- migrateTaskHistoryStorage()
Dependencies
Depends on:
fix: use safeWriteJson for all JSON file writes with race condition fix #4733 (complete)
Closes:
Blocks: feat: Implement task history scanner and recovery tools #5546
Change Overview
Previously:
These issues create a very slow, choppy experience:
- taskHistory stored a copy of every single HistoryItem ever created by the user: thousands! This multi-megabyte array had to be loaded in the extension state, so every single extension state update would transfer ~10MB between the user interface and back.
- HistoryItem.task contains the first message of every task, so any time someone created a file using @mention or pasted huge amounts of content into the original message, it was loaded into global state
- HistoryItem modification had to perform the following:
  - taskHistory object from global state
  - HistoryItem object that needed to be updated
Upgrades!
The hallmarks of this upgrade are as follows:
- HistoryItem -> date indexes (~50kB in size in monthly buckets)
- HistoryItem objects can be large, so they are stored directly as files in tasks/<uuid>/history_item.json
- HistoryView and HistoryPreview only fetch the number of items that they need from the backend, and the indexes above are used to directly fetch the relevant HistoryItem objects:
Diff Annotation
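To illustrate how monthly index buckets let the backend fetch only the items a view needs, here is a minimal in-memory sketch. The index layout (month key to task id to timestamp) and the names `MonthIndex`, `monthKey`, `addToIndex`, and `newestTaskIds` are assumptions for illustration; the on-disk format in the PR may differ.

```typescript
// "YYYY-MM" -> (taskId -> timestamp in ms). Assumed layout, one bucket per month.
type MonthIndex = Map<string, Map<string, number>>

function monthKey(ts: number): string {
  const d = new Date(ts)
  const month = d.getUTCMonth() + 1
  return d.getUTCFullYear() + "-" + (month < 10 ? "0" + month : String(month))
}

function addToIndex(index: MonthIndex, id: string, ts: number): void {
  const key = monthKey(ts)
  let bucket = index.get(key)
  if (bucket === undefined) {
    bucket = new Map<string, number>()
    index.set(key, bucket)
  }
  bucket.set(id, ts)
}

// Walks buckets from the newest month backwards and stops as soon as `limit`
// ids are collected, so older buckets are never touched.
function newestTaskIds(index: MonthIndex, limit: number): string[] {
  const ids: string[] = []
  const months = Array.from(index.keys()).sort().reverse()
  for (const month of months) {
    const entries = Array.from(index.get(month)!.entries()).sort((a, b) => b[1] - a[1])
    for (const [id] of entries) {
      if (ids.length >= limit) return ids
      ids.push(id)
    }
  }
  return ids
}
```

Only the ids returned here would then need their `history_item.json` read, which is why a preview showing three items stays cheap even with thousands of tasks on disk.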
Context
The current task history persistence mechanism stores all history items directly in VSCode's globalState, which is causing several critical issues:
VSCode Warnings: The extension triggers VSCode warnings about excessive globalState usage:
Extension Crashes and UI Issues: Users experience various issues that may be related to memory management and globalState limitations:
Performance Degradation: Even before crashing, the extension suffers from performance issues due to loading and processing large amounts of history data at once.
Scale Problem: A busy developer can accumulate tens of thousands of tasks over the course of a year. At this scale, the globalState approach becomes completely unsustainable.
Implementation
This PR implements a new architecture for task history persistence:
File-based Storage System:
- taskHistory/workspaces.index.json for fast workspace filtering (screenshot below)
Performance Optimizations:
Migration Process:
Integrity:
How to Test
Screenshots
Per-workspace filtering
This uses existing metadata, so it will immediately provide access to workspaces you have already used:
Performance Demo
2025-07-11.11-43-41.mp4
Get in Touch
Discord: KJ7LNW
Fixes: #3784
Important
Refactor task history persistence to use file-based storage, improving performance and scalability, with new UI components for management.
- taskHistory.ts.
- HistoryIndexTools for managing task history in SettingsView.tsx.
- HistoryView.tsx to support new filtering and sorting options.
- SpinnerOverlay for loading states in SpinnerOverlay.tsx.
- taskHistory from globalSettingsSchema in global-settings.ts.
- ContextProxy to remove task history handling.
- safeWriteJson.spec.ts and safeReadJson.spec.ts.
This description was created by
for 8c9afbdb547f277a331d2fd6260b2f9b4d86548f. You can customize this summary. It will automatically update as commits are pushed.