Fixes #4940: Add persistent retry queue for failed telemetry events #4942
- Implement TelemetryRetryQueue with persistent storage using VSCode globalState
- Add ResilientTelemetryClient wrapper for automatic retry functionality
- Implement exponential backoff retry strategy with configurable limits
- Add priority handling for critical events (errors, crashes)
- Provide connection status monitoring with visual feedback
- Add VSCode settings for user control of retry behavior
- Include comprehensive test coverage for all components
- Add status bar indicator and user notifications for connection issues
- Support manual retry triggers and queue management commands
- Ensure graceful degradation when retry system fails

Features:
- Persistent queue survives extension restarts and VSCode crashes
- Configurable retry limits, delays, and queue sizes
- High priority events (errors) are processed before normal events
- Batch processing for efficient network usage
- User notifications for prolonged disconnection
- Manual queue management through commands
- Comprehensive documentation and examples
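For readers skimming the commit message, the exponential backoff and priority ordering it lists could look roughly like the sketch below. This is not the PR's code: `QueuedEvent`, `backoffDelayMs`, `sortForProcessing`, and the default delay values are hypothetical names and numbers chosen for illustration.

```typescript
// Hypothetical event shape for illustration; the PR's real types may differ.
type Priority = "high" | "normal"

interface QueuedEvent {
  id: string
  priority: Priority
  retryCount: number
  enqueuedAt: number // epoch milliseconds
}

// Exponential backoff: baseDelayMs * 2^retryCount, capped at maxDelayMs.
// The defaults (1 s base, 5 min cap) are assumptions, not values from the PR.
function backoffDelayMs(retryCount: number, baseDelayMs = 1000, maxDelayMs = 300000): number {
  return Math.min(baseDelayMs * Math.pow(2, retryCount), maxDelayMs)
}

// High-priority events first; within the same priority, oldest first.
// Returns a sorted copy and leaves the input array untouched.
function sortForProcessing(events: QueuedEvent[]): QueuedEvent[] {
  return [...events].sort((a, b) => {
    if (a.priority !== b.priority) {
      return a.priority === "high" ? -1 : 1
    }
    return a.enqueuedAt - b.enqueuedAt
  })
}
```

Capping the delay keeps a long outage from pushing retries arbitrarily far into the future, and sorting a copy avoids mutating whatever array is held in persistent storage.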
@roomote - Looks like you introduced some linter errors: https://github.com/RooCodeInc/Roo-Code/actions/runs/15783134187/job/44493462171?pr=4942 Run |
Thanks for catching that @cte! I'll run |
- Replace 'any' with 'unknown as vscode.ExtensionContext' for mock contexts
- Use vi.mocked() instead of casting to 'any' for mock functions
- Replace 'any' with proper type annotations for function parameters
- Fixes all 16 ESLint warnings about @typescript-eslint/no-explicit-any

Addresses feedback from @cte in PR comment
✅ Linter errors fixed! I've successfully resolved all 16 ESLint warnings about Changes made:
Verification:
The changes maintain the same functionality while providing proper type safety. Ready for review!
This looks solid. I left a couple of comments to hopefully get roomote to polish this a bit.
// Create PostHog client and wrap it with resilient retry functionality
const postHogClient = new PostHogTelemetryClient()
const resilientClient = new ResilientTelemetryClient(postHogClient, context, retryConfig)
The current implementation initializes two separate retry queues, one in extension.ts and another in CloudService.ts. This leads to duplicated retry logic. The ResilientTelemetryClient should be the single source for this functionality.
private startRetryProcessor(): void {
	// Process retry queue every 30 seconds
	this.retryInterval = setInterval(async () => {
The 30-second retry interval is hardcoded. This should be made configurable in package.json to allow for more control over network usage.
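One possible shape for this, sketched under assumptions: the setting key `retryIntervalMs`, the 5-second floor, and the helper names are all hypothetical, and `ConfigReader` is a minimal stand-in whose `get` matches the `get(key, defaultValue)` overload on `vscode.WorkspaceConfiguration`, so the snippet can run outside the extension host.

```typescript
// Minimal stand-in for a settings reader; vscode.WorkspaceConfiguration
// exposes a compatible get(key, defaultValue) overload.
interface ConfigReader {
  get<T>(key: string, defaultValue: T): T
}

const RETRY_INTERVAL_KEY = "retryIntervalMs" // hypothetical setting name
const DEFAULT_RETRY_INTERVAL_MS = 30000 // the currently hardcoded 30 seconds
const MIN_RETRY_INTERVAL_MS = 5000 // hypothetical floor to avoid hammering the network

// Read the interval from settings, falling back to the default for
// missing or non-numeric values and clamping to the minimum.
function resolveRetryIntervalMs(config: ConfigReader): number {
  const raw = config.get<unknown>(RETRY_INTERVAL_KEY, DEFAULT_RETRY_INTERVAL_MS)
  if (typeof raw !== "number" || !Number.isFinite(raw)) {
    return DEFAULT_RETRY_INTERVAL_MS
  }
  return Math.max(raw, MIN_RETRY_INTERVAL_MS)
}

// Start the periodic processor with the resolved interval.
// Returns a function that stops the timer.
function startRetryProcessor(config: ConfigReader, processQueue: () => Promise<void>): () => void {
  const timer = setInterval(() => {
    // Swallow errors so one failed pass does not surface as an unhandled rejection.
    processQueue().catch(() => undefined)
  }, resolveRetryIntervalMs(config))
  return () => clearInterval(timer)
}
```

The setting itself would still need to be declared under `contributes.configuration` in package.json with a matching default; this sketch only covers reading and validating the value.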
private wrappedClient: TelemetryClient
private retryQueue: TelemetryRetryQueue
private context: vscode.ExtensionContext
private isOnline = true
The isOnline property is not used within the class and can be removed.
Summary
This PR implements a comprehensive persistent retry queue system for failed telemetry events in Roo Code Cloud, addressing issue #4940.
Problem Solved
Previously, telemetry events used a "fire and forget" approach where failed events were simply lost due to network issues, server downtime, or connectivity problems. This resulted in:
Solution Implemented
Core Components
TelemetryRetryQueue: Manages persistent storage and retry logic
ResilientTelemetryClient: Wrapper that adds retry functionality to any TelemetryClient
Configuration Settings: VSCode settings for user control
Status Monitoring: Visual feedback and user interaction
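To make the relationship between the first two components concrete, here is a minimal sketch of the wrapper pattern, not the PR's implementation. The class names `SketchRetryQueue` and `SketchResilientClient`, the `TelemetryEvent` and `TelemetryClient` shapes, the storage key, and the queue size limit are assumptions for illustration. `KeyValueStore` mirrors the `get`/`update` pair of VSCode's `Memento` (the type of `globalState`) so the sketch can be exercised without the extension host.

```typescript
// Assumed shapes; the real TelemetryClient interface in the PR may differ.
interface TelemetryEvent {
  name: string
  properties?: Record<string, unknown>
}

interface TelemetryClient {
  capture(event: TelemetryEvent): Promise<void>
}

// Mirrors the get/update pair of vscode.Memento (e.g. context.globalState).
interface KeyValueStore {
  get<T>(key: string, defaultValue: T): T
  update(key: string, value: unknown): Promise<void>
}

const QUEUE_KEY = "sketch.telemetryRetryQueue" // hypothetical storage key

class SketchRetryQueue {
  private store: KeyValueStore
  private maxSize: number

  constructor(store: KeyValueStore, maxSize = 1000) {
    this.store = store
    this.maxSize = maxSize
  }

  // Read the persisted queue; empty if nothing was stored yet.
  all(): TelemetryEvent[] {
    return this.store.get<TelemetryEvent[]>(QUEUE_KEY, [])
  }

  // Append an event, dropping the oldest entries beyond maxSize.
  async enqueue(event: TelemetryEvent): Promise<void> {
    const next = [...this.all(), event].slice(-this.maxSize)
    await this.store.update(QUEUE_KEY, next)
  }

  // Try to send every queued event; keep the ones that still fail.
  // Returns the number of events delivered.
  async flush(send: (event: TelemetryEvent) => Promise<void>): Promise<number> {
    const remaining: TelemetryEvent[] = []
    let sent = 0
    for (const event of this.all()) {
      try {
        await send(event)
        sent++
      } catch {
        remaining.push(event)
      }
    }
    await this.store.update(QUEUE_KEY, remaining)
    return sent
  }
}

class SketchResilientClient implements TelemetryClient {
  private wrapped: TelemetryClient
  private queue: SketchRetryQueue

  constructor(wrapped: TelemetryClient, queue: SketchRetryQueue) {
    this.wrapped = wrapped
    this.queue = queue
  }

  // Send immediately; on failure, persist the event for a later retry.
  async capture(event: TelemetryEvent): Promise<void> {
    try {
      await this.wrapped.capture(event)
    } catch {
      await this.queue.enqueue(event)
    }
  }

  // Manual retry trigger: re-send everything currently queued.
  retryNow(): Promise<number> {
    return this.queue.flush((event) => this.wrapped.capture(event))
  }
}
```

Because the queue lives in the key-value store rather than in memory, a new `SketchRetryQueue` built over the same store after a restart sees the previously failed events. A production version would also need to guard against events being enqueued while `flush` is in progress, which this sketch does not attempt.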
Key Features
Technical Implementation
Configuration
New VSCode settings added:
Testing
Documentation
Acceptance Criteria Met
✅ Telemetry events are never lost due to temporary network issues
✅ Users receive appropriate feedback when telemetry cannot be delivered
✅ Extension performance is not degraded by the queuing system
✅ Queue persists across extension restarts
✅ Retry logic handles various failure scenarios appropriately
✅ Configuration options allow users to control behavior
Breaking Changes
None. This is a backward-compatible enhancement that automatically improves telemetry reliability without requiring any user action.
Related Issues
Fixes #4940