-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: implement persistent telemetry event queue with multi-instance support #6576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…support - Add FIFO queue structure that maintains event order - Implement persistence using VS Code's globalState API - Events only removed after successful cloud transmission - Automatic processing when new events are added - 1MB storage limit with automatic cleanup of oldest events - Multi-instance coordination with distributed locking - Prevent concurrent queue processing across VS Code instances - Add diagnostic commands for queue monitoring - Comprehensive test coverage (38 tests) Ensures reliable telemetry delivery even during network outages and prevents duplicate processing when multiple instances are running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a comprehensive persistent telemetry event queue system with sophisticated multi-instance coordination to ensure reliable delivery of telemetry events even during network outages or VS Code crashes. The system maintains strict FIFO processing order and prevents duplicate event processing when multiple VS Code instances are running.
- Core Queue Implementation: FIFO queue with persistence, confirmation-based removal, and 1MB storage limit with automatic cleanup
- Multi-Instance Coordination: Distributed locking with instance identification, lock expiration handling, and configurable modes (compete/leader/disabled)
- Diagnostic Commands: Three new commands for monitoring queue status, manual processing, and clearing events
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/extension.ts | Adds queue processing on extension startup to handle events from previous sessions |
| src/activate/registerCommands.ts | Implements three diagnostic commands for queue management and status monitoring |
| packages/types/src/vscode.ts | Adds command IDs for the new telemetry queue diagnostic commands |
| packages/cloud/src/queue/types.ts | Defines comprehensive type system for queue operations and multi-instance coordination |
| packages/cloud/src/queue/index.ts | Exports all queue-related classes and types for external consumption |
| packages/cloud/src/queue/TelemetryEventQueue.ts | Core queue implementation with FIFO processing, retry logic, and multi-instance support |
| packages/cloud/src/queue/QueuedTelemetryClient.ts | Telemetry client wrapper that intercepts events and queues them for reliable delivery |
| packages/cloud/src/queue/GlobalStateQueueStorage.ts | Persistent storage implementation using VS Code globalState with atomic operations |
| packages/cloud/src/queue/CloudQueueProcessor.ts | Handles cloud transmission of queued events with retry logic for different error types |
| packages/cloud/src/index.ts | Updates package exports to include queue functionality |
| packages/cloud/src/CloudService.ts | Integrates queued telemetry client and provides queue management methods |
Comments suppressed due to low confidence (1)
packages/cloud/src/queue/QueuedTelemetryClient.ts:61
- Using bracket notation to access private property
multiInstanceConfigbreaks encapsulation. Consider adding a public getter method to GlobalStateQueueStorage for accessing this configuration.
const interval = this.storage["multiInstanceConfig"].lockCheckIntervalMs || 5000
| } | ||
|
|
||
| // Get queue status | ||
| const clients = (TelemetryService.instance as any).clients || [] |
Copilot
AI
Aug 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using type assertion as any bypasses TypeScript's type safety. Consider defining proper types or interfaces for TelemetryService to access the clients property safely.
| const clients = (TelemetryService.instance as any).clients || [] | |
| const clients = | |
| (TelemetryService.instance as TelemetryServiceWithClients).clients || []; |
| const clients = (TelemetryService.instance as any).clients || [] | ||
| const queuedClient = clients.find((c: any) => c instanceof QueuedTelemetryClient) as | ||
| | QueuedTelemetryClient | ||
| | undefined |
Copilot
AI
Aug 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code pattern is duplicated in multiple command handlers. Consider extracting a helper function getQueuedTelemetryClient() to reduce code duplication and improve maintainability.
| const clients = (TelemetryService.instance as any).clients || [] | |
| const queuedClient = clients.find((c: any) => c instanceof QueuedTelemetryClient) as | |
| | QueuedTelemetryClient | |
| | undefined | |
| const queuedClient = getQueuedTelemetryClient() |
| multiInstanceEnabled: this.storage["multiInstanceConfig"].enabled, | ||
| multiInstanceMode: this.storage["multiInstanceConfig"].mode, |
Copilot
AI
Aug 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bracket notation to access private property multiInstanceConfig breaks encapsulation. This pattern appears multiple times and should use a proper getter method instead.
| multiInstanceEnabled: this.storage["multiInstanceConfig"].enabled, | |
| multiInstanceMode: this.storage["multiInstanceConfig"].mode, | |
| multiInstanceEnabled: this.storage.getMultiInstanceConfig().enabled, | |
| multiInstanceMode: this.storage.getMultiInstanceConfig().mode, |
| const sizeInBytes = this.calculateSize(events) | ||
| if (sizeInBytes > this.maxStorageSize) { | ||
| // Remove oldest events until we're under the limit | ||
| let removedCount = 0 | ||
| while (events.length > 0 && this.calculateSize(events) > this.maxStorageSize) { | ||
| events.shift() // Remove oldest event (FIFO) | ||
| removedCount++ |
Copilot
AI
Aug 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculateSize method is called in each loop iteration, which involves JSON.stringify and TextEncoder operations. Consider calculating size once and subtracting the size of removed events to improve performance.
| const sizeInBytes = this.calculateSize(events) | |
| if (sizeInBytes > this.maxStorageSize) { | |
| // Remove oldest events until we're under the limit | |
| let removedCount = 0 | |
| while (events.length > 0 && this.calculateSize(events) > this.maxStorageSize) { | |
| events.shift() // Remove oldest event (FIFO) | |
| removedCount++ | |
| let sizeInBytes = this.calculateSize(events) | |
| if (sizeInBytes > this.maxStorageSize) { | |
| // Remove oldest events until we're under the limit | |
| let removedCount = 0 | |
| while (events.length > 0 && sizeInBytes > this.maxStorageSize) { | |
| const removedEvent = events.shift() // Remove oldest event (FIFO) | |
| removedCount++ | |
| if (removedEvent) { | |
| sizeInBytes -= this.calculateEventSize(removedEvent) | |
| } |
| clearInterval(this.lockReleaseTimer) | ||
| } | ||
|
|
||
| const renewalInterval = Math.max((this.options.multiInstance?.lockDurationMs || 30000) / 3, 5000) |
Copilot
AI
Aug 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magic numbers /3 and 5000 should be extracted as named constants to improve code readability and maintainability. Consider defining MIN_RENEWAL_INTERVAL_MS and LOCK_RENEWAL_DIVISOR.
| const renewalInterval = Math.max((this.options.multiInstance?.lockDurationMs || 30000) / 3, 5000) | |
| const renewalInterval = Math.max( | |
| (this.options.multiInstance?.lockDurationMs || 30000) / LOCK_RENEWAL_DIVISOR, | |
| MIN_RENEWAL_INTERVAL_MS | |
| ) |
| * Set up periodic queue processing for leader mode | ||
| */ | ||
| private setupPeriodicProcessing(): void { | ||
| const interval = this.storage["multiInstanceConfig"].lockCheckIntervalMs || 5000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using bracket notation to access the internal multi-instance config (e.g. this.storage["multiInstanceConfig"]). Consider exposing it via a typed getter for better readability and type safety.
|
|
||
| // Try to acquire lock if multi-instance is enabled | ||
| let hasLock = false | ||
| if (isMultiInstanceEnabled && this.storage instanceof GlobalStateQueueStorage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code uses 'instanceof GlobalStateQueueStorage' to determine multi-instance locking. Consider abstracting multi-instance behavior behind an interface to avoid direct type checks.
| } | ||
|
|
||
| // Get queue status | ||
| const clients = (TelemetryService.instance as any).clients || [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid casting TelemetryService.instance to 'any' to access 'clients'. Consider exposing a proper API to retrieve the QueuedTelemetryClient from TelemetryService.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution! I've reviewed the changes and found several issues that need attention. The implementation of the persistent telemetry queue with multi-instance support is comprehensive, but there are some critical issues around encapsulation and potential memory leaks that should be addressed.
| * Set up periodic queue processing for leader mode | ||
| */ | ||
| private setupPeriodicProcessing(): void { | ||
| const interval = this.storage["multiInstanceConfig"].lockCheckIntervalMs || 5000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bracket notation to access private property breaks encapsulation. Consider adding a public getter method to GlobalStateQueueStorage for accessing this configuration:
| ...status, | ||
| instanceInfo: { | ||
| ...instanceInfo, | ||
| multiInstanceEnabled: this.storage["multiInstanceConfig"].enabled, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same encapsulation issue here. This should use a proper getter method instead of accessing private properties.
| console.error(`[CloudQueueProcessor] Failed to process event ${event.id}:`, errorMessage) | ||
|
|
||
| // Store error for debugging | ||
| event.lastError = errorMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutating the event parameter by setting could cause memory issues if errors accumulate. Consider returning an updated event object instead:
| * Atomic compare-and-swap for lock | ||
| */ | ||
| private async compareAndSwapLock(expectedLock: QueueLock | null, newLock: QueueLock): Promise<boolean> { | ||
| // VS Code's globalState doesn't provide true CAS, so we implement optimistic locking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment acknowledges a race condition window. Consider adding a version/generation number to locks to make the compare-and-swap truly atomic:
| } | ||
|
|
||
| // Get queue status | ||
| const clients = (TelemetryService.instance as any).clients || [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing private property is fragile. Consider adding a public method to TelemetryService for finding clients by type, or handle the case where this property might not exist.
| // Lifecycle | ||
|
|
||
| public dispose(): void { | ||
| public async dispose(): Promise<void> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VS Code's Disposable interface expects synchronous disposal, but this is now async. Consider handling the async cleanup differently, perhaps with a separate shutdown method that's called before dispose.
| this.telemetryClient = new TelemetryClient(this.authService, this.settingsService) | ||
|
|
||
| // Configure multi-instance behavior | ||
| const multiInstanceConfig: MultiInstanceConfig = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider validating the multiInstanceConfig values (e.g., ensuring durations are positive, mode is valid) before using them.
- Add public getter method getMultiInstanceConfig() to GlobalStateQueueStorage to fix encapsulation issue - Update QueuedTelemetryClient to use the new getter method instead of accessing private property - Add mock for QueuedTelemetryClient in CloudService tests to fix failing unit tests - All tests now passing on both Ubuntu and Windows platforms
- Remove debug, info, and warn logging from telemetry queue system - Keep only error logging for actual failures - This reduces log noise in production while still capturing important errors - Fix ESLint warnings for unused variables
…per queue retry - TelemetryClient now throws errors when authentication fails or session token is missing - This allows the queue processor to properly identify failed sends and retry them later - Fixes issue where telemetry events were being discarded when in inactive-session state - Events will now be queued and retried when authentication is restored
Important Update on PR StatusI've analyzed this PR and discovered a critical architectural change that affects its viability: The IssueThe directory where all the telemetry queue implementation resides has been removed from the repository in commit a1439c1 (merged on Aug 3, 2025). The cloud functionality has been moved to an external npm package . What This Means
Recommended Next Steps
I apologize for the confusion. This is an unfortunate timing issue where significant architectural changes occurred after this PR was created. The feature itself looks well-implemented, but it needs to be relocated to the appropriate repository. |
Summary
Closes: #4940
This PR implements a persistent telemetry event queue system that ensures reliable delivery of telemetry events even during network outages or VS Code crashes. The system includes sophisticated multi-instance coordination to prevent duplicate processing when multiple VS Code instances are running.
Key Features
Core Queue Implementation
Multi-Instance Support
compete(default): All instances compete for processingleader: Leader election for single-instance processingdisabled: No coordination for backward compatibilityDiagnostic Commands
Added three new commands for monitoring and managing the queue:
telemetryQueueStatus: View queue status (count, size, failed events)telemetryQueueProcess: Manually trigger queue processingtelemetryQueueClear: Clear all queued eventsImplementation Details
New Components
TelemetryEventQueue: Core FIFO queue with retry logicGlobalStateQueueStorage: Persistent storage with multi-instance lockingCloudQueueProcessor: Handles cloud transmissionQueuedTelemetryClient: Wrapper that intercepts telemetry eventsIntegration
CloudServiceto useQueuedTelemetryClientextension.tsregisterCommands.tsTesting
All tests are passing.
Benefits
Configuration
The system can be configured via environment variables or options:
This ensures reliable telemetry delivery for better product insights and debugging capabilities.
Important
Implements a persistent telemetry event queue with multi-instance support, ensuring reliable event delivery and adding diagnostic commands for management.
QueuedTelemetryClientwith FIFO structure and persistence usingglobalStateAPI.compete,leader,disabledfor different coordination strategies.telemetryQueueStatus,telemetryQueueProcess, andtelemetryQueueClearcommands for queue management.CloudServiceto useQueuedTelemetryClient.extension.ts.registerCommands.ts.This description was created by
for b1806e2. You can customize this summary. It will automatically update as commits are pushed.