@tomcat323 tomcat323 commented Apr 18, 2025

Problem

The science & service team on the Next Edit Prediction (NEP) project needs initial data for model training and tuning, in order to provide more contextually relevant suggestions.

This requires us to track user edits in the IDE and send the changes (for the currently active file, in unified diff format) to the CodeWhisperer API.
No impact on user experience.
This won't live forever: it will be migrated to Flare by the end of May, but it is needed now for science data collection.

Solution

Key Components

PredictionKeyStrokeHandler: Listens for document changes, maintains shadow copies of visible documents, and processes edits.
PredictionTracker: Manages file snapshots, implementing a policy for storing, retrieving, and pruning snapshots based on age and memory constraints.
DiffGenerator: Creates unified diffs between file snapshots and produces the supplementalContext sent to the API.

How it Works

  • The system keeps shadow copies of the content of files visible in the editor
  • Once an edit is made to a tracked file, it takes a snapshot of the file content as it was before the edit
  • When the Inline API fires, the snapshots of the files currently being edited are used to generate diff context
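
The flow above can be sketched roughly as follows. This is a minimal, editor-independent illustration, not the PR's actual code: `ShadowCopyTracker`, `trackVisible`, `onEdit`, and `snapshotsFor` are hypothetical names, and the real `PredictionKeyStrokeHandler` is driven by VS Code document events rather than direct calls.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the PR's API.
interface Snapshot {
    filePath: string
    content: string // file content as it was before the edit
    timestamp: number // capture time in ms
}

class ShadowCopyTracker {
    private readonly shadowCopies = new Map<string, string>()
    private readonly snapshots: Snapshot[] = []

    // Call when a file becomes visible in an editor.
    trackVisible(filePath: string, content: string): void {
        if (!this.shadowCopies.has(filePath)) {
            this.shadowCopies.set(filePath, content)
        }
    }

    // Call after an edit; newContent is the file content after the edit.
    onEdit(filePath: string, newContent: string, now: number = Date.now()): void {
        const before = this.shadowCopies.get(filePath)
        if (before === undefined) {
            return // file is not tracked
        }
        if (before !== newContent) {
            // The shadow copy still holds the pre-edit content, so snapshot it.
            this.snapshots.push({ filePath, content: before, timestamp: now })
        }
        this.shadowCopies.set(filePath, newContent)
    }

    // Snapshots that would feed diff generation for one file, oldest first.
    snapshotsFor(filePath: string): Snapshot[] {
        return this.snapshots.filter((s) => s.filePath === filePath)
    }
}

const tracker = new ShadowCopyTracker()
tracker.trackVisible('a.ts', 'const x = 1')
tracker.onEdit('a.ts', 'const x = 2', 1000)
```

After the edit, `tracker.snapshotsFor('a.ts')` holds one snapshot with the pre-edit content, which is what a diff against the current file content would be built from.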

Memory management

  • maxTotalSizeKb (default: 5000): Caps the total size of snapshot storage at ~5 MB, purging the oldest snapshots when exceeded.
  • debounceIntervalMs (default: 2000): Prevents excessive snapshots by requiring 2 seconds between captures for the same file.
  • maxAgeMs (default: 30000): Auto-deletes snapshots after 30 seconds to keep only recent history.
  • maxSupplementalContext (default: 15): Limits the supplementalContext sent to the API to at most 15 entries.
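
As a rough illustration of how these settings could interact, here is a minimal sketch. The defaults come from the list above; everything else is an assumption: `TrackerConfig`, `StoredSnapshot`, `shouldCapture`, and `prune` are hypothetical names, size is approximated by string length, and 1 KB is taken as 1024 units. How the 15 supplementalContext entries are selected is not stated in this PR, so it is left out.

```typescript
// Hypothetical sketch of the snapshot policy; not the PR's actual code.
interface TrackerConfig {
    maxTotalSizeKb: number
    debounceIntervalMs: number
    maxAgeMs: number
    maxSupplementalContext: number
}

// Defaults as listed in the PR description.
const defaultConfig: TrackerConfig = {
    maxTotalSizeKb: 5000,
    debounceIntervalMs: 2000,
    maxAgeMs: 30000,
    maxSupplementalContext: 15,
}

interface StoredSnapshot {
    filePath: string
    content: string
    timestamp: number
}

// Debounce: capture only if the last capture for this file is old enough.
function shouldCapture(lastCaptureMs: number | undefined, now: number, config: TrackerConfig): boolean {
    return lastCaptureMs === undefined || now - lastCaptureMs >= config.debounceIntervalMs
}

// Prune by age first, then drop the oldest snapshots until under the size cap.
// Assumption: size is approximated by string length, with 1 KB = 1024 units.
function prune(snapshots: StoredSnapshot[], now: number, config: TrackerConfig): StoredSnapshot[] {
    const kept = snapshots
        .filter((s) => now - s.timestamp <= config.maxAgeMs)
        .sort((a, b) => a.timestamp - b.timestamp)
    let total = kept.reduce((sum, s) => sum + s.content.length, 0)
    const limit = config.maxTotalSizeKb * 1024
    while (kept.length > 0 && total > limit) {
        const oldest = kept.shift()
        total -= oldest ? oldest.content.length : 0
    }
    return kept
}
```

Pruning by age before size means expired snapshots never count against the size cap; whether the PR orders the two checks this way is not confirmed here.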

Changes

  • Added new NextEditPrediction module in the CodeWhisperer package
  • Updated activation code to initialize the NEP system
  • Updated CodeWhisperer inline API requests to fit the new format

  • Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
  • Your code changes must meet the guidelines in CONTRIBUTING.md.
  • License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions

  • This pull request modifies code in src/* but no tests were added/updated.
    • Confirm whether tests should be added or ensure the PR description explains why tests are not required.
  • This pull request implements a feat or fix, so it must include a changelog entry (unless the fix is for an unreleased feature). Review the changelog guidelines.
    • Note: beta or "experiment" features that have active users should announce fixes in the changelog.
    • If this is not a feature or fix, use an appropriate type from the title guidelines. For example, telemetry-only changes should use the telemetry type.

@tomcat323 tomcat323 changed the title from "feat(NEP): Data Instrumentation" to "feat(next edit prediction): Data Instrumentation" Apr 21, 2025
@tomcat323 tomcat323 changed the title from "feat(next edit prediction): Data Instrumentation" to "feat(nep): Data Instrumentation" Apr 21, 2025
@tomcat323 tomcat323 marked this pull request as ready for review April 22, 2025 16:31
@tomcat323 tomcat323 requested review from a team as code owners April 22, 2025 16:31
@@ -0,0 +1,118 @@
/*!
Contributor

Should these NEP modules live in the same dir as the other "trackers"?

https://github.com/aws/aws-toolkit-vscode/tree/master/packages/core/src/codewhisperer/tracker

Are they really not sharing any concepts at all? This PR is all new code.

@tomcat323 tomcat323 Apr 22, 2025

IMO they don't share concepts: the existing tracker collects statistical metrics for telemetry, while the NEP tracker tracks content changes. Their hyperparameters are also very different.

Since the new tracker is populating the supplementalContext field, maybe we can also consider https://github.com/aws/aws-toolkit-vscode/tree/master/packages/core/src/codewhisperer/util/supplementalContext?

@opieter-aws opieter-aws left a comment

Is cleanup of snapshots not needed when the extension is closed?

let localize: nls.LocalizeFunc

export async function activate(context: ExtContext): Promise<void> {
    activateNextEditPrediction(context)

@nkomonen-amazon nkomonen-amazon Apr 24, 2025

  • I would move this to the bottom of the function, since the beginning is for more general activation (e.g. auth).

  • Also, I think it is better to have this in activateAmazonQCommon(). We want to move as much Q-specific code into there as we can, so that we reduce our dependency on core. You may need to export this in one of the index.ts files; see amazonQDiffScheme and how it is exported.

@dk19y dk19y

nit: Let's scope the naming to EditTracking

Contributor Author

I'm not sure this needs its own index.ts exported to QCommon, since it's not a stand-alone feature but rather an addition to the existing CodeWhisperer API flow. I will move it to the bottom of the CodeWhisperer activation.

        predictionSupplementalContext = await predictionTracker.generatePredictionSupplementalContext()
    }
} catch (error) {
    getLogger().error(`Error getting prediction supplemental context: ${error}`)
Contributor

You have the final call, but is it dangerous to swallow the error here? If a user is having issues we will never have any telemetry for this.

Contributor Author

It's not dangerous: we are collecting initial data, and no issues were found in the bug bash. Telemetry will come later in this project.
The main thing is that if anything goes wrong, it doesn't crash the inline completions flow, hence we swallow it here.
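
The "log and fall back" behavior described here can be sketched as a small wrapper. This is only an illustration under assumptions: `getContextOrEmpty` and its parameters are hypothetical, and the PR itself uses an inline try/catch around `generatePredictionSupplementalContext()` with `getLogger().error(...)`.

```typescript
// Hypothetical helper; the name and signature are assumptions for illustration.
async function getContextOrEmpty<T>(
    generate: () => Promise<T[]>,
    logError: (message: string) => void
): Promise<T[]> {
    try {
        return await generate()
    } catch (error) {
        // Log and fall back so the inline completion request can still go out.
        logError(`Error getting prediction supplemental context: ${error}`)
        return []
    }
}
```

A failure in context generation then yields an empty context and a log entry, and the caller proceeds with the inline completion request as if no edit history were available.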

@dk19y dk19y left a comment

Given that this codebase will eventually move to Flare, can we do a minimal refactor to make the migration straightforward?

APIs which may not have a direct mapping

| VS Code API | LSP Equivalent | Notes |
| --- | --- | --- |
| vscode.window.onDidChangeVisibleTextEditors | No direct equivalent in LSP | LSP focuses on opened/closed documents (didOpen, didClose), not "visibility". You may need to track didOpen/didClose and assume editors are visible. |
| vscode.workspace.onDidChangeTextDocument | textDocument/didChange (notification) | LSP server receives textDocument/didChange to reflect edits in documents. |
| vscode.window.visibleTextEditors | No direct equivalent in LSP | LSP does not track visible editors. It only tracks open documents via didOpen, didClose, didChange. |
| vscode.window.activeTextEditor | No direct equivalent in LSP | LSP does not track which document is active. It only manages documents' content and lifecycles. (The client, e.g. VS Code, knows which one is active.) |

Also, let's not use fs APIs, and rely on message passing as much as we can.
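
One way the didOpen/didClose suggestion could look on the server side is sketched below. It is a minimal illustration under assumptions: the param interfaces are simplified local stand-ins for the LSP notification payloads (a real server would use the types from its LSP library), full-document sync is assumed so each change carries the whole new text, and `OpenDocumentTracker` is a hypothetical name.

```typescript
// Simplified local stand-ins for LSP notification params (assumption:
// full-document sync, so each content change carries the whole new text).
interface DidOpenParams {
    textDocument: { uri: string; text: string }
}
interface DidChangeParams {
    textDocument: { uri: string }
    contentChanges: { text: string }[]
}
interface DidCloseParams {
    textDocument: { uri: string }
}

// Hypothetical tracker: treats "open" as a proxy for "visible".
class OpenDocumentTracker {
    private readonly contents = new Map<string, string>()

    didOpen(params: DidOpenParams): void {
        this.contents.set(params.textDocument.uri, params.textDocument.text)
    }

    // Returns the content before the change, i.e. the snapshot candidate.
    didChange(params: DidChangeParams): string | undefined {
        const uri = params.textDocument.uri
        const before = this.contents.get(uri)
        const last = params.contentChanges[params.contentChanges.length - 1]
        if (before === undefined || last === undefined) {
            return undefined // unknown document or empty change list
        }
        this.contents.set(uri, last.text)
        return before
    }

    didClose(params: DidCloseParams): void {
        this.contents.delete(params.textDocument.uri)
    }

    isTracked(uri: string): boolean {
        return this.contents.has(uri)
    }
}
```

Because document content arrives through notifications, this approach needs no fs reads, which fits the message-passing preference above.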

@tomcat323

/runIntegrationTests

@jpinkney-aws jpinkney-aws merged commit f1f5a36 into aws:master May 5, 2025
42 of 46 checks passed
jpinkney-aws pushed a commit to jpinkney-aws/aws-toolkit-vscode that referenced this pull request May 6, 2025