Incremental Parsing #2387
Conversation
Remove the condition that the file has to be in the `consideredFilesList` of the files context, as merely parsing a file does not add it to the `consideredFilesList`.
Rework the incremental parsing handoff so that (1) invalidation no longer prepares incremental reparses only for changed files while forcing unchanged files through a full parse again, and (2) invalidation no longer computes eager reparse metadata inside FlowrAnalyzerCache.

Previously, a mixed update such as "A changed, B unchanged" behaved like this:
- reparse info was generated for A
- no reparse info was generated for B
- the cache pipeline was rebuilt
- A was reparsed incrementally
- B was parsed from scratch

The new architecture fixes that by treating the previous successful parse run as the baseline for the next one:
- File invalidation events now record the file path together with the old source text in the incremental analysis context.
- FlowrAnalyzerCache snapshots the latest completed Tree-sitter parse results after successful parse-oriented runs.
- On the next parse request, TreeSitterExecutor derives reparse info lazily from:
  - the old parse tree
  - the old source text, if the file was invalidated
  - the current file content
- Changed files get a minimal edit region and are reparsed incrementally.
- Unchanged files now reuse their previous parse tree directly instead of being parsed again from scratch.
- Parser call sites, invalidation plumbing, and documentation were updated to support this context-driven flow.

Net effect: incremental parsing now correctly handles mixed workloads by incrementally reparsing changed files while reusing old parse results for unchanged files.
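The per-file decision described above can be sketched as follows. All names here (`PrevParseSnapshot`, `ReparseDecision`, `deriveReparse`) are hypothetical stand-ins for illustration, not the actual flowR API:

```typescript
// Hypothetical sketch of the lazy reparse decision, not the actual flowR API.
// A "snapshot" pairs a file's previous parse tree with its previous source text.
interface PrevParseSnapshot<Tree> {
	tree:    Tree;
	oldText: string;
}

type ReparseDecision<Tree> =
	| { kind: 'reuse';       tree: Tree }                                   // unchanged: keep the old tree
	| { kind: 'incremental'; tree: Tree; oldText: string; newText: string } // changed: edit + incremental reparse
	| { kind: 'full';        newText: string };                             // no baseline: parse from scratch

function deriveReparse<Tree>(
	snapshot:    PrevParseSnapshot<Tree> | undefined,
	invalidated: boolean,
	newText:     string
): ReparseDecision<Tree> {
	if(snapshot === undefined) {
		// first run or the snapshot was dropped: full parse
		return { kind: 'full', newText };
	} else if(!invalidated || snapshot.oldText === newText) {
		// unchanged file: reuse the previous tree directly
		return { kind: 'reuse', tree: snapshot.tree };
	}
	// changed file: derive an edit from old vs. new text and reparse incrementally
	return { kind: 'incremental', tree: snapshot.tree, oldText: snapshot.oldText, newText };
}
```

The key property is that the decision is made lazily at parse time from the snapshot, so unchanged files never lose their baseline just because a sibling file changed.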
Adapt the incremental parsing tests to the new architecture, in which incremental state is derived lazily from the previous successful parse run instead of from eagerly stored reparse info. The tests no longer inspect the removed reparseInfoMap. Instead they now:
- capture the previous Tree-sitter trees after the first analysis run
- invalidate files by updating their content
- verify that invalidation clears the current parse pipeline
- trace the second parse run to observe which previous tree is reused for which file
- assert that changed files use their own previous tree for incremental reparsing
- assert that unchanged files reuse their previous tree directly without reparsing

This specifically covers the mixed case that motivated the refactor: when one file changes and another does not, the changed file must be reparsed incrementally and the unchanged file must keep its old parse tree rather than being parsed from scratch.
Group incremental parsing tests by one vs multiple update sets, remove the old broad successive-state cases, and add focused multi-step edge cases for single-file and multi-file analyzer reuse.
Pull request overview
This PR introduces incremental parsing support (Tree-sitter) by propagating file invalidation events through the analyzer/cache and reusing prior parse trees + minimal edit regions, with new tests validating correctness against full reparses.
Changes:
- Add file-level invalidation events and an incremental analysis context to carry old file contents + previous parse trees between runs.
- Update parser plumbing to pass analyzer context + filePath into parse calls, enabling Tree-sitter incremental reparsing.
- Add unit + integration tests to verify edit-region computation and incremental parsing equivalence to full parsing.
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/functionality/incremental/incremental-parsing.test.ts | New end-to-end incremental parsing scenarios (tree reuse + result equivalence vs full parse). |
| test/functionality/incremental/edit-computation.test.ts | Unit tests for minimal edit-region computation. |
| src/r-bridge/shell.ts | Mark RShell as non-incremental. |
| src/r-bridge/shell-executor.ts | Mark RShellExecutor as non-incremental. |
| src/r-bridge/parser.ts | Extend parser interface to accept context + optional filePath; pass context through parseRequests. |
| src/r-bridge/lang-4.x/tree-sitter/tree-sitter-executor.ts | Implement incremental Tree-sitter parsing via previous tree reuse + computed edit region. |
| src/project/plugins/file-plugins/files/flowr-sweave-file.ts | Adjust FlowrFile base type usage for Sweave wrapper. |
| src/project/plugins/file-plugins/files/flowr-rmarkdown-file.ts | Adjust FlowrFile base type usage for Rmd wrapper. |
| src/project/plugins/file-plugins/files/flowr-jupyter-file.ts | Adjust FlowrFile base type usage for Jupyter wrapper. |
| src/project/incremental/incremental-parse/incremental-parse.ts | Compute incremental reparse info from context (previous tree + old/new content). |
| src/project/incremental/incremental-parse/edit-computation.ts | Compute a minimal Tree-sitter Edit region from old/new text. |
| src/project/flowr-analyzer.ts | Add invalidation event receiver; route parseStandalone through ctx-aware parser APIs. |
| src/project/flowr-analyzer-builder.ts | Add documentation for builder configure. |
| src/project/context/flowr-file.ts | Add invalidation callbacks + inline content update that triggers invalidation. |
| src/project/context/flowr-analyzer-meta-context.ts | Make meta context respond to invalidation events. |
| src/project/context/flowr-analyzer-incremental-analysis-context.ts | New context to store old contents + prior parse trees for incremental runs. |
| src/project/context/flowr-analyzer-files-context.ts | Register file invalidation handlers; expose getFile; handle invalidation events. |
| src/project/context/flowr-analyzer-dependencies-context.ts | Make deps context respond to invalidation events. |
| src/project/context/flowr-analyzer-context.ts | Wire incremental context into the analyzer context; propagate invalidation events; reset via Full invalidation. |
| src/project/cache/flowr-cache.ts | Generalize invalidation events (Full + FileInvalidate) and propagate to dependents. |
| src/project/cache/flowr-analyzer-cache.ts | Store parse snapshots for incremental reuse; handle new invalidation event types. |
| src/documentation/wiki-analyzer.ts | Document incremental analysis context + incremental parsing behavior. |
| src/dataflow/internal/process/functions/call/built-in/built-in-source.ts | Pass analyzer context into parser.parse calls. |
| src/config.ts | Add `this: void` to FlowrConfig helper methods to prevent accidental `this` binding. |
| package.json | Add diff dependency. |
| package-lock.json | Update lock entries for diff. |
Comments suppressed due to low confidence (3)
src/project/cache/flowr-analyzer-cache.ts:164
The `parse`/`normalize`/`dataflow` methods capture `const d = this.get()` before calling `runTapeUntil(force, ...)`. If `force` is true, `runTapeUntil` resets/recreates the pipeline, so `d` refers to the old pipeline output and the `until` predicate will never observe the new results (or may return stale results). Consider moving the `get()` call inside `runTapeUntil`/`until` (e.g., `() => this.get().parse`) or performing the reset before capturing the output object.
public async parse(force?: boolean): Promise<NonNullable<AnalyzerCacheType<Parser>['parse']>> {
const d = this.get();
return this.runTapeUntil(force, () => d.parse);
}
/**
* Get the parse output for the request if already available, otherwise return `undefined`.
* This will not trigger a new parse.
* @see {@link FlowrAnalyzerCache#parse} - to get the parse output, parsing if necessary.
*/
public peekParse(): NonNullable<AnalyzerCacheType<Parser>['parse']> | undefined {
return this.get()?.parse;
}
/**
* Get the normalized abstract syntax tree for the request, normalizing if necessary.
* @param force - Do not use the cache, instead force new analyses.
* @see {@link FlowrAnalyzerCache#peekNormalize} - to get the normalized AST if already available without triggering a new normalization.
*/
public async normalize(force?: boolean): Promise<NonNullable<AnalyzerCacheType<Parser>['normalize']>> {
const d = this.get();
return this.runTapeUntil(force, () => d.normalize);
}
/**
* Get the normalized abstract syntax tree for the request if already available, otherwise return `undefined`.
* This will not trigger a new normalization.
* @see {@link FlowrAnalyzerCache#normalize} - to get the normalized AST, normalizing if necessary.
*/
public peekNormalize(): NonNullable<AnalyzerCacheType<Parser>['normalize']> | undefined {
return this.get()?.normalize;
}
/**
* Get the dataflow graph for the request, computing if necessary.
* @param force - Do not use the cache, instead force new analyses.
* @see {@link FlowrAnalyzerCache#peekDataflow} - to get the dataflow graph if already available without triggering a new computation.
*/
public async dataflow(force?: boolean): Promise<NonNullable<AnalyzerCacheType<Parser>['dataflow']>> {
const d = this.get();
return this.runTapeUntil(force, () => d.dataflow);
}
src/project/context/flowr-analyzer-files-context.ts:246
`addFile` registers an `onInvalidate` handler unconditionally. If the same `FlowrFileProvider` instance is added again (allowed when `exist === f`), this will attach duplicate handlers and cause multiple invalidation cascades per edit. Consider only registering the handler when the file is first added, or make `addOnInvalidate` idempotent (e.g., store handlers in a `Set`).
public addFile(file: string | FlowrFileProvider | RParseRequestFromFile, roles?: readonly FileRole[]) {
const f = this.fileLoadPlugins(wrapFile(file, roles));
f.addOnInvalidate(c => {
this.context.analyzer?.receive(c);
});
if(f.path() === FlowrFile.INLINE_PATH) {
this.inlineFiles.push(f);
} else {
const exist = this.files.get(f.path());
guard(exist === undefined || exist === f, `File ${f.path()} already added to the context.`);
this.files.set(f.path(), f);
}
package.json:214
The PR adds the `diff` dependency, but there are no imports/usages of `diff` in the repository. If it is no longer needed, please remove it from `dependencies` (and the lockfile); otherwise, consider adding the usage in this PR so the new dependency is justified.
"dependencies": {
"@eagleoutice/tree-sitter-r": "^1.1.2",
"@jupyterlab/nbformat": "^4.5.4",
"@xmldom/xmldom": "^0.9.7",
"clipboardy": "^4.0.0",
"command-line-args": "^6.0.1",
"command-line-usage": "^7.0.3",
"commonmark": "^0.31.2",
"dagre": "^0.8.5",
"diff": "^8.0.3",
"gray-matter": "^4.0.3",
"joi": "^18.0.1",
"lz-string": "^1.5.0",
"n-readlines": "^1.0.3",
public parseStandalone(data: `${typeof fileProtocol}${string}` | string | RParseRequest): Tree {
	console.log('parseStandalone');
	const request = isParseRequest(data) ? data : requestFromInput(data);
	if(this.parser.name === 'tree-sitter') {
Leftover console.log('parseStandalone') will print on every standalone parse (including library consumers) and is inconsistent with the rest of the codebase’s logging. Please remove it or use the existing logger infrastructure if you need trace output.
export function computeEditRegion(oldContent: string, newContent: string): Parser.Edit {
	const oldLen = oldContent.length;
	const newLen = newContent.length;

	// 1) Longest common prefix
	let startIndex = 0;
	while(
		startIndex < oldLen &&
		startIndex < newLen &&
		oldContent[startIndex] === newContent[startIndex]
	) {
		startIndex++;
	}

	// 2) Longest common suffix, without overlapping the prefix
	let oldSuffixIndex = oldLen;
	let newSuffixIndex = newLen;
	while(
		oldSuffixIndex > startIndex &&
		newSuffixIndex > startIndex &&
		oldContent[oldSuffixIndex - 1] === newContent[newSuffixIndex - 1]
	) {
		oldSuffixIndex--;
		newSuffixIndex--;
	}

	const oldEndIndex = oldSuffixIndex;
	const newEndIndex = newSuffixIndex;

	return {
		startIndex,
		oldEndIndex,
		newEndIndex,
		startPosition:  indexToPoint(oldContent, startIndex),
		oldEndPosition: indexToPoint(oldContent, oldEndIndex),
		newEndPosition: indexToPoint(newContent, newEndIndex),
	};
}
computeEditRegion calls indexToPoint three times, and indexToPoint scans from the start of the text each time. For larger files this makes edit computation O(n) with a larger constant factor on every incremental parse. Consider computing row/column positions in a single pass up to max(startIndex, oldEndIndex, newEndIndex) or caching newline offsets to avoid repeated full scans.
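A common way to avoid the repeated full scans mentioned above is to precompute line-start offsets once per text and binary-search them per query. This is a sketch under the assumption that a row/column pair is what `indexToPoint` produces; the PR's actual signature may differ, and depending on the tree-sitter binding, `Point` columns may be measured in bytes rather than characters:

```typescript
// Precompute the start offset of every line once, then map any index to a
// { row, column } point via binary search instead of rescanning the text.
function lineStarts(text: string): number[] {
	const starts = [0];
	for(let i = 0; i < text.length; i++) {
		if(text[i] === '\n') {
			starts.push(i + 1);
		}
	}
	return starts;
}

function indexToPointCached(starts: number[], index: number): { row: number; column: number } {
	// binary search for the last line start that is <= index
	let lo = 0, hi = starts.length - 1;
	while(lo < hi) {
		const mid = Math.ceil((lo + hi) / 2);
		if(starts[mid] <= index) {
			lo = mid;
		} else {
			hi = mid - 1;
		}
	}
	return { row: lo, column: index - starts[lo] };
}
```

With this scheme, `computeEditRegion` pays one O(n) pass to build the offsets and then O(log n) per point lookup, instead of three independent scans.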
export class FlowrSweaveFile extends FlowrFile {
	private readonly wrapped: FlowrFileProvider<string>;
	private data?: SweaveInfo;
This change drops the explicit FlowrFile<string> generic and weakens the content() return type to StringableContent, which reduces type safety for callers. If the content is always textual here, consider keeping extends FlowrFile<string> for a more precise API.
export class FlowrRMarkdownFile extends FlowrFile {
	private data?: RmdInfo;
	private readonly wrapped: FlowrFileProvider<string>;
This change drops the explicit FlowrFile<string> generic and weakens the content() return type to StringableContent, which reduces type safety for callers. If the content is always textual here, consider keeping extends FlowrFile<string> for a more precise API.
export class FlowrJupyterFile extends FlowrFile {
	private readonly wrapped: FlowrFileProvider<string>;
This change drops the explicit FlowrFile<string> generic and weakens the content() return type to StringableContent, which reduces type safety for callers. If the content is always textual here, consider keeping extends FlowrFile<string> for a more precise API.
No description provided.