
feat(fsharp): route .fsi files through the dedicated signature grammar #1162

Merged
carlos-alm merged 14 commits into main from fix/1114-fsharp-signature-grammar
May 20, 2026

Conversation

@carlos-alm
Contributor

@carlos-alm carlos-alm commented May 19, 2026

Summary

  • Adds a dedicated fsharp-signature language id mapped to LANGUAGE_SIGNATURE (native) and tree-sitter-fsharp_signature.wasm (WASM), so .fsi files no longer go through the .fs source grammar.
  • Extends the shared F# extractor with a value_definition handler that fires only when the first child is the val keyword, distinguishing signature val foo : type from source let foo = ....
  • WASM build script now produces the signature grammar; AST rules registry adds the fsharp-signature id sharing the F# string config.
  • Function vs variable kind detection accepts both function_type (WASM npm 0.1.0) and curried_spec (cargo 0.3.0) shapes so the two engines stay in parity despite a grammar version skew (#1161, "F# tree-sitter grammar version skew: npm 0.1.0 vs cargo 0.3.0", tracks the bump). Since merged: #1165, "chore(fsharp): align npm grammar with cargo at v0.3.0" (stacked on this branch), bumped the npm grammar to v0.3.0 and removed the dual-shape detection. A later commit on this branch (c12bb60) restored that detection in the TypeScript extractor, because the npm and cargo 0.3.0 grammars still emit different node shapes.

Test plan

  • cargo test --package codegraph-core extractors::fsharp (3 Rust unit tests: signature val extraction, bare val extraction, source .fs parity guard)
  • npx vitest run tests/parsers/fsharp-signature.test.ts (4 WASM tests: clean parse, bare val, nested module, error recovery)
  • npx vitest run tests/parsers/fsharp.test.ts (5 existing .fs tests still green — no regression)
  • cargo test --package codegraph-core parser_registry (ABI check covers LANGUAGE_SIGNATURE)
  • Full WASM test suite green (the 2 pre-existing failures in tests/unit/snapshot.test.ts are tracked under #1117, "test: snapshot concurrent save tests fail locally on main")
  • npm run lint clean on changed files

Closes #1114
Closes #1161

The tree-sitter-fsharp package ships two distinct grammars: LANGUAGE_FSHARP
for .fs / .fsx source files and LANGUAGE_SIGNATURE for .fsi signature
files. Both engines previously routed all three extensions through the
source grammar, so bare `val` declarations in .fsi files surfaced as
ERROR nodes and yielded no symbols.

This change adds a separate `fsharp-signature` language for .fsi:

* native: new `FSharpSignature` LanguageKind wired to LANGUAGE_SIGNATURE
* WASM: new `fsharp-signature` registry entry using
  tree-sitter-fsharp_signature.wasm (build script now produces it)
* shared F# extractor handles `value_definition` only when its first
  child is the `val` keyword, distinguishing signature `val foo : type`
  from source `let foo = ...`
* function vs variable kind is inferred from the type shape; supports
  both `function_type` (WASM npm 0.1.0) and `curried_spec` (cargo 0.3.0)
  node shapes for engine parity
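
The routing described above can be sketched as a small extension lookup. This is a minimal illustration, not the project's registry code: the `GrammarEntry` shape and the `resolveFSharpEntry` helper are hypothetical, and only the language ids, extensions, and WASM grammar file names are taken from this PR.

```typescript
// Hypothetical registry shape. Only the language ids, extensions, and grammar
// file names are taken from the PR; everything else is illustrative.
type FSharpLanguageId = 'fsharp' | 'fsharp-signature';

interface GrammarEntry {
  id: FSharpLanguageId;
  extensions: string[];
  grammarFile: string;
}

const FSHARP_ENTRIES: GrammarEntry[] = [
  { id: 'fsharp', extensions: ['.fs', '.fsx'], grammarFile: 'tree-sitter-fsharp.wasm' },
  { id: 'fsharp-signature', extensions: ['.fsi'], grammarFile: 'tree-sitter-fsharp_signature.wasm' },
];

// Returns the entry whose extension list contains the file's extension,
// or undefined when the file is not an F# file.
function resolveFSharpEntry(filePath: string): GrammarEntry | undefined {
  const dot = filePath.lastIndexOf('.');
  if (dot < 0) return undefined;
  const ext = filePath.slice(dot).toLowerCase();
  for (const entry of FSHARP_ENTRIES) {
    if (entry.extensions.indexOf(ext) !== -1) return entry;
  }
  return undefined;
}
```

With this table, `resolveFSharpEntry('src/Lib.fsi')` yields the `fsharp-signature` entry, while `.fs` and `.fsx` paths keep resolving to `fsharp`.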

docs check acknowledged: README's F# row already covers .fs/.fsx/.fsi and
the user-facing language count is unchanged; fsharp-signature is an
internal id that mirrors how ocaml-interface backs .mli files.

Closes #1114
@claude

claude Bot commented May 19, 2026

Claude finished @carlos-alm's task —— View job


PR Review: F# Signature File Support

Todo List

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for high-risk changes
  • Phase 3: Test integrity - verify test quality and coverage
  • Phase 4: Code quality review - analyze implementation details
  • Phase 5: Backlog compliance assessment - rate against project criteria
  • Final: Provide comprehensive review recommendation

Root Cause Analysis

Problem Being Solved: F# signature files (.fsi) were being parsed through the regular F# source grammar, which produces ERROR nodes for val declarations that are valid in signature files. This created parsing failures and prevented proper symbol extraction from signature files.

Why This Approach: Adding a dedicated fsharp-signature language with its own grammar file ensures .fsi files use the appropriate tree-sitter grammar designed for F# interface files. The shared extractor logic handles both source and signature files appropriately.

Risk Assessment: Low risk - this is an additive change that doesn't modify existing F# source file parsing.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added, uses existing tree-sitter infrastructure
  • Foundation-aligned: ✓ — Fully compliant with all core principles
  • Problem-fit: 4/5 — Directly improves parsing accuracy for F# signature files, reducing AI confusion about F# module interfaces
  • Breaking: No — Purely additive, existing .fs parsing unchanged
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a well-architected additive feature.

Code Quality Assessment

Excellent Architecture

  • Dual-engine parity: Both native (Rust) and WASM engines properly support the new language
  • Shared extractor logic: Reuses extractFSharpSymbols with signature-specific handling in handleValueDefinition
  • Grammar version compatibility: Handles differences between WASM (0.1.0) and native (0.3.0) grammars gracefully
  • Consistent AST configuration: Properly extends AST type maps and string configs

Comprehensive Implementation

The implementation touches all necessary layers:

  • Native: crates/codegraph-core/src/extractors/fsharp.rs:305-386 - Added handleValueDefinition with grammar-version compatibility
  • WASM: src/domain/parser.ts:840-845 - Added language registry entry
  • Build: scripts/build-wasm.ts:209 - Added signature grammar to build process
  • AST: src/ast-analysis/rules/index.ts:236 - Added AST type mapping

Excellent Test Coverage

Native tests (crates/codegraph-core/src/extractors/fsharp.rs:388-454):

  • signature_extracts_val_declarations() - Core functionality
  • signature_extracts_bare_val_declarations() - Edge case handling
  • source_grammar_does_not_extract_let_bindings_as_val() - Regression prevention

WASM tests (tests/parsers/fsharp-signature.test.ts):

  • Clean parse verification (no ERROR nodes)
  • Function vs variable kind detection
  • Nested module handling
  • Error recovery for unsupported constructs

Implementation Details

Smart Grammar Compatibility (lines fsharp.rs:357-386):

// Handles both grammar versions elegantly:
// WASM: looks for `function_type` nodes
// Native: looks for `curried_spec` containing `arguments_spec`
fn has_function_type(node: &Node) -> bool {
    // Both patterns checked for cross-engine compatibility
}
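
As a rough illustration of the dual-shape check, here is a self-contained TypeScript sketch. It assumes a simplified `SketchNode` type (hypothetical, standing in for a tree-sitter node) and assumes both shapes appear as direct children of the `value_definition` node; the real extractors may traverse differently.

```typescript
// Simplified stand-in for a tree-sitter node (hypothetical type).
interface SketchNode {
  type: string;
  children: SketchNode[];
}

function hasChildOfType(node: SketchNode, type: string): boolean {
  for (const child of node.children) {
    if (child.type === type) return true;
  }
  return false;
}

// Treats a `val` declaration as a function when either grammar shape is present:
//  - a `function_type` child (the shape reported for the WASM grammar), or
//  - a `curried_spec` child that contains an `arguments_spec` (the cargo shape).
function hasFunctionType(valueDefinition: SketchNode): boolean {
  for (const child of valueDefinition.children) {
    if (child.type === 'function_type') return true;
    if (child.type === 'curried_spec' && hasChildOfType(child, 'arguments_spec')) return true;
  }
  return false;
}
```

A `curried_spec` without an `arguments_spec` child is treated as a plain value, which matches the description that cargo 0.3.0 wraps every signature, function or not, in `curried_spec`.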

Proper Symbol Extraction (lines fsharp.ts:258-318):

  • Only processes val declarations (first child must be val keyword)
  • Distinguishes functions from variables using type signatures
  • Maintains module qualification consistency with source files

Regression Protection:

  • Test ensures .fs files don't extract let bindings as val declarations
  • Preserves all existing F# source parsing behavior

Minor Observations

  1. Documentation: The PR description is excellent and covers the compatibility handling thoroughly
  2. Test quality: Tests are focused and cover both happy path and edge cases
  3. Error handling: Graceful fallback for grammar parsing issues

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: This is an exemplary implementation that:
    • Solves a real parsing problem for F# signature files
    • Maintains dual-engine parity with version-aware compatibility
    • Has comprehensive test coverage
    • Uses proper architecture patterns
    • Is purely additive with no breaking changes
    • Follows all project conventions and quality standards

The handling of grammar version differences between WASM and native engines demonstrates excellent engineering practices. This is the kind of high-quality, well-tested feature addition that strengthens the codebase.


@greptile-apps
Contributor

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR adds a dedicated fsharp-signature language ID wired to its own WASM grammar (tree-sitter-fsharp_signature.wasm) and the cargo LANGUAGE_SIGNATURE binding, so .fsi files are parsed by the correct signature grammar instead of the source grammar. It also extends the shared F# extractor (both Rust and TypeScript) with value_definition/module_defn handlers that enable extraction of val declarations and proper dotted-module qualification in signature files.

  • Adds FSharpSignature to LanguageKind, routes it through FSharpExtractor, and maps .fsi exclusively to the new entry in every relevant registry (parser.ts, wasm-worker-entry.ts, parser_registry.rs, ast-analysis/rules/index.ts, src/types.ts).
  • Implements handle_value_definition / handleValueDefinition (Rust + TS) that fire only when the first child is the val keyword, guarding .fs parity; and handle_module_defn / handleModuleDefn to qualify nested val declarations with their dotted module path.
  • Upgrades the npm tree-sitter-fsharp dependency from ^0.1.0 to a pinned v0.3.0 GitHub tarball (the release isn't on npm), with a dual-shape function-type detector to bridge the remaining node-shape skew between the WASM ionide tarball and the cargo crate at the same version tag.

Confidence Score: 5/5

Safe to merge — the changes are well-isolated to the F# extractor path and all affected registries are updated consistently.

All changed code paths are covered by new Rust unit tests and WASM integration tests. The val-first-child guard ensures no regression for .fs source files; the member_defn empirical finding is locked in by a regression test. The only finding is a stale sentence in a doc-comment that became inaccurate after the package bump within this same PR.

The enclosing_module_name doc-comment in fsharp.rs has a sentence that no longer accurately describes the WASM grammar behaviour after the 0.3.0 tarball upgrade.

Important Files Changed

Filename: Overview

  • crates/codegraph-core/src/extractors/fsharp.rs: Adds module_defn and value_definition handlers; enclosing_module_name now walks through both node kinds to build dotted paths. Logic and tests look correct; one doc-comment sentence is stale after the WASM 0.3.0 upgrade.
  • crates/codegraph-core/src/parser_registry.rs: Adds FSharpSignature variant with correct extension mapping (.fsi), language string, LANGUAGE_SIGNATURE binding, all() entry, and count guard; all updated consistently.
  • src/extractors/fsharp.ts: New handleModuleDefn and handleValueDefinition handlers with dual-shape function-type detection (WASM function_type vs cargo curried_spec/arguments_spec); both well-documented and exercised by tests.
  • src/domain/parser.ts: Splits .fsi into a dedicated fsharp-signature registry entry pointing to the new WASM grammar; .fsx stays on the source grammar.
  • package.json: Pins tree-sitter-fsharp to a GitHub tarball for v0.3.0 because the release isn't on npm; package-lock.json captures the integrity hash, though npm audit won't track this dependency.
  • tests/parsers/fsharp-signature.test.ts: New WASM test suite covering clean parse, bare val, nested module qualification, and error-recovery; all targeted assertions look correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[".fsi file"] -->|extension lookup| B["fsharp-signature\nlanguage ID"]
    C[".fs / .fsx file"] -->|extension lookup| D["fsharp\nlanguage ID"]

    B --> E["tree-sitter-fsharp_signature.wasm\n(WASM)"]
    B --> F["LANGUAGE_SIGNATURE\n(native cargo)"]
    D --> G["tree-sitter-fsharp.wasm\n(WASM)"]
    D --> H["LANGUAGE_FSHARP\n(native cargo)"]

    E --> I["FSharpExtractor / extractFSharpSymbols"]
    F --> I
    G --> I
    H --> I

    I --> J{node.kind}
    J -->|value_definition\nfirst child = 'val'| K["handleValueDefinition\nfunction_type OR curried_spec+arguments_spec"]
    J -->|module_defn| L["handleModuleDefn\ndotted path accumulation"]
    J -->|named_module| M["handleNamedModule"]
    J -->|function_declaration_left| N["handleFunctionDecl"]

    K --> O["Definition: variable or function\nwith qualified name"]
    L --> P["Definition: module\n+ nextModule for children"]

Reviews (13): Last reviewed commit: "Merge branch 'main' into fix/1114-fsharp..."

Comment thread src/extractors/fsharp.ts
Comment on lines +264 to +270
function handleValueDefinition(
  node: TreeSitterNode,
  ctx: ExtractorOutput,
  currentModule: string | null,
): void {
  const first = node.child(0);
  if (!first || first.type !== 'val') return;
Contributor


P2 val mutable class fields may produce false extractions in .fs source files

The value_definition handler now fires for any value_definition node whose first child is val. In F# source files, explicit class fields (val mutable count: int = 0) are also valid, and the source grammar may represent them as value_definition nodes with val as the first child—the same structure this handler targets. If so, these would silently be extracted as variable definitions in .fs files, a case not covered by the existing parity test (which only checks let x = 5). Adding a test like type C() =\n val mutable count: int = 0 through parse_source would confirm or rule out this path in both the Rust and TypeScript extractors.


Contributor Author


Empirically verified in both grammars (cargo 0.3.0 and WASM 0.1.0): val mutable count: int = 0 inside a class is parsed as a member_defn node — NOT value_definition — so the new val-style handler never sees it. The first-child=val guard reliably distinguishes signature val declarations from any other shape.

Added regression tests in both engines (9cfee7b) so a future grammar change cannot silently start mis-classifying class fields as variables.
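
The guard discussed in this thread can be illustrated with mock nodes. This is a sketch under stated assumptions: `SketchNode` is a hypothetical stand-in for a tree-sitter node, and the child lists of the `let` binding and the class field are invented for illustration. Only the node kinds (`value_definition`, `member_defn`) and the first-child `val` check come from the thread.

```typescript
// Simplified stand-in for a tree-sitter node (hypothetical type).
interface SketchNode {
  type: string;
  children: SketchNode[];
}

// Mirrors the first-child guard: only a `value_definition` whose first child
// is the `val` keyword counts as a signature declaration.
function isSignatureVal(node: SketchNode): boolean {
  if (node.type !== 'value_definition') return false;
  const first = node.children[0];
  return first !== undefined && first.type === 'val';
}

const leaf = (type: string): SketchNode => ({ type, children: [] });

// `val add : int -> int` in a .fsi file.
const signatureVal: SketchNode = {
  type: 'value_definition',
  children: [leaf('val'), leaf('identifier')],
};

// `let x = 5` in a .fs file (child layout assumed for illustration).
const sourceLet: SketchNode = {
  type: 'value_definition',
  children: [leaf('let'), leaf('identifier')],
};

// `val mutable count: int = 0` inside a class: reported to parse as member_defn.
const classField: SketchNode = {
  type: 'member_defn',
  children: [leaf('val'), leaf('mutable'), leaf('identifier')],
};
```

Only `signatureVal` passes `isSignatureVal`. The class field is rejected by its node kind before the first-child check runs, which is the property the regression tests are meant to lock in.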

Comment thread src/extractors/fsharp.ts
Comment on lines +311 to +314
const name = currentModule ? `${currentModule}.${ident.text}` : ident.text;
ctx.definitions.push({
  name,
  kind: hasFunctionType ? 'function' : 'variable',
Contributor


P2 Module qualification silently dropped for val inside module Foo = ... signatures

In .fsi files, val add : int -> int nested inside module Foo = ... is indexed as add rather than Foo.add. The signature grammar's module-signature nodes appear to use a node kind other than named_module, so handleNamedModule never fires and currentModule stays null for these declarations. The test explicitly asserts name: 'add', confirming the behavior, but it means any consumer searching for the fully-qualified name Foo.add will miss it. Adding a comment to the test that this is a known limitation (and perhaps a follow-up ticket) would make the intent clearer. The same gap exists in the Rust extractor via enclosing_module_name.


Contributor Author


Good catch — this was a real bug, not a documentation gap. Fixed in 9cfee7b:

The cargo 0.3.0 signature grammar wraps module Foo = ... as a module_defn node (distinct from named_module), so enclosing_module_name never reached it. Both engines now handle module_defn, emit it as a module definition with the dotted parent path, and qualify nested val declarations as Foo.add. New native test signature_qualifies_val_inside_nested_module_defn covers this.

The WASM 0.1.0 grammar still emits ERROR nodes for the same construct, so the WASM-only test continues to assert add — clarifying comment now points at #1161 (the grammar version bump that will let WASM reach the new code path too).
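
To make the qualification fix concrete, the following sketch walks up a mock parent chain and joins every enclosing `named_module` and `module_defn` into a dotted path. The `SketchNode` type and its `name` field are hypothetical simplifications (a real grammar exposes the module name as a child node), and the walk-up strategy is an assumption modelled on the description of `enclosing_module_name`, not a copy of it.

```typescript
// Simplified stand-in for a tree-sitter node (hypothetical type).
interface SketchNode {
  type: string;
  name: string | null; // module name, when the node is a module
  parent: SketchNode | null;
}

// Collects the names of all enclosing modules, outermost first.
// Both `named_module` and `module_defn` contribute a path segment.
function enclosingModulePath(node: SketchNode): string | null {
  const parts: string[] = [];
  for (let cur = node.parent; cur !== null; cur = cur.parent) {
    const isModule = cur.type === 'named_module' || cur.type === 'module_defn';
    if (isModule && cur.name) parts.unshift(cur.name);
  }
  return parts.length > 0 ? parts.join('.') : null;
}

// Prefixes an identifier with its enclosing module path, if any.
function qualifyName(node: SketchNode, ident: string): string {
  const modulePath = enclosingModulePath(node);
  return modulePath ? `${modulePath}.${ident}` : ident;
}
```

For a `val add` nested in `module Foo = ...`, `qualifyName` returns `Foo.add`. If `module_defn` were left out of the check, the same call would return the bare `add`, which is the behaviour the review flagged.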

@github-actions
Contributor

github-actions Bot commented May 19, 2026

Codegraph Impact Analysis

23 functions changed, 12 callers affected across 3 files

  • match_fsharp_node in crates/codegraph-core/src/extractors/fsharp.rs:19 (0 transitive callers)
  • enclosing_module_name in crates/codegraph-core/src/extractors/fsharp.rs:42 (4 transitive callers)
  • handle_module_defn in crates/codegraph-core/src/extractors/fsharp.rs:91 (1 transitive callers)
  • handle_value_definition in crates/codegraph-core/src/extractors/fsharp.rs:372 (1 transitive callers)
  • extract_value_name in crates/codegraph-core/src/extractors/fsharp.rs:409 (2 transitive callers)
  • has_function_type in crates/codegraph-core/src/extractors/fsharp.rs:417 (2 transitive callers)
  • parse_source in crates/codegraph-core/src/extractors/fsharp.rs:438 (2 transitive callers)
  • parse_signature in crates/codegraph-core/src/extractors/fsharp.rs:447 (3 transitive callers)
  • signature_extracts_val_declarations in crates/codegraph-core/src/extractors/fsharp.rs:457 (0 transitive callers)
  • signature_extracts_bare_val_declarations in crates/codegraph-core/src/extractors/fsharp.rs:474 (0 transitive callers)
  • source_grammar_does_not_extract_let_bindings_as_val in crates/codegraph-core/src/extractors/fsharp.rs:487 (0 transitive callers)
  • signature_qualifies_val_inside_nested_module_defn in crates/codegraph-core/src/extractors/fsharp.rs:500 (0 transitive callers)
  • source_grammar_does_not_extract_val_mutable_class_fields in crates/codegraph-core/src/extractors/fsharp.rs:518 (0 transitive callers)
  • extract_symbols_with_opts in crates/codegraph-core/src/extractors/mod.rs:69 (1 transitive callers)
  • LanguageKind.lang_id_str in crates/codegraph-core/src/parser_registry.rs:47 (0 transitive callers)
  • LanguageKind.from_extension in crates/codegraph-core/src/parser_registry.rs:89 (0 transitive callers)
  • LanguageKind.from_lang_id in crates/codegraph-core/src/parser_registry.rs:144 (0 transitive callers)
  • LanguageKind.tree_sitter_language in crates/codegraph-core/src/parser_registry.rs:187 (0 transitive callers)
  • LanguageKind.all in crates/codegraph-core/src/parser_registry.rs:235 (0 transitive callers)
  • all_kinds_listed_in_all in crates/codegraph-core/src/parser_registry.rs:280 (0 transitive callers)

…#1162)

Greptile review caught two .fsi extraction corners:

1. **Module qualification dropped for `val` inside `module Foo = ...`.**
   The cargo 0.3.0 signature grammar wraps nested signature modules in a
   `module_defn` node (distinct from `named_module`), so the existing
   `enclosing_module_name` walk never reached it and `val add : int -> int`
   was indexed as `add` instead of `Foo.add`. Both engines now handle
   `module_defn`, emit it as a `module` definition with the dotted parent
   path, and qualify nested `val` declarations accordingly.

   The WASM 0.1.0 signature grammar still emits ERROR nodes for the same
   construct, so the WASM-only test continues to assert `add` (with an
   explicit comment pointing at the grammar bump tracked under #1161).

2. **`val mutable count: int = 0` in `.fs` source files.** Empirically
   confirmed in both engines that the source grammar parses this as a
   `member_defn` node (NOT a `value_definition`), so the new `val`-style
   handler never sees it. Added regression tests in both engines so a
   future grammar change cannot silently start mis-classifying class
   fields as variables.
@carlos-alm
Contributor Author

@greptileai

carlos-alm and others added 5 commits May 19, 2026 16:24
* chore(fsharp): align npm grammar with cargo at v0.3.0

The WASM engine pulled tree-sitter-fsharp 0.1.0 from npm while the native
engine used 0.3.0 from crates.io. The two versions diverged in how they
parse type signatures in .fsi files: 0.1.0 emits `function_type` nodes
for `a -> b` types, while 0.3.0 wraps every signature in `curried_spec`
with `arguments_spec` children for function shapes.

The F# extractor was forced to detect both shapes simultaneously, which
is fragile — future grammar churn could silently desync further.

* package.json now installs tree-sitter-fsharp from the ionide v0.3.0
  GitHub tarball (npm has no 0.3.0 release; ionide is the upstream the
  cargo crate also tracks). Lockfile pins via SRI hash.
* Both extractors now check only `curried_spec` → `arguments_spec`,
  removing the dead `function_type` branch from each.

docs check acknowledged: README's F# row already covers .fs/.fsx/.fsi and
the user-facing language count is unchanged; the grammar version is an
internal implementation detail.

Closes #1161

* docs(fsharp): explain tree-sitter-fsharp tarball pin (#1165)
#1162)

The npm and cargo tree-sitter-fsharp 0.3.0 grammars — though sharing a
version tag — still emit type signatures with different node shapes:
WASM 0.3.0 produces `function_type` directly under `value_definition`,
while cargo 0.3.0 wraps every signature in `curried_spec` with
`arguments_spec` children for function types.

#1165 removed the `function_type` branch on the assumption that both
grammars had converged at v0.3.0, which broke WASM extraction: every
`val name : a -> b` declaration was being indexed as a `variable`
instead of a `function`. Restore the dual-shape detection in the
TypeScript extractor and update the documentation accordingly.

Also clarifies the nested-module test comment in fsharp-signature.test
to reflect that the WASM signature grammar is now at v0.3.0 but still
emits ERROR nodes for `module Foo = ...` (the fix is still pending,
tracked under #1161).
@carlos-alm
Contributor Author

Addressed Greptile latest review:

  • tests/parsers/fsharp-signature.test.ts:45-51: Updated the comment to reflect that the WASM signature grammar is now at v0.3.0 but still emits ERROR nodes for module Foo = .... The trigger condition is no longer "npm bumps to 0.3.0+" but rather "the signature grammar emits module_defn for nested modules". Still tracked under #1161 (F# tree-sitter grammar version skew: npm 0.1.0 vs cargo 0.3.0).

While verifying the test change, I discovered that the merge of #1165 into this branch silently broke WASM extraction — val name : a -> b declarations were being indexed as variable rather than function. The npm and cargo 0.3.0 grammars still emit different shapes (npm uses function_type, cargo uses curried_spec), so the simplification in #1165 was a regression. Restored the dual-shape detection in src/extractors/fsharp.ts in c12bb60.

@carlos-alm
Contributor Author

@greptileai

carlos-alm and others added 3 commits May 19, 2026 21:04
…1162)

The WASM tree-sitter-fsharp signature grammar was upgraded from v0.1.0
to v0.3.0 in adcaf40. v0.3.0 emits `module_defn` for nested
`module Foo = ...` blocks (v0.1.0 emitted ERROR nodes), so the existing
qualification logic now fires for the WASM engine too — `val` symbols
get the parent module prefix in both engines.

The signature test still expected the pre-bump behaviour (bare `add`),
which made it fail in CI where the grammar bump landed. Update the
assertion to lock in engine parity:
  - assert the qualified `Foo.add` function and the outer `Foo` module
  - assert the unqualified `add` is NOT emitted, so any future
    regression where the walker drops the enclosing module is caught

Also refresh the `module_defn` comment in src/extractors/fsharp.ts —
it still claimed the WASM grammar emitted ERROR nodes for this
construct, which became stale after the v0.3.0 bump.
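
The shape of the updated assertion can be sketched as a plain check over an extracted definitions list. This is illustrative only: `SketchDefinition` and `checkNestedModuleParity` are hypothetical names, and the actual vitest assertions may be written differently.

```typescript
// Hypothetical minimal shape of an extracted definition.
interface SketchDefinition {
  name: string;
  kind: string;
}

// Returns a list of problems; an empty list means the parity expectations hold:
// the outer `Foo` module and qualified `Foo.add` function exist, and the
// unqualified `add` was not emitted.
function checkNestedModuleParity(defs: SketchDefinition[]): string[] {
  const has = (name: string, kind: string): boolean =>
    defs.some((d) => d.name === name && d.kind === kind);
  const problems: string[] = [];
  if (!has('Foo', 'module')) problems.push('missing module Foo');
  if (!has('Foo.add', 'function')) problems.push('missing function Foo.add');
  if (defs.some((d) => d.name === 'add')) problems.push('unqualified add was emitted');
  return problems;
}
```

The negative check on the bare `add` is what catches a walker that silently drops the enclosing module, since such a regression would otherwise still produce a plausible-looking symbol.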
@carlos-alm carlos-alm merged commit 1a6ee7b into main May 20, 2026
28 checks passed
@carlos-alm carlos-alm deleted the fix/1114-fsharp-signature-grammar branch May 20, 2026 04:51
@github-actions github-actions Bot locked and limited conversation to collaborators May 20, 2026


Development

Successfully merging this pull request may close these issues.

  • F# tree-sitter grammar version skew: npm 0.1.0 vs cargo 0.3.0
  • follow-up: F# .fsi signature files parsed with main F# grammar (both engines)

1 participant