feat: extend Layer 1 file-ref validator to scan CSV workflow-file references #1573

arcaven · 2026-02-06T23:10:00Z

What

Extends tools/validate-file-refs.js (the Layer 1 file reference validator) to scan CSV files for broken _bmad/ workflow-file references. Adds .csv to the scan pipeline with a dedicated extractCsvRefs() function, 7 test fixtures, and a standalone test runner.

Why

After the Layer 2 schema validation work in PR #1529 (closed — being re-planned for the workflow.yaml → workflow*.md migration), analysis of community issue patterns revealed an additional class of repeat bugs: broken or incorrect paths in CSV manifest and help files. These CSV references were invisible to Layer 1 because it only scanned YAML, markdown, and XML. This PR closes that gap.

Additional schema-aware CSV validation (column-level type checking, enum validation) is planned for Layer 2. This PR focuses on structural file-reference checking, which fits naturally in Layer 1.

                ┌─────────────┐
                │   Layer 3   │  Graph Validation
                │             │  Step transitions, reachability
                │             │  (planned)
            ┌───┴─────────────┴───┐
            │      Layer 2        │  Schema Validation
            │                     │  YAML field types, required fields, enums
            │                     │  ❌ CLOSED (PR #1529), legacy YAML
            │                     │  (planned — workflow*.md validation)
        ┌───┴─────────────────────┴───┐
        │          Layer 1            │  File Reference Validation
        │                             │  Cross-file refs, path resolution
        │                             │  ✅ MERGED (PR #1494)
        │                             │  + CSV workflow-file scan  ◄── THIS PR
    ┌───┴─────────────────────────────┴───┐
    │              Layer 0                │  Formatting & Linting
    │                                     │  Prettier, ESLint, markdownlint
    │                                     │  ✅ EXISTING
    └─────────────────────────────────────┘

Note: the diagram shows one validator per layer, but the plan includes several validators at Layers 2 and 3:

Layer 2: Schema Validity (per artifact type)

ID	Validator	Target Artifact	Status
2a	`validate-agent-schema.js`	`*.agent.yaml`	✅ Exists
2b	Workflow Schema	`workflow*.md` frontmatter	Backlog
2c	CSV Schema	`module-help.csv` catalog	Backlog

Layer 3: Structural/Semantic Validation

ID	Validator	Scope	Status
3a	Workflow Graph	Workflow internal graph (steps, gotos, invokes)	Backlog
3b	Entity Refs	Cross-entity refs (agent↔CSV↔workflow)	Backlog

Summary

Layer	Exists	Planned	Total
0	4	0	4
1	1	0	1
2	1	2	3
3	0	2	2
Total	6	4	10

Issues and bugs this class of validator addresses

Issue	Status	What happened	Coverage
#1519	Open	`bmad-help.csv` has incorrect command for Create Brief workflow	Inspiring issue. Workflow-file column validation catches broken `_bmad/` paths; command column validation planned for Layer 2
#1136	Closed	`task-manifest.csv` references non-existent `daily-standup.xml` using old `.bmad` prefix	Direct catch. File ref validator flags the broken path
#1070	Closed	Installer writes `workflow-manifest.csv` with column count mismatch (5 vs 4 cols)	Partial. `relax_column_count` handles this gracefully; schema validation is Layer 2
#1097	Closed	Upgrade fails with CSV "Invalid Record Length" — schema mismatch	Same class as #1070
#832	Closed	CIS workflow CSV files have formatting errors and missing column data	Partial. Parser handles these gracefully; structural validation is Layer 2
#1536	Open	Validate Epics shares command with Create — CSV command column mismatch	Layer 2 scope. Column-value validation planned
#1260	Closed	`shard-doc` task orphaned: manifest mismatch	Direct catch for file-path columns
Unreported	Fixed	`731bee26`: `module-help.csv` referenced `workflow-create-prd.md` before file existed; fixed same-day in `bd620e38`	Direct catch. Validator would have flagged this at commit time

Current validation status

Files scanned: 212 (including 10 CSV files)
References checked: 501
Broken references: 2 (pre-existing, not CSV-related — see #1530)
Absolute path leaks: 0

All CSV workflow-file references resolve correctly.

Both broken refs point to core/tasks/validate-workflow.xml, referenced from create-story/checklist.md. They are pre-existing and tracked in #1530.

How

Added csv-parse/sync import (already a dependency, v6.1.0) and .csv to SCAN_EXTENSIONS
Created extractCsvRefs(filePath, content) — parses CSV with {columns: true, skip_empty_lines: true, relax_column_count: true}, extracts workflow-file column values as type: 'project-root' refs
Added else if (ext === '.csv') dispatch in the main scan loop (without this, CSV falls through to extractMarkdownRefs)
Wrapped main execution in require.main === module guard for testability, exported extractCsvRefs
Added 7 test fixtures: 3 valid (bmm-style, core-style, minimal) + 4 invalid (no-workflow-column, empty-data, all-empty-workflow, unresolvable-vars)
Added test:refs npm script and file-pattern-to-validator mapping table in CONTRIBUTING.md
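
The extraction step described above can be sketched roughly as follows. This is an illustration, not the code in tools/validate-file-refs.js: it swaps csv-parse/sync for a naive line and comma split so that it runs without dependencies (so no quoted fields), `isResolvable` is a hypothetical stand-in that only rejects empty cells and `{...}` template variables, and the sample paths are made up.

```javascript
// Sketch only: a dependency-free approximation of the extraction step.
// The real validator parses with csv-parse/sync; this naive split does not
// handle quoted fields or embedded commas.

// Hypothetical stand-in for the validator's resolvability check:
// skip empty cells and values containing {template} variables.
function isResolvable(raw) {
  return raw.length > 0 && !/\{[^}]*\}/.test(raw);
}

function extractCsvRefsSketch(filePath, content) {
  // Dropping blank lines mirrors skip_empty_lines; line numbers below
  // therefore assume the file has no blank lines.
  const rows = content.split(/\r?\n/).filter((row) => row.trim() !== '');
  if (rows.length < 2) return []; // header only, or empty file

  const header = rows[0].split(',').map((name) => name.trim());
  const col = header.indexOf('workflow-file');
  if (col === -1) return []; // no workflow-file column: nothing to check

  const refs = [];
  for (let i = 0; i < rows.length - 1; i++) {
    const cells = rows[i + 1].split(',');
    const raw = (cells[col] || '').trim();
    if (!isResolvable(raw)) continue;

    // Line = header (1) + data row index (0-based) + 1
    refs.push({ file: filePath, raw, type: 'project-root', line: i + 2 });
  }
  return refs;
}

// Made-up sample with a trailing comma on every row.
const sample = [
  'module,name,workflow-file,',
  'bmm,Create Brief,_bmad/bmm/workflows/create-brief/workflow.md,',
  'bmm,No Workflow,,',
  'bmm,Templated,{project-root}/_bmad/x.md,',
].join('\n');

console.log(extractCsvRefsSketch('module-help.csv', sample));
```

In this sample only the first data row yields a ref (reported at line 2). The empty cell and the templated value are skipped, which is the behavior the invalid fixtures are meant to exercise.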

Design decisions

Column scope: workflow-file only (v1). Other columns don't contain file refs.
Ref type: Reuses existing project-root type — no new resolution logic needed.
Parser options: relax_column_count: true handles the trailing commas in src/bmm/module-help.csv.
require.main guard: Standard Node.js pattern. Allows require() for unit testing without triggering the scan + process.exit().

Why this is safe to adopt

Adapted from the original Layer 1 PR (#1494) — same design principles apply.

The validator runs in warning mode by default (exit 0). Broken references appear in the build log for visibility, but build results are unaffected. The behavior of existing CI checks, pre-commit hooks, and npm scripts is unchanged. The validator is purely additive.

Every existing CI check continues to enforce exactly as before:

CI check	Before this PR	After this PR
Prettier, ESLint, markdownlint	Enforced (exit 1)	Enforced (exit 1)
Schema validation, agent tests, install tests	Enforced (exit 1)	Enforced (exit 1)
File ref validation (CSV extension)	Scans YAML/MD/XML only	+ CSV scanning (same exit behavior)

The test:refs unit tests validate the extraction function in isolation — they test code correctness, not file resolution. The full scan (npm run validate:refs) remains warning-only.

When ready to enforce strict mode:

- "validate:refs": "node tools/validate-file-refs.js"
+ "validate:refs": "node tools/validate-file-refs.js --strict"

The validator only reads files. It makes no changes to disk.
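
The difference between the two modes comes down to the exit code. A minimal sketch, assuming that `--strict` exits 1 whenever at least one reference is broken (`exitCodeFor` is a hypothetical helper, not a function in the validator):

```javascript
// Sketch: map a broken-reference count to an exit code.
// Warning mode (default) always exits 0; --strict is assumed to exit 1
// when anything is broken.
function exitCodeFor(brokenCount, argv) {
  const strict = argv.includes('--strict');
  return strict && brokenCount > 0 ? 1 : 0;
}

console.log(exitCodeFor(2, []));           // 0: warning mode, build unaffected
console.log(exitCodeFor(2, ['--strict'])); // 1: strict mode fails the build
console.log(exitCodeFor(0, ['--strict'])); // 0: strict mode, nothing broken
```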

What's next — and why BMAD-METHOD first

BMAD-METHOD is where the formats are defined, the primary install source, and where most community contributions land as direct PRs. Errors introduced here bypass any protections bmad-builder might provide for the subset of the community using it to generate content. CI validators in this repo catch those errors at the PR stage, before they reach users. That's why the validation pipeline starts here.

Layer 2 schema-aware CSV validation (column types, command uniqueness, enum constraints) will follow when the workflow.yaml → workflow*.md migration is complete. This PR handles only structural file-reference integrity, which is Layer 1's job.

sequenceDiagram
    participant BM as BMAD-METHOD
    participant BMB as bmad-builder

    Note over BM: ✅ Layer 0 — Formatting & Linting<br/>Existing

    Note over BM: ✅ Layer 1 — File Reference Validator<br/>PR #1494 merged

    rect rgba(0, 128, 255, 0.1)
    BM->>BM: Layer 1 + CSV extension<br/>PR #1573 (this PR)
    end

    BM->>BM: Layer 2 — Schema Validator<br/>(planned — replanning for workflow*.md)

    BM->>BM: Layer 3 — Graph Validator (planned)

    Note over BM,BMB: ── community checkpoint ──

    BM-->>BMB: Extend validators with --installed mode<br/>CI validation of BMB source files

    BMB->>BMB: Post-generation validation<br/>Run Layers 1-3 on builder output

Testing

7/7 CSV fixture tests passing (node test/test-file-refs-csv.js)
npm run validate:refs scans 212 files including 10 CSV files, 501 refs checked
All CSV workflow-file paths resolve correctly (zero new broken refs)
52/52 agent schema tests passing (no regression)
12/12 installation component tests passing
ESLint, Prettier, markdownlint all clean

Refs: #1519, #1136, #1070, #1097, #832, #1536, #1260

coderabbitai · 2026-02-06T23:14:51Z

Walkthrough

The PR extends the file-reference validator with CSV support. It adds CSV parsing to extract references from CSV files that have a workflow-file column, a fixture-based test suite for the extraction logic, a new npm script to run those tests, and documentation describing the new validation capability.

Changes

Cohort / File(s)	Summary
CSV Reference Validation Implementation `tools/validate-file-refs.js`, `test/test-file-refs-csv.js`	Adds CSV parsing via csv-parse/sync and new extractCsvRefs function to extract references from CSV files with workflow-file columns. Implements comprehensive test suite with valid/invalid fixtures covering CSV formats, template variables, and edge cases. Exports extractCsvRefs for modular usage.
Documentation `CONTRIBUTING.md`	Adds new "Validate file references" section with command snippet and "File-Pattern-to-Validator Mapping" table documenting validation coverage for YAML, Markdown, XML, and CSV file patterns.
Build Configuration `package.json`	Adds new npm script `test:refs` pointing to the CSV reference validation test runner.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: cross-file reference validator for BMAD source files #1494: Introduces the initial validator tool and CI/script infrastructure that this PR extends with CSV parsing, extractCsvRefs export, test suite, and documentation.

Suggested reviewers

alexeyv

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and specifically describes the main change: extending the Layer 1 file reference validator to support CSV files with workflow-file references.
Description check	✅ Passed	The pull request description comprehensively explains what changes were made (CSV scanning support), why it was needed (to catch broken file references in CSV files), and how it was implemented, directly relating to the changeset.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@package.json`:
- Around line 48-52: The aggregate "test" script in package.json omits the new
"test:refs" task so npm test won't run the CSV extraction tests; update the
"test" script to include "npm run test:refs" (e.g., add "&& npm run test:refs"
into the "test" script sequence alongside "test:schemas", "test:install",
"validate:schemas", "lint", "lint:md", and "format:check") so that running npm
test executes the "test:refs" job as well.

🧹 Nitpick comments (2)

tools/validate-file-refs.js (2)
296-327: CSV refs always typed as project-root — consider future-proofing.

All extracted CSV refs are hardcoded to type: 'project-root' (Line 323). If a workflow-file cell ever contains a relative path (e.g., ./local-workflow.md), resolveRef would route it through mapInstalledToSource, which would produce an incorrect resolved path. This is fine for v1 scope limited to _bmad/ prefixed paths, but worth a comment or a guard.
Optional: detect relative refs
     if (!isResolvable(raw)) continue;

     // Line = header (1) + data row index (0-based) + 1
     const line = i + 2;
-    refs.push({ file: filePath, raw, type: 'project-root', line });
+    const type = raw.startsWith('./') || raw.startsWith('../') ? 'relative' : 'project-root';
+    refs.push({ file: filePath, raw, type, line });
388-389: Only extractCsvRefs is exported — intentional?

extractYamlRefs and extractMarkdownRefs are not exported, which means they can't be unit-tested in isolation. If this is intentional for now that's fine, but consider exporting them too for consistency and testability.

package.json

coderabbitai · 2026-02-06T23:35:11Z

@arcaven, excellent! Thank you for adding test:refs to the aggregate test script in f15d773. Having it run alongside the other test suites ensures comprehensive validation coverage with a single npm test command. 🎉

augmentcode · 2026-02-07T00:30:01Z

🤖 Augment PR Summary

Summary: This PR extends the Layer 1 file reference validator to detect broken workflow file references inside CSV catalogs.

Changes:

Adds .csv to the scan pipeline and dispatches CSV files to a new extractCsvRefs() extractor.
Introduces CSV parsing via csv-parse/sync and extracts workflow-file column values as project-root references.
Wraps CLI execution in a require.main === module guard and exports extractCsvRefs for unit testing.
Adds a dedicated test runner (test/test-file-refs-csv.js) plus CSV fixtures to validate extraction behavior.
Updates package.json to include test:refs and ensures it runs as part of npm test.
Documents the file-pattern-to-validator mapping in CONTRIBUTING.md.

Technical Notes: CSV scanning is currently focused on the workflow-file column and uses relaxed column-count parsing to accommodate known trailing-comma formats.

augmentcode

Review completed. 3 suggestions posted.

tools/validate-file-refs.js

test/test-file-refs-csv.js

Add CSV file reference extraction to the Layer 1 validation pipeline, preventing broken _bmad/ workflow-file paths in module-help.csv files. Closes the gap identified after PR bmad-code-org#1529 where CSV references were unvalidated despite being a source of repeat community issues. Refs: bmad-code-org#1519

Add CSV file-ref extraction tests to the aggregate `npm test` pipeline, matching the existing pattern for test:schemas and test:install. Thanks to CodeRabbit for catching the omission.

alexeyv

Another deterministic quality gate. Love it!
Code looks fine, with a couple of questions.

tools/validate-file-refs.js

- Surface CSV parse errors visibly instead of silently swallowing (no Layer 2c schema validator exists yet to catch these)
- Add explanatory comments for the !VERBOSE logging pattern (non-verbose prints file headers only when issues found)
- Add verbose-mode diagnostics for extensionless path handling ([SKIP] when nothing exists, [OK-DIR] for valid directories)

Replace the split header-printing logic (print early in verbose mode, print late in non-verbose mode with a !VERBOSE guard) with a simpler collect-then-print approach. Refs are now classified into ok[] and broken[] arrays first, then printed in a single location with one straightforward if/else if decision. Addresses alexeyv's review feedback about the counterintuitive "if not verbose, log" pattern.
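
The collect-then-print approach described in this commit could look something like the sketch below. `reportFile` and the `checkRef` callback are hypothetical names, and the `[OK]` tag is an assumption; only `[BROKEN]` and the ok[]/broken[] split come from the commit message.

```javascript
// Sketch of collect-then-print. `checkRef` is a hypothetical callback that
// returns true when a reference resolves.
function reportFile(file, refs, checkRef, verbose) {
  // Step 1: classify every ref before printing anything.
  const ok = [];
  const broken = [];
  for (const ref of refs) {
    (checkRef(ref) ? ok : broken).push(ref);
  }

  // Step 2: one place decides what to print.
  const lines = [];
  if (broken.length > 0) {
    // Any issue: header plus broken refs (and OK refs when verbose).
    lines.push(file);
    if (verbose) for (const ref of ok) lines.push(`  [OK] ${ref}`);
    for (const ref of broken) lines.push(`  [BROKEN] ${ref}`);
  } else if (verbose) {
    // Clean file: only shown in verbose mode.
    lines.push(file);
    for (const ref of ok) lines.push(`  [OK] ${ref}`);
  }
  return { ok, broken, lines };
}

// Made-up data: one ref resolves, one does not.
const existing = new Set(['_bmad/a.md']);
const result = reportFile(
  'help.csv',
  ['_bmad/a.md', '_bmad/missing.md'],
  (ref) => existing.has(ref),
  false
);
console.log(result.lines.join('\n'));
```

Because classification happens first, the header decision no longer depends on when a broken ref is encountered, which is what removes the need for the "if not verbose, log" guard.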

…ESOLVED] Paths without file extensions that don't exist as files or directories are now flagged as [UNRESOLVED] — a distinct tag from [BROKEN] (which means a file with a known extension wasn't found). Both count toward the broken reference total and appear in CI annotations. This catches real bugs like wrong directory names in installed_path metadata and dead invoke-workflow references to removed workflows. Extensionless paths that DO exist as directories are still [OK-DIR].
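
A rough sketch of the tagging rule in this commit, under simplifying assumptions: `classifyPath` is a hypothetical helper, the plain `OK` tag is assumed, and "known extension" is approximated as "has any extension".

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Sketch: classify a resolved path into the tags described above.
function classifyPath(resolved) {
  if (fs.existsSync(resolved)) {
    return fs.statSync(resolved).isDirectory() ? 'OK-DIR' : 'OK';
  }
  // Missing: an extension means a file was expected (BROKEN); no extension
  // means the path matched neither a file nor a directory (UNRESOLVED).
  return path.extname(resolved) === '' ? 'UNRESOLVED' : 'BROKEN';
}

// Demo against a throwaway temp directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'refs-'));
fs.mkdirSync(path.join(root, 'workflows'));

console.log(classifyPath(path.join(root, 'workflows')));   // OK-DIR
console.log(classifyPath(path.join(root, 'workflowz')));   // UNRESOLVED
console.log(classifyPath(path.join(root, 'workflow.md'))); // BROKEN
```

Per the commit message, both `UNRESOLVED` and `BROKEN` would count toward the broken reference total; only the tag differs.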

alexeyv

Looks like a merge to me

arcaven marked this pull request as draft February 6, 2026 23:10

coderabbitai bot reviewed Feb 6, 2026

package.json (outdated, resolved)

arcaven marked this pull request as ready for review February 7, 2026 00:27

augmentcode bot reviewed Feb 7, 2026

tools/validate-file-refs.js (resolved)

tools/validate-file-refs.js (outdated, resolved)

test/test-file-refs-csv.js (resolved)

arcaven added 2 commits February 6, 2026 23:41

fix: include test:refs in aggregate test script

ac156fc

Add CSV file-ref extraction tests to the aggregate `npm test` pipeline, matching the existing pattern for test:schemas and test:install. Thanks to CodeRabbit for catching the omission.

arcaven force-pushed the feat/csv-file-ref-validation branch from 4b19304 to ac156fc Compare February 7, 2026 05:42

Merge branch 'main' into feat/csv-file-ref-validation

e2acf1e

alexeyv requested changes Feb 7, 2026

tools/validate-file-refs.js (outdated, resolved)

tools/validate-file-refs.js (resolved)

tools/validate-file-refs.js (resolved)

arcaven added 3 commits February 7, 2026 13:59

arcaven requested a review from alexeyv February 7, 2026 20:30

Merge branch 'main' into feat/csv-file-ref-validation

8c4058b

alexeyv approved these changes Feb 8, 2026

Merge branch 'main' into feat/csv-file-ref-validation

1f97b1b

bmadcode merged commit 24cf444 into bmad-code-org:main Feb 8, 2026
5 checks passed

arcaven mentioned this pull request Feb 8, 2026

feat: add file reference validator (MSSCI-14579) bmad-code-org/bmad-builder#8

Merged

5 tasks

coderabbitai bot mentioned this pull request Feb 11, 2026

validate-file-refs.js runs warning-only — broken refs never fail CI #1626

Open
