@arcaven arcaven commented Feb 6, 2026

What

Extends tools/validate-file-refs.js (the Layer 1 file reference validator) to scan CSV files for broken _bmad/ workflow-file references. Adds .csv to the scan pipeline with a dedicated extractCsvRefs() function, 7 test fixtures, and a standalone test runner.

Why

After the Layer 2 schema validation work in PR #1529 (closed; being re-planned for the workflow.yaml → workflow*.md migration), analysis of community issue patterns revealed an additional class of recurring bugs: broken or incorrect paths in CSV manifest and help files. These CSV references were invisible to Layer 1 because it scanned only YAML, Markdown, and XML. This PR closes that gap.

Additional schema-aware CSV validation (column-level type checking, enum validation) is planned for Layer 2. This PR focuses on structural file-reference checking, which fits naturally in Layer 1.

                ┌─────────────┐
                │   Layer 3   │  Graph Validation
                │             │  Step transitions, reachability
                │             │  (planned)
            ┌───┴─────────────┴───┐
            │      Layer 2        │  Schema Validation
            │                     │  YAML field types, required fields, enums
            │                     │  ❌ CLOSED (PR #1529), legacy YAML
            │                     │  (planned — workflow*.md validation)
        ┌───┴─────────────────────┴───┐
        │          Layer 1            │  File Reference Validation
        │                             │  Cross-file refs, path resolution
        │                             │  ✅ MERGED (PR #1494)
        │                             │  + CSV workflow-file scan  ◄── THIS PR
    ┌───┴─────────────────────────────┴───┐
    │              Layer 0                │  Formatting & Linting
    │                                     │  Prettier, ESLint, markdownlint
    │                                     │  ✅ EXISTING
    └─────────────────────────────────────┘

NOTE: The diagram shows one validator per layer, but several are planned for Layers 2 and 3:

Layer 2: Schema Validity (per artifact type)

| ID | Validator | Target Artifact | Status |
|----|-----------|-----------------|--------|
| 2a | validate-agent-schema.js | *.agent.yaml | ✅ Exists |
| 2b | Workflow Schema | workflow*.md frontmatter | Backlog |
| 2c | CSV Schema | module-help.csv catalog | Backlog |

Layer 3: Structural/Semantic Validation

| ID | Validator | Scope | Status |
|----|-----------|-------|--------|
| 3a | Workflow Graph | Workflow internal graph (steps, gotos, invokes) | Backlog |
| 3b | Entity Refs | Cross-entity refs (agent↔CSV↔workflow) | Backlog |

Summary

| Layer | Exists | Planned | Total |
|-------|--------|---------|-------|
| 0 | 4 | 0 | 4 |
| 1 | 1 | 0 | 1 |
| 2 | 1 | 2 | 3 |
| 3 | 0 | 2 | 2 |
| Total | 6 | 4 | 10 |

Issues and bugs this class of validator addresses

| Issue | Status | What happened | Coverage |
|-------|--------|---------------|----------|
| #1519 | Open | bmad-help.csv has incorrect command for Create Brief workflow | Inspiring issue. Workflow-file column validation catches broken _bmad/ paths; command column validation planned for Layer 2 |
| #1136 | Closed | task-manifest.csv references non-existent daily-standup.xml using old .bmad prefix | Direct catch. File ref validator flags the broken path |
| #1070 | Closed | Installer writes workflow-manifest.csv with column count mismatch (5 vs 4 cols) | Partial. relax_column_count handles gracefully; schema validation is Layer 2 |
| #1097 | Closed | Upgrade fails with CSV "Invalid Record Length" (schema mismatch) | Same class as #1070 |
| #832 | Closed | CIS workflow CSV files have formatting errors and missing column data | Partial. Parser handles gracefully; structural validation is Layer 2 |
| #1536 | Open | Validate Epics shares command with Create (CSV command column mismatch) | Layer 2 scope. Column-value validation planned |
| #1260 | Closed | shard-doc task orphaned: manifest mismatch | Direct catch for file-path columns |
| Unreported | Fixed | 731bee26: module-help.csv referenced workflow-create-prd.md before file existed; fixed same-day in bd620e38 | Direct catch. Validator would have flagged this at commit time |

Current validation status

Files scanned: 212 (including 10 CSV files)
References checked: 501
Broken references: 2 (pre-existing, not CSV-related — see #1530)
Absolute path leaks: 0

All CSV workflow-file references resolve correctly.

The 2 broken refs are core/tasks/validate-workflow.xml referenced from create-story/checklist.md — tracked in #1530, pre-existing.

How

  • Added csv-parse/sync import (already a dependency, v6.1.0) and .csv to SCAN_EXTENSIONS
  • Created extractCsvRefs(filePath, content) — parses CSV with {columns: true, skip_empty_lines: true, relax_column_count: true}, extracts workflow-file column values as type: 'project-root' refs
  • Added else if (ext === '.csv') dispatch in the main scan loop (without this, CSV falls through to extractMarkdownRefs)
  • Wrapped main execution in require.main === module guard for testability, exported extractCsvRefs
  • 7 test fixtures: 3 valid (bmm-style, core-style, minimal) + 4 invalid (no-workflow-column, empty-data, all-empty-workflow, unresolvable-vars)
  • Added test:refs npm script and file-pattern-to-validator mapping table in CONTRIBUTING.md
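
The extraction step in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: to stay dependency-free it swaps csv-parse/sync for a naive stand-in parser (parseSimpleCsv, hypothetical, with no quoted-field support), isResolvable uses an assumed placeholder rule (skip values that still contain a {{...}} template variable), and the sample path is made up.

```javascript
// Stand-in for csv-parse/sync's parse(content, { columns: true, ... }).
// Assumption: no quoted fields or embedded commas; the real validator uses csv-parse.
function parseSimpleCsv(content) {
  const lines = content.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const headers = lines[0].split(',').map((header) => header.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',');
    const record = {};
    // Missing cells become ''; surplus cells are ignored (relaxed column count in spirit).
    headers.forEach((header, idx) => {
      record[header] = (cells[idx] || '').trim();
    });
    return record;
  });
}

// Hypothetical rule: a value with an unexpanded {{...}} placeholder cannot be resolved.
function isResolvable(raw) {
  return !/\{\{.*?\}\}/.test(raw);
}

function extractCsvRefs(filePath, content) {
  const records = parseSimpleCsv(content);
  const refs = [];
  for (let i = 0; i < records.length; i++) {
    const raw = records[i]['workflow-file'];
    if (!raw) continue; // column missing or cell empty
    if (!isResolvable(raw)) continue;
    // Line = header (1) + data row index (0-based) + 1
    const line = i + 2;
    refs.push({ file: filePath, raw, type: 'project-root', line });
  }
  return refs;
}

// Illustrative input with trailing commas, as mentioned for module-help.csv.
const sample = [
  'module,name,workflow-file,',
  'bmm,Create Brief,_bmad/bmm/workflows/create-brief/workflow.md,',
  'bmm,No Workflow,,',
].join('\n');

// Yields a single ref, reported at line 2 of the CSV.
console.log(extractCsvRefs('module-help.csv', sample));
```

Because the line number is derived from the record index, it only matches the file when no blank lines were skipped before that row.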

Design decisions

  • Column scope: workflow-file only (v1). Other columns don't contain file refs.
  • Ref type: Reuses existing project-root type — no new resolution logic needed.
  • Parser options: relax_column_count: true handles the trailing commas in src/bmm/module-help.csv.
  • require.main guard: Standard Node.js pattern. Allows require() for unit testing without triggering the scan + process.exit().

Why this is safe to adopt

Adapted from the original Layer 1 PR (#1494) — same design principles apply.

The validator runs in warning mode by default (exit 0). Broken references appear in the build log for visibility, but build results are unaffected. No existing CI checks, pre-commit hooks, or npm scripts are modified in behavior. The validator is purely additive.

Every existing CI check continues to enforce exactly as before:

| CI check | Before this PR | After this PR |
|----------|----------------|---------------|
| Prettier, ESLint, markdownlint | Enforced (exit 1) | Enforced (exit 1) |
| Schema validation, agent tests, install tests | Enforced (exit 1) | Enforced (exit 1) |
| File ref validation (CSV extension) | Scans YAML/MD/XML only | + CSV scanning (same exit behavior) |

The test:refs unit tests validate the extraction function in isolation — they test code correctness, not file resolution. The full scan (npm run validate:refs) remains warning-only.

When ready to enforce strict mode:

- "validate:refs": "node tools/validate-file-refs.js"
+ "validate:refs": "node tools/validate-file-refs.js --strict"
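
The difference between the two modes can be sketched as a small pure function. This is an assumption about how --strict behaves (exit 1 only when the flag is set and broken references exist); exitCodeFor is a hypothetical helper, not a function from the validator.

```javascript
// Hypothetical helper: maps scan results and CLI args to a process exit code.
function exitCodeFor(brokenCount, argv) {
  const strict = argv.includes('--strict');
  // Warning mode always exits 0; strict mode fails only when something is broken.
  return strict && brokenCount > 0 ? 1 : 0;
}

console.log(exitCodeFor(2, []));           // warning mode, broken refs present
console.log(exitCodeFor(2, ['--strict'])); // strict mode, broken refs present
console.log(exitCodeFor(0, ['--strict'])); // strict mode, clean scan
```

In a real CLI the argument list would typically come from process.argv.slice(2).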

The validator only reads files. It makes no changes to disk.

What's next — and why BMAD-METHOD first

BMAD-METHOD is where the formats are defined, the primary install source, and where most community contributions land as direct PRs. Errors introduced here bypass any protections bmad-builder might provide for the subset of the community using it to generate content. CI validators in this repo catch those errors at the PR stage, before they reach users. That's why the validation pipeline starts here.

Layer 2 schema-aware CSV validation (column types, command uniqueness, enum constraints) will follow once the workflow.yaml → workflow*.md migration is complete. This PR handles only structural file-reference integrity, which is Layer 1's job.

sequenceDiagram
    participant BM as BMAD-METHOD
    participant BMB as bmad-builder

    Note over BM: ✅ Layer 0 — Formatting & Linting<br/>Existing

    Note over BM: ✅ Layer 1 — File Reference Validator<br/>PR #1494 merged

    rect rgba(0, 128, 255, 0.1)
    BM->>BM: Layer 1 + CSV extension<br/>PR #1573 (this PR)
    end

    BM->>BM: Layer 2 — Schema Validator<br/>(planned — replanning for workflow*.md)

    BM->>BM: Layer 3 — Graph Validator (planned)

    Note over BM,BMB: ── community checkpoint ──

    BM-->>BMB: Extend validators with --installed mode<br/>CI validation of BMB source files

    BMB->>BMB: Post-generation validation<br/>Run Layers 1-3 on builder output

Testing

  • 7/7 CSV fixture tests passing (node test/test-file-refs-csv.js)
  • npm run validate:refs scans 212 files including 10 CSV files, 501 refs checked
  • All CSV workflow-file paths resolve correctly (zero new broken refs)
  • 52/52 agent schema tests passing (no regression)
  • 12/12 installation component tests passing
  • ESLint, Prettier, markdownlint all clean

Refs: #1519, #1136, #1070, #1097, #832, #1536, #1260

@arcaven arcaven marked this pull request as draft February 6, 2026 23:10

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

The PR extends the file-reference validator with CSV support. It adds CSV parsing to extract references from the workflow-file column of CSV files, a fixture-based test suite for the extraction logic, a new npm script to run those tests, and documentation describing the new validation capability.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| CSV Reference Validation Implementation: tools/validate-file-refs.js, test/test-file-refs-csv.js | Adds CSV parsing via csv-parse/sync and new extractCsvRefs function to extract references from CSV files with workflow-file columns. Implements comprehensive test suite with valid/invalid fixtures covering CSV formats, template variables, and edge cases. Exports extractCsvRefs for modular usage. |
| Documentation: CONTRIBUTING.md | Adds new "Validate file references" section with command snippet and "File-Pattern-to-Validator Mapping" table documenting validation coverage for YAML, Markdown, XML, and CSV file patterns. |
| Build Configuration: package.json | Adds new npm script test:refs pointing to the CSV reference validation test runner. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • alexeyv
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately and specifically describes the main change: extending the Layer 1 file reference validator to support CSV files with workflow-file references. |
| Description check | ✅ Passed | The pull request description comprehensively explains what changes were made (CSV scanning support), why it was needed (to catch broken file references in CSV files), and how it was implemented, directly relating to the changeset. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@package.json`:
- Around line 48-52: The aggregate "test" script in package.json omits the new
"test:refs" task so npm test won't run the CSV extraction tests; update the
"test" script to include "npm run test:refs" (e.g., add "&& npm run test:refs"
into the "test" script sequence alongside "test:schemas", "test:install",
"validate:schemas", "lint", "lint:md", and "format:check") so that running npm
test executes the "test:refs" job as well.
🧹 Nitpick comments (2)
tools/validate-file-refs.js (2)

296-327: CSV refs always typed as project-root — consider future-proofing.

All extracted CSV refs are hardcoded to type: 'project-root' (Line 323). If a workflow-file cell ever contains a relative path (e.g., ./local-workflow.md), resolveRef would route it through mapInstalledToSource, which would produce an incorrect resolved path. This is fine for v1 scope limited to _bmad/ prefixed paths, but worth a comment or a guard.

Optional: detect relative refs
     if (!isResolvable(raw)) continue;

     // Line = header (1) + data row index (0-based) + 1
     const line = i + 2;
-    refs.push({ file: filePath, raw, type: 'project-root', line });
+    const type = raw.startsWith('./') || raw.startsWith('../') ? 'relative' : 'project-root';
+    refs.push({ file: filePath, raw, type, line });

388-389: Only extractCsvRefs is exported — intentional?

extractYamlRefs and extractMarkdownRefs are not exported, which means they can't be unit-tested in isolation. If this is intentional for now that's fine, but consider exporting them too for consistency and testability.

coderabbitai bot commented Feb 6, 2026

@arcaven, excellent! Thank you for adding test:refs to the aggregate test script in f15d773. Having it run alongside the other test suites ensures comprehensive validation coverage with a single npm test command. 🎉


@arcaven arcaven marked this pull request as ready for review February 7, 2026 00:27

augmentcode bot commented Feb 7, 2026

🤖 Augment PR Summary

Summary: This PR extends the Layer 1 file reference validator to detect broken workflow file references inside CSV catalogs.

Changes:

  • Adds .csv to the scan pipeline and dispatches CSV files to a new extractCsvRefs() extractor.
  • Introduces CSV parsing via csv-parse/sync and extracts workflow-file column values as project-root references.
  • Wraps CLI execution in a require.main === module guard and exports extractCsvRefs for unit testing.
  • Adds a dedicated test runner (test/test-file-refs-csv.js) plus CSV fixtures to validate extraction behavior.
  • Updates package.json to include test:refs and ensures it runs as part of npm test.
  • Documents the file-pattern-to-validator mapping in CONTRIBUTING.md.

Technical Notes: CSV scanning is currently focused on the workflow-file column and uses relaxed column-count parsing to accommodate known trailing-comma formats.

@augmentcode augmentcode bot left a comment

Review completed. 3 suggestions posted.

Add CSV file reference extraction to the Layer 1 validation pipeline,
preventing broken _bmad/ workflow-file paths in module-help.csv files.
Closes the gap identified after PR bmad-code-org#1529 where CSV references were
unvalidated despite being a source of repeat community issues.

Refs: bmad-code-org#1519

Add CSV file-ref extraction tests to the aggregate `npm test` pipeline,
matching the existing pattern for test:schemas and test:install.

Thanks to CodeRabbit for catching the omission.
@arcaven arcaven force-pushed the feat/csv-file-ref-validation branch from 4b19304 to ac156fc Compare February 7, 2026 05:42

@alexeyv alexeyv left a comment

Another deterministic quality gate. Love it!
Code looks fine, with a couple of questions.

- Surface CSV parse errors visibly instead of silently swallowing
  (no Layer 2c schema validator exists yet to catch these)
- Add explanatory comments for the !VERBOSE logging pattern
  (non-verbose prints file headers only when issues found)
- Add verbose-mode diagnostics for extensionless path handling
  ([SKIP] when nothing exists, [OK-DIR] for valid directories)

Replace the split header-printing logic (print early in verbose mode,
print late in non-verbose mode with a !VERBOSE guard) with a simpler
collect-then-print approach. Refs are now classified into ok[] and
broken[] arrays first, then printed in a single location with one
straightforward if/else if decision.

Addresses alexeyv's review feedback about the counterintuitive
"if not verbose, log" pattern.
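
The collect-then-print approach described in this commit message might look roughly like the sketch below. The function name formatFileReport, the shape of the result objects ({ raw, exists }), and the exact output strings are assumptions made for illustration; only the ok[]/broken[] split and the single if/else if decision come from the commit message.

```javascript
// Hypothetical sketch: classify first, then decide what to print in one place.
function formatFileReport(file, results, verbose) {
  const ok = [];
  const broken = [];
  for (const result of results) {
    (result.exists ? ok : broken).push(result);
  }

  const lines = [];
  if (verbose) {
    // Verbose mode: always print the file header and every ref.
    lines.push(file);
    ok.forEach((r) => lines.push(`  [OK] ${r.raw}`));
    broken.forEach((r) => lines.push(`  [BROKEN] ${r.raw}`));
  } else if (broken.length > 0) {
    // Non-verbose mode: print the header only when there is something to report.
    lines.push(file);
    broken.forEach((r) => lines.push(`  [BROKEN] ${r.raw}`));
  }
  return lines;
}

const results = [
  { raw: '_bmad/core/a.md', exists: true },
  { raw: '_bmad/core/missing.md', exists: false },
];
console.log(formatFileReport('example.md', results, false).join('\n'));
```

Returning the lines instead of logging them directly keeps the decision logic easy to unit-test.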
…ESOLVED]

Paths without file extensions that don't exist as files or directories
are now flagged as [UNRESOLVED] — a distinct tag from [BROKEN] (which
means a file with a known extension wasn't found). Both count toward
the broken reference total and appear in CI annotations.

This catches real bugs like wrong directory names in installed_path
metadata and dead invoke-workflow references to removed workflows.
Extensionless paths that DO exist as directories are still [OK-DIR].
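
One way to picture the tag rules in this commit message is the following sketch. classifyPath is a hypothetical helper, and the plain OK tag for existing files is an assumption; the [BROKEN], [UNRESOLVED], and [OK-DIR] distinctions follow the description above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical classifier for an already-resolved filesystem path.
function classifyPath(resolvedPath) {
  const hasExtension = path.extname(resolvedPath) !== '';
  if (fs.existsSync(resolvedPath)) {
    const isDirectory = fs.statSync(resolvedPath).isDirectory();
    // Extensionless paths that exist as directories get their own tag.
    return !hasExtension && isDirectory ? 'OK-DIR' : 'OK';
  }
  // Missing file with a known extension vs. missing extensionless path.
  return hasExtension ? 'BROKEN' : 'UNRESOLVED';
}

console.log(classifyPath(path.join(process.cwd(), 'no-such-workflow-dir')));
console.log(classifyPath(path.join(process.cwd(), 'no-such-file.xml')));
```

Under the commit's rules, both missing cases would count toward the broken reference total; only the tag differs.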
@arcaven arcaven requested a review from alexeyv February 7, 2026 20:30

@alexeyv alexeyv left a comment

Looks like a merge to me

@bmadcode bmadcode merged commit 24cf444 into bmad-code-org:main Feb 8, 2026
5 checks passed